Millions of Shoppers + Scanning Millions of Product Barcodes + Millions of Retail Store Locations + More than 40,000 Retailers Delivering Inventory and Price Information = BIG DATA.
ShopSavvy is now three years old and during that time has become the largest mobile shopping community in the world. Delivering real-time pricing, inventory, reviews and product information to millions of users, for millions of products from thousands of retailers, is hard work – perhaps harder than it sounds. Late last year we secured a $7M investment led by Facebook co-founder Eduardo Saverin with the goal of making our platform more social. Layering a set of social graph data over our existing product data entailed the creation of an entirely new platform that could leverage the latest ‘Big Data’ tools like Cassandra, Hadoop and Mahout – ENTER ‘ProductCloud’.
ProductCloud is ShopSavvy’s Big Data solution for real-time social product data. Before the release of ProductCloud we used traditional LAMP tools for data – tools like MySQL. By Black Friday we had more than 1 billion rows in MySQL – the realistic limit of the technology. Our team has been feverishly building ProductCloud, and I am pleased to announce that it is going live as I write this blog post.
Using Cassandra with Hadoop for data storage and processing, ProductCloud is able to iterate over a MASSIVE dataset on demand while concurrently serving data requests and meeting our various internal and external SLAs for companies like CNET, About.com, PriceGrabber and Walmart. Cassandra’s built-in replication allows ProductCloud to keep serving load during peak times (such as Black Friday or Cyber Monday) as well as during hardware failures across multiple machines, racks or even data centers. ProductCloud maintains HUGE histories of prices, products, scans and locations that number in the hundreds of billions of items.
These Big Data tools make it possible to leverage the massive amount of data we store to constantly create new analytics that provide actionable insights into the retail shopping business. ProductCloud’s open architecture allows our team to layer tools like Mahout on top of our platform to enable new features like price prediction, user recommendations, product categorization and product resolution. What sort of data are we talking about? Here are a few stats that boggle the mind (well, at least ours):
- More than 240,000,000 product pictures and user action shots
- More than 3,040,000,000 product attributes (color, size, features and so on)
- More than 14,720,000,000 prices from retailers
- More than 100 price requests from ShopSavvy users per second
With the release of ProductCloud our team has laid the groundwork for the ‘socialization’ of ShopSavvy. Adding a social layer to such a rich set of data, data that must be accessed in real time, wasn’t possible before the creation of the Big Data tools found in ProductCloud. Look for social features later this quarter.