Collaborative Filtering at Amazon.com


The authors note: Amazon.com has more than 29 million customers and several million catalog items. Other major retailers have comparably large data sources. While all this data offers opportunity, it's also a curse, breaking the backs of algorithms designed for data sets three orders of magnitude smaller.

The authors first present the more traditional collaborative filtering and cluster-model methods. Unfortunately, these methods tend to perform poorly at the scale that Amazon.com needs, and the techniques available to speed them up degrade the results. Amazon.com instead uses a technique they call item-to-item collaborative filtering. The key to their method is to perform the computationally expensive portion, building a product-to-product matrix, offline. This allows high-quality recommendations to be generated quickly, in real time.
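To make the offline/online split concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names, the toy purchase data, and the choice of cosine similarity over binary purchase vectors are all my own assumptions for illustration. The expensive pairwise work happens once in `build_similar_items`; `recommend` then only does table lookups.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_similar_items(purchases):
    """Offline step (hypothetical helper): build an item-to-item similarity table.

    `purchases` maps a customer id to the list of items that customer bought.
    Similarity is cosine over binary purchase vectors, an assumed measure:
    co-purchase count / sqrt(buyers of a * buyers of b).
    """
    item_customers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            item_customers[item].add(customer)

    # Count how many customers bought each pair of items together.
    co_counts = defaultdict(int)
    for items in purchases.values():
        for a, b in combinations(sorted(set(items)), 2):
            co_counts[(a, b)] += 1

    similar = defaultdict(dict)
    for (a, b), n in co_counts.items():
        score = n / sqrt(len(item_customers[a]) * len(item_customers[b]))
        similar[a][b] = score
        similar[b][a] = score
    return similar


def recommend(similar, basket, top_n=3):
    """Online step: sum precomputed similarities for items not already owned."""
    owned = set(basket)
    scores = defaultdict(float)
    for item in owned:
        for other, score in similar.get(item, {}).items():
            if other not in owned:
                scores[other] += score
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Toy data, invented for this example.
purchases = {
    "alice": ["book", "lamp"],
    "bob": ["book", "lamp", "mug"],
    "carol": ["book", "mug"],
    "dave": ["lamp", "pen"],
}
similar = build_similar_items(purchases)
print(recommend(similar, ["book"]))  # prints ['mug', 'lamp']
```

The point of the design is that the online cost depends on how many items the customer has bought, not on the total number of customers or catalog items, which is what lets recommendations be served in real time.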

The IEEE offers the article for purchase online, but it's a little pricey ($18 for non-members). If you want to read it, you may wish to visit the engineering library of your friendly neighborhood university. Or, if you are local, drop me a note and I'll share my copy.



re: Collaborative Filtering at Amazon.com

I brought up "fake" Amazon recommendations last night at the meetup... here are a couple of links about them.

Guy faked his own recommendations

The "joke" clothing recommendations

If you read down to the bottom of the WSJ article, it seems pretty clear that the "joke" recommendations started seriously, and they used the joke angle to take the heat off.