Netflix prize offers more than cash
Last year Netflix offered a prize of $1 million to anyone who could improve its recommendations to users by 10% or better. Now it seems the company gave out more than expected.
When the online DVD service Netflix offered its $1 million prize, it gave out real data from its users to help contestants improve the recommendation feature on its site. When a user rates a DVD, the feature suggests other movies that user may like based on genre, actor, director, etc.
To make sure the contestants’ algorithms would work on real-world data, Netflix provided over 100 million movie ratings made by 500,000 users. To be careful, Netflix first asked users if it was OK to use their data for this project, then anonymized the users’ personal details.
Now it turns out that two men from the University of Texas at Austin have de-anonymized the data. By cross-referencing the Netflix data with data from other sites, they were able to identify anonymous Netflix users.
Aside from popular movies, people have unique tastes and tend to share them on sites like IMDb, Amazon, and others that have similar ratings systems. Sometimes those users rated and reviewed films on similar dates, narrowing the field.
The duo from Texas explained:
“Given a user’s public IMDb ratings, which the user posted voluntarily to selectively reveal some of his (or her; but we’ll use the male pronoun without loss of generality) movie likes and dislikes, we discover all the ratings that he entered privately into the Netflix system, presumably expecting that they will remain private.”
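To make the cross-referencing idea concrete, here is a toy Python sketch. It is not the researchers' actual method, only an illustration of the general principle: count how many of a person's public ratings line up with an anonymized record on movie, score, and approximate date. All names, user IDs, movie titles, and thresholds here are hypothetical.

```python
from datetime import date

def match_score(public, private, max_rating_gap=1, max_days=14):
    """Count movies rated similarly, on nearby dates, in both records.

    Each record maps a movie title to a (rating, date) pair.
    The thresholds are illustrative assumptions, not published values.
    """
    score = 0
    for movie, (rating, when) in public.items():
        if movie not in private:
            continue
        other_rating, other_when = private[movie]
        close_rating = abs(rating - other_rating) <= max_rating_gap
        close_date = abs((when - other_when).days) <= max_days
        if close_rating and close_date:
            score += 1
    return score

def rank_candidates(public, anonymized):
    """Return (user_id, score) pairs, best match first."""
    scores = [(uid, match_score(public, rec)) for uid, rec in anonymized.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical "anonymized" records: IDs replace names, ratings remain.
anonymized = {
    "user_17": {
        "Popular Hit": (5, date(2005, 3, 1)),
        "Obscure Film A": (2, date(2005, 6, 10)),
    },
    "user_42": {
        "Popular Hit": (4, date(2005, 3, 2)),
        "Obscure Film B": (5, date(2005, 4, 20)),
        "Obscure Film C": (1, date(2005, 5, 5)),
    },
}

# Hypothetical ratings the same person posted publicly on another site.
public_profile = {
    "Popular Hit": (4, date(2005, 3, 3)),
    "Obscure Film B": (5, date(2005, 4, 21)),
    "Obscure Film C": (2, date(2005, 5, 7)),
}

ranking = rank_candidates(public_profile, anonymized)
print(ranking)
```

In this made-up data, the popular movie matches both records and so tells us little, while the two obscure films match only one record. That is the intuition behind the attack: unusual tastes plus similar dates quickly narrow the field.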
Be aware that when you share something online, you are not just sharing it with “friends,” and not just on that site. Your data can be cross-referenced, and then your anonymity will no longer be as strong as you thought.