Hacker News

If you have a second dataset that is not anonymous, you can cross-reference it with the anonymized data to increase your confidence about who each record belongs to. This happened when Netflix released anonymized user ratings: researchers were able to deanonymize some users by matching them against public IMDb ratings.

https://www.wired.com/2007/12/why-anonymous-data-sometimes-i...
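To make the cross-referencing idea concrete, here is a toy sketch of that kind of linkage. It is not the researchers' actual method, just an illustration under made-up assumptions: the names `similarity` and `best_match`, the one-star tolerance, the `min_margin` threshold, and all the sample data are hypothetical.

```python
from typing import Dict, Optional

Ratings = Dict[str, int]  # movie title -> star rating


def similarity(anon: Ratings, public: Ratings) -> int:
    """Count movies rated in both profiles with ratings within one star."""
    return sum(
        1
        for movie, stars in anon.items()
        if movie in public and abs(public[movie] - stars) <= 1
    )


def best_match(
    anon: Ratings,
    public_profiles: Dict[str, Ratings],
    min_margin: int = 2,  # hypothetical threshold, chosen for illustration
) -> Optional[str]:
    """Return the public name that best matches the anonymous record,
    or None when the top candidate does not clearly beat the runner-up."""
    scores = sorted(
        ((similarity(anon, profile), name) for name, profile in public_profiles.items()),
        reverse=True,
    )
    if not scores:
        return None
    top_score, top_name = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0
    return top_name if top_score - runner_up >= min_margin else None


# Made-up data: one "anonymized" record and three public profiles.
anonymized = {
    "user_1337": {
        "Obscure Film A": 5,
        "Obscure Film B": 2,
        "Obscure Film C": 4,
        "Blockbuster X": 4,
    }
}
public = {
    "alice": {"Obscure Film A": 5, "Obscure Film B": 1, "Obscure Film C": 4},
    "bob": {"Blockbuster X": 4, "Blockbuster Y": 3},
    "carol": {"Blockbuster X": 5},
}

print(best_match(anonymized["user_1337"], public))
```

The point of the margin check is that popular titles match almost everyone, so they tell you little; it is the handful of obscure ratings that single someone out.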



A similar thing happened a while back when AOL released 'anonymized' search data - https://en.wikipedia.org/wiki/AOL_search_data_leak

I dug through it just out of morbid curiosity and unfortunately stumbled upon a few searches indicating that a friend had privately suffered a miscarriage. This was only possible because I recognized her as the only person I knew at the intersection of a few other search terms that were tied to the same 'anonymized' ID.

GP is assuming a naive starting point when that is rarely the case.


You're making up a mythical, insecure mapping from useful info to worthless information. This is not a compelling reason to treat all anonymized data as useful or vulnerable.



