You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information

More research showing how precise identification can be using metadata alone.

We demonstrate that through the application of a supervised learning algorithm, we are able to identify any user in a group of 10,000 with approximately 96.7% accuracy. Moreover, if we broaden the scope of our search and consider the 10 most likely candidates we increase the accuracy of the model to 99.22%. We also found that data obfuscation is hard and ineffective for this type of data: even after perturbing 60% of the training data, it is still possible to classify users with an accuracy higher than 95%.
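The abstract reports two numbers: how often the single most likely candidate is the right user (96.7%), and how often the right user appears among the 10 most likely candidates (99.22%). The second figure is what is usually called top-k accuracy. The paper's classifier and features aren't described here, so the sketch below only illustrates that metric. It assumes some supervised model has already produced a score per candidate user for each sample; the `top_k_accuracy` helper and the toy scores are hypothetical.

```python
import numpy as np


def top_k_accuracy(scores, true_labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: array of shape (n_samples, n_classes), e.g. per-user scores from
            any supervised classifier (hypothetical input, not the paper's model).
    true_labels: array of shape (n_samples,) holding integer class indices.
    """
    scores = np.asarray(scores)
    true_labels = np.asarray(true_labels)
    # Column indices of the k largest scores in each row (their order is irrelevant).
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    # A "hit" means the true label is one of those k candidates.
    hits = np.any(top_k == true_labels[:, None], axis=1)
    return float(hits.mean())


# Toy example: 3 samples, 3 candidate users (made-up scores).
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 1, 0])

print(top_k_accuracy(scores, labels, 1))  # 1 of 3 correct -> about 0.333
print(top_k_accuracy(scores, labels, 2))  # 2 of 3 correct -> about 0.667
```

Widening the candidate list can only keep or raise the score, which is why the reported accuracy climbs from 96.7% to 99.22% when 10 candidates are allowed.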

I still don’t think this message is widely understood. When people are told that only metadata is kept on their activities, they assume some level of anonymity. You should assume none.

Posted on July 12, 2018

