The data Facebook doesn't want you to know about

Like many Facebook users around the world, we at Privacy International have been wondering whether any of our own data was sold to Cambridge Analytica. Today, we finally found out: the data of at least one of our staff members was sold. Even though they had never logged into an app called “This Is Your Digital Life” and never agreed or consented to the company’s terms and conditions, our staff member’s data was likely sold to a company that worked on influencing elections around the world. They are in this dataset not because of anything they have done, but because, years ago, one of their Facebook friends logged into an app.

All of this is pretty outrageous in and of itself, but it also raises more serious questions about the way that Facebook thinks about privacy on a much more fundamental level. Facebook likes to talk about privacy settings, and people’s control over the content they put on the platform. But Facebook doesn’t like to talk about how exactly this data can be used to profile and target people.

In both hearings before the Senate, Zuckerberg only mentioned two kinds of data: the information that people decide to share on the platform, and the data that is automatically collected about people’s behavior. But there’s a third kind of data: data that is derived, inferred, or predicted from the data that people share and that is recorded about their behavior.

This third type of data can reveal shockingly invasive insights. In contrast to the information that users (more or less) willingly share, the vast majority of users have no idea that such data even exists.

There is no specific field on Facebook that prompts you to type your personality profile, or other psychometric data, yet Cambridge Analytica claims to have possessed such information on millions of people.

Such insights can nonetheless be generated when data on millions of people is analysed. This is exactly the kind of data that Facebook doesn’t like to mention, and it is also strikingly absent from the user notification we received.

We still don’t know exactly how Cambridge Analytica processed the data it obtained, but we do know how profiling based on social media data works. One tool that sought to help users better understand personality prediction was a Facebook app called “ApplyMagicSauce”, which was also developed at the University of Cambridge. In fact, in 2014, one of our staff members was among the many participants who used the app.

The app asked for our staff member’s public profile, the pages they liked, their date of birth and their current city. It then used that data to predict a huge range of details about our staff member: their gender, their sexual orientation, their political leaning, their intelligence, and also their personality profile.

All of this is information that they had not disclosed or shared directly. This is important: even users who are very guarded about their privacy, and who would, for instance, never disclose their sexual orientation or political leanings on social media, can still be profiled along these very categories. In the US legal context, the AI expert Kate Crawford has called this “predictive privacy harms”. In European data protection law, it’s called profiling.

Profiling practices and the predictive privacy harms they entail are widespread. When Facebook says that it targets people based on the information they share, this is only half the picture. Targeting is not just based on the information that people share and that the company records about their behavior; it is also based on the hidden patterns that these data reveal.

Here’s a very simple example: in the U.K., Facebook allows advertisers to market products to “Commuters.” There is no box on Facebook where people can declare themselves as “commuters”; this is clearly a category that is derived or inferred from data that Facebook automatically collects about people.
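To make the idea of an inferred category concrete, here is a deliberately tiny sketch in Python. It is not Facebook’s method, which is not public, and everything in it is an assumption made for illustration: the page names, the labels, the four invented users and the simplified naive-Bayes-style scoring. It only shows the general principle that a label nobody typed in can be estimated from what similar users liked.

```python
from collections import Counter

# Entirely made-up toy data: each user has a set of liked pages and a
# known label. Page names and labels are hypothetical.
TRAINING_USERS = [
    ({"cycling_club", "train_timetables", "podcasts"}, "commuter"),
    ({"train_timetables", "coffee_to_go", "podcasts"}, "commuter"),
    ({"gardening", "local_bakery", "bird_watching"}, "non_commuter"),
    ({"gardening", "home_cooking", "local_bakery"}, "non_commuter"),
]


def train(users):
    """Count how often each page is liked within each label group."""
    like_counts = {}
    label_counts = Counter()
    for likes, label in users:
        label_counts[label] += 1
        like_counts.setdefault(label, Counter()).update(likes)
    return like_counts, label_counts


def predict(likes, like_counts, label_counts):
    """Return a normalised score per label for a user with these likes.

    Simplified naive-Bayes-style scoring: only the pages the user liked
    are considered, with add-one smoothing so unseen pages never give zero.
    """
    total_users = sum(label_counts.values())
    scores = {}
    for label, n_users in label_counts.items():
        score = n_users / total_users  # prior share of this label
        for page in likes:
            score *= (like_counts[label][page] + 1) / (n_users + 2)
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


like_counts, label_counts = train(TRAINING_USERS)
# This user never declared anything about commuting; they only liked two pages.
estimate = predict({"train_timetables", "podcasts"}, like_counts, label_counts)
print(estimate)
```

The output is a probability-like estimate rather than a fact about the person, which is exactly why such inferences can be both revealing and wrong.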

Here’s a more challenging example: Facebook’s widely criticized research, which showed how advertisers could theoretically target vulnerable teens who feel “stressed,” “defeated,” “overwhelmed,” “anxious,” “nervous,” “stupid,” “silly,” “useless” and a “failure,” also shows the power of profiling. Teens on Facebook didn’t tick a box that says “I’m feeling overwhelmed”; these deeply intimate insights were inferred from other data.

The predictions made about our staff member are probabilistic estimates, and some of them are factually inaccurate. For instance, our staff member doesn’t identify as male, and the test didn’t quite get their academic background right.

While these predictions are not 100% accurate (and wrong data can be just as harmful as accurate data, if it gets into the wrong hands), on average, they are accurate enough to reveal uncannily detailed insights. Many of these data points are actually judgements about character. Such data is potentially interesting to all sorts of organisations. For instance, a high level of neuroticism would not look good for certain kinds of job applications. A low level of conscientiousness certainly doesn’t look good for someone who wants to get access to finance or credit.

Social media data is not the only data that’s valuable for profiling. All sorts of data can be used to predict all sorts of intimate insights. Here are some examples: With a high level of precision, researchers were able to use knowledge of installed smartphone apps to figure out users’ personal information, including “religion, relationship status, spoken languages, countries of interest, and whether or not the user is a parent of small children” as well as gender. In a study that tracked the cell phone usage (Bluetooth, call logs, and SMS) of 26 couples, researchers were able to predict the spending behavior of those couples. We’ve compiled more examples here.

Such profiling is dangerous because it completely exceeds individual control. It’s also dangerous because people don’t know how they have been profiled, or whether their profile is biased, wrong or otherwise unfair. An accurate profile can reveal details, like our sexual orientation, that we never decided to disclose in the first place. An inaccurate profile that misidentifies or misclassifies us can have equally harmful implications.

Most importantly, though, it is such in-depth insights that make targeted advertising (during elections and beyond) so problematic. The more you know about someone, the better you can persuade, influence and target them.

Privacy settings alone won’t address the problems that profiling poses. That is why regulating profiling is perhaps one of the most pressing privacy issues of our time. With advanced processing techniques, and increasingly with AI methods like machine learning, the scope of what can be predicted from what kinds of data keeps growing.

In fact, European regulators have been quite clear about the fact that personal data can be revealed from other data. The U.K.’s Information Commissioner’s Office, for instance, talks about derived, inferred, and predicted data. Also, the upcoming GDPR clearly defines the process of creating such data as profiling. Under EU data protection law, you have to be transparent and have a clear legal basis to process people’s personal data. The restrictions on processing people’s sensitive personal data, including people’s political and religious beliefs, ethnicity, sexual orientation and health, are especially stringent. This means that there are limits on how people can be profiled, especially when profiling reveals sensitive data.

In the early days of this ongoing scandal, Facebook indicated that it might apply European data protection standards globally. Together with over 75 consumer and privacy organizations in Europe and the US, we asked Facebook to fully commit. However, during last night’s hearing, Zuckerberg’s comments were disappointingly vague, focussing on content, not data.

GDPR is not only about the content that people share online. It is about the way in which organizations process personal data. If Facebook is truly committed to protecting people’s privacy, the company should set an example by adhering to the highest data protection standards for all users.

Privacy International is a UK-based nonprofit working at the intersection of modern technology and rights.

Originally published at https://medium.com/@privacyint/what-zuckerberg-forgot-to-mention-profiling-7b7c596b9823 on
