If there’s one thing we learned from the past week’s marathon congressional inquisition of Mark Zuckerberg, it’s this: The inner workings of Facebook’s data-harvesting behemoth are so byzantine that in some ways Mr. Zuckerberg is just as confused as the rest of us about how it all works.
In his testimony before the U.S. House of Representatives Wednesday, Mr. Zuckerberg said Facebook doesn’t store a history of websites its users have visited.
Mr. Zuckerberg later corrected himself, saying Facebook does in fact store a list of visited websites that include Facebook’s tracking code. He added that the list is held temporarily before being converted into “a set of ad interests.”
When testifying before the Senate Tuesday, Mr. Zuckerberg said, “I think everyone should have control over how their information is used.” He also said, “You have full access to understand all—every piece of information that Facebook might know about you—and you can get rid of all of it.”
Not exactly. There are important classes of information Facebook collects on us that we can’t control. We don’t get to “opt in” or remove every specific piece. Often, we aren’t even informed of their existence—except in the abstract—and we aren’t shown how the social network uses this harvested information.
What else Facebook knows
The website log is a good example, in part because of its sheer mass. The browsing histories of hundreds of millions—possibly billions—of people are gathered by a variety of advertising trackers, which Facebook has been offering to web publishers ever since it introduced the “Like” button in 2009. They’ve become, as predicted, a nearly web-wide system for tracking all users—even when you don’t click the button.
When you request and download your data from Facebook—a feature Mr. Zuckerberg repeatedly referred to in answers to questions about control—this stored browsing history isn’t there.
That is reasonable, says Antonio Garcia-Martinez, a former Facebook ad-targeting product manager and current Facebook gadfly. Facebook targets ads based on an abstraction derived from your browsing history, such as your interest in golf. When you download your data, Facebook tells you what it thinks your interests are but doesn’t provide the specific evidence for why it thinks that.
“If you downloaded this file [of sites Facebook knows you visited], it would look like a quarter to half your browsing history,” Mr. Garcia-Martinez adds.
Another reason Facebook doesn’t give you this data: The company claims recovering it from its databases is difficult. In one case, it took Facebook 106 days to deliver to a Belgian mathematician, Paul-Olivier Dehaye, all the data the company had gathered on him through its most common tracking system. Facebook doesn’t say how long it stores this information.
There is more data Facebook collects that it doesn’t explain. It encourages users to upload their phone contacts, including names, phone numbers and email addresses. Facebook never discloses whether such personal information about you has been uploaded by other users from their contact lists, how many times that might have happened or who might have uploaded it.
This data enables Facebook not only to keep track of active users across its multiple products, but also to fill in the missing links. If three people named Smith all upload contact info for the same fourth Smith, chances are this person is related. Facebook now knows that person exists, even if he or she has never been on Facebook. And of course, people without Facebook accounts certainly can’t see what information the company has in these so-called shadow profiles.
“In general, we collect data on people who have not signed up for Facebook for security purposes,” Mr. Zuckerberg told Congress Wednesday.
There’s also a form of location data you can’t control unless you delete your whole account. This isn’t the app’s easy-to-turn-off GPS tracking. It’s the string of IP addresses, a form of device identification on the internet, that can show where your computer or phone is each time it connects to Facebook.
Location is a powerful signal for Facebook, allowing it to infer how you are connected to other people, even if you don’t identify them as family members, co-workers or lovers. Facebook says it uses your IP address to target ads when you are near a specific place, but as you can see in your downloaded Facebook data, the log of stored IP addresses can go back years.
The new normal?
Google and a host of smaller companies that compete with and support the giants in the digital ad space have become addicted to the kind of information that helps microtarget ads.
That level of precision is at the heart of Facebook’s recent troubles: Just because Facebook uses it to accomplish a seemingly innocent task (in Mr. Zuckerberg’s words, making ad “experiences better, and more relevant”) doesn’t mean we shouldn’t be worried.
Two bills were proposed in the Senate the day after Mr. Zuckerberg’s testimony, one of them bipartisan. Both would create new penalties for data breaches and would require Facebook and the rest of the ad-tracking industry to be more transparent and allow people to opt out easily. Facebook has withdrawn its opposition to the California Consumer Privacy Act, a November ballot measure that includes many of the same provisions. As of May 25, the EU’s General Data Protection Regulation will force all advertisers to proactively ask for permission to capture or use any personal data.
Regulators the world over are coming to similar conclusions: Our personal data has become too sensitive—and too lucrative—to be left without restraints in the hands of self-interested corporations.