The missing clauses in Google’s “Customer Match”

In September Google announced “Customer Match”, a new tool for advertisers to target their existing customer using their email addresses. “Customer match” is almost like Facebook’s “Custom Audiences” but Google and Facebook seem engaged in “a privacy race to the bottom” and Google may have taken the lead.

Targeting email addresses

Advertisers aim at targeting prospects and existing customers. While remarketing offers them the opportunity to target potential buyer, advertisers were so far not able to differentiate between their existing  customers and new prospects. They also lacked the possibility to target their “loyal” clients (i.e. those who have subscribed to a loyalty card) because there is no link between the cookie IDs assigned to their browser by ad-networks and their loyalty card number or even their online customer account. “Custom Audience” and “Customer Match” (thereafter “Customer targeting”) create a bridge between the email addresses  used to create a “Best Buy” account or CVS loyalty card and Google and Facebook accounts.

Via “Customer targeting”, advertiser will be able to pull the information they gathered about your shopping habits and leverage it to target you on Facebook and on Google. The advertiser won’t send directly ads that they want to show you and attach it to your address. Instead they will create group of “audiences” by creating groups of email addresses of  their customers. They will send those hashed email addresses to Facebook (or Google) which will check to see if those hashed email addresses match those of registered users.

Technically, Facebook does not see the email address but just the hash. So if you’re not in their user database they will not be able to know that you’re a “Best Buy” customer. That being said, the technical guarantee may not be sufficient considering the computational  resources of giants like Google and Facebook that could generate many hashes to brute force the hashed email address and retrieve the lists of customers. In fact, in another context Google seems to admit this and required that Google Analytics user don’t send hashed identifier like email addresses or phone numbers.

Therefore the only guarantees are contractual; they are the engagements that Google and Facebook take when they receive email addresses (or phone numbers). Facebook and Google are committed to not retrieving the email addresses of people that are not registered to their services. Similarly their contractual clauses prevents them from keeping those lists of hashed identifiers for more than a week (that would remain largely enough for them to break most of them).

Facebook ToS

Facebook Terms of Service are quite constraining for Facebook itself as they more or less prohibit Facebook from doing anything with the hashed email addresses other than using them to help an advertiser reach its audience. Therefore, Facebook cannot add information to the profile of its users. In fact Facebook specifically forbid appending “Custom Audience” data to users’ profiles. Furthermore, Facebook won’t let an advertiser target the audience of another advertiser. For instance “Target” should not be capable to target “Best Buy” customers. Facebook adopts a data processor position with respect to Custom Audience, the advertiser being the data controller.

Except of "Custom Audience ToS" https://www.facebook.com/ads/manage/customaudiences/tos.php

Except of “Custom Audience ToS” https://www.facebook.com/ads/manage/customaudiences/tos.php

 

Google Customer Match

Google took another approach with its service. Google did not include clauses to prevent them from appending “Customer Match” data to user’s profiles. The restrictions only impact the list of email addresses , but there is no restrinction on the use of the list of matched profiles which can therefore be used by Google.

Customer Match conditions from https://support.google.com/adwords/answer/6276125

“Customer Match” conditions from https://support.google.com/adwords/answer/6276125

 

In fact, Google implicitly admitted that these data will be appended to user profiles when it modified its Privacy Policy in August to include data obtained from partners in Google Accounts data.While the change remained unnoticed then, it became clearly more critical after “Customer Match” was announced.

Change made to google's privacy policy on August 19th

Change made to Google’s privacy policy on August 19th

 

Consequences of Google posture

Google’s decision to include “Customer Match” data in its user accounts will impact user’s privacy and also advertiser’s competition.

  • Since the data will be included in the account, it means that Google will have a more comprehensive view of its users which is a big step to merge offline and online data (also known as data onboarding). This may have significant negative impact as it puts Google at the center of all these data-flows… until Facebook announces its riposte.
  • On the up-side, this could be beneficial for transparency because users could be made aware of the advertisers targeting them if Google shows these data on privacy dashboards (that’s a big if).
  • However, because Google is a data controller with respect to ‘Customer Match”, advertisers may be reluctant to share information about their customers knowing that it could potentially be re-used by competitors or by Google itself. Not only Google could share these data with other advertisers, thus allowing competitors to target each others audience to stir-up the demand and thus the price, but Google could also be tempted to use the data for its direct benefit.

Acknowledgement

Thanks to Armand Heslot for providing feedback on a draft.

Impact of Google privacy policy on web tracking

Google most important privacy policy changes happen almost two years ago. The change was announced as a clarification of the policies which will mainly be used to simplify and improve services. Now that the changes are effective, it is interesting to observe what the consequences of the new policy are and what has changed. In this blog post I focus on Google tracking capabilities and show that the changes allow Google to improve significantly the way it tracks users on the web.

The claim about DoubleClick cookie information

One of the few protective claims Google made in its policy was that “[they] will not combine DoubleClick cookie information with personally identifiable information unless we have your opt-in consent”. Some understood that Google would not combine information from the Google Account with information from DoubleClick ad-network, but that was not the case.

Using information from the Google profile

As a matter of fact, Google has so far combined many pieces of information from its ad network with information obtained from Google profiles. Your age and gender have already been shared with DoubleClick advertisers for many months now as shown on Google Ads Setting page. At the beginning, these data were shared on an opt-in basis through the “+1 personalization page”. It was not obvious that his page controlled how information from your profile was shared with advertiser as this was only mentioned as “+1 and other profile information”.

This page shows part of the information advertisers can use to target you.

The “+1 personalization” (see below) page has been removed when Google announced “ad endorsement” and now the URL of the page redirects to the ad-endorsement page. As a matter of fact, it is no longer possible to opt out of ads on the web be based on your Google profile without opting out of all interest based ads.

This page was buried in Google+ settings and was removed when Shared Endorsement was announced.

This change came with no announcement, because the privacy policy only prevents Google from combining PII from the Google profile.

Ad customization based on visited website

The policy does not prevent Google from associating your visits on websites affiliated to DoubleClick to target your Google profile. As a matter of fact, your Google account can be retargeted by DoubleClick affiliated websites you visited. This feature — called Remarketing list for search ads – lets advertisers retarget previous visitors on Google Search.

Technically, Google cannot recognize when a user visited a site web affiliated to DoubleClick because the domains associated to the cookies are different. When you’re doing a search on Google, Google reads only cookies attached to “google.com” domain, whereas on Google Display Network (i.e. the set of websites with DoubleClick ads) cookies are attached to the doubleclick.net domain. Google knows the DoubleClick cookie ID of people who visited a website on Google Content Network but it does not know their Google ID. This is problematic because when you do a search on Google, you do not reveal you DoubleClick ID but just your Google ID. So when you do a search, Google cannot know if you’ve visited a website which does retargeting.

To solve this, Google redirects your browser from the doubleclick.net domain to the google.com domain. When you visit a website which wants to retarget you, DoubleClick redirects you to google.com domain and Google adds your Google ID to the list of persons who visited the advertiser’s website. Next time you’ll do a search Google will recognize your Google ID and retarget you with ads for the website you visited. The figure bellow explains how Google records that a user visited the website ABC (you can capture the actual frames on worldstore.co.uk).

Through this process, Google associates the list of websites affiliated to Google Display Network (it means with a DoubleClick tag) you visited to your Google ID. Consequently, part your web browsing history (the part containing websites which do remarketing) is actually combined to your Google profile and you cannot review it. Notice that Google never proposed a way to know which website you visited and try to retarget you, but while Google could have claimed that your browsing history was only associated to you “anonymous” DoubleClick ID, it is now attached to your personal Google account.

Summary of what Google can combine with DoubleClick

To summarize, Google cannot combine personally identifying information from your Google account with you DoubleClick cookie information, yet it can:

– Use information from your Google account (age, gender and probably very soon a list of your interests) to personalize ads that you see on DoubleClick affiliated website
– Link visits on DoubleClick affiliated websites to your Google profile and retarget you when you do a search on Google.

In the end, Google privacy policy with regard to advertising is well summarized on this page:

  • “[They] don’t share personally identifiable information with advertisers.
  • [They] don’t allow advertisers to show ads based on sensitive information, such as those based on race, religion, sexual orientation, health or sensitive financial categories.”

In the next page, I consider how Google combines information from Google profile and DoubleClick with data obtained though Google Analytics.

Google’s Ad Targeting under the new Privacy Policy

Google new privacy policy will be effective starting March the 1st. The Electronic Frontier Foundation (EFF) suggests to delete your Web Search History, and I strongly recommend to follow this advice because:

1)      The searches that have been recorded in your Web Search History before March the 1st will be subject to this policy [1].
2)      Advertisers could target ads based on your browsing interests and interests inferred from your Web Search History.

The Good Points

First, I have to say that Google did a remarkable job advertising the new policy: notifications are everywhere. I don’t remember any of the policy updates being that much advertised and then commented.

Another good point is that many privacy policies have been merged in one privacy policy. It is no longer required to have a dozen tabs opened to have a good view of the policy. However, you still need to have an extra tab on the FAQ page with the definitions required to understand the Policy. Google could have used the empty space in the right column to display these definitions (like search result previews).

 

The really bad one

So much for the good points, now let’s discuss the policy itself. The bottom line is this policy would allow advertisers target you based on your web search profile and other interests you expressed in your emails or through your use of Google services. And this list of interests can be combined with the list of interests they built based on your DoubleClick cookie.

Google does not need our Opt-in consent to combine your web search profile to your DoubleClick cookie information. Starting March the 1st, Google could adopt a solution similar to what is deployed by Microsoft to target ads based on your search interests, although a sentence in the policy seems to prevent such use of your data:

“We will not combine DoubleClick cookie information with personally identifiable information unless we have your opt-in consent.”

In fact, it means that your Double-Click cookie will not be linked to your personally identifiable information. So Google can not put your name in front of the list of interests they inferred from your browsing behavior and will not put your name (or any other PII) in the ads you see. Because your Web Search history is likely to be unique, it identifies you and therefore can not be combined to your DoubleClick profile [2].

But your search profile (i.e. the list of interests inferred from your search history) is unlikely to be unique and therefore does not identify you so Google can combine it with your DoubleClick cookie information [2]. I believe they could also include some the of search results you clicked on to retarget you.

Similarly, your age, gender and interests expressed during Gtalk and Gmail discussions (or any other interest that Google could infer but that you would not be the only one to express) could be associated to your DoubleClick cookie. If you have any suggestion to deal with these data, do not hesitate to share it.

[1] See Google Policy FAQ: “Our new Privacy Policy applies to all information stored with Google on March 1, 2012 and to information we collect after that date.”
[2] Google defines Personal information as information “you provide to us which personally identifies you, such as your name, email address or billing information, or other data which can be reasonably linked to such information by Google”.

Searching session cookies and click-streams

In our paper on Google’s session cookie information leakage, Vincent Verdot and I described how to captures SID cookies on a shared network and run the attack with Firesheep (see the previous post).

Nevertheless, there are other ways to capture such cookies. For instance one could use malware to capture search traffic, but the simplest solution remains to search SID cookies.

Redirecting traffic

Using a malware to redirect the traffic of infected computers through a proxy controlled by the attack would allow to capture session cookies. Such infection has recently been detected by Google which displayed a banner on its search page [1]. In that particular case, Google traffic redirection was merely a side effect which triggered the malware detection.

According to Google, a couple of millions of computers [1] were infected by this malware. Attackers could have captured a significant number of session cookies and run attacks described in our paper.

Googling for cookies

The simpler solution to find SID cookies is to search them. Typing the right query in Google provides a list of pages where people published captured HTTP traffic, including SID cookies (also works with Yahoo!).

If you replace your SID cookie by one of the cookies listed in these pages, you will receive the same personalized results than its owner. From these results you can quickly extract a list of visited results, Gmail contacts and Google+ acquaintances.

Not all these results contain full SID cookies and some of the listed SID cookies may have already expired, but this simple search should already provide many valid cookies to test the flaw. I’ve written a Chrome extension to simply replace the SID cookie for the “google.com” domain and quickly test different accounts. Once installed, click on the red button in the upper right corner, past the cookie value and click save.
On Firefox you could use the Web Developer extension to edit cookies (it does not seem to work on Firefox 5.0).

Linking data and PI

By publishing their (apparently innocuous) cookies users indirectly published part of their click-stream and associated it to their email address. Thus they established a public record of having visited these URLs [2], and this record is now linked to their name. From there, their full anonymized click-stream — not reduced to visited search results — could be de-anonymized by a tracking ad-network.

References

[1] Damian Menscher, “Using data to protect people from malware”, http://googleonlinesecurity.blogspot.com/2011/07/using-data-to-protect-people-from.html
[2] Arvind Narayanan, “There is no such thing as anonymous online tracking”, http://cyberlaw.stanford.edu/node/6701

Show me your Cookie and I’ll tell you what you visited

Web Search History Information Leakage

Back in February, I re-discovered a small flaw in Google Search: result personalization leaks the list of results you clicked on. This leak was already known and mentioned in a paper by Castelluccia et al., but several features added by Google made it critical.

  • First there is the possibility (for web search history users) to only view result that have already been visited (visit http://www.google.com/webhp?tbs=whv:1).
  • Second,  with Google Instant it is possible to view visited links quickly without living a trace in the victim Web Search history (the attack is not-destructive).
  • Third, when Google display previously visited search results, it used to provide the query that led to the results searching when he clicked on the result, thus the attacker knew which keywords the victim was usually searching for and could have enter these keywords in a search box to get new results which will suggest new keywords to type and so forth and so on…

The third point has been addressed by Google very recently, when they introduced the new interface with the black top bar.
Vincent Verdot and I wrote a paper about this flaw. In order to conduct an experiment, we’ve been working on a proof of concept and an evaluation tool that we used to gather results.

Proof of concept based on Firesheep

This proof of concept is based on Firesheep (I just added a module and modify the attack launched when a SID cookie was captured). Firesheep is only working with the latest version of Firefox 3.6, do not expect to run it on Firefox 5.
With our version of Firesheep, when a Google SID cookie is captured, the account name appears in the Firesheep sidebar. Double clicking on it starts the attack; double clicking again displays the retrieved list of visited links.

The Evaluation tool

We also designed a Firefox extension which downloads your web search history on your computer, issue a couple of search queries (mostly searching for extensions like: « .com, .fr, .us, .html, www, … ») and see how many clicked links can be retrieved.
We’ve run this experiment with a dozen of account and sent the result to Google. We’ll soon publish the paper as a technical report.

How to protect your Click History

We’ve been in contact with Google Security Team who is working on a fix that should soon be deployed. In the meantime, make sure you’re not logged in your Google account when you’re connected on an unsecured network.
If you do not use Web Search History you may also purge it and disable the feature (visit https://www.google.com/history).
Also, TrackMeNot and Unsearch will reduce the exposition of your click history.

 

Running the Test

If you want to run the test 5 minutes:

  • A Google Account with Web Search History enabled. To check that you activated it, visit https://www.google.com/history; if it asks you to turn on the feature, then you cannot help here. Thanks anyway for trying.
  • Install this Firefox extension (https://unsearcher.org/Test%20Flaw/ad@monitor.xpi), download it and then drag it and drop it in Firefox. Once the extension has been installed, you should restart Firefox.
  • Modify your Google Search preferences (http://www.google.com/preferences?hl=en) to disable “Google Instant” and set the number of returned result to 100 (instead of 10).
  • Sign out and Sign in again on Google.com.
  • In Firefox, click on “Tools-> ADMONITOR-> History”. A first message should appear to inform you that the extension is about to extract your search History. Click on OK and do not close the Firefox window.
  • After five minutes, another message will be prompted to inform you that the test is finished. It’ll tell you where you can find the generated file. A Firefox window should have open (not necessarily taking the focus). You can send me the content of this window via e-mail and we’ll integrate it in our experiment results.
  • You can remove the generated files and uninstall the sid@testextension.

Thanks for helping us.

A follow up on Google Policies

Last year, I started to analyze Google Search and Google Suggest logs retention policies for the NYU Privacy Research Group meetings. To complete this analysis, I’m trying to review policies of other Google services.

‘Personal Information’ vs ‘Information we collect’

While I just started this review, I noticed that Google seems to change the name of the section describing the recorded information. This section is either called:

  • “Personal Information” for +1, Blogger, Buzz, Notebook, Groups, Knol, Moderator, Music, Orkut, Picasa, Power Meter, SafeBrowsing , Sites, Voice, Web History and YouTube
  • “Information we collect” for Advisor, Checkout, Desktop, Gears, TV, Location, Mobile, Toolbar, Trader, Web Accelerator.

My first understanding was that for services that require a Google Account to be used, Google uses the terms “Personal information” otherwise it uses “Information we collect”. But there are several exceptions. For instance, SafeBrowsing does not require an account to be used but Google TV does.

In addition, explicit references to server logs are made in these Personal Information sections while Google does not consider server logs as Personal Information (see their FAQ).

The Knol bridge

A loophole in Knol Privacy Policy allows Google to link your IP address and cookies to your user account. Knol (for Knowledge) is Google’s alternative to wikipedia. You need to have a Google account to contribute to Knol and — like most for Privacy Policy of Google services — Google mentions that it :

‘records information [your] account activity (e.g., storage usage, number of log-ins, actions taken), data displayed or clicked in the Knol interface […] and other log information (e.g., browser type, IP address, date and time of access, cookie ID, referrer URL). If you are logged in we may associate that information with your account.
[emphasis is mine]

This last sentence is unusual and suggest that if you ever logged in and visited Knol, Google can associate your IP address and Cookie IDs to your Goolge Account — and all the personal information attached to it. From that, Google can directly de-anonymized the searches you did when you were not logged in.

A policy template

This loophole is certainly not intentional; this exact sentence appears in many privacy policies . As a matter of fact, this sentence also appears in YouTube and Blogger policies. Therefore we can assume that a same template has been used for services hosting user generated content.

However there are two big differences between Knol and Youtube or Blogger:

  • There is no explicit mention of the server logs in these policies. For these services, Google only mention that their ‘servers automatically record information about your use of the service’.
  • Both Blogger and YouTube have their own domain names, meaning that cookies you send to YoutTube are different from the cookies you send when you’re visiting a Google website. Unlike these services, Knol uses Google domain name. Therefore, you send to Knol cookies that you also send to Google when you are doing a search.

While not dramatic considering Knol relative lack of success, this mistake could have been more critical in the privacy policy of a more popular service.

Introducing Unsearch

Unsearch Capture Screen

My first post introduces ‘Unsearch’, the extension that gave its name to this blog. Unsearch is an extension for the Google Chrome browser that allows to search on Google without leaving traces on Google Search logs or on Google Web Search History. Normally, your searches on Google are recorded and logged with your IP address and cookies. While one can manage his web search history through this interface, it is not possible to manage Google search logs.

Motivations

Analyzing Google Log retention policies we found out that, even after anonymization, some pieces of information in Google search logs might lead to user identification ( for further information, see our paper). When you use Unsearch all pieces of information are deleted from Google servers within 15 days.

Also, one may not want to have all his searches recorded in his web search history. While it is possible to log out or to delete entries from Google Web Search History, I prefer to have the possibility to do ‘off the record’ searches directly on the search page. Especially now that the ‘Log-out’ button is one additional click away and that we have only three seconds before the search is recorded.

Paradoxically, with the new navigation bar, I find it more difficult to notice when I’m not logged into my account. Putting the risk of pseudonym usurpation aside, the account information is – in my opinion – less visible. That’s problematic because I can not remove searches that I did from someone else account.

How it works

When you type keywords in Google Search bar, you get Google Instant results but your query won’t be logged unless you click on the page or remain inactive for three seconds (moredetails).

When you click on Unsearch, it removes the keywords you typed in the search box (so they will not be recorded), but before that, it copies the Google Instant result and pasts it in another tab where your interactions are not monitored: you can browse the results without Google noticing (redirects are removed).

Because you never clicked on ‘Search’, Google won’t log your query and it won’t appear in your web search history either. And since you’re logged into your Google account, you’ll still get personalized search results. This approach is somehow complementary to TrackMeNot which lets user shape their search profile without issuing queries.

Shortcomings

Because Unsearch is based on Google Instant, this feature must be turned on for Unsearch to work. Furthermore, you have to click on Unsearch within three seconds or Google will log the query as a normal search. I tried to find a fix (like adding and removing space in the search bar) but I’m looking for a good solution.

In fact, the main drawback is that Unsearch takes advantage of Google Instant log retention policy. Should this policy change, Unsearch would no longer prevent Google from logging searches. Such change is very unlikely as Google never extended its log retention period or made its ‘anonymizing’ process less effective.

Furthermore, even if this policy is changed, countermeasures exist to prevent Google from logging most of the queries in their entirety (see some possible solution in Unsearch Presentation (.ppt) ).