Implementing cookie consent with “Content Security Policy”

In this post I briefly explain how “Content Security Policy” could be used to enforce the cookie consent regulation by blocking third-party content.

Cookie Consent

EU cookie regulation requires website editors to obtain the informed consent of visitors before setting cookies. Therefore, a website should first check that it has consent before dropping its own (unnecessary) cookies, and so should the third parties that are called by the website. While it’s fairly simple for a first party to check that it has obtained consent (e.g. by storing the consent in a cookie), third parties are in a different situation because they cannot read the first-party cookies to deduce whether consent has been granted.
Therefore the responsibility more or less shifts to the first parties, which are in a position to inform visitors and obtain consent; in turn, they have to prevent third parties from setting cookies as long as consent has not been obtained. Tag managers are elegant solutions that can be deployed to do that. A less elegant solution is to rely on “Content Security Policy” to prevent the browser from loading external resources that set cookies.

Overview of Content Security Policy

A “Content Security Policy” is a “declarative policy that lets the authors (or server administrators) of a web application inform the client about the sources from which the application expects to load resources”. This policy can be viewed as a whitelist of sources from which the browser may load resources when rendering a page. Content security policies can be conveyed in two forms, either through an HTTP header or in an http-equiv meta tag in the head of the HTML document.

Implementation of the policy

To comply with the cookie consent regulation, a website may simply use a “Content Security Policy” to block any third party from loading content and subsequently setting cookies as long as consent has not been granted. Notice that this solution is not specific to cookies; it prevents all types of third-party resources from being loaded, thus effectively preventing fingerprinting by the third parties whose resources are blocked.
A quick “cookie consent” implementation is to check, when a GET request is received, whether the “consent” cookie is set and to adapt the “Content-Security-Policy” accordingly: if consent has not been granted, block all third parties; otherwise set the usual policy.
The easiest implementation is to use JavaScript to insert the http-equiv tag in the HTML (although this is not recommended). Website editors can just add the following JavaScript in their pages and that should do the trick:

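if (document.cookie.indexOf('hasConsent') < 0) {
 // No consent yet: restrict scripts and images to the site's own domain.
 var hostname = window.location.hostname;
 // Strip a leading "www." (4 characters) so that "*." + hostname covers every subdomain.
 if (hostname.indexOf("www.") === 0) hostname = hostname.substring(4);
 var meta = document.createElement('meta');
 meta.httpEquiv = "content-security-policy";
 // 'self' is listed for images too, as in the PHP version below.
 meta.content = "script-src 'self' 'unsafe-inline' *." + hostname + "; img-src 'self' *." + hostname;
 document.getElementsByTagName('head')[0].appendChild(meta);
 }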
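
The check on the `hasConsent` cookie presupposes that something sets this cookie once the visitor agrees. A minimal sketch of that step follows; the helper names `buildConsentCookie` and `hasConsent` and the one-year lifetime are assumptions made for illustration only:

```javascript
// Hypothetical helper: builds the cookie string that records the visitor's consent.
// The cookie name matches the check on 'hasConsent'; the lifetime is an assumption.
function buildConsentCookie(maxAgeDays) {
  var maxAgeSeconds = maxAgeDays * 24 * 60 * 60;
  return "hasConsent=true; path=/; max-age=" + maxAgeSeconds;
}

// Hypothetical helper mirroring the indexOf test on document.cookie.
function hasConsent(cookieString) {
  return cookieString.indexOf("hasConsent") >= 0;
}

// In a browser, the click handler of an "accept" button could then run:
//   document.cookie = buildConsentCookie(365);
//   window.location.reload(); // load the page again, this time without the blocking policy
```

Reloading after consent is probably the simplest option, since a policy that has already been applied to the current page is not lifted by removing the meta tag.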

A slightly more complicated solution is to set the HTTP header. The code is fairly similar but the complexity depends on the type of server you’re running. If you’re using PHP, you could do it like this:

if(!isset($_COOKIE["hasConsent"])) {
 $allowed_hosts = "*.unsearcher.org";
 header("Content-Security-Policy: script-src 'self' 'unsafe-inline' " . $allowed_hosts . "; img-src 'self' " . $allowed_hosts);
 }
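
For a server written in JavaScript, the same decision could be sketched as below. This is only an illustration under assumptions: `cspForRequest` is a hypothetical name, and returning `null` stands for “send the site’s usual policy, if any”.

```javascript
// Returns the Content-Security-Policy value to send while consent is missing,
// or null once the hasConsent cookie is present.
// cookieHeader: raw value of the request's Cookie header (may be undefined).
// allowedHosts: space-separated source list, e.g. "*.unsearcher.org".
function cspForRequest(cookieHeader, allowedHosts) {
  // Keep only the cookie names from "name=value; name2=value2".
  var names = (cookieHeader || "").split(";").map(function (c) {
    return c.trim().split("=")[0];
  });
  if (names.indexOf("hasConsent") >= 0) {
    return null;
  }
  return "script-src 'self' 'unsafe-inline' " + allowedHosts +
         "; img-src 'self' " + allowedHosts;
}

// With Node's built-in http module this could be used as:
//   var csp = cspForRequest(req.headers.cookie, "*.unsearcher.org");
//   if (csp !== null) res.setHeader("Content-Security-Policy", csp);
```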

Browser Support

Content Security Policy is still a working draft, so the feature is not supported equally by all browsers. As far as I can tell, Chrome and Safari implement all the required features, including support for the http-equiv tag. Firefox enforces policies that are set through the header but there is currently no support for the http-equiv tag. Finally, Internet Explorer offers only very limited support of CSPs through the iframe sandbox attribute.

Testing

I’ve used the Burp proxy to preview what websites would look like with content security policies blocking third parties. Here are the results:

 

Content-Security-Policy: script-src 'self' 'unsafe-inline' *.lemonde.fr *.lemde.fr; img-src *.lemonde.fr *.lemde.fr


Content-Security-Policy: script-src 'self' 'unsafe-inline' *.nytimes.com *.nyt.com; img-src *.nytimes.com *.nyt.com


Content-Security-Policy: script-src 'self' 'unsafe-inline' *.slashdot.org  *.fsdn.com; img-src *.slashdot.org  *.fsdn.com
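
To get a feel for which hosts such policies let through, here is a simplified sketch of how a host is compared with source expressions like `*.lemonde.fr`. It is not a full implementation of the matching rules: the function names are hypothetical, and schemes, ports, paths and the 'self' keyword are deliberately left out.

```javascript
// Simplified host matching for CSP source expressions such as "*.lemonde.fr".
// Only bare hosts and a leading "*." wildcard are handled.
function hostMatchesSource(host, source) {
  if (source.indexOf("*.") === 0) {
    var suffix = source.substring(1); // e.g. ".lemonde.fr"
    return host.length > suffix.length &&
           host.lastIndexOf(suffix) === host.length - suffix.length;
  }
  return host === source;
}

// A host is allowed if at least one source expression matches it.
function isAllowed(host, sources) {
  return sources.some(function (s) { return hostMatchesSource(host, s); });
}

var sources = ["*.lemonde.fr", "*.lemde.fr"];
console.log(isAllowed("s1.lemde.fr", sources));              // true
console.log(isAllowed("www.google-analytics.com", sources)); // false
console.log(isAllowed("lemonde.fr", sources));               // false: "*." requires a subdomain
```

The last case shows why a site served from its bare domain also needs 'self' (or the bare host) in the list.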


 

Conclusion

This solution is far from perfect, mainly because it is not supported by all browsers. Yet it provides a simple way for website editors to block third-party resources until consent is obtained. Such a solution is complementary to the tags provided on CNIL’s website that can be used to obtain consent before setting Google Analytics first-party cookies.

Impact of Google privacy policy on web tracking

Google’s most important privacy policy changes happened almost two years ago. The change was announced as a clarification of the policies that would mainly be used to simplify and improve services. Now that the changes are in effect, it is interesting to observe what the consequences of the new policy are and what has changed. In this blog post I focus on Google’s tracking capabilities and show that the changes allow Google to significantly improve the way it tracks users on the web.

The claim about DoubleClick cookie information

One of the few protective claims Google made in its policy was that “[they] will not combine DoubleClick cookie information with personally identifiable information unless we have your opt-in consent”. Some understood this to mean that Google would not combine information from the Google Account with information from the DoubleClick ad network, but that was not the case.

Using information from the Google profile

As a matter of fact, Google has so far combined many pieces of information from its ad network with information obtained from Google profiles. Your age and gender have been shared with DoubleClick advertisers for many months now, as shown on the Google Ads Settings page. At the beginning, these data were shared on an opt-in basis through the “+1 personalization page”. It was not obvious that this page controlled how information from your profile was shared with advertisers, as this was only mentioned as “+1 and other profile information”.

This page shows part of the information advertisers can use to target you.

The “+1 personalization” page (see below) was removed when Google announced “ad endorsement”, and the URL of the page now redirects to the ad-endorsement page. As a matter of fact, it is no longer possible to opt out of ads on the web based on your Google profile without opting out of all interest-based ads.

This page was buried in Google+ settings and was removed when Shared Endorsement was announced.

This change came with no announcement, because the privacy policy only prevents Google from combining PII from the Google profile with DoubleClick cookie information.

Ad customization based on visited website

The policy does not prevent Google from using your visits to DoubleClick-affiliated websites to target your Google profile. As a matter of fact, your Google account can be retargeted by DoubleClick-affiliated websites you visited. This feature, called remarketing lists for search ads, lets advertisers retarget previous visitors on Google Search.

Technically, Google cannot directly recognize when a user has visited a website affiliated with DoubleClick because the domains associated with the cookies are different. When you do a search on Google, Google reads only the cookies attached to the “google.com” domain, whereas on the Google Display Network (i.e. the set of websites with DoubleClick ads) cookies are attached to the doubleclick.net domain. Google knows the DoubleClick cookie ID of people who visited a website on the Google Display Network but it does not know their Google ID. This is problematic because when you do a search on Google, you do not reveal your DoubleClick ID but just your Google ID. So when you do a search, Google cannot know whether you’ve visited a website which does retargeting.

To solve this, Google redirects your browser from the doubleclick.net domain to the google.com domain. When you visit a website which wants to retarget you, DoubleClick redirects you to the google.com domain and Google adds your Google ID to the list of people who visited the advertiser’s website. The next time you do a search, Google will recognize your Google ID and retarget you with ads for the website you visited. The figure below explains how Google records that a user visited the website ABC (you can capture the actual frames on worldstore.co.uk).
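
The redirect can be pictured with a toy model. The sketch below is not Google’s implementation: the cookie values, the function names and the in-memory “remarketing lists” are all hypothetical, and it only illustrates that each domain sees its own cookie and that the redirect is what brings the Google ID into play.

```javascript
// Toy model: each domain only receives the cookie attached to it.
var browserCookies = {
  "doubleclick.net": { id: "dc-123" }, // hypothetical DoubleClick cookie ID
  "google.com": { id: "g-456" }        // hypothetical Google ID
};

// Hypothetical remarketing lists, keyed by advertiser, holding Google IDs.
var remarketingLists = {};

// Step 1: the DoubleClick tag on the advertiser's site is requested. The request
// carries only the doubleclick.net cookie, so the answer is a redirect to google.com.
function doubleclickTag(advertiser, cookie) {
  return { redirectTo: "google.com", advertiser: advertiser, seenId: cookie.id };
}

// Step 2: the browser follows the redirect and now sends the google.com cookie,
// so the Google ID can be added to the advertiser's list.
function googleEndpoint(redirect, cookie) {
  var list = remarketingLists[redirect.advertiser] || [];
  if (list.indexOf(cookie.id) < 0) list.push(cookie.id);
  remarketingLists[redirect.advertiser] = list;
}

// Step 3: on a later search, only the google.com cookie is sent.
function advertisersToRetarget(cookie) {
  return Object.keys(remarketingLists).filter(function (adv) {
    return remarketingLists[adv].indexOf(cookie.id) >= 0;
  });
}

var redirect = doubleclickTag("ABC", browserCookies["doubleclick.net"]);
googleEndpoint(redirect, browserCookies[redirect.redirectTo]);
console.log(advertisersToRetarget(browserCookies["google.com"])); // the list contains "ABC"
```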

Through this process, Google associates the list of websites affiliated with the Google Display Network (i.e. carrying a DoubleClick tag) that you visited with your Google ID. Consequently, part of your web browsing history (the part containing websites which do remarketing) is actually combined with your Google profile and you cannot review it. Notice that Google never offered a way to know which of the websites you visited are trying to retarget you. But while Google could previously have claimed that your browsing history was only associated with your “anonymous” DoubleClick ID, it is now attached to your personal Google account.

Summary of what Google can combine with DoubleClick

To summarize, Google cannot combine personally identifying information from your Google account with your DoubleClick cookie information, yet it can:

– Use information from your Google account (age, gender and probably very soon a list of your interests) to personalize the ads that you see on DoubleClick-affiliated websites.
– Link visits to DoubleClick-affiliated websites to your Google profile and retarget you when you do a search on Google.

In the end, Google’s privacy policy with regard to advertising is well summarized on this page:

  • “[They] don’t share personally identifiable information with advertisers.
  • [They] don’t allow advertisers to show ads based on sensitive information, such as those based on race, religion, sexual orientation, health or sensitive financial categories.”

On the next page, I consider how Google combines information from the Google profile and DoubleClick with data obtained through Google Analytics.

Facebook may violate the FTC settlement in a few days

Update: Facebook has started to show the announced prompt and to ask for user consent.

Almost a year after it removed the option for 90% of its members, Facebook informed the remaining 10% on Wednesday that it will remove the “Who can search my timeline by name” setting in a few days. Removing this setting is likely a violation of the 2011 FTC settlement.

Timeline concealed to the public

A month ago Facebook announced that they would prompt users to get their consent before removing the setting [1], but they finally decided to just inform users with an email and a very short notice displayed above the News Feed.


In the mail sent to its members, Facebook argues that when they created this setting “the only way to find [them] on Facebook was to search for [their] specific name. Now, people can come across [their] Timeline in other ways: for example if a friend tags [them] in a photo, which links to [their] Timeline, or if people search for phrases like “People who like The Beatles,” or “People who live in Seattle,” in Graph Search”. However, I’m confident that some users, including me, are not tagged in any public photo, do not like public content and have no friend whose “friends list” is public.

Timelines of these users will not appear in public Graph Search results and there is no public link that could be used to find them. As a matter of fact, people who are not my friends (or friends of friends) can’t even know whether I have a Facebook account. As of today, the only way to find my Facebook Timeline is to test the 1.2 billion userID numbers. In addition to being time-consuming, this exhaustive search would violate Facebook’s Terms of Service.
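
To give an order of magnitude for such an exhaustive search, here is a back-of-the-envelope sketch. The rate of 10 requests per second is a made-up assumption, not a measured Facebook limit, and `daysToScan` is a hypothetical helper.

```javascript
// Rough estimate of how long testing every user ID would take.
// The request rate is an assumption chosen only for illustration.
function daysToScan(totalIds, requestsPerSecond) {
  var seconds = totalIds / requestsPerSecond;
  return seconds / (24 * 60 * 60);
}

var days = daysToScan(1.2e9, 10);
console.log(Math.round(days) + " days"); // about 1389 days, i.e. close to 4 years
```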

Private vs Nonpublic

A Timeline page is public because any user can load its content, but Timeline URLs (i.e. usernames) are not public since not everyone can find them: without the search functionality, it is not possible to retrieve the Timeline associated with a specific user. Timeline URLs are like unlisted phone numbers or Google Docs shared with “anyone with the link”. These documents may not be seen as private but I would not define them as public (i.e. I’d be unpleasantly surprised to see them used in an endorsed advertisement). I do not claim that Timelines are private, only that they are “nonpublic user information”.

Why Facebook could violate the FTC settlement

The FTC settlement does not focus only on users’ private information but covers all nonpublic user information (e.g. a user ID to which access is restricted by a privacy setting). Indeed, Section II-A of the 2011 settlement requires that Facebook “prior to any sharing of a user’s nonpublic user information by [Facebook] with any third party, which materially exceeds the restrictions imposed by a user’s privacy setting(s), shall […] obtain the user’s affirmative express consent”.

Facebook will not only remove the possibility to select who can look up timelines, they will also reset the setting to its default value, “Everyone”. Hence, Facebook will modify the settings of users who had chosen a more restricted audience. Obviously the two-line message Facebook displayed and the email they sent to the affected members are not a valid way to obtain affirmative express consent. So Facebook will certainly violate the FTC settlement in a few days.

[1] Coincidentally, Facebook made this announcement about 5 hours after I tweeted that they should get informed consent.

 

Google’s Ad Targeting under the new Privacy Policy

Google’s new privacy policy will take effect on March the 1st. The Electronic Frontier Foundation (EFF) suggests deleting your Web Search History, and I strongly recommend following this advice because:

1) The searches that have been recorded in your Web Search History before March the 1st will be subject to this policy [1].
2) Advertisers could target ads based on your browsing interests and interests inferred from your Web Search History.

The Good Points

First, I have to say that Google did a remarkable job advertising the new policy: notifications are everywhere. I don’t remember any previous policy update being advertised, and then commented on, this much.

Another good point is that many privacy policies have been merged into one. It is no longer necessary to keep a dozen tabs open to have a good view of the policy. However, you still need an extra tab on the FAQ page with the definitions required to understand the policy. Google could have used the empty space in the right column to display these definitions (like search result previews).

 

The really bad one

So much for the good points, now let’s discuss the policy itself. The bottom line is that this policy would allow advertisers to target you based on your web search profile and other interests you expressed in your emails or through your use of Google services. And this list of interests can be combined with the list of interests they built based on your DoubleClick cookie.

Google does not need your opt-in consent to combine your web search profile with your DoubleClick cookie information. Starting March the 1st, Google could adopt a solution similar to what is deployed by Microsoft to target ads based on your search interests, although a sentence in the policy seems to prevent such use of your data:

“We will not combine DoubleClick cookie information with personally identifiable information unless we have your opt-in consent.”

In fact, it means that your DoubleClick cookie will not be linked to your personally identifiable information. So Google cannot put your name in front of the list of interests they inferred from your browsing behavior and will not put your name (or any other PII) in the ads you see. Because your Web Search history is likely to be unique, it identifies you and therefore cannot be combined with your DoubleClick profile [2].

But your search profile (i.e. the list of interests inferred from your search history) is unlikely to be unique and therefore does not identify you, so Google can combine it with your DoubleClick cookie information [2]. I believe they could also include some of the search results you clicked on to retarget you.
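
The distinction between a unique history and a shared interest profile can be illustrated with a small sketch. The data is made up and `anonymitySetSizes` is a hypothetical helper; it simply counts how many users share the same value, which is one way to read “does not identify you”.

```javascript
// Counts how many users share each value. A value held by a single user
// singles that user out; a value shared by several users does not.
function anonymitySetSizes(valuesByUser) {
  var counts = {};
  Object.keys(valuesByUser).forEach(function (user) {
    var key = valuesByUser[user];
    counts[key] = (counts[key] || 0) + 1;
  });
  return counts;
}

// Smallest group size: 1 means at least one user is uniquely identified.
function smallestGroup(valuesByUser) {
  var counts = anonymitySetSizes(valuesByUser);
  return Math.min.apply(null, Object.keys(counts).map(function (k) { return counts[k]; }));
}

// Made-up data: full search histories versus coarse interest lists.
var histories = {
  u1: "flights to oslo|knee pain",
  u2: "flights to oslo|tax forms",
  u3: "cheap laptops|knee pain",
  u4: "cheap laptops|tax forms"
};
var interests = {
  u1: "travel,health",
  u2: "travel,finance",
  u3: "travel,health",
  u4: "travel,finance"
};

console.log(smallestGroup(histories)); // 1: every history is unique
console.log(smallestGroup(interests)); // 2: each interest list is shared
```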

Similarly, your age, gender and interests expressed during Gtalk and Gmail discussions (or any other interest that Google could infer but that you would not be the only one to express) could be associated with your DoubleClick cookie. If you have any suggestions on how to deal with these data, do not hesitate to share them.

[1] See Google Policy FAQ: “Our new Privacy Policy applies to all information stored with Google on March 1, 2012 and to information we collect after that date.”
[2] Google defines Personal information as information “you provide to us which personally identifies you, such as your name, email address or billing information, or other data which can be reasonably linked to such information by Google”.

A follow up on Google Policies

Last year, I started to analyze the log retention policies of Google Search and Google Suggest for the NYU Privacy Research Group meetings. To complete this analysis, I’m trying to review the policies of other Google services.

‘Personal Information’ vs ‘Information we collect’

Although I have only just started this review, I noticed that the name of the section describing the recorded information changes from one policy to another. This section is either called:

  • “Personal Information” for +1, Blogger, Buzz, Notebook, Groups, Knol, Moderator, Music, Orkut, Picasa, Power Meter, SafeBrowsing, Sites, Voice, Web History and YouTube
  • “Information we collect” for Advisor, Checkout, Desktop, Gears, TV, Location, Mobile, Toolbar, Trader, Web Accelerator.

My first understanding was that Google uses the term “Personal information” for services that require a Google Account and “Information we collect” otherwise. But there are several exceptions. For instance, SafeBrowsing does not require an account to be used but Google TV does.

In addition, explicit references to server logs are made in these Personal Information sections even though Google does not consider server logs to be Personal Information (see their FAQ).

The Knol bridge

A loophole in Knol’s Privacy Policy allows Google to link your IP address and cookies to your user account. Knol (for Knowledge) is Google’s alternative to Wikipedia. You need to have a Google account to contribute to Knol and, like most privacy policies of Google services, Knol’s policy mentions that Google:

‘records information [your] account activity (e.g., storage usage, number of log-ins, actions taken), data displayed or clicked in the Knol interface […] and other log information (e.g., browser type, IP address, date and time of access, cookie ID, referrer URL). If you are logged in we may associate that information with your account.’
[emphasis is mine]

This last sentence is unusual and suggests that if you ever logged in and visited Knol, Google can associate your IP address and cookie IDs with your Google Account, and with all the personal information attached to it. From that, Google can directly de-anonymize the searches you did when you were not logged in.

A policy template

This loophole is certainly not intentional; this exact sentence appears in many privacy policies. As a matter of fact, this sentence also appears in the YouTube and Blogger policies. Therefore we can assume that the same template has been used for services hosting user-generated content.

However there are two big differences between Knol and YouTube or Blogger:

  • There is no explicit mention of the server logs in the YouTube and Blogger policies. For these services, Google only mentions that their ‘servers automatically record information about your use of the service’.
  • Both Blogger and YouTube have their own domain names, meaning that the cookies you send to YouTube are different from the cookies you send when you’re visiting a Google website. Unlike these services, Knol uses Google’s domain name. Therefore, you send to Knol the same cookies that you send to Google when you are doing a search.
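
The second difference comes down to how browsers decide which cookies to attach to a request. Below is a simplified sketch of that domain matching; `domainMatches` is a hypothetical helper, the hostnames are examples, and the handling of IP addresses and public suffixes that browsers also perform is left out.

```javascript
// Simplified cookie domain matching: a cookie set for "google.com" is sent
// to google.com itself and to any of its subdomains, and to nothing else.
function domainMatches(host, cookieDomain) {
  host = host.toLowerCase();
  cookieDomain = cookieDomain.toLowerCase().replace(/^\./, "");
  if (host === cookieDomain) return true;
  var suffix = "." + cookieDomain;
  return host.length > suffix.length &&
         host.lastIndexOf(suffix) === host.length - suffix.length;
}

console.log(domainMatches("knol.google.com", "google.com")); // true
console.log(domainMatches("www.youtube.com", "google.com")); // false
console.log(domainMatches("www.blogger.com", "google.com")); // false
```

Under this rule, a cookie ID logged by a service hosted under google.com can be the same one that accompanies your searches, which is not the case for services on their own domains.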

While not dramatic considering Knol’s relative lack of success, this mistake could have been more critical in the privacy policy of a more popular service.