Finding a balance between access to info and privacy

Disclaimer: Until September 2017 I was a Technologist at CNIL, the French DPA, working on the technical aspects of issues like the right to be forgotten.

On June 28, a decision of the European Court of Human Rights revived the debate about the Right To Be Forgotten. The court rejected a request to delete some content on a website, considering that the right to access information prevailed. The court made an interesting distinction with the Right To Be Forgotten as applied to search engines. Search engines and publishers have different purposes, so the ECtHR refers to the ECJ decision for search engines.

Because the decision on the Right To Be Forgotten is sometimes misunderstood, I will try to explain why, in my opinion, the Right To Be Forgotten as proposed by the ECJ should not interfere with access to information.

 

The right to be forgotten and public’s right to access lawful information

 

In its decision about the right to be forgotten, the ECJ anticipated the risk that this right could limit access to information and asked for a balance between the personal right to privacy and the public right to access information. In the ECJ’s words, the right to be forgotten should not apply “if it appeared, for particular reasons, such as the role played by the data subject in public life, that the interference with his fundamental rights is justified by the preponderant interest of the general public in having, on account of its inclusion in the list of results, access to the information in question.”

 

Sure, finding the balance is not always easy, but CNIL and Google tend to agree on cases where the right to be forgotten should not apply. As a matter of fact, in the first case that Google mentions in its post, Google is not the defendant: CNIL is. Indeed, the request was rejected by Google and the persons concerned sent a complaint to CNIL, but CNIL agreed with Google that delisting would go against the interest of the general public in having access to information.

So while many fear that an extended right to be delisted will impinge on access to information, this fear is unfounded. The ECJ decision clearly strikes a balance between the right to be delisted and the right to access information. The objective of the decision is not to force information to be deleted; it is to prevent unwanted (and often out-of-context) search results from popping up when you Google someone. The results are still available on Google and will appear as usual if you search for anything but the name for which the results have been delisted. The information is not censored: it remains available on the website publishing it, but it won’t appear out of the blue when you’re just googling someone’s name. This is broadly consistent with the ECtHR decision: publishers are fully covered by freedom of expression, whereas a search engine’s main goal is not to publish information but to compile information about someone (paraphrasing p.97 of the ECtHR decision).

The case the ECtHR had to judge is an edge case, and it seems that when it comes to the RTBF applied to search engines, most cases are much simpler. If you want to understand the reality of what is at stake here, have a look at the data provided by Google: the top 10 websites for which Google delisted results are social networks. Most of the complaints concern people asking to have comments, pictures and posts removed from social networks. In some cases the content is hosted by Google itself (e.g. Google Plus, YouTube), so individuals have no other remedy than asking Google to delist its own content when it’s inappropriate. Would platforms prefer to directly suppress such content?

Most impacted websites according to Google.

 

Google shall no longer be the only place where we look for information

 

Google no longer sets the right to privacy against freedom of speech (as it used to) but against the public’s right to access information. Despite Google’s goal to “organize the world’s information”, Google shall no longer be the only place where you look for information. Search engine algorithms are not meant to find true information, but to find information provided by authoritative sources. That’s why ranking failures have been observed repeatedly over the past year.

Wall Street Journal’s Jack Nicas listed some failures of the featured snippets, which are supposed to be the most trusted results provided by Google. Just a year ago, Google was displeased by top results provided by its algorithm and had to deploy patches hastily. While the company should be lauded for its reactivity, it should be noted that this reaction was triggered by journalists discussing highly visible problems. Obviously, some reaction had to be triggered, but addressing only the cases reported by the press leaves the main problem pending. In fact, Google picks and chooses the search results that it believes are wrong, and thus no longer considers page authority alone to decide which results are the most relevant: it relies on media attention.

It’s not the first time that Google has had to update its algorithm hastily. In 2013, reporters highlighted that mugshot websites were highly ranked by Google and were extorting ex-convicts: individuals had to pay for their records not to appear on top of Google search results. A right to be delisted would clearly have been of use there, but Google short-circuited the regulators and updated its algorithm to “fix” the problem shortly after the NYT reported the matter. Last November, Eric Schmidt declared that Google would “derank” Sputnik and Russia Today (two Russian websites): another patch developed promptly to postpone the problem and hide it in the “next” result pages. Individually, these decisions seem laudable. However, the big picture shows Google judging what is right and what is wrong, and hiding the “wrong” results by ranking them so low that they cannot be discovered without clicking “next” a few more times.

 

Google is not the Internet (and it’s not the Web either)

If you worry that articles with very low ranks will not be seen, you’re right: Google never returns more than 1000 search results per query. In practice, if people rarely see the 11th result, they never see the 1001st: everything after the first 1000 results is as good as delisted. So among the millions of results that Google found (and you can see the exact number just below the search bar), Google will only show you 1000. The fact that information does not show up on Google is therefore no reason to stop searching for it.

Google shows only the top 1000 search results ranked by their algorithm.

“Google is not the internet. The vast majority of internet websites are hosted by and operated through service providers other than Google. The entities with the technical ability to remove websites or content from the internet altogether are the websites’ owners, operators, registrars, and hosts—not Google. Removing a website link from the Google search index neither prevents public access to the website, nor removes the website from the internet at large. Even if a website link does not show up in Google’s search results, anyone can still access a live website via other means, including by entering the website’s address in a web browser, finding the website through other search engines (such as Bing or Yahoo), or clicking on a link contained on a website (e.g., CNN.com), or in an email, social media post, or electronic advertisement.” These are not my words; they are the words of Google’s lawyer in the Equustek case (see https://assets.documentcloud.org/documents/3900043/Google-v-Equustek-Complaint.pdf 15 & 16).

The point is that there are other sources of information: other search engines, either competing ones or those built into news websites. If you’re really looking for public information about someone, you should search for their name on a newspaper website or on Wikipedia. A Google search may not return the information you’re looking for and is likely to return out-of-context information.

Wanted: A right to be forgotten

By merely patching PageRank issues, Google postponed the big debate that the ECJ is now forcing it to have. Many individuals have problems with search results about them appearing on Google but cannot expect a miraculous algorithm update to solve them. These people have to rely on the “right to be delisted” to remove offensive, defamatory or outdated and often untrue search results. The right to be delisted decision highlights the need for a balance between the right to simply discover information and privacy, while not questioning access to information on the web. Beyond the big cases reported by journalists and civil society, many individuals have more private problems: defamatory search result pages, revenge porn, old comments that they wrote, pictures they should not have shared. There are not enough journalists and civil society associations to cover all these cases, and most of the people concerned are not willing to have their complaints exposed by journalists.

@vtoubiana

 

Credits: Featured image is “Outside the European Court of Justice” by katarina_dzurekova (Creative Commons)

How Google is tracking Safari users on third party sites

A couple of weeks ago, Google stopped redirecting users from google.com to localized versions of the search engine. This rather innocuous change is likely to affect the way Safari’s anti-tracking protection copes with Google cookies. Indeed, Safari now deletes cookies of sites you have not interacted with over the last 24 hours [1]. If you type google.com and are then redirected to google.fr, you don’t actually interact with google.com, so Safari does not give Google a 24h permission to track you on other domains with its google.com cookies. That is no longer the case if Google stops redirecting users and just leaves them on google.com, where they will interact with the search bar and other elements.
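The effect of the redirect can be sketched with a toy model. This is only an illustration of the 24-hour rule as described above, not Safari’s actual implementation; the function name, the dates and the interaction logs are hypothetical.

```python
from datetime import datetime, timedelta

# Simplified assumption taken from the post: a domain may use its cookies on
# other sites only during the 24 hours that follow a user interaction with it.
TRACKING_WINDOW = timedelta(hours=24)


def may_track(last_interaction, domain, now):
    """Return True if `domain` was interacted with during the last 24 hours."""
    seen = last_interaction.get(domain)
    return seen is not None and now - seen <= TRACKING_WINDOW


# Hypothetical timestamps, chosen only for the example.
now = datetime(2017, 11, 1, 12, 0)
an_hour_ago = now - timedelta(hours=1)

# With the redirect, typing and clicking happen on google.fr only.
with_redirect = {"google.fr": an_hour_ago}
# Without the redirect, the user interacts with google.com itself.
without_redirect = {"google.com": an_hour_ago}

print(may_track(with_redirect, "google.com", now))     # False
print(may_track(without_redirect, "google.com", now))  # True
```

In this model the only thing that changes between the two runs is the domain on which the interaction is recorded, which is exactly what the end of the redirect alters.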

Why this is ironic

This is not the first change Google made to the way it handles localized domains. Google made an initial announcement, a couple of months ago, to tell users that localized domains would not be that relevant anymore [2]. Now Google seems to be deprecating local URLs to put everything under the google.com domain. This is quite ironic because, up to last year, Google was arguing in a French right to be forgotten case that google.fr and google.com were providing significantly different services [3]. The French court did not follow the reasoning. At that time Google was arguing that only 3% of Europeans were searching on google.com while the remaining 97% were using localized versions [4]. Today it seems that it’s actually 60-40 [5].

How Google is tracking Safari

Previously, Google advertising cookies on third party websites were only set from doubleclick.net. Users hardly interact with doubleclick.net, so it was likely that Safari would block doubleclick cookies. Since September, Google has also started to set an advertising cookie from its google.com domain. You can track the changes to Google’s cookies explanation page via the web archive, and it seems that Google added the description of the ANID cookie in September [6]. However, you may not have noticed this new cookie in your browser: oddly enough, Google only sets this cookie on the Safari browser. I did a couple of tests using the Chrome and Edge developer tools to emulate different mobile browsers: all iOS devices had the ANID cookie set, and none of the other devices received it. Hence, Google is giving special treatment to Safari users, similarly to what Google did in 2012 when it bypassed Safari’s tracking protection [7].
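The comparison across emulated browsers can be sketched as follows. This is a minimal illustration, not the procedure actually used for the tests: the helper names are hypothetical, and the headers below are invented placeholders rather than captured traffic. It only shows how one could check which emulated browsers receive a given cookie once the `Set-Cookie` headers have been saved.

```python
from http.cookies import SimpleCookie


def cookie_names(set_cookie_headers):
    """Collect the cookie names found in a list of Set-Cookie header values."""
    names = set()
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)
        names.update(jar.keys())
    return names


def browsers_receiving(cookie, observations):
    """Return the sorted emulated browsers whose responses set `cookie`."""
    return sorted(
        browser
        for browser, headers in observations.items()
        if cookie in cookie_names(headers)
    )


# Placeholder observations: invented values, NOT captured traffic.
observations = {
    "Safari (iOS)": ["OTHER=placeholder1; Path=/", "ANID=placeholder2; Path=/"],
    "Chrome (Android)": ["OTHER=placeholder3; Path=/"],
}

print(browsers_receiving("ANID", observations))  # ['Safari (iOS)']
```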

This may help other advertisers to track you

That being said, Google is only following Facebook here. Both are big first parties that, unlike sites that mostly act as third parties, were expected not to be impacted by Safari’s measure. It does not seem illegitimate for Google to take the same stance as its major competitor in the online advertising market. Yet, the ramifications of Google’s moves are more critical.

Unlike Facebook, Google is a major Web ad-exchange platform. It means that Google is hosting auctions where buyers get an opportunity to show an ad in your browser and to synchronize their cookies with those of Google. So if Google is able to bypass Safari’s tracking protection and to keep cookies in users’ browsers, it’s likely that this will also benefit all the ad-auction participants. Through Google cookies, third parties will be able to recognize users even though their own cookies have been deleted by Safari. The stability of the Google cookie will technically allow third parties to track browsers over more than 24 hours.

In some sense, this is worse than before, when Safari was blocking all third party cookies and when Google was only serving ads and hosting auctions from the doubleclick.net domain. If Google is able to leverage this advantage, it could be a significant blow to the competing ad-exchange marketplaces, which are not able to track Safari users over more than 24 hours.

 

Vincent Toubiana (@vtoubiana)

[1] “Intelligent Tracking Prevention”, https://webkit.org/blog/7675/intelligent-tracking-prevention/

[2] “Making search results more local and relevant” https://www.blog.google/products/search/making-search-results-more-local-and-relevant/

[3] “Au Conseil d’État, la portée territoriale du droit à l’oubli sur Google”, https://www.nextinpact.com/news/104691-au-conseil-detat-portee-territoriale-droit-a-loubli-sur-google.htm

[4] “Google says non to French demand to expand right to be forgotten worldwide”, https://www.theguardian.com/technology/2015/jul/30/google-rejects-france-expand-right-to-be-forgotten-worldwide

[5] “The end of google.{your country}?”, https://whotracks.me/blog/google_domains.html

[6] “Types of cookies used by Google”, https://web.archive.org/web/20170909225414/http://www.google.com/policies/technologies/types/

[7] “Google Busted With Hand in Safari-Browser Cookie Jar”, https://www.wired.com/2012/02/google-safari-browser-cookie/

Cross checking IAB’s numbers

It looks like, on mobile, the numbers that IAB’s reports are based on mostly reflect the dynamics of Google and Facebook advertising revenues, not those of average app developers.

 

Like many people interested in privacy, I keep an eye on #ePrivacy on Twitter. This hashtag is used both by people advocating for more privacy (consent as the main legal basis, stronger default settings, no cookie walls) and by those pushing for legitimate interest, cookie walls, no default settings, and so on. Both sides have arguments, and I’m obviously more receptive to the former group, but I try to look at the arguments of the other side as well. During the last few weeks, IAB Europe representatives quoted several reports that, according to them, show the importance of ‘data-driven’ advertising in Europe (I put ‘data-driven’ in quotes because this refers in fact to almost any type of advertising).

I wanted to analyze the numbers behind the reports, especially behind the report called “The economics of contribution of digital advertising in Europe”. I tried a couple of times to get more info about some of these numbers but have not received a response so far. So I had a closer look at the document. It is an update of a previous 2015 IHS report. Interestingly, in this report the definition of publisher is very broad:

“Publishers, or media outlets, who serve audiences by providing them with content, and serve advertisers by supplying them with audiences. In this report, ‘publisher’ includes any type of company where audience attention and content meet.”

So “publisher” includes search engines (Google, Yahoo, Bing) and social networks (Facebook, Twitter and LinkedIn). It means that the figures showing publisher revenues are likely to include the revenues of these companies, which are far higher than the revenues of an average publisher.

The mobile ecosystem

While the figures are not false, they will certainly not correspond to the revenues of average European publishers or app developers. For instance, when reading the figure on mobile advertising revenue, the reader may not be aware that Facebook advertising revenue in Europe is 5.4 €bn (see 2016 Facebook earnings) and that, during the 4th quarter of 2016, 84% of its revenue came from mobile. This means that Facebook mobile advertising revenue in Europe is about 4.5 €bn, more than a third of the mobile advertising revenue estimated by IAB.
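The estimate above can be checked with a few lines of arithmetic. This is a rough sketch using only figures quoted in this post (including the 11 €bn IAB estimate for 2016), and it assumes that the Q4 mobile share applies to the whole year and to Europe; the variable names are mine.

```python
# Figures quoted in the post (billions of euros, 2016).
facebook_europe_ads = 5.4   # Facebook advertising revenue in Europe
mobile_share = 0.84         # share of revenue coming from mobile in Q4 2016
iab_mobile_total = 11.0     # mobile advertising revenue estimated by IAB

# Assumption: the Q4 mobile share holds for the full year and for Europe.
facebook_mobile = facebook_europe_ads * mobile_share
share_of_iab_total = facebook_mobile / iab_mobile_total

print(f"Facebook mobile ads in Europe: {facebook_mobile:.1f} bn EUR")  # 4.5 bn EUR
print(f"Share of the IAB estimate: {share_of_iab_total:.0%}")          # 41%
```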

Facebook advertising revenues (source: https://s21.q4cdn.com/399680738/files/doc_presentations/FB-Q4%2716-Earnings-Slides.pdf)

 

Google’s share is even bigger. So it is likely that if you remove Facebook and Google revenues from the figure, you will be left with at most a third of the 11 €bn made in 2016. Mobile advertising would then no longer be the main source of revenue on mobile. As far as I know, there are no such dominant players in the app ecosystem, so app revenues made of paid apps and in-app purchases will be more equally spread.

What’s behind these numbers

I should add that it’s not simple to know what these numbers reflect from an app developer’s point of view, as the report just compares global revenues without showing the cut that the developer receives. Considering the number of intermediaries, the global market revenue tells little about the revenue received by the developer.

Finally, it’s hard to get more details on these numbers and to know exactly what they reflect, as there are several revenue sources for app developers. For instance, a 2014 GigaOM report shows that in-app purchase and paid-app revenues add up to 4.5 €bn, whereas the IAB report shows that, during the same year, these revenue sources were neck and neck with mobile advertising.

Revenue charts from different sources show significantly different revenues in 2013

Final words

These reports are quite opaque, so it’s complicated to know where these numbers come from. I’m not arguing that IAB Europe’s numbers are wrong, but they may be misunderstood and probably do not reflect the reality of average app developers, who probably get more revenue from paid apps and in-app purchases than from advertising.

A list of Google services vulnerable to Session hijacking

After finding an information leakage in Google Search, I was curious to see whether other pieces of information could be gleaned from other Google services. To verify this, I visited my Google Dashboard, replaced my SID cookie and clicked on all the HTTP services that were listed.
My first attempt failed, as I was systematically redirected to the account page where I was asked to enter my password. I then tried to also spoof the HSID cookie (also sent in clear text), but because the HSID cookie is an HttpOnly cookie [1], it cannot be modified by a script or by the user: the cookie can only be modified by the server.
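The two properties mentioned above (hidden from scripts, yet sent in clear text) correspond to two cookie attributes, `HttpOnly` and `Secure`. A minimal sketch with Python’s standard `http.cookies` module shows how they can be read from a `Set-Cookie` header; the helper name is hypothetical and the header is an invented placeholder, not a real Google response.

```python
from http.cookies import SimpleCookie


def cookie_flags(set_cookie_header):
    """Return {cookie name: (is_http_only, is_secure)} for one Set-Cookie value."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    return {
        name: (bool(morsel["httponly"]), bool(morsel["secure"]))
        for name, morsel in jar.items()
    }


# Placeholder header: the value and attributes are invented for illustration.
print(cookie_flags("HSID=placeholder; Path=/; HttpOnly"))
# {'HSID': (True, False)}
```

A cookie flagged `HttpOnly` but not `Secure` is out of reach of page scripts, yet it still travels over plain HTTP, where anyone on the network path can read it.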

Spoofing an HttpOnly cookie

The best solution I found was to install a local proxy to intercept the HTTP traffic and then modify the cookies (I recommend Burp free edition, which does a good job). It is then quite simple to replace the HSID cookie in the outgoing requests.
This time it worked: I was able to log into the following services with the spoofed account:

  • Google Alerts: I was able to view and edit the mail alerts that were configured for the spoofed account.
  • Google Social Content: This service lists all your Gtalk contacts (that means most of the people you chatted with a couple of times).
  • Google Contacts: This is the Gmail contacts manager; it allows you to view, edit and create Gmail contacts. Quite useful if you want to get a list of people to spam. An attacker could also attempt to replace the mail address of a contact with their own mail address.
  • Google Reader: You could see and edit RSS subscriptions.
  • Google Maps: You could see the maps associated with the spoofed account.

There might be other vulnerable services, but I think this list is already quite exhaustive, and each of the listed services is likely to provide sensitive information.

Design flaws

Spoofing an unsecured cookie to hijack a session is nothing new. Nevertheless, there are two design flaws that make HSID and SID cookie spoofing more critical:

  • These cookies can be used to access multiple services: when Google created these services, it did not assign a specific cookie to each of them. Therefore a single pair of cookies provides access to all these services.
  • SID cookies are still valid even after the user logs out: if a user thinks his session has been compromised, there is nothing he can do to revoke it. It seems that this was already pointed out 4 years ago.
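To make the two flaws concrete, here is a toy sketch of a session store that avoids both: each token is scoped to a single service, and logging out revokes every token of the user. It is purely illustrative; the class, names and behaviour are assumptions of mine and say nothing about how Google’s servers actually work.

```python
import secrets


class SessionStore:
    """Toy server-side store: one token per (user, service), revocable on logout."""

    def __init__(self):
        self._tokens = {}  # token -> (user, service)

    def login(self, user, service):
        token = secrets.token_hex(16)
        self._tokens[token] = (user, service)
        return token

    def is_valid(self, token, service):
        # A token only opens the service it was issued for.
        entry = self._tokens.get(token)
        return entry is not None and entry[1] == service

    def logout(self, user):
        # Drop every token of the user so that stolen copies stop working.
        self._tokens = {t: e for t, e in self._tokens.items() if e[0] != user}


store = SessionStore()
alerts_token = store.login("alice", "alerts")
print(store.is_valid(alerts_token, "alerts"))    # True
print(store.is_valid(alerts_token, "contacts"))  # False
store.logout("alice")
print(store.is_valid(alerts_token, "alerts"))    # False
```

With such a design, a cookie sniffed on one service would not open the others, and a user who suspects a hijack could end it simply by logging out.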

Conclusion

Google is working on these issues and they should be fixed soon (users are already redirected to encrypted search [2]). Therefore, a next step would be to check if other major Web service providers have a better cookie policy.

References:
[1] Jeff Atwood, “Protecting Your Cookies: HttpOnly”, http://www.codinghorror.com/blog/2008/08/protecting-your-cookies-httponly.html
[2] Evelyn Kao, “Making search more secure”, http://googleblog.blogspot.com/2011/10/making-search-more-secure.html