Draft: Google’s Exposure Notification needlessly forces some users to turn on location

Recently I have been testing a few contact tracing apps on Android and noticed something odd: all apps using Exposure Notification (the API provided by Apple and Google) require the user to enable location (i.e. GPS), while TraceTogether and StopCovid do not. According to Google PR, in order to detect Bluetooth devices nearby, an app has to make sure that location is enabled. That would mean that TraceTogether and StopCovid may not work properly when users have disabled GPS… It turns out they work pretty well on most phones, and so Google is needlessly forcing some users to enable GPS on their phone. It is not trivial to know which phones are affected, and Google has no incentive to work on a fix.

Relation between Bluetooth and Location

Although the primary use of Bluetooth is to connect devices that are close to the smartphone (headset, earpiece, …), it can be misused by apps to geolocate the user. For instance, some companies propose to deploy Bluetooth tags in stores for geolocation purposes [2]. To prevent users from being geolocated without their knowledge, Google has integrated protections into Android to warn users that they might be geolocated.

In 2015, Google integrated a protection into Android to prevent this misuse: any app scanning for nearby Bluetooth devices is considered a location-aware app [3]. That is why, starting with Android 6, to be able to scan in BLE, an app must obtain the location permission from the user. Therefore, whether it intends to locate the user or not, an app cannot discover Bluetooth devices unless it has been granted the location permission.

  An excess of transparency…

According to the Android documentation, for an app to scan for nearby Bluetooth devices, it only needs the location permission. In practice, this permission is not always sufficient.

Extract from the bluetooth documentation


Source: https://developer.android.com/guide/topics/connectivity/bluetooth

Indeed, Google has imposed a second constraint on Bluetooth apps: for an app to be able to scan in Bluetooth, the phone’s location service must be activated. That is an unexpected privacy protection which may defeat its original purpose: not only will the app that wants to scan in BLE be able to locate the user, but so will all applications and services that have already obtained permission to locate the phone.

Figure 1: The BLE scan app asks for the permission to locate
Figure 2: The app asks to activate the location

This second condition is not mentioned in the Android documentation cited above, nor is it entirely intuitive. It is as if, for fear that voyeurs might use infrared cameras to spy on your home, your landlord (here, Google) decided to remove all the curtains in your condo. Of course, you are more aware of the risks of voyeurism and adapt your behavior accordingly, but you are also visible to more people. Here the result is the same: all services and applications that have permission to locate you will also be able to access your location, not only the application using BLE. Most notably, Google services, one of which is activated by default on most Android phones, will also get access to your location.

It would have been advisable to discuss this technical choice, which is not insignificant and not necessarily understood by users or developers. As this constraint has not been documented, no debate could take place on its relevance or on possible alternatives. It gets even worse: the behavior is not the same on all Android devices, partly through Google’s fault, and it is fairly complicated for developers to find out which devices are affected.

    … not always followed by smartphone manufacturers

To add to the confusion, Google changed this constraint without much notice. On Android 8, apps cannot scan Bluetooth when location services are off, but on Android 9 & 10 the behavior changed: an app can scan when location services are off if it is filtering the results (i.e. it is not listening for everything but only looking for devices advertising a known app/service).
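The nominal behavior described above can be summarized in a small decision function. This is a simplified model based on this article’s observations, not an official specification, and as discussed below, manufacturers deviate from it:

```python
def ble_scan_returns_results(android_version: int,
                             location_services_on: bool,
                             filtered_scan: bool) -> bool:
    """Simplified model of Android's BLE scan gating (per this article):
    - Android 6-8: scanning requires location services to be on.
    - Android 9-10: a *filtered* scan (looking only for a known
      app/service) works even with location services off.
    Assumes the app already holds the location permission.
    """
    if android_version < 6:
        return True  # the constraint was introduced with Android 6
    if location_services_on:
        return True
    return android_version >= 9 and filtered_scan


# Same filtered scan, location off: blocked on Android 8, allowed on Android 9
print(ble_scan_returns_results(8, location_services_on=False, filtered_scan=True))  # False
print(ble_scan_returns_results(9, location_services_on=False, filtered_scan=True))  # True
```

The model makes the fragmentation problem concrete: the answer depends on the Android version and the scan type, before even accounting for manufacturer customizations.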

Apparently, not all Android smartphone manufacturers [4] have implemented this constraint in the same way. For instance, on some smartphones with Android 9, location must be active for an app to be able to detect nearby Bluetooth devices, but on others it will work even with location disabled.

Surprised by this inconsistent behavior, one developer tested several smartphones and, according to him, Android smartphones with a “lightly” modified Android version (i.e. a version of Android that has been changed very little by the manufacturer) do verify that location is enabled, but smartphones using more “customized” versions completely ignore this constraint [5]. One may also wonder about the motives of smartphone manufacturers who did not include the protection when they “customized” Android. Moreover, it is not certain that Google is aware that some manufacturers have modified the Android code to remove this protection.

This fragmentation is problematic for developers since it prevents them from getting consistent behavior from one Android smartphone to another. Thus, developers have to test their apps on multiple smartphones to cover the different cases. The icing on the cake: a smartphone that blocks the BLE scan will not report an error to the app but will merely indicate that it did not detect any device nearby. The application therefore has no way to know whether no device was actually nearby or the OS blocked the scan.
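That last point can be illustrated with a toy simulation (purely illustrative, not a real Android API): a silently blocked scan and a genuinely empty neighborhood produce exactly the same observable result.

```python
def observed_scan_results(nearby_devices, os_blocks_scan):
    """Toy model: when the OS blocks the scan, the app receives an
    empty result list, not an error."""
    return [] if os_blocks_scan else list(nearby_devices)


# The app cannot distinguish these two situations:
blocked = observed_scan_results(["phone-A", "phone-B"], os_blocks_scan=True)
nobody_around = observed_scan_results([], os_blocks_scan=False)
print(blocked == nobody_around)  # True: same output, no error to tell them apart
```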

Implications for contact tracing apps

Until now, relatively few applications have used BLE scans intensively. But the emergence of contact tracing apps changes the picture, since they rely mainly on Bluetooth to detect contacts. As a result, these apps need to constantly scan BLE for other devices in the vicinity. However, on some phones, they can only do so if the location service is activated.

    What proportion of users will be affected

It is not easy to estimate on what proportion of devices the BLE scan is blocked in the absence of location. On the one hand, for this protection to be blocking, the user must not have location constantly active on their smartphone. I did not find an estimate of the number of users who disable location on their phones. A small Twitter survey tells me that almost 50% of the people following me turn off location [6], but this estimate is probably biased. To my knowledge, the closest study is the CNIL barometer, according to which 43% of users sometimes disable location [7]. It can therefore be assumed that a fairly small proportion of users disable it frequently.

On the other hand, the proportion of devices that check that location is activated before allowing BLE scanning is, to my knowledge, unknown. Until people were asked to constantly walk around with Bluetooth turned on in the hope of saving lives, BLE scans were rarely used, which explains the lack of interest in this problem.

Out of curiosity, I ran a test on my smartphone (a Moto G5 Plus running Lineage): the device scans in BLE even with location turned off. However, it is difficult to know whether this behavior is the exception or the rule.

Contact tracing apps with opposite strategies

To be sure to work on all Android smartphones, contact tracing applications should force location activation. In practice, not all of them do, either by omission (since scanning seems to work on some devices) or to avoid giving users the impression that they are being located.

    EN-based applications force users to enable location

All Android contact tracing applications that rely on the Exposure Notification API (proposed by Apple and Google) require the user to activate location in order to run [8] (except on Android 11). This summer, Google was criticized because this looked like an attempt to retrieve more location data [9]. In fact, Google could not get around this restriction without changing the code of its OS, and it is only on Android 11 that Google can fix it.

By forcing users to enable location, Google ensures that contact tracing applications will work on all devices. But the corollary is that some users will enable location, and Google location services, without really needing to do so (see box). In fact, Google could reduce the number of users who needlessly activate location, but it clearly has no incentive to do so. First, Google should use filtered scanning (apparently this is planned, as there is a FIXME comment… from August 25th). Second, Google could test on which devices Exposure Notification works with location off. Devices running Android 9+ are likely to be able to perform (filtered) scans with location off, but earlier versions of Android are more likely to require location.

    StopCovid and TraceTogether do not force location

StopCovid and TraceTogether take a different approach since they can be used even with location disabled. On my phone, neither of the two apps checked that location was active before running scans [11]. Not knowing whether this is a bug or a feature, I reported the issue to the developers [10]. The fact that both apps use filtered scanning means that they are less likely to be impacted by the location setting.

In cases where the Bluetooth scan is blocked, StopCovid should only work halfway:

  • Since your smartphone always transmits over Bluetooth, if you have passed by a person who tested positive for Covid, you should be notified.
  • On the other hand, if you have tested positive yourself, you will not be able to notify the people you passed while your scanner was not active.

Google Location Services

When an application wants to locate a user, it can either use Android’s open API, which uses GPS, or use Google Location Services (GLS), which offer faster and more accurate location. GLS are part of the Google Play Services, the set of proprietary APIs available on an overwhelming majority of Android smartphones (outside of China). GLS are enabled by default on most Android devices, and disabling them requires digging into the smartphone’s location settings.

The main disadvantage of GLS is that they regularly report users’ location to Google. Indeed, as Google explains, “location data from your device is routinely collected and used anonymously to improve the accuracy of the location” [12].

It is important to note a subtlety here: the location data that is sent to Google is not anonymous; only the use that is made of it is anonymous. This description of the data reported by GLS and the use made of it differs significantly from the description that was given until 2018. Until then, Google described the reported data itself as anonymous. For example, the NYOB complaint notes that Google stated that “anonymous location data will be sent to Google when the device is turned on”.


Excerpt from the NYOB complaint

Very little documentation explains what data is sent by GLS. In a response to two U.S. Senators in January 2018, Google stated that GLS retrieve location information associated with temporary, rotating identifiers [13]. This is likely pseudonymized data, and depending on the method and frequency with which these pseudonyms are generated, it is more or less easy for Google to figure out to whom the location data corresponds.

Popup requesting GLS activation in Google Maps. The explanations (right screen) only appear if you click on the arrow

GLS are often activated by users. While it is true that they enable faster and more accurate location, their high usage rate is also explained by how simple they are to activate: when an app wants to ask the user to turn on location, it is much simpler to do so using the GLS API: a popup appears and the user just has to click OK. To my knowledge, there is no API that allows activating location so simply without activating GLS. Disabling Google Location Services requires digging into the phone settings [14].

Exposure Notification benefits from a privileged API

Contact tracing applications that want to be able to use the Bluetooth scanner on all Android devices have, a priori, no choice but to make sure that the phone’s location is active. When location is not active, apps have two options to ask the user to activate it:

    Ask the user to go to the phone settings to activate location. This involves taking the user out of the app, into the phone settings, and then back to the app. These operations are tedious and significantly detract from the user experience.

    The second option is to let the user activate location via a popup that appears in the app. This option is preferable from a user-experience point of view since the user remains in the app and does not have to navigate the phone settings. Many applications display such a popup when location is not active. The catch is that this option is only available with Google Location Services: it is not possible for a developer to make sure that only GPS is activated. By choosing this second option, the developer activates GLS and Google’s collection of pseudonymized location data.

Both options have their drawbacks: the first harms the user experience while the second harms the user’s privacy. But for applications that rely on EN, Google seems to have provided a third option: these applications can activate the location service from the app without activating GLS. This gives them the advantages of both approaches: a smoother user experience that does not systematically activate GLS.

The fact that all applications based on the Exposure Notification API use exactly the same text to describe location activation indicates that the operation is performed at the level of the Exposure Notification API and not at the app level. As with GLS, Google can change a setting from within an app because the Play Services have the required permissions to do so [15], but other applications do not have access to these features (note that apps using EN do not ask for the location permission either).

Captures of 4 contact tracing applications based on Exposure Notification

Even if location is enabled for the purposes of a contact tracing application, other applications that have permission to access location, and previously enabled Google services (Location History or GLS), will be able to locate the user. Once again, one may question the relevance of allowing the collection of location data when the user simply wants to use a contact tracing app. Even on devices that require location to be on to perform a Bluetooth scan, Google could have gone a step further and deactivated the sending of information to its own services when location was activated from a contact tracing app.

The way forward?

A better solution would be to enable location only on the Android devices where it is necessary. Indeed, as we have seen, due to fragmentation, on part of the Android devices location does not have to be activated for contact tracing apps to work. It makes no sense to force the activation of location on these terminals, but it is still necessary to be able to identify them, and the API does not necessarily allow this.

Could P2B force Apple to be more transparent about iOS APIs?

The Platform-To-Business regulation started to apply on July 12th. This regulation aims to shed more light on the relations between platforms and businesses. Its scope includes app stores. In this post I am interested in P2B’s impact on Apple’s App Store and iOS APIs.

The regulation “promoting fairness and transparency for business users of online intermediation services” (aka Platform-To-Business or P2B) [1] was adopted in June 2019 and started to apply on July 12th 2020. As its name suggests, this regulation tries to strike a balance between platforms (i.e. online intermediation services and search engines) and businesses. The scope of the regulation is relatively broad: behind the term online intermediation services, we find hotel booking platforms (e.g. Booking, Hotels, …), goods marketplaces (e.g. those of Amazon, Fnac, …) and application stores. The regulation also covers search engines.

As pointed out by Arcep [2], this regulation is a first step towards device neutrality. Yet P2B’s impact on devices seems limited since APIs are quite explicitly excluded from the scope of the regulation… except where they are directly connected to an online intermediation service:

Technological functionalities and interfaces that merely connect hardware and applications should not be covered by this Regulation, as they normally do not fulfil the requirements for online intermediation services. However, such functionalities or interfaces can be directly connected or ancillary to certain online intermediation services and where this is the case, the relevant providers of online intermediation services should be subject to transparency requirements related to differentiated treatment based on these functionalities and interfaces [emphasis mine].

Some iOS APIs could be within the scope of the regulation because the possibility for an app to use certain APIs is decided when it is submitted to the App Store.

The App Store as an API control point

To develop an application for iOS, developers are restricted in the APIs they can use: only public APIs are documented and open to third-party developers. There are private APIs that offer more functionality, but they are supposed to be used only by Apple’s applications.

Apple verifies that no private API is used when an app is submitted to the App Store. In the past, some apps have managed to bypass this verification but were subsequently removed from the App Store [4] for violating its terms of use [5].

The question of opening these APIs matters because some of them may give Apple’s apps a competitive advantage [3], but opening them to everyone could harm iOS security and stability.

Some APIs (both private and public) require explicit permission from Apple (called an “entitlement”) at the time the application is submitted to the App Store. If an entitlement requested by an app is not granted, iOS will not let the application use the protected API [6].

Thus, these APIs can be considered as directly connected to the App Store since it is the latter that acts as a control point.

One could then consider that the P2B regulation applies to the use of private or entitlement-based APIs. In particular, Article 7 of the regulation requires platforms to describe any differentiated treatment they give:

  1. Providers of online intermediation services shall include in their terms and conditions a description of any differentiated treatment which they give, or might give, in relation to goods or services offered to consumers through those online intermediation services by, on the one hand, either that provider itself or any business users which that provider controls and, on the other hand, other business users. That description shall refer to the main economic, commercial or legal considerations for such differentiated treatment.
  2. […]
  3. The descriptions referred to in paragraphs 1 and 2 shall cover in particular, where applicable, any differentiated treatment through specific measures taken by, or the behaviour of, the provider of online intermediation services or the provider of the online search engine relating to any of the following: […]
(d)access to, conditions for, or any direct or indirect remuneration charged for the use of services or functionalities, or technical interfaces, that are relevant to the business user or the corporate website user and that are directly connected or ancillary to utilising the online intermediation services or online search engines concerned.

P2B’s article 7 could apply to APIs; if these APIs were used for privileged treatment, they could in some cases be subject to more transparency.

However, as we will see below, this provision may only apply in a few cases.

The Uber case

Uber’s iOS app was given privileged treatment: in 2017, it was discovered that Apple had granted Uber an entitlement for a private API allowing the app to read the screen content [7]. This exemption shows that Apple can selectively offer privileges to certain apps without modifying its OS.

Apple did not really explain this decision, and if it happened again today it could look like a textbook case for P2B. Surprisingly, though, P2B could not apply: the regulation only covers differentiated treatment that benefits the platform operator (either the provider itself or any business user controlled by that provider). Since Uber is not controlled by Apple, this privilege would not have to be justified under P2B.

The Apple Pay investigation

Last June, DG Competition opened two investigations into potential abuse of a dominant position by Apple [8]. One of them concerns Apple’s refusal to open the “tap and go” functionality to Apple Pay’s competitors: only Apple Pay can benefit from this functionality, and competitors consider themselves unfairly disadvantaged.

Unlike in the Uber case, the service put forward (i.e. Apple Pay) is controlled by the company that operates the online intermediation service. However, P2B only covers treatment in relation to goods or services offered to consumers through these online intermediation services, whereas Apple Pay is integrated into iOS and is not accessible from the App Store.

In this context, it is not clear whether the P2B Regulation can apply.

Using Bluetooth in the background

Contact tracing applications that wished to use Bluetooth to detect proximity have found themselves limited on iOS because applications can hardly use Bluetooth when they are in the background. States were forced to make a choice: develop an application that would work less well on iPhone, or use the API common to Apple and Google even if it did not correspond to their needs [9] and is only supported by the latest version of iOS (thus excluding 10% of iPhone users [10]).

For many, Apple’s motivation was to protect users’ privacy, but there is no evidence that this objective weighed much in Apple’s refusal. On the one hand, applications that tried to circumvent Apple’s restriction (and were accepted on the App Store) either undermined the security of smartphones by prompting users to leave them unlocked [11], or were more intrusive because they used geolocation [12]. On the other hand, all official Apple documentation indicates that the restrictions were motivated by the desire to reduce the resource consumption (battery [13] and memory [14]) of iPhones.

As of now, Apple’s design motivation remains unknown since the company did not communicate on the subject and therefore never justified its choice. Hence, applying the P2B regulation could be relevant.

Indeed, several of Apple’s pre-installed applications are accessible from the App Store. Even if these applications cannot be installed from the App Store, they are nevertheless visible there and compete with the services offered by third-party developers [15]. Moreover, some of these applications use Bluetooth even when running in the background. This is notably the case of the “Find My” application, which uses BLE in the background to allow iPhones to be found [16]. In these circumstances, Apple may have to justify restricting access to Bluetooth functionality in the background.

A need for more transparency

The debate on contact tracing apps clearly demonstrates the value of transparency in justifying the architectural choices of the OSes. Indeed, Apple did not justify its choice to refuse contact tracing applications access to BLE. This silence has not been criticized since the company has been cast as a white knight protecting privacy against the states. But in the absence of an explanation, this posture is questionable. If it turned out that Apple’s design is justified by the iPhone’s resource constraints, or is not really justified at all, the debate would not be the same.

Featured image by Jay Goldman
https://creativecommons.org/licenses/by-nc-sa/2.0/

[1] https://eur-lex.europa.eu/legal-content/FR/TXT/HTML/?uri=CELEX:32019R1150&from=EN

[2] « Platform-to-Business » : un premier pas vers les terminaux ouverts dans le règlement européen ! https://www.arcep.fr/larcep/pendant-ce-temps-a-bruxelles.html#c18616

[3] « Apple frees a few private APIs, makes them public » https://www.theregister.com/2017/06/13/apple_inches_toward_openness/

[4] « Apple bans over 250 apps that secretly accessed users’ personal info » https://www.theverge.com/2015/10/19/9567447/apple-banned-apps-youmi-privacy-personal-data

[5] See section 2.1.5 on software requirements https://developer.apple.com/app-store/review/guidelines/#software-requirements

[6] See section 2.3 of « iRiS: Vetting Private API Abuse in iOS Applications » by Zhui Deng, Brendan Saltaformaggio, Xiangyu Zhang, Dongyan Xu

[7] « Uber App Has Access to Special iPhone Functions That Can Record Your Screen » https://www.inc.com/business-insider/uber-apple-iphone-features-app-software.html

[8] DG-Comp opened an investigation on access restrictions to NFC features: https://ec.europa.eu/commission/presscorner/detail/en/ip_20_1075

[9] « Two reasons why Singapore is sticking with TraceTogether’s protocol » https://www.tech.gov.sg/media/technews/two-reasons-why-singapore-sticking-with-tracetogether-protocol

[10] Apple’s Exposure Notification API requires iOS 13 whereas the French (StopCovid) and the Singaporean (TraceTogether) apps respectively works with iOS 11 and iOS 10 https://gs.statcounter.com/ios-version-market-share/mobile-tablet/france

[11] « iOS a ‘major hurdle’ to contact tracing app » https://www.innovationaus.com/ios-a-major-hurdle-to-contact-tracing-app/

[12] https://twitter.com/je5perl/status/1248230776287776769

[13] « Because performing many Bluetooth-related tasks require the active use of an iOS device’s onboard radio—and, in turn, radio usage has an adverse effect on an iOS device’s battery life—try to minimize the amount of work you do in the background. » https://developer.apple.com/library/archive/documentation/NetworkingInternetWeb/Conceptual/CoreBluetooth_concepts/CoreBluetoothBackgroundProcessingForIOSApps/PerformingTasksWhileYourAppIsInTheBackground.html

[14] « WWDC 2017: Because while your application is running in the background, the system may terminate it if it needs to free up more memory for the foreground application. » (https://asciiwwdc.com/2013/sessions/703 )

[15] « Apple Dominates App Store Search Results, Thwarting Competitors » https://www.wsj.com/articles/apple-dominates-app-store-search-results-thwarting-competitors-11563897221

[16] « The Clever Cryptography Behind Apple’s ‘Find My’ Feature » https://www.wired.com/story/apple-find-my-cryptography-bluetooth/

Who pays for my data processing?

Last November I submitted a data access request to Google. One month later (which is the maximum delay to provide a first answer), Google informed me that my request would take some time to process. Two months later, Google offered another answer… and told me that they would not be able to respond to my request.

The main objective of my data request, which is available below, is to obtain information about how I am targeted on Google services. Ideally, I would like to know which advertisers are using Google services to target me, how they are targeting me and eventually how much it costs them. Indeed, I would like to put some “pressure” on the advertisers that rely on platforms which are not compliant with EU regulation. The best way to do so is to stop buying from these companies, hence the need to list them.

The reason I am targeting advertisers is that they are the only stakeholders who can trigger an immediate reaction from platforms like Google and Facebook. At the end of the day, they are the ones paying the platforms, and I wanted to know who is paying for the processing of my data.

Three months to get a PDF

My first observation concerns when Google decided to respond: they always use the full extent of the delay allowed by the GDPR (see Article 12-3). While I could have understood Google taking three months to process my request and retrieve the data, in fact they came back almost empty-handed.

The only element Google provided me is the list of advertisers that targeted me through “Customer Match” (i.e. the advertisers that uploaded my contact info to target me on Google). As far as I know, there is no simple way to get this data from Google, and indeed Google did not provide me a nice JSON file as it does on Takeout, but a PDF listing six advertisers. Funny enough, the document is marked “Google Confidential and Proprietary”.

When it comes to ad transparency, Google fares poorly compared to Facebook. The ad section of Facebook tells you a lot about the ads you have seen and how you have been targeted. The information may not be exhaustive, as some ads are still missing, but at least it is quite simple to know who has targeted you using your contact info, whereas Google just offers a PDF file without much detail.

Nothing on the “Reminder ads”

Google does not offer any answer to the rest of my request. According to the response, the list of advertisers that are retargeting me on Google services is available on the “Ads Settings” page. I see no mention of such retargeting on this page. Actually, the page only contains a list of interests, almost all of them inferred from my use of Google services. The remainder of the answer explains that Google “does not maintain a database of every advertisement seen by a specific user. Account holders can visit My Activity to see the websites and apps they’ve visited on which Google ads were shown”.

So Google does not say that they don’t have the data I asked for; they simply argue that it is not sitting in a database. I guess they know, for each ad, whether I have seen it or not, but it would be very resource-consuming to go through all ads and check whether my “cookie” saw each one. While I understand that they did not search exhaustively for the ads I have seen, they could have answered sooner; it was not necessary to exhaust the legal delay.

Any idea?

Google did not provide me the data I was looking for, but there could be other ways to obtain the list of advertisers targeting me. Technically, it is possible to write a browser extension that would collect all the ads that I “see” on Google and compile them into a nice page (like AdAnalyst does for Facebook).

I will also make more precise requests. For instance, I could ask, for each service separately, which advertisers targeted me. I will also ask which cookie IDs were associated with my account, and then ask which advertisers targeted these cookies.

Featured image by Jay Goldman
https://creativecommons.org/licenses/by-nc-sa/2.0/

More on Chrome updates and headers

I am not the only one who has been unpleasantly surprised by the way Chrome now handles logins on Google services (more on Techmeme). This new feature was unexpected; it was also not announced in Google’s post about the Chrome update, there is no simple opt-out, it makes the Chrome Privacy Policy outdated, and it confusingly creates different user experiences on Android and on desktop. Indeed, for your browsing activity to be linked to your Google account, you must sign in to the browser and enable browser synchronization. On desktop, signing in to the browser is almost mandatory but synchronization is off by default; on Android, sign-in is off by default, but as soon as you sign in, synchronization is enabled. Things are about to get even more complicated as Google introduces a new feature that sends data to Google even when synchronization is off.
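The linking rule buried in that paragraph can be made explicit with a tiny sketch (my reading of the behavior described above, which may change as Chrome evolves):

```python
def browsing_linked_to_account(signed_in: bool, sync_enabled: bool) -> bool:
    """Per the behavior described above: browsing activity is linked to a
    Google account only when the user is signed in to the browser AND
    synchronization is enabled."""
    return signed_in and sync_enabled


# Desktop default described above: signed in (almost forced) but sync off
print(browsing_linked_to_account(signed_in=True, sync_enabled=False))  # False
# Android: signing in turns sync on, so signing in implies linking
print(browsing_linked_to_account(signed_in=True, sync_enabled=True))   # True
```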

The only document that announced all these changes is the Chrome privacy whitepaper. I doubt many people read it, but it’s a treasure trove of information about Chrome features and their interactions with Google services.

Google service integration

As The Verge already explained, Chrome is turning into the new IE6 (in case you wonder, that’s not a compliment). Not only is Google making some services run faster on Chrome, the browser also sends information exclusively to Google. Indeed, Google developed several headers that are only sent to Google, to make the integration with its services even smoother.

You can see headers in Chrome using the developer tools. To see all headers, you’ll have to copy them as HAR (“Copy all as HAR”) and paste the result into a text editor (thanks to Gunes Acar for the tip); otherwise you’ll see only provisional headers.

The x-client-data header

This is probably the most problematic header, and I did not see it mentioned anywhere other than in the whitepaper. Most users are not aware of it, but this header is sent with every request to Google services (and only Google services) to support A/B testing. Google services include most Google domains, including DoubleClick. Even when Google is a third party, the header is sent. Because it’s a header and not a cookie, it is sent even when you block cookies. The only way not to send it is to turn on “Incognito Mode”.
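To get a feel for what the header carries, here is a hedged sketch of a decoder. It assumes (per public analyses of the Chromium source) that the value is a base64-encoded protobuf whose repeated field 1 lists the active variation (experiment) IDs as varints; the sample value is synthetic, not a real Chrome header.

```javascript
// Hedged sketch: decode an x-client-data value, assuming it is a
// base64-encoded protobuf whose repeated field 1 holds variation IDs.
function decodeXClientData(b64) {
  const bytes = Buffer.from(b64, 'base64');
  const ids = [];
  let i = 0;
  while (i < bytes.length) {
    const tag = bytes[i++];
    if ((tag & 0x07) !== 0) break; // only varint fields are expected here
    // Read one varint (7 bits per byte, high bit = continuation).
    let value = 0, shift = 0, b;
    do {
      b = bytes[i++];
      value += (b & 0x7f) * 2 ** shift;
      shift += 7;
    } while (b & 0x80);
    ids.push(value);
  }
  return ids;
}

// Synthetic example (not a real Chrome value): field 1 = 300
console.log(decodeXClientData('CKwC')); // → [ 300 ]
```

Each ID identifies a field-trial group, so a long-lived combination of IDs could in principle make a browser more distinctive.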

If you have disabled usage statistics and crash reports, this header might not be usable to track you without additional info. But the whitepaper is unclear about how this header is updated to describe “server-side experiments”, so I wouldn’t bet on it.

So not only may this header have privacy implications, it also makes the browser non-neutral, as it gives more data to Google services.

See https://www.google.com/chrome/privacy/whitepaper.html#variations for more details

Consistency/Connected headers

As mentioned in the Chrome whitepaper, this header is sent to some Google servers. Most of the time, it does not contain any identifier and behaves like a low-entropy cookie.

Only from time to time does it include your Google account number. Looking at the Chromium source, we get more details about how this header is used: it redirects some actions to the browser’s UI instead of the page content.

It may not have privacy implications yet, but it means that Chrome benefits from an integration with Google services that other browsers cannot have.

New settings

The latest update mentions two features that are only tested on a subset of users. These features are more or less opt-in (see below) and send data to Google when you’re signed in, even when synchronization is off. These data are used to make personalized suggestions and to improve the “general browsing experience”.

 

Actually, there is no need to be in the test group. You can use this feature in Chrome Canary. Once you’ve installed the browser, just sign in and you’ll see this “opt-in” dialogue. If you accept or click on “Settings”, you’ll have access to the “Sync and Google services” section of the settings page.

Notice that if you click on “Settings”, all settings will be on by default, including sync.

Conclusion

By “forcing” users to sign in to Chrome and by using custom headers, Google is less and less dependent on third-party cookies. I would not be surprised if Chrome started to block third-party cookies; it may actually be in Google’s financial interest to do so.

Finding a balance between access to info and privacy

Disclaimer: Until September 2017 I was a technologist at the CNIL, the French DPA, working on the technical aspects of issues like the right to be forgotten.

On June 28, a decision of the European Court of Human Rights reanimated the debate about the Right To Be Forgotten. The court rejected a request to delete some content from a website, considering that the right of access to information prevailed. The court made an interesting distinction with the Right To Be Forgotten as applied to search engines: search engines and publishers have different purposes, so the ECtHR refers to the ECJ decision for search engines.

Because the decision on the Right To Be Forgotten is sometimes misunderstood, I will try to explain why, in my opinion, the Right To Be Forgotten as framed by the ECJ should not interfere with access to information.

 

The right to be forgotten and public’s right to access lawful information

 

In its decision about the right to be forgotten, the ECJ anticipated the risk that this right could limit access to information and asked for a balance between the personal right to privacy and the public right to access information. In the ECJ’s words, the right to be forgotten should not apply “if it appeared, for particular reasons, such as the role played by the data subject in public life, that the interference with his fundamental rights is justified by the preponderant interest of the general public in having, on account of its inclusion in the list of results, access to the information in question.”

 

Sure, finding the balance is not always easy, but the CNIL and Google tend to agree on cases where the right to be forgotten should not apply. As a matter of fact, in the first case that Google mentions in its post, Google is not the defendant: the CNIL is. Indeed, the case was rejected by Google and the persons concerned sent a complaint to the CNIL, but the CNIL agreed with Google that delisting would have gone against the interest of the general public in having access to information.

So while many fear that an extended right to be delisted will impede access to information, this fear is unfounded. The ECJ decision clearly strikes a balance between the right to be delisted and the right to access information. The objective of the decision is not to force information to be deleted; it is to prevent unwanted – and often out-of-context – search results from popping up when you Google someone. The results are still available on Google and will appear as usual if you search for anything but the name for which they have been delisted. As a matter of fact, the information is not censored: it remains available on the website publishing it, but it won’t appear out of the blue when you’re just googling someone’s name. This is somewhat coherent with the ECtHR decision: publishers are fully covered by freedom of expression, but a search engine’s main goal is not to publish information but to compile information about someone (paraphrasing p. 97 of the ECtHR decision).

The case the ECtHR had to judge is an edge case, and it seems that when it comes to the RTBF applied to search engines, most cases are far simpler. If you want to understand the reality of what is at stake here, have a look at the data provided by Google: the top 10 websites for which Google delisted results are social networks. Most of the complaints concern people asking to have comments, pictures and posts removed from social networks. In some cases the content is hosted by Google itself (e.g. Google Plus, YouTube), so individuals have no other remedy than asking Google to delist its own content when it’s inappropriate. Would platforms prefer to directly suppress such content?

Most impacted websites according to Google.

 

Google shall no longer be the only place where we look for information

 

Google no longer opposes the Right to Privacy to Freedom of Speech (as it used to) but to the public’s right to access information. Despite Google’s goal to “organize the world’s information”, Google should no longer be the only place where you look for information. Search engine algorithms are not meant to find true information, but to find information provided by authoritative sources. That’s why algorithm ranking failures have been observed repeatedly over the years.

Wall Street Journal’s Jack Nicas listed some failures of the featured snippets, which are supposed to be the most trusted results provided by Google. Just a year ago, Google was displeased by top results provided by its algorithm and had to deploy patches hastily. While the company should be lauded for its reactivity, it should be noted that this reaction was triggered by journalists discussing highly visible problems. Obviously, some reaction had to be triggered, but addressing only the cases reported by the press leaves the main problem pending. In fact, Google picks and chooses the search results that it believes are wrong; Google then no longer considers page authority to decide what the most relevant results are, it relies on media attention. It’s not the first time that Google has had to update its algorithm hastily. In 2013, reporters highlighted that mugshot websites were highly ranked by Google and were extorting ex-convicts: individuals had to pay for their records not to appear on top of Google search results. A right to be delisted would clearly have been useful there, but Google short-circuited the regulators and updated its algorithm to “fix” the problem shortly after the NYT reported the matter. Last November, Eric Schmidt declared that Google would “derank” Sputnik and Russia Today (two Russian websites) – another patch developed promptly to postpone the problem and hide it in the next result pages. Individually, these decisions seem laudable. However, the big picture shows Google judging what is right and what is wrong, and hiding the “wrong” results by ranking them in such a way that they cannot be discovered without clicking “next” a couple of additional times.

 

Google is not the Internet (and it’s not the Web either)

If you worried that articles with very low ranks will not be seen, you’re right: Google never returns more than 1,000 search results per query. Practically, if people rarely see the 11th result, they never see the 1,001st: results beyond the first 1,000 are as good as delisted. So among the millions of results that Google found (you can see the exact number just below the search bar), Google will only show you 1,000. Therefore, just because information does not surface on Google does not mean you should stop searching for it.

Google shows only the top 1000 search results ranked by their algorithm.

“Google is not the internet. The vast majority of internet websites are hosted by and operated through service providers other than Google. The entities with the technical ability to remove websites or content from the internet altogether are the websites’ owners, operators, registrars, and hosts—not Google. Removing a website link from the Google search index neither prevents public access to the website, nor removes the website from the internet at large. Even if a website link does not show up in Google’s search results, anyone can still access a live website via other means, including by entering the website’s address in a web browser, finding the website through other search engines (such as Bing or Yahoo), or clicking on a link contained on a website (e.g., CNN.com), or in an email, social media post, or electronic advertisement.” These are not my words: they are the words of Google’s lawyers in the Equustek case (see https://assets.documentcloud.org/documents/3900043/Google-v-Equustek-Complaint.pdf, 15 & 16).

The point is that there are other sources of information: other search engines, either competitors or ones integrated into news websites. If you’re really looking for public information about someone, you should search for their name on a newspaper website or on Wikipedia. A Google search may not return the information you’re looking for, and is likely to return out-of-context information.

Wanted: A right to be forgotten

By merely patching PageRank issues, Google postponed the big debate that the ECJ is now forcing it to have. Many individuals have problems with search results about them appearing on Google but cannot expect a miraculous algorithm update to solve them. These people have to rely on the “right to be delisted” to remove offensive, defamatory or outdated – and often untrue – search results. The right to be delisted highlights the need for a balance between privacy and the right to simply discover information, while not questioning access to information on the web. Beyond the big cases reported by journalists and civil society, many individuals have more private problems: defamatory search result pages, revenge porn, old comments they wrote, pictures they should not have shared. There are not enough journalists and civil society associations to cover all these cases, and most of the people concerned are not willing to have their complaints exposed by journalists.

@vtoubiana

 

Credits: Featured Image  is “Outside the European Court of Justice” by katarina_dzurekova (Creative Commons)

How Google is tracking Safari users on third party sites

A couple of weeks ago, Google stopped redirecting users from google.com to localized versions of the search engine. This rather innocuous change is likely to affect the way Safari’s anti-tracking protection copes with Google cookies. Indeed, Safari now deletes the cookies of sites you have not interacted with over the last 24 hours [1]. If you type google.com and are then redirected to google.fr, you don’t actually interact with google.com, so Safari does not give Google a 24-hour permission to track you on the search engine’s other domains. That won’t happen if Google stops redirecting users and just leaves them on google.com, where they will interact with the search bar and other elements.

Why this is ironic

This is not the first change Google made to the way it handles localized domains. Google made an initial announcement, a couple of months ago, to tell users that localized domains would not be that relevant anymore [2]. Now Google seems to be deprecating local URLs to put everything under the google.com domain. This is quite ironic because, up to last year, Google was arguing in a French Right to Be Forgotten case that google.fr and google.com provided significantly different services [3]. The French court did not follow this reasoning. At the time, Google argued that only 3% of Europeans were searching on google.com while the remaining 97% were using localized versions [4]. Today it seems that the split is actually 60–40 [5].

How Google is tracking Safari

Previously, Google’s advertising cookies on third-party websites were only set from doubleclick.net. Users hardly interact with doubleclick.net, so it was likely that Safari would block DoubleClick cookies. Since September, Google has also started to set an advertising cookie from its google.com domain. You can track the changes to Google’s cookie explanation page via the Web Archive, and it seems that Google added the description of the ANID cookie in September [6]. However, you may not have noticed this new cookie in your browser. I did a couple of tests and, oddly enough, Google only sets this cookie for the Safari browser. Using the Chrome and Edge developer tools to emulate different mobile browsers, all iOS devices had the ANID cookie set; none of the other devices received it. Hence, Google is giving special treatment to Safari users, similar to what Google did in 2012 when it bypassed Safari’s tracking protection [7].

This may help other advertisers to track you

That being said, Google is only following Facebook here. Both are big first parties that – unlike most third-party websites – were expected not to be impacted by Safari’s measure. It does not seem illegitimate for Google to take the same stance as its major competitor in the online advertising market. Yet the ramifications of Google’s move are more critical.

Unlike Facebook, Google is a major Web ad-exchange platform. This means that Google hosts auctions where buyers get an opportunity to show an ad in your browser and to synchronize their cookies with Google’s. So if Google is able to bypass Safari’s tracking protection and keep cookies in users’ browsers, it’s likely that this will also benefit all the ad-auction participants. Through Google’s cookies, third parties will be able to recognize users even after their own cookies have been deleted by Safari. The stability of the Google cookie will technically allow third parties to track browsers for more than 24 hours.
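To make the cookie-synchronization step concrete, here is a hedged sketch of the redirect an exchange typically fires toward a bidder’s match endpoint; the endpoint and parameter names are illustrative, not an actual Google API.

```javascript
// Hedged sketch of a cookie-sync redirect; all names and URLs are
// illustrative, not a real exchange API.
function buildSyncRedirect(exchangeUserId, bidderMatchUrl) {
  // The exchange redirects the browser to the bidder's "match"
  // endpoint, passing its own (stable) user ID in the query string.
  const url = new URL(bidderMatchUrl);
  url.searchParams.set('partner_uid', exchangeUserId);
  return url.toString();
}

// The bidder stores the pair (its own cookie ID, partner_uid). Because
// the exchange's first-party cookie survives Safari's 24-hour purge,
// the bidder can re-identify the browser even after its own cookie is gone.
console.log(buildSyncRedirect('abc123', 'https://bidder.example/match'));
// → https://bidder.example/match?partner_uid=abc123
```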

In some sense, this is worse than before, when Safari blocked all third-party cookies and Google only served ads and hosted auctions from the doubleclick.net domain. If Google manages to leverage this advantage, it could be a significant blow to competing ad-exchange marketplaces, which cannot track Safari users for more than 24 hours.

 

Vincent Toubiana (@vtoubiana)

[1] “Intelligent Tracking Prevention”, https://webkit.org/blog/7675/intelligent-tracking-prevention/

[2] “Making search results more local and relevant” https://www.blog.google/products/search/making-search-results-more-local-and-relevant/

[3] “Au Conseil d’État, la portée territoriale du droit à l’oubli sur Google”, https://www.nextinpact.com/news/104691-au-conseil-detat-portee-territoriale-droit-a-loubli-sur-google.htm

[4]”Google says non to French demand to expand right to be forgotten worldwide “, https://www.theguardian.com/technology/2015/jul/30/google-rejects-france-expand-right-to-be-forgotten-worldwide

[5] “The end of google.{your country}?” ,https://whotracks.me/blog/google_domains.html

[6] “Types of cookies used by Google”, https://web.archive.org/web/20170909225414/http://www.google.com/policies/technologies/types/

[7] “Google Busted With Hand in Safari-Browser Cookie Jar”,  https://www.wired.com/2012/02/google-safari-browser-cookie/

Cross checking IAB’s numbers

It looks like, on mobile, the numbers IAB’s reports are based on mostly reflect the dynamics of Google’s and Facebook’s advertising revenues, not those of average app developers.

 

Like many people interested in privacy, I keep an eye on #ePrivacy on Twitter. This hashtag is used both by people advocating for more privacy (consent as the main legal basis, stronger default settings, no cookie walls) and by those pushing for legitimate interest, cookie walls, no default settings, and so on. Both sides have arguments, and while I’m obviously more receptive to the former group, I try to look at the other side’s arguments as well. During the last few weeks, IAB Europe representatives quoted several reports that, according to them, show the importance of ‘data-driven’ advertising in Europe (I put ‘data-driven’ in quotes because it in fact refers to almost any type of advertising).

I wanted to analyze the numbers behind the reports, especially the report called “The economic contribution of digital advertising in Europe”. I tried a couple of times to get more info about some of these numbers but have not received a response so far. So I had a closer look at the document. It is an update of a previous 2015 IHS report. Interestingly, the report’s definition of publisher is very broad:

”Publishers, or media outlets, who serve audiences by providing them with content, and serve advertisers by supplying them with audiences. In this report, ‘publisher’ includes any type of company where audience attention and content meet.”

So “publisher” includes search engines (Google, Yahoo, Bing) and social networks (Facebook, Twitter and LinkedIn). It means that the figures showing publisher revenues are likely to include the revenues of these companies, which are far higher than average publisher revenues.

The mobile ecosystem

While the figures are not false, they certainly do not correspond to the revenues of average European publishers or app developers. For instance, when reading the figure on mobile advertising revenue, the reader may not be aware that Facebook’s advertising revenue in Europe is €5.4bn (see Facebook’s 2016 earnings) and that, during the fourth quarter of 2016, 84% of its revenues came from mobile. This means that Facebook’s mobile advertising revenue in Europe is about €4.5bn, more than a third of the mobile advertising revenue estimated by the IAB.

Facebook advertising revenues (source: https://s21.q4cdn.com/399680738/files/doc_presentations/FB-Q4%2716-Earnings-Slides.pdf)

 

Google’s share is even bigger. So it is likely that if you remove Facebook’s and Google’s revenues from the figure, you will be left with at most a third of the €11bn made in 2016, and mobile advertising will no longer be the main source of revenue on mobile. As far as I know, there are no such dominant players in the app ecosystem, so app revenues made of paid apps and in-app purchases should be more equally spread.

What’s behind these numbers

I should add that it’s not simple to know what these numbers mean from an app developer’s point of view, as the report just compares global revenues without showing the cut that developers receive. Considering the number of intermediaries, the global market revenue tells little about the revenue received by the developer.

Finally, it’s hard to get more details on these numbers and to know exactly what they reflect, as there are several revenue sources for app developers. For instance, a 2014 GigaOM report shows that in-app purchase and paid-app revenues combined made up €4.5bn, whereas the IAB report shows that, during the same year, these revenue sources were neck and neck with mobile advertising.

Revenue charts from different sources show significantly different revenues in 2013.

Final words

These reports are quite opaque, so it’s complicated to know where these numbers come from. I’m not arguing that IAB Europe’s numbers are wrong, but they may be misunderstood, as they probably do not reflect the reality of average app developers, who likely get more revenue from paid apps and in-app purchases than from advertising.

The missing clauses in Google’s “Customer Match”

In September, Google announced “Customer Match”, a new tool for advertisers to target their existing customers using their email addresses. “Customer Match” is almost like Facebook’s “Custom Audiences”, but Google and Facebook seem engaged in “a privacy race to the bottom”, and Google may have taken the lead.

Targeting email addresses

Advertisers aim at targeting both prospects and existing customers. While remarketing offers them the opportunity to target potential buyers, advertisers were so far not able to differentiate between their existing customers and new prospects. They also lacked the ability to target their “loyal” clients (i.e. those who have subscribed to a loyalty card), because there is no link between the cookie IDs assigned to their browsers by ad networks and their loyalty card number, or even their online customer account. “Custom Audiences” and “Customer Match” (hereafter “customer targeting”) create a bridge between the email addresses used to create, say, a Best Buy account or a CVS loyalty card and Google and Facebook accounts.

Via customer targeting, advertisers will be able to pull the information they gathered about your shopping habits and leverage it to target you on Facebook and on Google. The advertiser won’t directly send the ads they want to show you attached to your address. Instead, they will create “audiences” by building groups of their customers’ email addresses. They will send those email addresses, hashed, to Facebook (or Google), which will check whether those hashed addresses match those of registered users.

Technically, Facebook does not see the email address but just the hash. So if you’re not in their user database, they will not be able to know that you’re a Best Buy customer. That being said, this technical guarantee may not be sufficient considering the computational resources of giants like Google and Facebook, which could generate many hashes to brute-force the hashed email addresses and retrieve the lists of customers. In fact, in another context Google seems to admit this, and requires that Google Analytics users do not send hashed identifiers like email addresses or phone numbers.

Therefore, the only guarantees are contractual: the commitments that Google and Facebook make when they receive email addresses (or phone numbers). Facebook and Google commit to not retrieving the email addresses of people who are not registered with their services. Similarly, their contractual clauses prevent them from keeping those lists of hashed identifiers for more than a week (which would remain largely enough for them to break most of them).

Facebook ToS

Facebook’s Terms of Service are quite constraining for Facebook itself, as they more or less prohibit Facebook from doing anything with the hashed email addresses other than using them to help an advertiser reach its audience. Therefore, Facebook cannot add information to its users’ profiles; in fact, Facebook specifically forbids appending “Custom Audience” data to users’ profiles. Furthermore, Facebook won’t let an advertiser target the audience of another advertiser: for instance, Target should not be able to target Best Buy customers. Facebook adopts a data-processor position with respect to Custom Audiences, the advertiser being the data controller.

Excerpt of the “Custom Audience ToS” https://www.facebook.com/ads/manage/customaudiences/tos.php

 

Google Customer Match

Google took another approach with its service: it did not include clauses preventing it from appending “Customer Match” data to users’ profiles. The restrictions only cover the lists of email addresses; there is no restriction on the use of the lists of matched profiles, which can therefore be used by Google.

“Customer Match” conditions from https://support.google.com/adwords/answer/6276125

 

In fact, Google implicitly admitted that these data would be appended to user profiles when it modified its Privacy Policy in August to include data obtained from partners in Google Account data. While the change went unnoticed at the time, it became clearly more critical after “Customer Match” was announced.

Change made to Google’s privacy policy on August 19th

 

Consequences of Google posture

Google’s decision to include “Customer Match” data in its user accounts will impact users’ privacy and also competition between advertisers.

  • Since the data will be included in the account, Google will have a more comprehensive view of its users, which is a big step toward merging offline and online data (also known as data onboarding). This may have a significant negative impact, as it puts Google at the center of all these data flows… until Facebook announces its riposte.
  • On the upside, this could be beneficial for transparency, because users could be made aware of the advertisers targeting them if Google were to show these data on its privacy dashboards (that’s a big if).
  • However, because Google is a data controller with respect to “Customer Match”, advertisers may be reluctant to share information about their customers, knowing that it could potentially be reused by competitors or by Google itself. Not only could Google share these data with other advertisers – allowing competitors to target each other’s audiences and stir up demand, and thus prices – but Google could also be tempted to use the data for its own direct benefit.

Acknowledgement

Thanks to Armand Heslot for providing feedback on a draft.

Implementing cookie consent with “Content Security Policy”

In this post I briefly explain how “Content Security Policy” could be used to enforce the cookie consent regulation by blocking third-party content.

Cookie Consent

The EU cookie regulation requires website editors to obtain the informed consent of visitors before setting cookies. Therefore, a website should first check that it has consent before dropping its own (unnecessary) cookies, and so should the third parties called by the website. While it’s fairly simple for a first party to check that it has obtained consent (e.g. by storing the consent in a cookie), third parties are in a different situation because they cannot read first-party cookies to deduce whether consent has been granted.
Therefore, the responsibility sort of shifts to the first parties, which are in a position to inform users and obtain consent; yet they have to prevent third parties from setting cookies as long as consent has not been obtained. Tag managers are elegant solutions that can be deployed to do that. A less elegant solution is to rely on “Content Security Policy” to prevent external, cookie-setting resources from being loaded by the browser.

Overview of Content Security Policy

A “Content Security Policy” is a “declarative policy that lets the authors (or server administrators) of a web application inform the client about the sources from which the application expects to load resources”. This policy can be viewed as a whitelist of resources that the browser may load when requesting a page. Content security policies can be conveyed in two forms: either through an HTTP header or in an http-equiv meta tag in the head of the HTML document.
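Concretely, the two delivery forms look like this (a minimal illustration; the policy value is just an example):

```html
<!-- 1. As an HTTP response header:
     Content-Security-Policy: script-src 'self' -->

<!-- 2. As a meta tag in the document head: -->
<meta http-equiv="Content-Security-Policy" content="script-src 'self'">
```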

Implementation of the policy

To comply with the cookie consent regulation, a website may simply use a Content Security Policy to block any third party from loading content (and subsequently setting cookies) as long as consent has not been granted. Notice that this solution is not specific to cookies: it prevents all types of resources from being loaded, thus effectively preventing all types of fingerprinting by third parties.
A quick “cookie consent” implementation is to check, when a GET request is received, whether the “consent” cookie is set, and to adapt the Content-Security-Policy accordingly: if consent has not been granted, block all third parties; otherwise, serve the usual policy.
The easiest implementation is to use JavaScript to insert the http-equiv meta tag into the HTML (although this is not recommended); website editors can just add the following JavaScript tag to their pages and that should do the trick:

if (document.cookie.indexOf('hasConsent') < 0) {
  // No consent yet: restrict scripts and images to this site and its subdomains.
  var hostname = window.location.hostname;
  // Strip a leading "www." (4 characters) so *.example.com also covers www.example.com.
  if (hostname.indexOf("www.") === 0) hostname = hostname.substring(4);
  var meta = document.createElement('meta');
  meta.httpEquiv = "Content-Security-Policy";
  meta.content = "script-src 'self' 'unsafe-inline' *." + hostname + "; img-src *." + hostname;
  document.getElementsByTagName('head')[0].appendChild(meta);
}

A slightly more complicated solution is to set the HTTP header. The code is fairly similar, but the complexity depends on the type of server you’re running. If you’re using PHP, you could do it like this:

if (!isset($_COOKIE["hasConsent"])) {
  // Note: header() must be called before any output is sent to the client.
  $allowed_hosts = "*.unsearcher.org";
  header("Content-Security-Policy: script-src 'self' 'unsafe-inline' " . $allowed_hosts . "; img-src 'self' " . $allowed_hosts);
}

Browser Support

Content Security Policy is still a working draft, so the feature is not supported equally by all browsers. As far as I can tell, Chrome and Safari implement all the required features, including support for the http-equiv meta tag. Firefox enforces policies set through the header, but there is currently no support for the http-equiv tag. Finally, Internet Explorer offers only very limited support of CSP, through the iframe sandbox property.

Testing

I’ve used the Burp proxy to preview what websites would look like with content security policies blocking third parties; here are the results:

 

Content-Security-Policy: script-src 'self' 'unsafe-inline' *.lemonde.fr *.lemde.fr; img-src *.lemonde.fr *.lemde.fr

Content-Security-Policy: script-src 'self' 'unsafe-inline' *.nytimes.com *.nyt.com; img-src *.nytimes.com *.nyt.com

Content-Security-Policy: script-src 'self' 'unsafe-inline' *.slashdot.org *.fsdn.com; img-src *.slashdot.org *.fsdn.com

 

Conclusion

This solution is far from perfect, the main reason being that it is not supported by all browsers. Yet it provides a simple way for website editors to block third-party resources until consent is obtained. Such a solution is complementary to the tags provided on the CNIL’s website, which can be used to obtain consent before setting Google Analytics first-party cookies.