Today’s web advertising ecosystem relies heavily on continuous data collection and tracking, which allows advertising companies and data brokers to profit from collecting vast amounts of data associated with users. In May 2018, the General Data Protection Regulation (GDPR) changed the rules on consent, shaking the practices of the tracking and advertisement industry. The ePrivacy Directive, amended in 2009 (ePD, also known as the “cookie law”; the upgrade of the ePD into a regulation is currently under discussion), made it mandatory to collect the user’s consent before any access to or storage of non-mandatory data (data not strictly necessary for the service requested by the user). On websites, consent is usually requested through cookie banners, or cookie notices, which inform the user of data collection and should provide a meaningful choice on whether to accept or reject such collection. Website visitors in the European Union encounter such banners on many of the websites they visit today.
Various research studies have looked into the detection and measurement of web tracking technologies that silently collect data without the user’s explicit consent [42, 38, 17, 1, 37, 34, 40]. Several recent works [35, 8, 48, 44] have measured the impact of the GDPR on the web tracking and advertising ecosystem. Researchers observed a 22% drop in the number of third-party cookies between the periods before and after the GDPR, but only a 2% drop in third-party content. Other works recently measured the prevalence of cookie banners and showed that the number of banners increased over time after the GDPR. Legal scholars, authorities and computer science researchers independently noticed that some banners do not allow users to refuse data collection, and raised this issue in various studies [8, 33, 2, 50]. Several recent works [47, 48, 44] measured the impact of choices set in cookie banners on tracking: after accepting and after rejecting the consent proposed in a cookie banner, researchers evaluated the number of cookies set in the browser and the number of third-party tracking requests across websites. The latest work evaluated whether the design of cookie banners affects how users interact with them.
Although many research efforts took place after the GDPR to detect and analyze cookie banners and their impact on tracking technologies and on the users, no study has analyzed what actually happens behind the user interface of cookie banners yet. It is unclear how to meaningfully compare the interface of the banners shown to the users to the actual consent that banners store and transmit to the third parties present on the website. Our work is motivated by the following questions:
Do banners actually respect the user’s choice made in the user interface? Do banners silently register a positive consent even if the user has not made their choice? Do they nudge the user to accept everything by pre-selecting a positive consent?
Answering such questions, and ensuring the proper functioning and legal compliance of a cookie banner, is usually left to the website publisher and remains completely opaque to the website visitor.
In reaction to the GDPR, the European branch of the Interactive Advertising Bureau (IAB Europe), an advertising business organization, produced the Transparency and Consent Framework (TCF)  to structure the practices of actors of the tracking and advertisement industry regarding consent collection. Notably, they introduced the notion of Consent Management Providers (CMPs) – actors in charge of collecting consent from the end-user, and redistributing this consent to advertisers. Figure 1 shows a typical example of a cookie banner implemented by a CMP that allows the user to agree or disagree with five predefined purposes of data processing.
Contributions. Thanks to the open specification of the TCF, we perform the first systematic comparison of the consent chosen by the users and the consent stored by the CMPs, which is further transmitted to third-party advertisers present on a website. With our analysis of consent, we are able to measure both the GDPR and the ePD compliance of cookie banners implemented in the TCF. Our main contributions are:
We design an automatic method to detect the presence of a cookie banner developed by a Consent Management Provider (CMP) (Section IV-B). We automatically detect 1 426 websites with such banners.
We develop and use a methodology to intercept the consent stored in the browser (Section IV-C). By analyzing the content of consent, we bring transparency by assigning the responsibility for each consent to the companies behind CMPs and to publishers.
By collaborating with legal scholars, we thoroughly analyze the GDPR, the ePrivacy Directive and other legal texts to identify four legal violations specific to cookie banners: Consent stored before choice, No way to opt out, Pre-selected choices and Non-respect of choice (Section III).
We develop a method to evaluate regulatory compliance of websites (Section IV-E). We quantify the identified violations on 1 426 websites through automatic crawls, semi-automatic crawls and manual detection (Section VII-A). By analyzing cookie banners’ design on a subset of 560 websites (from countries whose language the authors speak), we find that 236 (47%) websites nudge the users towards acceptance by pre-selecting options, while 38 (7%) websites do not provide any means to refuse consent. By analyzing the consent stored in the browser, we automatically detect 175 out of 1 426 (12%) websites that store a positive consent before the user has made any choice in the cookie banner, while 39 out of 560 (8%) websites store an all-accepting consent even if the user has explicitly opted out in the cookie banner interface. In total, we find at least one violation in 305 out of 560 websites (54%).
We measure the problem of escalation of shared consent between CMPs (Section VII-B). The TCF allows different CMPs and publishers to rely on each other’s consent, set in a shared cookie. We observe that 3 websites store a positive consent before user action in the shared cookie, while 20 websites store a positive consent in a shared cookie even if the user has explicitly opted out. Such invalid consent can be reused by any CMP and publisher and therefore escalates non-compliance to other websites.
We quantify third-party requests that transmit consent and that belong to known third-party tracking services (Section VIII). We observe that various third-parties receive consent with third-party requests, where the origin of consent does not necessarily match the CMP present on the website. Such consents are set before user action on 69 websites and despite user refusal on 38 websites. We observe that the number of third-party tracking requests increases both after positive and after negative consent.
To measure compliance, we have designed two tools. Cookinspect is a Selenium and Chromium-based crawler which automatically and semi-automatically visits websites, logs stored consent and intercepts transmission of consent to third parties. Cookie glasses  is a publicly available browser extension for Google Chrome and Firefox that allows users to detect a CMP that implements a TCF banner and see if their choice is correctly transmitted to advertisers by CMPs.
II IAB Europe’s Transparency and Consent Framework (TCF)
The third-party advertising and tracking ecosystem contains different actors. Publishers provide free websites to users and include third-party advertising content. Advertisers and trackers collect users’ data and display ads. Finally, users consume free content. With the arrival of the GDPR, it became evident that the different actors of this ecosystem were not equipped to properly collect and exchange user’s consent.
Figure 2 summarizes the updated interaction of CMPs with publishers, users and advertisers. To become a part of the TCF, each CMP and advertiser must register with IAB Europe. As a result, IAB Europe maintains: (1) a public list of CMPs that participate in the framework (alongside identifiers, called CMP IDs), and (2) a Global Vendor List (GVL), a public list of registered advertisers (called “vendors”). As of October 25th 2019, there are 117 CMPs in the CMP list and 550 advertisers in the GVL. While registering in the GVL, among other information, each advertiser must declare one or more of the five pre-defined purposes for which the data is collected and for which consent will be used. These are the purposes the user usually sees in the interface of the cookie banner (see Figure 1). Table IX in Appendix A shows the full list and description of all predefined purposes.
II-1 Consent String
The TCF defines a standard format for consent, called the consent string. This string contains the advertisers to which the user consented to have their data sent, the purposes of data processing to which the user consented, and the CMP identifier, along with other information. The format is a slightly modified base64 encoding of an array of values. IAB provides a script to decode this format.
Figure 4 shows a decoding of the consent string “BOX5uluOX5uluCLAAAENB6-AAAAizAAA”, obtained on telerama.fr. The cmpId is an identifier of a CMP from IAB Europe CMP list  responsible for storing a consent string; allowedPurposeIds are the identifiers of the five pre-defined TCF purposes of data processing; and allowedVendorIds are the identifiers of advertisers from the GVL . Note that any third party can identify the CMP that registered the consent string by comparing the cmpId field of the consent string to the public list of CMPs .
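To make the structure of the consent string concrete, the following is a minimal Python sketch of a decoder. It is not the IAB script mentioned above. The field widths (6-bit version, two 36-bit timestamps, 12-bit CMP ID, 24 purpose bits, and so on) are our reading of the TCF v1.1 consent string layout and should be treated as an assumption; the function name `decode_consent_string` and the helper `take` are hypothetical.

```python
# Base64url alphabet: every character of the consent string carries 6 bits.
B64URL = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def decode_consent_string(consent: str) -> dict:
    """Decode a TCF v1.1 consent string (field layout is an assumption)."""
    bits = "".join(format(B64URL.index(ch), "06b") for ch in consent.rstrip("="))
    pos = 0

    def take(width: int) -> int:
        """Read the next `width` bits as an unsigned integer."""
        nonlocal pos
        value = int(bits[pos:pos + width], 2)
        pos += width
        return value

    decoded = {
        "version": take(6),
        "created": take(36),       # deciseconds since the Unix epoch
        "lastUpdated": take(36),
        "cmpId": take(12),
        "cmpVersion": take(12),
        "consentScreen": take(6),
        # Two 6-bit letters, where 0 maps to "A".
        "consentLanguage": chr(65 + take(6)) + chr(65 + take(6)),
        "vendorListVersion": take(12),
    }
    # 24 purpose bits; bit i set means purpose i is accepted.
    decoded["allowedPurposeIds"] = [p for p in range(1, 25) if take(1)]
    max_vendor_id = take(16)
    decoded["maxVendorId"] = max_vendor_id

    if take(1) == 0:
        # Bit-field encoding: one bit per vendor ID.
        vendors = [v for v in range(1, max_vendor_id + 1) if take(1)]
    else:
        # Range encoding: a default value plus a list of exceptions.
        default_consent = bool(take(1))
        listed = set()
        for _ in range(take(12)):
            is_range = take(1)
            start = take(16)
            end = take(16) if is_range else start
            listed.update(range(start, end + 1))
        vendors = [v for v in range(1, max_vendor_id + 1)
                   if (v in listed) != default_consent]
    decoded["allowedVendorIds"] = vendors
    return decoded
```

Under these assumptions, decoding the string “BOX5uluOX5uluCLAAAENB6-AAAAizAAA” gives version 1, consent language “EN” and accepted purposes 1 to 5, with the vendors expressed as a range with default consent set.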
II-2 Consent Storage
The TCF does not impose any particular mechanism for storing user’s consent in the browser. It only suggests that CMPs use a “first-party service-specific cookie”, without further details .
As one way to implement consent storage, the TCF suggests that CMPs store a consent string in a cookie named “euconsent”, in a subdomain of consensu.org owned by IAB (we call it the “shared cookie” in the rest of the paper). The TCF mentions that when both a website-specific cookie and a shared cookie are defined, the website-specific cookie is used. Since each CMP registered in the TCF has access to its own subdomain (e.g. ad.consensu.org), it can host scripts in this subdomain to read and modify the shared cookie.
II-3 Consent Sharing
Once consent is stored in the user’s browser, any advertiser (or, more generally, any third party) present on a page can query a CMP to obtain the consent that was given by the user. In the TCF, consent sharing can be done via: standard APIs, a shared cookie, URL-based methods, and a non-standard method (safeFrames). We present each consent sharing mechanism in more detail below. Figure 3 graphically presents how each method obtains a consent string.
Standard APIs. The TCF specifies APIs that each CMP must implement – such API allows any third party advertiser present on a publisher website to verify whether a CMP has already stored a consent on a given website. In particular, each CMP in the framework must implement:
an iframe named “__cmpLocator”, with which iframes in a third-party position can communicate by sending a __cmpCall message through the postMessage API.
Shared cookie. As explained above, CMPs can store consent in a shared cookie on the .consensu.org domain, that any other CMP can read.
Non-standard method: SafeFrames. Finally, the TCF proposes an additional non-standard method to share consent: safeFrames. A SafeFrame is an API-enabled iframe (implemented via specific first-party scripts) that controls the communication between the webpage content and third-party ads. The SafeFrame proxy obtains consent by calling the standard __cmp() function.
III GDPR and ePrivacy Violations
In our work, we focus on the European regulatory framework of data protection and privacy. In May 2018, the GDPR  clarified the rules regarding processing of personal data in any environment . Article 4(11) and Article 7 of GDPR have set precise requirements on valid user consent: it must be free, specific, informed, unambiguous, explicit, revocable, given prior to any data collection, and asked in a readable and accessible manner . The ePrivacy Directive (ePD, also known as “cookie law”)  “particularizes” the GDPR with respect to the processing of personal data in the electronic communication sector, such as websites. Due to the ePD [18, Article 5(3)], and according to the European Data Protection Board (EDPB) and Information Commissioner’s Office (ICO, the UK’s DPA), website publishers have to rely on user consent when they collect and process personal data using non-mandatory (non strictly necessary for the service requested by the user) cookies or other tracking technologies [12, 30].
As a result of an in-depth legal analysis of the GDPR, the ePD and the corresponding legal texts and opinions of Data Protection Authorities, we identify four legal violations specific to cookie banners that implement the IAB Europe TCF.
Consent stored before choice: The CMP stores a positive consent before the user has made their choice in the banner. Therefore, when advertisers request consent, the CMP responds with the consent string even though the user has not clicked on the banner and has not made their choice.
This practice violates the requirement of prior consent, which demands that website publishers request consent from users (1) prior to any processing of personal data, and (2) before loading tracking technologies, according to Article 5(3) of the ePD. This requirement is reiterated in the guidance from the EDPB, the ICO and the CNIL (the French Data Protection Authority). Moreover, the TCF’s policy document explicitly states that “a CMP will generate consent signals only on the basis of a clear affirmative action taken by a user”.
No way to opt out: The banner does not offer the user any means to refuse consent.
This practice constitutes a violation of the requirement of unambiguous consent [46, Art. 4(11)], which stipulates that for the user’s consent to be valid, the user must give an “unambiguous indication” of their choice through a “clear and affirmative action” [18, Art. 5(3)]. Moreover, Recital 66 of the ePD explicitly directs that “the methods of offering the right to refuse should be as user-friendly as possible”. In its guidelines, the EDPB states that “consent mechanism should present the user with a real and meaningful choice regarding cookies on the entry page”, and that users “should have an opportunity to freely choose between the option to accept some or all cookies or to decline all or some cookies.” Along these lines, we posit that the design of a cookie banner must offer users an option to either accept or refuse consent.
Pre-selected choices: The banner gives the user a choice between one or more purposes or vendors, but some of the purposes or vendors are pre-selected: pre-ticked boxes or sliders set to “accept”.
Pre-selected choices constitute a direct violation of the requirement of unambiguous consent [46, Article 4(11)]. Recital 32 of the GDPR further reads that consent given in the form of a pre-selected tick in a checkbox does not imply active behaviour of the user, and that pre-ticked boxes do not constitute consent. The EDPB indicates that pre-ticked boxes (or opt-out boxes) constitute ambiguous behaviour and do not render a valid consent. The ICO guidance and the CNIL observe that “pre-ticked boxes or any equivalents, cannot be used for non-essential cookies”. Finally, the October 2019 judgment of the Court of Justice of the European Union (the highest court in the EU), also known as the “Planet49 GmbH” case, establishes definitively that the consent which a website user must give is not valid if it relies on a pre-checked checkbox which the user must deselect to refuse their consent.
Non-respect of choice: The CMP stores a positive consent in the browser even though the user explicitly refused consent.
The goal of our study is to detect violations of the GDPR and the ePrivacy directive in cookie banners that implement IAB Europe’s Transparency & Consent Framework (TCF). We detect all violations defined in Section III on websites that originate from the European Union because a TCF banner is more likely to be observed on EU websites. We also test some other European and international TLDs.
In this section, we describe the website selection and data collection processes, followed by our methods to detect TCF banners and intercept the user consent string. We then explain how we detect GDPR and ePD violations with our methodology. To detect violations at scale, we have built a crawler called Cookinspect, based on a Selenium-instrumented Chromium, that we use to audit websites through large-scale automatic crawls and semi-automatic crawls (which involve certain human actions). We describe two measurement campaigns done with Cookinspect in Section V.
IV-A Website Selection and Reachability
We used Tranco to build our lists. Tranco aggregates results from different lists over a month in order to overcome flaws inherent to the creation of these lists: instability, presence of unreachable domains, possible attacks to insert domains, etc. (see Appendix C for the lists and options we used). From the Tranco top 1 million list of September 20th 2019, we extracted the top 1 000 websites for the TLD of each of 31 European countries and 1 000 websites from three country-independent TLDs: .com, .org and .eu. Altogether, we obtained 28 257 websites (some countries had few websites in Tranco). The second column of Table II shows the number of crawled websites for each TLD.
Since our study is focused on the respect of consent, we decided to respect publishers’ consent regarding bots and crawling on their websites by checking the instructions in each website’s robots.txt file. For each website in a list of 28 257 websites, we first visited the address https://www./robots.txt using Python’s urllib to verify access authorization. If access was denied, we did not crawl the website. As a result, 3 633 (12.86%) websites out of 28 257 refused access to robots in their robots.txt file, so we have removed them from our further analysis.
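As an illustration of this authorization check, here is a minimal sketch based on `urllib.robotparser` from Python’s standard library. The text above only states that urllib was used; the function name `crawl_allowed`, the wildcard user agent and the choice to test the site’s main page are assumptions made for this example. The robots.txt content is passed in as text so that the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser


def crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt content lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines and marks it as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A site that refuses all robots would be dropped from the crawl.
refusing = "User-agent: *\nDisallow: /"
print(crawl_allowed(refusing, "https://www.example.org/"))
```

In a real crawl the file would first be downloaded (for instance with `urllib.request.urlopen`) and the user agent would be the crawler’s own name rather than the wildcard.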
While testing authorization, we also verified reachability. If loading the robots.txt file failed for a network-related reason, we attempted accessing it through HTTP. If DNS resolution failed for www., we attempted accessing instead. We determined if the website was loaded with a timeout of 10 seconds. If loading timed out 3 times, we considered access not successful. We applied the same criteria when visiting the main page with a Selenium-controlled browser. In total, 1 675 (5.9%) websites were not reachable.
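The retry logic described above can be sketched as follows. This is one possible reading rather than the crawler’s actual code: the order of the fallback URLs, the decision to stop after three timeouts on the same URL, and the names `candidate_urls` and `is_reachable` are all assumptions. The download function is injected as a parameter so that the sketch can be exercised without network access.

```python
from typing import Callable, Iterable, List


def candidate_urls(domain: str) -> List[str]:
    """Fallback order assumed here: HTTPS on www, then HTTP, then the bare domain."""
    return [
        f"https://www.{domain}/robots.txt",
        f"http://www.{domain}/robots.txt",
        f"https://{domain}/robots.txt",
    ]


def is_reachable(urls: Iterable[str],
                 fetch: Callable[[str, float], object],
                 timeout: float = 10.0,
                 max_timeouts: int = 3) -> bool:
    """Try each URL in turn; give up after `max_timeouts` timeouts on one URL."""
    for url in urls:
        timeouts = 0
        while True:
            try:
                fetch(url, timeout)
                return True
            except TimeoutError:
                timeouts += 1
                if timeouts >= max_timeouts:
                    return False
            except OSError:
                # Network or DNS failure: move on to the next fallback URL.
                break
    return False
```

`TimeoutError` is caught before `OSError` because it is a subclass of it; swapping the two clauses would silently turn every timeout into a fallback.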
As a result, we successfully crawled 22 949 websites automatically, with at most 1 000 websites per TLD. These 22 949 websites constitute the basis for the investigations that we describe in Section V.
IV-B Detecting a TCF Cookie Banner
We first automatically detect websites with cookie banners that implement TCF. Cookinspect detects the presence of a TCF banner by checking whether a __cmp() function is defined on a given website (each Consent Management Provider (CMP) must implement a __cmp() function according to the TCF specification , see more details in Section II-3).
To validate our detection method in practice, we made a preliminary crawl of 21 000 websites, and analyzed how often __cmp() and __cmpLocator are used. We observed that all websites that implement __cmpLocator, except for one, also implement the __cmp() function. We also observed that 20.76% of websites define a __cmp() function but do not implement __cmpLocator. We therefore rely on the presence of a __cmp() function to conclude that a CMP is present on a website. When crawling a website, Cookinspect waits for the website to finish loading, waits for an additional 3 seconds (experimental tests on 200 websites showed that a delay of no more than 2 s is necessary), and then verifies whether the window object contains a __cmp function. If a __cmp() function is not detected, Cookinspect does not reload the page.
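The detection step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cookinspect’s source: `has_tcf_banner` is a hypothetical name, and `driver` stands for any object exposing Selenium’s `get(url)` and `execute_script(script)` methods, so that a real Selenium WebDriver or a test double can be passed in.

```python
import time

# JavaScript probe evaluated in the page: is window.__cmp a function?
CMP_PROBE = "return typeof window.__cmp === 'function';"


def has_tcf_banner(driver, url: str, extra_delay: float = 3.0) -> bool:
    """Load `url`, wait `extra_delay` seconds, then test for a __cmp() function."""
    driver.get(url)          # with Selenium, returns once the page load completes
    time.sleep(extra_delay)  # extra margin for late-loading CMP scripts
    return bool(driver.execute_script(CMP_PROBE))
```

With Selenium installed, a call could look like `has_tcf_banner(webdriver.Chrome(), "https://www.example.org/")`, where the URL is a placeholder.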
As a result, we have automatically detected TCF banners on 1 426 websites and we show further results on the prevalence of TCF banners and CMPs that implement them in Section VI.
IV-C Intercepting a Consent String
CMPs that implement a TCF banner can use a number of different methods to share a consent string with advertisers present on a website (see Section II-3). In this section, we describe how Cookinspect intercepts the consent strings using all available methods, summarized in Figure 5. Cookinspect relies on several browser extensions that help inject scripts and collect the consent string using different methods. Cookinspect contains a Python script that collects all the intercepted consent strings and logs them in a local database.
Standard APIs. First, Cookinspect actively pretends to be an advertiser in a first party position: it inserts its script in the context of the crawled website (method ➊). The injected script is first-party because it runs in the origin of the crawled website (in the origin of site.com in Fig. 5). The injected script makes a direct call to the __cmp() function, and if it obtains a consent string in return, then the script transmits it back to the Python script that logs the consent strings.
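A first-party query of this kind might look as follows. This is a sketch and not Cookinspect’s injected script: it assumes the TCF v1.1 call shape `__cmp('getConsentData', null, callback)`, whose callback receives an object with a `consentData` field, and it uses Selenium’s `execute_async_script`, where the last JavaScript argument is the completion callback. The name `read_consent_string` is hypothetical.

```python
# JavaScript run in the page context; `done` hands the result back to Python.
GET_CONSENT_JS = """
var done = arguments[arguments.length - 1];
if (typeof window.__cmp !== 'function') { done(null); return; }
window.__cmp('getConsentData', null, function (result, success) {
    done(success && result ? result.consentData : null);
});
"""


def read_consent_string(driver, timeout: float = 5.0):
    """Ask the page's CMP for the consent string; None if there is no answer."""
    driver.set_script_timeout(timeout)  # upper bound on waiting for the callback
    consent = driver.execute_async_script(GET_CONSENT_JS)
    return consent or None
```

If the CMP never invokes the callback, a real Selenium driver raises a timeout exception after `timeout` seconds, which a crawler would need to catch and log as a missing consent string.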
Second, Cookinspect pretends to be an advertiser in a third-party position: Cookinspect contains a custom browser extension ➋ that injects a script in all third-party iframes that have the __cmpLocator iframe as a parent (only the children of an iframe with the __cmpLocator identifier are able to query for the consent string). From each such iframe, the injected script sends a __cmpCall message through postMessage to the __cmpLocator iframe to request the consent string (method ➋). The script then transmits it back to the extension and further to the Python script for logging.
Shared cookie. A browser extension ➌ of Cookinspect attempts to read the shared cookie, and then transmits it back to the Python script.
URL-based methods. To intercept the URL-based methods and obtain consent strings shared with third parties, a custom browser extension ➍ monitors all network requests. According to the TCF, consent can be transmitted by the URL-based methods inside the gdpr_consent parameter, so we log all the requests containing this parameter (method ➍). Although the TCF only mentions GET requests, we also monitor the parameters of POST requests. We observed that POST requests transmit a consent string on 399 websites (out of 1 426 TCF websites, i.e. 28.0%), while GET requests do so on 764 websites (53.6%).
Since the consent redirecting mechanism always uses HTTP requests to transmit the consent string in the gdpr_consent parameter, we already detect it by intercepting all GET and POST requests that contain this parameter.
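The filtering applied to logged requests can be illustrated with Python’s `urllib.parse`. This is a sketch under assumptions: the function name `consent_from_request` is hypothetical, the example hosts are placeholders, and POST bodies are assumed to be form-encoded (a JSON body would need separate handling).

```python
from urllib.parse import parse_qs, urlsplit


def consent_from_request(method: str, url: str, body: str = "") -> list:
    """Return every gdpr_consent value carried by one HTTP request."""
    # Query-string parameters apply to GET and POST requests alike.
    found = list(parse_qs(urlsplit(url).query).get("gdpr_consent", []))
    if method.upper() == "POST" and body:
        # Assumption: the POST body is application/x-www-form-urlencoded.
        found += parse_qs(body).get("gdpr_consent", [])
    return found


request_url = ("https://ads.example/sync?gdpr=1"
               "&gdpr_consent=BOX5uluOX5uluCLAAAENB6-AAAAizAAA")
print(consent_from_request("GET", request_url))
```

Because consent strings use the base64url alphabet, they contain no `+` or `%` characters that URL decoding would alter.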
Non-standard method. According to the TCF specification, safeFrames act as a proxy to the __cmp() function. Cookinspect therefore does not specifically interfere with safeFrames, because it obtains a consent string by making a direct call to the __cmp() function, which is already done with method ➊.
IV-D Identifying the CMP Responsible for a TCF Banner
To identify a CMP behind a banner, we use the consent string that we obtain from the standard APIs and the shared cookie. We decode the consent string with a public script provided by IAB . The decoded array contains the CMP identifier or ID (see cmpId in an example of decoded consent string in Fig. 4). We map the CMP ID to the public CMP list  to retrieve the CMP company’s name.
IV-E Detecting GDPR and ePD Violations
To detect violations based on consent strings, we first explain what information we extract from the consent strings, and then explain how we detect violations. Each consent string contains two arrays: an array of allowed advertisers, and an array of accepted purposes. As we discuss further in Section XI, it is unclear how advertisers are supposed to interpret these two sources of information. Since purposes for data processing have a strong legal meaning (see Section III), we focus in our study on the purposes stored in a consent string, and do not analyze the array of allowed advertisers.
By using Cookinspect we detect the four major violations of GDPR and ePD presented in Section III. We explain below what methods we have used to detect each violation.
Consent stored before choice: Cookinspect logs all the consent strings by intercepting standard APIs and a shared cookie (methods ➊, ➋ and ➌ in Figure 5). Cookinspect is able to detect the Consent stored before choice violation fully automatically while crawling 22 949 websites without performing any user action. If the consent string stored in the user’s browser is empty (has no accepted purposes), then we do not label it as a violation. We therefore consider a violation only if the consent string has one or more accepted purposes (out of five possible purposes in the TCF, see Appendix A) even though no action was performed on the visited website.
No way to opt out: we detect this violation manually by visiting the websites and using Cookinspect to assign the corresponding labels designed for this violation. To identify whether there is an option to refuse consent, we perform the following procedure.
If there is a “refuse” button on the banner, we click it directly. Otherwise, we open the banner’s “parameters”.
In the “parameters” option, we un-tick any purpose-related option (checkbox or slider), independently of the kind of option (including e.g. functional cookies).
If there is a “refuse-all” button, we click it even if options are unticked by default.
When banners propose vendors-related options, we ignore them, because we only analyse purposes stored in the consent string.
Pre-selected choices: we detect this violation manually by visiting the websites and using Cookinspect to assign the corresponding labels designed for this violation. We label a website as violating if it has a “parameters” option, and the “parameters” page contains at least one pre-selected checkbox or enabled slider for at least one purpose.
Non-respect of choice: to detect this violation, we perform a semi-automatic crawl, based on a human operator and the Cookinspect crawler, on 560 websites from French, Italian or English-speaking countries: France, UK, Belgium, Ireland and Italy, and .com websites. We only considered banners written in a language that at least one of the authors speaks, to make sure we understand the actions we perform. First, a human operator manually refuses consent on a cookie banner (following the procedure above). Then, Cookinspect logs all the consent strings by using the standard APIs and the shared cookie. Some TCF banners may not allow users to untick some of the purposes, considering that such purposes do not require consent. Such a decision can be criticized by legal experts and policy makers; however, we exclude such discussions from our work. Instead, we only consider consent strings that have all five purposes accepted, to ensure that a violation indeed took place even if the user could not refuse some of the purposes of data processing in the TCF banner.
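The two consent-string rules described in this section, for Consent stored before choice and Non-respect of choice, can be written down as simple predicates. This is a minimal sketch: the function names are hypothetical, and the input is assumed to be the list of accepted purpose identifiers already extracted from a decoded consent string.

```python
# The five purposes pre-defined by the TCF (see Appendix A).
TCF_PURPOSES = frozenset({1, 2, 3, 4, 5})


def consent_stored_before_choice(purposes_without_action) -> bool:
    """Violation if at least one purpose is accepted before any user action."""
    return bool(TCF_PURPOSES & set(purposes_without_action))


def non_respect_of_choice(purposes_after_refusal) -> bool:
    """Violation only if all five purposes are accepted after an explicit refusal."""
    return TCF_PURPOSES <= set(purposes_after_refusal)


# No user action yet, but two purposes are already accepted: first violation.
print(consent_stored_before_choice([1, 3]))
# After a refusal, a partially positive string is not counted as a violation.
print(non_respect_of_choice([1, 3]))
```

The asymmetry is deliberate: the first rule fires on any accepted purpose, while the second uses the stricter all-five criterion so that banners which do not let the user untick some purposes are not counted.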
V Two Measurement Campaigns
We perform a large-scale study of websites registered in the EU because the TCF is likely to be observed more often on European websites. We perform two measurement campaigns with Cookinspect: a large-scale automatic crawl and a smaller-scale semi-automatic crawl, both conducted in September 2019 from France. The source code of Cookinspect and all the extensions is publicly available so that end users and DPAs can test compliance of publishers and CMPs by themselves (see Appendix B).
Figure 6 provides an overview of the main components of our website auditing process. Table I presents an overview of violations we detect using automatic and semi-automatic crawling campaigns with Cookinspect. For each violation, we show what crawl was used for its detection and which component of the Web application allowed us to detect it.
|Short name||Type of crawl||Analyzed component||Number of tested websites|
|Consent stored before choice||automatic||consent string||1 426|
|No way to opt out||semi-automatic||banner||560|
|Pre-selected choices||semi-automatic (when opting out is possible)||banner||508|
|Non-respect of choice||semi-automatic (when opting out is possible)||consent string||508|
V-A Automatic Crawl
First, we perform a stateless automatic crawl of the 22 949 selected websites (see Section IV-A for website selection) to audit websites without human intervention and detect: (1) whether the website contains a TCF banner (we do not test for the presence of the __cmpLocator iframe, as during a test run we found only 1 website among 21 000 with a __cmpLocator iframe but no __cmp() function), (2) whether positive consent is stored without any user action (Consent stored before choice violation), (3) whether the website accesses the consent in a shared cookie, and (4) the presence of third-party tracking requests prior to any user consent.
In a first browser session, we first detect if the website implements the TCF by verifying whether a __cmp() function is defined after load and a 3s delay. If so, we attempt to obtain the consent string using Cookinspect to detect the Consent stored before choice violation. Cookinspect also logs all requests made to third-parties. In a new clean browser session, we place a shared cookie in the browser, and attempt to get it back using a direct __cmp() call and a postMessage.
This crawl takes an average of 11.04s per website. Crawling 28 000 websites takes more than 24 hours. This crawl was made from France on September 20th and 21st 2019.
V-B Semi-Automatic Crawl
From the resulting 1 426 websites that contain a TCF banner, we select 560 websites of French, Italian or English-speaking countries (the languages that the authors speak fluently) from .uk, .fr, .it, .be, .ie and .com TLDs to perform a semi-automatic crawl.
This crawl performs tests requiring interaction with the cookie banner. As the TCF does not specify anything regarding the user interface, we do not have a way to locate a banner and its different elements, and a fortiori to automate banner interaction. In separate browser sessions, we give both a positive and a negative consent, to test whether the given consent is respected and how the shared cookie is set. We also note whether banners propose an option to refuse consent, and whether specific parameters are pre-selected in favor of consent sharing.
In a clean browser session, we load the website. If there is no banner or a broken banner, we stop there. We manually label when the Pre-selected choices or No way to opt out violation is observed (see the complete procedure describing the labelling process in Section IV-E). We then refuse consent on the banner, and observe with Cookinspect what consent is stored by the CMP (including the shared cookie) to detect the Non-respect of choice violation. After 3 seconds, we refresh the page to log all the network requests (to quantify the number of trackers). We also attempt to obtain the consent string again. Then, in a new session, we repeat this procedure, this time giving a positive consent to the banner. We give the detailed procedure that the human operator follows to refuse and accept consent on the banner in Appendix D.
Each website takes around 30-40 s, assuming a responsive human operator. We performed this stateless crawl from France from September 23rd to October 1st 2019.
V-C Verification Procedure
For the Non-respect of choice and the No way to opt out violations, we had two further human operators cross-check the choices made in the banner and whether the banner allows the user to refuse consent, in order to limit errors. The three operators are computer scientists working on web tracking. The semi-automatic crawl is first done entirely by one of the operators. For each CMP on which we observe a Non-respect of choice on at least one website, we select one of these violating websites. We add to the pool of sites to be verified an equal number of websites on which the violation is not seen. Then, a second human operator refuses consent on all of these websites, without knowing which websites were considered a violation by the previous tester. We do the same for the No way to opt out violation, this time testing for the possibility to refuse consent. Then, the third human operator repeats the procedure on a new pool of websites. In case of a conflict, we keep the results of the test returning the fewest violations, and retest all websites of the CMP in question.
V-D Ethical Considerations
Our data collection process does not involve any human subject. Our study of websites is mostly passive: the only action we perform on websites is, for some of them, to click on manually-selected cookie acceptance buttons. Hence, we do not tamper with any remote system. Our large-scale measurement does not present any potential harm for websites, while presenting societal benefits. Moreover, we respect publishers’ consent regarding bots that they express in a robots.txt file (see website selection process in Section IV).
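As an illustration of how a crawler can honour robots.txt, the following minimal sketch uses Python's standard `urllib.robotparser` module. It is an assumption about how such a check could be written, not a description of the crawler used in the study; the function name and the default user agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt content lets `user_agent` fetch `url`.

    The robots.txt content is passed as a string so that the check can be run
    offline; a crawler would first download it from the website's root.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A website for which this check fails on the page to be visited would simply be excluded from the crawl.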
VI Prevalence of TCF Banners in Europe
We conducted an automatic crawl of 28 257 websites, taken from the top 1 000 Tranco websites of 31 European TLDs and 1 000 websites of each of the .com, .org and .eu domains, between September 20th and September 23rd 2019. Among reachable and authorized websites, 1 426 (6.2%) had a TCF banner (cookie banner of a CMP implementing the TCF). We show per-TLD details in Table II. The 1 426 websites that have a TCF banner are the target of the following automatic crawls.
| TLD | Number of domains in the Tranco top-1 million list | Number of reachable and allowed domains | Number of domains with a TCF-related cookie banner |
|---|---|---|---|
| .uk | 1 000 | 788 (78.8%) | 149 (18.9%) |
| .fr | 1 000 | 815 (81.5%) | 139 (17.1%) |
| .pl | 1 000 | 858 (85.8%) | 129 (15.0%) |
| .it | 1 000 | 824 (82.4%) | 123 (14.9%) |
| .es | 1 000 | 800 (80.0%) | 113 (14.1%) |
| .nl | 1 000 | 838 (83.8%) | 65 (7.8%) |
| .gr | 1 000 | 720 (72.0%) | 53 (7.4%) |
| .pt | 1 000 | 793 (79.3%) | 52 (6.6%) |
| .de | 1 000 | 881 (88.1%) | 56 (6.4%) |
| .ro | 1 000 | 787 (78.7%) | 50 (6.4%) |
| .bg | 547 | 449 (82.1%) | 26 (5.8%) |
| .fi | 1 000 | 824 (82.4%) | 47 (5.7%) |
| .no | 1 000 | 862 (86.2%) | 48 (5.6%) |
| .dk | 1 000 | 824 (82.4%) | 41 (5.0%) |
| .be | 1 000 | 863 (86.3%) | 38 (4.4%) |
| .at | 1 000 | 873 (87.3%) | 33 (3.8%) |
| .ie | 1 000 | 769 (76.9%) | 25 (3.3%) |
| .cz | 1 000 | 916 (91.6%) | 28 (3.1%) |
| .ch | 1 000 | 849 (84.9%) | 26 (3.1%) |
| .se | 1 000 | 787 (78.7%) | 21 (2.7%) |
| .sk | 1 000 | 879 (87.9%) | 14 (1.6%) |
| .hr | 627 | 513 (81.8%) | 8 (1.6%) |
| .hu | 1 000 | 794 (79.4%) | 6 (0.8%) |
| .lu | 186 | 147 (79.0%) | 1 (0.7%) |
| .lt | 745 | 605 (81.2%) | 4 (0.7%) |
| .lv | 537 | 420 (78.2%) | 2 (0.5%) |
| .si | 514 | 426 (82.9%) | 2 (0.5%) |
| .is | 358 | 248 (69.3%) | 1 (0.4%) |
| .ee | 468 | 373 (79.7%) | 1 (0.3%) |
| .li | 105 | 62 (59.0%) | 0 (0.0%) |
| .cy | 108 | 76 (70.4%) | 0 (0.0%) |
| .mt | 62 | 37 (59.7%) | 0 (0.0%) |
| .com | 1 000 | 711 (71.1%) | 97 (13.6%) |
| .org | 1 000 | 735 (73.5%) | 16 (2.2%) |
| .eu | 1 000 | 803 (80.3%) | 12 (1.5%) |
| all | 28 257 | 22 949 (81.2%) | 1 426 (6.2%) |
We extract information from the consent strings to identify the CMP present on a website. As not all websites set a consent string upon our visit (see our methodology in Section IV), and some consent strings contain an incorrect CMP ID, we were able to identify the CMP company behind a TCF banner for 298 (20.9%) websites in the automatic crawl, and 511 (92.9%) websites in the semi-automatic crawl. We represent the distribution of identified CMPs in the semi-automatic crawl in Figure 7. The most frequently encountered CMP is Quantcast, far ahead of OneTrust, Didomi and Sourcepoint.
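To make the extraction step concrete, the sketch below decodes the CMP ID and the allowed purposes from a consent string. It assumes the bit layout of the TCF version 1.1 consent string as we understand it from the public specification (web-safe base64 encoding; 6-bit version, two 36-bit timestamps, 12-bit CMP ID at bit offset 78, 24-bit purposes field at bit offset 132). These offsets are not given in this section and should be checked against the specification; the function names are hypothetical.

```python
import base64


def _bits_from_consent_string(consent_string: str) -> str:
    """Decode a web-safe base64 consent string into a string of '0'/'1' bits."""
    padded = consent_string + "=" * (-len(consent_string) % 4)
    raw = base64.urlsafe_b64decode(padded)
    return "".join(f"{byte:08b}" for byte in raw)


def decode_core_fields(consent_string: str) -> dict:
    """Extract version, CMP ID and allowed purposes (assumed TCF v1.1 layout)."""
    bits = _bits_from_consent_string(consent_string)
    if len(bits) < 156:
        raise ValueError("consent string too short for the assumed layout")
    purposes_field = bits[132:156]  # 24 bits, purpose 1 is the first bit
    return {
        "version": int(bits[0:6], 2),
        "cmp_id": int(bits[78:90], 2),
        "purposes_allowed": [i + 1 for i, bit in enumerate(purposes_field)
                             if bit == "1"],
    }
```

The number of entries in `purposes_allowed` is the "number of purposes" used throughout the violation counts, and `cmp_id` is the value compared against the public CMP list.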
We have not found any implementation of version 2 of the TCF, which was released on August 20th, 2019.
VII Quantification of Violations
In this section, we comment on the results regarding the main violations of the GDPR and the ePD described in section III. These violations concern consent strings obtained using the standard API and shared cookie (see section IV-C). Violations related to consent strings seen in GET and POST requests are shown in section VIII-A, because we cannot attribute their responsibility to the CMP or the publisher.
VII-A Detected GDPR and ePD Violations
VII-A1 Overview of Violations
We show a summary of the main violations’ prevalence, depending on the number of purposes in the consent strings, in Table III. As a reminder, we consider violations of Consent stored before choice when we find a consent string with 1 to 5 purposes set, but only when 5 purposes are set for Non-respect of choice.
| Number of purposes | Consent stored before choice | No way to opt out | Pre-selected choices | Non-respect of choice |
|---|---|---|---|---|
| 1 to 4 purposes | 3.7% (53/1426) | - | - | 6.7% (34/508) |
| 5 purposes | 8.6% (122/1426) | - | - | 7.7% (39/508) |
| Total number of violations | 12.3% (175/1426) | 6.8% (38/560) | 46.5% (236/508) | 7.7% (39/508) |

Any violation: 54.46% (305/560)
We find examples of websites for all considered violations. We find that 38 (6.8%) websites do not provide any way to refuse consent. 236 (46.5%) websites pre-tick the purpose or vendor options. 175 websites (12.3%) set a consent string with 1 to 5 purposes before any user action. 39 websites (7.7%) set a consent string with 5 purposes even though the user gave a negative consent.
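The counting rule recalled above for the two consent-string violations can be written as a small decision function. This is a minimal sketch: the function name and the two context labels (`"before_choice"`, `"after_refusal"`) are hypothetical, and the thresholds are the ones stated in this section (1 to 5 purposes before any choice, exactly 5 purposes after a refusal).

```python
from typing import Optional


def classify_consent_string(num_purposes: int, context: str) -> Optional[str]:
    """Map a consent string (by its number of allowed purposes) to a violation.

    `context` says when the consent string was observed:
    "before_choice" (no interaction with the banner yet) or
    "after_refusal" (the user gave a negative consent).
    Returns the violation name, or None if no violation is counted.
    """
    if not 0 <= num_purposes <= 5:
        raise ValueError("the TCF defines five purposes")
    if context == "before_choice" and num_purposes >= 1:
        return "Consent stored before choice"
    if context == "after_refusal" and num_purposes == 5:
        return "Non-respect of choice"
    return None
```

Under this rule, a consent string with 3 purposes set after a refusal is reported in the "1 to 4 purposes" row of Table III but is not counted in the total for Non-respect of choice.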
VII-A2 Quantification per Violation
Table IV shows the results of each violation, grouped by CMP seen performing a violation at least 3 times in the semi-automatic crawl. To compute the results of the Pre-selected choices and Non-respect of choice violations, we only consider websites having a banner that offers a way to refuse consent (508 websites), i.e. we exclude banners having the No way to opt out violation (38 websites), and broken or missing banners (14 websites).
[Table V: per-TLD results of the automatic crawl; columns: TLD, Number of websites. Table body not recovered.]
Consent stored before choice: Table V shows results of the automatic crawl per TLD. We observe 175 websites registering a consent string that contains a positive consent even though the user did not perform any action. 122 of them contain all of the TCF’s purposes. This is a striking abuse of the framework, happening on more than 1 in 10 websites using it. Interestingly, according to the TCF specification, the APIs we used to detect consent strings should not return a consent string before the user gives their decision on consent (or before consent is retrieved from existing cookies).
No way to opt out: We observe 38 websites offering no option to give a negative consent. These websites take part in a framework for collecting users’ consent, but do not actually offer a way to give a negative consent. Consent collected in this way cannot be considered free, as required by the GDPR.
Pre-selected choices: Almost half of the tested websites (236 out of 508) pre-select choices. In the Planet49 case, announced a few days after we finished the crawling campaigns, the European Court of Justice decided that such pre-selected choices lead to an invalid consent.
Non-respect of choice: 39 websites register a positive consent even though the user gave a negative one. This strikingly violates the user’s choice, the framework, and the GDPR.
We observe a variety of violations among the different CMPs. Interestingly, violations are often seen on only a subset of the websites using a given CMP. This suggests that CMPs offer several versions of their banners that behave differently. We further discuss the shared responsibility of violations between CMPs and publishers in section XI.
We give additional presentations of the results (per-country and per-CMPs views) in Appendix E.
VII-A3 Quantification per Publisher
We observe violations on a wide range of websites. For each violation, we display the list of the top 10 violating websites, ordered by their rank in the Tranco list, in Table VI. msn.com, ranked 48 in the Tranco list, stores a positive consent before any user choice, then offers no way to opt out. medicalnewstoday.com, a website about health, does the same, even though medical information is a sensitive category of data. w3schools.com, a popular website providing web development tutorials, displays a banner with pre-selected choices, but registers a positive consent even if the user goes to the trouble of deselecting them. softonic.com, website of a major software developer, registers a positive consent before user action, then displays a banner with pre-selected choices, and finally does not respect the user’s decision.
VII-B Escalation of Violations with a Shared Consent
Setting a violating consent string in a cookie shared among all TCF websites would constitute an escalation of the problem. We investigate the question: to what extent do websites use the shared cookie? As explained in section IV-C, we try to read it using a browser extension after giving both a positive and a negative consent in the semi-automatic crawl. We observe 126 (22.9%) websites setting the shared cookie.
We then estimate how many websites access and reuse the shared cookie. We place a custom cookie (respecting the specification) in the browser, query the CMP using the standard APIs, and see if the CMP returns the exact same consent string (with no banner interaction). Using this protocol, 62 (4.3%) websites return the same consent string. This means that CMPs on these websites reuse the shared cookie, even if it has been created by another CMP. This constitutes a lower bound, because CMPs can return a consent string other than the one stored in the cookie.
We also estimate how many websites access the shared cookie by studying how many of them use the HTTP redirect mechanism described in section II-3 to do so. We first observe that many consent redirecting domains do not respect the specification. Indeed, during manual inquiry, we find redirecting schemes using different values for the GET parameter specifying the redirection URL. For example, on mirror.co.uk we observed a GET request with gdpr_consent_string parameter instead of gdpr_consent. As we cannot cover these cases exhaustively, we focus on those respecting the specification. The only one we observe doing so (sddan.consensu.org, owned by the SIRDATA CMP) is used on 53 (9.5%) websites during the semi-automatic crawl. This hints that the practice of reading the shared consent cookie is quite common.
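The distinction between specification-compliant and non-standard redirects can be illustrated by inspecting the query string of an observed request. The sketch below uses the two parameter names mentioned above (`gdpr_consent` and `gdpr_consent_string`); the function name and the heuristic of flagging any other parameter whose name contains "consent" are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlsplit

STANDARD_PARAM = "gdpr_consent"  # parameter name expected by the specification


def find_consent_parameters(url: str):
    """Return (standard_values, nonstandard_params) found in a request URL.

    `standard_values` lists the values of the `gdpr_consent` parameter.
    `nonstandard_params` maps other parameter names containing "consent"
    (a heuristic, not part of the specification) to their values.
    """
    query = parse_qs(urlsplit(url).query)
    standard_values = query.get(STANDARD_PARAM, [])
    nonstandard_params = {
        name: values for name, values in query.items()
        if name != STANDARD_PARAM and "consent" in name.lower()
    }
    return standard_values, nonstandard_params
```

Only requests for which `standard_values` is non-empty would be counted as respecting the specification; the others illustrate why exhaustive coverage is difficult.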
We observe 3 websites setting the shared cookie in the Consent stored before choice case, 3 in the Non-respect of choice case with 5 purposes, and 20 (3.9%) with 1 to 5 purposes. Visiting one of the 3 websites on which the cookie is set before any user action on the banner will automatically set a global positive consent cookie. Visiting one of the 20 websites that do not respect user decision will set a global positive consent cookie against the user’s decision. This is particularly troubling in terms of privacy: since this consent is reused among different publishers, it constitutes an escalation of the problem. We discuss this further in Section XI.
VIII Measuring Third Party Requests: Presence of Consent and Third-Party Trackers
In previous sections, we studied violations in consent strings obtained via the standard API and shared cookies, as described in Section II-3. Responsibility of such violations can be attributed to CMPs and publishers (see the discussion in Section XI). However, when we find a non-compliant consent string via a URL-based method, we have no way to know whether that consent string was legitimately transmitted by the CMP or any other third party present on a page.
In this section, we study third-party requests observed in the two crawls. We first analyse the consent strings transmitted via URL-based methods, and then measure how many third party trackers are present on the page before user actions, after acceptance and after refusal of consent.
VIII-A Third-Party Requests with Consent Strings
In this section, we detect the four GDPR and ePD violations (see Section III for a description of the violations and Section IV for how we detect them) by analyzing consent strings that we observed in GET and POST requests to third parties. We observed consent strings with positive consent (1 to 5 allowed purposes) in GET or POST requests before any user action on 151 (10.6%) of the 1 426 TCF websites found among the 22 949 websites of the automatic crawl; this indicates websites with the Consent stored before choice violation. For the Non-respect of choice violation, we intercepted consent strings in GET or POST requests with 5 purposes on 63 websites (12.4%). To evaluate whether these results are complementary to our previous findings, we count the number of websites on which we see a violating consent string in GET and POST requests, but do not obtain a violating consent string via intercepting the standard APIs or in the shared cookie (see Section IV-C).
Consent stored before choice: In addition to the 66 websites where we observed this violation while intercepting consent strings with the standard APIs and shared cookie, we also observed it on 69 additional websites, where GET or POST requests send a consent string with positive consent (1 to 5 purposes). This means that requests containing violating consent strings are sent while the CMP has not provided a consent string yet.
Non-respect of choice: In addition to the 39 websites where we observed this violation while intercepting consent strings with the standard APIs and shared cookie, we also observed it on 38 websites, where we obtain consent strings with all 5 purposes in GET and POST requests.
We further investigated whether the identifier of the responsible CMP (CMP ID) in each consent string obtained via GET and POST requests matches the CMP ID obtained from consent strings with the standard APIs and shared cookie. We found CMP IDs in GET and POST requests different from the ones found using the standard API on 48 websites. On 37 of them, both CMP IDs found were from valid CMPs, while on the remaining 11 websites CMP IDs were set to either 0, 1 or 4095, which do not exist in the public CMP list. It seems very suspicious that consent strings not created by the actual CMP (or even by non-existent CMPs!) are sent to third parties.
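The comparison of CMP IDs can be sketched as follows. The category labels and function names are hypothetical, and the set of registered CMP IDs is passed in by the caller (in the study it comes from the public CMP list); nothing here is taken from the actual analysis scripts.

```python
from collections import Counter


def classify_cmp_id_pair(api_cmp_id: int, request_cmp_id: int,
                         registered_cmp_ids) -> str:
    """Compare the CMP ID seen via the standard API with the one seen in a
    GET/POST request on the same website."""
    if request_cmp_id == api_cmp_id:
        return "same"
    if request_cmp_id in registered_cmp_ids:
        return "different_registered"
    return "different_unregistered"  # e.g. IDs such as 0, 1 or 4095


def tally_cmp_id_pairs(pairs, registered_cmp_ids) -> Counter:
    """Count websites per category; `pairs` holds one (api_id, request_id)
    tuple per website."""
    return Counter(classify_cmp_id_pair(api_id, req_id, registered_cmp_ids)
                   for api_id, req_id in pairs)
```

In the terms of this section, the 37 websites correspond to the `different_registered` category and the 11 websites to `different_unregistered`.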
VIII-B Third-Party Trackers
We measure the number of third party trackers on websites with TCF banners depending on user consent: before any user action, after a negative consent and after a positive consent. To do so, we logged every request to third-party domains with Cookinspect. From this log, we extract the domains that are considered trackers in the Disconnect list.
We first measure the number of third party tracking requests without responding to the cookie banner or doing any other action on the website. Then we count third party tracking requests after giving both a positive and a negative consent to the cookie banner (for websites on which it is possible), and reloading the page. Each measurement of trackers is done in a single browser session, on a single page load. These tests are done on the 508 websites on which giving a negative consent is possible in the semi-automatic crawl.
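The counting step can be sketched as below. This is a simplified illustration, not the code of Cookinspect: the function names are hypothetical, the caller supplies the first-party domain of the visited website (a real implementation would derive it with a public suffix list), and `tracker_domains` stands for the set of domains extracted from the Disconnect list.

```python
from urllib.parse import urlsplit


def _host_matches(host: str, domain: str) -> bool:
    """True if `host` is `domain` itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def count_third_party_requests(first_party_domain, request_urls, tracker_domains):
    """Return (tracking_requests, third_party_requests) for one page load."""
    third_party = 0
    tracking = 0
    for url in request_urls:
        host = (urlsplit(url).hostname or "").lower()
        if not host or _host_matches(host, first_party_domain):
            continue  # skip first-party requests and URLs without a host
        third_party += 1
        if any(_host_matches(host, domain) for domain in tracker_domains):
            tracking += 1
    return tracking, third_party
```

Running this once per condition (before any action, after refusal, after acceptance) and averaging over websites yields figures of the kind reported in Table VII.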
| User action | Number of third party tracking requests | Total number of third party requests |
|---|---|---|
| Before user action | 22.54 | 35.04 |
| After a negative consent | 28.78 | 42.50 |
| After a positive consent | 39.59 | 56.75 |
Table VII summarizes the results. We observe that responding to TCF banners, whether positively or negatively, has an effect on the number of included third party trackers. Surprisingly, even refusing consent increases the number of tracking requests. The number of websites having the Non-respect of choice violation (and hence setting a positive consent even if the user refused) is not sufficient to explain this increase. We hypothesize that some scripts, in order to execute and include content, wait for the __cmp() function to be defined, which should only happen after the user has given their choice to the banner.
| Tracking company | Number of websites | TCF Vendor? |
|---|---|---|
| Integral Ad Science | 258 (50.8%) | ✓ |
| Casale Media | 237 (46.7%) | ✓ |
| The Trade Desk | 217 (42.7%) | ✓ |
| Horyzon Media | 179 (35.2%) | ✓ |
Table VIII shows the top 10 companies that own tracking domains present on websites after a negative consent (and a page reload). We matched tracking domains to company names using the Disconnect list. We determine whether they are part of the TCF by checking if any company name linked to a tracker domain in WebXRay’s database is present in the Global Vendor List (version 168). Some top trackers belong to vendors which are not part of the IAB framework (Google, Facebook or Amazon), but the rest of them are (e.g., AppNexus, The Rubicon Project, comScore, etc.).
During our study, we encountered many unusual cases, detailed in appendix F.
IX Browser Extension
We publish a browser extension called Cookie glasses, to enable users to see if consent stored by CMPs corresponds to their choice. Users can read information stored in the consent string provided by the CMP in a simple interface. The most important pieces of information present in the consent string are decoded and displayed in a readable format (see Figure 8).
Technically, the extension uses postMessages from the standard API (see ➋ in Figure 5). It is not possible to use direct calls to the __cmp() function because of browser security mechanisms. Our tests show that 79% of TCF websites use the postMessage API. Our extension therefore works on a majority of websites.
X Limitations
Our work has some limitations. First, our scope is limited to banners of IAB Europe’s TCF. Since we do not detect other cookie banners, we only observe a subset of all cookie banners. Besides, our results on the prevalence of TCF banners represent a lower bound on the actual usage of TCF banners, due to a variety of implementations of the TCF. For instance, some banners do not define the __cmp() function on the first page load. In one of its banners (e.g. on aol.com), the Oath CMP redirects the user to another website (of a different domain) to display a consent wall. On this page, the __cmp() function is not defined. We do not detect such cases in our study. While we detect TCF banners on 17.1% of .fr websites, van Eijk et al. found that 40.2% of European websites have a cookie banner.
Finally, we only detect violations in client-side consent strings. Yet, exchanges of consent strings can also happen outside of the browser. IAB provides extension fields for exchanging consent strings in its OpenRTB protocol. This protocol is used between ad exchanges and advertisers for Real-Time Bidding. As such communication happens server-to-server, we cannot observe it with a client-side approach.
XI Discussion on Legal Compliance
In this section, we reflect on our experiments and our results and comment on open problems that can be addressed by legal professionals or DPAs.
XI-1 Who is responsible for violations in cookie banners?
It is a complex task to attribute the responsibility of a non-compliant cookie banner on a website to either the CMP or the publisher. CMPs often propose different versions of their banners that have different legal implications, and provide documentation on customizing the banner. For instance, OneTrust, on its webpage presenting its CMP solution, invites publishers to “maximize user opt-in with customizable publisher-specific cookie banners […] to optimize consent collection”. We argue that CMPs providing non-compliant cookie banners cannot exonerate themselves and delegate responsibility to the publishers that include them, especially when they claim to provide GDPR- and ePrivacy-compliant consent collection solutions. Conversely, publishers bear part of the responsibility if they choose non-compliant banners. Hence, the responsibility of non-compliant cookie banners is shared between CMPs and publishers. CMPs and publishers might even be considered co-controllers, but we leave this discussion to lawyers.
Moreover, it is possible that publishers customize the banner when they host the CMP script on their website, modifying the original behaviour offered by the CMP. In such a case, the responsibility of a violating banner should be attributed to the publisher. Such cases can be detected with extensive case-by-case manual inspection.
XI-2 Problem of shared consent across publishers
The TCF defines a “global” cookie that is writable and readable by all CMPs (see section II-2). We found such an example on letudiant.fr: it obtains the consent string set on the website senscritique.com previously visited by the user (we show a video of this in the attachments). This behaviour may not be a violation of the GDPR in itself: consent must be specific to a given purpose, not to a publisher. However, it seems suspicious that, even while obtaining the consent string invisibly for the end user, altervista.org still displays the cookie banner to the user. This may be considered as an excessive request (publishers request a consent they already have) and a lack of transparency or a deception (the user is tricked into thinking altervista.org does not have their consent). In fact, in a report about dark patterns, the CNIL already uses the terms “bordering on harassment” to describe the repetitive request for consent on every website, even without any shared consent consideration. The global consent has been criticized by the Privacy International NGO, which denounced users’ lack of knowledge that consent is global to websites, and the fact that opting out is near impossible. This concept of global consent requires further analysis by legal experts.
Additionally, such a design in the TCF assumes that all the CMPs who use the global consent mechanism trust each other to set the consent string according to the choices made by the user. But a TCF-wide problem would arise if one publisher or CMP set this cookie incorrectly, violating the user’s consent. We found that this was not a hypothetical scenario: we detected 3 websites that set a positive consent in the shared cookie before the user makes any choice in the banner and 20 websites doing so after the user explicitly refuses consent (more details in Section VII-B).
XI-3 Unclear purposes in IAB Europe’s TCF
The TCF proposes five pre-defined purposes (see Table IX in the Appendix): we leave to legal professionals the discussion of whether the defined purposes are clear and specific. The CNIL has already stated, in its decision against Vectaury, that the TCF defines its purposes in a vague, imprecise way.
XI-4 Unclear semantics of consent string format
Each consent string contains a list of purposes and a list of vendors. The specification does not clearly say how advertisers are supposed to verify their consent, nor how the CMP should populate these fields. What happens if one of the two lists is empty? Should vendors assume they have consent if their identifier is in the vendors array, or should they also check the purposes? The TCF doesn’t specify how vendors should interpret the consent string. Moreover, should the vendors list only be populated with vendors declaring to use consent for the purposes set in the purposes list? Such questions could be an interesting area of investigation for DPAs.
XI-5 Consent strings can be created by anyone
As shown in Section VIII-A, we observe on 37 websites requests to third parties containing consent strings that we suspect are being forged by non-CMP scripts running on the page, because they contain a CMP identifier that doesn’t correspond to the CMP present on the page. Even though the whole purpose of the TCF is to provide a way for actors in the advertisement industry to prove that they received consent from the user, we argue that this proof is weak. The consent string’s format does not contain any cryptographic proof that it was created by a given CMP, on a given website, in accordance with the user’s choice. Consent strings can be forged by anyone, as our observation shows.
XII Related Work
The first lines of research on cookie banners, published before the GDPR, relied on the legal basis of the ePD and its implementation in various European countries, and were very country-specific. As the GDPR changed behaviour regarding cookies [35, 44], trackers and other third-party content, and cookie banners [16, 8], we indicate whether each work made measurements before or after its entry into application (May 2018).
To the best of our knowledge, the following works measured the prevalence of cookie banners after the GDPR. Sanchez et al. studied the impact of the GDPR on cookie setting practices. They find that the GDPR had a global impact, influencing even US-based websites. Similarly to our semi-automatic crawl, they manually refused consent on 2 000 cookie banners to extract statistics such as the number of cookies after consent refusal. Van Eijk et al. studied the impact of the user’s location on cookie banners. Using a crowd-sourced list, they automatically detected cookie banners on 40.2% of European websites. They found that the location of the user has less impact on the prevalence of banners than the expected audience of a website. They also observed important variations between websites of different top-level domains. Degeling et al. studied characteristics of 31 cookie banner libraries, including several provided by CMPs of the TCF, by installing them locally. They found that 62.2% of European websites displayed cookie banners in October 2018. The authors observed a 16% increase in cookie banner adoption by websites between the pre- and post-GDPR periods.
Similarly to our No way to opt out violation, some works measured how many banners offer no way to opt out. In 2015, Leenes and Kosta found this issue on 25% of 100 Dutch websites. The same year, the Article 29 Working Party found it on 54% of the top 250 websites of 8 EU countries. Vallina et al. found cookie banners offering no option to give a negative consent on 1.36% of porn websites.
Several works measured the influence of cookie banners on the number of trackers or cookies. Before the GDPR, Carpineto et al. measured how many websites set cookies without displaying a banner. Traverso et al. measured the number of trackers before and after giving a positive consent on banners on 100 Italian websites. They found an average of 29.5 trackers prior to giving consent. In 2016, Englehardt and Narayanan found 18 third parties per website prior to any consent. In 2017, Trevisan et al. found that 49% of websites installed profiling cookies before user consent, and that 80.5% of websites installing tracking cookies did not wait for the user’s consent to do so. After the GDPR came into force, Sanchez et al. measured the number of cookies after giving a negative consent on banners. In contrast, in our work, we measure trackers both before any consent and after giving both a positive and a negative consent.
On the legal side, some regulators have already been active. The French DPA (CNIL) sued an advertisement company that used the TCF, invoking a lack of informed, free, specific and unambiguous consent. They also noted that pre-selecting consent checkboxes was not compliant with the Article 32 of the GDPR. The European Court of Justice’s decision in the Planet49 case recently settled that pre-selecting options was not GDPR-compliant. Finally, the UK’s DPA (ICO) recently published a report on adtech and Real-Time Bidding, studying both IAB Europe’s TCF and Google’s framework. Among other considerations, they concluded that the TCF lacked transparency and observed a systemic lack of compliance with their data protection requirements.
XIII Conclusion
In this paper, we have systematically studied cookie banners of IAB Europe’s Transparency and Consent Framework (TCF). By collaborating with a researcher in law, we have identified four legal violations of both the GDPR and the ePD and we have detected them on 1 426 European websites that use TCF cookie banners. We have detected at least one of these violations in 54% of websites. Finally, to help users and Data Protection Authorities (DPAs) further investigate these violations, we provide a browser extension called Cookie glasses, that is able to detect some of them.
Beyond violations in the implementations of cookie banners, we believe that the TCF suffers from several problems open for discussion and improvement. First, the consent string format has unclear semantics, which makes it hard to interpret and use by the third parties that rely on such consent. Second, the TCF does not specify which actors are responsible for ensuring that user consent (assuming it was obtained in a compliant way) is respected: should the publishers, CMPs or some other actors ensure that the third parties respect the consent they received? We believe that European regulators should take a more active stand regarding the implementation of cookie banners: either with supportive actions, such as guidance, or repressive actions, such as regulatory decisions and associated fines.
-  (2013) FPDetective: dusting the web for fingerprinters. In Conference on Computer and Communications Security (CCS’13), Cited by: §I.
-  (2015) Cookie sweep combined analysis - report. Note: https://ec.europa.eu/newsroom/article29/document.cfm?action=display&doc_id=56123, accessed on 2019.11.01 Cited by: §I, §XII.
-  Attachments to the paper (anonymized Dropbox repository). Note: https://www.dropbox.com/sh/fw8ubf23z3ai0ei/AABY4qRO3FKXeGELfiPGFHica Cited by: Appendix B, §I, TABLE VI, §IX, footnote 10.
-  (2016) Automatic assessment of website compliance to the European cookie law with CooLCheck. In Proceedings of the 2016 ACM on Workshop on Privacy in the Electronic Society, Cited by: §XII.
-  Article 2 of the deliberation n2019-093 of july 4, 2019 adopting guidelines relating to the application of article 82 of the law of january 6, 1978 modified to the operations of reading or writing to a user’s terminal (including cookies and other tracers) (corrigendum). Note: https://www.legifrance.gouv.fr/affichTexte.do?cidTexte=JORFTEXT000038783337, accessed on 30 October, 2019. Cited by: §III, §III.
-  Décision n MED 2018-042 du 30 octobre 2018 mettant en demeure la société Vectaury. Note: https://www.legifrance.gouv.fr/affichCnil.do?oldAction=rechExpCnil&id=CNILTEXT000037594451&fastReqId=974682228&fastPos=2, accessed on 31 October 2019. Cited by: §XI-3.
-  (2019) IP report: shaping choices in the digital world. Note: https://linc.cnil.fr/fr/ip-report-shaping-choices-digital-world, accessed on 2019.10.30 Cited by: §XI-2.
-  (2018) We value your privacy… now take some cookies: measuring the GDPR’s impact on web privacy. In Network and Distributed System Security Symposium (NDSS), Cited by: §I, §XII, §XII, §XII, footnote 4.
-  Disconnect-tracking-protection. Note: https://github.com/disconnectme/disconnect-tracking-protection, accessed on 2019.07.16 Cited by: §VIII-B, §VIII-B.
-  Judgement of the court of justice of the EU, Case c-673/17. Note: http://curia.europa.eu/juris/document/document.jsf;?&docid=218462&doclang=EN&cid=8679428, accessed on 2019.10.31. Cited by: §XII, §III, §VII-A2.
-  (2011) Opinion 15/2011 on the definition of consent (WP 187), adopted on 13 july 2011. Note: https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2011/wp187_en.pdf Cited by: §III.
-  (2010) Opinion 2/2010 on online behavioural advertising, 22 june 2010, WP 171, p. 10. Note: https://ec.europa.eu/justice/article-29/documentation/opinionrecommendation/files/2007/wp136_en.pdf Cited by: §III.
-  (2013) Working document 02/2013 providing guidance on obtaining consent for cookies, adopted on 2 october 2013. Note: https://www.pdpjournals.com/docs/88135.pdf Cited by: §III, §III.
-  (2007) Opinion 4/2007 on the concept of personal data (WP 136), adopted on 20.06.2007. Note: https://ec.europa.eu/justice/article-29/documentation/opinionrecommendation/files/2007/wp136_en.pdf Cited by: §III.
-  (2018) Guidelines on consent under regulation 2016/679” (WP 259 rev.01), adopted on 10 april 2018. Note: https://ec.europa.eu/newsroom/article29/item-detail.cfm?item_id=623051 Cited by: §III, §III, §III.
-  (2019) The impact of user location on cookie notices (inside and outside of the European union). In Workshop on Technology and Consumer Protection (ConPro’19), Cited by: §X, §XII, §XII.
-  (2016) Online tracking: a 1-million-site measurement and analysis. In conference on computer and communications security (CCS’13), Cited by: §I, §XII.
-  (2009) Directive 2009/136/EC of the European Parliament and of the Council of 25 November 2009. Note: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32009L0136, accessed on 2019.10.31 Cited by: §III, §III, §III.
-  (2018-04) Transparency and consent framework. Note: https://github.com/InteractiveAdvertisingBureau/GDPR-Transparency-and-Consent-Framework, accessed on 2019.05.03 Cited by: §I, §II-1, §II.
-  (2019-06) Global vendor list (GVL). Note: https://github.com/InteractiveAdvertisingBureau/GDPR-Transparency-and-Consent-Framework, https://vendorlist.consensu.org/vendorlist.json, accessed June 2019 Cited by: §II-1, §II.
-  CMP ID 1 is not currently assigned to a Consent Management Provider (CMP). Note: http://advertisingconsent.eu/2019/01/cmp-id-1-is-not-currently-assigned-to-a-consent-management-provider-cmp/, accessed on 2019.09.02 Cited by: Appendix F.
-  CMP list. Note: https://advertisingconsent.eu/cmp-list/, downloaded in 2019.04 Cited by: Appendix F, §II-1, §II, §IV-D, §VIII-A.
-  IAB europe transparency & consent framework policies. Note: https://iabeurope.eu/wp-content/uploads/2019/08/IABEurope_TransparencyConsentFramework_v1-1_policy_FINAL.pdf, accessed on 2019.11.20 Cited by: §III.
-  (2018-04) GDPR consent passing for URL-based services: transparency and consent framework. Note: https://github.com/InteractiveAdvertisingBureau/GDPR-Transparency-and-Consent-Framework/blob/master/URL-based%20Consent%20Passing_%20Framework%20Guidance.md Cited by: Appendix F, §II-3.
-  (2018-02) OpenRTB advisory - GDPR. Note: https://iabtechlab.com/wp-content/uploads/2018/02/OpenRTB_Advisory_GDPR_2018-02.pdf, accessed on 2019.10.16 Cited by: §X.
-  OpenRTB (real-time bidding). Note: https://www.iab.com/guidelines/real-time-bidding-rtb-project/, accessed on 2019.09.16 Cited by: §X.
-  (2014) SafeFrame. Note: https://www.iab.com/guidelines/safeframe/, accessed on 2019.09.16 Cited by: §II-3.
-  (2019) Update report into adtech and real time bidding. Note: https://ico.org.uk/media/about-the-ico/documents/2615156/adtech-real-time-bidding-report-201906.pdf, accessed on 2019.07.10 Cited by: §XII.
-  (2019) Tranco: a research-oriented top sites ranking hardened against manipulation. In Network and Distributed System Security Symposium (NDSS), Cited by: Appendix C, §IV-A.
-  (2015) Taming the cookie monster with Dutch law - a tale of regulatory failure. Computer Law & Security Review 31. Cited by: §I, §XII.
-  (2016) Internet Jones and the raiders of the lost trackers: an archaeological study of web tracking from 1996 to 2016. In 25th USENIX Security Symposium (USENIX Security 16), Cited by: §I.
-  (2018) Changes in third-party content on european news websites after GDPR. Note: Reuters Institute for the Study of Journalism Cited by: §I, §XII.
-  (2015) Exposing the hidden web: an analysis of third-party http requests on 1 million websites. International Journal of Communication. Cited by: §VIII-B.
-  (2013) Cookieless monster: exploring the ecosystem of web-based device fingerprinting. In IEEE Symposium on Security and Privacy (SP’13), Cited by: §I.
-  (2014) Selling off user privacy at auction. In Network and Distributed System Security Symposium (NDSS’14), Cited by: §I.
-  Consent management publishers advertisers. Note: https://www.onetrust.com/solutions/consent-management-platform/, accessed on 2019.10.15 Cited by: §XI-1.
-  (2019) Cookie synchronization: everything you always wanted to know but were afraid to ask. In The World Wide Web Conference (WWW’19), Cited by: §I.
-  (2019) Most cookie banners are annoying and deceptive. This is not consent. Note: https://privacyinternational.org/explainer/2975/most-cookie-banners-are-annoying-and-deceptive-not-consent, accessed on 2019.08.12 Cited by: §XI-2.
-  (2012) Detecting and defending against third-party tracking on the web. In USENIX Symposium on Networked Systems Design and Implementation (NSDI’12), Cited by: §I.
-  (2018) French regulator shows deep flaws in IAB’s consent framework and RTB. Note: https://brave.com/cnil-consent-rtb/, accessed on 2019.03.28 Cited by: §XII.
-  (2019) Can I opt out yet?: GDPR and the global illusion of cookie control. In Asia Conference on Computer and Communications Security (AsiaCCS’19), Cited by: §I, §XII, §XII, §XII, footnote 4.
-  (2002) Directive 2002/58/EC of the European Parliament and of the Council of 12 July 2002 concerning the processing of personal data and the protection of privacy in the electronic communications sector (Directive on privacy and electronic communications). Cited by: §I.
-  (2016) Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Cited by: §I, §III, §III, §III.
-  (2017) Benchmark and comparison of tracker-blockers: should you trust them?. In Network Traffic Measurement and Analysis Conference (TMA’17), Cited by: §I, §XII.
-  (2019) 4 years of EU cookie law: results and lessons learned. Proceedings on Privacy Enhancing Technologies Symposium (PETS’19). Cited by: §I, §XII.
-  (2019) (Un)informed consent: studying gdpr consent notices in the field. In Conference on Computer and Communications Security (CCS’19), Cited by: §I.
-  (2019) Tales from the porn: a comprehensive privacy analysis of the web porn ecosystem. In Proceedings of the Internet Measurement Conference (IMC’19), Cited by: §I, §XII.
Appendix A Purposes Defined in IAB Europe’s TCF
We reproduce purposes defined in IAB Europe’s TCF in Table IX.
|Purpose number||Purpose name||Purpose description|
|1||Information storage and access||The storage of information, or access to information that is already stored, on your device such as advertising identifiers, device identifiers, cookies, and similar technologies.|
|2||Personalisation||The collection and processing of information about your use of this service to subsequently personalise advertising and/or content for you in other contexts, such as on other websites or apps, over time. Typically, the content of the site or app is used to make inferences about your interests, which inform future selection of advertising and/or content.|
|3||Ad selection, delivery, reporting||The collection of information, and combination with previously collected information, to select and deliver advertisements for you, and to measure the delivery and effectiveness of such advertisements. This includes using previously collected information about your interests to select ads, processing data about what advertisements were shown, how often they were shown, when and where they were shown, and whether you took any action related to the advertisement, including for example clicking an ad or making a purchase. This does not include personalisation, which is the collection and processing of information about your use of this service to subsequently personalise advertising and/or content for you in other contexts, such as websites or apps, over time.|
|4||Content selection, delivery, reporting||The collection of information, and combination with previously collected information, to select and deliver content for you, and to measure the delivery and effectiveness of such content. This includes using previously collected information about your interests to select content, processing data about what content was shown, how often or how long it was shown, when and where it was shown, and whether you took any action related to the content, including for example clicking on content. This does not include personalisation, which is the collection and processing of information about your use of this service to subsequently personalise content and/or advertising for you in other contexts, such as websites or apps, over time.|
|5||Measurement||The collection of information about your use of the content, and combination with previously collected information, used to measure, understand, and report on your usage of the service. This does not include personalisation, the collection of information about your use of this service to subsequently personalise content and/or advertising for you in other contexts, i.e. on other service, such as websites or apps, over time.|
Appendix B Attachments
In a public repository, we provide files that are relevant to this work: the source files of the Cookie Glasses browser extension, the full list of websites for each violation, screenshots of each website mentioned in this paper, and videos showing examples of GDPR violations. We promise to release Cookinspect after acceptance.
Appendix C Data for Reproducible Research
For the sake of research reproducibility, we indicate all data relevant to this work in Table X.
To select the websites, we use Tranco to build lists. Within Tranco, we select the Alexa and Majestic lists. We do not use the Cisco Umbrella list because it is DNS-based and may not be representative of web traffic. Likewise, we exclude the Quantcast list because it is based on US traffic only. We also select the option to remove domains flagged as dangerous by Google Safe Browsing.
From Tranco’s top 1 million list, we extract, on 2019.09.20, the first 1 000 websites of the top-level domain (TLD) of each European country, and 1 000 websites from country-independent TLDs: .com, .eu and .org.
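The per-TLD extraction described above can be sketched as follows. This is a minimal illustration, not the script used for the crawl: it assumes the Tranco list is a CSV of `rank,domain` rows sorted by rank, and the function name `top_sites_per_tld`, the sample domains and the TLD set passed to it are hypothetical.

```python
import csv
import io
from collections import defaultdict


def top_sites_per_tld(csv_lines, tlds, limit=1000):
    """Return up to `limit` highest-ranked domains for each requested TLD.

    `csv_lines` is any iterable of "rank,domain" lines, assumed to be
    sorted by rank (best rank first).
    """
    wanted = {t.lower().lstrip(".") for t in tlds}
    selected = defaultdict(list)
    for row in csv.reader(csv_lines):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        domain = row[1].strip().lower()
        # The TLD is taken as the last dot-separated label of the domain.
        tld = domain.rsplit(".", 1)[-1]
        if tld in wanted and len(selected[tld]) < limit:
            selected[tld].append(domain)
    return dict(selected)


# Tiny in-memory sample standing in for the real 1-million-entry list.
sample = io.StringIO(
    "1,google.com\n2,example.fr\n3,wikipedia.org\n"
    "4,lemonde.fr\n5,europa.eu\n6,spiegel.de\n"
)
result = top_sites_per_tld(sample, [".fr", ".com", ".eu", ".org"], limit=1)
```

With `limit=1`, only the best-ranked domain of each requested TLD is kept, and domains under TLDs that were not requested (here `.de`) are ignored.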
|Software - Selenium||python-selenium 3.141.0-1|
|Software - Chromium||chromium 76.0.3809.100-1|
|Operating system||Arch Linux|
|Kernel (result of uname -a)||Linux 5.2.5-arch1-1-ARCH #1 SMP PREEMPT Wed Jul 31 08:30:34 UTC 2019 x86_64 GNU/Linux|
|User-Agent||Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/73.0.3683.103 Safari/537.36|
|Tranco list||https://tranco-list.eu/list/4NKX/1000000, generated on 2019.09.20|
|Disconnect list commit||ba312781 (2019-07-29)|
|WebXRay commit||04c3c8e8 (2019-06-18)|
|Crawling date (automatic crawl)||2019-09-20 - 2019-09-21|
|Crawling date (semi-automatic crawl)||2019-09-23 - 2019-10-01|
Appendix D Procedure for the Human Operators
In this section, we give the precise procedure that human operators had to follow to give a negative and a positive consent on the banners during the semi-automatic crawl.
Secondly, on a second browser session (or directly if there is no option to refuse consent on the banner), we accept tracking by clicking on the “accept” button, or close the banner when it is the only option (we close the banner in all cases).
If the banner does not appear on first load, we reload the website until the banner appears, up to 3 times.
Appendix E Alternative Presentations of the Results
We display results of violations observed in the semi-automatic crawl organized by country in Table XI, and organized by CMP in Table XII (for CMPs seen at least 5 times). This breakdown is useful for DPAs, who can then see which CMPs to investigate first. We do not display results for the automatic crawl because, in this case, we can only identify CMPs that provide consent strings before consent (which would introduce a bias, and only concerns 21.0% of websites).
|CMP||websites||Consent stored before choice||No way to opt out||Pre-selected choices||Non-respect of choice|
|Quantcast||174||3.4% (6/174)||5.2% (9/174)||37.8% (62/164)||0.6% (1/164)|
|OneTrust||50||74.0% (37/50)||4.0% (2/50)||83.3% (40/48)||8.3% (4/48)|
|Didomi||41||0.0% (0/41)||0.0% (0/41)||39.0% (16/41)||0.0% (0/41)|
|Sourcepoint||34||2.9% (1/34)||0.0% (0/34)||64.7% (22/34)||2.9% (1/34)|
|Evidon||22||4.5% (1/22)||22.7% (5/22)||25.0% (4/16)||25.0% (4/16)|
|iubenda||20||0.0% (0/20)||0.0% (0/20)||0.0% (0/20)||0.0% (0/20)|
|Clickio||14||0.0% (0/14)||0.0% (0/14)||0.0% (0/14)||0.0% (0/14)|
|Oath||12||0.0% (0/12)||0.0% (0/12)||16.7% (2/12)||0.0% (0/12)|
|Triboo Media||10||0.0% (0/10)||0.0% (0/10)||0.0% (0/10)||0.0% (0/10)|
|Commanders Act||10||40.0% (4/10)||0.0% (0/10)||80.0% (8/10)||0.0% (0/10)|
|Axel Springer||10||60.0% (6/10)||70.0% (7/10)||100.0% (3/3)||33.3% (1/3)|
|OneTag||9||0.0% (0/9)||0.0% (0/9)||100.0% (9/9)||0.0% (0/9)|
|Cookie Trust WG.||8||25.0% (2/8)||25.0% (2/8)||60.0% (3/5)||0.0% (0/5)|
|Conversant Europe||7||0.0% (0/7)||0.0% (0/7)||100.0% (7/7)||100.0% (7/7)|
|Ensighten||7||0.0% (0/7)||0.0% (0/7)||100.0% (7/7)||0.0% (0/7)|
|SIRDATA||5||0.0% (0/5)||0.0% (0/5)||0.0% (0/5)||0.0% (0/5)|
|Chandago||5||0.0% (0/5)||0.0% (0/5)||0.0% (0/5)||0.0% (0/5)|
|incorrect CMP ID||9||11.1% (1/9)||11.1% (1/9)||62.5% (5/8)||12.5% (1/8)|
|others||73||11.0% (8/73)||6.8% (5/73)||54.4% (37/68)||29.4% (20/68)|
|No consent string found||40||0.0% (0/40)||17.5% (7/40)||50.0% (11/22)||0.0% (0/22)|
|all||560||11.8% (66/560)||6.8% (38/560)||46.5% (236/508)||7.7% (39/508)|
Appendix F Unusual Cases
We list unusual cases encountered during our whole study.
Multiple banners at once — We observed websites displaying two cookie banners, e.g. psicologiaymente.com or matchendirect.fr. On these two sites, each banner seems to follow a different regulation (pre- or post-GDPR). Our guess is that the publishers forgot to remove the older banner.
Multiple banners on different loads — We encountered one specific website (kayak.fr) displaying 4 different banners under different clean browser sessions. These banners provide different characteristics (consent wall or not, existence of a refuse button, access to more specific configurations). Similarly, public.fr displays 2 different banners when loaded several times with a clean browser: one allowing parameters configuration, and one only providing an accept button.
Specifications not followed — CMPs on some websites do not respect the TCF’s specifications at all. On dominos.fr, the __cmp() function is defined, but only ever returns an empty JSON object. express.co.uk sets 24 purposes in the consent string, even though only 5 of them are defined in the TCF and mentioned on the banner’s text.
Banner not displayed on front page — On some websites, such as gamepedia.com, the banner is not displayed on the front page.
consensu.org’s page — While the consensu.org domain is used for global consent cookie sharing across publishers and for consent redirection through its subdomains, its main web pages https://consensu.org and https://www.consensu.org display a generic parked-domain page.
Claiming GDPR does not apply — The URL-based consent passing method specification includes a parameter called gdpr, used to indicate whether GDPR applies. We observe many queries setting this parameter to 0, claiming that GDPR does not apply. As there are many reasons for the GDPR not to apply to a given script, we cannot decide whether such claims are well founded.
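As an illustration of how such claims can be counted, the following sketch extracts the gdpr parameter from request URLs. It is only a minimal example under stated assumptions: the helper names `gdpr_flag` and `count_gdpr_claims` and the `tracker.example` URLs are hypothetical, and the parameter is assumed to be passed in the query string.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def gdpr_flag(url):
    """Return the value of the `gdpr` query parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("gdpr")
    return values[0] if values else None


def count_gdpr_claims(urls):
    """Count how many requests carry gdpr=0, gdpr=1, or no gdpr parameter."""
    return Counter(gdpr_flag(url) for url in urls)


# Hypothetical third-party requests observed during a crawl.
requests_seen = [
    "https://tracker.example/pixel?gdpr=0&id=42",
    "https://tracker.example/sync?gdpr=1&id=42",
    "https://ads.example/bid?id=7&gdpr=0",
    "https://cdn.example/lib.js",
]
counts = count_gdpr_claims(requests_seen)
```

Requests counted under the key `"0"` are those claiming that GDPR does not apply. The count alone says nothing about whether the claim is justified.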
Extremely tiresome cases — During our semi-automatic crawl, we manually refused consent. Some banners were extremely hard to configure. For instance, the one on rtl.fr displays 8 purposes separated by lists of hundreds of vendors, making it hard to locate the button to disable each purpose. Furthermore, each vendor in each list is pre-ticked, making it extremely tiresome to disable each of them.
Ambiguity of unticked options — Some banners, e.g. Quantcast’s banner on sciencesetavenir.fr, show unticked options when parameters are opened. However, a negative consent is set upon saving, while a positive consent is set if the user accepts without opening the parameters. This can mislead users into thinking they have nothing to do to set a negative consent, while they actually have to open the parameters to do so.
No choice before acceptance — Some banners, e.g. Evidon’s banner on ticketweb.co.uk, only give the option to define consent preferences after the user has accepted tracking: the banner only displays an “accept” button, and reveals the parameters button once this accept button has been clicked.
No implementation — Some websites display a banner of one of the TCF-affiliated CMPs, but do not implement elements from the specification. For instance, dominos.fr displays a classical OneTrust banner, but does not provide a __cmp() function nor a __cmpLocator iframe. We cannot detect these cases in our automatic crawl.
Wrong CMP ID — We observe the following incorrect CMP IDs in consent strings: 1, 0 and 4095 (resp. 155, 45 and 3 websites). As of September 2nd 2019, identifiers in IAB Europe’s public CMP list range from 2 to 265. IAB Europe stated that CMP ID 1 is incorrect and should not be used, which indicates that this is clearly a violation of the TCF. While some CMPs always return a consent string containing an invalid CMP ID, others only do so before users give their consent, e.g. Conversant Europe on inc.com.
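The CMP ID check can be sketched as below. This is an illustrative sketch rather than the tool used in this study: it assumes the TCF v1.1 consent string is a base64url-encoded bit field whose header has the field widths listed in `HEADER_FIELDS`, and it approximates the public CMP list by the range 2 to 265 mentioned above (not every identifier in that range is necessarily assigned). The function names and sample values are hypothetical.

```python
import base64

# Header fields of a TCF v1.1 consent string and their widths in bits,
# in order. This layout is an assumption taken from the TCF v1.1 format.
HEADER_FIELDS = [
    ("version", 6), ("created", 36), ("last_updated", 36),
    ("cmp_id", 12), ("cmp_version", 12), ("consent_screen", 6),
    ("consent_language", 12), ("vendor_list_version", 12),
    ("purposes_allowed", 24), ("max_vendor_id", 16), ("encoding_type", 1),
]


def encode_header(values):
    """Build a header-only consent string (used here to create test data)."""
    bits = "".join(format(values[name], f"0{width}b") for name, width in HEADER_FIELDS)
    bits += "0" * (-len(bits) % 8)  # pad to a whole number of bytes
    raw = int(bits, 2).to_bytes(len(bits) // 8, "big")
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


def decode_header(consent_string):
    """Decode the header fields of a consent string into integers."""
    padded = consent_string + "=" * (-len(consent_string) % 4)
    raw = base64.urlsafe_b64decode(padded)
    bits = "".join(f"{byte:08b}" for byte in raw)
    decoded, pos = {}, 0
    for name, width in HEADER_FIELDS:
        decoded[name] = int(bits[pos:pos + width], 2)
        pos += width
    return decoded


def cmp_id_in_listed_range(cmp_id, low=2, high=265):
    """Approximate validity check: is the ID within the public list's range?"""
    return low <= cmp_id <= high


# Hypothetical header with the incorrect CMP ID 1.
sample_values = {
    "version": 1, "created": 15000000000, "last_updated": 15000000000,
    "cmp_id": 1, "cmp_version": 1, "consent_screen": 0,
    "consent_language": 0, "vendor_list_version": 150,
    "purposes_allowed": 0xF80000, "max_vendor_id": 670, "encoding_type": 0,
}
decoded = decode_header(encode_header(sample_values))
```

The 12-bit width of the CMP ID field is consistent with 4095 being the largest value observed, since it is the maximum a 12-bit field can hold.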
Broken banner – On 6 websites, we observe banners on which either refusing or accessing consent is not possible due to a bug, e.g. olympia.ie.
Consent to nonexistent vendors — Some CMPs set consent for nonexistent vendors in the consent string. For instance, the CMP on mycanal.fr sets vendor IDs from 1 to 2000, even though vendor identifiers go up to a maximum of 670 in the GVL as of 09.2019. We observe this issue on 114 websites in the semi-automatic crawl (20.4% of websites).
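Both this check and the check on undefined purposes mentioned earlier in this appendix can be sketched on already-decoded consent string fields. The sketch below is a hedged illustration: it assumes the purposes field is a 24-bit integer whose most significant bit stands for purpose 1, and it uses the maximum vendor ID declared in the string as a proxy for the vendors it can reference. The constants come from the text (5 defined purposes, GVL maximum of 670 as of 09.2019); the function names are hypothetical.

```python
DEFINED_PURPOSES = 5      # purposes defined in the TCF (Appendix A)
GVL_MAX_VENDOR_ID = 670   # highest vendor ID in the GVL as of 09.2019


def purposes_from_bits(purposes_allowed, width=24):
    """List the purpose numbers set in the purposes bit field.

    Assumes the most significant of the `width` bits is purpose 1.
    """
    return [i for i in range(1, width + 1)
            if (purposes_allowed >> (width - i)) & 1]


def audit_consent_fields(purposes_allowed, max_vendor_id):
    """Flag purposes and vendor IDs that do not exist in the framework."""
    purposes = purposes_from_bits(purposes_allowed)
    return {
        "undefined_purposes": [p for p in purposes if p > DEFINED_PURPOSES],
        "vendor_id_beyond_gvl": max_vendor_id > GVL_MAX_VENDOR_ID,
    }


# Hypothetical values: all 24 purpose bits set, vendor IDs up to 2000.
report = audit_consent_fields(0xFFFFFF, 2000)
# A string restricted to the 5 defined purposes and to GVL vendors.
clean_report = audit_consent_fields(0xF80000, 670)
```

A string that sets all 24 purpose bits and declares vendor IDs up to 2000 is flagged on both counts, while one limited to purposes 1 to 5 and vendor IDs up to 670 is not.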
HTTP only — 95 (7.2%) TCF-websites only provide an HTTP access. It is worrisome that websites using tracking technologies do so on an unencrypted connection.
Unusual consent verification — While monitoring consent verification made by third parties (using browser extensions to override the __cmp() function to catch direct calls, monitor postMessages, GET and POST queries), we observe third parties not registered in the TCF doing so. We separate the case where third parties are trackers according to the Disconnect list. We observe at least one tracker unregistered in the TCF querying the CMP to obtain consent in 43.9% of websites, and at least one third-party unregistered in the TCF querying the CMP to obtain consent in 55.1% of websites. It is unclear why vendors would verify consent if they’re not registered to the framework.