
Beyond content analysis: Detecting targeted ads via distributed counting

Being able to check whether an online advertisement has been targeted is essential for resolving privacy controversies and for putting data protection regulations like GDPR, CCPA, and COPPA into practice. In this paper we describe the design, implementation, and deployment of an advertisement auditing system called eyeWnder that uses crowdsourcing to reveal in real time whether a display advertisement has been targeted or not. Crowdsourcing simplifies the detection of targeted advertising, but requires reporting to a central repository the impressions seen by different users, thereby jeopardising their privacy. We break this deadlock with a privacy-preserving data sharing protocol that allows eyeWnder to compute the global statistics required to detect targeting, while keeping the advertisements seen by individual users and their browsing history private. We conduct a simulation study to explore the effect of different parameters and a live validation to demonstrate the accuracy of our approach. Unlike previous solutions, eyeWnder can even detect indirect targeting, i.e., marketing campaigns that promote a product or service whose description bears no semantic overlap with its targeted audience.



1. Introduction

Targeted advertising offers the possibility of delivering tailored advertisements (in the following, ads) to users based on their interests and demographic properties. It has helped the advertising industry reach high growth rates and revenues (23.2% growth and $23.9B in Q1 '18 in the US alone) (Bureau, 2018), has created many jobs, subsidized the delivery of free services, and funded a great deal of digital innovation. Of course, to deliver targeted ads, so-called AdTech companies need to detect users' interests and intentions, which is done by monitoring visited pages, search terms, social network activity, etc.

In principle, the concept of targeted (or personalized) advertising appears benign: offering consumers products and services that they truly care about, instead of irrelevant ones that distract or annoy. It is in its implementation and actual use that controversies start arising. For example, tracking should respect fundamental data protection rights of people, such as their desire to opt out, and should steer clear of sensitive personal-data categories, such as health, political beliefs, religion, or sexual orientation, protected by data protection laws like GDPR (European Commission, 2018) in Europe and the California Consumer Privacy Act (CCPA) (State of California, 2018) in the US. Similarly, sensitive demographic groups, like children, should be protected from data collection and targeting, as mandated, for example, by the FTC's COPPA (cop, 1998) regulation in the US. Unfortunately, this is not always the case, as made evident by the continuous presence of the topic in the news and public debates (Brooks, 2017; Rody-Mantha, 2017), or by the conducted investigations and imposed fines (Schneiderman, 2016). A direct consequence of the concern around privacy is the rise in popularity of anti-tracking and ad-blocking tools (adblockplus.org, 2019; adguard.com, 2019; ghostery.com, 2019; noscript.net, 2019; eff.org, 2018). This surge of software can choke the web of its advertising revenues. To avoid a Tragedy of the Commons triggered by eroding privacy (Laoutaris, 2018), companies offering web services need to win back the trust of their users. An important step in this direction would be to provide users with the ability to single out and report ads that violate privacy norms and laws.

The networking, measurements, and distributed systems community has been active in the development of a new breed of transparency tools (Laoutaris, 2018) for end users, data protection authorities, and the advertising sector's own self-regulation initiatives. Initial efforts went into detecting online price discrimination (Mikians et al., 2012, 2013; Hannak et al., 2014; Iordanou et al., 2017), followed by tools for detecting targeted advertising (Carrascosa et al., 2015; Lecuyer et al., 2014, 2015; Parra-Arnau et al., 2017; Liu et al., 2013; Venkatadri et al., 2018). Having the ability to verify the déjà vu feeling arising when running into seemingly familiar offerings is important for user empowerment and goes beyond mere curiosity. For example, it can allow a user to check whether marketers have respected his opt-out signals, expressed through Do-Not-Track (EFF, 2019), AdChoices (Alliance, 2019), or any of the numerous self-regulation initiatives put forth by the advertising sector. Moreover, if a user runs into an ad related to a sensitive, and hence protected, data category, he should be able to verify whether it has been targeted, or is appearing for other reasons. The latter is not always trivial, as with the so-called "re-targeted" ads displaying products and services visited by the user in the recent past. Behavioral targeting can be more general and, for example, display to a user a pair of sneakers of brand B' at store S' because the user viewed a pair of sneakers of brand B at store S. A pair of jeans may then be displayed due to the previously viewed sneakers. Detecting such indirect targeting quickly becomes a guessing game, and is, therefore, of little use to the monitoring and enforcement of data protection laws.

Early work on detecting targeting has employed artificially created "personas", i.e., browsers scripted to visit certain pages that allude to clear demographic types (Carrascosa et al., 2015; Lecuyer et al., 2014, 2015; Bashir et al., 2016), as "bait" for measuring whether ad delivery channels target these demographic types. In the "offline" version of the same approach, researchers have looked at passively collected web click-streams to detect correlations between the pages visited by real users and the ads delivered to them (Wills and Tatar, 2012; Barford et al., 2014; Liu et al., 2013). While these studies have made important contributions, in general they have been designed to operate offline, at a small scale, or using simulated personas. In this paper, we aim to address the following question: "Can we detect ad targeting with real users, in real time, and at large scale?"

Our contributions:

In this paper, we propose a novel, scalable, and real-time targeted-ad detection approach, and implement it on real users' devices. Instead of using "personas" and automated bots to collect ads from artificial visits to pages, we rely on a custom protocol for collecting statistics about the actual ads encountered by real users while browsing online. We show that a surprisingly simple count-based heuristic can detect targeting with high precision. This simple heuristic is based on the observation that targeted ads tend to "follow" specific users across multiple domains, while being seen by relatively fewer users than non-targeted ones. The heuristic can be computed in real time and in a scalable fashion. Furthermore, it is agnostic to how users' information was collected, and to how impressions were auctioned and delivered. This "black box" approach only looks for correlations between users and advertisements, and is, thus, robust to detection countermeasures (Bashir et al., 2018). Evading it would require eliminating such correlations, which goes against the spirit and the essence of targeted advertising.

Crowdsourcing is a powerful tool for detecting targeted ads, but it requires users to report the ads they encounter on different websites. The second technical contribution of our work is a protocol for exchanging this information in a privacy-preserving manner. We leverage a wealth of previous work on privacy-preserving aggregate statistic computation (Castelluccia et al., 2009; Melis et al., 2016; Kursawe et al., 2011) to compute the aggregate statistics required by the ad detection algorithm, while keeping the ads seen by users (note that targeted ads can reveal user interests (Castelluccia et al., 2012)) and their browsing history private. In particular, we design a privacy-preserving protocol to compute the distribution of ads seen by users. Different from previous work in this area, we face the challenge of protecting not only the distribution of the ads, but the ads themselves. This problem was recently framed by (Fanti et al., 2016) as the problem of estimating "unknown unknowns". The techniques presented in (Fanti et al., 2016) are based on differential privacy and require clients to report their "real" distribution as well as distributions computed using n-grams of the labels (in our case, ad URLs). As such, their technique fits scenarios where labels are short and not random. We take a different approach and propose a technique that is less involved and works with any label size.

We have implemented our count-based heuristic and the privacy-preserving protocol to support it in a distributed system that we call eyeWnder. The third contribution of our work is the deployment, validation, and measurement study executed using eyeWnder.

Our findings: We have been operating eyeWnder live for more than one year with close to 1000 users in order to get feedback on matters of user experience and desired features. This user base includes 100 paid volunteers from FigureEight (FigureEight, 2019) who have agreed to lend their data to our validation efforts. We have also conducted extensive simulation studies in which we could control how advertisements are displayed. Our findings are as follows:

- Our simulations show that it only takes 6 to 7 repetitions of an ad to make it detectable by eyeWnder. Even with such a low repetition frequency, our algorithm achieves a false negative rate of less than 30%. Tuning the algorithm can bring the false negative rate below 10%, at the expense of requiring around 5 more repetitions. More importantly, false positives, i.e., non-targeted ads classified as targeted, are typically close to zero, and reach up to 2% only in the most extreme corner scenario that we have evaluated. Having very low false positives means that when eyeWnder classifies an ad as targeted, it most probably is. This is fundamental if the tool is to be used for reporting suspected data protection violations.

- The above simulations are aligned with the results from our live validation with real users and advertising campaigns that are not under our control. Count-based ad detection, even with as few as 100 users, yields high precision, with true positive and true negative rates of 78% and 87%, respectively.

- Privacy-preserving crowdsourcing makes it possible to compute the aggregate statistics required by the ad detection mechanism while keeping user browsing histories and received ads private. In particular, the privacy-preserving protocol has a negligible effect on the quality of the computed statistics.

- We have detected several examples of indirect targeting that existing content-based techniques (i.e., semantic overlap between the user profile and the received ad) are unable to detect, as well as signs of advertising bias towards different demographic traits.

- We make our plugin available via the Chrome Store, for wide testing and use by end-users, privacy researchers, and auditors (http://www.eyewnder.com/views/index).

2. Background and Requirements

2.1. Types of online ads

In the rest of the paper, we will refer to ads as either targeted or non-targeted. Furthermore, we will distinguish targeted ads into directly and indirectly targeted ones.

Targeted vs. Non-targeted ads: Targeted ads are selected based on data about the user visiting a website. The exact algorithms used are trade secrets of the different AdTech companies, but it is widely accepted that such algorithms rely upon user demographic information (gender and age), geo-location information (GPS coordinates, IP address, or base-station id), and behavioral information extracted from user online activity (websites visited, search terms, social network activity, etc.). The form of targeted advertising based on behavioral attributes is referred to as Online Behavioral Advertising (OBA), whereas ads regarding a previously visited webpage offering products or services (e.g., a website offering hotels) are referred to as Retargeting (Bashir et al., 2016).

Non-targeted ads are shown irrespective of the user visiting the website. This class of ads includes static ads (shown to all users visiting a website based on a private deal between advertisers and publishers), as well as contextual ads (ads matching the context/topic of the website, e.g., a sports ad appearing on a sports website).

Direct vs. Indirect targeting: Direct targeting is the most obvious form of targeting. In this case, an advertiser interested in selling products of a certain category (e.g., fishing products) targets users tagged as interested in that category (i.e., fishing). Most previous work has focused on analyzing this type of targeting (Carrascosa et al., 2015; Lecuyer et al., 2014, 2015).

Indirect targeting occurs when marketers target a certain group of users with offerings that have no direct semantic overlap with the category this group has been tagged with. For instance, it has been reported that fans of the Walking Dead TV series were targeted with pro-Donald Trump material (Bradshaw, 2016). This is an example of indirect targeting, since the targeted user group (Walking Dead fans) has no immediate semantic overlap with the advertised offering. To the best of our knowledge, eyeWnder is the first proposal that tackles indirectly targeted ads.

2.2. Requirements

Next, we enumerate the most important requirements of a system for real-time detection of targeted ads. These requirements have emerged from reviewing the limitations of existing approaches found in the literature. In Section 9 and Table 3 we provide an extensive comparison between our solution and existing proposals.

Generality: The detection mechanism should be able to analyze any web-based display ad. It should not be limited to a specific ecosystem (e.g., Facebook advertising), where more information about the user or the ad may be available, or to a specific platform (e.g., the AdChoices initiative (Alliance, 2019)). Moreover, detection should work independently of the tracking mechanisms used for collecting user data, interests, and intentions (be it via cookies, browser fingerprinting, or user-contributed information such as searches or social network activity).

Precision: The mechanism should allow untrained users to audit, and potentially report, offending ads. Therefore, it is essential to have high detection precision in terms of True Positives (TP) and True Negatives (TN). Validation should be carried out using publicly available data, i.e., without requiring special access to silo-ed data from AdTech delivery channels, Telcos, marketing campaigns, or other gatekeepers of such information that, generally speaking, have no incentive to make it public.

Simplicity: The detection method should be as simple as possible, so that it can be implemented as a distributed system with most of the functionality and code running on the end-user device (browser).

Real time operation: A user should be able to request auditing of a particular ad appearing in his browser, and the system should respond within at most a few seconds.

Scalability: The detection method should be able to handle a large number of users (on the order of tens of thousands) without special requirements on the back-end, including CPU load, memory, storage, and bandwidth consumption. Resource consumption must be limited to "control plane" rather than "data plane" tasks, to make the method scale.

Ad-fraud avoidance: Unlike previous work (Carrascosa et al., 2015; Lecuyer et al., 2014, 2015), the method should avoid fake visits to pages, since fake visits contribute to fake ad impressions and click-fraud.

Detection of indirect targeting: The detection method should be able to detect both direct and indirect targeting.

User privacy protection: The method should not jeopardize the privacy of end-users by, e.g., requiring them to share sensitive data such as their browsing history or ads seen.

3. Ethical considerations

We have obtained ethical approval from our institutions and the funding agencies to conduct this research. Moreover, we have obtained explicit consent from users, before installing our extension through the FigureEight platform interface (FigureEight, 2019), to collect and process the anonymous data used in this paper.

4. A Count-based algorithm

In this section, we describe a count-based algorithm for detecting targeted ads using only frequency counts of impressions seen by users across different domains. The algorithm is inspired by simple observations on how targeted ads behave, namely 1) targeted ads tend to “follow” targeted users across multiple domains, and 2) targeted ads are seen by relatively fewer users than non-targeted ads.

4.1. Algorithm description

Our algorithm is simple and is based on the above remarks. By observation (1), if a given ad $a$ is targeting a specific user $u$, that user is likely to encounter $a$ across multiple domains. Therefore, the algorithm counts the number of different domains ($\#Dom_u(a)$) where user $u$ has seen $a$, and labels the ad as targeted if that number crosses a threshold that we call $\#Dom_{th}$. Similarly, by observation (2), if $a$ is targeting $u$, most likely very few other users will see $a$ during their browsing activity, i.e., only users that share similar interests with $u$. Therefore, the algorithm counts the number of different users ($\#Users(a)$) that have seen $a$, and labels $a$ as targeted if that number is below the $\#Users_{th}$ threshold. Note that the algorithm annotates an ad as targeted only if both conditions hold.

Given an ad $a$, the number of domains where a user $u$ has seen $a$, along with the corresponding threshold $\#Dom_{th}$, depend only on user $u$ and can, thus, be computed locally. On the other hand, computing the number of different users that have seen $a$, as well as the $\#Users_{th}$ threshold, requires a global view of the system.
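To make the classification rule concrete, the following is a minimal sketch in Python; the container names (domains_seen_by_me, global_user_counts) are illustrative stand-ins rather than eyeWnder's actual code, and the mean-based thresholds anticipate the choice discussed in Section 4.2. In the deployed system, the global per-ad user counts come from the privacy-preserving protocol of Section 6, not from a cleartext dictionary.

```python
# Minimal sketch of the count-based heuristic of Section 4.1 (illustrative).
from statistics import mean

def is_targeted(ad_id, domains_seen_by_me, global_user_counts):
    """Label an ad as targeted iff BOTH conditions hold:
    (1) #Dom_u(a)  > #Dom_th   (the ad "follows" me across domains), and
    (2) #Users(a)  < #Users_th (few other users have seen it)."""
    # Local condition, computable inside the browser.
    num_dom = len(domains_seen_by_me[ad_id])
    dom_th = mean(len(doms) for doms in domains_seen_by_me.values())

    # Global condition, supplied by the back-end (Section 6).
    num_users = global_user_counts[ad_id]
    users_th = mean(global_user_counts.values())

    return num_dom > dom_th and num_users < users_th
```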

4.2. Algorithm details

Threshold estimation:

A fundamental design choice of our algorithm is how to compute the $\#Dom_{th}$ and $\#Users_{th}$ thresholds. We empirically evaluated different options based on several moments of the distributions (the mean, the median, the standard deviation, and possible combinations thereof). We eventually settled for the mean of the distribution, since it offered the best trade-off between accuracy and the data we require from our users. For the sake of clarity, we do not discuss all the alternatives we have considered but, in Section 7.2.3, we demonstrate the performance difference for a few of these options and show why we settled for the mean.

In order to be able to set the threshold, we require a minimum amount of information from users. Similar to (Silverman, 1986), we require that users have visited at least 4 domains that serve ads within the last 7 days. If this minimum requirement is not met, our algorithm refrains from making a guess for lack of sufficient data.

Time-window selection: Our algorithm operates in time intervals in order to update the thresholds, since users can continuously receive and report ads during their normal browsing sessions. We select a time window of one week based on the following observations. First, users tend to browse differently during weekdays and weekends (Boughareb and Farah, 2011). Therefore, we consider a week as a natural time period where both weekdays and weekends are captured, allowing us to study users' behavior on both types of days. Second, we consider how targeted ads behave: they aggressively follow the user for a few days and gradually fade out over time. Thus, a window of seven days is sufficiently large to capture targeting and allows us to collect enough historical information for the algorithm to operate. In fact, we directly contacted 4 large DSP platforms and confirmed that the majority of the ad campaigns they serve last a week or more. Note that the $\#Dom_{th}$ threshold is per individual user and can, thus, be updated in real time within each user's browser.

5. The eyeWnder system

In this section, we describe the high level components of the system and the information flows between them.

The high level architecture of eyeWnder is depicted in Figure 1. The system consists of four components: the browser extension instances, a back-end server, a centralized database, and a crawler server.

Figure 1. Architecture overview of the eyeWnder system and the information flow between entities.

Browser extension: The extension performs the following functions: (1) Collects information about the ads rendered to the user. (2) Reports information to the back-end server (Figure 1, arrow 1) through the privacy-preserving protocol of Section 6. (3) Classifies ads as targeted or non-targeted, by leveraging the algorithm of Section 4.1.

The actual algorithm that classifies ads as targeted vs. non-targeted consists of just a few lines of JavaScript code. Identifying and collecting info about ads is more involved. Indeed, different ad delivery channels use a multitude of techniques for delivering display ads. Several of them go to great lengths to make programmatic detection of their code difficult, in an attempt to evade ad-blocking software. Our extension runs an ad-detection algorithm to automatically detect ads within a page, and a landing-page detection algorithm to infer the pages where ads lead once clicked. Our ad-detection algorithm is similar to the one of AdBlockPlus (adblockplus.org, 2019). Our goal, however, is just to analyze an ad, not to block it.

In order to avoid click-fraud, the landing-page detection algorithm does not click on the ad, but rather applies a series of heuristics to discover the landing URL. In particular, the algorithm examines <a> HTML tags to extract the URL from the href field; alternatively, the algorithm looks for onclick events and extracts the URL if one exists (in some cases, the onclick event invokes a JavaScript function instead of a URL redirection). In case of JavaScript code, we run a regex to detect any URL-like strings within the script text. In all the above cases, if the detection algorithm finds a URL that does not belong to well-known ad networks, we consider the URL to be the ad landing page. Otherwise, we refrain from resolving the URL in order to avoid click-fraud. A similar technique was used in (Javaid et al., 2017; Parra-Arnau et al., 2017). In the case of ads with randomized landing page URLs (e.g., malicious ads (Thomas et al., 2015) or customized dynamic ads (Amazon Technologies, 2016)), we use the ad content (i.e., the image URL, etc.) to uniquely identify the same advertisement across different impressions. To identify ad networks that use randomized landing page URLs we use the methodology described in (Butkiewicz et al., 2015).
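The following Python sketch illustrates the spirit of these heuristics; the real logic runs as in-browser JavaScript, and the regexes and the list of ad-network domains are simplified placeholders, not the extension's actual rules.

```python
# Rough sketch of the landing-page detection heuristics (illustrative).
import re

URL_RE = re.compile(r'https?://[^\s"\'<>)]+')
KNOWN_AD_NETWORKS = ("doubleclick.net", "googlesyndication.com")  # placeholder list

def find_landing_url(ad_html):
    # 1) Prefer the href of an <a> tag wrapping the ad.
    m = re.search(r'<a[^>]+href=["\']([^"\']+)["\']', ad_html, re.I)
    candidates = [m.group(1)] if m else []
    # 2) Otherwise, look for URL-like strings inside onclick handlers.
    for onclick in re.findall(r'onclick=["\']([^"\']+)["\']', ad_html, re.I):
        candidates += URL_RE.findall(onclick)
    # Accept the first URL that does not point to a well-known ad network;
    # never resolve (i.e., click) it, to avoid click-fraud.
    for url in candidates:
        if not any(net in url for net in KNOWN_AD_NETWORKS):
            return url
    return None  # refrain from guessing
```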

Back-end server: The server maintains the $\#Users$ counters and computes the corresponding $\#Users_{th}$ threshold. By leveraging the privacy-preserving protocol of Section 6, the server receives blinded reports from the extensions, aggregates them, and extracts an estimate of the number of users that have seen each ad, thereby computing the $\#Users$ counters. The server also computes the threshold by applying the methodology of Section 4.2; the computed threshold is then distributed to clients (Figure 1, arrow 5).

Database: We use MySQL to store system metadata, such as active users within the system, historic anonymized data reported by users, etc. (Figure 1, arrow 2). We also store aggregated data that we need for evaluation purposes.

Crawler server: This component is used only for evaluation purposes, and is responsible for collecting ad-related data on specific webpages. The crawler is controlled by the back-end server and can be instructed to visit specific websites upon request (Figure 1, arrow 3). Specifically, the crawler server visits audited pages to collect ads with a clean browsing profile (empty browser cache and an empty set of cookies). These ads are then used for deciding whether eyeWnder has indeed accurately classified an ad as targeted (in which case the crawler should not encounter it during a visit). The crawler can launch multiple instances of a clean-profile browser with the eyeWnder extension installed, and store the detected ads directly into the database (Figure 1, arrow 4).

6. Privacy Preserving Protocol

In this section, we detail a privacy-preserving protocol that allows the back-end server to compute the $\#Users$ counters while clients keep the ads they have seen private.

We leverage techniques used in many proposals for privacy-preserving aggregated statistics (Castelluccia et al., 2009; Melis et al., 2016; Kursawe et al., 2011). The basic idea is that each user blinds his report before sending it to the server. Blinding factors are agreed upon by all users and are such that, if the server aggregates all reports, the blinding cancels out and the aggregate statistic (e.g., the sum of all reports) becomes available to the server. In a nutshell, one can think of the blinding factors as additive random shares of $0$. If users report a vector of values, they should compute separate blindings for each position of the vector. One fundamental assumption underlying the design just described is that all parties in the system (i.e., users and the server) can enumerate the whole set of elements to be reported. In our scenario, this set—we denote it by $\mathcal{A}$—includes all ads seen by at least one of our users. Therefore, its size may be large and, most importantly, users may not be able to enumerate it. For example, user Alice may not know what ads have been seen by user Bob. In such settings, one could use synopsis data structures for multi-sets that admit aggregation, for example count-min sketches (Graham and Muthukrishnan, 2005) (CMS) or spectral bloom filters (Cohen and Matias, 2003). In this work, we use CMS as they allow us to bound the probability of error, as well as the error itself.

6.1. Count-min-sketch

A CMS $X$ is a bi-dimensional array with $d$ rows and $w$ columns. All the cells of the sketch are initialized to 0, and $d$ pairwise-independent hash functions $h_1, \dots, h_d$ are chosen.

The encoding of an element $j$ is done by calling $X.update(j)$, which increments $X[i, h_i(j)]$ by 1, for $1 \le i \le d$.

The estimated frequency of element $j$ is retrieved by calling $X.query(j)$, which outputs $\hat{a}_j$ such that:

(1)   $a_j \le \hat{a}_j \le a_j + \varepsilon \cdot n$   with probability at least $1 - \delta$,

where $a_j$ is the true frequency of element $j$, $n$ is the total number of elements to be counted, and the accuracy parameters $\varepsilon, \delta$ determine the sketch dimensions as $w = \lceil e/\varepsilon \rceil$ and $d = \lceil \ln(1/\delta) \rceil$.
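A compact, runnable illustration of this data structure follows; it is a toy sketch, assuming SHA-256-derived hashes as a stand-in for the pairwise-independent hash functions required above.

```python
# Toy count-min-sketch with the standard dimensions w = ceil(e/eps),
# d = ceil(ln(1/delta)). Hashing is illustrative, not pairwise-independent.
import hashlib
import math

class CMS:
    def __init__(self, eps, delta):
        self.w = math.ceil(math.e / eps)
        self.d = math.ceil(math.log(1 / delta))
        self.table = [[0] * self.w for _ in range(self.d)]

    def _h(self, i, x):
        digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.w

    def update(self, x):
        for i in range(self.d):
            self.table[i][self._h(i, x)] += 1

    def query(self, x):
        # May overestimate by at most eps * n (w.p. >= 1 - delta);
        # never underestimates.
        return min(self.table[i][self._h(i, x)] for i in range(self.d))

cms = CMS(eps=0.01, delta=0.01)
for ad in ["ad-1", "ad-2", "ad-1"]:
    cms.update(ad)
assert cms.query("ad-1") >= 2
```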

In eyeWnder, each user encodes the set of ads he has seen in a CMS data structure, and blinds each cell before sending it to the server. The server aggregates all CMSes so that all blindings cancel out, and obtains the aggregate CMS encoding the multi-set of ads seen across all users. Since users share an additive random share of $0$ for each cell of the CMS, this design provides a privacy-preserving aggregate statistics framework for scenarios where the set of elements to be counted is (a) large and (b) not enumerable by each party.

This design is similar to the one shown in (Melis et al., 2016), which, in turn, extends techniques presented in (Kursawe et al., 2011). However, the clients in (Melis et al., 2016) can enumerate the set of elements to be reported and, therefore, (Melis et al., 2016) addresses a less challenging scenario.

We also face an additional challenge that, to the best of our knowledge, has not been addressed by existing privacy-preserving aggregate statistic techniques. Data structures like spectral bloom filters or CMSes allow querying for the estimated frequency of a given element $j$. That is, the querier (i.e., our server) must enumerate all the ads encoded in the aggregate CMS in order to learn the distribution of ads as seen by users. Of course, the server cannot do so, and our privacy goal forbids clients from sending the URLs of the ads to the server. We overcome this problem as follows.

We map the URL of an ad to an ad ID in an enumerable set by means of a pseudo-random function (PRF). The latter is keyed to prevent the server from computing the mapping on its own. In particular, given an ad URL $u$, its ad ID is computed as $id_u = F_k(u)$, where $F$ is a PRF and $k$ is a cryptographic key. For each ad seen by a user, the extension computes the corresponding ad ID and encodes it in the CMS. Note that, without knowledge of $k$, it is not possible to relate an ad URL $u$ to its identifier $id_u$.

Rather than hard-coding the key $k$ in the extension, we borrow from previous research on Oblivious Pseudo-Random Functions (OPRF) (Freedman et al., 2005) and introduce an additional server to help clients map ad URLs to ad IDs. The oprf-server holds the secret key $k$ and aids clients in computing $id_u = F_k(u)$ for a given ad URL $u$. As the name suggests, the server is "oblivious" to the input of the PRF, so that $u$ remains private to the user. (We note that, in order to avoid a single point of failure, the mapping function can be distributed to multiple servers by defining $F_k$ as the XOR of the outputs of multiple OPRFs, each computed with its own secret key.) While such a design choice requires an additional server, we note that previous work uses a similar approach in order to improve the overall security of an application (Everspaugh et al., 2015; Keelveedhi et al., 2013). In a real-world deployment, the oprf-server may be instantiated by already-deployed trusted third parties such as certification authorities, the EFF, etc.

In practice, we have to (over)estimate the number of distinct ads in order to minimize collisions when mapping an ad URL to an ad ID. However, by overestimating it, the server is likely to query the CMS for ad IDs that correspond to none of the encoded ads (i.e., a false positive). Nevertheless, we show later that a CMS is robust to false positives by design.

In the following, we provide details of the protocol.

Blinding factors: We borrow from Kursawe et al. (Kursawe et al., 2011), who have shown how a set of users can agree on random shares of $0$. In particular, let $n$ be the number of users and $m$ be the number of elements to be blinded (e.g., the number of cells in a CMS). Also, denote by $x_i$ and $y_i = g^{x_i}$ the private and public key of user $i$, respectively. Here, $g$ is a generator of a cyclic group of order $q$ where Computational Diffie-Hellman is hard. Assume that the public key of each user is available to all other users in the system, e.g., by means of a public bulletin board like an online forum (the board may as well be hosted at the back-end server). At round $r$, user $i$ generates a blinding factor for the $k$-th cell as:

$b_{i,k} = \sum_{j \ne i} s_{ij} \cdot H(y_j^{x_i} \,\|\, r \,\|\, k)$,

where $H$ is a cryptographic hash function and $s_{ij}$ is equal to $1$ if $i < j$, or to $-1$ otherwise. Note that each user can locally compute his blinding factors by simply using the public keys of all other users. Note also that, for any $k$, we have $\sum_{i=1}^{n} b_{i,k} = 0$, i.e., $b_{i,k}$ is an additive random share of $0$.
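The cancellation property is easy to check in code. The sketch below uses a toy multiplicative group and SHA-256 in place of a properly sized group and hash; it only illustrates why the pairwise signs $s_{ij}$ make all blindings sum to zero.

```python
# Toy sketch of pairwise-DH blinding factors (in the style of Kursawe et al.,
# 2011). Parameters are NOT cryptographically sized; for illustration only.
import hashlib

P = 2**127 - 1   # toy prime modulus
G = 3            # toy generator

def blinding(my_id, my_priv, pubkeys, rnd, cell):
    """Blinding factor of user `my_id` for cell `cell` at round `rnd`."""
    b = 0
    for uid, pub in pubkeys.items():
        if uid == my_id:
            continue
        sign = 1 if my_id < uid else -1         # s_ij = +1 if i < j, else -1
        seed = pow(pub, my_priv, P)             # shared secret g^(x_i * x_j)
        h = hashlib.sha256(f"{seed}:{rnd}:{cell}".encode()).digest()
        b += sign * int.from_bytes(h, "big")
    return b

privs = {0: 11, 1: 22, 2: 33}
pubs = {uid: pow(G, x, P) for uid, x in privs.items()}
# Each pair's hash value enters twice with opposite signs, so the sum is 0:
assert sum(blinding(u, privs[u], pubs, rnd=1, cell=0) for u in privs) == 0
```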

OPRF: We leverage the RSA-based OPRF of (Jarecki and Liu, 2009). Given an RSA triple $(N, e, d)$, where $N$ is the product of two distinct primes of sufficient length and $e, d$ are such that $e \cdot d \equiv 1 \pmod{\phi(N)}$, the PRF on input $x$ is defined as $F_d(x) = H'(H(x)^d \bmod N)$, where $H$ is a hash function mapping arbitrary strings to elements of $\mathbb{Z}_N^*$, and $H'$ is a hash function mapping elements of $\mathbb{Z}_N^*$ to strings of length $\ell$. The oprf-server generates the RSA triple (via a suitable key-generation algorithm) and publishes $(N, e)$ while keeping $d$ private. Given an ad URL $u$, the client issues a request as $a = H(u) \cdot r^e \bmod N$, for a random $r \in \mathbb{Z}_N^*$. That is, the client maps $u$ to the group and blinds the result by multiplying it with a random group element raised to the $e$-th power. The server "signs" the request by computing $b = a^d \bmod N$. Finally, the client recovers $H(u)^d = b \cdot r^{-1} \bmod N$ and outputs $id_u = H'(H(u)^d)$. Note that $b \cdot r^{-1} = H(u)^d$ since $r^{e \cdot d} \equiv r \pmod N$. The protocol guarantees that the server learns nothing about $u$, whereas the client learns nothing about $d$; the construction is a PRF under the one-more-RSA assumption (Jarecki and Liu, 2009).
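The blinded exchange can be traced end to end with small numbers. This is a toy walk-through, assuming tiny primes and a simplified hash-to-group; real deployments need full-size RSA moduli.

```python
# Toy walk-through of a blinded-RSA OPRF (in the style of Jarecki and Liu,
# 2009). NOT secure parameter sizes; for illustration only.
import hashlib
import math

p, q = 1009, 1013                      # toy primes
N, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))      # e*d = 1 mod phi(N)

def H(x):   # hash arbitrary strings into Z_N (simplified hash-to-group)
    return int.from_bytes(hashlib.sha256(x.encode()).digest(), "big") % N or 1

def Hp(y):  # hash group elements to fixed-length strings (the ad ID)
    return hashlib.sha256(str(y).encode()).hexdigest()

url, r = "http://example.com/ad", 12345
assert math.gcd(r, N) == 1

a = (H(url) * pow(r, e, N)) % N        # client: blind H(u) with r^e
b = pow(a, d, N)                       # server: "sign" with secret d
ad_id = Hp((b * pow(r, -1, N)) % N)    # client: unblind and derive the ID

assert ad_id == Hp(pow(H(url), d, N))  # equals the unblinded PRF value
```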

CMS computation: User $i$ starts with an empty CMS $X_i$. For each newly received ad $u$, the user engages in an OPRF protocol with the oprf-server to obtain the corresponding ad ID $id_u$, and encodes it in the CMS by calling $X_i.update(id_u)$. When asked to report its CMS, the user blinds each cell of $X_i$ with the blindings computed as shown above. That is, the client computes the blinded CMS $\hat{X}_i$ as $\hat{X}_i[k] = X_i[k] + b_{i,k}$, for $1 \le k \le m$, where $m$ is the number of cells in the CMS. Finally, the client sends $\hat{X}_i$ to the server.

Aggregation and unblinding: Finally, the server aggregates all received CMSes by computing $X = \sum_{i=1}^{n} \hat{X}_i$, where the sum is cell-wise, and, for each ad ID $id$, it obtains the estimated frequency by querying $X.query(id)$.
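Putting the pieces together, the following toy run shows clients blinding their sketches with additive shares of zero and the server recovering correct aggregate counts. Flat arrays stand in for full CMSes, and the shares are generated directly rather than via the pairwise-DH construction above.

```python
# End-to-end toy of report blinding and aggregation (illustrative).
import random

M = 8                                   # number of CMS cells (toy size)
clients = [[0] * M for _ in range(3)]
clients[0][5] = 1                       # clients 0 and 1 saw the same ad,
clients[1][5] = 1                       # whose ID hashes to cell 5
clients[2][2] = 1                       # client 2 saw a different ad

# Per-cell blindings summing to zero across clients (stand-in for the
# pairwise-DH shares).
blind = [[random.randrange(-10**9, 10**9) for _ in range(M)] for _ in range(2)]
blind.append([-(blind[0][k] + blind[1][k]) for k in range(M)])

# Each client reports only its blinded sketch; individual reports look random.
reports = [[c[k] + b[k] for k in range(M)] for c, b in zip(clients, blind)]

# Server: the cell-wise sum cancels all blindings.
aggregate = [sum(r[k] for r in reports) for k in range(M)]
assert aggregate[5] == 2 and aggregate[2] == 1
```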

Fault-tolerance: Our aggregation technique requires that all users report their blinded CMSes, so that the server can aggregate them and cancel out the blindings. If a user fails to report its CMS, aggregation of the remaining ones at the server results in a CMS with random noise in each of its cells. In order to tolerate missing reports, the server and the clients who have sent their reports must go through an additional round of interaction to “adjust” their blindings and cater for the blindings of the non-reporting clients (see  (Melis et al., 2016)). The protocol takes only two rounds—one where the server reports the list of “missing” clients and another round where the clients send their CMSes obfuscated with the updated blinding factors.

Security: The security of our scheme follows from the security of the protocol to compute blindings proposed in (Kursawe et al., 2011). We tweak that protocol by using an OPRF to map ad URLs to a set that is enumerable by the server. By the security provisions of the OPRF protocol, given an ad ID $id_u$, it is impossible to retrieve the original URL $u$ without knowledge of the key $k$ held by the oprf-server.

We stress that our protocol remains secure against an honest-but-curious server, an assumption common to many other systems that address privacy issues when computing statistics on crowdsourced data (Castelluccia et al., 2009; Melis et al., 2016; Kursawe et al., 2011).

7. Evaluation

In this section, we first evaluate the overhead of our privacy preserving protocol. Second, we present a controlled simulation study to assess the robustness of our algorithm and the effect of key parameters on its performance. Third, we present a live validation of the entire system with real ads and the 100 real users recruited via FigureEight (FigureEight, 2019).

7.1. Performance and overhead of the privacy preserving protocol

Figure 2. The effect of the privacy-preserving protocol on the computation of the $\#Users$ distribution and its threshold for three different weeks.

First, we assess the communication overhead due to the CMS and compare it with the average communication overhead if clients were to upload their contributions in cleartext. We fix the CMS accuracy parameters $\varepsilon$ and $\delta$, and assume the size of a cell in the CMS to be 4 bytes. The size in bytes of the CMS totals 185, 196, and 207KB, for an input size (i.e., the number of ads to be counted) of 10k, 50k, and 100k, respectively. If users were to report their contribution in cleartext, each user would simply report a vector of URLs. Since our dataset suggests that users see 35 unique ads on average, the communication overhead for the average user amounts to roughly 3.5KB (assuming 100-character URLs using Unicode encoding). Nevertheless, some users with a large number of ads to report can see their communication overhead rise to hundreds of KB (some users in our dataset reported around 250 unique ads).

Next, we assess the communication and computation overhead required to compute the blinding factors. Exchanged data between the server and a client amounts to 0.38MB and 1.9MB for 10k and 50k users, respectively. The computation time at the client totals 30 seconds for 1k users and a sketch of size 5k. Our results are in line with results reported in (Melis et al., 2016). We stress that both operations are carried out once per week and can run in the background.

Also, the time to map the URL of an ad to its ID, using the oprf-server, is in the millisecond range and requires exchanging two group elements (e.g., 1024 bits each). We stress that the mapping is done once per (unique) ad. It can be carried out as ads are received, and results can be stored locally so that they are available when the CMS must be computed.

Finally, we empirically show the effect of the privacy-preserving protocol on the computation of the $\#Users$ distribution and its threshold. Figure 2 compares the distribution computed with cleartext reports versus the distribution computed using the privacy-preserving protocol based on (blinded) CMSes. The figure shows the difference using data from three different weeks in our dataset. Furthermore, the figure reports the threshold value computed on cleartext data, and the threshold value computed on the outcome of the privacy-preserving protocol. The latter leads to a slightly higher threshold, due to the collisions that may happen when mapping ad URLs to ad identifiers.

To sum up, the privacy-preserving protocol does not impose an unbearable toll on users. The protocol to map ad URLs to identifiers is very lightweight, and the mapping is carried out as new ads are encountered. The protocol to report the CMS requires a few (i.e., 2 or 3) MB of data to be exchanged, assuming 50k users. This is done once per week, and the communication complexity scales linearly with the number of users. Further, the error on the estimated distribution due to the privacy-preserving protocol is small, as shown in Figure 2.

7.2. Controlled simulation study

The count-based algorithm of Section 4.1 and its automated parameter tuning described in Section 4.2 can lead to false negatives, i.e., targeted ads classified as non-targeted, and false positives, i.e., non-targeted ones classified as targeted. In this section, we study the circumstances and the frequency of such misclassifications via controlled simulation experiments. Of the two metrics, the more important one is the false positive rate. This has to do with a major use case for eyeWnder, which is to report illegal targeting, as explained in the introduction. Therefore, when eyeWnder classifies an ad as targeted, we want this to be correct with high probability, so that investigations are not triggered by mistake. On the other hand, failing to detect a targeted ad is relatively less important, since no investigation is launched in that case.

7.2.1. False negatives

As described earlier, our algorithm uses the fact that the same ad "follows" a user across multiple domains as an indication of targeting. But how intense should this following be before our algorithm has a chance to detect it? If a targeted ad appears only once, then it is certainly indistinguishable from non-targeted ads. At the other extreme, if it appears on all the pages visited by a user, then it obviously becomes easy to detect. Next, we study the effect of the Frequency Cap, i.e., the number of repetitions (or re-appearances) of a targeted ad, on the ability of our algorithm to detect and classify it correctly. Notice that this parameter is neither known to us nor uniform across advertisers, and therefore we study its effect in our simulation model by assigning different values to it. The Frequency Cap is used by advertisers to avoid annoying targeted users with too many repetitions of the same ad. In our validation study, appearing later in Section 7.3, we demonstrate that our system is indeed robust to the magnitude of values used in practice, even though they remain unknown to us.

7.2.2. False positives

The same ad can be encountered on multiple websites without being targeted. Such is the case, for example, of large-scale "brand awareness" campaigns paid for by large corporations to display their offerings on many mainstream and even niche websites without targeting any particular individual. Such ads will appear to be chasing a user across websites, when in reality they are not. Of course, our algorithm also checks that the ad is not seen by many other users of eyeWnder, but non-uniform user interests can lead to misclassification, as shown next.

7.2.3. Simulation results

We have built a custom simulator, based on (Burklen et al., 2005), capable of simulating users, websites, and ad campaigns. The main parameters of the simulator and the pre-set values for our basic configuration are depicted in Table 1.

Variable name Value
Number of users 500
Number of websites 1000
Average user visits 138
Average ads per website 20
Percentage of targeted ads 0.1
Table 1. Simulation configuration parameters
Figure 3. False Negatives % vs. Frequency Cap, using two different thresholds (Mean, Mean+Median) for both variables ($\#Dom_{th}$, $\#Users_{th}$).

Figure 3 shows that even a few repetitions of a targeted ad make it detectable by our algorithm with high probability. Setting the $\#Dom_{th}$ and $\#Users_{th}$ thresholds according to the Mean value of the corresponding counters (see Section 4.2) brings false negatives below 30% with just 6-7 repetitions of an ad. Using the Mean+Median value as threshold requires a higher number of repetitions for detection, but drops false negatives even further, to 10%. Our experiments with real users in the next section are in agreement with these results. In those experiments, we set the thresholds using the Mean, to make sure we can achieve detection even with a smaller number of repetitions of a targeted ad.

Regarding false positives, we have run several different simulations in which a subset of users visits a subset of sites that happen to be running large static campaigns. These users get to see the same ad on different websites not because they are being targeted, but because the ad happens to be on all the websites they visit. If the remaining users (the majority) visit different websites, then misclassification may appear for the initial subset of users. Still, this happens with probability below 2% in the more than 30 different parameter configurations that we have tried. This result is also validated by our experiments with real users. False positives can be further reduced by grouping users into more homogeneous groups in terms of browsing patterns (e.g., geographically or based on age group, etc.).

7.3. Live validation of eyeWnder

Figure 4. The evaluation tree for precision performance of the eyeWnder classification of targeted ads.

Our simulation results have shown that when eyeWnder classifies an ad as targeted, then this is accurate with very high probability (false positives <2%), whereas when it classifies an ad as non-targeted, the accuracy is again high (false negatives 10-20%). In this section, we want to validate the above results with a live experiment involving real users and campaigns. This is inherently difficult due to the lack of publicly available ground truth about which ads are targeted and which are not (see Requirements in Section 2.2). The situation becomes even more challenging when the evaluation has to be performed for a system used by real end-users (as opposed to offline evaluation based on network traces), over a variety of ad delivery channels (as opposed to a single channel, e.g., Facebook ads (Arrate et al., 2019; Ribeiro et al., 2019; Venkatadri et al., 2018)). Still, in this section we perform such a live validation and show that it leads to results consistent with those of our simulation study.

7.3.1. Datasets

We use three different datasets. The first dataset is created with the help of a crowdsourcing platform named FigureEight (FigureEight, 2019). We collected data during three consecutive weeks from a population of 100 users with varying levels of activity. The dataset includes in total 6743 ads. We call this dataset the "eyeWnder dataset" and use it as input to the count-based algorithm that classifies the collected ads. In addition, we also ask the FigureEight users to label the ads that they receive as targeted or not. For this purpose, we instruct the users to provide labels for all the ads they receive while surfing the web. We call this subset of labeled ads the "F8 dataset".

The last dataset we use is collected by the crawler during the same three weeks. It includes several visits and ad collections for any website on which eyeWnder has classified an ad (note that for evaluation we use full information on our test users, after having been granted full consent; the privacy-preserving protocol is intended for the actual operation of the system beyond evaluation). We call this dataset the "CR dataset" and use it to identify statically placed ads, as done in (Carrascosa et al., 2015; Parra-Arnau et al., 2017).

7.3.2. Methodology

Figure 4 presents the work-flow of our evaluation methodology. The idea is to compare the classification derived by eyeWnder against the classification from the crawled dataset (denoted by CR in the figure), the classification from a content-based detection heuristic (CB), and the classification from FigureEight users (F8). (For CB, we have adapted the methodology of (Carrascosa et al., 2015) to operate with real users instead of personas: for each user, we select the most significant categories of the pages he visits to create his profile, keeping categories appearing at least T = 20 times in different websites, since in the context of this paper we seek precision rather than recall; we classify an ad as targeted if the main category of its landing page, as obtained from AdWords, matches one of the categories in the user profile.) These comparisons yield False Positive (FP), False Negative (FN), True Positive (TP), True Negative (TN), and UNKNOWN rates. As we will explain shortly, results derived by comparing against the crawler are correct with high probability (namely FP(CR), TN(CR)). Results derived by comparing against the content-based heuristic or FigureEight are likely correct (namely TP(CB), FN(CB), TP(F8), TN(F8), FP(F8), FN(F8)). The justification is that the content-based heuristic and FigureEight are themselves subjective means of classifying an ad. Users have limitations in detecting bias or discrimination (Plane et al., 2017), whereas the content-based heuristic is just another heuristic whose validation faces the same challenges as the validation of eyeWnder. Since, however, the heuristic is a reasonable one, and the users have been selected carefully, we assume that their classifications will be more right than wrong. Ads falling under UNKNOWN are ads for which it is impossible to assess precision by the previous means. We handle them separately in Section 7.3.3. Next, we traverse the evaluation flow-chart of Figure 4 from top to bottom, starting from the right branch.
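As a reference for the CB branch, a minimal sketch of the adapted heuristic follows; the category source (e.g., AdWords categories) and the threshold T = 20 are as described above, while the data structures are illustrative.

```python
# Sketch of the content-based (CB) targeting heuristic (illustrative).

def build_profile(visited_pages, T=20):
    """visited_pages: iterable of (domain, category) pairs.
    Keep categories that appear in at least T distinct domains."""
    domains_per_cat = {}
    for domain, cat in set(visited_pages):
        domains_per_cat.setdefault(cat, set()).add(domain)
    return {cat for cat, doms in domains_per_cat.items() if len(doms) >= T}

def cb_is_targeted(profile, ad_landing_category):
    # Targeted iff the landing page's main category overlaps the profile.
    return ad_landing_category in profile
```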

Ads classified as targeted: If an ad is classified as targeted by eyeWnder, we check to see if the crawler happens to see the same ad. If this happens, then we have a false positive with high probability, since targeted ads should not be encountered by a crawler having empty browsing history.

If the crawler does not see the ad, then we proceed to check if the ad shares any semantic overlap with the user, using the methodology described in (Papaodyssefs et al., 2015). If it does, then, and only then, we check whether the content-based heuristic also classifies it as targeted. We do this check in order to identify the set of ads on which eyeWnder and the content-based heuristic have a chance to agree. Such ads can only be directly targeted ads, because the content-based heuristic can only detect those (see Section 2.1). In our evaluation, we check for semantic overlap using the content-based heuristic itself, which implies that if the ad has semantic overlap with the user, eyeWnder and the content-based heuristic will agree by default, yielding a likely true positive. (It might appear redundant to check both for semantic overlap and for agreement with the content-based heuristic, since we use the latter also for semantic overlap. We have kept both stages for generality, since semantic overlap could be checked with methods other than our content-based heuristic.)

If an ad does not share semantic overlap with the user, we check to see if FigureEight users have tagged it. Notice here that FigureEight users have classified only a subset of the ads that they have seen. If they have not provided a classification, then the ad is added to UNKNOWN. Otherwise, if their classification agrees with eyeWnder, we have a likely true positive, else a likely false positive.

Ads classified as non-targeted: If an ad that eyeWnder classifies as non-targeted is seen by the crawler, then this produces a true negative. If the ad has semantic overlap with the user, then, as explained before, the content-based heuristic will classify it as targeted, thereby yielding a likely false negative for eyeWnder. If the ad does not have semantic overlap with the user, we check whether FigureEight users have classified it. If they have not, it goes to UNKNOWN. If they have classified it as targeted, it produces a likely false negative for eyeWnder. If they have classified it as non-targeted, then we have a likely true negative for eyeWnder.
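The two branches just described can be summarized as a small decision function. This sketch encodes the flow of Figure 4, assuming boolean oracle answers as inputs; the label strings are illustrative.

```python
# Sketch of the evaluation flow of Figure 4 (illustrative).
def evaluate(eyewnder_targeted, crawler_saw_it, semantic_overlap, f8_label):
    """f8_label is 'targeted', 'non-targeted', or None (not labeled)."""
    if eyewnder_targeted:
        if crawler_saw_it:
            return "FP (high probability)"  # a clean-profile crawler saw it
        if semantic_overlap:
            return "likely TP"              # CB agrees by construction
        if f8_label is None:
            return "UNKNOWN"
        return "likely TP" if f8_label == "targeted" else "likely FP"
    else:
        if crawler_saw_it:
            return "TN (high probability)"
        if semantic_overlap:
            return "likely FN"              # CB would label it targeted
        if f8_label is None:
            return "UNKNOWN"
        return "likely FN" if f8_label == "targeted" else "likely TN"
```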

7.3.3. Dealing with the unknown

Due to the lack of ground truth, a large share of the classified ads (148 targeted and 4219 non-targeted) has ended up in the two UNKNOWN groups of Figure 4, for which we cannot evaluate precision using the crawler, the content-based heuristic, or FigureEight. In this section, we perform extra analysis to resolve the non-targeted UNKNOWNs into likely-TN or likely-FN, and the targeted UNKNOWNs into likely-TP or likely-FP.

– Non-targeted UNKNOWN: In this case, both count-based and content-based classification have identified these ads as non-targeted. However, these ads were not manually tagged by FigureEight users. Hence, we select a random set of 200 of these ads and manually inspect them. In particular, we consider the profile of the user receiving the ad, in order to manually determine if the ad is targeting such a profile.

– Targeted UNKNOWN: This group includes 148 ads that eyeWnder classified as targeted, but CB and FigureEight users did not. Hence, they may be either FPs or some form of targeting (e.g., indirect) that escapes CB or FigureEight users. A preliminary manual inspection showed that several of them seemed to be retargeted ads (Bashir et al., 2016). To verify this, we manually visited the landing page associated with each ad, and afterwards visited some of the domains where the ad re-appeared according to our dataset. The experiment was set up to test the repeatability of the suspected retargeting. When the experiment led to retargeting, our initial guess was confirmed, and we considered the classification a likely TP.

For the remaining ads, we evaluated whether they could be indirect OBA ads. To this end, we performed a correlation analysis between the topics of the ad's landing page and the profiles of the users receiving that ad. If there exists a statistically significant correlation between some of the ad's and the user's topics, but these topics are not semantically overlapping, then we interpret the ad as a likely indirect OBA ad, and the classification as a likely TP. Some examples of ads that we identified as indirect OBA are the following: (1) male users who exhibit interest in computers, electronics, cars, etc., receiving ads from a dating website (perhaps a classic indirect targeting of single male users); (2) users with interest in computers, electronics, and programming, receiving ads from KFC, a famous fast food restaurant; (3) users with interest in websites related to beauty products, fitness, body care, etc., receiving ads related to seafood; and (4) users who showed interest in governmental websites, internet services, insurance services, etc., receiving ads related to real estate and housing.

7.3.4. Results

The rates in Figure 4, along with the results from our analysis of unknown ads, reveal that the overall rate of likely TPs is 78%. Of these, 10% have been identified by CB or FigureEight users; the rest are associated with retargeting or indirect OBA. Our analysis also reveals a TN rate of 87%. In particular, 27% are highly probable TNs, since these ads have been marked as non-targeted by both eyeWnder and the crawler. The remaining 60% are likely TNs; this percentage has been derived from our manual inspection of the non-targeted UNKNOWNs.

Based on these results, we claim that our method meets the high-precision requirement defined in Section 2, which guarantees that, for instance, users willing to report a privacy incident related to an ad can have high confidence that the ad is indeed targeted.

We acknowledge that there is still room for improvement, since, as the evaluation shows, eyeWnder sometimes fails to detect ads classified as targeted by CB (416) or F8 (91). This is mainly due to two factors. First, as we further discuss in Section 10, we have evaluated eyeWnder with just a few users, which is close to a worst-case analysis given the crowdsourced nature of the system. Second, eyeWnder has been configured to maximize the precision rather than recall for reasons discussed in Section 2. We are confident that with a bigger dataset, our existing count-based detection would perform better. We could also come up with more elaborate heuristics to improve performance. Our objective in this paper, however, is to make the point that even with few users, simple count-based detection yields precise classification for both directly and indirectly targeted ads.

Evading detection of targeted ads: Finally, we discuss situations where advertisers intentionally try to evade the detection of targeted ads. Our system is robust to such attempts: despite the considerable inherent randomness in the way ads are currently delivered, eyeWnder identifies targeted ads with good accuracy even with a few hundred users. To defeat detection, an adversary would have to effectively give up targeting, or apply it very mildly, which would go against the very idea of targeting and its business model.

8. Socio-economic biases

In this section, we leverage the social, economic, and demographic information we have about the volunteers, together with the datasets of Section 7.3.1, to look for potential socio-economic biases in ad targeting.

Variable   OR      SE      Z-val    P>|z|    95% CI

Gender
female     0.255   0.407   -3.356   8e-4     0.107-0.539
male       0.174   0.383   -4.566   5e-6     0.076-0.348

Income
30k-60k    1.446   0.145   2.538    0.0111   1.088-1.924
60k-90k    1.521   0.187   2.249    0.0245   1.052-2.187
90k-...    0.525   0.343   -1.878   0.0603   0.257-0.996

Age
20-30      1.031   0.407   0.075    0.9404   0.488-2.450
30-40      1.428   0.405   0.880    0.3790   0.679-3.388
40-50      1.964   0.422   1.599    0.1098   0.899-4.788
50-60      0.745   0.489   -0.601   0.5475   0.291-2.022
60-70      2.654   0.477   2.044    0.0409   1.069-7.087

OR: Odds Ratio, SE: Standard Error, CI: Confidence Interval

Table 2. Logistic regression modeling for targeted ads

8.1. Logistic regression analysis

The demographic data provided by our volunteers include information such as gender ($G$), age ($A$), income level ($I$), and employment status ($E$). We want to investigate how these independent, nominal and ordinal factors associate with the type of advertisement delivered to the participating users. In practice, the type of ad is the dependent variable ($Y$) in our model, receiving a binary status of "static" or "targeted" advertisement. Using these input data, we perform a binomial logistic regression of the 4 independent variables against the 1 dependent variable, in the form: $\log\frac{P(Y=1)}{1-P(Y=1)} = \beta_0 + \beta_1 G + \beta_2 A + \beta_3 I + \beta_4 E$.

In reality, we tested several configurations of the model, including pairwise interactions between all independent factors, as well as incrementally removing each one of them to test if the model improved its predictive performance. In the case of "employment status", the factor was removed from the model, as it was deemed non-useful by a likelihood ratio test, with non-significant impact on the final model. Here, we only report the experimental setup that yielded the logistic regression model with the most statistically significant results, in the form: $\log\frac{P(Y=1)}{1-P(Y=1)} = \beta_0 + \beta_1 G + \beta_2 I + \beta_3 A$, considering 0-30k and 1-20 as the base levels for income and age, respectively.
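For concreteness, the regression setup can be sketched as follows; this assumes the statsmodels library and a synthetic stand-in dataframe, since our volunteer data is not public. Odds ratios, as reported in Table 2, are the exponentiated coefficients.

```python
# Sketch of the binomial logistic regression of Section 8.1 (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "gender": rng.choice(["female", "male"], n),
    "income": rng.choice(["0-30k", "30-60k", "60-90k", "90k-"], n),
    "age":    rng.choice(["1-20", "20-30", "30-40", "40-50", "50-60", "60-70"], n),
})
df["targeted"] = rng.integers(0, 2, n)   # placeholder binary label Y

# Treatment(...) fixes the base levels (0-30k and 1-20, as in the paper).
model = smf.logit(
    "targeted ~ C(gender)"
    " + C(income, Treatment(reference='0-30k'))"
    " + C(age, Treatment(reference='1-20'))",
    data=df,
).fit()

print(np.exp(model.params))   # odds ratios (OR), as in Table 2
print(model.summary())        # SE, z-values, p-values, confidence intervals
```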

Figure 5. Predicted probability for a targeted advertisement to be delivered to a user, vs. three independent variables with statistically significant levels.

8.2. Findings

The results of the logistic regression are shown in Table 2. In addition, Figure 5 plots the effects of the three variables and all their corresponding levels with respect to the expected probability that a user receives targeted ads. From these results, we make the following observations:

– Gender bias: Our analysis finds a statistically significant effect of gender. In particular, women are more likely to be targeted by advertisers than men, and this result is significant at p < 0.001 (Table 2).

– Economic bias: Our results show that as the income level of a user increases from the 0-30k bracket to the 30k-60k and 60k-90k brackets, he or she is more likely to be targeted by advertisers, and this is statistically significant at p < 0.05. However, when the income level becomes too high (i.e., above 90k euros), the expected probability of targeted advertising drops. We conjecture that advertisers profile users to estimate an economic capacity for each one, and adjust their targeting campaigns accordingly to optimize click-through and sale rates. Online advertisements targeting very wealthy users may have proven less profitable than those targeting other income brackets, and thus such users are targeted less. (A worked example of how these odds ratios translate into probabilities follows this list.)

– Age bias: From the odds ratio scores, it appears that older users are increasingly likely to be targeted by advertisers. However, some of the age brackets are sparsely populated with ads and users, and thus those results are not statistically significant; they only demonstrate a consistent trend.
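As promised above, here is a worked example of how an odds ratio from Table 2 translates into a shift in predicted probability; the 20% baseline probability is an arbitrary assumption chosen purely for illustration.

```python
# Worked example: applying an odds ratio (OR) from Table 2 to a baseline
# probability. The 20% baseline is an arbitrary assumption for illustration.
def shift_probability(p_base: float, odds_ratio: float) -> float:
    """Return the probability obtained by scaling the baseline odds by OR."""
    odds = p_base / (1.0 - p_base) * odds_ratio
    return odds / (1.0 + odds)

p_base = 0.20  # assumed P(targeted) for the 0-30k base bracket
print(shift_probability(p_base, 1.446))  # 30k-60k bracket -> ~0.27
print(shift_probability(p_base, 0.525))  # 90k-... bracket -> ~0.12
```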

9. Related Work

Table 3. Comparison of characteristics of the main targeted ad detection solutions: (Datta et al., 2014), (Barford et al., 2014), (Liu et al., 2013), (Carrascosa et al., 2015), (Lecuyer et al., 2014), (Lecuyer et al., 2015), (Parra-Arnau et al., 2017), and eyeWnder, compared along fake impressions, click-fraud, privacy preservation, use of real users vs. personas, real-time operation, scalability, offline operation, and topic-, correlation-, or count-based detection; entries are rated negative, positive, neutral, or unspecified within the context of this work.

Table 3 summarizes and compares existing proposals for detecting targeted advertisements. From a methodological standpoint, these solutions can be grouped into topic-based detection (Barford et al., 2014; Liu et al., 2013; Carrascosa et al., 2015; Parra-Arnau et al., 2017) and correlation-based detection (Datta et al., 2014; Lecuyer et al., 2014, 2015).

Topic-based solutions perform content-based analysis to extract the relevant topics from a user’s browsing history and the ads he receives. Then, using different heuristics and statistical means, targeted ads are identified as those whose topics share some semantic overlap with the user’s browsing history. Topic-based detection could, in principle, be applied to real users, as we have done for evaluation purposes in Section 7. Existing work, however, has only used it in conjunction with artificially constructed personas, i.e., robots that browse the web imitating very specific (single-topic) demographic groups (Barford et al., 2014; Carrascosa et al., 2015), or to emulate real users offline using click-streams (Liu et al., 2013).

The only topic-based solution meant to be used by real users is MyAdChoices (Parra-Arnau et al., 2017), which has been implemented as a browser extension. The extension is available only upon request and, based on the information reported in the paper, has only been used in a beta-testing phase by a few tens of friends and colleagues. Independently of the specific pros and cons of individual solutions, topic-based detection presents some common limitations, the most important being that it can only detect direct interest-based targeted advertising. It is unable to detect other forms of targeting based on demographic or geographic parameters, as well as indirect targeting (see Section 2.1 for definitions).
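To make this limitation concrete, here is a toy sketch of the topic-overlap idea behind topic-based detection; the topic sets and the Jaccard rule are our own illustrative simplification, not the actual logic of any cited system. Note how an indirectly targeted ad, whose topics share nothing with the browsing history, scores zero and escapes detection.

```python
# Toy version of topic-based detection: flag an ad as interest-targeted
# when its topics overlap with the user's browsing-history topics. The
# topic sets and the Jaccard rule are an illustrative simplification.
def topic_overlap(history_topics: set, ad_topics: set) -> float:
    """Jaccard similarity between browsing-history and ad topic sets."""
    if not history_topics or not ad_topics:
        return 0.0
    return len(history_topics & ad_topics) / len(history_topics | ad_topics)

history = {"running", "nutrition", "travel"}
print(topic_overlap(history, {"running", "shoes"}))   # direct targeting: > 0
print(topic_overlap(history, {"luxury", "watches"}))  # indirect: 0, missed
```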

Correlation-based solutions treat the online advertising ecosystem as a blackbox and apply machine learning and statistical methods to detect correlations between the browsing behavior and other characteristics of a user (OS, device type, location, etc.) and the ads he sees. For instance, XRay (Lecuyer et al., 2014) and Sunlight (Lecuyer et al., 2015) create several shadow accounts for each persona. Each shadow account performs a subset of the actions performed by the original persona. By analyzing the actions common to shadow accounts receiving the same reaction from the ecosystem (i.e., the same ad), the authors can infer the cause of a targeting event. AdFisher (Datta et al., 2014) uses similar concepts to find discrimination practices, for instance in the ads shown to men vs. women. As with topic-based detection, these techniques face important challenges related to scalability and practical implementation, and they are not suitable for real-time targeting detection. With the exception of (Parra-Arnau et al., 2017), no previous work has been implemented as a tool for end-users. Most of them, including (Parra-Arnau et al., 2017), rely on content-based analysis, thereby suffering from scalability issues and an inability to detect indirect targeting.
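As a rough illustration of the shadow-account idea (our simplification, not XRay’s or Sunlight’s actual algorithm), the sketch below infers a plausible targeting cause by intersecting the action sets of accounts that received the ad and removing actions also performed by accounts that did not.

```python
# Toy illustration of differential correlation via shadow accounts, in the
# spirit of XRay/Sunlight; this simplification is not their actual algorithm.
from functools import reduce

# Each shadow account performs a subset of the persona's actions (e.g.,
# visited sites) and records whether it received the audited ad.
shadow_accounts = [
    {"actions": {"diabetes.example", "news.example"},   "saw_ad": True},
    {"actions": {"diabetes.example", "sports.example"}, "saw_ad": True},
    {"actions": {"news.example", "sports.example"},     "saw_ad": False},
]

# Candidate causes: actions common to every account that saw the ad...
positives = [a["actions"] for a in shadow_accounts if a["saw_ad"]]
common = reduce(set.intersection, positives)

# ...minus actions also performed by accounts that did not see it.
negatives = set().union(
    *(a["actions"] for a in shadow_accounts if not a["saw_ad"])
)
print(common - negatives)  # {'diabetes.example'}: inferred targeting cause
```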

Parallel to efforts from the research community, the European Interactive Digital Advertising Alliance (EDAA) has developed YourAdChoices (Alliance, 2019), a self-regulation program in which companies that deliver targeted ads voluntarily add an icon that, when clicked by a user, offers some form of explanation of why the user received the ad. This technique scales and works in real time. It only works, however, with companies participating in the program, and it assumes full trust in the reported explanations, something that has been challenged by recent work (Datta et al., 2014). eyeWnder offers the opportunity to conduct independent audits, which are useful for end-users and data protection authorities, as well as for the credibility of ad-choices and related self-regulation initiatives.

10. Conclusions and future work

We have shown that a simple count-based heuristic can detect targeted advertising without having to perform complex content-based analysis, i.e., without having to semantically understand webpages or the received ads. To run such an algorithm, one needs a crowdsourced database of how many users have seen each ad. Such a database can be built efficiently and without jeopardizing user privacy, i.e., without requiring users to report the actual ads they have seen or the websites where they encountered them. Our count-based heuristic and privacy-preserving crowdsourced approach have been implemented in a first-of-its-kind distributed system called eyeWnder, which allows users to audit any encountered ad impression in real time to check whether it is targeted or not. We have also developed a detailed validation methodology for the difficult problem of assessing the accuracy of an ad detection method using only publicly available information.
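To fix ideas, here is a minimal sketch of what count-based detection can look like; the thresholding rule and the 5% cutoff are hypothetical stand-ins, not the heuristic actually used by eyeWnder.

```python
# Minimal sketch of count-based targeting detection. The 5% cutoff and the
# thresholding rule are hypothetical stand-ins for eyeWnder's heuristic.
def looks_targeted(users_who_saw_ad: int, total_users: int,
                   threshold: float = 0.05) -> bool:
    """Flag an ad as targeted if only a small fraction of the crowd saw it.

    Intuition: a static ad is served broadly, so many contributing users
    encounter it; a targeted ad reaches only the users matching its
    audience, i.e., a small fraction of the crowd.
    """
    return users_who_saw_ad / total_users < threshold

# The crowd contributes only counts per ad identifier, never the browsing
# context in which the ad was seen.
crowd_counts = {"ad_123": 4, "ad_456": 312}
total_users = 500
for ad_id, seen in crowd_counts.items():
    print(ad_id, "targeted?", looks_targeted(seen, total_users))
```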

Crowdsourcing simplifies the ad detection problem. Any crowdsourcing method, however, has two Achilles’ heels: privacy risks and bootstrapping its user base. We have addressed the first and taken only preliminary measures for the second: for example, we circulated eyeWnder among fellow researchers and enlisted some users for pay. This has permitted us to conduct a preliminary evaluation of eyeWnder and show that the count-based approach is indeed promising. Our current effort is to scale up our user base using traditional means, e.g., seeking more exposure through the media, or getting help from data protection authorities to enlist users. Scaling up the user base will help us refine our count-based ad detection method and its evaluation, and will probably yield many more interesting findings. This, however, remains future work.
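Regarding the first Achilles’ heel, a standard building block for privacy-preserving counting is aggregation with pairwise additive masks, in the spirit of (Kursawe et al., 2011); the sketch below is purely didactic and does not reproduce eyeWnder’s actual protocol.

```python
# Didactic sketch of privacy-preserving aggregation with pairwise additive
# masks, in the spirit of (Kursawe et al., 2011). This is NOT eyeWnder's
# actual protocol; it only illustrates how an aggregator can learn the
# total count for an ad without seeing any individual report.
import random

MOD = 2**32

def masked_reports(private_counts):
    """Each user adds pairwise masks that cancel out in the global sum."""
    users = list(private_counts)
    masks = {u: 0 for u in users}
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            r = random.randrange(MOD)
            masks[u] = (masks[u] + r) % MOD   # u adds +r
            masks[v] = (masks[v] - r) % MOD   # v adds -r; the pair cancels
    return {u: (private_counts[u] + masks[u]) % MOD for u in users}

# 1 if the user saw the ad, 0 otherwise; individual values stay hidden.
counts = {"alice": 1, "bob": 0, "carol": 1}
reports = masked_reports(counts)
print(reports)                      # individually meaningless values
print(sum(reports.values()) % MOD)  # 2 = number of users who saw the ad
```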

Acknowledgements

The research leading to these results has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreements No 653449 (project TYPES), No 786669 (project CONCORDIA), No 786741 (project SMOOTH) and Marie Sklodowska-Curie grant agreement No 690972 (project PROTASIS). The paper reflects only the authors’ views and the Agency and the Commission are not responsible for any use that may be made of the information it contains.

References

  • cop (1998) 1998. Children’s Online Privacy Protection Act (COPPA). https://www.ftc.gov/enforcement/rules/rulemaking-regulatory-reform-proceedings/childrens-online-privacy-protection-rule. (1998). [Online: accessed 30-January-2019].
  • adblockplus.org (2019) adblockplus.org. 2019. Adblock Plus. Surf the web without annoying ads! https://adblockplus.org/. (2019). [Online: accessed 30-January-2019].
  • adguard.com (2019) adguard.com. 2019. Adguard AdBlocker browser extension. https://adguard.com/en/adguard-adblock-browser-extension/overview.html. (2019). [Online: accessed 30-January-2019].
  • Alliance (2019) Digital Advertising Alliance. 2019. YourAdChoices. http://youradchoices.com/. (2019). [Online: accessed 30-January-2019].
  • Amazon Technologies (2016) Amazon Technologies, Inc. 2016. Personalized Landing Pages. https://patentimages.storage.googleapis.com/72/fb/61/7dd01f31aca7be/US20160180389A1.pdf. (2016). [Online: accessed 30-January-2019].
  • Arrate et al. (2019) A. Arrate, J. G. Cabanas, A. Cuevas, M. Calderon, and R. Cuevas. 2019. Large-scale analysis of user exposure to online advertising on Facebook. IEEE Access (2019).
  • Barford et al. (2014) P. Barford, I. Canadi, D. Krushevskaja, Q. Ma, and S. Muthukrishnan. 2014. Adscape: Harvesting and Analyzing Online Display Ads. In Proc. of the Conference on World Wide Web, (WWW’14).
  • Bashir et al. (2018) M. Bashir, S. Arshad, E. Kirda, W. Robertson, and C. Wilson. 2018. How Tracking Companies Circumvented Ad Blockers Using WebSockets. In Proc. of the Internet measurement conference, (IMC’18).
  • Bashir et al. (2016) M. Bashir, S. Arshad, W. Robertson, and C. Wilson. 2016. Tracing Information Flows Between Ad Exchanges Using Retargeted Ads. In Proc. of the USENIX Security Symposium, (SEC’16).
  • Boughareb and Farah (2011) D. Boughareb and N. Farah. 2011. Toward a Web Search Personalization Approach Based on Temporal Context. In Proc. of Digital Information and Communication Technology and Its Applications.
  • Bradshaw (2016) P. Bradshaw. 2016. It’s official: The Walking Dead helped elect Donald Trump. https://www.theguardian.com/commentisfree/2016/nov/23/walking-elect-dead-trump-immigration-zombies. (2016). [Online: accessed 30-January-2019].
  • Brooks (2017) C. Brooks. 2017. Invasion of Privacy: What Consumers Think of Personalized Online Ads. http://www.businessnewsdaily.com/4632-online-shoppers-personal-ads.html. (2017). [Online: accessed 30-January-2019].
  • Bureau (2018) Interactive Advertising Bureau. 2018. IAB Internet Advertising Revenue Report. https://www.iab.com/wp-content/uploads/2018/11/REPORT-IAB-Internet-Advertising-Revenue-Report-HY-2018.pdf. (2018). [Online: accessed 30-January-2019].
  • Burklen et al. (2005) S. Burklen, P. J. Marron, S. Fritsch, and K. Rothermel. 2005. User Centric Walk: An Integrated Approach for Modeling the Browsing Behavior of Users on the Web. In Proc. of the 38th Annual Symposium on Simulation, (ANSS’05).
  • Butkiewicz et al. (2015) M. Butkiewicz, D. Wang, Z. Wu, H. V. Madhyastha, and V. Sekar. 2015. KLOTSKI: Reprioritizing Web Content to Improve User Experience on Mobile Devices. In Proc. of the 12th USENIX Conference on Networked Systems Design and Implementation, (NSDI’15).
  • Carrascosa et al. (2015) J. M. Carrascosa, J. Mikians, R. Cuevas, V. Erramilli, and N. Laoutaris. 2015. I Always Feel Like Somebody’s Watching Me: Measuring Online Behavioural Advertising. In Proc. of the Conference on Emerging Networking Experiments and Technologies, (CoNEXT’15).
  • Castelluccia et al. (2012) C. Castelluccia, M. Kaafar, and M. Tran. 2012. Betrayed by Your Ads!: Reconstructing User Profiles from Targeted Ads. In Proc. of the Conference on Privacy Enhancing Technologies, (PETS’12).
  • Castelluccia et al. (2009) C. Castelluccia, A. C-F. Chan, E. Mykletun, and G. Tsudik. 2009. Efficient and Provably Secure Aggregation of Encrypted Data in Wireless Sensor Networks. ACM Trans. Sen. Netw. 5 (2009).
  • Cohen and Matias (2003) S. Cohen and Y. Matias. 2003. Spectral Bloom Filters. In Proc. of the International Conference on Management of Data, (SIGMOD’03).
  • Datta et al. (2014) A. Datta, M. C. Tschantz, and A. Datta. 2014. Automated Experiments on Ad Privacy Settings: A Tale of Opacity, Choice, and Discrimination. CoRR abs/1408.6491 (2014).
  • EFF (2019) Electronic Frontier Foundation EFF. 2019. Do Not Track. https://www.eff.org/issues/do-not-track. (2019). [Online: accessed 30-January-2019].
  • eff.org (2018) eff.org. 2018. Privacy Badger blocks spying ads and invisible trackers. https://www.eff.org/privacybadger. (2018). [Online: accessed 30-January-2019].
  • European Commission (2018) European Commission. 2018. Data protection in the EU, The General Data Protection Regulation (GDPR); Regulation (EU) 2016/679. https://ec.europa.eu/info/law/law-topic/data-protection/. (2018). [Online: accessed 30-January-2019].
  • Everspaugh et al. (2015) A. Everspaugh, R. Chatterjee, S. Scott, A. Juels, and T. Ristenpart. 2015. The Pythia PRF Service. In Proc. of the 24th USENIX Security Symposium.
  • Fanti et al. (2016) G. C. Fanti, V. Pihur, and Ú. Erlingsson. 2016. Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries. PoPETs (2016).
  • FigureEight (2019) FigureEight. 2019. Human-in-the-loop for Machine Learning. https://www.figure-eight.com/. (2019). [Online: accessed 30-January-2019].
  • Freedman et al. (2005) M. J. Freedman, Y. Ishai, B. Pinkas, and O. Reingold. 2005. Keyword Search and Oblivious Pseudorandom Functions. In Proc. of the Second International Conference on Theory of Cryptography, (TCC’05).
  • ghostery.com (2019) ghostery.com. 2019. Ghostery - Cleaner, faster, and safer browsing. https://www.ghostery.com/. (2019). [Online: accessed 30-January-2019].
  • Cormode and Muthukrishnan (2005) G. Cormode and S. Muthukrishnan. 2005. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms 55 (2005).
  • Hannak et al. (2014) A. Hannak, G. Soeller, D. Lazer, A. Mislove, and C. Wilson. 2014. Measuring Price Discrimination and Steering on E-commerce Web Sites. In Proc. of the Internet measurement conference, (IMC’14).
  • Iordanou et al. (2017) C. Iordanou, C. Soriente, M. Sirivianos, and N. Laoutaris. 2017. Who is Fiddling with Prices? Building and Deploying a Watchdog Service for E-commerce. In Proc. of the Special Interest Group on Data Communication, (SIGCOMM’17).
  • iWnder (2019) iWnder. 2019. iWnder Simulator. Withheld for double blind review. (2019).
  • Jarecki and Liu (2009) S. Jarecki and X. Liu. 2009. Efficient Oblivious Pseudorandom Function with Applications to Adaptive OT and Secure Computation of Set Intersection. In Proc. of the 6th Theory of Cryptography Conference on Theory of Cryptography, (TCC’09).
  • Javaid et al. (2017) H. Javaid, H. K. Khalil, Z. A. Uzmi, and I. A. Qazi. 2017. Online Advertising Under Internet Censorship. In Proc. of the 16th ACM Workshop on Hot Topics in Networks, (HotNets’17).
  • Keelveedhi et al. (2013) S. Keelveedhi, M. Bellare, and T. Ristenpart. 2013. DupLESS: Server-Aided Encryption for Deduplicated Storage. In Proc. of the 22nd USENIX Security Symposium.
  • Kursawe et al. (2011) K. Kursawe, G. Danezis, and M. Kohlweiss. 2011. Privacy-Friendly Aggregation for the Smart-Grid. In Proc. of the Privacy Enhancing Technologies, (PETS’11).
  • Laoutaris (2018) N. Laoutaris. 2018. Data Transparency: Concerns and Prospects. Proc. of the IEEE 106 (2018).
  • Lecuyer et al. (2014) M. Lecuyer, G. Ducoffe, F. Lan, A. Papancea, T. Petsios, R. Spahn, A. Chaintreau, and R. Geambasu. 2014. XRay: Enhancing the Web’s Transparency with Differential Correlation. In Proc. of the USENIX conference on Security Symposium, (SEC’14).
  • Lecuyer et al. (2015) M. Lecuyer, R. Spahn, Y. Spiliopolous, A. Chaintreau, R. Geambasu, and D. Hsu. 2015. Sunlight: Fine-grained Targeting Detection at Scale with Statistical Confidence. In Proc. of the Conference on Computer and Communications Security, (CCS’15).
  • Liu et al. (2013) B. Liu, A. Sheth, U. Weinsberg, J. Chandrashekar, and R. Govindan. 2013. AdReveal: Improving Transparency into Online Targeted Advertising. In Proc. of the Workshop on Hot Topics in Networks, (HotNets’13).
  • Melis et al. (2016) L. Melis, G. Danezis, and E. De Cristofaro. 2016. Efficient Private Statistics with Succinct Sketches. In Proc. of 23rd Annual Network and Distributed System Security Symposium, (NDSS’16).
  • Mikians et al. (2012) J. Mikians, L. Gyarmati, V. Erramilli, and N. Laoutaris. 2012. Detecting Price and Search Discrimination on the Internet. In Proc. of the Hot Topics in Networks, (HotNets’12).
  • Mikians et al. (2013) J. Mikians, L. Gyarmati, V. Erramilli, and N. Laoutaris. 2013. Crowd-assisted Search for Price Discrimination in e-Commerce: First Results. In Proc. of the Conference on emerging Networking EXperiments and Technologies, (CoNEXT’13).
  • noscript.net (2019) noscript.net. 2019. NoScript Firefox Extension. https://noscript.net/. (2019). [Online: accessed 30-January-2019].
  • Papaodyssefs et al. (2015) F. Papaodyssefs, C. Iordanou, J. Blackburn, N. Laoutaris, and K. Papagiannaki. 2015. Web Identity Translator: Behavioral Advertising and Identity Privacy with WIT. In Proc. of the 14th ACM Workshop on Hot Topics in Networks, (HotNets’15).
  • Parra-Arnau et al. (2017) J. Parra-Arnau, J. P. Achara, and C. Castelluccia. 2017. MyAdChoices: Bringing Transparency and Control to Online Advertising. ACM Trans. Web 11 (2017).
  • Plane et al. (2017) A. Plane, E. Redmiles, M. Mazurek, and M. Tschantz. 2017. Exploring User Perceptions of Discrimination in Online Targeted Advertising. In Proc. of the USENIX Conference on Security Symposium, (SEC’17).
  • Ribeiro et al. (2019) F. N. Ribeiro, K. Saha, M. Babaei, L. Henrique, J. Messias, F. Benevenuto, O. Goga, K. P. Gummadi, and E. M. Redmiles. 2019. On Microtargeting Socially Divisive Ads: A Case Study of Russia-Linked Ad Campaigns on Facebook. In Proc. of the Conference on Fairness, Accountability, and Transparency, (FAT*’19).
  • Rody-Mantha (2017) B. Rody-Mantha. 2017. Mobile targeting: Where’s the line between cool and creepy? http://mediaincanada.com/2017/03/20/when-it-comes-to-targeting-how-much-is-too-much-for-canadians/. (2017). [Online: accessed 30-January-2019].
  • Schneiderman (2016) E. T. Schneiderman. 2016. A.G. Schneiderman Announces Results Of “Operation Child Tracker”. https://ag.ny.gov/press-release/. (2016). [Online: accessed 30-January-2019].
  • Silverman (1986) B. W. Silverman. 1986. Density Estimation for Statistics and Data Analysis. Chapman and Hall/CRC.
  • State of California (2018) State of California. 2018. California Consumer Privacy Act – Assembly Bill No. 375. https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201720180AB375. (2018). [Online: accessed 30-January-2019].
  • Thomas et al. (2015) K. Thomas, E. Bursztein, C. Grier, G. Ho, N. Jagpal, A. Kapravelos, D. Mccoy, A. Nappa, V. Paxson, P. Pearce, N. Provos, and M. A. Rajab. 2015. Ad Injection at Scale: Assessing Deceptive Advertisement Modifications. In Proc. of the 2015 IEEE Symposium on Security and Privacy, (SP’15).
  • Venkatadri et al. (2018) G. Venkatadri, A. Mislove, and K. P. Gummadi. 2018. Treads: Transparency-Enhancing Ads. In Proc. of the 17th ACM Workshop on Hot Topics in Networks, (HotNets’18).
  • Wills and Tatar (2012) C. E. Wills and C. Tatar. 2012. Understanding What They Do with What They Know. In Proc. of the Workshop on Privacy in the Electronic Society, (WPES’12).