Most of the services and applications deployed on the web require dynamic interactions between clients and servers maintained by first- and third-party providers in order for clients to display content to the user. Such interactions include, amongst others, advertisement delivery for online advertising (OA) [1, 2] applications, checking of certificate revocation in TLS , checking of compromised login credentials , and any other content that is requested directly by the user. In almost all situations, the client is required to provide information that reveals certain aspects of the user profile to the content provider.
To enable behavioral-based targeting in OA, users are expected to provide subtle characteristics related to their own preferences and browsing history. Revealing such characteristics significantly compromises their privacy. However, devising an Internet devoid of such advertising would inevitably become an expensive place to inhabit for users. Spending in OA markets was valued at 378 billion USD in 2019 
, whilst Google and Facebook were estimated to depend on advertising for 83% and 99% of their revenue, respectively[6, 7, 8]. Clearly, website operators would lose a huge chunk of their income.
Thus, with the OA industry likely to stay, the focus has turned to trying to reimagining it in a way that it is possible to deliver ad-based content to client devices without compromising user privacy. While some approaches have considered only targeting ads based on contextual information (such as the current webpage that is being viewed), such systems do not give high utility to advertisement providers . Therefore, finding new private solutions that allow behavioral targeting based on inherent user preferences is likely to lead to much more relevant ads, and subsequently higher user engagement.
Unfortunately, existing approaches to maintain private behavioral targeting either end up delivering entire databases to clients to perform all computation locally — which usually results in huge bandwidth and performance costs  — or relying on semantic definitions, such as -anonymity, where user privacy is both much harder to quantify, and vulnerable to compromise if the clusters themselves leak too much information . To bridge the gap between ensuring user privacy, whilst maintaining the utility and low cost of Internet usage, we design and implement .
Overview of . is a practically efficient, privacy-preserving advertisement targeting and delivery system that is intended for deployment in client browsers and a wide range of web applications. Advertisement targeting is done locally on the client device using commonly-used tools for locality sensitive hashing, based on contextual and behavioral data obtained from the client. Once a client’s targeted ads are determined, the client then retrieves them from the untrusted proxy that handles the advertisement database via a private information retrieval (PIR) protocol . See Figure 1 for a general overview of the system.
The technical challenge of our system is ensuring that a client’s ad preference is not leaked to the ad provider, while at the same time ensuring that the system has a high rate of conversions and keeping the overall monetary cost of running the system minimal. The usage of a PIR protocol ensures that nothing about the client query is revealed to an honest-but-curious advertisement proxy. Additionally, we highlight the applicability of our system by devising a framework for building a useful, cost-effective, and high-performance advertisement delivery network for all participants. We can choose an underlying PIR scheme such that can deliver advertisements to a client in seconds from a database of advertisements. For instance, this allows us to build applications that allows clients to asynchronously query advertisements based on behavioral and contextual browsing, and then displaying ads to the user without having to wait for webpages for a prohibitively long time. Moreover, this is all achieved without revealing any of the private characteristics or preferences of the client that are used in ad targeting to the ad-delivery server (beyond what is revealed in higher layers, such as in a standard HTTPS request).
Overall, is orders of magnitude cheaper than delivering the database to each client locally and performance is similar to existing solutions that still require some information to be leaked to the proxy [9, 11, 12]. We achieve this by making modifications to state-of-the-art single-server PIR protocols that make using such schemes in our setting viable. Such protocols are notoriously heavy on computation, and so we devise a bucketization mechanism that allows clients to retrieve multiple advertisements in a single PIR query based around locality-sensitive hashing techniques . Furthermore, unlike previous systems, our architecture does not require hardware support , centralization of client preference processing , reliance on external anonymization networks , or higher client-side storage and computation .
Finally, while is primarily targeted towards the OA use-case, we believe that the fetching model may be of use to other applications. In essence, provides the capability for private and asynchronous fetching of indexed content in the Internet setting.
I-a Our Contributions
The formal contributions in this work follow.
A generic and modular framework for fully private advertisement delivery: We develop a generic framework, , for online advertisement delivery to clients, whilst maintaining absolute privacy in the honest-but-curious model: no locally computed preferences are ever learnt by the content provider. Our framework is generic in that it can be instantiated using any PIR scheme. The generality of our approach may have independent value in other content-delivery scenarios.
Optimizations in Private Information Retrieval: For our application, we choose to implement using the OnionPIR  stateless single-server PIR scheme, that is based on SEAL Fully Homomorphic Encryption . We improve the OnionPIR scheme by having the server bucketize their ad database relative to a locality sensitive hash function such as . This enables multiple ads for the same category to be retrieved using a single PIR query. In order to make this work, we must utilize a property of the OnionPIR scheme to embed multiple PIR responses into a single FHE ciphertext.
Practical implementation of end-to-end protocol: To demonstrate performance and applicability, we implement and show that we improve on the naive end-to-end system of delivering entire ad catalogs by several orders of magnitude, as well as being comparable to solutions that result in categorically higher leakage profiles. Overall, can deliver ads within seconds to a client when considering an ad database with entries. These improvements translate into very low financial running costs: million saved compared with the naïve approach, rendering it practical for everyday use.
Reporting conversions and other metrics. In this work, we only cover the targeting and delivery portion of a full advertisement network. Recognizing that advertisement networks are usually driven by determining click-through rates for each ad, we note that it is necessary for clients to report which ads it has interacted with back to the network, and that this should be done whilst maintaining their privacy. In Section V, we discuss how it is possible for our delivery pipeline to be integrated as a generic module into many existing privacy-preserving advertisement reporting mechanisms [17, 9, 18].
Real-time advertisement auctions. Our construction is unable to support real-time advertisement auctions. Unfortunately, practical systems supporting this functionality do so at the cost of revealing which ads are queried for — even if the link to which client has retrieved them is not maintained .111Revealing such information may still be enough to produce linkability between client profiles and queries, depending on what other public information is known outside of the system. While there are advantages in enabling such functionality, we focus in this work on providing absolute privacy for client queries (i.e., not even revealing to the server which ads are queried). See Section V for a more detailed discussion.
Ii-a Online Advertising
In online advertising (OA), most systems first profile and track users based on their actions across different websites and then display relevant ads to the targeted user based on their behavior and other contextual information. Specifically:
behavioral ad targeting requires matching ads to users based on their (usually private) preferences;
contextual ad targeting is performed relative only to the actions performed by a user and relative to the content that they request.
The advantage of contextual ad targeting compared with behavioral is that the targeting mechanism does not rely on any private user data. However, ads are demonstrably less relevant in the contextual setting, since behavioral-based approaches can be used to solicit ads that are likely to match a given user’s profile. In principle, for any targeting mechanism to guarantee high utility, it must use both contextual and behavioral data in order to display the most relevant ads. Such utility is usually measured in terms of an ad conversion rate or click-through rate. This rate measures the ratio of users that interact with each advertisement.
Clearly, performing behavioral targeting comes at the expense of a user’s privacy. A user’s behavioral traits, when revealed either alone or combined with other garnered or public information, reveal a non-quantifiable amount of information about a user’s personality, traits, and habits. It goes without saying that such information should be kept private from third-parties at all times. A user’s right to ensuring their data privacy is one of the key cornerstones of the European General Data Protection Regulation (GDPR).
As a result, the OA space has seen a number of recent innovations that attempt to maintain the lucrative business of Internet advertising, whilst providing much better privacy guarantees for clients. Overall, ad-delivery systems should satisfy the following requirements:
Full privacy: All client preferences that are used for retrieving the appropriate ad content for the user should be kept private from the content provider.
High utility content: Clients should receive advertisements that reflect their own private characteristics and interests.
Minimal delivery latency: Clients should receive ad-content in a timely manner so that the content is fresh, and relevant (with regards to their changing preferences).
Low bandwidth usage: Bandwidth usage should be kept at a minimum to keep server costs low, since such systems are likely to be used by very large numbers of clients.
High rate of database updates: Due to the nature of ad auctions, the system should tolerate a rapidly changing server-side database.
Configuration flexibility: The building blocks of the framework should ideally allow configuring components for varying efficiency/functionality trade-offs. This flexibility should be supported by the cryptographic primitives that are used.
Ii-B Threat Model
In , we assume that proxy server and advertisers are honest-but-curious. Specifically, we assume that the proxy server does not alter the catalog of the ad on its own, alter or reject the client query, or hold the delivery of ad after query. Similarly, we assume that the advertisers upload valid ads that they want to display to the clients.
Unlike other multi-party computation systems, does not require the proxy server and the advertisers to be non-colluding. The security of holds even if the proxy server is run by the advertiser.
Though it is possible to generically compile into a maliciously secure system , generally such systems suffer from high overhead and multiple rounds of communications. Thus, we briefly explore some practical mechanisms for ensuring robust deployments against a proxy that attempts to act maliciously in Section V.
Ii-C Comparable approaches
The desire for private advertisement targeting and delivery has been discussed in academic literature across various points in the last decade [11, 12, 17, 9]. However, in terms of practical deployments, there are few examples of systems that have been trialled and used [1, 2]. Below, we discuss two particular ad-delivery pipelines (and targeting philosophies) that have been developed with the goal of providing advertisements to client devices, whilst keeping their contextual data private.
In 2020, Google proposed a new system called Federated Learning of Cohorts (FLoC) .222Following extensive analysis of privacy pitfalls related to FLoC , the expected roll-out has since been delayed until late 2023 . At a high level, the system constructs cohorts of users that share a combined set of interests and preferences (Figure 2). In order to develop such cohorts, the browser keeps track of the browsing history of the user, and also inherent characteristics (such as the user device) and assigns the user to one of the global cohorts. As the user browsers the web, the browser shares the user cohort with websites and advertisers. The advertisers can then use this information to show relevant ads to the user. To maintain privacy each cohort must contain a pre-determined and large number of users. The idea is that, within the cohort, the user remains anonymous, and so they receive privacy guarantees essentially amounting to -anonymity. A central administrator or proxy counts the number of users in each cohort and, if required, merges smaller relevant cohorts.
Even though the approach utilized in FLoC would appear to be robust against existing privacy issues in OA via third-party tracking, it induces new issues of its own . Firstly, sharing cohort IDs with the advertisers allows the advertisers to enough information about users to derive sensitive information about the user’s character anyway. Secondly, it has been shown that even learning benign information about a user, such as movie reviews, can be enough to learn their personal and sensitive information such as political ideologies. Thirdly, when combining this information with other publicly available information, the privacy loss is likely to be dramatically worse. Finally, the user’s privacy is entirely dependent on the cohort that they fit into and this management is handled by a centralized proxy. As a result, FLoC’s pitfalls are liable to give advertisers and trackers information significant identifying and sensitive information about a user’s profile .
Local preference computation.
Building web services that do not reveal user-specific traits to websites and third-party requires a combination of local computation, client storage, and a private content delivery mechanism. A simple approach to achieve privacy preserving content delivery is to download the full database from the service provider and selecting the desired items locally (Figure 3). Such systems preserve privacy since the user does not leak to third parties any information as to which content it is requesting from the database. However, these systems require very large bandwidth overheads as the client must fetch a database that is potentially gigabytes in size.333For example, assuming that each client downloads the ad an ad database of approximately GB (KB ads entries).
Unfortunately, without any better privacy-preserving alternatives many large scale systems have adopted this approach and pay huge bandwidth price. Most notably, projects such as the Brave browser444https://brave.com/brave-ads/, have been forced to turn to deliver entire ad catalogs to users in order to maintain privacy preserving ad-delivery functionality . Such mechanisms are almost impossible to maintain as the advertisement database grows, as well as the number of clients and frequency of profile updates increase.
Ii-D Private Information Retrieval
We use Private Information Retrieval (PIR) as a generic functionality for retrieving advertisements from an untrusted server. To guarantee that the system can be used in the Internet setting, we must choose the PIR protocol carefully to ensure that runtimes, bandwidth, and associated financial costs are kept to a minimum. While numerous advancements have been made in recent PIR literature, many schemes still have unacceptable overheads for our application. We discuss the reasons behind which scheme we choose here, and provide a more detailed background on the state-of-the-art PIR literature in Section VI-A.
Stateless single-server PIR. In prior single-server basic PIR schemes to generate a response, the server has to perform a linear amount of computation over the database [23, 15]. Additionally, these schemes are based on computational cryptographic assumptions. Therefore the server has to perform at least one cryptographic operation per database element. These schemes are usually the easiest to deploy, requiring no management of state or database updates, and no trust assumptions to be made by the client. Furthermore, while such schemes usually have relatively high computational overhead (compared to two-server or stateful PIR schemes), they are still relatively practical. For example, to retrieve an element from a database with one million entries using the OnionPIR scheme , it takes seconds and less than of communication.
Stateful single-server PIR. Stateful single-server PIR scheme reduces computational overhead . Specifically, the client first interacts with the server in the offline phase to retrieve a hint. Then in the online phase, the client can use the hint to perform cheaper PIR queries. As a result, the server has to perform only a sub-linear number of online expensive cryptographic operations555The server still performs a linear number of PRF evaluations. The major drawback of this scheme is that to retrieve a hint, the client has to download the whole database locally. Each hint allows making a bounded number of cheap online queries, therefore the client also has to perform this step repeatedly.
Two-server PIR. In two-server (or generally information-theoretic) PIR schemes, the database is replicated on two non-colluding servers. The server computation does not involve performing any cryptographic operation. Therefore, these schemes are relatively cheaper than the single-server variants. Stateful two-server PIR schemes have further reduced the amortized server computation . Therefore, in terms of computation, these schemes are ideal for applications where low latency is required. However, the assumption of two or more non-colluding servers makes these schemes unsuitable for many practical applications.
Final choice of scheme. Comparing each of the available schemes, we find that two-server schemes introduce unacceptable overheads in terms of both implementation and running complexity, since such schemes require non-colluding servers that both process PIR queries. Therefore, we focus only on single-server schemes.
While stateful single-server schemes enjoy amortized performance advantages over stateless schemes, we find the bandwidth costs prohibitive for setting up a production OA system. Such costs arise due to the need to send large portions of the ad database clients in the initial step. Moreover, such schemes suffer dramatically when considering the potential for database updates that render previous client state redundant. In such cases, such an approach would require careful implementation of the necessary fresh downloads of state.
With these concerns in mind, we turn to the state-of-the-art in single-server PIR, known as OnionPIR . We find that this scheme is performant enough for our application, when compared with existing approaches [1, 2]. OnionPIR requires a server to perform 10 seconds of computation per ad query (for an ad database of elements), where each client is likely to make 10 queries per 3 hour period . Such querying can be performed asynchronously to client browsing, based on an up-to-date locally computed approximation of the client’s interests and preferences.
In summary, we implement our private ad-delivery system on top of the OnionPIR scheme . However, we stress that our approach is generic and that the explicit choice of PIR scheme can be made independently, in order to optimize the overall system for the desired application. We provide a further examination of using alternative PIR schemes in Section V.
Ii-E Overall PIR Framework
As mentioned above, we only consider stateless single-server PIR. Such a scheme involves a client with a secret index , and a server holding a database , that the client wants to learn the record from. The query process of the PIR protocol consists of a single message from the client to the server, and a single response from the server to the client. The client then uses this response to recover the record. Formally, PIR is defined using the following algorithms.
: A protocol executed by the client and the server. The client takes as input parameter , describing number of records that can be stored by the client. The server takes database as input. The client’s output is the initial state and server gets no output.
: An algorithm that is executed by the client. It takes as input index and current state and outputs an encrypted query to be sent to the server.
: An algorithm that is executed by the server. It takes as input an encrypted query and the database , and outputs an encrypted reply .
An algorithm that is executed by the client to extract a record from server’s reply.
For any PIR protocol, we have the followings requirements.
Correctness: Informally, the correctness of PIR requires that the client that queries learns the desired entry
with high probability.
Security: The security of PIR requires that the server learns no information about the clients queried index .
Ii-F Locality-Sensitive Hashing
In , we use locality-sensitive hashing (LSH) to enable batching multiple advertisements of the same category together. This increases performance of our system by allowing clients to retrieve multiple ads that are likely to be interesting to them in a single PIR query.
To implement this mechanism we use , a LSH function that hashes similar items into the same buckets with high probability . has been extensively used to detect potential duplicate content across websites in Google’s webpage crawler since 2006 . Moreover, Google’s FLoC proposal intended to use as a LSH for ad-targeting purposes .
In , and similarly to FLoC, we use
to classify the user’s input to a specific category. Specifically, the classification will be performed on the client’s local device. The client profile will be the input and the output of thewill be mapped to a category using pre-defined mapping.
Iii Technical Details
There are three main entities in our system:
the client browsing websites:
the advertiser who wants to display ads on the client device;
the proxy server, which acts as an intermediary between clients and the advertisers.
The proxy hosts ads from different advertisers and the client fetches these ads directly from the proxy. The proxy also simplifies preserving the user privacy because clients never interact with the advertisers directly. Throughout this section we will assume that the proxy and advertisers are honest-but-curious, i.e. they follow the protocol and try to learn extra information from the client messages and other side channels. See Section II-B for more details about our threat model.
Private Data. The main goal of our system is to hide a user’s private data, while still allowing the advertisers to show ads relevant to them. Unlike many previous approaches, our system does not reveal any information about the private data. In our system, any data related to a user’s browsing activity is considered private. This includes: visited websites, content clicked, searched keywords, store browser cookies. Any data that is extracted using the user’s browsing activity is also private. We say that protocols that leak none of this data to the proxy provide full privacy.
Server Proxy. Figure 4 shows the storage and API for the proxy server. The proxy is assumed to enjoy large storage and computational capabilities. In our system, we assume that the proxy is run by an independent entity relative to the advertisers. However, it is important to note that the security of our system assumes an untrusted proxy that is attempting to learn more about clients based on their queries.
The proxy stores two kinds of data in its storage.
: Dynamic storage for advertisements. The ads are indexed by categories. Each category consists of multiple ads. The catalog has can be thought of as a hierarchical structure with categories and sub-categories. Specifically, each ad is associated with a particular category and each category could, in turn, be linked to a category high up in the hierarchy.
: At a high-level this data-structure maps categories to their indexes in the . The data structure is parsed as a tree. Each internal node in the tree consists of the category name and address of its children. The leaf nodes additionally contain category index in . This data structure provides a search operation that takes bit-string as input and outputs node that could be reached by using the bit-string.
Ad catalog. In Figure 5 we give an example of such a data structure. The example catalog consists of three categories each having relevant ads. These ads could potentially belong to different advertisers. The catalog has three levels of a category hierarchy. Note that each category level is more targeted than the higher level. For example, one of the categories in level-1 is targeting people with an interest in cooking. However, in level-2 the categories are targeting large population interest in health or information technology. Similarly, at level-3 there is only one category targeting the whole US population. Note that the exact relationship between the categories is not fixed and could be decided by the proxy and the advertisers combined.
Figure 5 also shows an example based on the catalog. All the nodes in the tree could be accessed using a bit string of size three.
Client-proxy interaction. Figures 4 and 6 represent the storage and the function calls used by the proxy and the client, respectively. The client stores and a small state . The proxy on the other hand stores the entire . Note that the size of is significantly smaller than . For leaves, the total size of is bits. Concretely, in our implementation the size of is approximately KB while consists of about MB of data. We assume that the proxy maintains an updated copy of , which any client can download opportunistically.
Iii-a Protocol Flow
In Figure 7 we show the high-level flow of our system. Here we define that flow in detail.
The advertisers upload their ads to the Proxy. Where each entry is a tuple , where is the operation add, remove, update, consists of ad identifier and the auction/matching logic, and is the ad data. The advertisers could also specify the category associated with each ad. In case of , will be empty and will contain ad id only. The proxy then calls function to update the data structures.
The client will ask the proxy for the . To achieve this the client will send the proxy last download time and the proxy will send back the most recent copy of the tree. If the client has never downloaded the tree before, then it will send as the download time and the proxy will send the whole tree to the client.
The client will locally call a function on her preferences (e.g., browsing history) to get a relevant category. Like Google FLoC this step is implemented using the function that takes as input the preferences and outputs a category as a binary string.
The client will locally call function with category and as inputs and get as output, which represents the index of category in proxy’s . If the client’s category is not present in , the function outputs the index of the immediate sibling. Considering the example given in Figure 5, on input category string of the functions will output the index corresponding to the string .
Using the index as input, the client will locally call function and sends the output query to the proxy. The proxy will then call function with inputs and and send the output to the client. function directly calls and function directly calls .
The client then locally calls function, based on , to retrieve a set of ads from the proxy response. The client then stores these ads as a local state .
Finally the client calls function, which picks the relevant ad from the state. This function takes into consideration the matching logic associated with each ad. Note that each call to this function consumes some part of state as ads are displayed to the user.
In this section, we report on the performance and scalability of in terms of bandwidth, client-side computation and server-side computation. We measure the effectiveness of our system to meet the performance requirements laid out in Section I. Such metrics include the size of the server’s database; the number of concurrent queries from the client; the required bandwidth usage; and, finally, associated the monetary cost of implementing such a system in common hardware. In the ad catalog, each category consists of relevant ads. Also, each client query results in fetching all of ads. The client then consumes these ads from the local state. Note that this design choice considerably improves the overall performance of because the cost of each client-server interaction is amortized over ads. We assume that per day each client sends queries. It means that each client is shown around ads per day. We ran our experiments times and report their averages below.
Iv-a Bandwidth Evaluation
|Number of Ads||Communication Size (KB)|
Figure 8 represents the total communication volume between the client and the server. In the communication size is independent of the database size. In total, the communication size is KB, which includes KB of data from the client to the server and KB of data from the server to the client. This communication is mainly due to the query and response size of OnionPIR. OnionPIR has the smallest communication volume among all the single-server PIR schemes. However, to further improve the communication bandwidth, we could utilize multi-server PIR schemes. We avoid adopting that architecture as it requires distributed trust among the multiple servers, which is not a suitable setting for ads.
We further believe that the small communication size of our system makes it an ideal candidate for bandwidth-constrained clients, such as mobile phone users.
Iv-B Runtime Latency
|Number of Ads||Computation (sec)|
In Figure 9, we present the time it takes for the server to deliver ads relevant to a single client. Our system is capable of delivering ads in a given category in around seconds, even when has more than categories. In our experiments, we also found that around of computation is due to underlying PIR operations. Without using a different PIR scheme, improving this further requires improving underlying crypto primitives. While improvement may potentially be made using alternative PIR schemes in multi-server or stateful models, we highlight issues with these approaches in Section V.
Iv-C Server Cost for Running
|Number of Ads||Server Cost (US cents)|
In Figure 10, we calculated monthly per-user monetary cost of running . The computation and network cost of an Amazon EC2 t2.2xlarge instance, rented at per core hour666https://aws.amazon.com/ec2/spot/pricing/ and nine cents per gigabyte of data transfer.777https://aws.amazon.com/ec2/pricing/on-demand/
Consequently, in per user monthly bandwidth is only cents. In terms of computation, the monthly per-user cost of serving an ad bucket from a of size for a user that makes 10 queries of 30 ads per day is cents. According to some published estimates , Google’s monthly ad revenue is approximately 5 USD per user. This means that Google, using , would operate with a profit margin relative to their per-user revenue, whilst serving ads in a completely privacy-preserving manner.
Financial comparisons. We compare the financial costs of running with the trivial solution of sending the entire ad database to each client. We assume bandwidth usage costs based on retrieving a new ads database every two hours, where around clients are online at any one time. This mirrors the deployment scenario used in . Overall, we expect the total cost to amount to cents per client. This cost arises from assuming that each client downloads the ad an ad database of approximately GB (KB ads entries). Thus, the financial costs of running are a reduction compared with the trivial solution. Moreover, this improvement scales identically as the database increases.
Finally, we note that the eventual cost of running is at least larger than the concurrent work of AdVeil .888Costs are extrapolated somewhat due to differing hardware usage and thus this is only an approximation. However, AdVeil also provides categorically weaker privacy guarantees, and relies on expensive anonymizing proxies such as Tor. We provide a more detailed comparison with all related work in Section VI.
Reporting ad conversions privately. Ad interaction reporting is an important aspect of the online advertisement ecosystem. An accurate ad interaction reporting is essential to ensure that i) advertisers are billed as a function of the number of interactions their ads have over time; and that ii) publishers are paid fairly. With , the ad distribution party does not have visibility of how users are interacting with ads in the system. In addition, the users should not trivially report the ad interaction to the advertisers and ad distribution party. Doing so would render useless the efforts to protect the user privacy by hiding the ad fetching request patterns with . Thus, when using our system, it is impossible for the ad distribution party to assemble an ad interaction report that can be used to bill advertisers.
However, there are multiple protocols that can be deployed in parallel with to provide privacy preserving ad reporting. Adnostic  proposes a system based on homomorphic encryption and zero-knowledge proofs to provide secure and private ad reporting. The authors of Privad  introduce an entity called the Dealer that is responsible for anonymizing user interactions with ads and respective billing. In , the authors leverage an additively encryption scheme that enables privacy-preserving ad reporting at scale. Finally, THEMIS  proposes a private ad reporting mechanism based on an homomorphic encryption, a threshold signature scheme, and a peer-to-peer network. In terms of existing practical deployments of privacy-preserving reporting ad conversions, the Safari browser currently tracks conversions locally and then report these to a server without revealing the user identity .
All of the approaches highlighted above could be used as a privacy-preserving reporting layer that embeds the as the targeting and delivery layer, without impacting any of our original privacy guarantees.
Real-time advertisement auctions. Unlike systems such as AdVeil , we are unable to build real-time advertisement auctions into the ad-delivery pipeline. This is only possible in previous work because the untrusted proxy that is used is able to see which ads are queried, and unlinkability of ad views to user profiles is maintained via an anonymizing proxy (see Section VI for more details).
In , the untrusted proxy learns nothing from a client query. Therefore, ad auctions are only possible to the extent that ad groupings for the locality-sensitive hash function that is used () can be decided apriori. While this is a regression when compared with other systems, we believe that the increased user privacy in our system is of paramount importance. Moreover, in theory such auctions may be possible to run within FHE circuits, although such capabilities are still far beyond the realm of practical web applications.
Alternative PIR schemes. In , we do not use these approaches due to their inherent assumptions that are not suitable for the application.
Stateful PIR: Patel et al.  introduced single-server stateful PIR where the client retrieves some helper data in the offline phase and use it to make the online PIR queries. Their protocol has substantially reduced the amortized computation cost over vanilla stateless PIR. However, their scheme requires the client to download the entire database in the offline phase. For applications like online advertising, where the database is potentially large, it is impractical to download the entire database.
Batched PIR: Batched PIR allows the server to answer a batch of PIR queries at a lower cost than answering each query separately. This general strategy is adopted by various protocols [30, 31, 23, 14, 32]. We remark that this approach is not always applicable in because the client access only one index at a given time.
PIR with Pre-processing: Another direction is PIR with pre-processing, first proposed by Beimel et al. . In their scheme, the server first performs a linear pre-processing step; after that, the server’s work per query is sub-linear. However, this scheme has an exponentially large storage overhead for the server. In other words, their scheme requires the server to store a separate copy of the pre-processed database for each client.
Preventing malicious server activity. While we only consider a honest-but-curious server in our construction, we note that a malicious server could arbitrarily alter their input to try and learn more about the client queries. An example of such an attack would see the server host bad files at certain points in the database, that cause the client to abort the protocol in some way that the server can recognize. The server may then be able to try and leak which indices the client is querying by carefully choosing which files to corrupt, and then inspecting whether an abort occurs.
One way around this is for the server to prove to the client in zero-knowledge that each entry of the database is well-formed, although such a solution is likely to be prohibitively expensive for the client. A more practical alternative is for the server to either have its database input certified and signed by a third-party, and then verifying that the server input is signed correctly before proceeding.999In addition, one could avoid verifying a signature over each record using a cut-and-choose approach that only verifies a random selection of records. Such a solution could also be integrated with a public verification procedure, where clients are able to inspect the public database at an independent trusted location, and then verifying that the server input is the same as the one that it has seen previously.101010Note that, in our application (and PIR in general), the server database is considered to be public information, and so this does not impact any security guarantees. Finally, implementation-specific countermeasures could include careful handling of client aborts, to make sure that they are indistinguishable to the server from client successes.
Other content-fetching applications. Similarly to online advertising systems, there is a wide range of web applications that require users to fetch content from third parties. From revoked certification checks performed by web browsers , to map111111https://maps.google.com and weather applications121212https://www.accuweather.com, the browser is often required to issue queries to third-party providers that leak information about the user’s profile (such as cookie state) and behavioral aspects. For example, when the browser fetches the forecast from a weather service, the service provider learns the location that the user is interested in; this is likely the location where the user is or plans to be.
Vi Related Work
Vi-a Private Information Retrieval
As mentioned previously, private information retrieval protocols are proposed in single-server and multi-server models.
The first single-server scheme was proposed by Kushilevitz and Ostrovsky . In their scheme, the database is represented as a high-dimensional hypercube. the client’s request is encrypted under additive homomorphic encryption. The scheme has a request size of and response size of . Later on, several works have extended this scheme using different cryptographic assumptions. For example, Cachin et al.  proposed a PIR protocol based on -Hiding assumption, Chang ’ scheme is based on Pailer homomorphic encryption and Lipmaa  uses the Damgard-Jurik encryption scheme . All of these schemes have improved the request and response size of the original protocol. However, it has been observed that these schemes in practice often perform slower than downloading the entire database when the network bandwidth is a few hundred kilobytes per second .
Recently more practical single-server schemes have been proposed that display promising performance for various applications. Aguilar-Melchor et al.  present XPIR with good computation cost. But, their protocol has a very high request and response size. SealPIR  addresses the request size bottleneck by introducing a novel query compression technique. This results in a significant reduction in request size however their response size is similar to XPIR. Ali et al.  gives a protocol that improves upon SealPIR’s response size. However, their scheme has a higher computational overhead than SealPIR and XPIR. Recently, Mughees et. al.  has proposed OnionPIR. OnionPIR has significantly reduced the response overhead of SealPIR while keeping the computation comparable. Concretely, the response overhead is only x over the insecure baseline.
Due to its performance advantage over previous schemes, we have used OnionPIR as the underlying PIR scheme for . The OnionPIR scheme excels in terms of the request and response sizes (i.e., communication) but the computation overheads remains quite high. In all of these schemes, the server needs to perform at least expensive cryptographic operations and the computational cost of such an operation is often higher. As mentioned in Section II-D, we could use alternative PIR schemes to reduce computation overheads, but doing so introduces significant costs in terms of communication, implementation, and management of state and database updates (Section V).
Vi-B Private Advertising
|Protocol||Accuracy||Leakage||Trusted Third Party||Financial cost|
|Privad ||Broad interest categories||Ads without client identifiers||Yes||Negligible|
|FLoC ||Broad interest categories||Broad interest categories||Yes||Negligible|
|AdVeil ||Fully targeted||Ads without client identifiers||Yes (Anonymity Proxy)||cents|
|ObliviAd ||Fully targeted||None||Yes (TEE)||TEE costs|
|Adnostic ||Contextually Targeted||Contextual||No||Negligible|
|Brave Ads ||Fully targeted||None||No||30 cents|
|Fully targeted||None||No||3 cents|
There have been multiple attempts to design and implement advertising ecosystems that are practical and privacy-preserving. In the table in Figure 11, we compare the characteristics of with approaches taken from both previous research in this area and real-world deployments. We discuss these systems in further detail below.
Adnostic , proposes a system that features behavioral ad targeting and user privacy and is complementary to the current web ad infrastructure. The user fetches a set of ads from the ad Broker and the behavioral targeting happens locally. The browser processes the user’s history to determine their interests, which are then used to select a subset of the fetched ads to show the user. In addition, Adnostic proposes a cryptographic scheme that relies on homomorphic encryption and zero-knowledge proofs to implement privacy-preserving billing reporting system. This system allows publishers and advertisers to learn the performance of the ad campaigns while preserving user’s privacy. However, Adnostic does not attempt to hide which ads the user requests from the Broker, which leaks user interests and behavior to third parties.
Privad  addresses the privacy concerns in the web advertising industry by introducing a new party – the Dealer – that proxies and anonymizes all interactions between the user and the ad providers. The communication between the user and the Dealer is encrypted, thus the Dealer does not learn any behavioral information about the users. On the other hand, the ad providers have access to the user’s behavioral information but do not know the user’s identity. In addition to providing privacy, the Dealer is also responsible for the billing logic and fraud prevention. The drawbacks of Privad are that the Dealer is a central party that needs to be online and intermediate all the communication between the users and ad providers. In addition, the user’s privacy requires non-collusion between the Dealer and ad providers.
The authors of ObliviAd  propose an hardware-based PIR system that provides strong security and privacy through an Trusted Execution Environment (TEE) and on an Oblivious RAM (ORAM) scheme . The TEE ensures that the ad targeting, ad billing reporting and fraud prevention are privacy-preserving. The user sends an encrypted behavioral profile to a third party running a TEE environment. The ad targeting logic runs on the TEE and selects a batch of ads based on the encrypted user profile. Finally, ObliviAd relies on an ORAM scheme to ensure that the ad fetching does not leak sensitive information about the user. Although ObliviAd provides strong security and privacy guarantees, it assumes that the TEE environment is secure against attacks that may expose the privacy of user’s data being processed within the TEE. As it has been shown in recent literature, sadly, there are no instances of TEE designs that provide those guarantees .
THEMIS  is a decentralized and privacy-preserving ad platform that provides auditability, rewards users for viewing ads, and allows advertisers to verify the performance and billing reports of ad campaigns. The user privacy at ad targeting and fetching phases is guaranteed by relying on a trivial PIR scheme and by performing local ad targeting. First, the user downloads the whole database periodically from a third party who curates the ad catalog. Then, the user selects locally the ads to view based on their profile and browsing history. Since no user-specific data leaves the client, the system does not leak any sensitive information about the user. In addition, THEMIS relies on homomorphic encryption and zero-knowledge proofs to protect the user privacy at ad accounting and billing phases. Due to relying on a trivial PIR scheme, a major limitation of THEMIS is to scale as the number of ads in the catalog increase given the bandwidth necessary for all users to download the whole ad database. could be used as a drop-in replacement for the trivial PIR scheme to overcome this limitation.
In concurrent work, AdVeil  proposes a modular privacy-preserving advertising ecosystem with formal guarantees for end users. The system features private ad targeting, private ad retrieval, and a private ad reporting scheme. The private ad targeting subsystem relies on a single-server PIR protocol and a locality-sensitive hashing mechanism to allow the users to learn which ads to fetch from the broker without disclosing their profile. Both the private ad retrieval and the reporting scheme rely on an anonymizing proxy (e.g. Tor) to ensure the unlinkability between the user’s preferences and the queries issued to the ad broker. Although the fetching performance achieved by AdVeil is better than , the biggest drawback is its reliance on an anonymity proxy to protect the users’ privacy when fetching ads from the broker. Using anonymous proxies correctly such as Tor is not trivial for average web users [43, 44, 45]. In addition, many ISPs and private networks block access to such networks, which effectively prevents AdVeil users from successfully fetching and displaying ads [46, 47, 48]. , on the other hand, relies on PIR for both the ad targeting and ad fetching phases, which provides stronger privacy and usability guarantees without relying on external systems such as anonymity proxies.
In this work, we devise a new framework called for content delivery that relies on cryptographic hardness and best-case privacy, rather than syntactic and optimistic privacy guarantees. In , our scheme utilizes local computation of preferences followed by efficient, configurable, single-server private information retrieval (PIR) to ensure that clients can fetch content from servers, without revealing any of their inherent characteristics to the content provider.
Our solution works by combining novel cryptographic optimizations to PIR schemes that allow storing minimal client state, in order to gain better practicality, that should have independent benefit to the building of practical PIR schemes. In the context of advertisement delivery, we show that we can deliver ads to a client in seconds, with total bandwidth costs of MB, where the ad database has entries. In a system with clients, we calculate that using gives a financial saving in delivering such content to users, compared with the trivial solution of sending the entire database. In addition, performance is comparable with similar ad targeting mechanisms that provide weaker guarantees with regards to leakage of client query information.
Overall, our results show that practical advertisement targeting and delivery systems with best-case privacy guarantees can be built from PIR protocols. We expect that can be used to deliver the new state-of-the-art in privacy-preserving behavioral ad targeting and delivery. Moreover, we believe that our methods could be applied to other settings.
Future Work. In we have utilized single-server stateless PIR protocol. One way to directly improve the latency of the system is to harness the power of stateful or batched frameworks for running single-server PIR [30, 31, 23, 14, 32, 24]. However, as discussed in Section V, these schemes are not directly compatible with , therefore an interesting future direction is to extend to make it possible to embed these approaches as the underlying PIR scheme.
=0mu plus 1mu
-  Google, “FLoC: Federated Learning of Cohorts,” Jan 25 2021.
-  “An introduction to brave’s in-browser ads,” https://brave.com/intro-to-brave-ads/, accessed Jun 201.
-  J. Larisch, D. R. Choffnes, D. Levin, B. M. Maggs, A. Mislove, and C. Wilson, “CRLite: A scalable system for pushing all TLS revocations to all browsers,” in 2017 IEEE Symposium on Security and Privacy. IEEE Computer Society Press, May 2017, pp. 539–556.
-  L. Li, B. Pal, J. Ali, N. Sullivan, R. Chatterjee, and T. Ristenpart, “Protocols for checking compromised credentials,” in ACM CCS 2019, L. Cavallaro, J. Kinder, X. Wang, and J. Katz, Eds. ACM Press, Nov. 2019, pp. 1387–1403.
-  Statista, “Digital advertising spending worldwide from 2019 to 2024,” 28 May 2021, https://www.statista.com/statistics/237974/online-advertising-spending-worldwide/. Accessed 6 Sep 2021.
-  Interactive Advertising Bureau, “IAB Internet Advertising Revenue Report,” 2020.
-  G. Edelman, “Why Don’t We Just Ban Targeted Advertising?” 22 Mar 2020, https://www.wired.com/story/why-dont-we-just-ban-targeted-advertising/. Accessed 6 Sep 2021.
-  Trefis Team and Great Speculations, “Is Google Advertising Revenue 70%, 80%, Or 90% Of Alphabet’s Total Revenue?” 24 Dec 2019, https://www.forbes.com/sites/greatspeculations/2019/12/24/is-google-advertising-revenue-70-80-or-90-of-alphabets-total-revenue/. Accessed 6 Sep 2021.
-  S. Servan-Schreiber, K. Hogan, and S. Devadas, “AdVeil: A Private Targeted-Advertising Ecosystem,” Cryptology ePrint Archive, Report 2021/1032, 2021, https://ia.cr/2021/1032.
-  E. Kushilevitz and R. Ostrovsky, “Replication is NOT needed: SINGLE database, computationally-private information retrieval,” in 38th FOCS. IEEE Computer Society Press, Oct. 1997, pp. 364–373.
-  S. Guha, B. Cheng, and P. Francis, “Privad: Practical Privacy in Online Advertising,” in Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI’11. USA: USENIX Association, 2011, p. 169–182.
-  V. Toubiana, A. Narayanan, D. Boneh, H. Nissenbaum, and S. Barocas, “Adnostic: Privacy preserving targeted advertising,” in NDSS 2010. The Internet Society, Feb. / Mar. 2010.
-  M. Backes, A. Kate, M. Maffei, and K. Pecina, “ObliviAd: Provably secure and practical online behavioral advertising,” in 2012 IEEE Symposium on Security and Privacy. IEEE Computer Society Press, May 2012, pp. 257–271.
-  A. Beimel, Y. Ishai, and T. Malkin, “Reducing the servers’ computation in private information retrieval: PIR with preprocessing,” Journal of Cryptology, vol. 17, no. 2, pp. 125–151, Mar. 2004.
-  M. H. Mughees, H. Chen, and L. Ren, “OnionPIR: Response Efficient Single-Server PIR,” Cryptology ePrint Archive, Report 2021/1081, 2021, https://ia.cr/2021/1081.
-  “Simple encrypted arithmetic library — seal,” https://sealcrypto.org/, accessed Jun 2021.
-  G. Pestana, I. Querejeta-Azurmendi, P. Papadopoulos, and B. Livshits, “THEMIS: A Decentralized Privacy-Preserving Ad Platform with Reporting Integrity,” 2021.
-  J. Wilander, “Privacy Preserving Ad Click Attribution For the Web,” 22 May 2019, https://webkit.org/blog/8943/privacy-preserving-ad-click-attribution-for-the-web/. Accessed 6 Sep 2021.
-  O. Goldreich, S. Micali, and A. Wigderson, “How to play any mental game or A completeness theorem for protocols with honest majority,” in 19th ACM STOC, A. Aho, Ed. ACM Press, May 1987, pp. 218–229.
-  Google Research & Ads, “Evaluation of Cohort Algorithms for the FLoC API,” 21 Oct 2020.
-  E. Rescorla, “Privacy analysis of FLoC,” 10 June 2021, https://blog.mozilla.org/en/privacy-security/privacy-analysis-of-floc/. Accessed 7 Sep 2021.
-  Google, “An updated timeline for Privacy Sandbox milestones,” Jun 24 2021.
-  S. Angel, H. Chen, K. Laine, and S. T. V. Setty, “PIR with compressed queries and amortized query processing,” in 2018 IEEE Symposium on Security and Privacy. IEEE Computer Society Press, May 2018, pp. 962–979.
-  S. Patel, G. Persiano, and K. Yeo, “Private stateful information retrieval,” in ACM CCS 2018, D. Lie, M. Mannan, M. Backes, and X. Wang, Eds. ACM Press, Oct. 2018, pp. 1002–1019.
-  H. Corrigan-Gibbs and D. Kogan, “Private information retrieval with sublinear online time,” in EUROCRYPT 2020, Part I, ser. LNCS, A. Canteaut and Y. Ishai, Eds., vol. 12105. Springer, Heidelberg, May 2020, pp. 44–75.
-  M. Charikar, “Similarity estimation techniques from rounding algorithms,” in 34th ACM STOC. ACM Press, May 2002, pp. 380–388.
-  G. S. Manku, A. Jain, and A. Das Sarma, “Detecting near-duplicates for web crawling,” in Proceedings of the 16th International Conference on World Wide Web, ser. WWW ’07. New York, NY, USA: Association for Computing Machinery, 2007, p. 141–150. [Online]. Available: https://doi.org/10.1145/1242572.1242592
-  “Here’s how much money you made google by staring at its ads for 20 years,” https://thenextweb.com/news/heres-how-much-money-you-made-google-by-staring-at-its-ads-for-20-years, accessed: 2021-08-31.
-  M. Green, W. Ladd, and I. Miers, “A protocol for privately reporting ad impressions at scale,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 1591–1601.
Y. Ishai, E. Kushilevitz, R. Ostrovsky, and A. Sahai, “Batch codes and their
Proceedings of the 36th Annual ACM Symposium on Theory of Computing, 2004, 2004.
-  R. Henry, “Polynomial batch codes for efficient IT-PIR,” Proc. Priv. Enhancing Technol., 2016.
-  Y. Ishai, E. Kushilevitz, R. Ostrovsky, and A. Sahai, “Cryptography from anonymity,” in 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2006), 2006, 2006.
-  C. Cachin, S. Micali, and M. Stadler, “Computationally private information retrieval with polylogarithmic communication,” in Advances in Cryptology - EUROCRYPT ’99, International Conference on the Theory and Application of Cryptographic Techniques, 1999, 1999.
-  Y. Chang, “Single database private information retrieval with logarithmic communication,” in Information Security and Privacy: 9th Australasian Conference, ACISP 2004, Sydney, Australia, July 13-15, 2004. Proceedings, ser. Lecture Notes in Computer Science, 2004.
-  H. Lipmaa, “An oblivious transfer protocol with log-squared communication,” in Information Security, 8th International Conference, ISC 2005, ser. Lecture Notes in Computer Science, 2005.
-  I. Damgård and M. Jurik, “A generalisation, a simplification and some applications of paillier’s probabilistic public-key system,” in Public Key Cryptography, 4th International Workshop on Practice and Theory in Public Key Cryptography, PKC 2001, ser. Lecture Notes in Computer Science, 2001.
-  R. Sion and B. Carbunar, “On the computational practicality of private information retrieval,” in Proceedings of the Network and Distributed Systems Security Symposium. Internet Society, 2007.
-  C. Aguilar Melchor, J. Barrier, L. Fousse, and M.-O. Killijian, “XPIR: Private information retrieval for everyone,” PoPETs, vol. 2016, no. 2, pp. 155–174, Apr. 2016.
-  A. Ali, T. Lepoint, S. Patel, M. Raykova, P. Schoppmann, K. Seth, and K. Yeo, “Communication-computation trade-offs in PIR,” IACR Cryptol. ePrint Arch., vol. 2019.
-  A. Juels, “Targeted advertising… and privacy too,” in CT-RSA 2001, ser. LNCS, D. Naccache, Ed., vol. 2020. Springer, Heidelberg, Apr. 2001, pp. 408–424.
-  O. Goldreich, “Towards a theory of software protection and simulation by oblivious RAMs,” in 19th ACM STOC, A. Aho, Ed. ACM Press, May 1987, pp. 182–194.
-  D. Cerdeira, N. Santos, P. Fonseca, and S. Pinto, “SoK: Understanding the prevailing security vulnerabilities in TrustZone-assisted TEE systems,” in 2020 IEEE Symposium on Security and Privacy. IEEE Computer Society Press, May 2020, pp. 1416–1432.
-  B. Fabian, F. Goertz, S. Kunz, S. Müller, and M. Nitzsche, “Privately Waiting – A Usability Analysis of the Tor Anonymity Network,” in Sustainable e-Business Management, M. L. Nelson, M. J. Shaw, and T. J. Strader, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 63–75.
-  J. Clark, P. C. van Oorschot, and C. Adams, “Usability of anonymous web browsing: An examination of tor interfaces and deployability,” in Proceedings of the 3rd Symposium on Usable Privacy and Security, ser. SOUPS ’07. New York, NY, USA: Association for Computing Machinery, 2007, p. 41–51. [Online]. Available: https://doi.org/10.1145/1280680.1280687
-  G. Norcie, J. Blythe, K. Caine, and L. J. Camp, “Why johnny can’t blow the whistle: Identifying and reducing usability issues in anonymity systems,” in Proceedings 2014 Workshop on Usable Security. https://doi. org/10.14722/usec, 2014.
-  F. A. Saputra, I. U. Nadhori, and B. F. Barry, “Detecting and blocking onion router traffic using deep packet inspection,” in 2016 International Electronics Symposium (IES), 2016, pp. 283–288.
-  S. Khattak, D. Fifield, S. Afroz, M. Javed, S. Sundaresan, V. Paxson, S. J. Murdoch, and D. McCoy, “Do you see what i see? differential treatment of anonymous users.” Internet Society, 2016.
-  S. Miller, K. Curran, and T. Lunney, “Traffic classification for the detection of anonymous web proxy routing,” International Journal for Information Security Research, vol. 5, no. 1, pp. 538–545, Mar. 2015.