Background & Motivation. Privacy-enhancing content blocking tools such as AdBlock Plus (3), uBlock Origin (23), and Brave (8) are widely used to block online advertising and/or tracking (Garimella et al., 2017; Merzdovnik et al., 2017; Malloy et al., 2016). Trackers have engaged in the arms race with content blockers via counter-blocking (Nithyanand et al., 2016; Mughees et al., 2017) and circumvention (Alrizah et al., 2019; Le et al., 2021). In the counter-blocking arms race, trackers attempt to detect users of content blocking tools and give them an ultimatum to disable content blocking. In the circumvention arms race, trackers attempt to evade filter lists used to block ads and trackers, thus rendering content blocking ineffective. While both of these arms races persist to date, trackers are increasingly employing circumvention because successful counter-blocking has not been effective in persuading users of content blocking tools to disable them (Chen, 2016; Rogers, 2018; Page Fair, 2017).
Limitations of Prior Work. Trackers have been using increasingly sophisticated techniques to circumvent content blocking (Alrizah et al., 2019; Le et al., 2021; Bashir et al., 2018). At a high level, long-standing circumvention techniques can be classified into two categories. The first type of circumvention is achieved by frequently changing the network location (e.g., domain or URL) of advertising and tracking resources. Content blocking tools attempt to address this type of circumvention by updating filter lists promptly and more frequently (Iqbal et al., 2017; Sjosten et al., 2020; Vastel et al., 2020). The second type of circumvention is achieved by mixing tracking resources with functional resources, for example by serving both from the same network endpoint (e.g., first-party or Content Delivery Network (CDN)) (Alrizah et al., 2019; Chen et al., 2021; Dao et al., 2020). Content blocking tools have struggled against this type of circumvention because they are in a no-win situation: they risk breaking legitimate functionality as collateral damage if they act and risk missing privacy-invasive advertising and tracking if they do not. While there is anecdotal evidence, the prevalence and modus operandi of this type of circumvention have not been studied in prior literature.
Measurement & Analysis. In this paper, we aim to study the prevalence of mixed tracking and functional resources on the web. We present TrackerSift to conduct a large-scale measurement study of mixed resources at different granularities starting from network-level (e.g., domain and hostname) to code-level (e.g., script and method). TrackerSift’s hierarchical analysis sheds light on how tracking and functional resources can be progressively untangled at increasing levels of finer granularity. It uses a localization approach to untangle mixed resources beyond the script-level granularity of state-of-the-art content blocking tools. We show how to classify methods in mixed scripts, which combine tracking and functionality, to implicitly localize the code responsible for tracking behavior. A key challenge in adapting traditional localization approaches to our problem is to find a rigorous suite of test cases (i.e., inputs labeled with their expected outputs). We address this challenge by using existing filter lists (14; 15) to label tracking and functional behaviors during a web page load. By pinpointing the genesis of a tracking behavior even when it is mixed with functional behavior (e.g., method in a bundled script), TrackerSift paves the way towards finer-grained content blocking that is more resilient against circumvention than existing content blocking tools.
Results. Our measurements on the landing pages of 100K websites show that 17% (11.8K) of the domains are classified as mixed domains that serve both tracking and functional resources. Notable mixed domains include gstatic.com, google.com, facebook.com, facebook.net, and wp.com. The requests belonging to mixed domains are served from a total of 26.0K hostnames. Among these hostnames, 48% are classified as mixed. Notable mixed hostnames include connect.facebook.net, www.google.com, www.facebook.com, and fonts.gstatic.com. Among the mixed hostnames, TrackerSift classifies 94% of the (initiator) scripts as tracking or functional. The remaining 6% (21.1K) are mixed scripts that bundle tracking and functionality. For the mixed scripts, TrackerSift classifies 91% of their methods as tracking or functional. The remaining 9% (5.5K) mixed methods are then separated using call stack analysis.
We summarize our key contributions as follows:
a large-scale measurement and analysis of mixed tracking and functional web resources; and
a hierarchical localization approach to untangle mixed web resources.
The main contribution of this work is twofold. First, we find quantitative evidence that mixed tracking and functional resources on the web are prevalent. Second, we present a technique called TrackerSift that aims to localize tracking resources from mixed and functional resources, helping content-blockers take effective action against tracking resources confidently. A key challenge in realizing such a technique is finding a test oracle capable of identifying if a web page’s behavior is tracking or functional. To address this challenge, we use filter lists to distinguish between tracking and functional behavior. A general theme in our work is a hierarchical analysis of web resources by progressively localizing tracking resources at increasing granularity. The hierarchical analysis plays a critical role in the early elimination of resources that are functional or tracking while concentrating the efforts on the mixed resources that we could not separate at the coarser granularity. Below we describe TrackerSift’s hierarchical analysis and the corresponding granularities.
Domain classification. At webpage load time, multiple network requests are initiated to gather content from various network locations addressed by their URLs. We capture the URLs of such script-initiated requests and apply a filter list to label them as tracking or functional. We then extract the domain names from the request URLs and propagate the labels from URLs to domain names. We call this top-level granularity the domain level. For each domain, we maintain a tracking count and a functional count. All domains that are classified as tracking or functional are set aside at this level. The rest, representing mixed domains that serve both tracking and functional requests, are further examined at a finer granularity. For instance, in Figure 1, the domains ads.com and news.com serve solely tracking and solely functional content, respectively. The domain google.com serves both and thus needs analysis at a finer granularity.
Hostname classification. For the requests served by mixed domains, we extract their hostnames. We increment the tracking or functional count for each hostname within a mixed domain based on the corresponding request’s label. The hostnames serving both tracking and functional requests are further analyzed at a finer granularity, while the rest are classified as either tracking or functional. We call this level the hostname level. In Figure 1, google.com was previously classified as mixed and therefore all hostnames belonging to google.com need to be examined. We classify ad.google.com and maps.google.com as tracking and functional, respectively, whereas cdn.google.com is mixed.
Script classification. We locate the script initiating each request to a mixed hostname and label it as either functional or tracking, reflecting the type of requests it initiates. Similar to the other levels, we count the tracking and functional requests from each script and sort the scripts into three buckets: functional, tracking, and mixed scripts, where mixed scripts are further analyzed at a finer granularity. In Figure 1, sdk.js, clone.js, and stack.js initiate requests to a mixed hostname on test.com. As clone.js initiates both tracking and functional requests, it requires fine-grained analysis.
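The three levels described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TrackerSift's implementation: the `Request` record, the helper names, and the example URLs are hypothetical; `registrable_domain` is a naive stand-in for proper eTLD+1 extraction; and a resource is treated as mixed as soon as it has at least one request of each label, which is simpler than the ratio thresholds used in the measurement. Which of sdk.js and stack.js is tracking is also assumed for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Request:
    url: str        # requested URL
    script: str     # URL of the initiating script
    tracking: bool  # label assigned by the filter list


def registrable_domain(hostname: str) -> str:
    # Assumption: keep the last two labels. A real implementation
    # would consult the Public Suffix List to find eTLD+1.
    return ".".join(hostname.split(".")[-2:])


def split_by_key(requests, key):
    """Group requests by key; bucket each group as tracking, functional, or mixed."""
    groups = defaultdict(list)
    for r in requests:
        groups[key(r)].append(r)
    buckets = {"tracking": {}, "functional": {}, "mixed": {}}
    for k, rs in groups.items():
        n_tracking = sum(r.tracking for r in rs)
        n_functional = len(rs) - n_tracking
        if n_functional == 0:
            buckets["tracking"][k] = rs
        elif n_tracking == 0:
            buckets["functional"][k] = rs
        else:
            buckets["mixed"][k] = rs
    return buckets


def hierarchical_split(requests):
    """Classify by domain, then pass only mixed requests down to hostname and script."""
    levels = [
        ("domain", lambda r: registrable_domain(urlparse(r.url).hostname)),
        ("hostname", lambda r: urlparse(r.url).hostname),
        ("script", lambda r: r.script),
    ]
    result = {}
    remaining = list(requests)
    for name, key in levels:
        buckets = split_by_key(remaining, key)
        result[name] = {label: sorted(b) for label, b in buckets.items()}
        # Only requests of mixed resources are examined at the next level.
        remaining = [r for rs in buckets["mixed"].values() for r in rs]
    return result


# Hypothetical requests loosely modeled on the Figure 1 example.
requests = [
    Request("https://ads.com/pixel.gif", "https://test.com/a.js", True),
    Request("https://news.com/story.json", "https://test.com/a.js", False),
    Request("https://ad.google.com/track", "https://test.com/sdk.js", True),
    Request("https://maps.google.com/tile", "https://test.com/stack.js", False),
    Request("https://cdn.google.com/ads-1", "https://test.com/sdk.js", True),
    Request("https://cdn.google.com/lib-1", "https://test.com/stack.js", False),
    Request("https://cdn.google.com/ads-2", "https://test.com/clone.js", True),
    Request("https://cdn.google.com/lib-2", "https://test.com/clone.js", False),
]
levels = hierarchical_split(requests)
```

In this toy input, only google.com survives the domain level, only cdn.google.com survives the hostname level, and only clone.js remains mixed at the script level, so each level works on a strictly smaller set of requests.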
In this section, we describe our browser instrumentation to crawl websites and label the collected data.
Crawling. We used Selenium (42) with Chrome 79.0.3945.79 to automatically crawl the landing pages of 100K websites randomly sampled from the Tranco top-million list (Pochat et al., 2018) in April 2021. Our crawling infrastructure, hosted on a campus network in North America, comprised a 13-node cluster with 112 cores at 3.10GHz, 52TB storage, and 832GB memory. Each node uses a Docker container to crawl a subset of the 100K webpages. The average page load time (until the onLoad event is fired) for a web page was about 10 seconds. Our crawler waits an additional 10 seconds before moving on to the next website. Note that the crawling is stateless, i.e., we clear all cookies and other local browser state between consecutive crawls.
As shown in Figure 2, our crawler was implemented as a purpose-built Chrome extension that used DevTools (18) API to collect the data during crawling. Specifically, it relies on two network events: requestWillBeSent and responseReceived for capturing relevant information from HTTP requests and responses during the page load. The former event provides detailed information for each HTTP request such as a unique identifier for the request (request_id), the web page’s URL (top_level_url), the URL of the document this request is loaded for (frame_url), requested resource type (resource_type), request header, request timestamp, and a call_stack object containing the initiator information and the stack trace for script-initiated HTTP requests. The latter event provides detailed information for each HTTP response such as response headers and response body containing the payload.
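To make the role of the call stack concrete, the following sketch extracts the initiating script and method from a requestWillBeSent payload. It assumes the event is available as a dictionary shaped like the Chrome DevTools Protocol `Network.requestWillBeSent` parameters (an `initiator` with a `stack` of `callFrames`); the `Initiator` tuple, the `initiator_of` helper, and the sample payload are hypothetical and are not the crawler's actual schema. Only the synchronous frames are inspected here.

```python
from typing import NamedTuple, Optional


class Initiator(NamedTuple):
    script_url: str
    method: str
    line: int
    column: int


def initiator_of(event: dict) -> Optional[Initiator]:
    """Return the innermost call frame of a script-initiated request, else None."""
    initiator = event.get("initiator", {})
    if initiator.get("type") != "script":
        return None  # e.g., parser-initiated requests carry no script stack
    frames = initiator.get("stack", {}).get("callFrames", [])
    if not frames:
        return None
    top = frames[0]  # the first frame is the innermost caller
    return Initiator(
        script_url=top["url"],
        # Anonymous functions have an empty functionName in the protocol.
        method=top["functionName"] or "<anonymous>",
        line=top["lineNumber"],
        column=top["columnNumber"],
    )


# Hand-written sample payload (hypothetical values).
sample_event = {
    "requestId": "1000.42",
    "documentURL": "https://test.com/",
    "type": "XHR",
    "request": {"url": "https://cdn.google.com/ads-2", "method": "GET"},
    "initiator": {
        "type": "script",
        "stack": {
            "callFrames": [
                {"functionName": "m2", "scriptId": "17",
                 "url": "https://test.com/clone.js",
                 "lineNumber": 10, "columnNumber": 4},
                {"functionName": "", "scriptId": "17",
                 "url": "https://test.com/clone.js",
                 "lineNumber": 42, "columnNumber": 0},
            ]
        },
    },
}

parser_event = {"requestId": "1000.43", "initiator": {"type": "parser"}}

found = initiator_of(sample_event)
```

Keeping the line and column of the frame alongside the method name is one way to tell apart anonymous functions that would otherwise share the same empty name.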
Classifying Mixed Resources. To quantify the mixing of tracking and functional resources, we compute the common logarithmic ratio of the number of tracking requests to the number of functional requests, i.e., log10(tracking requests / functional requests).
At each granularity, we classify resources with a common logarithmic ratio less than -2 as functional because they triggered at least 100 times more functional requests than tracking requests. Similarly, we classify resources with a common logarithmic ratio more than 2 as tracking because they triggered at least 100 times more tracking requests than functional requests. Resources with a common logarithmic ratio between -2 and 2 are classified as mixed.
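The thresholding rule can be written down as a short sketch. The function names are hypothetical, and two details are assumptions because the text does not spell them out: a resource with no functional (or no tracking) requests is given an infinite ratio, and the boundary values -2 and 2 are treated as functional and tracking, respectively, following the closed intervals reported in the results.

```python
import math


def log_ratio(tracking: int, functional: int) -> float:
    """Common logarithm of the tracking-to-functional request ratio."""
    if tracking == 0 and functional == 0:
        raise ValueError("resource has no labeled requests")
    if functional == 0:
        return math.inf   # assumption: purely tracking resource
    if tracking == 0:
        return -math.inf  # assumption: purely functional resource
    return math.log10(tracking / functional)


def classify(tracking: int, functional: int, threshold: float = 2.0) -> str:
    """Label a resource as tracking, functional, or mixed."""
    ratio = log_ratio(tracking, functional)
    if ratio >= threshold:
        return "tracking"
    if ratio <= -threshold:
        return "functional"
    return "mixed"


labels = {
    "mostly_tracking": classify(tracking=1000, functional=1),
    "mostly_functional": classify(tracking=1, functional=1000),
    "balanced": classify(tracking=5, functional=5),
    "only_tracking": classify(tracking=7, functional=0),
    "only_functional": classify(tracking=0, functional=3),
}
```

Because the rule operates on a logarithmic scale, a handful of mislabeled requests does not move a resource across a threshold unless the error approaches an order of magnitude.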
Results Summary. Table 2 summarizes the results of our crawls of the landing pages of 100K websites. Using the aforementioned classification, we are able to attribute 54% of the requests to tracking or functional domains. The remaining 46% (1129K) of the requests are attributed to mixed domains that are further analyzed at the hostname level. We are able to attribute 24% of the requests from mixed domains to tracking or functional hostnames. The remaining 76% (860K) of the requests are attributed to mixed hostnames that are further analyzed at the script URL level. We are able to attribute 84% of the requests from mixed hostnames to tracking or functional script URLs. The remaining 16% (135K) of the requests are attributed to mixed script URLs that are further analyzed at the script method level. We are able to attribute 72% of the requests from mixed script URLs to tracking or functional script methods. This leaves us with 37K requests that cannot be cleanly attributed to tracking or functional resources and require further call stack analysis.
Next, we analyze the distribution of the ratio of tracking to functional requests. Figure 3 plots the distributions at domain, hostname, script URL, and script method granularities.
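The funnel above can be re-derived from the reported counts as a quick consistency check. The sketch below uses only figures stated in this paper (2451K requests in total, then 1129K, 860K, 135K, and 37K remaining mixed at each level); the variable and function names are hypothetical, and because the inputs are rounded to thousands, the recomputed percentages may differ from the reported ones by about a point.

```python
# Request counts in thousands, as reported in the text.
total_k = 2451
mixed_k = [("domain", 1129), ("hostname", 860), ("script", 135), ("method", 37)]


def funnel(total, mixed):
    """Percentage of incoming requests that remain mixed at each level."""
    rows = []
    incoming = total
    for level, still_mixed in mixed:
        rows.append((level, round(100 * still_mixed / incoming)))
        incoming = still_mixed  # only mixed requests reach the next level
    return rows


rows = funnel(total_k, mixed_k)

# Share of all requests attributed to tracking or functional resources
# once the finest (method) level has been applied.
attributed_pct = 100 * (1 - mixed_k[-1][1] / total_k)
```

With these inputs, roughly 1.5% of all requests remain unattributed after the method level, which matches the overall figure of about 98% of requests attributed to tracking or functional resources.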
Domain classification. 2451K requests in our dataset are served from a total of 69,292 domains (eTLD+1). Figure 3(a) shows three distinct peaks: domains in [2, ∞) serve mostly tracking requests, domains in (-∞, -2] serve mostly functional requests, and domains in (-2, 2) serve both tracking and functional requests. We can block 31% of the requests classified as tracking by blocking 6,493 domains that lie in the [2, ∞) interval. Notable tracking domains include google-analytics.com, doubleclick.net, googleadservices.com, and bing.com. We can allow 23% of the requests classified as functional by not blocking 50,938 domains that lie in the (-∞, -2] interval. Notable functional domains include CDN and other content hosting domains such as twimg.com, zychr.com, fbcdn.net, w.org, and parastorage.com. However, 46% of requests are served by 11,861 mixed domains that lie in the (-2, 2) interval. These mixed domains cannot be safely blocked due to the risk of breaking legitimate functionality, and not blocking them results in allowing tracking. Notable mixed domains include gstatic.com, google.com, facebook.com, facebook.net, and wp.com.
Hostname classification. 1129K requests belonging to mixed domains are served from a total of 26,060 hostnames. Figure 3(b) shows three distinct peaks representing hostnames that serve mostly tracking, functional, or both tracking and functional requests. We can block 14% of the requests from 4,429 tracking hostnames. We can allow 9% of the requests from 9,248 functional hostnames. However, 76% of requests are served by 12,383 mixed hostnames. Again, these mixed hostnames cannot be safely blocked due to the risk of breaking legitimate functionality, and not blocking them results in allowing tracking. Take the example of the hostnames of a popular mixed domain, wp.com. The requests to wp.com are served from tracking hostnames such as pixel.wp.com and stats.wp.com, functional hostnames such as widgets.wp.com and c0.wp.com, and mixed hostnames such as i0.wp.com and i1.wp.com.
Script classification. 860K requests belonging to mixed hostnames are initiated by a total of 350,050 initiator scripts. Figure 3(c) again shows three distinct peaks representing scripts that initiate mostly tracking, functional, or both tracking and functional requests. We can block 27% of the requests from 194,156 tracking scripts. We can allow 57% of the requests from 134,726 functional scripts. However, 16% of requests are initiated by 21,168 mixed scripts. Again, these mixed scripts cannot be safely blocked due to the risk of breaking legitimate functionality, and not blocking them results in allowing tracking. For example, consider the initiator scripts of a popular mixed hostname, i1.wp.com. The requests to this hostname are initiated by different scripts on the webpage www.ibn24.tv. Specifically, a tracking request is initiated by the script show_ads_impl_fy2019.js, whereas a functional request is initiated by the script jquery.min.js. As another example, on the webpage somosinvictos.com, both tracking and functional requests with hostname i1.wp.com are initiated by the mixed script lazysizes.min.js.
Method classification. 135K requests belonging to mixed scripts are initiated by a total of 64,019 initiator script methods. Figure 3(d) again shows three distinct peaks representing methods that initiate mostly tracking, functional, or both tracking and functional requests. We can block 17% of the requests from 17,940 tracking methods. We can allow 55% of the requests from 40,500 functional methods. However, 28% of requests are initiated by 5,579 mixed methods. Again, these mixed methods cannot be safely blocked due to the risk of breaking legitimate functionality, and not blocking them results in allowing tracking. For example, consider the initiator script methods of the mixed script tfa.js on the webpage hubblecontacts.com. While both tracking and functional requests are initiated by the same initiator script tfa.js, the tracking request was initiated by the method get and the functional request was initiated by the method X. As another example, consider the initiator script methods of the mixed script app.js on the webpage radioshack.com.mx. Here both tracking and functional requests are initiated by the same initiator script app.js and the same method Pa.xhrRequest.
In this section, we discuss some case studies, opportunities for future work, and limitations.
Circumvention strategies. We first highlight two common techniques, script inlining and bundling, used to mix tracking and functional resources.
Existing content blocking tools struggle to block inlined and bundled tracking scripts without risking breakage of legitimate site functionality. Fine-grained detection by TrackerSift presents an opportunity to handle such scripts by precisely detecting the specific methods that implement tracking.
Future work. Our analysis shows that even at the finest granularity TrackerSift’s separation factor is 91% which leaves approximately 5.5K mixed methods for future work to analyze. One possible direction is to apply TrackerSift in the context of a mixed method initiating a request. We can define context as calling context, program scope, or parameters to the mixed method. In the case of calling context, we can perform a call stack analysis that takes a snapshot of a mixed method’s stack trace when the method initiates a tracking or functional request. We hope to see distinct stack traces from tracking and functional requests by a mixed method. We can consolidate the stack traces of a mixed method and locate the point of divergence, i.e., a method in the stack trace that only participates in tracking requests. We hypothesize that if we remove such a method, it will break the chain of methods needed to invoke a tracking behavior, thus removing the tracking behavior.
Figure 4 illustrates our proposed call stack analysis. It shows the snapshot of the stack traces of requests nonads-2 and ads-2 in Figure 1 initiated by a mixed method m2() on the webpage test.com. The two stack traces are merged to form a call graph where each node represents a unique script and method, and each edge represents a caller-callee relationship. The yellow color indicates that a node participates in invoking both tracking and functional requests. The method t in track.js is the point of divergence since it only participates in the tracking trace. Therefore, t most likely originates the tracking behavior, which makes it a good candidate for removal.
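The proposed call stack analysis can be sketched as follows. This is an illustration of the idea rather than an implemented component: stack traces are modeled as lists of (script, method) pairs ordered from outermost caller to the mixed method, and the frames other than t in track.js and m2() are hypothetical, as is the assumption that m2() lives in clone.js.

```python
def merge_call_graph(traces):
    """Merge stack traces into a set of caller-callee edges."""
    edges = set()
    for trace in traces:
        edges.update(zip(trace, trace[1:]))
    return edges


def shared_frames(tracking_traces, functional_traces):
    """Frames that take part in both tracking and functional requests."""
    tracking = {f for t in tracking_traces for f in t}
    functional = {f for t in functional_traces for f in t}
    return tracking & functional


def divergence_points(tracking_traces, functional_traces):
    """Frames that appear in tracking traces but in no functional trace."""
    functional = {f for t in functional_traces for f in t}
    seen, result = set(), []
    for trace in tracking_traces:
        for frame in trace:
            if frame not in functional and frame not in seen:
                seen.add(frame)
                result.append(frame)
    return result


# Hypothetical traces for the requests ads-2 and nonads-2.
ads_2 = [("main.js", "a"), ("track.js", "t"), ("clone.js", "m2")]
nonads_2 = [("main.js", "a"), ("app.js", "k"), ("clone.js", "m2")]

graph = merge_call_graph([ads_2, nonads_2])
shared = shared_frames([ads_2], [nonads_2])
candidates = divergence_points([ads_2], [nonads_2])
```

Here the shared frames correspond to the nodes that participate in both kinds of requests, and the single candidate returned is the frame that only occurs on the tracking path.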
Tracking method removal, while seemingly effective, has a risk of functional breakage, primarily when the method is classified as tracking through dynamic analysis. We propose a more conservative approach of tracker blocking with the help of a tracker guard: a predicate that drops the tracking execution and lets the functional execution pass. Such a predicate has a structure similar to that of an assertion. We envision using classic invariant inference techniques (Padhi et al., 2016; Ernst et al., 1999) on a tracker method’s calling context, scope, and arguments to generate a program invariant that holds across all tracking invocations. For example, a simple invariant could be url.hostname=="ads.google.com". If an online invocation satisfies the invariant, the tracker guard will drop the execution as it considers it to be tracking. A key challenge in this approach is collecting the context information, e.g., program scope, method arguments, and stack trace, for each request initiated by the mixed method at runtime.
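The structure of a tracker guard can be illustrated with a small sketch. It is written in Python purely to show the shape of the predicate; an actual guard would have to wrap JavaScript methods in the page. The decorator `tracker_guard`, the predicate `hostname_is_ads`, and the stand-in method `m2` are hypothetical, and the invariant is the simple hostname example given above rather than one inferred by an invariant inference tool.

```python
from functools import wraps
from urllib.parse import urlparse


def tracker_guard(is_tracking):
    """Wrap a mixed method so that invocations satisfying the invariant are dropped."""
    def decorate(method):
        @wraps(method)
        def guarded(*args, **kwargs):
            if is_tracking(*args, **kwargs):
                return None  # drop the tracking execution
            return method(*args, **kwargs)  # let the functional execution pass
        return guarded
    return decorate


def hostname_is_ads(url, *_args, **_kwargs):
    # Example invariant from the text: url.hostname == "ads.google.com"
    return urlparse(url).hostname == "ads.google.com"


sent = []  # records the requests that were actually issued


@tracker_guard(hostname_is_ads)
def m2(url):
    """Stand-in for a mixed method that issues a network request."""
    sent.append(url)
    return url


m2("https://ads.google.com/collect?id=1")  # satisfies the invariant, dropped
m2("https://cdn.google.com/lib.js")        # does not satisfy it, passes
```

Unlike removing the method outright, the guard leaves the functional path untouched, so a wrong invariant costs at most the invocations it matches.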
Relatedly, Firefox recently introduced SmartBlock that uses surrogate scripts to block tracking in mixed scripts while avoiding site breakage (20). These surrogate scripts are currently manually designed on a case-by-case basis to handle site breakage (41). TrackerSift can help scale up the process of designing surrogate scripts through automation.
Limitations. We briefly describe a few limitations of our measurement and analysis. First, our measurements do not provide full coverage of the events triggered by user interactions (e.g., scroll, click). As with any other dynamic analysis approach, this could lead to false positives (e.g., mixed resources classified as tracking) and false negatives (e.g., mixed resources classified as functional). Second, our measurements are limited to landing pages, and the results might vary for internal pages (Aqeel et al., 2020). Third, our analysis relies on error-prone (Alrizah et al., 2019; Vastel et al., 2020) filter lists for ground truth labeling. We argue that filter lists are reasonably reliable for top-ranked websites. Moreover, our classification of mixed resources uses the logarithmic ratio, which should be robust to labeling mistakes of less than an order of magnitude. Fourth, our method-level analysis does not distinguish between different anonymous functions in a script and treats them as part of one method. This limitation can be addressed by using the line and column number information available for each method invocation in the call stack. Finally, we selected the thresholds of -2 and 2 to classify mixed resources in Equation 1. As shown in Figure 3, this subjective selection reasonably separated mixed resources from tracking and functional resources.
6. Related Work
The problem of finding tracking-inducing code shares similarities with prior research on fault-inducing code localization. For example, the widely popular spectra-based fault localization (SBFL) (Jones et al., 2002; Jones and Harrold, 2005; El-Wahab et al., 2018; Agarwal and Agrawal, 2014; Pearson et al., 2017; Souza et al., 2016) collects statement coverage profiles of each test, passing or failing, to localize the lines of code that are most likely to induce a test failure. Vancsics et al. (Vancsics et al., ) and Laghari et al. (Laghari et al., 2015) present a call frequency-based SBFL technique. Instead of coverage information, they use the frequency of method occurrence in the call stacks of failing test cases. A method that appears more often in the call stacks of failing test cases is more likely to be faulty. In TrackerSift, methods appearing more frequently in tracking script-initiated requests have a higher probability of being privacy-invasive. Abreu et al. (Abreu et al., 2007) studied how accurate these SBFL techniques are and found that their accuracy is independent of the quality of test design.
Jiang et al. (Jiang et al., 2012) use the call stack to localize null pointer exceptions, and Gong et al. (Gong et al., 2014) generate call stack traces to successfully identify 65% of the root causes of crashing faults. One common limitation across most fault-localization approaches is that they require an extensive test suite capable of exercising faulty behavior, along with an instrumented runtime to collect statement-level coverage. TrackerSift overcomes these limitations by using filter lists as a test oracle during page load time and by using an instrumented browser to capture fine-grained coverage.
We presented TrackerSift, a hierarchical approach to progressively untangle mixed resources at increasing levels of finer granularity from network-level (e.g., domain and hostname) to code-level (e.g., script and method). We showed that it is able to attribute 98% of all requests on 100K websites to tracking or functional resources by the finest level of granularity. Our results highlighted opportunities for fine-grained content blocking to remove mixed resources without breaking legitimate site functionality. Specifically, TrackerSift can be used to design surrogate scripts by automatically identifying and removing tracking methods from mixed scripts.
-  Cited by: §5.
- On the accuracy of spectrum-based fault localization. In Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007), Cited by: §6.
-  Adblock Plus. Note: https://adblockplus.org/ Cited by: §1.
- Fault-localization techniques for software systems: a literature review. SIGSOFT Softw. Eng. Notes. Cited by: §6.
- Errors, Misunderstandings, and Attacks: Analyzing the Crowdsourcing Process of Ad-blocking Systems. In ACM Internet Measurement Conference (IMC), Cited by: §1, §1, §3, §5, §6.
- On landing and internal web pages: the strange case of jekyll and hyde in web performance measurement. In Proceedings of the ACM Internet Measurement Conference, Cited by: §5.
- How tracking companies circumvented ad blockers using websockets. In Proceedings of the Internet Measurement Conference (IMC), Cited by: §1.
-  Brave Browser. Note: https://brave.com/ Cited by: §1.
-  Browserify. Cited by: §5.
- Tough sell: why publisher ’turn-off-your-ad-blocker’ messages are so polite - digiday. Note: https://digiday.com/media/tough-sell-publisher-turn-off-ad-blocker-messages-polite/ Cited by: §1.
- CNAME cloaking, the dangerous disguise of third-party trackers. Note: https://medium.com/nextdns/cname-cloaking-the-dangerous-disguise-of-third-party-trackers-195205dc522a Cited by: §6.
- Characterizing cname cloaking-based tracking on the web. IEEE/IFIP TMA’20, pp. 1–9. Cited by: §1, §6.
-  EasyList. Note: https://easylist.to/easylist/easylist.txt Cited by: §1, §3.
-  (2020-06) EasyPrivacy. EasyList. Note: https://easylist.to/easylist/easyprivacy.txt (Accessed on 06/21/2020) Cited by: §1, §3.
- Graph mining for software fault localization: an edge ranking based approach. Journal of Communications Software and Systems 13, pp. 178–188. Cited by: §6.
- Dynamically discovering likely program invariants to support program evolution. In Proceedings of the 21st International Conference on Software Engineering, ICSE ’99, New York, NY, USA, pp. 213–224. Cited by: §5.
-  Extending devtools. Cited by: §3.
-  Facebook pixel: implementation. Cited by: §5.
-  (2021-03) Firefox 87 introduces smartblock for private browsing. Cited by: §5.
- Ad-blocking: a study on performance, privacy and counter-measures. In Proceedings of the 2017 ACM on Web Science Conference, Cited by: §1.
- Locating crashing faults based on crash stack traces. In arXiv:1404.4100, Cited by: §6.
-  (2020-07) Gorhill/ublock: ublock origin - an efficient blocker for chromium and firefox. fast and lean. uBlock Origin. Note: https://github.com/gorhill/uBlock Cited by: §1.
- The Ad Wars: Retrospective Measurement and Analysis of Anti-Adblock Filter Lists. In IMC, Cited by: §1.
- Fault localization for null pointer exception based on stack trace and program slicing. In 2012 12th International Conference on Quality Software, Cited by: §6.
- Visualization of test information to assist fault localization. New York, NY, USA, pp. 467–477. Cited by: §6.
- Empirical evaluation of the tarantula automatic fault-localization technique. New York, NY, USA, pp. 273–282. Cited by: §6.
- Localising faults in test execution traces. In Proceedings of the 14th International Workshop on Principles of Software Evolution, IWPSE 2015, New York, NY, USA, pp. 1–8. Cited by: §6.
- CV-inspector: towards automating detection of adblock circumvention. In Network and Distributed System Security Symposium (NDSS), Cited by: §1, §1, §6.
- Ad Blockers: Global Prevalence and Impact. In ACM Internet Measurement Conference (IMC), Cited by: §1.
- Block Me If You Can: A Large-Scale Study of Tracker-Blocking Tools. In IEEE European Symposium on Security and Privacy, Cited by: §1.
- Detecting Anti Ad-blockers in the Wild. In Privacy Enhancing Technologies Symposium (PETS), Cited by: §1.
- Adblocking and Counter-Blocking: A Slice of the Arms Race. In USENIX Workshop on Free and Open Communications on the Internet, Cited by: §1.
- Data-driven precondition inference with learned features. SIGPLAN Not. 51 (6), pp. 42–56. Cited by: §5.
- The State of the Blocked Web. Note: https://pagefair.com/downloads/2017/01/PageFair-2017-Adblock-Report.pdf Cited by: §1.
- Evaluating and improving fault localization. In 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), pp. 609–620. Cited by: §6.
- Tranco: a research-oriented top sites ranking hardened against manipulation. arXiv preprint arXiv:1806.01156. Cited by: §3.
- Why doesn’t my ad blocker block ‘please turn off your ad blocker’ popups? - vice. Note: https://www.vice.com/en_us/article/j5zk8y/why-your-ad-blocker-doesnt-block-those-please-turn-off-your-ad-blocker-popups Cited by: §1.
-  Security/trackingprotectionbreakage. Cited by: §5.
-  Selenium. Note: http://docs.seleniumhq.org/ Cited by: §3.
- Filter List Generation for Underserved Regions. In The Web Conference, Cited by: §1, §3.
- Spectrum-based software fault localization: a survey of techniques, advances, and challenges. ArXiv abs/1607.04347. Cited by: §6.
-  Call frequency-based fault localization. Cited by: §6.
- Who Filters the Filters: Understanding the Growth, Usefulness and Efficiency of Crowdsourced AdBlocking. In ACM SIGMETRICS/Performance, Cited by: §1, §5.