A machine learning approach for detecting CNAME cloaking-based tracking on the Web

09/29/2020
by Ha Dao, et al.

Various in-browser privacy protection techniques have been designed to protect end users from third-party tracking. In an arms race against these countermeasures, tracking providers developed a new technique, CNAME cloaking-based tracking, to evade browsers that block third-party cookies and requests. To detect this technique, browser extensions require an on-demand DNS lookup API. However, only the Firefox browser supports this feature. In this paper, we propose a supervised machine learning-based method to detect CNAME cloaking-based tracking without on-demand DNS lookup. Our goal is to detect both sites and requests linked to CNAME cloaking-based tracking. We crawl a list of target sites and store all HTTP/HTTPS requests with their attributes. We then label all instances automatically by looking up the CNAME record of each subdomain and applying wildcard matching based on well-known tracking filter lists. After extracting features, we build a supervised classification model to distinguish sites and requests related to CNAME cloaking-based tracking. Our evaluation shows that the proposed approach outperforms well-known tracking filter lists, achieving F1 scores of 0.790 for sites and 0.885 for requests. By analyzing feature permutation importance, we demonstrate that the number of scripts and the proportion of XMLHttpRequests are discriminative for detecting sites, and that the length of the request URL is helpful in detecting requests. Finally, we analyze concept drift by training a model on the 2018 dataset and obtain reasonable performance on the 2020 dataset for detecting both sites and requests that use CNAME cloaking-based tracking.
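The automatic labeling step described in the abstract (CNAME lookup followed by wildcard matching against filter lists) might look roughly like the following sketch. It is an illustration only, not the authors' implementation: the function names, the example hostnames, and the pattern format are all hypothetical, the CNAME target is assumed to have been resolved beforehand, and the "last two labels" rule for the registrable domain is a simplification that a real pipeline would replace with the Public Suffix List.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Optional


def registrable_domain(host: str) -> str:
    # Simplifying assumption: the registrable domain is the last two labels.
    # This is wrong for suffixes such as "co.uk"; use the Public Suffix List
    # in practice.
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def is_cname_cloaked(
    site_host: str,
    request_host: str,
    cname_target: Optional[str],
    tracker_patterns: Iterable[str],
) -> bool:
    """Label a request as CNAME cloaking-based tracking (hypothetical rule).

    A request is labeled positive when it goes to a subdomain of the visited
    site, that subdomain has a CNAME record pointing outside the site, and
    the CNAME target matches a wildcard pattern taken from a filter list.
    """
    if cname_target is None:  # no CNAME record, so nothing is cloaked
        return False
    target = cname_target.lower().rstrip(".")
    first_party = registrable_domain(request_host) == registrable_domain(site_host)
    leaves_site = registrable_domain(target) != registrable_domain(site_host)
    listed = any(fnmatchcase(target, pattern) for pattern in tracker_patterns)
    return first_party and leaves_site and listed


# Hypothetical filter-list patterns, assumed to be lowercase.
patterns = ["*.tracker.example"]
label = is_cname_cloaked(
    "www.shop.example", "metrics.shop.example", "shop.tracker.example.", patterns
)
```

Labels produced this way serve only as training ground truth; the point of the proposed classifier is that, at detection time, it needs no DNS lookup at all.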

