A4 : Evading Learning-based Adblockers

01/29/2020 ∙ by Shitong Zhu, et al. ∙ 0

Efforts by online ad publishers to circumvent traditional ad blockers and regain fiduciary benefits have been demonstrably successful. As a result, a set of adblockers has recently emerged that apply machine learning instead of manually curated rules, and these have been shown to be more robust in blocking ads on websites, including social media sites such as Facebook. Among these, AdGraph is arguably the state-of-the-art learning-based adblocker. In this paper, we develop A4, a tool that intelligently crafts adversarial samples of ads to evade AdGraph. Unlike the popular research on adversarial samples against images or videos, which are considered less- to un-restricted, the samples that A4 generates preserve the application semantics of the web page, i.e., they are actionable. Through several experiments we show that A4 can bypass AdGraph about 60% of the time, which surpasses the state-of-the-art attack by a significant margin of 84.3%; in addition, changes to the visual layout of the web page due to these perturbations are imperceptible. We envision that the algorithmic framework proposed in A4 is also promising for improving adversarial attacks against other learning-based web applications with similar requirements.




1 Introduction

As adblockers have gained popularity in recent years [13], ad publishers have started fighting back to recover their revenues. Specifically, many techniques have emerged for circumventing the current generation of adblockers [20]. Notably, prior work [20] has shown that among Alexa’s top 10K websites, including many social media sites, more than 30% have client-side JavaScript code that serves as a countermeasure against adblocker use.

Conventionally, adblockers rely on manually curated (and maintained) blacklists, with rules/signatures that are matched against resource request URLs sent from the browser and the elements rendered in a web page. Unfortunately, maintaining such blacklists does not scale and is error-prone. Furthermore, they are fairly easy to subvert (just like antivirus signatures) [12, 8]. Given these limitations, machine-learning-based adblockers [9, 16, 17] have recently emerged, aiming to improve effectiveness and accuracy over signature-based adblockers. Such adblockers can be categorized into “perceptual” and “non-perceptual” classes.

Perceptual adblockers [16, 17] block ads by recognizing visual cues (e.g., “sponsored” or other marketing keywords) in the web page. It is claimed that these are more robust because some regulators (e.g., the FTC) require publishers to disclose the “ad nature” of their online content. However, recent research has shown that these vision-based adblockers can be easily fooled by adversarial examples; this is an artifact of the recent advances in adversarial machine learning (AML). In brief, a classifier can be fooled by adding only human-imperceptible perturbation pixels to the ad images.

In contrast, non-perceptual adblockers detect ads based on non-visual features such as the URL contents and page structure. The state-of-the-art non-perceptual ML-based adblocker is, arguably, AdGraph [9] (footnote 1). Unlike most previous works that rely on URL text or code structure alone, AdGraph builds a page loading graph for a web page with rich contextual information, and extracts features from it to classify HTTP(S) request nodes. The claim is that AdGraph’s use of fine-grained structural/contextual information provides improved robustness over other adblockers, since significant changes to a web page are required to convincingly alter the structure and conceal the ad. Furthermore, the use of non-visual features inhibits the applicability of adversarial attack techniques from the unconstrained domain (i.e., images) [18]. In a nutshell, a major contribution of this paper is to show that neither of these conclusions holds true with AML.

Footnote 1: [15] is a concurrent effort on extending AdGraph to automatically generate filter lists, but its classification pipeline code has not been released. Because of this, our work targets only AdGraph; however, we anticipate that one can draw similar conclusions given the similarity of its design to AdGraph.

While there has been success on applying adversarial examples in unconstrained domains (e.g., images) [4], the feasibility of crafting such inputs in domains with more stringent constraints (e.g., web pages) remains largely unexplored. In the constrained web domain, the visual perceptibility is altered if the semantics of the web page are changed. Specifically, web pages are processed by browsers prior to user exposition (unlike images). Thus, rather than the magnitude of the perturbation being the most important criterion, what matters is that the rendered web page after applying the perturbation presents the same look-and-feel and functionality to the user. This requirement forces the perturbations to be what we call actionable and requires a rethinking of what constraints must be enforced while crafting adversarial samples.

Extending this principle to (ML based) adblockers, given the goal of perturbing an ad resource request to bypass the ML detection model: (i) the adversarial example should be actionable in that it must be “mapped back” to the appropriate valid web page and (ii) the modified request must preserve its original functionality of directing the requester to the remote ad server i.e., this requires the “functional” parts of the page to be equivalent before and after modification; only the other “non-functional” parts can be perturbed to fool the classifier. Our goal in this paper is to develop a tool (we call this A4) to automatically craft such actionable perturbations that can subvert AdGraph. Towards achieving this, our challenge is to realize the two properties below.

Feature-space actionability: First, any perturbation that is generated in the feature-space must be bounded by domain-specific constraints. Such constraints can be expressed explicitly by a set of mathematical formulas that the perturbed feature vector must comply with, so as to preserve the functionality of the original web resource and the validity of the page. Since these constraints are typically manually identified (depending on the functionality to be preserved), the attacker needs to find a way to integrate them properly into the adversarial example generation algorithm.

Application-space actionability: Second, upon mapping the feature-space perturbations back to the raw input space (web page), the computed modifications may cause undesired changes, either because they violate some structural constraints (e.g., the number of nodes in a DOM tree cannot be negative) or because they violate complicated inter-relationships between objects in the page that are projected onto the extracted feature values. Correcting these changes requires adding offsets to the adversarial perturbations from the feature-space. However, blindly applying these offsets may render the perturbations non-adversarial, i.e., the attacker must account for the offsets when generating perturbations.

As our primary contribution, we design A4 (Actionable Ad Adversarial Attack) to generate perturbations that are actionable both in the feature-space and the application-space. A4 requires only minimal domain knowledge, in the form of a set of seed features that can be mapped from the feature space back to the input or application space. Specifically, it has the following desirable characteristics.

  • Efficient crafting: Inspired by the popular gradient-based attack, Projected Gradient Descent (PGD) [11], A4 iteratively accounts for the unique constraints of the web environment, to generate potent adversarial samples. Our evaluations show that A4 achieves a success rate of about 60% i.e., it subverts 60% of the inputs that were originally classified as ad/tracker by AdGraph, to benign. In comparison, a baseline attack that only accounts for actionability requirements for one iteration (non-iteratively) can only achieve a success rate of less than 33%, while a weaker baseline cannot generate any viable example.

  • Actionability: All perturbed web resources are guaranteed to comply with both the feature-space and application-space constraints. Such compliance makes these examples practical in that they still carry out their ad/tracker functionalities.

  • Stealthiness: A4 generates perturbations that have low detectability. In our setup, generated perturbations are bounded and concealed with respect to the corresponding web pages, which makes them hard for adblockers to detect; furthermore, they are invisible/imperceptible to users (except for displaying the ads).

2 Background and Related work

In this section, we provide brief background on adblockers and AML. We also discuss relevant related work.

Non-perceptual ML-based Adblocking. Because rule-based adblockers are plagued by scale/errors and demonstrable attacks, machine learning (ML)-based adblockers are emerging. Instead of relying on hard-coded blacklists that are manually curated and maintained, they use ML to infer the intrinsic patterns of ad/tracker resources from different representations of web pages. Previously, URL strings and JavaScript code have been used as features to represent web resources in ML models [2]. However, these attempts have low accuracy because the representations used are incomplete in capturing the distinguishing characteristics of ad and non-ad resources. This led to AdGraph [9], a more recent work on identifying ad resources using ML; because this is (arguably) the state-of-the-art in this field, we consider it as our target in this paper.

By instrumenting the browser core, AdGraph collects a comprehensive set of browser-internal events to stitch together a graph that represents the interactions among the HTML page elements, network requests, and JavaScript executions (e.g., a web element is dynamically created by a script). This representation is then used to train a classifier for identifying advertising and tracking resources. With support from this rich loading context, AdGraph extracts 65 features from a resource load, and classifies the request based on these features. These features can be categorized into two types: structural and content-based. Content-based features include (but are not limited to) certain susceptible ad-related keywords in the URL and the requested resource type (e.g., image, iframe). AdGraph’s classifier uses Random Forest as the underlying model, which is non-differentiable. As discussed later, this choice hinders traditional AML-based attacks, as they require gradient information to guide the adversarial example generation. Moreover, of the 65 features AdGraph uses, 5 are categorical, i.e., they are converted into more than 250 sparse one-hot-encoded features. Such sparsity not only poses new challenges for existing adversarial attacks that expect dense data, but also requires additional constraints to ensure the validity of the one-hot vectors (we discuss how A4 overcomes these in §3).
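To make the one-hot validity issue concrete, the following is a minimal sketch (not AdGraph's or A4's actual code) of how a perturbed one-hot group could be snapped back to a valid vector. The function name `project_one_hot` and the "largest entry wins" rule are assumptions chosen for illustration.

```python
def project_one_hot(values):
    """Snap a perturbed one-hot group to a valid one-hot vector.

    Assumption: the largest entry wins; ties go to the lowest index.
    """
    if not values:
        raise ValueError("one-hot group must not be empty")
    winner = max(range(len(values)), key=lambda i: values[i])
    return [1 if i == winner else 0 for i in range(len(values))]


# Hypothetical 4-way categorical feature after a gradient step has pushed
# it away from any valid one-hot encoding.
perturbed = [0.7, -0.1, 0.9, 0.2]
projected = project_one_hot(perturbed)
print(projected)
```

Whatever projection rule is used, the result always contains exactly one 1, so the vector still decodes to a single category.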

Adversarial attacks on ML models. The common setup for adversarial attacks against binary ML classifiers is that, given a model and an input that is classified as malicious, an attacker needs to modify the input to flip the model’s classification result. Formally, suppose a classifier is defined by its prediction function $f$ and an input $x$ has the malicious label $y$; an attacker needs to find an adversarial transformation $T$ such that $f(T(x)) \neq y$. The AML community defines different levels of model transparency to describe the knowledge that an attacker possesses with regard to the target classifier:

  • With White-box attacks, an attacker is assumed to know all the information about the model, including but not limited to the model internals (e.g., the classifier model type, parameters), the training dataset and feature definitions.

  • With Grey-box attacks, the attackers do not know the internals of the model, but know the training dataset and feature definitions. Further, the attacker can query the target classifier about the label for a specific input.

Gradient-based attacks. One popular attack is based on the Fast Gradient Sign Method (FGSM) [3], which leverages the gradients derived from the target classifier to compute the perturbation that maximizes its loss function with respect to the particular malicious input. Given the loss function $J(\theta, x, y)$ of the target model (with parameters $\theta$, input $x$ and label $y$), FGSM computes its perturbation as $\delta = \epsilon \cdot \mathrm{sign}(\nabla_x J(\theta, x, y))$, where $\epsilon$ is the norm constraint specified by the attacker. There are also other variants [5] that follow the “loss-maximizing” philosophy used in FGSM. They are generally referred to as gradient-based attacks. Since these attacks all use the gradient information from the target model, they should be considered white-box attacks.
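The single-step update described above can be sketched as follows. This is an illustration on a hypothetical logistic-regression classifier (the weights, bias and input are made up), not the setup used against AdGraph; it only shows how the sign of the loss gradient determines the perturbation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability that x is an ad under the toy logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def loss_gradient(weights, bias, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict(weights, bias, x)
    return [(p - y) * w for w in weights]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(weights, bias, x, y, epsilon):
    """Single-step FGSM: x + epsilon * sign(grad_x J)."""
    grad = loss_gradient(weights, bias, x, y)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical differentiable classifier and input (label 1 = ad).
weights, bias = [2.0, -1.0, 0.5], -0.5
x, y = [0.9, 0.1, 0.6], 1
x_adv = fgsm(weights, bias, x, y, epsilon=0.5)
print(predict(weights, bias, x), predict(weights, bias, x_adv))
```

In this toy case the score drops from above 0.5 to below it after one step, i.e., the predicted label flips from ad to non-ad.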

Gradient-based attacks generate perturbations that are bounded based on different $L_p$ norms (bounded by the attacker-specified norm constraint above), e.g., $L_0$, $L_2$ or $L_\infty$. These traditional norms, bounds, or thresholds (referred to as norms in the paper) measure the magnitude of the perturbation, and are thus primarily suitable for visual domain applications (lower norms generally mean less visually-detectable changes) wherein human imperceptibility is the auxiliary characteristic desired in a perturbation. In the web space, however, the perturbed page has complex structures and is processed by the browser, which parses and renders the page. Thus, the norms can no longer capture what a “desirable perturbation” is, and do not work well. In other words, new metrics are needed to effectively capture the properties of functionality preservation and stealthiness of the perturbed web page.

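Projected Gradient Descent. Being a single-step attack, FGSM suffers from low success rates, especially when gradients cannot provide sufficiently accurate guidance (usually the case for non-white-box attacks). One can improve the success rate by applying FGSM iteratively; this is known as the Basic Iterative Method, or Projected Gradient Descent (PGD) [11]. Essentially, PGD performs FGSM multiple times with a smaller step size, $\alpha$. Formally, the search procedure can be expressed as:

$$x^{t+1} = \mathrm{Clip}_{x,\epsilon}\left\{ x^{t} + \alpha \cdot \mathrm{sign}\left(\nabla_{x} J(\theta, x^{t}, y)\right) \right\} \quad (1)$$

where $J$ is the loss function of the target model with parameters $\theta$, $\epsilon$ is the attacker-specified norm bound, and, for a given input vector $x'$,

$$\mathrm{Clip}_{x,\epsilon}\{x'\} = \min\left\{ x + \epsilon,\ \max\{ x - \epsilon,\ x' \} \right\} \quad (2)$$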
A4 is inspired by the iterative philosophy used in PGD, and extends its simple clipping mechanism to an extensive feedback loop (§3), which seeks to produce actionable perturbations targeting ML-based adblockers.
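As a point of reference for the plain clipping mechanism that A4 extends, here is a minimal PGD sketch on a hypothetical logistic-regression classifier. The model, input, step size and bound are illustrative assumptions; the constraint handling that A4 adds on top is not shown.

```python
import math


def predict(weights, bias, x):
    """Probability that x is an ad under the toy logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient(weights, bias, x, y):
    """Gradient of the binary cross-entropy loss with respect to x."""
    p = predict(weights, bias, x)
    return [(p - y) * w for w in weights]


def clip(x_new, x_orig, epsilon):
    """Project x_new back into the epsilon-box around x_orig."""
    return [min(xo + epsilon, max(xo - epsilon, xn))
            for xn, xo in zip(x_new, x_orig)]


def pgd(weights, bias, x, y, epsilon, alpha, steps):
    """Repeat small signed-gradient steps, clipping after each one."""
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_gradient(weights, bias, x_adv, y)
        stepped = [xi + alpha * ((g > 0) - (g < 0))
                   for xi, g in zip(x_adv, grad)]
        x_adv = clip(stepped, x, epsilon)
    return x_adv


# Hypothetical classifier and input (label 1 = ad).
weights, bias = [2.0, -1.0, 0.5], -0.5
x, y = [0.9, 0.1, 0.6], 1
x_adv = pgd(weights, bias, x, y, epsilon=0.5, alpha=0.1, steps=10)
print(x_adv, predict(weights, bias, x_adv))
```

The clipping step is what keeps the accumulated perturbation inside the bound even when more steps are taken than the bound allows.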

3 A4: Actionable Ad Adversarial Attack

In this section, we describe how A4 crafts actionable (both in the feature and application spaces, as discussed in §1) and stealthy adversarial examples in the web domain. Recall that being actionable in these two spaces means the following: (i) in the feature space, the perturbed adversarial feature vector must comply with explicit numeric constraints, defined based on domain knowledge, that maintain the validity, functionality and stealthiness of the ad request; and (ii) in the application space, the perturbed feature vector must be successfully mappable back to the original web page. We point out that perturbations that are actionable in the feature space are not naturally actionable in the application space; A4 must account for implicit/unpredictable side-effects that can arise when feature-space perturbations are mapped back to the application space (e.g., a feature-space perturbation that adds nodes to a page changes other features such as the average connection degree).

Threat Model

Before diving into the details of A4’s algorithm, we first define our threat model. As mentioned in §2, AdGraph is a full-fledged web browser with custom modifications for blocking ad/tracker resources. Generally, there are three participants when a user visits a website using AdGraph: a user, a hosting website and an ad publisher; their relationships are depicted in Figure 1.

Figure 1: Different participants in A4’s threat model

As shown, the objective of A4 is to help the website recover its ad revenue lost due to ads getting blocked by AdGraph. A4 achieves this by adding perturbations to content generated by the hosting website, so that the classifier used by AdGraph is fooled into mis-classifying the ad resource as a non-ad. We assume a grey-box attack setup as elaborated in §2 because, on one hand, even though AdGraph has been open-sourced, it is easy for other ML-based adblockers to hide their model internals; on the other hand, the web is a public space and datasets collected from it can be conveniently crawled and replicated in practice.

Recall that A4 is a gradient-based attack, which requires knowledge of model internals. However, AdGraph uses Random Forest (RF) as its classifier, which is non-differentiable (i.e., no gradient can be computed), and thus we need to find a way to estimate the gradients. Moreover, although AdGraph itself has been open-sourced [9], we do not want to limit A4 to complete white-box setups only: even if the target model is differentiable, its gradients can be inaccessible (e.g., when only its prediction APIs are exposed). Thus, we assume a grey-box threat model and make A4 a transfer-based attack where the attacker has no access to any model internals (model type, gradients etc.), but is aware of the training dataset and feature definitions (see §2). Based on the above, our perturbations should meet the following requirements from the practicality/usability perspective:

  • Transparent to ad publishers: As a third-party who pays the hosting website for displaying its ad contents, an ad publisher is generally reluctant to change the way they operate their services.

  • Easily deployable at the hosting website: From the perspective of the hosting website, the process of injecting perturbations into the target page should be mostly automatic and convenient. We envision an additional step in website deployment through which the web pages pass so that the changes are applied to them.


Optimization problem formulation. Formally, consider the optimization problem:

subject to

where, measures the cost of adding the generated perturbation, and denote the hyperspace that actionable examples can exist in the feature space and application space, respectively. Note that as discussed in §2, it is hard for conventional norms to capture the real cost of adding a perturbation. For this, we also modify the norm to take the domain uniqueness into account, as discussed in the next subsection.

Iterative search. Since the optimization problem defined in Equation 3 does not have an analytic solution [3], we instead approximate one iteratively through a search procedure, captured in the pseudo-code in Algorithm 1.

Input: target model, ad request, maximum iterations, maximum perturbation magnitude
Output: actionable adversarial example
while the iteration count is below the maximum and the example is not yet adversarial do
      GenerateFeatureSpacePerturbation()
      EnforceFeatureSpaceConstraints()
      MapBackToWebPage()
      ExtractFeatureValues()
      VerifyIfAdversarialOnTargetModel()
end while
Algorithm 1 A4: Actionable Ad Adversarial Attack
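The control flow of Algorithm 1 can be sketched in Python as below. This is only a skeleton under stated assumptions: the five stages are passed in as callables, the re-extracted features are carried into the next iteration, and the toy stand-ins at the bottom (a single node-count feature, a mapping step that adds one extra node, and a threshold "model") are entirely hypothetical.

```python
def a4_search(x, generate, enforce, map_back, extract, is_adversarial,
              max_iterations):
    """Run the five stages of the loop until success or budget exhaustion."""
    candidate = list(x)
    for iteration in range(1, max_iterations + 1):
        candidate = generate(candidate)      # feature-space perturbation
        candidate = enforce(candidate, x)    # feature-space constraints
        page = map_back(candidate)           # concrete web page changes
        observed = extract(page)             # features incl. side-effects
        if is_adversarial(observed):         # query the target model
            return observed, iteration
        candidate = observed                 # assumption: continue from here
    return None, max_iterations


# Toy stand-ins (all hypothetical): one feature, the node count.
def generate(f):
    return [f[0] + 2.6]                       # pretend gradient step


def enforce(f, original):
    return [max(original[0], round(f[0]))]    # integer and non-decreasing


def map_back(f):
    return {"nodes": f[0] + 1}                # mapping adds a wrapper node


def extract(page):
    return [page["nodes"]]


def is_adversarial(f):
    return f[0] >= 20                         # pretend target model


result, used = a4_search([10], generate, enforce, map_back, extract,
                         is_adversarial, max_iterations=10)
print(result, used)
```

Passing the stages in as callables keeps the loop itself independent of how a particular feature is perturbed or mapped back.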

The search process not only enforces the feature-space constraints, but also incorporates corrections to address application-space side-effects that occur when mapping feature-space perturbations back to the web application domain. The key guiding principle is to take small steps (in each iteration) and corrective actions so that we are always on the right path (details in the next subsection). To better illustrate the framework, Figure 2 shows, for each iteration, how the originally generated perturbation is moved in hyperspace to ensure its actionability. The intuition is that errors may accumulate across multiple iterations and can mislead us if we do not correct them at every step (and we take smaller steps for the same reason). Inspired by the iterative philosophy that underpins PGD, A4 also divides the overall optimization problem into multiple iterations.

Figure 2: Perturbation trajectory in hyperspace for a search iteration
Figure 3: Transfer-based attack

Transfer-based attack. To craft a successful grey-box attack, we need to use the dataset for training AdGraph to train a local surrogate model that is differentiable, and then use this model to estimate the gradients and craft adversarial examples accordingly. This type of attack is considered “transfer-based” because the successful adversarial examples crafted locally need to be adversarial on a remote target model that is different and possibly unknown. We depict A4’s transfer-based attack generation in Figure 3. Prior research [14] has shown that this so-called inter-model transferability exists in almost all modern ML models (including non-differentiable ones such as Random Forest).
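The transfer idea can be illustrated with a self-contained toy. Everything here is an assumption made for illustration: the "target" is a hand-written vote of three threshold rules standing in for a non-differentiable model, the surrogate is a two-feature logistic regression, and the shared training set is a grid labelled by the stand-in target.

```python
import math


def target_is_ad(x):
    """Non-differentiable stand-in for the remote model: a vote of 3 stumps."""
    votes = (x[0] > 0.5) + (x[1] > 0.5) + (x[0] + x[1] > 1.1)
    return 1 if votes >= 2 else 0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_surrogate(samples, labels, epochs=300, rate=0.5):
    """Fit a logistic-regression surrogate with batch gradient descent."""
    w, b = [0.0, 0.0], 0.0
    n = len(samples)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - rate * gw[0] / n, w[1] - rate * gw[1] / n]
        b -= rate * gb / n
    return w, b


def craft(x, w, b, alpha=0.1, max_steps=10):
    """Follow the surrogate's loss gradient until the target says non-ad."""
    x_adv = list(x)
    for _ in range(max_steps):
        if target_is_ad(x_adv) == 0:          # verify on the target model
            break
        err = sigmoid(w[0] * x_adv[0] + w[1] * x_adv[1] + b) - 1  # label 1
        grad = [err * w[0], err * w[1]]
        x_adv = [min(1.0, max(0.0, xi + alpha * ((g > 0) - (g < 0))))
                 for xi, g in zip(x_adv, grad)]
    return x_adv


# Shared toy "training dataset": a grid labelled by the stand-in target.
grid = [[i / 10, j / 10] for i in range(11) for j in range(11)]
labels = [target_is_ad(p) for p in grid]
w, b = train_surrogate(grid, labels)
x = [0.8, 0.9]
x_adv = craft(x, w, b)
print(target_is_ad(x), target_is_ad(x_adv))
```

The gradients come only from the surrogate; the target is used solely as a label oracle, which matches the grey-box assumption that the attacker can query the classifier but not inspect it.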

Perturbable feature selection. Before delving into constraints, we first need to manually identify which features to perturb. These features must be perturbable, meaning that the attacker must know how to map the perturbations from the feature space back to concrete changes in the application space, i.e., the web page. As mentioned in §2, AdGraph has two categories of features: structural and content-based (URL-related). Generally, we follow three principles to pick features: (i) high impact; (ii) ease of manipulation; and (iii) compliance with constraints (as defined in the next subsection). To satisfy (ii) and (iii), we start from URL features and pick 6 of them (#2-#7 in Table 1), as they intuitively indicate “ad-ness” and are relatively easy to tweak (simple string manipulations for keyword addition/removal). To check (i), we run ablation experiments that perturb different combinations of features, and find that features #2-#7 alone provide quite limited evasion rates. Specifically, these URL features only offer a success rate of 37.10% (which, as shown later, is lower than what A4 achieves with all its features). Thus, as a key novelty in A4, we add structural feature #1, which encodes the size of the graph. To avoid affecting the ad functionality of the request (part of principle (iii)), we limit our perturbation of #1 to being additive only. Following these principles, and after systematically analyzing all 65 features used in AdGraph, we identify 7 seed features from both (URL and structural) categories that the hosting website of the ad request can conveniently control and perturb in practice. Table 1 shows their semantics and data types.

Feature-Space Constraint Enforcement

In this subsection, we describe how A4 incorporates explicit numeric constraints in the feature space. Since the constraints defined in the feature space serve three main purposes (validity, functionality and stealthiness), we need to enforce them differently.

Validity constraints. These constraints keep the perturbed features numerically valid, i.e., they guarantee that the features fall within meaningful domains of definition. For instance, features #1 and #7 are counts of nodes and characters, and cannot be negative or non-integer; binary features #2 to #6 should always take a value of 0 or 1. These constraints are enforced by projecting any perturbed value that falls outside the meaningful domain of definition back into the domain. Concretely, we define three projection operations for binary and integer types of perturbable features in EnforceFeatureSpaceConstraints() in Algorithm 1:


Through these operations, the perturbed features of our choice are guaranteed to be valid in the feature space.
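A minimal sketch of such validity projections is given below. The exact operations used by A4 are not reproduced here; the rounding rule, the 0.5 threshold for binary features and the function names are assumptions, while the feature numbering follows Table 1.

```python
BINARY_FEATURES = {2, 3, 4, 5, 6}   # keyword / character indicator features
COUNT_FEATURES = {1, 7}             # node count and URL length


def project_binary(value):
    """Snap a perturbed binary feature to 0 or 1 (assumed threshold 0.5)."""
    return 1 if value >= 0.5 else 0


def project_count(value):
    """Snap a perturbed count to a non-negative integer."""
    return max(0, int(round(value)))


def enforce_validity(features):
    """Project every perturbable feature back into its valid domain."""
    valid = {}
    for number, value in features.items():
        if number in BINARY_FEATURES:
            valid[number] = project_binary(value)
        elif number in COUNT_FEATURES:
            valid[number] = project_count(value)
        else:
            valid[number] = value   # features we do not perturb are untouched
    return valid


# Hypothetical raw output of a gradient step, keyed by feature number.
raw = {1: 41.7, 2: 0.49, 3: 1.3, 7: -2.2}
print(enforce_validity(raw))
```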

Functionality constraints. Besides validity, A4 also needs to ensure that the generated feature-space perturbations won’t break any functionality of the original ad request. Hence, we also enforce functionality constraints onto the perturbed features. To do so, we follow two principles which we refer to as non-decreasing and semantic equivalence. For counter-like features like #1 and #7, we limit their perturbed values to be greater than or equal to the original values; otherwise, we project the modified value back to its original value. We refer to this as the “non-decreasing principle.” This projection reflects our assumption that adding information to the web page should not break any existing functionality, but removing existing items might harm the semantics in an unpredictable fashion.

No. | Meaning | Data Type
1 | Total number of nodes in the graph at the time of classification | Integer
2 | If predefined ad keywords appear in the URL string | Binary
3 | If predefined special characters appear in the URL string | Binary
4 | If any semicolon appears in the URL string | Binary
5 | If the base domain of the current page appears in the URL string | Binary
6 | If predefined ad dimension keywords appear in the URL string | Binary
7 | Length of request URL | Integer
Table 1: Perturbed features

For URL features #2 to #7, we need to ensure that after perturbations, the original functionalities/semantics of the request are still preserved. We leverage the fact that for HTTP(S) requests, different character encoding schemes end up delivering the same information to the ad server, as long as the ad server can decode the messages properly. In this case, for features #2 to #6 that detect predefined keywords/characters in the URL string, we can simply change the default ASCII encoding to HTML encoding. Since AdGraph assumes all URL text to be ASCII encoded, our perturbations can bypass its detection over all URL-related features. For feature #7, we choose to append random characters to increase its value, and place the appended string as an unused query, which avoids disrupting other functional parts of the URL. Note that these URL manipulations introduce changes to the request received by the server, and therefore might require cooperation from the server side (e.g., support for different text encodings). We argue that such cooperation is rather convenient and fits our threat model mentioned previously: the hosting website and third-party ad publishers are motivated to slightly change their server configurations to bypass adblockers and recover revenues. We refer to this latter process as preserving “semantic-equivalence.”
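The two functionality principles can be illustrated with a short sketch. Several things here are assumptions: percent-encoding is used as a stand-in for the re-encoding step, the padding uses a fixed character instead of random ones so the example is deterministic, and the URL, the `_pad` parameter name and the helper names are hypothetical.

```python
from urllib.parse import parse_qsl, unquote, urlsplit


def non_decreasing(original, perturbed):
    """Counter-like features may grow but never shrink below the original."""
    return max(original, perturbed)


def percent_encode(text):
    """Percent-encode every byte of text, including plain letters."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


def hide_keyword(url, keyword):
    """Re-encode a keyword so a plain substring match no longer finds it."""
    return url.replace(keyword, percent_encode(keyword))


def pad_url(url, extra_chars, param="_pad"):
    """Lengthen the URL with an unused query parameter (hypothetical name)."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{param}={'x' * extra_chars}"


url = "https://ads.example.com/banner?size=300x250"
hidden = hide_keyword(url, "banner")
padded = pad_url(hidden, extra_chars=8)
print(padded)
print(unquote(hidden) == url)   # decoding recovers the original request
```

A server that decodes the path and ignores the extra parameter would receive the same request, which is the sense in which the change is intended to be semantically equivalent.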

For clarity, we also define the enforcement operations for functionality constraints formally as follows:


Stealthiness constraints. Besides validity and functionality, the generated perturbations should also achieve a high level of stealthiness. Specifically, the perturbations that A4 applies to features have to be limited by a threshold. Conventionally, the perturbation size is measured via norms. However, these norms are unsuitable for AdGraph’s feature set. With many binary/categorical features, the use of norms blindly treats all features as having the same scale, which is not the case in reality. For example, changing a binary feature from 0 to 1 means that the status it represents has flipped. This is fundamentally different from an integer feature changing by the same amount; for the latter, it could indicate that its real value has changed from a minimum to a maximum value (due to the data normalization applied during dataset pre-processing). Thus, if we set a threshold of 0.3, binary features can never be flipped (as the flipping threshold is 0.5), whereas integer values can still change even if a normalization is applied. To avoid such scale mis-interpretations, we propose a customized norm which is defined as follows.


Besides customizing the norm, we also slightly modify the operation for clipping a perturbation within the norm. Specifically, conventional clipping functions (e.g., the one used by PGD) regard the global range of a particular feature across the whole dataset as the base of the clipping threshold. For web pages, such clipping can easily lead to overly large perturbations, as the ranges of many numeric features vary drastically from website to website. Therefore, we change the clipping from relying on a global range to a local per-website range, as formally defined in Equation 7.


where is the global threshold, is the local threshold, is the global range of with respect to this particular feature in the training dataset, given by , and is the standard clipping operation defined in Equation 2. As shown in §4, our customized norm along with the localized clipping operation, helps limit the effective size of generated perturbations, and thus improves the stealthiness significantly compared to the traditional setup of norms and global clipping.
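The effect of switching from a global to a local range can be illustrated numerically. The sketch below does not reproduce Equation 7; it assumes, purely for illustration, that the clipping budget is the threshold multiplied by the feature's range, and the node counts are made up.

```python
def clip_to_budget(original, perturbed, threshold, feature_range):
    """Clip so that the change is at most threshold * feature_range."""
    budget = threshold * feature_range
    return min(original + budget, max(original - budget, perturbed))


# Hypothetical node counts: across the whole dataset vs. within one website.
global_range = 5000 - 10
local_range = 180 - 120

original, proposed = 150, 900
global_clip = clip_to_budget(original, proposed, 0.3, global_range)
local_clip = clip_to_budget(original, proposed, 0.3, local_range)
print(global_clip, local_clip)
```

With the global range the proposed jump from 150 to 900 nodes passes through unchanged, whereas the per-website range holds the perturbed value to roughly 168, which is the kind of tightening the localized clipping is meant to provide.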

Application-Space Side-Effect Incorporation

Now that we have generated feature-space adversarial perturbations that comply with manually-defined domain constraints, we need to map them back to concrete changes in the web page representations. As discussed previously, ideally these perturbed feature values should all be reflected accurately in the page. This can be verified by re-extracting the feature vector from the perturbed web page and checking that it matches the expected one.

However, introducing changes (e.g., to the total number of nodes) to the web page can bring about unpredictable offsets to the values of other features that are not included in the feature-space perturbations. Specifically, AdGraph considers several inter-dependent features, such as average_degree_connectivity. As we add perturbation nodes to the page to perturb the feature counting the total number of nodes in the graph, the feature average_degree_connectivity might also change nondeterministically as the maximum per-node connection degree is raised, which might end up turning an adversarial perturbation into a non-adversarial one. More critically, such feature value offsets/drifts are impossible to predict, and therefore cannot be pre-computed with closed-form formulas, which motivates our design of executing the feedback loop.

Feedback loop.

Figure 4: Proposed feedback loop in A4’s each search iteration
Figure 5: Different mapping-back strategies on adding perturbation nodes

To incorporate such unpredictable side-effects, we passively observe how changes in one feature lead to changes in others. Specifically, we first map the controllable feature-space perturbations back to the web pages by “rendering” the page in a lightweight fashion (we use the timeline structure in Chromium [10] to do so), and then re-extract all the features to capture the side-effects. We finally verify whether the final perturbation (with side-effects) can still evade detection. If so, we are done; otherwise, we continue the iterative search procedure to find another candidate perturbation (we enlarge the current step size by a step size to generate a new gradient). Effectively, we have created an automated feedback loop, as illustrated in Figure 4.

Mapping-back strategies. For some features, there are multiple ways to concretize the feature-space perturbations as changes to web pages (step 4 in Figure 4). For instance, there are multiple ways to increase the total number of nodes in a page (feature #1 in Table 1). We can choose to place these nodes either as the children of a single existing node (centralized strategy), or as the children of multiple existing nodes (distributed strategy), as shown in Figure 5. These different mapping-back strategies introduce different side-effects to the feature values, and can hence affect the effectiveness of the final adversarial example (as depicted by the red and green points in Figure 2). One example is the feature average_degree_connectivity: the centralized strategy is likely to lower the feature value significantly after the map-back, as the added nodes cause crowding and thus raise the current maximum number of connections per node in the graph; this is the denominator in the formula that computes average_degree_connectivity. In contrast, the distributed strategy tends to have negligible side-effects with respect to this feature. In order to maximize the chance of finding a successful adversarial example, we apply all feasible mapping-back strategies in the feedback loop, and then verify their results. This helps A4 discover as many green point cases as possible (Figure 2).
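The difference between the two strategies can be seen on a toy graph. The sketch below tracks only the maximum per-node connection count, which the text above names as the quantity affected; it does not compute AdGraph's actual average_degree_connectivity feature, and the five-node page graph and helper names are hypothetical.

```python
from collections import Counter


def max_degree(edges):
    """Largest number of connections held by any single node."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return max(counts.values())


def node_count(edges):
    return len({node for edge in edges for node in edge})


def add_centralized(edges, parent, count, first_id):
    """Attach all new nodes to one existing node."""
    return edges + [(parent, first_id + i) for i in range(count)]


def add_distributed(edges, parents, count, first_id):
    """Spread new nodes round-robin across several existing nodes."""
    return edges + [(parents[i % len(parents)], first_id + i)
                    for i in range(count)]


# Hypothetical page graph with nodes 0..4, stored as undirected edges.
page = [(0, 1), (0, 2), (1, 3), (1, 4)]
centralized = add_centralized(page, parent=2, count=6, first_id=5)
distributed = add_distributed(page, parents=[0, 1, 2, 3, 4], count=6,
                              first_id=5)

print(node_count(centralized), max_degree(centralized))
print(node_count(distributed), max_degree(distributed))
```

Both variants reach the same total node count, so feature #1 is perturbed identically, but the centralized variant more than doubles the maximum degree while the distributed one raises it by only one.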

4 Evaluation

In this section, we first evaluate A4’s effectiveness in terms of its success rate and algorithm convergence in crafting adversarial examples against AdGraph; then, we analyze the generated perturbations from several perspectives; finally, we assess A4’s performance overhead. (Footnote: We will open source the implementation of A4’s attack pipeline with its datasets, for reproducibility and for future extensions, at the time of publication.)

Dataset. Since A4 primarily targets the current version of AdGraph, we need to first replicate the classification performance reported in [9] to ensure that our evaluation is sound. In order to do so, we reached out to the authors of [9] and reproduced the classification pipeline that they used for AdGraph. Given that the web crawl conducted and presented in [9] was in early 2018, and is therefore outdated, we carried out a new crawl on September 15th, 2019 to collect the graph representation of the landing pages of Alexa’s top 10k websites. Then, we processed these graphs and extracted 65 features to form the dataset ready for ML tasks. Table 2 lists some basic statistics from the crawled dataset.

Stat Value
# successfully crawled records (distinct requests) 586,218
# successfully crawled websites (distinct domains)
# features before one-hot encoding 65
# features after one-hot encoding 332
# categorical features 4
# binary features 36
# numeric features 25
# records to perturb (distinct requests) 1,430
Table 2: Dataset statistics

From the 586,218 request records, we pick 60,000 as the test set (i.e., the remaining 526,218 records are used as the training set). These are used to test the accuracy of the trained classifiers and to craft adversarial examples. To showcase A4’s effectiveness on high-impact hosting websites, we select resource requests collected from Alexa’s top 500 sites for inclusion in the testing set, from which we randomly pick 1,430 unique requests as the target ad resources to be perturbed.

Model training. To replicate the target classifier in [9], we use the popular open-source machine learning library scikit-learn to train a Random Forest (RF) model based on the crawled training dataset. This is then used as the target model that A4 queries, for each given perturbed example, to verify the attack result. We show the model’s hyper-parameters and classification accuracy metrics over the partitioned testing set in Table 3.

# trees 100
Split criterion entropy
tree depth
Precision 0.79
Recall 0.82
Accuracy 0.94
Table 3: Remote RF’s hyper-parameters and accuracy metrics

As shown in Table 3, the accuracy metrics of our reproduced RF model are close to those reported in [9], which validates our replication effort.

We need a local surrogate model that is differentiable (recall §3) to drive our gradient-based attack. To this end, we use a 3-layer Neural Network (NN) as the surrogate model; we point out that an NN is considered to have the best model capacity [14] for imitating the decision boundaries of other models. The hyper-parameters and accuracy metrics of this NN are in Table 4. Note that in order to best mimic the remote decision boundary, we train the local NN on the same dataset that trains the target RF classifier, but with the labels given by the target model instead of the ground-truth labels. Hence, the accuracy in Table 4 represents the agreement rate between the two models.
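The distinction between accuracy and agreement rate can be illustrated with a short sketch. The two threshold "models" below are toy stand-ins and not the RF or NN used in the paper; only the relabeling step and the agreement computation mirror the description above.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
Predictor = Callable[[Sample], int]


def relabel_with_target(samples: List[Sample], target: Predictor) -> List[Tuple[Sample, int]]:
    """Build the surrogate's training set from target-model outputs, not ground truth."""
    return [(x, target(x)) for x in samples]


def agreement_rate(samples: List[Sample], target: Predictor, surrogate: Predictor) -> float:
    """Fraction of samples on which the surrogate reproduces the target's decision."""
    matches = sum(1 for x in samples if target(x) == surrogate(x))
    return matches / len(samples)


# Hypothetical models: both threshold a single feature, at slightly different points.
def toy_target(x: Sample) -> int:
    return 1 if x[0] > 5 else 0


def toy_surrogate(x: Sample) -> int:
    return 1 if x[0] > 4 else 0


samples = [[float(v)] for v in range(10)]
training_set = relabel_with_target(samples, toy_target)
rate = agreement_rate(samples, toy_target, toy_surrogate)
print(rate)  # 0.9
```

The two toy models disagree only on the sample with value 5, so the agreement rate is 0.9 regardless of what the ground-truth labels would have been.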

# hidden layers 3
# neurons (1024, 512, 128)
Dropout rate 0.1
Accuracy (agreement rate) 0.91
Table 4: Local NN’s hyper-parameters and agreement rate

Baseline attacks. For comparison, we consider the two baseline attacks described below (the descriptions also illustrate the necessity of our proposed solution).

  • Weak baseline: In this attack, we apply the standard PGD and do not enforce any feature-space constraints, other than the basic perturbation size limit (see Equations 1 and 2) and a basic domain of definition for the feature values.

  • Strong baseline: In this attack, we apply A4 for one iteration only (i.e., enforce constraints and execute the feedback loop once), instead of performing iterative repetitions. The purpose of this setup is to validate the benefits of the proposed iterative search framework. Without multi-iteration corrections, we anticipate difficulty in achieving high success rates, mainly because a single-step approach might lead to application-space offsets that disrupt the adversarial nature of the example.

Table 5 shows the hyper-parameters used with our weak/strong baselines and A4. Note that the parameter enforcement interval here refers to the number of steps we take in the gradient-based search before we enforce the constraints and call it one iteration of the feedback loop. We operate in units of intervals instead of steps, because multiple steps are often required for binary features to transition across the flipping threshold of 0.5, as explained in §3. To avoid confusion, we use the term “iteration” to refer to an enforcement interval. These parameters are chosen empirically; we tuned them and picked the parameter set that yields the best performance. We also point out that across the different combinations of parameters, the improvement achieved by A4 over the baseline attacks is generally consistent.
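A short worked example shows why a single step is not enough to flip a binary feature. It assumes, for illustration only, that each gradient step moves a relaxed binary feature by exactly the step-size of 0.07 from Table 5 (as a sign-gradient update would) and that the feature flips once it strictly crosses 0.5; the helper `steps_to_flip` is hypothetical.

```python
import math


def steps_to_flip(start: float, step_size: float, threshold: float = 0.5) -> int:
    """Smallest number of fixed-size steps that moves `start` strictly past `threshold`."""
    distance = abs(threshold - start)
    return math.floor(distance / step_size) + 1


STEP_SIZE = 0.07  # step-size value listed in Table 5

steps_from_zero = steps_to_flip(0.0, STEP_SIZE)  # flipping 0 -> 1
steps_from_one = steps_to_flip(1.0, STEP_SIZE)   # flipping 1 -> 0
print(steps_from_zero, steps_from_one)  # 8 8
```

Under these assumptions, seven steps reach only 0.49, so at least eight steps are needed before constraint enforcement can register a flip.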

Hyper-parameter Value
Step-size (see Algorithm 1) 0.07
Maximum global perturbation threshold (see Equation 2)
Maximum local perturbation threshold (see Equation 2)
Enforcement interval
Table 5: Hyper-parameters used for attacks

Success rate. We summarize the results achieved by three attacks in terms of their success rates of finding actionable adversarial examples out of the 1,430 ad requests sampled from Alexa’s top 500 websites, in Table 6. We see that A4 achieves the highest success rate in generating mis-classified examples while guaranteeing their actionability. It is almost twice as successful (84.3% improvement) as the strong baseline. This large margin of improvement shows the power of the iterative search adopted by A4. In comparison, the weak baseline fails to produce any valid perturbation because if none of the constraints is enforced, features of data types like binary or categorical can be changed into meaningless values (e.g. in a one-hot-encoded vector, more than one feature becomes 1). This makes it impossible for these perturbed examples to be rendered in the browser at all.
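The one-hot failure mode of the weak baseline can be reproduced in a few lines. The perturbed values below are made up, and the arg-max projection is only one plausible way to restore validity; it is an assumption for illustration and not necessarily the enforcement operation A4 uses.

```python
from typing import List, Sequence


def round_independently(group: Sequence[float]) -> List[int]:
    """Threshold each entry at 0.5 without looking at the rest of the group."""
    return [1 if value >= 0.5 else 0 for value in group]


def is_valid_one_hot(group: Sequence[int]) -> bool:
    """A one-hot group must contain only 0/1 entries and exactly one 1."""
    return all(value in (0, 1) for value in group) and sum(group) == 1


def project_to_one_hot(group: Sequence[float]) -> List[int]:
    """Keep only the largest entry as 1 (hypothetical enforcement step)."""
    winner = max(range(len(group)), key=lambda i: group[i])
    return [1 if i == winner else 0 for i in range(len(group))]


perturbed = [0.8, 0.7, 0.1]  # made-up values after unconstrained gradient steps

naive = round_independently(perturbed)
projected = project_to_one_hot(perturbed)

print(naive, is_valid_one_hot(naive))          # [1, 1, 0] False
print(projected, is_valid_one_hot(projected))  # [1, 0, 0] True
```

The naive result encodes two categories at once, which has no counterpart in a real web page; a group-aware projection avoids this.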

Attack Success Fail
Weak baseline 0 (0%) 1430 (100%)
Strong baseline 465 (32.52%) 965 (67.48%)
A4 857 (59.93%) 573 (40.07%)
Table 6: Breakdown of attack results
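The percentages in Table 6 and the 84.3% figure quoted in the text follow directly from the raw counts, as the short check below shows. The relative improvement is computed here as the gain in successes over the strong baseline's count, which is our reading of how the figure was derived.

```python
TOTAL_REQUESTS = 1430
successes = {"weak baseline": 0, "strong baseline": 465, "A4": 857}

# Success rate of each attack, as a percentage of all targeted ad requests.
success_rate_pct = {
    name: round(100.0 * count / TOTAL_REQUESTS, 2) for name, count in successes.items()
}

# Relative improvement of A4 over the strong baseline.
improvement_pct = round(
    100.0 * (successes["A4"] - successes["strong baseline"]) / successes["strong baseline"], 1
)

print(success_rate_pct)
print(improvement_pct)  # 84.3
```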

Attack convergence. In Figure 6, we show a histogram of the number of iterations needed to reach convergence for all the successful adversarial perturbations generated by A4. We see that most (>90%) cases converge within 5 iterations; this shows that the iterative enforcement of the needed constraints and the feedback loop in A4 are efficient in generating the adversarial examples.

Figure 6: Attack convergence analysis

Perturbation analysis. To better understand the generated perturbations, we investigate (1) feature significance, which shows, for all successful examples, which features are modified more or less frequently in the generated perturbations; (2) feature perturbation stats, which capture the statistics of successful perturbations with respect to each feature; and (3) mapping-back strategy significance, which captures which mapping-back strategy is more likely to lead to a successful perturbation. We show the resulting histogram and tables of these analyses in Figure 7 and Tables 7 and 8, respectively.

Figure 7: Feature significance analysis
Feature No. #1 #2 #3 #4 #5 #6 #7
Ave. 865.06 (0.9%) -0.08 0.17 0.48 0.40 0.44 21164.71 (19.3%)
Max. 13825 (14.5%) n/a n/a n/a n/a n/a 32937 (30.0%)
Min. 0 n/a n/a n/a n/a n/a 0
Table 7: Per-feature perturbation statistics
Total Centr. Distr. Both
# 857 66 (7.70%) 64 (7.47%) 727 (84.83%)
Table 8: Mapping-back strategy significance analysis

From Figure 7, we observe a high level of significance (above or close to 50%) associated with all features, which indicates their importance in crafting successful perturbations. The most significant feature is #7, the length of the request URL: increasing the length of ad request URLs raises their adversarial potential the most among all the perturbed features. Note that this seemingly contradicts the conclusion reached in [9], which suggests that longer URLs predict higher “ad-ness” in the overall data distribution. We argue that these two observations are actually compatible: for adversarial examples, we are essentially exploiting the discrepancies between the target model’s decision boundary and the real patterns that can accurately distinguish ad and non-ad requests, bounded by the constraints and the perturbation cost budget. The overall data distribution presented in [9] does not define the local discrepancy, or perturbation space, with respect to a particular example. The existence of such adversarial areas in hyperspace showcases exactly this vulnerability: ML models rely too much on overall statistical cues to make classification decisions.

Table 7 summarizes the statistics of the perturbed features. Note that the percentages appended to the statistics of numeric features #1 and #7 are their ratios relative to the data ranges in the crawled dataset, indicating their relative changes (either increase or decrease) compared to the original values. These ratios show that by customizing the norm and enforcing per-page limits (recall §3), A4 has shrunk the size of the generated perturbations for numeric features (i.e., #1 and #7). For example, for feature #1, on average only 0.9% additional nodes (with respect to the maximum total number of nodes in the dataset) are added to the page to flip the classification result. A smaller number of added perturbation nodes brings two advantages: (1) better stealthiness of the attack, since (a) it is harder for adblockers to differentiate the perturbation from normal nodes when there are fewer dummy nodes, and (b) these nodes are properly obfuscated against simple detection, as A4 does in its implementation; and (2) lower performance overhead for page loads.

Table 8 reports the significance of the different mapping-back strategies that A4 tried in its feedback loop. As can be seen, in more than 80% of the successful perturbations, both the centralized and the distributed mapping-back strategies can be used (to a similar degree) to craft actionable adversarial examples. However, in close to 15% of these cases, only one strategy succeeds. These cases show the advantage of applying multiple strategies within the iterative search procedure used in A4 to cope with the unpredictable application-space side-effects. Essentially, more valid green points (as depicted in Figure 3) can be discovered with additional strategies.

Performance overhead. Lastly, we report the performance overheads of A4. These comprise two components: (i) offline feature-space perturbation computation and (ii) online perturbation loading/rendering. For (i), each example takes 5.8 min on average (over the 1,430 ad requests). We argue this is reasonable for an offline pre-computation setup in exchange for recovering ad revenue. For (ii), rendering the generated perturbations (adding 39 nodes) for a typical ad request http://securepubads.g.doubleclick.net/gpt/pubads_impl_2019121002.js on https://www.kompas.com/ incurs an average overhead of 0.23 sec (over 100 loads), and is hence minuscule compared to the original page load time of 20.9 sec.
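To put these numbers in proportion, the following back-of-the-envelope calculation derives the relative online overhead and the total offline cost from the reported figures. The total assumes the 1,430 examples are processed one after another, which the text does not state; parallel runs would reduce the wall-clock time accordingly.

```python
# Figures reported in the text above.
OFFLINE_MINUTES_PER_EXAMPLE = 5.8
NUM_EXAMPLES = 1430
ONLINE_OVERHEAD_SEC = 0.23
PAGE_LOAD_SEC = 20.9

# Assumption: examples are computed sequentially on a single machine.
total_offline_hours = OFFLINE_MINUTES_PER_EXAMPLE * NUM_EXAMPLES / 60.0

# Added rendering time as a fraction of the original page load time.
relative_online_overhead = ONLINE_OVERHEAD_SEC / PAGE_LOAD_SEC

print(round(total_offline_hours, 1))              # 138.2
print(round(100.0 * relative_online_overhead, 1))  # 1.1
```

The online cost is therefore about 1.1% of the page load time for this example page, while the offline cost is paid once per perturbed request.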

5 Discussions and Limitations

Generalizability. Although A4 primarily targets an ML-based adblocker at this point, we would like to argue that its iterative search procedure and feedback loop are general enough to be applied to other AML scenarios in the web domain, or even other domains. At a high level, any ML task that (1) fits the feature-/application-space paradigm as the generated examples should be actionable in both spaces, (2) is constrained in feature-space due to the validity/functionality requirements as discussed in §3, and (3) has associated side-effect offsets upon mapping from feature-space to application-space (as shown in Figure 3), can be seen to be compatible with A4’s methodology.

For example, there are similar characteristics for ML-based malware classifiers. First, most learning-based malware detectors still rely on feature extraction, leaving the discrepancy between these two spaces exploitable by A4. Second, common feature sets used by ML malware classifiers include a large portion of constrained feature types, such as categorical (e.g., file type) and binary (e.g., whether a specific permission is requested [6, 7, 19]). These constraints can be properly defined and enforced using the operations proposed by A4. Lastly, different features used by malware detectors can also be inter-dependent. For example, changing the file type can lead to a different set of allowable permissions (e.g., a system app can request more permissions than regular apps). This can also be incorporated by leveraging the feedback loop in A4. To summarize, we anticipate that applying A4’s framework to other applications, with adjustments, is a promising direction, and leave it as future work.

Improving ML-based adblocking. Given the evaluation results and multi-dimensional analysis conducted in §4, we summarize some useful insights for improving the design and implementation of AdGraph and other ML-based adblockers, so that they detect more ads and trackers accurately and robustly amid rising attempts from ad publishers to cloak their ads. First, ML-based adblockers should in general pay more attention to their feature set selection. During our analysis of AdGraph’s features, for example, we find that there are several global features (e.g., #1 in Table 1) that encode the overall size of the constructed graph. Since these features do not describe anything local with respect to the request node being classified, they are of lower importance, but leave unnecessary perturbation space for adversarial attacks. Evidently, the feature significance analysis presented in §4 and the information gain from the Random Forest used in [9] both verify the relatively low predictive/distinguishing power of these global features.

Second, as URLs are the most functional part of an HTTP(S) request, adblockers should put more effort into parsing them in order to prevent string manipulation tricks. For AdGraph, our analysis shows that it does not handle HTML’s special character encoding very well, which leaves room for the stealthy perturbation of URL-related features. More generally, adblockers are advised to implement more sanitization to reduce, as much as possible, their chances of treating adversarial perturbations as normal content, or to put less weight on the URL-related features.
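As an illustration of the kind of sanitization suggested here, the sketch below decodes HTML character references and percent-escapes before a URL-based feature such as length is computed. It is a hypothetical pre-processing step built on Python's standard library, not AdGraph's code; the example URL is made up, and applying percent-decoding in addition to HTML unescaping is our own assumption about what a defender might want.

```python
import html
from urllib.parse import unquote


def normalize_url(url: str) -> str:
    """Decode HTML character references, then percent-escapes, in a single pass."""
    return unquote(html.unescape(url))


# Made-up request URL padded with encoded characters.
raw_url = "http://ads.example.com/banner&#46;js?pad=%41%41"
clean_url = normalize_url(raw_url)

print(clean_url)                     # http://ads.example.com/banner.js?pad=AA
print(len(raw_url), len(clean_url))  # the normalized form is shorter
```

A single decoding pass is used deliberately: repeated unescaping can alter legitimate query strings, so how aggressively to normalize is a design decision for the adblocker.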

Enhancing hosting websites for evasion. For hosting websites that have incentives to fight against the emerging ML-based adblockers, our investigations suggest two aspects they can adjust with regard to their pages, to make it easier to mount adversarial subversions against learning-based adblockers: (1) maintaining a sufficiently high level of page complexity to guarantee adequate space and flexibility for placing adversarial perturbations (if there are too few elements in the original web page, adversarial perturbations are naturally more detectable); and (2) providing the necessary setups (e.g., proxy/rotating servers) to accommodate perturbations on URL-related features. The latter aligns with the common practice adopted nowadays by some websites to counter rule-based adblockers [20].

Completeness of A4’s implementation. As noted in §3, currently A4 is designed to perturb only 7 features from the possible 65. Although this conservative limit makes our attack stealthier in practice, we do plan to explore perturbations on more features in the future and further analyze their effectiveness. Moreover, our current A4 implementation only has two mapping-back strategies (centralized and distributed) as discussed in §3. While we argue that even with these two strategies, we can already showcase the power of our proposed feedback loop, we will explore additional strategies in the future to expand our search space.

6 Conclusions

In this paper, we present the design and implementation of A4 (Actionable Adversarial Ad Attack), a new adversarial attack targeting the state-of-the-art learning-based adblocker AdGraph. Unlike previous work on generating adversarial samples in unconstrained domains, A4 explicitly accounts for constraints that arise in the context of the web domain. We show promising results in this unique domain, which can have substantive implications for online advertising.


References

  • [1] AdblockPlus (2020) Meet sentinel, the artificial intelligence ad detector. Note: https://adblock.ai/
  • [2] S. Bhagavatula, C. Dunn, C. Kanich, M. Gupta, and B. Ziebart (2014) Leveraging machine learning to improve unwanted resource filtering. In Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop, pp. 95–102.
  • [3] J. Bruna, C. Szegedy, I. Sutskever, I. Goodfellow, W. Zaremba, R. Fergus, and D. Erhan (2013) Intriguing properties of neural networks.
  • [4] N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57.
  • [5] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li (2018) Boosting adversarial attacks with momentum. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9185–9193.
  • [6] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel (2017) Adversarial examples for malware detection. In European Symposium on Research in Computer Security, pp. 62–79.
  • [7] W. Hu and Y. Tan (2018) Black-box attacks against rnn based malware detection algorithms. In Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence.
  • [8] U. Iqbal, Z. Shafiq, and Z. Qian (2017) The ad wars: retrospective measurement and analysis of anti-adblock filter lists. In Proceedings of the 2017 Internet Measurement Conference, pp. 171–183.
  • [9] U. Iqbal, P. Snyder, S. Zhu, B. Livshits, Z. Qian, and Z. Shafiq (2020) Adgraph: a graph-based approach to ad and tracker blocking. In Proc. of IEEE Symposium on Security and Privacy.
  • [10] U. Iqbal (2019-06) AdGraph: a graph-based approach to ad and tracker blocking. Zenodo.
  • [11] A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236.
  • [12] M. H. Mughees, Z. Qian, and Z. Shafiq (2017) Detecting anti ad-blockers in the wild. Proceedings on Privacy Enhancing Technologies 2017 (3), pp. 130–146.
  • [13] PageFair (2017) 2017 global adblock report. pagefair. Note: https://pagefair.com/downloads/2017/01/PageFair-2017-Adblock-Report.pdf
  • [14] N. Papernot, P. McDaniel, and I. Goodfellow (2016) Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277.
  • [15] A. Sjosten, P. Snyder, A. Pastor, P. Papadopoulos, and B. Livshits (2019) Filter list generation for underserved regions. arXiv preprint arXiv:1910.07303.
  • [16] G. Storey, D. Reisman, J. Mayer, and A. Narayanan (2017) The future of ad blocking: an analytical framework and new techniques. arXiv preprint arXiv:1705.08568.
  • [17] P. Tigas, S. T. King, B. Livshits, et al. (2019) Percival: making in-browser perceptual ad blocking practical with deep learning. arXiv preprint arXiv:1905.07444.
  • [18] F. Tramèr, P. Dupré, G. Rusak, G. Pellegrino, and D. Boneh (2018) Ad-versarial: defeating perceptual ad-blocking. arXiv preprint arXiv:1811.03194.
  • [19] Z. Yuan, Y. Lu, Z. Wang, and Y. Xue (2014) Droid-sec: deep learning in android malware detection. In ACM SIGCOMM Computer Communication Review, Vol. 44, pp. 371–372.
  • [20] S. Zhu, X. Hu, Z. Qian, Z. Shafiq, and H. Yin (2018) Measuring and disrupting anti-adblockers using differential execution analysis. In The Network and Distributed System Security Symposium (NDSS).