You Must Have Clicked on this Ad by Mistake! Data-Driven Identification of Accidental Clicks on Mobile Ads with Applications to Advertiser Cost Discounting and Click-Through Rate Prediction

04/03/2018 ∙ by Gabriele Tolomei, et al.

In the cost per click (CPC) pricing model, an advertiser pays an ad network only when a user clicks on an ad; in turn, the ad network gives a share of that revenue to the publisher where the ad was impressed. Still, advertisers may be unsatisfied with ad networks charging them for "valueless" clicks, or so-called accidental clicks. [...] Charging advertisers for such clicks is detrimental in the long term, as the advertiser may decide to run their campaigns on other ad networks. In addition, machine-learned click models trained to predict which ad will bring the highest revenue may overestimate an ad's click-through rate, and as a consequence negatively impact revenue for both the ad network and the publisher. In this work, we propose a data-driven method to detect accidental clicks from the perspective of the ad network. We collect observations of the time spent by users on a large set of ad landing pages, i.e., dwell time. We notice that the majority of per-ad distributions of dwell time fit a mixture of distributions, where each component may correspond to a particular type of click, the first one being accidental. We then estimate dwell time thresholds of accidental clicks from that component. Using our method to identify accidental clicks, we then propose a technique that smoothly discounts the advertiser's cost of accidental clicks at billing time. Experiments conducted on a large dataset of ads served on Yahoo mobile apps confirm that our thresholds are stable over time, and revenue loss in the short term is marginal. We also compare the performance of an existing machine-learned click model trained on all ad clicks with that of the same model trained only on non-accidental clicks. There, we observe an increase in both ad click-through rate and revenue (+3.9% and +0.2%, respectively) using the latter. [...]


1 Introduction

The time a user spends on a web page, referred to as dwell time, varies considerably by user and type of content. Still, for a given web page, dwell time can be used as an effective proxy of engagement with its content BarbieriWWW2016; Goldman2014GSP; LalmasKDD2015; YiRecSys2014. If the content is of no interest to the user or is presented poorly, the dwell time on the web page will often be short. By contrast, for a user engaged with the content, the dwell time will be longer. A third case, which does not depend on the content or its presentation, corresponds to extremely short dwell time. This paper focuses on these extremely short dwell times.

In online advertising, when a user clicks on an advertisement, or ad for short, she is redirected to the advertiser web page, i.e., the ad landing page. The dwell time on the ad landing page is measured as the time between the ad click and the user returning to the publisher site where the ad was impressed. There are three types of clicks:

  • accidental click: The user clicks on the ad, likely by mistake, reaches the ad landing page, and immediately bounces back; the user spends no time on the landing page.

  • short click: The user intended to click on the ad but once on the landing page decides to bounce back; the post-click experience is not satisfactory, due to the low quality of the landing page or its low relevance.

  • long click: Once on the landing page, the user engages with it and spends time on the advertiser site.

Figure 1: Empirical distribution of the logarithm of dwell time observed on an ad landing page.

We plot in Figure 1 the distribution of the natural logarithm of dwell time values observed on an ad landing page. The distribution is not unimodal: a small component with extremely low dwell time values, around 1.8 seconds, can be identified as representative of accidental clicks, whereas the other two components capture the short and long clicks, respectively, demonstrating the existence of the three types of clicks introduced above. In this paper, we propose a data-driven approach for estimating the dwell time thresholds that identify whether a click is accidental or not.

Properly accounting for accidental clicks is business-critical to online advertising. Consider the widely adopted cost per click (CPC) pricing model, where an advertiser is charged by the ad network only when a user clicks on an ad. Users may accidentally click on an ad, get redirected to the ad landing page, and then bounce back without looking at it. This behaviour is even more severe on smartphones, as their limited screen size makes users more prone to click on ads by mistake GoogleAdWords2016; StewartHCI2012. It is therefore not unusual for advertisers running CPC campaigns to complain when charged for such clicks, as these are “valueless” to them. Ignoring these complaints can be detrimental in the long term, as it may affect the relationship between the ad network and the advertiser, who might at worst switch to other ad networks to run their campaigns.

A main challenge for the ad network is to accurately identify accidental clicks. Current solutions use hand-coded thresholds of dwell time on the landing page to determine whether an ad click is accidental; for example, all visits to ad landing pages shorter than 5 seconds may be considered accidental. With this approach, the threshold is fixed and set arbitrarily, and does not take into account empirically observed dwell times of ad clicks. Moreover, while 5 seconds may be a reasonable threshold for ad clicks on a tablet, it may not be the right one for a smartphone. As the first and major contribution of this paper, we propose an unsupervised learning method that estimates thresholds of accidental clicks by fitting observed dwell time data to mixture models, which capture the three types of clicks defined above.

Further, we deploy our method for identifying accidental clicks in two applications. The first is a controlled approach to discounting accidental clicks when charging advertisers. In principle, accidental clicks could simply be discarded once detected, and advertisers not charged for them. However, this strategy would negatively impact revenue for both the ad network and, consequently, the publisher. Therefore, as the second contribution of this paper, we propose a smooth discounting method based on the proportion of detected accidental clicks. The third contribution of this paper shows how accidental clicks identified by our approach are discarded from the training data of an existing machine-learned click model used to predict an ad's click-through rate (CTR for short). Our intuition is that by removing valueless clicks we can feed machine-learned models with “cleaner” training data, mitigating any possible overestimation of CTR.

The rest of the paper is organised as follows. In Section 2, we motivate why we use dwell time as a proxy of an ad click's value, and discuss why it is appropriate to adopt a data-driven solution for discovering accidental clicks. Sections 3 and 4 describe our data-driven approach to detect accidental clicks and experiments carried out with real-world data, respectively. Sections 5 and 6 present two applications where our proposed approach to identify accidental clicks was deployed. In Section 7, we discuss related work and position our contributions. Finally, Section 8 concludes our work.

2 Accidental clicks

We show first how we estimate the value of an ad click using dwell time, and then motivate using a data-driven approach to identify accidental clicks.

2.1 Using dwell time as the value of an ad click

Advertisers may wish to be fully charged only for valuable clicks, i.e., those that lead to conversions (there is no consensus around what a conversion is; it is up to the advertiser to specify it), whereas all remaining clicks should be proportionally discounted. Implementing this strategy requires an accurate estimate of the conversion rate, i.e., the probability of conversion conditioned on a click. However, conversion rate suffers from three major problems. First, it is hard to estimate, since conversion data are often not available for a large number of advertisers. Second, those data are not missing at random, as advertisers sharing their conversion data would be a biased sample. Finally, using conversion data to identify valuable clicks may lead to high false positive rates, since clicks not followed by conversions are not necessarily “valueless”. In fact, those clicks represent profitable feedback for advertisers.

It is therefore more appropriate to characterise valuable clicks as those that are not valueless, i.e., to treat so-called accidental clicks as the complement of valuable clicks. We propose to use dwell time as our proxy measure of the value of an ad click. Using dwell time can help alleviate the aforementioned problems, provided that dwell time is indeed a good proxy for conversion in our data. This was shown to be the case in Goldman2014GSP, and we further validate it in our context.

Conversion | Mean | Std. Err. | Std. Dev.
yes | 5.729 | .064 | 2.582
no | 3.264 | .006 | 2.569

Table 1: Statistics on the natural logarithm of dwell time.

Table 1 shows statistics on the natural logarithm of dwell time from two samples, one containing observations of dwell time leading to conversions (yes) and the other made up of those that do not (no), for 40 ads managed by a large ad network, Yahoo Gemini (https://gemini.yahoo.com/). We test against the null hypothesis that the mean log dwell time computed from the yes-sample is the same as that calculated from the no-sample. We run a two-tailed two-sample t-test and are able to reject the null hypothesis stating that the difference of those two means is 0. Therefore, the average dwell times of the two samples are indeed statistically significantly different.
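To make the comparison concrete, the following is a minimal sketch of such a test in Python; since the click logs are proprietary, the samples below are synthetic stand-ins drawn with the moments reported in Table 1 (the sample sizes are made up).

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two samples of log dwell time (seconds):
# clicks followed by a conversion (yes) and clicks that were not (no).
rng = np.random.default_rng(0)
log_dt_yes = rng.normal(loc=5.729, scale=2.582, size=1_500)
log_dt_no = rng.normal(loc=3.264, scale=2.569, size=180_000)

# Two-tailed two-sample t-test on the means of the log dwell times;
# Welch's variant (equal_var=False) is a safe default for unequal sizes.
t_stat, p_value = stats.ttest_ind(log_dt_yes, log_dt_no, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")  # reject H0 for small p
```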

In addition, we test two regression models to verify whether dwell time is a good predictor of conversion. Let $Y_{i,j}$ be the binary random variable representing the conversion event occurring after the $j$-th click on the $i$-th ad. We also denote by $x_{i,j}$ the natural logarithm of dwell time on the $i$-th ad landing page after the $j$-th click. The simplest model we test is a linear regression of $Y_{i,j}$ on $x_{i,j}$, that is:

$Y_{i,j} = \beta_0 + \beta_1 x_{i,j} + \varepsilon$ (1)

where $\varepsilon$ is a zero-mean Gaussian error term. A variant of this model simply applies the logistic function to the right-hand side of Equation 1, namely:

$\mathbb{E}[Y_{i,j}] = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{i,j})}}$ (2)
 | linear (Equation 1) | logit (Equation 2)
$\beta_0$ (intercept) | -0.00384 | -6.424
$\beta_1$ (dwell time) | 0.00413 | 0.301*
AIC | 14,399 | 14,297

Table 2: Coefficients of the linear and logit regression models.

Table 2 shows the regression coefficients obtained with the two models above. The best performing model is the logit, selected as the one with the smallest Akaike Information Criterion (AIC) Aikake1973AIC. We observe that the coefficient associated with the natural logarithm of dwell time ($\beta_1$) is both positive and significant; hence dwell time is a good predictor of conversion. It is worth noting that a unit increase in the natural logarithm of dwell time (i.e., a multiplication of the dwell time $t$ in seconds by $e \approx 2.718$, since $\ln(e \cdot t) - \ln(t) = 1$) will increase the odds of conversion by about 30%. This provides further justification for using dwell time as a proxy of conversion, and therefore of ad click value.
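The paper does not name a statistical toolkit; as an illustration, Equations 1 and 2 can be fitted with statsmodels, here on synthetic data generated to roughly mimic the coefficients of Table 2.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data: x stands in for x_{i,j} (log dwell time per click),
# y for Y_{i,j} (1 if the click converted, 0 otherwise).
rng = np.random.default_rng(1)
x = rng.normal(3.5, 2.5, size=50_000)
p = 1.0 / (1.0 + np.exp(-(-6.4 + 0.3 * x)))   # assumed true logit model
y = (rng.random(x.size) < p).astype(int)

X = sm.add_constant(x)                 # adds the intercept column
linear = sm.OLS(y, X).fit()            # Equation 1 (linear probability)
logit = sm.Logit(y, X).fit(disp=0)     # Equation 2
print(linear.aic, logit.aic)           # pick the model with the smaller AIC
print(np.exp(logit.params[1]))         # odds multiplier per unit of ln(dwell time)
```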

2.2 A data-driven approach to identifying accidental clicks

Using dwell time as a proxy of the “value” of an ad click, this paper puts forward a data-driven approach – based on actual dwell time observations – to identify whether an ad click is accidental or not. Using fixed thresholds on dwell time to identify accidental clicks – say 1 second – is instead common practice. The reason is that this approach is straightforward, as no data processing or computation is required, and it can be easily integrated with any existing production system. This simple approach however fails to capture subtleties arising from hidden latent factors, such as the device (desktop vs. mobile), the application (e.g., mail vs. news stream), or the network type (e.g., 3G/4G vs. Wi-Fi). A data-driven approach can account for all these factors, thus providing more reliable estimates of dwell time thresholds of accidental clicks from observed data. Moreover, the analysis of observed dwell times may characterise other phenomena, e.g., not only the lowest values (accidental clicks) but also large and extremely large ones (short and long clicks, respectively). We leave this for future work, as we focus on accidental clicks in this paper.

3 Discovering accidental clicks

Our data-driven approach for detecting accidental clicks on ads consists of two steps: data modelling and dwell time thresholding. The former fits observations of dwell time on a large set of ad landing pages to a probabilistic model, whereas the latter estimates the dwell time threshold that identifies accidental clicks. We employ an unsupervised learning approach: a supervised approach to classify ad clicks as accidental would require building a ground truth, which is not achievable.

3.1 Data modelling

We assume that observations of dwell time of ad clicks are generated by an underlying probabilistic model. Ideally, such a model has to simultaneously represent the three types of clicks shown in Figure 1: accidental, short, and long.

Generally speaking, a mixture of distributions is a probabilistic model that captures the presence of “sub-populations” within an overall population Lindsay1995; McLachlan2000. As such, it is a good candidate to describe our observations of dwell time. More formally, a continuous random variable $X$ (e.g., dwell time) is distributed according to a mixture of $K$ (discrete) component distributions if its probability density function (pdf) $f_X$ is a convex combination of $K$ pdfs $f_k$:

$f_X(x; \boldsymbol{\theta}) = \sum_{k=1}^{K} \pi_k f_k(x; \boldsymbol{\theta}_k)$ (3)

where:

  • each $f_k$ belongs to the same (parametric) family of distributions (e.g., Normal – we refer to Normal and Gaussian distributions interchangeably – Log-Normal, Gamma, Weibull, etc.);

  • $\pi_k$ is the mixture weight (or prior probability) associated with the $k$-th component;

  • $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_K)$ is the $K$-dimensional vector of mixture weights, so that $\sum_{k=1}^{K} \pi_k = 1$ and $\pi_k \geq 0$;

  • $\boldsymbol{\theta}_k$ is the vector of parameters associated with the $k$-th component; e.g., if $f_k$ is the pdf of a Normal distribution then $\boldsymbol{\theta}_k = (\mu_k, \sigma_k^2)$;

  • $\boldsymbol{\theta} = (\boldsymbol{\pi}, \boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_K)$ is the overall vector of parameters of the mixture model;

  • there exists a latent random variable $Z$ governing which component each observation of $X$ is drawn from. This random variable is distributed according to a categorical distribution whose parameter is the vector of mixture weights $\boldsymbol{\pi}$, so that each observation is generated in two steps (see the sketch after this list):

    1. pick the $k$-th component distribution with probability $\pi_k$;

    2. generate a value for $X$ from the component distribution $f_k(\cdot; \boldsymbol{\theta}_k)$.
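The two-step generative process in the last bullet can be written down directly. Below is a small sketch with $K = 3$ Gaussian components; the weights, means, and standard deviations are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up mixture parameters: weights pi_k and per-component (mu_k, sigma_k).
weights = np.array([0.1, 0.3, 0.6])   # must sum to 1
mus = np.array([0.6, 2.2, 4.0])       # component means
sigmas = np.array([0.3, 0.6, 1.0])    # component standard deviations

def sample_mixture(n):
    # Step 1: draw the latent component Z ~ Categorical(pi).
    z = rng.choice(len(weights), size=n, p=weights)
    # Step 2: draw X from the selected component distribution.
    return rng.normal(mus[z], sigmas[z])

x = sample_mixture(10_000)  # synthetic "log dwell times"
```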

We describe next how to model dwell time on ad landing pages using a mixture of distributions.

3.1.1 Mixture of distributions for dwell time

Representing the three types of clicks can be done by having the observed dwell times on ad landing pages generated by a mixture of distributions. Let $A$ be the number of ads. For each ad $a \in \{1, \ldots, A\}$ we consider a sample of $n_a$ i.i.d. positive random variables $X_{a,1}, \ldots, X_{a,n_a}$, with each $X_{a,j}$ representing an observation of the dwell time associated with the $j$-th click on the ad $a$. Each $X_{a,j}$ is drawn from a mixture of up to $K = 3$ components, so that its pdf is:

$f_{X_{a,j}}(x; \boldsymbol{\theta}_a) = \sum_{k=1}^{K} \pi_{a,k} f_k(x; \boldsymbol{\theta}_{a,k})$ (4)

In addition, each $f_k$ is the pdf of the same parametric distribution, although we only consider probability distributions with positive domain (e.g., Log-Normal, Gamma, Weibull, etc.) as dwell time cannot be negative.

Next, we discuss how the parameters of the mixture model can be estimated from the observed data.

3.1.2 Parameter estimation

For each ad, we estimate the overall vector of parameters $\boldsymbol{\theta}_a$ from the observed dwell times using maximum likelihood estimation (MLE). For each ad $a$ we know that the pdf of each of its observations is a mixture of (up to) three distributions, defined as in Equation 4. Since all observations in the sample $\mathbf{x}_a = (x_{a,1}, \ldots, x_{a,n_a})$ are independent and identically distributed, we compute their joint probability density as:

$f(\mathbf{x}_a; \boldsymbol{\theta}_a) = \prod_{j=1}^{n_a} f_{X_{a,j}}(x_{a,j}; \boldsymbol{\theta}_a)$ (5)

From the joint probability density we derive the likelihood function $\mathcal{L}(\boldsymbol{\theta}_a; \mathbf{x}_a) = f(\mathbf{x}_a; \boldsymbol{\theta}_a)$. Although the two functions are the same, the likelihood function emphasises that the dataset is fixed and the parameters are variable. The aim of MLE is thus to find a value of $\boldsymbol{\theta}_a$ – i.e., an estimate $\hat{\boldsymbol{\theta}}_a$ – maximising the likelihood function (in practice, we often seek $\hat{\boldsymbol{\theta}}_a$ so as to maximise the log-likelihood function $\ell = \ln \mathcal{L}$, since this is equivalent – the natural logarithm is monotonically increasing – but simpler, because products change into summations):

$\hat{\boldsymbol{\theta}}_a = \arg\max_{\boldsymbol{\theta}_a} \mathcal{L}(\boldsymbol{\theta}_a; \mathbf{x}_a)$ (6)

Different likelihood functions to be maximised can be obtained depending on which pdf we fill Equation 6 with. However, if the resulting $\mathcal{L}$ is differentiable in $\boldsymbol{\theta}_a$, we can find $\hat{\boldsymbol{\theta}}_a$ as a solution of the system of equations:

$\nabla_{\boldsymbol{\theta}_a} \mathcal{L}(\boldsymbol{\theta}_a; \mathbf{x}_a) = \mathbf{0}$ (7)

where $\nabla_{\boldsymbol{\theta}_a} \mathcal{L}$ is the gradient of the likelihood function, i.e., the vector of partial derivatives of the likelihood function with respect to each parameter in $\boldsymbol{\theta}_a$.

Unfortunately, maximum likelihood estimation of the parameters is not straightforward, since closed-form solutions to Equation 7 are often not available; as such, we cannot solve it directly SchlattmannMAFMM2009. A typical solution is to use the expectation-maximisation (EM) algorithm BishopPRML2006. EM is an iterative, numerical approximation procedure that starts with an initial random guess for the values of the parameters and converges to a local maximum (or to a saddle point) of the observed-data likelihood. Although EM does not guarantee convergence to a global maximum, in practice there are a variety of heuristic approaches for escaping a local maximum: multiple restarts, clever initialisation, and modifications to the EM algorithm itself ElkanMM2010. Finally, for each ad $a$ we compute the set of model parameters $\hat{\boldsymbol{\theta}}_a$ which maximises the likelihood function.

3.1.3 Model selection

To choose the “best” model for each ad, we cannot just select the one with the highest likelihood, i.e., the one that best fits the observed data. In general, the more complex (flexible) the model, the better its goodness-of-fit to the observed data; in other words, the higher its likelihood function computed with respect to the observed data. At the same time, the more complex (flexible) the model, the less it generalises to unseen data; in other words, the higher the chance of the model overfitting the observed data JamesISL2014. Therefore, if we chose the model with the highest likelihood, we would always end up selecting the one with the maximum degree of freedom, i.e., the maximum number of components $K$.

Therefore, to avoid overfitting and find a trade-off between complexity and interpretability (this is also often referred to as the bias-variance trade-off), we use tools such as the Akaike Information Criterion (AIC) Aikake1973AIC or the Bayesian Information Criterion (BIC) Schwarz1978BIC. The former is computed as $\mathrm{AIC} = 2k - 2\ln(\hat{L})$, whereas the latter as $\mathrm{BIC} = k \ln(n) - 2\ln(\hat{L})$, where $k$ is the number of free parameters of the model, $\hat{L}$ is the likelihood function as maximised by the parameters of the model estimated from the observed data, and $n$ is the dataset size. Both criteria penalise models that are unnecessarily complex; we finally select the model with the smallest AIC or BIC.
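As a sketch of this selection procedure (the paper does not prescribe an implementation; scikit-learn's GaussianMixture, which fits by EM and supports multiple restarts via n_init, is one option):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def best_mixture(log_dwell_times, max_components=3):
    """Fit mixtures with 1..max_components Gaussian components by EM
    and return the one with the smallest AIC (swap .aic for .bic to
    select by BIC instead)."""
    X = np.asarray(log_dwell_times).reshape(-1, 1)
    models = [
        GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
        for k in range(1, max_components + 1)
    ]
    return min(models, key=lambda m: m.aic(X))

# e.g., on the synthetic sample x from the earlier sketch:
# gm = best_mixture(x); print(gm.n_components, gm.means_.ravel())
```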

So far, we have estimated and selected the mixture model that best describes the observed dwell times on each ad landing page. Next, we present how we can use this model to compute dwell time thresholds to identify accidental clicks.

3.2 Dwell time threshold of accidental clicks

For each ad, we fit the observed dwell times on its landing page to a mixture of distributions using MLE and one of the model selection criteria described above. We then focus on the subset of ads exhibiting exactly three components, namely ads whose dwell times fit a mixture of three distributions. Intuitively, these are the ads exhibiting all three categories of clicks we conjectured: accidental, short, and long.

Given an ad and the set of parameters of all its components, we compute statistics such as the expected value or the median of every component. As we are interested in detecting accidental clicks we only focus on the first component of each ad. Using the second and third component to study short and long clicks, respectively, is something we leave for future work.

For example, if we fit the data to a mixture of three Log-Normal distributions, we can represent the first component by a random variable distributed as a Log-Normal with parameters $(\mu_1, \sigma_1^2)$. In general, for any random variable $X \sim \mathrm{LogNormal}(\mu, \sigma^2)$, or equivalently $\ln X \sim \mathcal{N}(\mu, \sigma^2)$, we can derive the following:

  • $\mathbb{E}[X] = e^{\mu + \sigma^2/2}$ (where $\mathbb{E}$ denotes the expected value);

  • $\mathrm{median}(X) = e^{\mu}$.

We therefore estimate a per-ad threshold of dwell time for detecting accidental clicks using either the expected value or the median of the first component – the latter being more robust to the presence of outliers – by letting $\mu = \mu_1$ and $\sigma^2 = \sigma_1^2$ in the equations above. Finally, to obtain an overall estimate of the dwell time threshold of accidental clicks across all the ads, we compute the mean or the median of the individual per-ad estimates.
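A sketch of this thresholding step, on top of a mixture fitted on log dwell times as above (fit_for_ad and ads_with_three_components are hypothetical placeholders for the per-ad fitting pipeline):

```python
import numpy as np

def accidental_threshold(gm):
    """Per-ad threshold in seconds: the first component is the one with
    the smallest mean on the log scale, and the median of a
    Log-Normal(mu, sigma^2) is exp(mu)."""
    mu_first = gm.means_.ravel().min()
    return float(np.exp(mu_first))

# Aggregating the per-ad estimates into one overall threshold:
# per_ad = [accidental_threshold(fit_for_ad(a)) for a in ads_with_three_components]
# overall = np.median(per_ad)   # or np.mean(per_ad) for a higher threshold
```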

In this section, we described a two-stage approach for computing thresholds of dwell time from observed data, to detect accidental clicks on ads. Next, we present experimental results when these thresholds are deployed within a large ad network.

4 Experiments

We conduct two experiments on multiple datasets of ads served by a large ad network, codenamed Gemini, on several Yahoo mobile apps. We focus on mobile apps as these are where accidental clicks are more likely to happen GoogleAdWords2016; StewartHCI2012. We only consider ads with at least 100 clicks, to increase the confidence of the estimates of our thresholds.

In the first experiment, we choose one pivoting mobile app to estimate the dwell time threshold of accidental clicks, which is then used to identify accidental clicks on other (two, in our case) mobile apps. To protect sensitive information, we refer to the former as App 1 and to the other two as App 2 and App 3, respectively. The experiment is performed on two one-month datasets, which we refer to as D1 and D2, respectively. Each consists of a random sample of around 10,000 ads and 6.5M clicks, unevenly distributed over the three mobile apps. In the second experiment, instead of having one pivoting mobile app used to estimate a single dwell time threshold of accidental clicks, we generate a threshold for each mobile app. The dataset used is a random sample from three weeks worth of data containing 120,000 ads and 70M clicks, hereinafter called D3.

With the first approach – using a pivoting app – the aim is to provide a single estimate of the dwell time threshold of accidental clicks, using the app with the highest volume of traffic. The second approach provides one threshold per app, which is more flexible and accounts for the effect of different user experiences, populations, and operating systems (i.e., Android vs. iOS).

4.1 Data preprocessing

In both experiments, we remove outliers by discarding clicks with dwell time greater than 600 seconds, a threshold already used in previous work LalmasKDD2015.

We also apply a logarithmic transformation to all the observations. This allows us to fit the log-transformed data to a mixture of Gaussian distributions, as this is the same as fitting the original data to a mixture of Log-Normals.

The logarithmic transformation emphasises differences between small values of dwell time, whereas it smooths the same differences when they occur between larger values. In this way, we are able to capture relative differences instead of absolute ones. To identify accidental clicks, a difference of 1 second between two small values of dwell time, such as 2 and 3 seconds, is more important than the same difference between, say, 100 and 101 seconds.

Our approach is not tailored to any specific family of parametric distributions; we select a mixture of Log-Normal distributions purely because it gives us the best (smallest) AIC on average over all the ads.
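The preprocessing just described amounts to a couple of lines; a sketch, assuming a plain array of dwell times in seconds:

```python
import numpy as np

def preprocess(dwell_seconds, cap=600.0):
    """Drop outliers above the 600 s cap, then log-transform, so that a
    Gaussian mixture on the result corresponds to a Log-Normal mixture
    on the raw dwell times."""
    d = np.asarray(dwell_seconds, dtype=float)
    d = d[(d > 0) & (d <= cap)]   # dwell time must be positive
    return np.log(d)
```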

4.2 Single threshold from a pivoting app

In this first setting, we consider (log-transformed) observations of dwell time for the ads clicked on our pivoting mobile app – App 1 – collected from both D1 and D2. Then, we fit each set of per-ad dwell time observations to a mixture of Gaussian distributions. Figure 2 presents three examples of ads, fitted to mixtures of one, two, and three Gaussian components, respectively. Except for the ad shown in Figure 2(a), the ads exhibit a first component centered around very small dwell time values, which likely represents accidental clicks.

(a) One component
(b) Two components
(c) Three components
Figure 2: Examples of ads clicked on the pivoting App 1 which fit to one, two, and three components.

Table 3 shows how ads in the two datasets fit to models having one, two, and three components, respectively. For both, the vast majority of ads (82.5% and 65.4%, respectively) fit to exactly three components. These ads are the ones we focus on to detect the dwell time threshold of accidental clicks, since their corresponding landing pages contain all the three categories of clicks described in Section 1. Since our aim is to “isolate” accidental clicks happening on our pivoting app, we concentrate on the first component. According to our conjecture, this would capture users clicking on an ad by mistake, or simply returning to the publisher site without actually landing on the advertiser’s page.

Dataset | 1 comp | 2 comps | 3 comps
D1 | 1.0% | 16.5% | 82.5%
D2 | 2.9% | 31.7% | 65.4%

Table 3: Percentage of ads clicked on the pivoting App 1 which fit to one, two, and three components.

For each ad, we compute an estimate of the dwell time threshold using the median of its first fitted component, as this is more robust to the variance. In fact, we observe that the variance increases going from the first to the second and finally to the third component. Intuitively, this reflects the variability of dwell time in each click category: dwell times of accidental clicks are expected to differ less from each other than is the case for short and long clicks.

To obtain an overall estimate of the threshold (an estimate derived from all the per-ad estimates), we propose two strategies: (i) the mean of all the per-ad medians; (ii) the median of all the per-ad medians. The first results in a generally higher threshold, which implies considering a larger number of clicks as “accidental”. The second is more “conservative” and usually generates a smaller threshold value.

In Figure 3, we plot the distribution of per-ad medians of the first component computed from the pivoting App 1 on D1 and D2. The median of all those medians (the blue dashed line) seems more suitable than the mean (the red dashed line), as it aligns with the “peak” we are interested in. For business confidentiality, we do not disclose the percentage of accidental clicks for each of the considered apps, but we can report that this percentage is stable over the two datasets once the thresholding strategy is fixed. Anecdotally, this percentage was shown to change on a dataset from a different time period, as the result of a change in the user interface of a specific app.

(a) D1
(b) D2
Figure 3: Distributions of per-ad medians computed from the pivoting App 1 on D1 and D2 (median vs. mean).

4.3 Multiple per-app thresholds

In the previous setting, we reported results obtained when using a pivoting mobile app to compute a single dwell time threshold, which in turn can be used to identify accidental clicks on other apps. In this section, we discuss another approach, which was rolled out in production. Instead of computing a single threshold from one pivoting app, we generate a dwell time threshold of accidental clicks for each app individually. By doing so, we also account for the effect of different user experiences and user populations on the different mobile apps and operating systems (i.e., Android vs. iOS). We use a default threshold value for apps without enough observations of dwell time.

(a) First week
(b) Second week
(c) Third week
Figure 4: Empirical CDF of thresholds of accidental ad clicks on mobile apps, computed on D3.

In Figure 4, we plot the empirical cumulative distribution function (eCDF) of the thresholds computed on each week of the dataset D3. We observe that the distribution of thresholds remains stable over time. The median of all the per-app thresholds identified with this approach is 2.1 seconds, which aligns with the threshold generated using the pivoting strategy.

In Figure 5, we separate the eCDFs of the thresholds obtained from apps on Android and iOS. We see that there are differences between the two platforms, as their median values differ. In general, thresholds of accidental ad clicks on Android apps are more right-skewed than those computed for iOS apps, suggesting that thresholds on Android are somewhat less dependent on the app they are computed on.

(a) Mobile Android Apps
(b) Mobile iOS Apps
Figure 5: Empirical CDF of thresholds of accidental ad clicks computed on D3, across different platforms.

We presented two strategies to calculate dwell time thresholds of accidental clicks. Both strategies rely on the same data-driven approach, i.e., estimating the parameters of a mixture of three dwell time distributions and computing aggregated statistics (e.g., the median) on the first component. The two strategies differ in the (historical) dwell time data used to fit the mixtures. One uses observations of dwell time from a single app only, and hence provides a threshold for apps with few historical dwell time observations, which may be less popular than the pivoting app or may have just entered the market. The other provides a per-app threshold, which can be used when there are multiple apps with a sufficiently large number of dwell time observations. In addition, this strategy considers the impact of different user experiences on those apps.

In the next two sections, we present two use cases where our proposed data-driven approach for identifying accidental clicks was deployed.

5 Use case I: Discounting accidental clicks when billing advertisers

With a mechanism for detecting accidental clicks, an ad network may simply discard all accidental clicks when billing the advertiser. This can however severely impact revenue for both the ad network and the publisher, at least in the short term. For example, in our data we saw that the top-3 most revenue-losing apps account for a large share of the overall revenue loss across all the apps under consideration (the actual figures are not shown due to business confidentiality). It is therefore important to control how much revenue loss is acceptable, hence looking for a trade-off between accounting for accidental clicks (satisfying the advertisers) and containing revenue loss (satisfying the ad network and publishers). We present a smooth method to discount the price of accidental clicks, instead of discarding them, so that advertisers are not fully charged for those clicks.

5.1 Smooth discounting strategy

One of the main attractions of ad networks is scale; advertisers have access to a large number of impressions and reach a wide audience with a single buy. However, not all the apps in the network perform equally. The advertiser is then faced with the problem of either selecting which apps to bid on or adjusting the bids by app. Both cases create extra friction for the advertiser. The algorithmic discounting we present below addresses this problem by adjusting the cost per click on each app of the network, such that the return on investment (ROI) for the advertiser is the same across all the apps.

We assume the existence of a pivoting app from which we estimate the dwell time threshold of accidental clicks $\tau$, as discussed in Section 4.2. Let $C_a^{piv}$ be the set of clicks observed on ad $a$, as impressed on the pivoting app over a fixed time window, and let $n_a^{piv} = |C_a^{piv}|$ be the total number of observed clicks on $a$. Then $V_a^{piv} = \{c \in C_a^{piv} \mid t_c > \tau\}$ – where $t_c$ is the dwell time of click $c$ – is the set of non-accidental clicks on $a$ identified using $\tau$ on the same time window, and $m_a^{piv} = |V_a^{piv}|$ is the total number of non-accidental clicks on $a$. Similarly, for any other app we define $C_a^{app}$, $n_a^{app}$, $V_a^{app}$, and $m_a^{app}$.

An advertiser may associate a value to each click on ad $a$, referred to as $v_a$. This corresponds to the amount of money the advertiser would like to earn from a click on $a$, independently of the source (app, in this case) where such a click occurs. Under a CPC cost model, there is a maximum amount of money the advertiser is willing to pay for having ad $a$ impressed and clicked, denoted by $c_a$. We define the advertiser ROI for ad $a$ on the generic app as:

$\mathrm{ROI}_a^{app} = \frac{v_a \cdot m_a^{app}}{c_a \cdot n_a^{app}}$ (8)

where the numerator is the total value earned by the advertiser considering only valid – non-accidental – clicks on the app, and the denominator is the total cost the advertiser would pay for all the clicks on ad $a$ occurring on the app. If we knew the true non-accidental click rate of the app ($q_{app}$), we could rewrite Equation 8 as:

$\mathrm{ROI}_a^{app} = \frac{v_a \cdot q_{app}}{c_a}$ (9)

Indeed, $\hat{q}_{app} = m_a^{app} / n_a^{app}$ is the MLE of the true $q_{app}$. We require the app chosen as pivot not only to be the one from which we can accurately estimate $\tau$, but also the best performing app, i.e., the one with the highest $q$ – the highest proportion of valid ad clicks overall. This is because we want the pivoting app to be the “benchmark” against which we compare all the other apps of the network.

The advertiser ROI calculated on any app of the network should ideally equal that of the pivoting app:

$\mathrm{ROI}_a^{app} = \mathrm{ROI}_a^{piv}$ (10)

Moreover, since the value that the advertiser gets from a click on ad $a$ ($v_a$) is independent of the source, we can rewrite Equation 10 as:

$\frac{q_{app}}{c_a^{app}} = \frac{q_{piv}}{c_a}$ (11)

where $c_a^{app}$ denotes the (adjusted) cost of a click on ad $a$ on the generic app, while the cost on the pivoting app is the full price $c_a$. We may observe that $q_{app} \leq q_{piv}$, since $q_{piv}$ is the highest among all the apps by design. For Equation 11 to be satisfied, we define $\tilde{c}_a^{app}$ as the adjusted cost of each accidental click on ad $a$, specific to the app:

$\tilde{c}_a^{app} = c_a \cdot \frac{q_{app}}{q_{piv}}$ (12)

The intuition is that the cost of an accidental click on $a$ on a generic app ($\tilde{c}_a^{app}$) should be obtained by discounting the price of a valid click ($c_a$) proportionally to the app's relative value with respect to the best performing, pivoting app of the network: the ratio $q_{app}/q_{piv}$ is exactly our discount factor.
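Applying Equation 12 is then a one-liner; a sketch with made-up numbers:

```python
def adjusted_accidental_cpc(cpc, q_app, q_piv):
    """Equation 12: price an accidental click on a given app at the
    valid-click price scaled by the app's non-accidental click rate
    relative to the pivot's."""
    return cpc * (q_app / q_piv)

# E.g., a $0.50 valid click, q_app = 0.85, q_piv = 0.95:
# adjusted_accidental_cpc(0.50, 0.85, 0.95) -> ~$0.447 per accidental click
```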

This strategy does not discount accidental ad clicks on the pivoting app itself. This is because the pivoting app is chosen as the one with the smallest accidental click rate, thus likely with little need for a discount factor. Nonetheless, we may also decide to account for accidental clicks on the pivoting app, especially if its click value performance deteriorates. Various strategies may be deployed. For example, we can monitor the number of accidental clicks on the pivoting app and, if it eventually exceeds some established threshold, apply a default discounting strategy to those accidental clicks, e.g., by choosing one of the discount factors computed for the other apps.

5.2 Estimating non-accidental click rate

To implement our proposed discounting strategy, we must accurately estimate the (binomial) proportion of non-accidental clicks of an app. The most straightforward way is to use the maximum likelihood estimate, namely the single-point estimate $\hat{q} = m/n$: the overall number of non-accidental ad clicks divided by the total number of ad clicks observed during a specific time window. This estimate however is not robust when an app has a low number of observations. To overcome this, we compute a confidence interval for $q$.

There exist several ways to compute a confidence interval for an estimate $\hat{q}$ of a binomial proportion $q$ (in this setting, successes are non-accidental clicks and trials are all observed clicks, so $\hat{q} = m/n$). The normal approximation interval is the simplest and most common approach; it assumes the error distribution of a binomially-distributed observation to be Gaussian. It is computed as $\hat{q} \pm z_{1-\alpha/2} \sqrt{\hat{q}(1-\hat{q})/n}$, where $\hat{q}$ is the proportion of successes in a Bernoulli trial process estimated from the statistical sample, $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of a standard Normal distribution, $\alpha$ is the error percentile, and $n$ is the sample size.

The normal approximation however does not always work well. Several competing formulas are available that perform better, especially for situations with a small sample size or a proportion very close to 0 or 1. The choice depends on how important it is to use a simple and easy-to-explain interval versus the desire for better accuracy. The Agresti-Coull interval BrownSS2001 is another approximate binomial confidence interval, more robust than the normal approximation interval. Given $s$ successes in $n$ Bernoulli trials, it defines the quantities $\tilde{n} = n + z^2$ and $\tilde{q} = \frac{1}{\tilde{n}}\left(s + \frac{z^2}{2}\right)$. Then, a confidence interval for $q$ is given by $\tilde{q} \pm z \sqrt{\tilde{q}(1-\tilde{q})/\tilde{n}}$, where $z = z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of a standard Normal distribution, as before.
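A direct implementation of the Agresti-Coull interval (the click counts in the example are made up):

```python
import math
from statistics import NormalDist

def agresti_coull(successes, trials, alpha=0.05):
    """Agresti-Coull confidence interval for a binomial proportion;
    here successes = non-accidental clicks and trials = all clicks."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for a 95% interval
    n_tilde = trials + z * z
    q_tilde = (successes + z * z / 2) / n_tilde
    half = z * math.sqrt(q_tilde * (1 - q_tilde) / n_tilde)
    return max(0.0, q_tilde - half), min(1.0, q_tilde + half)

lcb, ucb = agresti_coull(successes=4_650, trials=5_000)
```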

5.3 Comparing non-accidental click rates

The proposed discounting strategy requires computing the ratio of non-accidental click rates between the app of interest and the pivoting app. If the estimate of $q$ is not a single-point estimate such as $\hat{q}$ but a confidence interval, one way to compare two estimates is to take the ratio of their upper confidence bounds:

$\frac{q_{app}}{q_{piv}} \approx \frac{\mathrm{ucb}(\hat{q}_{app})}{\mathrm{ucb}(\hat{q}_{piv})}$ (13)

where $\mathrm{ucb}(\cdot)$ is the upper confidence bound computed with the Agresti-Coull interval defined at the end of the previous section.

We already stated that the pivoting app is assumed to be the one with the highest $q$. However, for our discounting strategy to be robust, it should also account for the case where an app outperforms the pivoting app in terms of “click value” (this may happen if the same pivoting app has been running for long and a new, better performing app slightly overtakes it). In such a case, we would like the discount factor to be greater than 1 only when we have a degree of confidence in it. One way to implement this is to require the overperforming app's lower confidence bound (lcb) to be greater than the pivot's upper confidence bound. We can therefore modify Equation 13 to:

$\frac{q_{app}}{q_{piv}} \approx \begin{cases} \dfrac{\mathrm{lcb}(\hat{q}_{app})}{\mathrm{ucb}(\hat{q}_{piv})} & \text{if } \mathrm{lcb}(\hat{q}_{app}) > \mathrm{ucb}(\hat{q}_{piv}) \\ \min\left(1, \dfrac{\mathrm{ucb}(\hat{q}_{app})}{\mathrm{ucb}(\hat{q}_{piv})}\right) & \text{otherwise} \end{cases}$ (14)

We make the following observations about Equation 14. First, the ratio of non-accidental click rates will be greater than 1 only if the lower confidence bound of the app is greater than the upper confidence bound of the pivot. Second, for a large sample with non-zero valid clicks, the ratio of non-accidental click rates converges to the ratio of the single-point (MLE) estimates. Third, for a small sample size, the ratio of non-accidental click rates will be close to 1, indicating that we do not have enough data to suggest that the app of interest is any different from the pivoting app. Fourth, there is still a minimum number of clicks needed for Equation 14 to produce reliable results.
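One way to express this rule in code, reusing the agresti_coull helper from the previous sketch (the behaviour outside the stated cases is an assumption of this sketch, not prescribed by the paper):

```python
def rate_ratio(s_app, n_app, s_piv, n_piv, alpha=0.05):
    """Conservative estimate of q_app / q_piv per Equation 14, based on
    Agresti-Coull bounds."""
    lcb_app, ucb_app = agresti_coull(s_app, n_app, alpha)
    _, ucb_piv = agresti_coull(s_piv, n_piv, alpha)
    if lcb_app > ucb_piv:                # confidently overperforming pivot
        return lcb_app / ucb_piv
    return min(1.0, ucb_app / ucb_piv)   # otherwise capped at 1
```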

When we have enough confidence that the ratio of non-accidental click rates between an app and the current pivot is greater than 1, we can update the pivot with that app and compute the discount factors using the new app as the new benchmark.

Next, we discuss through an example the impact on revenue of this smooth discounting strategy once implemented.

5.4 The impact of smooth discount factors

We compute the discount factors for accidental ad clicks on the two datasets D1 and D2, described in Section 4. We consider all the ads impressed and clicked on all three apps, App 1, App 2, and App 3. To increase the confidence in our estimates, we discard ads with fewer than 40 clicks on each app. We select the pivoting app as the one with the highest $q$, estimated either via MLE or with the Agresti-Coull estimator. In both cases, App 1 is chosen.

Dataset | App | Discount factor (MLE) | Discount factor (Agresti-Coull)
D1 | App 2 | 0.72 | 0.79
D1 | App 3 | 0.64 | 0.73
D2 | App 2 | 0.66 | 0.75
D2 | App 3 | NA (not enough obs.) | NA (not enough obs.)

Table 4: Discount factors computed on D1 and D2, using MLE and Agresti-Coull estimates of $q$.

Table 4 shows the discount factors computed using the estimated dwell time thresholds of accidental clicks on D1 and D2, and two different estimates of the ratio of non-accidental click rates, one obtained with MLE and the other using Agresti-Coull in combination with Equation 14. Each row shows how much an advertiser should be charged for one accidental click on an ad shown on the app indicated by that row, depending on the estimator used, provided that a valid (non-accidental) click on the same ad would cost 1. For example, looking at D1, if an ad click on App 2 originally costs $c$ to the advertiser, any accidental ad click on the same app will instead cost $0.79c$ after discounting using the Agresti-Coull estimate of the ratio from Equation 14.

We observe that the discount factors are comparable across the two datasets when computed using the same strategy. Moreover, larger discounting happens when the factors are generated from a single-point MLE of the non-accidental click rate; the discounting is smaller when computed using the Agresti-Coull confidence interval. Depending on how aggressive the discounting has to be, one or the other approach may be chosen.

At the beginning of this section, we discussed how the potential revenue loss resulting from fully discarding accidental clicks when billing advertisers can be too high. Using Equation 12, with the discount factor defined in Equation 14 and App 1 as the pivoting app, the revenue drop is instead reduced by about 73.1%, allowing advertisers to save money on likely less valuable clicks while controlling the revenue impact for the ad network and publishers.

6 Use case II: Filtering out accidental clicks when training an ad click model

Many ad networks improve the logic behind their ad serving algorithm through machine-learned models. At each ad request, these models provide a ranked list of ads to serve so as to maximise the overall expected revenue, or eCPM (CPM stands for cost per mille, and indicates the earnings gained for every thousand ad impressions sold; the eCPM is an estimate of the truly observed CPM). The eCPM is computed for each ad $a$ as $\mathrm{eCPM}_a = 1000 \cdot \mathrm{bid}_a \cdot \mathrm{CTR}_a$, where $\mathrm{bid}_a$ is the price the advertiser is willing to pay when their ad $a$ is clicked, and $\mathrm{CTR}_a$ is the estimate of the click-through rate of $a$. Estimating eCPM therefore means estimating CTR for all ads in the ad network inventory. Machine-learned models achieve exactly this task, i.e., they are trained on historical datasets of ad clicks to predict CTR from feature-vector representations of serviceable ads.
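For illustration, ranking by eCPM reduces to sorting ads by the product of bid and predicted CTR; the ads and numbers below are hypothetical:

```python
# Hypothetical serviceable ads: (ad_id, bid in $ per click, predicted CTR).
ads = [("ad_1", 0.50, 0.012), ("ad_2", 0.30, 0.025), ("ad_3", 1.20, 0.004)]

# eCPM per ad: expected revenue per thousand impressions = 1000 * bid * CTR.
ranked = sorted(ads, key=lambda a: 1000 * a[1] * a[2], reverse=True)
for ad_id, bid, ctr in ranked:
    print(ad_id, round(1000 * bid * ctr, 2))  # ad_2 ranks first at $7.50
```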

Training models on datasets containing a “large enough” ratio of accidental clicks may overestimate an ad CTR. This is because the estimated CTR becomes “inflated” with accidental clicks, eventually leading to the selection of irrelevant ads to serve. Filtering out accidental clicks when training machine-learned models may provide a more accurate selection of ads, resulting in higher revenue for the ad network and the publisher.

Using our data-driven approach to identify accidental clicks, we compute a threshold for a large number of Yahoo mobile apps (Yahoo News, Yahoo Mail, etc.). We compute per-app thresholds of accidental clicks, as we have a sufficiently large number of dwell time observations for each app.

For each app, we then filter out clicks whose dwell time is below the corresponding threshold for that app; the filtered clicks are not used to train the ad click model. We refer to this model as accidental_click. The model trained with no filtering of accidental clicks is our baseline, denoted as baseline. We set up an online A/B testing experiment, where a fraction of Yahoo Gemini's incoming ad traffic on the Yahoo apps considered is split between a control bucket and a variation bucket. The traffic served by the control bucket is handled by the baseline ad click model, whilst the variation bucket dispatches traffic to the accidental_click model. We then compare the performance of the two models by measuring both CTR and CPM (click-through rate and revenue).

A statistically significant CTR lift of +3.9% is obtained using the accidental_click model compared to the baseline model (one-tailed two-proportion z-test). This means that an ad click model trained on a cleaner dataset (i.e., without accidental clicks) leads to a better estimation of ad CTR. The trained click model is better at predicting ads that are more likely to be clicked, i.e., more relevant to users, as it relies on ads that were clicked by users with an intent. Similarly, we observe a statistically significant lift in CPM of +0.2%, which is partly due to the lift in CTR.
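A sketch of the test behind these comparisons (the bucket sizes and click counts below are invented, as the real traffic volumes are not disclosed):

```python
import math
from statistics import NormalDist

def two_proportion_z(clicks_a, imps_a, clicks_b, imps_b):
    """One-tailed two-proportion z-test: is CTR_a significantly
    greater than CTR_b?"""
    p_a, p_b = clicks_a / imps_a, clicks_b / imps_b
    p = (clicks_a + clicks_b) / (imps_a + imps_b)      # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / imps_a + 1 / imps_b))
    z = (p_a - p_b) / se
    return z, 1 - NormalDist().cdf(z)                  # one-tailed p-value

z, p = two_proportion_z(52_000, 4_000_000, 50_000, 4_000_000)
```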

In this section, we showed that our data-driven approach for identifying accidental clicks is effective as a preprocessing step to training machine-learned ad click models, which ad networks leverage to estimate CTR to rank ads at serving time.

7 Related work

Various works have investigated the role of dwell time on web pages. Liu et al. LiuSIGIR2010, who modelled dwell time on web pages using a Weibull distribution Papoulis2002, found that web browsing exhibits a significant “negative aging” phenomenon, suggesting that some initial screening has to be passed before a page is examined in detail by a user. They also demonstrated that dwell time distributions can be predicted purely using low-level web page features. We extend this work – focussing on (mobile) ad landing pages – by proposing a model of dwell time based on a mixture of distributions instead of a Weibull. This allows us to capture three categories of ad clicks – accidental, short, and long – where the focus of this paper is on accidental clicks.

Kim et al. KimWSDM2014 presented a method to explain dwell time on search engine result pages. They estimated dwell time distributions for SAT (satisfied) and DSAT (dissatisfied) clicks for different click segments, and used them to derive features to train a click-level satisfaction model. Yi et al. YiRecSys2014 used item-level dwell time as a proxy to quantify how likely a content item is to be relevant to a particular user in a recommender system. Furthermore, Yin et al. YinKDD2013 showed how to enrich the user-vote matrix by converting dwell time on items into users' “pseudo votes”, thereby improving recommendation performance. All these works show that considering dwell time leads to improved decision-making tasks. In our work, the task is the identification of accidental clicks.

In the context of sponsored search (e.g., BeckerCIKM2009; GrbovicSIGIR2016; RaghavanSIGIR2009; SculleyKDD2009; SodomkaWWW2013) and display advertising (e.g., AzimiCIKM2012; BarajasCIKM2012; KaeLDMTA2011; RosalesWSDM2012), studies have mostly focused on predicting ad performance in terms of CTR. An ad with a high CTR is considered to perform well, since it indicates that the ad attracts users, who click on it. CTR, however, does not account for the post-click experience, that is, how users experience the ad landing page. Dwell time on the ad landing page has proven to be a good proxy of the post-click experience BarbieriWWW2016; LalmasKDD2015, reflecting the assumption that the longer a user spends on the ad landing page, the higher the chance he or she “converts” (e.g., by purchasing an item, registering to a mailing list, etc.), or simply the more likely the user is to build an affinity with the brand BeckerCIKM2009; RosalesWSDM2012. These are cases of clicks that bring value to advertisers. Our work also looks at how dwell time relates to conversion rate, confirming the finding in Goldman2014GSP that the former is a significant predictor of the latter, and hence a good proxy for measuring the “value” of a click. We use dwell time as our proxy of ad click value, and propose a data-driven methodology to identify accidental clicks, i.e., ad clicks with very short dwell time that are valueless not only to advertisers, but also to machine-learned models trained to predict ad CTR.

Other studies have investigated click “value” in the context of online advertising, mostly to counter fraudulent activities perpetrated by dishonest advertisers and/or web publishers, like click spam DaswaniHotBots2007; Stone-GrossIMC2011. To the best of our knowledge, this is the first work using dwell time to identify accidental clicks, in combination with a data-driven approach that can be applied to other domains where it is important to quantify the value of a click.

8 Conclusions

In this paper, we propose a data-driven method to identify accidental clicks. An accidental click happens when a user who clicks on an ad, likely by mistake, is redirected to the ad landing page and bounces back without having seen the page. This type of click often happens on ads impressed on mobile devices.

We collect empirical dwell time observations from several Yahoo mobile apps for a large number of ads. We decompose the distribution of dwell time into a mixture of components, with each component corresponding to a click category: accidental, short, and long. Representative statistics for the first component of each ad are then further aggregated to provide an overall estimate of the dwell time threshold of accidental clicks.

We assess the validity of our method by applying it to two use cases. First, we describe a technique that estimates a smooth discount factor, so that accidental clicks are neither fully charged nor totally discarded at billing time. This allows for a trade-off between advertiser satisfaction and potential revenue loss. Experiments conducted on different Yahoo mobile apps confirm that the thresholds found are stable over time, and that revenue loss can be mitigated by around 73.1% using our discounting strategy compared to not charging for accidental clicks at all. Second, we demonstrate that an existing machine-learned ad click model used at serving time leads to better online performance if trained on datasets where accidental clicks are removed using our data-driven approach. We observe a positive and statistically significant lift in CTR and CPM (+3.9% and +0.2%, respectively) with the model trained without accidental clicks.

As future work, we plan to look at the two other components of the mixture models, so as to estimate thresholds – again in a data-driven manner – for short and long clicks. The former are clicks suggesting that the user had a negative post-click experience, whereas the latter indicate a positive post-click experience. Being able to do so per app would remove the often ad hoc setting of dwell time thresholds. In addition, we would also like to investigate whether the same (or a similar) methodology could be used to assess the engagement of users with any section of a web page or mobile app (i.e., not only with the ads shown). Having a rigorous, data-driven methodology to classify content on the basis of the time users spend on it (i.e., dwell time) might be useful to providers, who could in turn make better decisions on which of their assets to invest in.

Acknowledgements.
The authors would like to thank Michal Aharon and Marc Bron for their support in setting up the online A/B test, which allowed them to deploy and assess their approach on a second use case, i.e., the ad click model.

On behalf of all authors, the corresponding author states that there is no conflict of interest.

References

  • (1) H. Akaike. Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pages 199–213. Springer, 1973.
  • (2) J. Azimi, R. Zhang, Y. Zhou, V. Navalpakkam, J. Mao, and X. Fern. Visual appearance of display ads and its effect on click through rate. In CIKM ’12, pages 495–504, New York, NY, USA, 2012. ACM.
  • (3) J. Barajas, R. Akella, M. Holtan, J. Kwon, A. Flores, and V. Andrei. Dynamic effects of ad impressions on commercial actions in display advertising. In CIKM ’12, pages 1747–1751, New York, NY, USA, 2012. ACM.
  • (4) N. Barbieri, F. Silvestri, and M. Lalmas. Improving post-click user engagement on native ads via survival analysis. In WWW ’16, pages 761–770, New York, NY, USA, 2016. ACM.
  • (5) H. Becker, A. Broder, E. Gabrilovich, V. Josifovski, and B. Pang. What happens after an ad click?: Quantifying the impact of landing pages in web advertising. In CIKM ’09, pages 57–66, New York, NY, USA, 2009. ACM.
  • (6) C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
  • (7) L. D. Brown, T. T. Cai, and A. DasGupta. Interval estimation for a binomial proportion. Statistical Science, 16(2):101–117, 2001.
  • (8) N. Daswani and M. Stoppelman. The anatomy of clickbot.a. In HotBots’07, pages 11–11. USENIX Association, 2007.
  • (9) C. Elkan. Mixture Models. http://cseweb.ucsd.edu/~elkan/250Bwinter2011/mixturemodels.pdf, March 2010.
  • (10) M. Goldman and J. M. Rao. Experiments as instruments: Heterogeneous position effects in sponsored search auctions. https://ssrn.com/abstract=2524688, November 2014.
  • (11) M. Grbovic, N. Djuric, V. Radosavljevic, F. Silvestri, R. Baeza-Yates, A. Feng, E. Ordentlich, L. Yang, and G. Owens. Scalable semantic matching of queries to ads in sponsored search advertising. In SIGIR ’16, pages 375–384, New York, NY, USA, 2016. ACM.
  • (12) A. Jacobson. Preventing accidental clicks for a better mobile ads experience. https://adwords.googleblog.com/2016/05/preventing-accidental-clicks-for-better-mobile-ads.html, May 2016.
  • (13) G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated, 2014.
  • (14) A. Kae, K. Kan, V. K. Narayanan, and D. Yankov. Categorization of display ads using image and landing page features. In LDMTA ’11, pages 1:1–1:8, New York, NY, USA, 2011. ACM.
  • (15) Y. Kim, A. Hassan, R. W. White, and I. Zitouni. Modeling dwell time to predict click-level satisfaction. In WSDM ’14, pages 193–202, New York, NY, USA, 2014. ACM.
  • (16) M. Lalmas, J. Lehmann, G. Shaked, F. Silvestri, and G. Tolomei. Promoting positive post-click experience for in-stream yahoo gemini users. In KDD ’15, pages 1929–1938, New York, NY, USA, 2015. ACM.
  • (17) B. G. Lindsay. Mixture Models: Theory, Geometry and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, Penn State University, 1995.
  • (18) C. Liu, R. W. White, and S. Dumais. Understanding web browsing behaviors through weibull analysis of dwell time. In SIGIR ’10, pages 379–386, New York, NY, USA, 2010. ACM.
  • (19) G. McLachlan and D. Peel. Finite Mixture Models. Wiley-Interscience, 1st edition, Oct 2000.
  • (20) A. Papoulis and S. U. Pillai. Probability, Random Variables, and Stochastic Processes. McGraw-Hill Higher Education, New York, NY, USA, fourth edition, 2002.
  • (21) H. Raghavan and D. Hillard. A relevance model based filter for improving ad quality. In SIGIR ’09, pages 762–763, New York, NY, USA, 2009. ACM.
  • (22) R. Rosales, H. Cheng, and E. Manavoglu. Post-click conversion modeling and analysis for non-guaranteed delivery display advertising. In WSDM ’12, pages 293–302, New York, NY, USA, 2012. ACM.
  • (23) P. Schlattmann. Medical Applications of Finite Mixture Models. Statistics for Biology and Health. Springer-Verlag Berlin Heidelberg, 2009.
  • (24) G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6:461–464, 1978.
  • (25) D. Sculley, R. G. Malkin, S. Basu, and R. J. Bayardo. Predicting bounce rates in sponsored search advertisements. In KDD ’09, pages 1325–1334, New York, NY, USA, 2009. ACM.
  • (26) E. Sodomka, S. Lahaie, and D. Hillard. A predictive model for advertiser value-per-click in sponsored search. In WWW ’13, pages 1179–1190, New York, NY, USA, 2013. ACM.
  • (27) C. Stewart, E. Hoggan, L. Haverinen, H. Salamin, and G. Jacucci. An exploration of inadvertent variations in mobile pressure input. In MobileHCI ’12, pages 35–38, New York, NY, USA, 2012. ACM.
  • (28) B. Stone-Gross, R. Stevens, A. Zarras, R. Kemmerer, C. Kruegel, and G. Vigna. Understanding fraudulent activities in online ad exchanges. In IMC ’11, pages 279–294, New York, NY, USA, 2011. ACM.
  • (29) X. Yi, L. Hong, E. Zhong, N. N. Liu, and S. Rajan. Beyond clicks: Dwell time for personalization. In RecSys ’14, pages 113–120, New York, NY, USA, 2014. ACM.
  • (30) P. Yin, P. Luo, W.-C. Lee, and M. Wang. Silence is also evidence: Interpreting dwell time for recommendation from psychological perspective. In KDD ’13, pages 989–997, New York, NY, USA, 2013. ACM.