1 Introduction
The time a user spends on a web page, referred to as dwell time, varies considerably by user and type of content. Still, for a given web page, dwell time can be used as an effective proxy for engagement with its content BarbieriWWW2016; Goldman2014GSP; LalmasKDD2015; YiRecSys2014. If the content is of no interest to the user or is presented poorly, the dwell time on the web page will often be short. By contrast, a user engaged with the content will dwell longer. A third case, which depends neither on the content nor on its presentation, corresponds to extremely short dwell times. This paper focuses on these extremely short dwell times.
In online advertising, when a user clicks on an advertisement, or ad for short, she is redirected to the advertiser's web page, i.e., the ad landing page. The dwell time on the ad landing page is measured as the time between the ad click and the user's return to the publisher site where the ad was impressed (displayed). There are three types of clicks:

accidental click: The user clicks on the ad, likely by mistake, reaches the ad landing page, and immediately bounces back; the user spends no time on the landing page.

short click: The user intended to click on the ad but, once on the landing page, decides to bounce back; the post-click experience is not satisfactory, due to the low quality of the landing page or its low relevance.

long click: The user engages with the ad landing page and spends time on the advertiser site.
We plot in Figure 1 the distribution of the natural logarithm of dwell time values observed on an ad landing page. The distribution is not unimodal: a small component with extremely low dwell time values, around 1.8 seconds, can be identified as representative of accidental clicks, whereas the other two components capture the short and long clicks, respectively, supporting the existence of the three types of clicks described above. In this paper, we propose a data-driven approach for estimating the dwell time thresholds that identify whether a click is accidental or not.
Properly accounting for accidental clicks is business-critical to online advertising. Consider the widely adopted cost per click (CPC) pricing model, where an advertiser is charged by the ad network only when a user clicks on an ad. Users may accidentally click on an ad, get redirected to the ad landing page and then bounce back without looking at it. This behaviour is even more frequent on smartphones, whose limited screen size makes users more prone to clicking on ads by mistake GoogleAdWords2016; StewartHCI2012. It is therefore not unusual for advertisers running CPC campaigns to complain when charged for such clicks, as these are "valueless" to them. Ignoring these complaints can be detrimental in the long term, as it may affect the relationship between the ad network and the advertiser, who might at worst switch to other ad networks to run their campaigns.
A main challenge is for the ad network to accurately identify accidental clicks. Current solutions use hand-coded thresholds of dwell time on the landing page to determine whether an ad click is accidental or not; for example, all visits to ad landing pages shorter than 5 seconds may be considered accidental. With this approach, the threshold is fixed and set arbitrarily, and does not take into account the empirically observed dwell time of ad clicks. Moreover, while 5 seconds may be a reasonable threshold for ad clicks on a tablet, it may not be the right one for a smartphone. As a first and major contribution of this paper, we propose an unsupervised learning method that estimates thresholds of accidental clicks by fitting observed dwell time data to mixture models, which capture the three types of clicks defined above.
Further, we deploy our method for identifying accidental clicks in two applications. The first one is a controlled approach to discounting accidental clicks when charging advertisers. In principle, accidental clicks could simply be discarded once detected, and advertisers not charged for them. However, this strategy would negatively impact revenue for the ad network and, consequently, for the publisher. Therefore, as a second contribution of this paper, we propose a smooth discounting method based on the proportion of detected accidental clicks. As a third contribution, we show how accidental clicks identified by our approach are discarded from the training data of an existing machine-learned click model used to predict ad click-through rate (CTR for short). Our intuition is that by removing valueless clicks we can feed machine-learned models with "cleaner" training data, mitigating any possible overestimation of CTR.
The rest of the paper is organised as follows. In Section 2, we motivate why we use dwell time as a proxy for the value of an ad click, and discuss why it is appropriate to adopt a data-driven solution for discovering accidental clicks. Sections 3 and 4 describe our data-driven approach to detecting accidental clicks and the experiments carried out with real-world data, respectively. Sections 5 and 6 present two applications where our proposed approach to identifying accidental clicks was deployed. In Section 7, we discuss related work and position our contributions. Finally, Section 8 concludes our work.
2 Accidental clicks
We first show how we estimate the value of an ad click using dwell time, and then motivate the use of a data-driven approach to identify accidental clicks.
2.1 Using dwell time as the value of an ad click
Advertisers may wish to be fully charged only for valuable clicks, i.e., those that lead to conversions (there is no consensus around what a conversion is; it is up to the advertiser to specify it), whereas all remaining clicks should be proportionally discounted. Implementing this strategy requires an accurate estimate of the conversion rate, i.e., the probability of conversion conditioned on a click. However, the conversion rate suffers from three major problems. First, it is hard to estimate, since conversion data are often not available for a large number of advertisers. Second, those data are not missing at random, as advertisers sharing their conversion data would be a biased sample. Finally, using conversion data to identify valuable clicks may lead to high false positive rates, since clicks not followed by conversions are not necessarily "valueless". In fact, those clicks represent valuable feedback for advertisers.
It is therefore more appropriate to identify valuable clicks as those that are not valueless, thus treating so-called accidental clicks as the complement of conversions. We propose to use dwell time as our proxy measure of the value of an ad click. Using dwell time can help alleviate the aforementioned problems, provided that dwell time is indeed a good proxy for conversion in our data. This was shown to be the case in Goldman2014GSP, and we further validate it in our context.
Conversion | Mean | Std. Err. | Std. Dev.
yes | 5.729 | 0.064 | 2.582
no | 3.264 | 0.006 | 2.569
Table 1 shows statistics on the natural logarithm of dwell time from two samples, one containing observations of dwell time leading to conversions (yes) and the other made up of those that do not (no), for 40 ads managed by a large ad network, Yahoo Gemini (https://gemini.yahoo.com/). We test the null hypothesis that the mean of the log dwell time computed from the yes sample (corresponding to $e^{5.729} \approx 308$ seconds) is the same as that calculated from the no sample (corresponding to $e^{3.264} \approx 26$ seconds). We run a two-tailed two-sample t-test and are able to reject the null hypothesis that the difference between the two means is 0. Therefore, the average dwell times of the two samples are statistically significantly different.

In addition, we test two regression models to verify whether dwell time is a good predictor of conversion. Let $y_{i,j} \in \{0, 1\}$ be the binary random variable representing the conversion event occurring after the $j$-th click on the $i$-th ad. We also denote by $x_{i,j}$ the natural logarithm of dwell time on the $i$-th ad landing page after the $j$-th click. The simplest model we test is a linear regression of $y_{i,j}$ on $x_{i,j}$, that is:

$$y_{i,j} = \beta_0 + \beta_1 x_{i,j} + \varepsilon_{i,j}, \tag{1}$$

where $\varepsilon_{i,j}$ is a zero-mean Gaussian error term. A variant of this model applies the logistic (inverse logit) function to the linear predictor of Equation 1, i.e., it models the log-odds of conversion as linear in $x_{i,j}$:

$$\Pr(y_{i,j} = 1 \mid x_{i,j}) = \mathrm{logit}^{-1}(\beta_0 + \beta_1 x_{i,j}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{i,j})}}. \tag{2}$$
Coefficient | linear (Equation 1) | logit (Equation 2)
$\beta_0$ (intercept) | 0.00384 | 6.424
$\beta_1$ (dwell time) | 0.00413 | 0.301*
AIC | 14,399 | 14,297
* statistically significant
Table 2 shows the regression coefficients obtained with the two models above. The best-performing model is the logit, selected as the one with the smallest Akaike Information Criterion (AIC) Aikake1973AIC. We observe that the coefficient associated with the natural logarithm of dwell time ($\beta_1$) is both positive and significant; hence, dwell time is a good predictor of conversion. Note that a unit increase in the natural logarithm of dwell time corresponds to multiplying dwell time by $e \approx 2.72$ (since $x = \ln t$, where $t$ is dwell time in seconds), and multiplies the odds of conversion by $e^{0.301} \approx 1.35$, i.e., increases them by about 35%. This provides further justification for using dwell time as a proxy for conversion, and therefore for ad click value.
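The numbers quoted in this subsection can be cross-checked from the summary statistics in Tables 1 and 2 alone. The Python sketch below recomputes the geometric-mean dwell times implied by the log-scale means, a Welch-style t statistic built from the reported standard errors, and the odds ratio implied by the logit coefficient. The Welch-style statistic is our own approximation; the exact test run on the raw samples may differ in its variance assumptions and degrees of freedom.

```python
import math

# ln(dwell time) summary statistics from Table 1.
mean_yes, se_yes = 5.729, 0.064
mean_no, se_no = 3.264, 0.006

# Geometric-mean dwell times implied by the log-scale means, in seconds.
geo_yes = math.exp(mean_yes)
geo_no = math.exp(mean_no)

# Welch-style t statistic from the standard errors: (m1 - m2) / sqrt(se1^2 + se2^2).
t_stat = (mean_yes - mean_no) / math.sqrt(se_yes**2 + se_no**2)

# Logit coefficient of ln(dwell time) from Table 2.
beta_1 = 0.301
odds_ratio = math.exp(beta_1)  # odds multiplier for a unit increase in ln(t), i.e. t -> e * t

print(f"geometric means: {geo_yes:.0f} s vs {geo_no:.0f} s")
print(f"Welch-style t statistic: {t_stat:.1f}")
print(f"odds ratio per unit of ln(t): {odds_ratio:.3f}")
```

It prints geometric means of 308 s and 26 s, a t statistic of about 38.3 (far in the tail of any t distribution with this many observations), and an odds ratio of 1.351, i.e., roughly a 35% increase in the odds of conversion.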
2.2 A datadriven approach to identifying accidental clicks
Using dwell time as a proxy for the "value" of an ad click, this paper puts forward a data-driven approach, based on actual dwell time observations, to identify whether an ad click is accidental or not. By contrast, using fixed thresholds on dwell time to identify accidental clicks (say, 1 second) is common practice. The reason is that this is straightforward, as neither data processing nor computation is required, and it can be easily integrated with any existing production system. This simple approach, however, cannot capture subtleties arising from hidden latent factors, such as the device (desktop vs. mobile), the application (e.g., mail vs. news stream) or the network type (e.g., 3G/4G vs. WiFi). A data-driven approach can account for all these factors, thus providing more reliable estimates of dwell time thresholds of accidental clicks from observed data. Moreover, the analysis of observed dwell times may characterise other phenomena, i.e., not only the lowest values (accidental clicks) but also large and extremely large ones (short and long clicks, respectively). We leave this for future work, as we focus on accidental clicks in this paper.
3 Discovering accidental clicks
Our data-driven approach for detecting accidental clicks on ads consists of two steps: data modelling and dwell time thresholding. The former fits observations of dwell time on a large set of ad landing pages to a probabilistic model, whereas the latter estimates the dwell time threshold used to identify accidental clicks. We employ an unsupervised learning approach, because a supervised classifier of accidental clicks would require a ground truth of labelled clicks, which is not achievable.
3.1 Data modelling
We assume that observations of dwell time of ad clicks are generated by an underlying probabilistic model. Ideally, such a model should simultaneously represent the three types of clicks shown in Figure 1: accidental, short, and long.
Generally speaking, a mixture of distributions is a probabilistic model that captures the presence of "subpopulations" within an overall population Lindsay1995; McLachlan2000. As such, it is a good candidate to describe our observations of dwell time. More formally, a continuous random variable $X$ (e.g., dwell time) is distributed according to a mixture of $K$ (discrete) component distributions if its probability density function (pdf) $f_X$ is a convex combination of $K$ pdfs $f_1, \dots, f_K$:

$$f_X(x; \Theta) = \sum_{k=1}^{K} \pi_k \, f_k(x; \theta_k), \qquad \pi_k \geq 0, \quad \sum_{k=1}^{K} \pi_k = 1, \tag{3}$$
where:

each $f_k$ belongs to the same (parametric) family of distributions (e.g., Normal, LogNormal, Gamma, Weibull, etc.; we use Normal and Gaussian distribution interchangeably);

$\theta_k$ is the vector of parameters associated with the $k$-th component; e.g., if $f_k$ is the pdf of a Normal distribution $\mathcal{N}(\mu_k, \sigma_k^2)$, then $\theta_k = (\mu_k, \sigma_k^2)$;

$\Theta = (\pi_1, \dots, \pi_K, \theta_1, \dots, \theta_K)$ is the overall vector of parameters of the mixture model;

there exists a latent random variable, denoted by $Z$, governing which component each observation of $X$ is drawn from. This random variable follows a categorical distribution whose parameter is the vector of mixture weights $(\pi_1, \dots, \pi_K)$, so that each observation is generated in two steps:

(i) pick the component distribution $k$ with probability $\pi_k$;

(ii) generate a value for $X$ from the component distribution $f_k(x; \theta_k)$.
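The two-step generative process can be sketched in Python with NumPy. This is a minimal illustration assuming LogNormal components (the family adopted in Section 4); the parameter values are hypothetical and not estimated from any real data, and the helper names are our own.

```python
import numpy as np


def lognormal_pdf(x, mu, sigma):
    """Pdf of LogNormal(mu, sigma^2) at x > 0, i.e. ln X ~ N(mu, sigma^2)."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((np.log(x) - mu) ** 2) / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi))


def mixture_pdf(x, weights, mus, sigmas):
    """Convex combination of the component pdfs (mixture density)."""
    return sum(w * lognormal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))


def sample_mixture(n, weights, mus, sigmas, rng):
    """Draw Z ~ Categorical(weights), then X | Z = k from the k-th LogNormal."""
    z = rng.choice(len(weights), size=n, p=weights)  # step (i): pick components
    x = rng.lognormal(mean=np.asarray(mus)[z], sigma=np.asarray(sigmas)[z])  # step (ii)
    return x, z


# Hypothetical parameters on the log-seconds scale, for illustration only.
weights = [0.1, 0.5, 0.4]
mus = [0.5, 3.0, 5.0]
sigmas = [0.3, 0.8, 1.0]

rng = np.random.default_rng(0)
x, z = sample_mixture(10_000, weights, mus, sigmas, rng)
print(np.bincount(z, minlength=3) / len(z))  # empirical component shares, close to weights
```

Keeping the latent labels `z` makes it easy to check that the simulated share of each component matches its weight, which is exactly what an estimation procedure has to recover without seeing `z`.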

We describe next how to model dwell time on ad landing pages using a mixture of distributions.
3.1.1 Mixture of distributions for dwell time
Representing the three types of clicks can be done by assuming that the observed dwell times on ad landing pages are generated by a mixture of distributions. Let $N$ be the number of ads. For each ad $i \in \{1, \dots, N\}$ we consider a sample $\mathcal{X}_i = \{X_{i,1}, \dots, X_{i,n_i}\}$ of $n_i$ i.i.d. positive random variables, with each $X_{i,j}$ representing an observation of the dwell time associated with the $j$-th click on ad $i$. Each $X_{i,j}$ is drawn from a mixture of up to three components, so that the pdf of $X_{i,j}$ is:

$$f_{X_{i,j}}(x; \Theta_i) = \sum_{k=1}^{K} \pi_{i,k} \, f_{i,k}(x; \theta_{i,k}), \qquad K \in \{1, 2, 3\}. \tag{4}$$

In addition, each $f_{i,k}$ is the pdf of the same parametric distribution, and we only consider probability distributions with positive support (e.g., LogNormal, Gamma, Weibull, etc.), as dwell time cannot be negative.

Next, we discuss how the parameters of the mixture model can be estimated from the observed data.
3.1.2 Parameter estimation
For each ad $i$, we estimate the overall vector of parameters $\Theta_i$ from the observed dwell times $\mathcal{X}_i$ using maximum likelihood estimation (MLE). For each ad, the pdf of each of its observations is a mixture of $K$ distributions, defined as in Equation 4. Since all observations in $\mathcal{X}_i$ are independent and identically distributed, we compute their joint probability density as:

$$f(\mathcal{X}_i; \Theta_i) = \prod_{j=1}^{n_i} f_{X_{i,j}}(x_{i,j}; \Theta_i). \tag{5}$$

From the joint probability density we derive the likelihood function $\mathcal{L}(\Theta_i; \mathcal{X}_i) = f(\mathcal{X}_i; \Theta_i)$. Although the two functions are the same, the likelihood function emphasises that the dataset is fixed and the parameters are variable. The aim of MLE is thus to find a value of $\Theta_i$, i.e., an estimate $\hat{\Theta}_i$, maximising the likelihood function (in practice, we often seek $\hat{\Theta}_i$ so as to maximise the log-likelihood function $\ell(\Theta_i; \mathcal{X}_i) = \ln \mathcal{L}(\Theta_i; \mathcal{X}_i) = \sum_{j=1}^{n_i} \ln f_{X_{i,j}}(x_{i,j}; \Theta_i)$, since this is equivalent, the natural logarithm being monotonically increasing, but simpler, because products turn into summations):

$$\hat{\Theta}_i = \arg\max_{\Theta_i} \mathcal{L}(\Theta_i; \mathcal{X}_i). \tag{6}$$
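For a mixture, the log-likelihood is a sum over observations of the logarithm of a sum over components. A direct implementation can underflow when some component densities are tiny, so a common remedy is the log-sum-exp trick. The sketch below assumes LogNormal components; the function name is our own.

```python
import numpy as np


def mixture_loglik(x, weights, mus, sigmas):
    """Log-likelihood of a LogNormal mixture: sum_j ln sum_k pi_k f_k(x_j; theta_k).

    Uses the log-sum-exp trick so that tiny per-component densities do not underflow.
    """
    log_x = np.log(np.asarray(x, dtype=float))[:, None]  # shape (n, 1)
    mus = np.asarray(mus, dtype=float)                    # shape (K,)
    sigmas = np.asarray(sigmas, dtype=float)
    # ln f_k(x_j) for LogNormal components, shape (n, K)
    log_f = (-((log_x - mus) ** 2) / (2 * sigmas**2)
             - log_x - np.log(sigmas) - 0.5 * np.log(2 * np.pi))
    log_terms = np.log(np.asarray(weights, dtype=float)) + log_f
    m = log_terms.max(axis=1, keepdims=True)              # per-observation maximum
    per_obs = m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1))
    return float(per_obs.sum())
```

Splitting a component into two identical halves with weights 0.5 each leaves this value unchanged, which is a quick sanity check that the mixture weights are handled correctly.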
Different likelihood functions to be maximised are obtained depending on which pdf we plug into Equation 6. If the resulting $\mathcal{L}$ is differentiable in $\Theta_i$, candidate values of $\hat{\Theta}_i$ can be found as solutions of the system of equations:

$$\nabla_{\Theta_i} \mathcal{L}(\Theta_i; \mathcal{X}_i) = \mathbf{0}, \tag{7}$$

where $\nabla_{\Theta_i} \mathcal{L}$ is the gradient of the likelihood function, i.e., the vector of partial derivatives of the likelihood function with respect to each parameter in $\Theta_i$.
Unfortunately, maximum likelihood estimation of the parameters is not straightforward, since closed-form solutions to Equation 7 are often not available; as such, we cannot solve for $\hat{\Theta}_i$ directly SchlattmannMAFMM2009. A typical solution is to use the expectation-maximization (EM) algorithm BishopPRML2006. EM is an iterative, numerical approximation procedure that starts with an initial random guess for the values of the parameters and converges to a local maximum (or to a saddle point) of the observed-data likelihood. Although EM does not guarantee convergence to a global maximum, in practice there are a variety of heuristic approaches for escaping a local maximum: multiple restarts, clever initialization, and modifications to the EM algorithm itself ElkanMM2010. Finally, for each ad $i$ we compute the set of model parameters $\hat{\Theta}_i$ that maximise the likelihood function.

3.1.3 Model selection
To choose the "best" model for each ad, we cannot simply select the one with parameters $\hat{\Theta}_i$ that best fits the observed data. In general, the more complex (flexible) the model, the better its goodness of fit to the observed data; in other words, the higher its likelihood computed with respect to the observed data. At the same time, the more complex (flexible) the model, the less it generalises to unseen data; in other words, the higher the chance that the model overfits the observed data JamesISL2014. Therefore, if we choose the model with the highest likelihood, we always end up selecting the one with the most degrees of freedom, i.e., the maximum number of components $K$.

To avoid overfitting and find a trade-off between goodness of fit and complexity (often referred to as the bias-variance trade-off), we use tools such as the Akaike Information Criterion (AIC) Aikake1973AIC or the Bayesian Information Criterion (BIC) Schwarz1978BIC. The former is computed as $\mathrm{AIC} = 2k - 2\ln\hat{\mathcal{L}}$, whereas the latter as $\mathrm{BIC} = k \ln n - 2\ln\hat{\mathcal{L}}$, where $k$ is the number of free parameters of the model (which grows with the number of components $K$), $\hat{\mathcal{L}}$ is the likelihood function evaluated at the parameters estimated from the observed data, and $n$ is the dataset size. Both criteria penalise models that are unnecessarily complex, and we select the model with the smallest AIC or BIC.

So far, we have estimated and selected the mixture model that best describes the observed dwell times on each ad landing page. Next, we present how we can use this model to compute dwell time thresholds to identify accidental clicks.
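The estimation and selection steps of Sections 3.1.2 and 3.1.3 can be sketched end to end for a 1-D Gaussian mixture fitted to log dwell times (equivalent to a LogNormal mixture on raw dwell times). This is a minimal sketch under our own assumptions, not the production implementation: it uses a single deterministic quantile-based initialisation instead of the multiple restarts mentioned above, and counts $3K-1$ free parameters per model ($K$ means, $K$ variances, $K-1$ free weights) in the AIC/BIC penalty. A library such as scikit-learn's `GaussianMixture`, which exposes `aic` and `bic` methods, could be used instead.

```python
import numpy as np


def fit_gmm_em(y, K, n_iter=1000, tol=1e-8):
    """EM for a K-component 1-D Gaussian mixture on y (e.g. log dwell times).

    Returns weights, means, variances (sorted by mean) and the log-likelihood
    computed at the last E-step.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Deterministic initialisation (our choice): means at evenly spaced quantiles.
    mu = np.quantile(y, (np.arange(K) + 0.5) / K)
    var = np.full(K, y.var())
    pi = np.full(K, 1.0 / K)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: log of pi_k * N(y_j; mu_k, var_k), then responsibilities r[j, k].
        log_p = np.log(pi) - 0.5 * (np.log(2 * np.pi * var) + (y[:, None] - mu) ** 2 / var)
        m = log_p.max(axis=1, keepdims=True)
        log_norm = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
        r = np.exp(log_p - log_norm)
        loglik = float(log_norm.sum())
        # M-step: closed-form updates of weights, means and variances.
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r * y[:, None]).sum(axis=0) / nk
        var = np.maximum((r * (y[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-6)  # avoid collapse
        if loglik - prev < tol:
            break
        prev = loglik
    order = np.argsort(mu)
    return pi[order], mu[order], var[order], loglik


def select_model(y, max_K=3, criterion="bic"):
    """Fit K = 1..max_K and return (K, weights, means, variances) with the smallest AIC/BIC."""
    n = len(y)
    best = None
    for K in range(1, max_K + 1):
        pi, mu, var, ll = fit_gmm_em(y, K)
        k = 3 * K - 1  # free parameters: K means, K variances, K - 1 weights
        score = 2 * k - 2 * ll if criterion == "aic" else k * np.log(n) - 2 * ll
        if best is None or score < best[0]:
            best = (score, K, pi, mu, var)
    return best[1:]
```

Because the components are returned sorted by mean, the first one is the candidate "accidental" component. On well-separated synthetic data drawn from three Gaussians, both criteria should pick $K = 3$.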
3.2 Dwell time threshold of accidental clicks
For each ad, we fit the observed dwell times on its landing page to a mixture of distributions using MLE and one of the model selection criteria described above. We then focus on the subset of ads exhibiting exactly three components, namely ads whose dwell times fit a mixture of three distributions. Intuitively, these are the ads showing all three categories of clicks whose existence we have conjectured, namely accidental, short and long.
Given an ad and the set of parameters of all its components, we compute statistics such as the expected value or the median of every component. As we are interested in detecting accidental clicks, we only focus on the first component of each ad, i.e., the one located at the lowest dwell times. Using the second and third components to study short and long clicks, respectively, is left for future work.
For example, if we fit the data to a mixture of three LogNormal distributions, we can represent the first component by a random variable $Y_1 \sim \mathrm{LogNormal}(\mu_1, \sigma_1^2)$. In general, for any random variable $Y \sim \mathrm{LogNormal}(\mu, \sigma^2)$, or equivalently $\ln Y \sim \mathcal{N}(\mu, \sigma^2)$, we can derive the following:

$\mathbb{E}[Y] = e^{\mu + \sigma^2/2}$ (where $\mathbb{E}$ denotes the expected value);

$\mathrm{Med}[Y] = e^{\mu}$.

We therefore estimate a per-ad threshold of dwell time for detecting accidental clicks using either the expected value or the median of the first component (the latter being more robust to the presence of outliers) by letting $\mu = \mu_1$ and $\sigma^2 = \sigma_1^2$ in the equations above. Finally, to obtain an overall estimate of the dwell time threshold of accidental clicks across all the ads, we compute the mean or the median of the individual per-ad estimates.

In this section, we described a two-stage approach for computing thresholds of dwell time from observed data to detect accidental clicks on ads. Next, we present experimental results when these thresholds are deployed within a large ad network.
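The per-ad threshold is a direct application of the two LogNormal formulas. A minimal sketch, assuming the first component's log-scale parameters $(\mu_1, \sigma_1^2)$ are already available from the fit; the function names and the example values are hypothetical.

```python
import math


def lognormal_mean(mu, sigma2):
    """E[Y] for Y ~ LogNormal(mu, sigma2)."""
    return math.exp(mu + sigma2 / 2)


def lognormal_median(mu):
    """Med[Y] for Y ~ LogNormal(mu, sigma2); it does not depend on sigma2."""
    return math.exp(mu)


def per_ad_threshold(mu1, sigma2_1, statistic="median"):
    """Per-ad threshold (seconds) from the first component (mu1, sigma2_1) of a 3-component fit."""
    if statistic == "median":
        return lognormal_median(mu1)
    if statistic == "mean":
        return lognormal_mean(mu1, sigma2_1)
    raise ValueError("statistic must be 'median' or 'mean'")


# Hypothetical first component on the log scale: mu1 = ln 2, sigma1^2 = 0.09.
print(per_ad_threshold(math.log(2), 0.09))          # median: 2 s
print(per_ad_threshold(math.log(2), 0.09, "mean"))  # mean: 2 * e^0.045, about 2.09 s
```

The expected value always exceeds the median by the factor $e^{\sigma_1^2/2}$, so the gap between the two per-ad thresholds widens as the first component becomes more dispersed.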
4 Experiments
We conduct two experiments on multiple datasets of ads served by a large ad network, codenamed Gemini, on several Yahoo mobile apps. We focus on mobile apps as these are where accidental clicks are more likely to happen GoogleAdWords2016 ; StewartHCI2012 . We only consider ads with at least 100 clicks to increase the confidence of the estimates of our thresholds.
In the first experiment, we choose one pivoting mobile app to estimate the dwell time threshold of accidental clicks, which is then used to identify accidental clicks on other mobile apps (two in our case). To protect sensitive information, we refer to the former as App 1 and to the other two as App 2 and App 3, respectively. The experiment is performed on two one-month datasets, each consisting of a random sample of around 10,000 ads and 6.5M clicks, unevenly distributed across the three mobile apps. In the second experiment, instead of using one pivoting mobile app to estimate a single dwell time threshold of accidental clicks, we generate a threshold for each mobile app. The dataset used is a random sample from three weeks' worth of data containing 120,000 ads and 70M clicks (hereafter, the three-week dataset).
With the first approach (using a pivoting app), the aim is to provide a single estimate of the dwell time threshold of accidental clicks, using the app with the highest volume of traffic. The second approach provides one threshold per app, which is more flexible and accounts for the effect of different user experiences, populations, and operating systems (i.e., Android vs. iOS).
4.1 Data preprocessing
In both experiments, we remove outliers by discarding clicks with dwell time greater than 600 seconds, a threshold already used in previous work LalmasKDD2015.
We also apply a logarithmic transformation to all the observations. This allows us to fit the log-transformed data to a mixture of Gaussian distributions, which is equivalent to fitting the original data to a mixture of LogNormals.
The logarithmic transformation emphasises differences between small values of dwell time, whereas it smooths the same differences between larger values. In this way, we capture relative differences instead of absolute ones. To identify accidental clicks, a difference of 1 second between two small values of dwell time, such as 2 and 3 seconds, is more important than the same difference between, say, 100 and 101 seconds.
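The preprocessing step can be sketched as follows. The 600-second cut-off is from the text; dropping non-positive values is our own assumption, since $\ln 0$ is undefined. Fitting a Gaussian mixture to $\ln t$ yields the same maximum likelihood estimates as fitting a LogNormal mixture to $t$, because the two log-likelihoods differ only by $-\sum_j \ln t_j$, a term that does not depend on the parameters.

```python
import numpy as np

MAX_DWELL_SECONDS = 600.0  # outlier cut-off stated in the text


def preprocess(dwell_seconds):
    """Discard dwell times above 600 s and return natural logs of the rest.

    Non-positive values are also dropped (our assumption), since ln(0) is undefined.
    """
    d = np.asarray(dwell_seconds, dtype=float)
    d = d[(d > 0) & (d <= MAX_DWELL_SECONDS)]
    return np.log(d)


y = preprocess([2.0, 3.0, 100.0, 101.0, 750.0, 0.0])
print(len(y))       # 4: the 750 s click and the zero are removed
print(y[1] - y[0])  # ln 3 - ln 2, about 0.405
print(y[3] - y[2])  # ln 101 - ln 100, about 0.00995
```

The last two lines illustrate the point made above: the same 1-second gap weighs about forty times more between 2 and 3 seconds than between 100 and 101 seconds on the log scale.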
Our approach is not tailored to any specific family of parametric distributions, and we select a mixture of LogNormal distributions purely because this gives us the best (smallest) AIC on average over all the ads.
4.2 Single threshold from a pivoting app
In this first setting, we consider (log-transformed) observations of dwell time from the ads clicked on our pivoting mobile app, App 1, collected from both one-month datasets. Then, we fit each set of ad dwell time observations to a mixture of Gaussian distributions. Figure 2 presents three examples of ads, fitted to a mixture of one, two, and three Gaussian distributions, respectively. Except for the ad shown in Figure 2(a), the other two exhibit a first component centred around a very small value of dwell time, which likely represents accidental clicks.
Table 3 shows how ads in the two datasets fit to models having one, two, and three components, respectively. For both, most ads (82.5% and 65.4%, respectively) fit to exactly three components. These ads are the ones we focus on to detect the dwell time threshold of accidental clicks, since their corresponding landing pages exhibit all three categories of clicks described in Section 1. Since our aim is to "isolate" accidental clicks happening on our pivoting app, we concentrate on the first component. According to our conjecture, this component captures users clicking on an ad by mistake, or simply returning to the publisher site without actually landing on the advertiser's page.
Dataset | 1 comp. | 2 comps. | 3 comps.
first one-month dataset | 1.0% | 16.5% | 82.5%
second one-month dataset | 2.9% | 31.7% | 65.4%
For each ad, we compute an estimate of the dwell time threshold using the median of its first fitted component since, unlike the expected value $e^{\mu + \sigma^2/2}$, the median $e^{\mu}$ does not depend on the component variance. In fact, we observe that the variance increases going from the first to the second and finally to the third component. Intuitively, this reflects the variability of dwell time in each click category: dwell times of accidental clicks are expected to differ less from one another than those of short and long clicks.
To obtain an overall estimate of the threshold (i.e., an estimate derived from all the per-ad estimates), we propose two strategies: (i) the mean of all the per-ad medians; (ii) the median of all the per-ad medians. The first one generally results in a higher threshold, which implies labelling a larger number of clicks as accidental. The second estimate is more "conservative" and usually yields a smaller threshold.
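The two aggregation strategies can be sketched in a few lines. The per-ad medians below are hypothetical values chosen to have a long right tail, which is what makes the mean of the medians exceed their median; the function name is our own.

```python
import numpy as np


def overall_threshold(per_ad_medians, strategy="median"):
    """Aggregate per-ad first-component medians: (i) their mean or (ii) their median."""
    v = np.asarray(per_ad_medians, dtype=float)
    if strategy == "mean":
        return float(np.mean(v))
    if strategy == "median":
        return float(np.median(v))
    raise ValueError("strategy must be 'mean' or 'median'")


# Hypothetical per-ad medians (seconds) with a long right tail.
medians = [1.8, 2.0, 2.1, 2.2, 9.0]
print(overall_threshold(medians, "mean"))    # about 3.42, pulled up by the 9.0 s ad
print(overall_threshold(medians, "median"))  # 2.1
```

A single ad with an unusually slow first component is enough to move the mean-based threshold well above the bulk of the per-ad values, while the median-based threshold is unaffected.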
In Figure 3, we plot the distribution of per-ad medians of the first component computed from the pivoting App 1 on both one-month datasets. The median of all those medians (the blue dashed line) seems more suitable than the mean (the red dashed line), as it aligns with the "peak" we are interested in, located at very small dwell times. For business confidentiality, we do not disclose the percentage of accidental clicks for each of the considered apps, but we can report that this percentage is stable over the two datasets once the thresholding strategy is fixed. Anecdotally, this percentage was observed to change on a dataset from a different time period, as the result of a change of the user interface of a specific app.
4.3 Multiple perapp thresholds
In the previous setting, we reported results obtained when using a pivoting mobile app to compute a single threshold of dwell time, which in turn can be used to identify accidental clicks on other apps. In this section, we discuss another approach, which was rolled out in production. Instead of computing a single threshold from one pivoting app, we generate a dwell time threshold of accidental clicks for each app individually. By doing so, we also account for the effect of different user experiences or user populations on the different mobile apps and operating systems (i.e., Android vs. iOS). We use a default threshold value for apps without enough observations of dwell time.
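A minimal sketch of the per-app variant, assuming that each app's threshold is the median of its per-ad first-component medians (as in the pivoting setting) and that an app has "enough observations" when it has at least `min_ads` eligible ads. The eligibility rule, the names `per_app_thresholds`, `min_ads` and `default_threshold`, and all example values are hypothetical, since the text does not specify them.

```python
import numpy as np


def per_app_thresholds(first_medians_by_app, min_ads, default_threshold):
    """Per-app threshold = median of that app's per-ad first-component medians.

    Apps with fewer than `min_ads` eligible ads fall back to `default_threshold`.
    Both the eligibility rule and the parameter values are illustrative assumptions.
    """
    thresholds = {}
    for app, medians in first_medians_by_app.items():
        if len(medians) >= min_ads:
            thresholds[app] = float(np.median(medians))
        else:
            thresholds[app] = default_threshold
    return thresholds


# Hypothetical inputs: app_c has too few ads and receives the default.
example = {"app_a": [1.9, 2.1, 2.3], "app_b": [1.5, 1.7, 1.6], "app_c": [4.0]}
print(per_app_thresholds(example, min_ads=3, default_threshold=2.0))
```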
In Figure 4, we plot the empirical cumulative distribution function (eCDF) of the thresholds computed on each week of the three-week dataset. We observe that the distribution of thresholds remains stable over time. The median of all the per-app thresholds identified with this approach is 2.1 seconds, which aligns with the threshold generated using the pivoting strategy.

In Figure 5, we separate the eCDFs of the thresholds obtained from apps on Android and iOS. We see that there are differences, as the median values now differ between the two operating systems. In general, thresholds of accidental ad clicks on Android apps are more right-skewed than those computed for iOS apps, thereby suggesting that thresholds on Android are somewhat less dependent on the app on which they are computed.
We presented two strategies to calculate dwell time thresholds of accidental clicks. Both strategies rely on the same data-driven approach, i.e., estimating the parameters of a mixture of three dwell time distributions and computing aggregated statistics (e.g., the median) on the first component. The two strategies differ in the (historical) dwell time data used to fit the mixtures. One uses observations of dwell time from a single app only, and hence provides the threshold for apps with few historical dwell time observations, which may be less popular than the pivoting app or may have just entered the market. The other provides a per-app threshold, which can be used when there are multiple apps with a sufficiently large number of dwell time observations. In addition, this strategy accounts for the impact of different user experiences on those apps.
In the next two sections, we present two use cases where our proposed datadriven approach for identifying accidental clicks was deployed.
5 Use case I: Discounting accidental clicks when billing advertisers
With a mechanism for detecting accidental clicks, an ad network may simply discard all accidental clicks when billing the advertiser. This can, however, severely impact revenue for both the ad network and the publisher, at least in the short term. For example, in one of our datasets, we saw that the top-3 most revenue-losing apps account for …% of the overall revenue loss for all the apps under consideration (the actual revenue loss is not shown due to business confidentiality). It is therefore important to control how much revenue loss is acceptable, seeking a trade-off between accounting for accidental clicks (satisfying the advertisers) and containing revenue loss (satisfying the ad network and publishers). We present a smooth method to discount the price of accidental clicks, instead of discarding them, so that advertisers are not fully charged for those clicks.
5.1 Smooth discounting strategy
One of the main attractions of ad networks is scale; advertisers have access to a large number of impressions and reach a wide audience with a single buy. However, not all the apps in the network perform equally. The advertiser is then faced with the problem of either selecting which apps to bid on or adjusting the bids by app. Both cases create extra friction for the advertiser. The algorithmic discounting we present below addresses this problem by adjusting the cost per click on each app of the network, such that the return on investment (ROI) for the advertiser is the same across all the apps.
We assume the existence of a pivoting app $p$ from which we estimate the dwell time threshold of accidental clicks $\tau$, as discussed in Section 4.2. Let $C_{a,p}$ be the set of clicks observed on ad $a$ impressed on the pivoting app $p$ in a fixed time window, and let $n_{a,p} = |C_{a,p}|$ be the total number of observed clicks on $a$. Then $\tilde{C}_{a,p} = \{c \in C_{a,p} : t_c \geq \tau\}$, where $t_c$ is the dwell time of click $c$, is the set of non-accidental clicks on $a$ identified using $\tau$ in the same time window, and $m_{a,p} = |\tilde{C}_{a,p}|$ is the total number of non-accidental clicks on $a$. Similarly, for any other app $q$ we define $n_{a,q}$ and $m_{a,q}$.

An advertiser may associate a value with each click on ad $a$, denoted by $v_a$. This corresponds to the amount of money the advertiser would like to earn from a click on $a$, independently of the source (the app, in this case) where the click occurs. Under a CPC cost model, there is a maximum amount of money the advertiser is willing to pay for having ad $a$ impressed and clicked, denoted by $b_a$. We define the advertiser ROI for ad $a$ on a generic app $q$ as:

$$\mathrm{ROI}_{a,q} = \frac{m_{a,q} \, v_a}{n_{a,q} \, b_a}, \tag{8}$$

where the numerator is the total value earned by the advertiser considering only valid (non-accidental) clicks on app $q$, and the denominator is the total cost the advertiser would pay for all the clicks on ad $a$ that occurred on app $q$. If we knew the true non-accidental click rate $\rho_q$ of app $q$, we could rewrite Equation 8 as:

$$\mathrm{ROI}_{a,q} = \frac{\rho_q \, v_a}{b_a}. \tag{9}$$

Indeed, $m_{a,q} / n_{a,q}$ is the maximum likelihood estimate of the true $\rho_q$. We require the app chosen as pivot not only to be the one from which we can accurately estimate $\tau$, but also the best-performing app, with the highest $\rho_p$, i.e., the highest proportion of valid ad clicks overall. This is because we want the pivoting app to be the "benchmark" against which we compare all the other apps of the network.

The advertiser ROI calculated on any app of the network should ideally equal that of the pivoting app:

$$\mathrm{ROI}_{a,q} = \mathrm{ROI}_{a,p}. \tag{10}$$

Moreover, since the value that the advertiser would get from a click on ad $a$ ($v_a$) is independent of the source, if a click on app $q$ is charged an adjusted cost $b_{a,q}$ while a click on the pivot is charged $b_a$, we can rewrite Equation 10 as:

$$\frac{\rho_q}{b_{a,q}} = \frac{\rho_p}{b_a}. \tag{11}$$

We may observe that $\rho_q \leq \rho_p$, since $\rho_p$ is the highest among all the apps by design. For Equation 11 to be satisfied, we define $b_{a,q}$ as the adjusted cost of a click on ad $a$, specific to app $q$:

$$b_{a,q} = b_a \cdot \frac{\rho_q}{\rho_p}. \tag{12}$$

The intuition is that the cost of a click on $a$ on a generic app $q$ ($b_{a,q}$) should be obtained by discounting the price of a valid click ($b_a$) proportionally to its relative value with respect to the best-performing pivoting app of the network ($\rho_q / \rho_p$), which is exactly our discount factor.
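The discounting rule of Equation 12 can be sketched as below, with a check that the adjusted cost equalises the ROI of Equation 9 with that of the pivot. This is an illustration under stated assumptions: a click counts as non-accidental when its dwell time is at least $\tau$, and all dwell times, bids and values are hypothetical.

```python
def non_accidental_rate(dwell_seconds, tau):
    """MLE of rho: share of clicks with dwell time >= tau (assumed non-accidental)."""
    if not dwell_seconds:
        raise ValueError("need at least one click")
    return sum(1 for t in dwell_seconds if t >= tau) / len(dwell_seconds)


def adjusted_cost(b_a, rho_q, rho_p):
    """Equation 12: b_{a,q} = b_a * rho_q / rho_p."""
    return b_a * rho_q / rho_p


def roi(rho, v_a, cost_per_click):
    """ROI as in Equation 9, for a given cost per click: rho * v_a / cost."""
    return rho * v_a / cost_per_click


# Hypothetical numbers: threshold 2.1 s, bid 1.0 and value 3.0 per click.
tau, b_a, v_a = 2.1, 1.0, 3.0
rho_p = non_accidental_rate([0.4, 5.0, 12.0, 30.0, 60.0, 3.2, 8.0, 15.0, 2.5, 40.0], tau)  # 0.9
rho_q = non_accidental_rate([0.5, 1.1, 4.0, 9.0, 20.0, 1.9, 35.0, 6.0, 2.0, 11.0], tau)    # 0.6
b_aq = adjusted_cost(b_a, rho_q, rho_p)  # discount factor rho_q / rho_p = 2/3
print(b_aq, roi(rho_q, v_a, b_aq), roi(rho_p, v_a, b_a))  # the two ROIs coincide
```

With a discount factor of $2/3$, a click on the weaker app costs two thirds of the bid, and the advertiser obtains the same ROI there as on the pivot.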
This strategy does not discount accidental ad clicks on the pivoting app itself, because the pivoting app is chosen as the one with the smallest accidental click rate, and thus likely has little need for a discount factor. Nonetheless, we may also decide to account for accidental clicks on the pivoting app, especially if its click value performance deteriorates. Various strategies may be deployed. For example, we can monitor the number of accidental clicks on the pivoting app and, if it exceeds some established threshold, apply to those accidental clicks a default discounting strategy by choosing one of the discount factors computed for the other apps.
5.2 Estimating nonaccidental click rate
To implement our proposed discounting strategy, we must accurately estimate the (binomial) proportion $p$ of non-accidental clicks on an app. The most straightforward way is the maximum likelihood estimate, namely the single-point estimate $\hat{p}$ given by the overall number of non-accidental ad clicks divided by the total number of ad clicks observed during a specific time window:
$$\hat{p} = \frac{k}{n},$$
where $k$ is the number of non-accidental clicks and $n$ the total number of clicks. This estimate, however, is not robust for apps with a small number of observations. To overcome this, we compute a confidence interval for $\hat{p}$. There exist several ways to compute a confidence interval for an estimate of a binomial proportion (in this setting, each ad click is a Bernoulli trial and each non-accidental click is a success).
The normal approximation interval is the simplest and most common approach; it assumes that the error of a binomially distributed observation follows a Gaussian distribution. It is computed as
$$\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$
where $\hat{p}$ is the proportion of successes in a Bernoulli trial process estimated from the statistical sample, $z_{1-\alpha/2}$ is the $1-\alpha/2$ percentile of a standard Normal distribution, $\alpha$ is the error level, and $n$ is the sample size.

The normal approximation, however, does not always work well. Several competing formulas perform better, especially with small sample sizes and proportions very close to either 0 or 1. The choice depends on how important it is to use a simple, easy-to-explain interval versus the desire for better accuracy. Among these alternatives, the Agresti-Coull interval BrownSS2001 is an approximate binomial confidence interval that is more robust than the normal approximation interval. Given $k$ successes in $n$ Bernoulli trials, it defines the quantities
$$\tilde{n} = n + z_{1-\alpha/2}^2 \quad\text{and}\quad \tilde{p} = \frac{1}{\tilde{n}}\left(k + \frac{z_{1-\alpha/2}^2}{2}\right).$$
A confidence interval for $p$ is then given by
$$\tilde{p} \pm z_{1-\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}},$$
where $z_{1-\alpha/2}$ is the $1-\alpha/2$ percentile of a standard Normal distribution, as before.
5.3 Comparing non-accidental click rates
The proposed discounting strategy requires computing the ratio of the non-accidental click rates of the app of interest $x$ and of the pivoting app $x^*$. If the estimate of $p_x$ is not a single-point estimate such as $\hat{p}_x$ but a confidence interval, one way to compare the two estimates is to take the ratio of their upper confidence bounds:
$$r_x = \frac{\mathrm{ucb}(\hat{p}_x)}{\mathrm{ucb}(\hat{p}_{x^*})} \tag{13}$$
where ucb denotes the upper confidence bound of the Agresti-Coull interval defined at the end of the previous section.
We already stated that the pivoting app is assumed to be the one with the highest non-accidental click rate. However, for our discounting strategy to be robust, it should also account for the case in which an app outperforms the pivoting app in terms of “click value” (this may happen if the same pivoting app has been in use for a long time and a new, better-performing app slightly overtakes it). In such a case, we would like the discount factor to be greater than 1 only when we have a degree of confidence in it. One way to implement this is to require the lower confidence bound (lcb) of the outperforming app to be greater than the upper confidence bound of the pivot. We can therefore modify Equation 13 to:
$$r_x = \begin{cases} \dfrac{\mathrm{lcb}(\hat{p}_x)}{\mathrm{ucb}(\hat{p}_{x^*})} & \text{if } \mathrm{lcb}(\hat{p}_x) > \mathrm{ucb}(\hat{p}_{x^*}),\\[1ex] \min\left(1, \dfrac{\mathrm{ucb}(\hat{p}_x)}{\mathrm{ucb}(\hat{p}_{x^*})}\right) & \text{otherwise.} \end{cases} \tag{14}$$
We make the following observations about Equation 14. First, the ratio of non-accidental click rates will be greater than 1 only if the lower confidence bound of the app is greater than the upper confidence bound of the pivot. Second, for a large sample with a non-zero number of valid clicks, the ratio converges to the ratio of the single-point (MLE) estimates. Third, for a small sample the ratio will be close to 1, indicating that we do not have enough data to suggest that the app of interest differs from the pivoting app. Fourth, a minimum number of clicks is still needed for Equation 14 to produce reliable results.
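One rule that reproduces the first three observations can be sketched as follows: a ratio above 1 is allowed only when the app's lower bound exceeds the pivot's upper bound; otherwise the upper bounds are compared and the result is capped at 1. This is a hypothetical helper, assuming the bounds come from an interval such as Agresti-Coull; it is one formulation consistent with the behaviour described, not necessarily the exact one used in production.

```python
def click_rate_ratio(lcb_app: float, ucb_app: float, ucb_pivot: float) -> float:
    """Ratio of non-accidental click rates between app x and the pivot x*.

    lcb_app, ucb_app: lower/upper confidence bounds for app x.
    ucb_pivot: upper confidence bound for the pivoting app.
    """
    if ucb_pivot <= 0.0:
        raise ValueError("ucb_pivot must be positive")
    if lcb_app > ucb_pivot:
        # Confident that app x outperforms the pivot: allow a ratio above 1.
        return lcb_app / ucb_pivot
    # Otherwise compare upper bounds and never exceed 1.
    return min(1.0, ucb_app / ucb_pivot)


print(click_rate_ratio(0.10, 0.95, 0.85))            # 1.0: small sample, wide interval
print(round(click_rate_ratio(0.59, 0.61, 0.81), 3))  # 0.753: close to the MLE ratio
print(round(click_rate_ratio(0.90, 0.95, 0.85), 3))  # 1.059: confidently better than the pivot
```

With a wide interval (few clicks) the capped branch returns 1, so no discount is applied until enough evidence accumulates; with tight intervals the bounds approach the point estimates and the ratio approaches the MLE ratio.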
When we have enough confidence that the ratio of non-accidental click rates between an app and the current pivot is greater than 1, we can replace the pivot with that app and compute the discount factors using the new app as the benchmark.
Next, we illustrate with an example the revenue impact of this smooth discounting strategy once implemented.
5.4 The impact of smooth discount factors
We compute the discount factors for accidental ad clicks on the two datasets described in Section 4.1. We consider all the ads impressed and clicked on all three apps, App 1, App 2, and App 3. To increase the confidence in our estimates, we discard ads with fewer than 40 clicks on each app. We select as pivoting app the one with the highest non-accidental click rate, estimated either via MLE or with the Agresti-Coull estimator. In both cases, App 1 is chosen.
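The pivot selection just described can be sketched as below. It assumes a hypothetical click log of `(ad_id, app_id, is_accidental)` tuples, where the accidental flag has already been set from the dwell time threshold, and it reads “fewer than 40 clicks on each app” as keeping an ad only if it has at least 40 clicks on every app; the rates are MLE estimates pooled over the retained ads.

```python
from collections import defaultdict

MIN_CLICKS_PER_APP = 40  # ads below this count on any app are discarded


def select_pivot(clicks, min_clicks=MIN_CLICKS_PER_APP):
    """Pick the pivoting app as the one with the highest MLE non-accidental rate.

    clicks: iterable of (ad_id, app_id, is_accidental) tuples (hypothetical format).
    Returns (pivot_app, {app_id: rate}) computed over the retained ads.
    """
    totals = defaultdict(int)  # (ad, app) -> number of clicks
    valid = defaultdict(int)   # (ad, app) -> number of non-accidental clicks
    apps, ads = set(), set()
    for ad, app, is_accidental in clicks:
        totals[(ad, app)] += 1
        if not is_accidental:
            valid[(ad, app)] += 1
        apps.add(app)
        ads.add(ad)

    # Keep only ads with at least `min_clicks` clicks on every app.
    kept = {ad for ad in ads
            if all(totals.get((ad, app), 0) >= min_clicks for app in apps)}

    rates = {}
    for app in apps:
        n = sum(totals[(ad, app)] for ad in kept)
        k = sum(valid[(ad, app)] for ad in kept)
        if n > 0:
            rates[app] = k / n
    if not rates:
        raise ValueError("no ad has enough clicks on every app")
    pivot = max(rates, key=rates.get)
    return pivot, rates
```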
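Table 4: Discount factors for accidental clicks (pivoting app: App 1)
Dataset | App | MLE | Agresti-Coull
First | App 2 | 0.72 | 0.79
First | App 3 | 0.64 | 0.73
Second | App 2 | 0.66 | 0.75
Second | App 3 | NA (not enough obs.) | NA (not enough obs.)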
Table 4 shows the discount factors computed using the dataset-specific dwell time thresholds of accidental clicks and two different ratios of estimates of the non-accidental click rate: one obtained with MLE and the other using Agresti-Coull in combination with Equation 14. Each row shows how much an advertiser should be charged for one accidental click on an ad shown on the app indicated by that row, depending on the estimator used, provided that a valid (non-accidental) click on the same ad costs 1. For example, if a valid click on an ad shown on App 2 costs the advertiser $c$, an accidental click on the same ad and app will instead cost $0.79c$ on the first dataset ($0.75c$ on the second) after discounting with the Agresti-Coull estimate from Equation 14.
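To make the billing arithmetic concrete, the sketch below charges valid clicks at full price and accidental clicks at the discounted price. The Agresti-Coull factors are copied from Table 4; the dictionary keys, the function name, and the click counts are hypothetical.

```python
# Agresti-Coull discount factors from Table 4 (keys are hypothetical labels).
AGRESTI_COULL_FACTORS = {
    ("first", "App 2"): 0.79,
    ("first", "App 3"): 0.73,
    ("second", "App 2"): 0.75,
}


def billed_amount(n_valid: int, n_accidental: int,
                  valid_click_cost: float, discount_factor: float) -> float:
    """Total charge when accidental clicks are billed at a discounted price."""
    return (n_valid * valid_click_cost
            + n_accidental * valid_click_cost * discount_factor)


# Hypothetical campaign on App 2 (first dataset): 80 valid and 20 accidental clicks.
factor = AGRESTI_COULL_FACTORS[("first", "App 2")]
print(round(billed_amount(80, 20, 1.0, factor), 2))  # 95.8
```

Charging every click in full would bill 100 and discarding accidental clicks would bill 80; the smooth discount lands in between, which is the trade-off between advertiser cost and publisher revenue that the strategy targets.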
We observe that the discount factors are comparable across the two datasets when computed using the same strategy. Moreover, the discount is larger (i.e., the discount factor is smaller) when derived from a single-point MLE of the non-accidental click rate, and smaller when computed using the Agresti-Coull confidence interval. Depending on how aggressive the discounting should be, one or the other approach may be chosen.
At the beginning of this section, we discussed how, on one of our datasets, the potential revenue loss resulting from fully discarding accidental clicks when billing advertisers was too high. Using Equation 12 with the discount factor defined in Equation 14 and App 1 as the pivoting app, the revenue drop is now reduced by about 73.1%, allowing advertisers to save money on likely less valuable clicks while controlling the revenue impact for the ad network and publishers.
6 Use case II: Filtering out accidental clicks when training an ad click model
Many ad networks improve the logic behind their ad serving algorithm through machine-learned models. At each ad request, these models provide a ranked list of ads to serve so as to maximise the overall expected revenue, eCPM (CPM stands for cost per mille, i.e., per thousand impressions, and indicates the earnings gained for every thousand ad impressions sold). The eCPM is an estimate of the actually observed CPM, computed for each ad $a$ as $\mathrm{eCPM}_a = 1000 \cdot \mathrm{bid}_a \cdot \widehat{\mathrm{CTR}}_a$. Each $\mathrm{bid}_a$ is the price the advertiser is willing to pay for a click on ad $a$, whereas $\widehat{\mathrm{CTR}}_a$ is the estimated click-through rate of ad $a$. Estimating eCPM therefore means estimating $\widehat{\mathrm{CTR}}_a$ for all ads in the ad network inventory. Machine-learned models achieve exactly this task: they are trained on historical datasets of ad clicks to predict CTR from feature-vector representations of serviceable ads.
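A minimal sketch of eCPM ranking for cost-per-click ads follows. The class, field, and function names are hypothetical, and the bids and predicted CTRs are made up. It also shows why an inflated CTR estimate matters: any ad whose predicted CTR is pushed up by accidental clicks climbs the ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    ad_id: str
    bid: float            # price per click the advertiser is willing to pay
    predicted_ctr: float  # CTR estimated by the ad click model


def ecpm(ad: Candidate) -> float:
    """Expected revenue per thousand impressions for a cost-per-click ad."""
    return 1000.0 * ad.bid * ad.predicted_ctr


def rank_by_ecpm(ads: list[Candidate]) -> list[Candidate]:
    """Order candidate ads by decreasing eCPM."""
    return sorted(ads, key=ecpm, reverse=True)


ads = [Candidate("A", 2.00, 0.010), Candidate("B", 1.00, 0.030), Candidate("C", 0.50, 0.020)]
print([ad.ad_id for ad in rank_by_ecpm(ads)])  # ['B', 'A', 'C']
```

The factor of 1000 does not change the ranking; it only expresses the expected revenue per thousand impressions.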
Training models on datasets containing a “large enough” proportion of accidental clicks may overestimate ad CTR. This is because the estimated CTR becomes “inflated” by accidental clicks, eventually leading to the selection of irrelevant ads. Filtering out accidental clicks when training machine-learned models may therefore lead to a more accurate selection of ads, resulting in higher revenue for the ad network and the publisher.
Using our data-driven approach to identify accidental clicks, we compute a threshold for a large number of Yahoo mobile apps (Yahoo News, Yahoo Mail, etc.). We compute per-app thresholds of accidental clicks, as we have a sufficiently large number of dwell time observations for each app.
For each app, we then filter out clicks whose dwell time is below the corresponding threshold for that app. The filtered clicks are not used to train the ad click model. We refer to this model as accidental_click. The model trained without filtering accidental clicks is our baseline model, denoted baseline. We set up an online A/B testing experiment, in which a fraction of the Yahoo Gemini incoming ad traffic is split between a control bucket and a variation bucket. More specifically, the A/B test affects a share of the overall ad-serving traffic on the Yahoo apps considered. The traffic in the control bucket is handled by the baseline ad click model, whilst the variation bucket dispatches traffic to the accidental_click model. We thus compare the performance of the two models by measuring both CTR and CPM (click-through rate and revenue).
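The filtering step can be sketched as follows, assuming a hypothetical `Click` record with the app name and the measured dwell time in seconds. The threshold values below are illustrative placeholders, not the per-app thresholds estimated in the paper, and clicks from apps without a threshold are kept, which is a design assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    app: str
    ad_id: str
    dwell_time_s: float  # seconds between the ad click and the return to the app


def drop_accidental_clicks(clicks, thresholds):
    """Remove clicks whose dwell time is below their app's accidental-click threshold.

    thresholds: mapping app -> dwell time threshold in seconds. Clicks from
    apps without a threshold are kept unchanged.
    """
    kept = []
    for click in clicks:
        threshold = thresholds.get(click.app)
        if threshold is not None and click.dwell_time_s < threshold:
            continue  # accidental: excluded from the training set
        kept.append(click)
    return kept


thresholds = {"news": 2.0, "mail": 1.5}  # hypothetical per-app thresholds
clicks = [Click("news", "a1", 0.8), Click("news", "a2", 35.0),
          Click("mail", "a3", 1.6), Click("sports", "a4", 0.3)]
print([c.ad_id for c in drop_accidental_clicks(clicks, thresholds)])  # ['a2', 'a3', 'a4']
```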
Using the accidental_click model, we obtain a statistically significant CTR lift of +3.9% compared with the baseline model (one-tailed two-proportion z-test).
This indicates that an ad click model trained on a cleaner dataset (i.e., without accidental clicks) estimates ad CTR more accurately. The resulting click model is better at predicting which ads are more likely to be clicked, i.e., more relevant to users, as it relies on ads that were clicked by users with intent.
Similarly, we observe a statistically significant CPM lift of +0.2%, which is partly due to the lift in CTR.
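The one-tailed two-proportion z-test used for the CTR comparison can be sketched with the standard library, using the usual pooled-variance form. The function name and the click and impression counts below are hypothetical; they are not the counts of the actual experiment.

```python
import math
from statistics import NormalDist


def one_tailed_two_proportion_z_test(successes_a: int, trials_a: int,
                                     successes_b: int, trials_b: int):
    """Test H0: p_b <= p_a against H1: p_b > p_a with a pooled z statistic.

    Returns (relative_lift, z, p_value), where the lift is (p_b - p_a) / p_a.
    """
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / trials_a + 1.0 / trials_b))
    z = (p_b - p_a) / std_err
    p_value = 1.0 - NormalDist().cdf(z)  # upper-tail probability
    return (p_b - p_a) / p_a, z, p_value


# Hypothetical counts: clicks / impressions for control (baseline) and variation (accidental_click).
lift, z, p = one_tailed_two_proportion_z_test(1000, 100_000, 1100, 100_000)
print(f"lift={lift:.1%} z={z:.2f} p={p:.4f}")
```

For these made-up counts the 10% relative lift is significant at the 5% level; with real traffic, the same function would take the click and impression totals of the two buckets.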
In this section, we showed that our data-driven approach for identifying accidental clicks is effective as a preprocessing step for training machine-learned ad click models, which ad networks use to estimate CTR and rank ads at serving time.
7 Related work
Various works have investigated the role of dwell time on web pages. Liu et al. LiuSIGIR2010, who modelled dwell time on web pages using a Weibull distribution Papoulis2002, found that web browsing exhibits a significant “negative aging” phenomenon, suggesting that some initial screening has to be passed before a page is examined in detail by a user. They also demonstrated that dwell time distributions can be predicted purely from low-level web page features. We extend this work, focussing on (mobile) ad landing pages, by proposing a model of dwell time based on a mixture of distributions instead of a single Weibull. This allows us to capture three categories of ad clicks (accidental, short, and long), where the focus of this paper is on accidental clicks.
Kim et al. KimWSDM2014 presented a method to explain dwell time on search engine result pages. They estimate dwell time distributions for SAT (satisfied) or DSAT (dissatisfied) clicks for different click segments and use them to derive features to train a click-level satisfaction model. Yi et al. YiRecSys2014 use item-level dwell time as a proxy to quantify how likely a content item is to be relevant to a particular user in a recommender system. Furthermore, Yin et al. YinKDD2013 show how to enrich the user-vote matrix by converting dwell time on items into users’ “pseudo votes”, which improves recommendation performance. All these works show that considering dwell time improves decision-making tasks. In our work, the task is the identification of accidental clicks.
In the context of sponsored search (e.g., BeckerCIKM2009; GrbovicSIGIR2016; RaghavanSIGIR2009; SculleyKDD2009; SodomkaWWW2013) and display advertising (e.g., AzimiCIKM2012; BarajasCIKM2012; KaeLDMTA2011; RosalesWSDM2012), studies have mostly focused on predicting ad performance in terms of CTR. An ad with a high CTR is considered to perform well, since this indicates that the ad attracts users, who click on it. CTR, however, does not account for the post-click experience, that is, how users experience the ad landing page. Dwell time on the ad landing page has proven to be a good proxy of the post-click experience BarbieriWWW2016; LalmasKDD2015, reflecting the assumption that the longer a user spends on the ad landing page, the higher the chance that he or she “converts” (e.g., by purchasing an item or registering to a mailing list), or simply the more likely the user is to build an affinity with the brand BeckerCIKM2009; RosalesWSDM2012. These are clicks that bring value to advertisers. Our work also looks at how dwell time relates to conversion rate, confirming what was shown in Goldman2014GSP, namely that the former is a significant predictor of the latter, and hence a good proxy for measuring the “value” of a click. We use dwell time as our proxy of ad click value, and propose a data-driven methodology to identify accidental clicks, i.e., ad clicks with very short dwell time that are valueless not only to advertisers but also to machine-learned models trained to predict ad CTR.
Other studies have investigated click “value” in online advertising, mostly to counter fraudulent activities, such as click spam, perpetrated by dishonest advertisers and/or web publishers DaswaniHotBots2007; StoneGrossIMC2011. To the best of our knowledge, this is the first work that uses dwell time to identify accidental clicks, with a data-driven approach that can be applied to other domains where it is important to quantify the value of a click.
8 Conclusions
In this paper, we propose a data-driven method to identify accidental clicks. An accidental click happens when a user clicks on an ad, likely by mistake, is redirected to the ad landing page, and bounces back without having seen the page. This type of click is frequent on ads impressed on mobile devices.
We collect empirical dwell time observations from several Yahoo mobile apps for a large number of ads. We decompose the distribution of dwell time into a mixture of components, with each component corresponding to a click category: accidental, short, and long. Representative statistics for the first component of each ad are then further aggregated to provide an overall estimate of the dwell time threshold of accidental clicks.
We assess the validity of our method on two use cases. First, we describe a technique that estimates a smooth discount factor, so that accidental clicks are neither fully charged nor totally discarded at billing time. This allows for a trade-off between advertiser satisfaction and potential revenue loss. Experiments conducted on different Yahoo mobile apps confirm that the thresholds found are stable over time, and that revenue loss can be mitigated by around 73.1% using our discounting strategy compared with not charging for accidental clicks at all. Second, we demonstrate that an existing machine-learned ad click model used at serving time achieves better online performance when trained on datasets from which accidental clicks have been removed using our data-driven approach. We observe positive and statistically significant lifts in CTR and CPM (+3.9% and +0.2%, respectively) with the model trained without accidental clicks.
As future work, we plan to look at the two other components of the mixture models, so as to estimate thresholds, again in a data-driven manner, for short and long clicks. The former are clicks suggesting that the user had a negative post-click experience, whereas the latter indicate a positive post-click experience. Being able to do so per app would remove the often ad hoc setting of dwell time thresholds. In addition, we would like to investigate whether the same (or a similar) methodology could be used to assess the engagement of users with any section of a web page or a mobile app (i.e., not only with the ads shown). A rigorous, data-driven methodology to classify content on the basis of the time users spend on it (i.e., dwell time) might be useful to providers, who could in turn make better decisions about which of their assets to invest in.
Acknowledgements.
The authors would like to thank Michal Aharon and Marc Bron for their support in setting up the online A/B test, which allowed them to deploy and assess their approach on a second use case, i.e., the ad click model. On behalf of all authors, the corresponding author states that there is no conflict of interest.
References
 (1) H. Akaike. Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pages 199–213. Springer, 1973.
 (2) J. Azimi, R. Zhang, Y. Zhou, V. Navalpakkam, J. Mao, and X. Fern. Visual appearance of display ads and its effect on click through rate. In CIKM ’12, pages 495–504, New York, NY, USA, 2012. ACM.
 (3) J. Barajas, R. Akella, M. Holtan, J. Kwon, A. Flores, and V. Andrei. Dynamic effects of ad impressions on commercial actions in display advertising. In CIKM ’12, pages 1747–1751, New York, NY, USA, 2012. ACM.
 (4) N. Barbieri, F. Silvestri, and M. Lalmas. Improving post-click user engagement on native ads via survival analysis. In WWW ’16, pages 761–770, New York, NY, USA, 2016. ACM.
 (5) H. Becker, A. Broder, E. Gabrilovich, V. Josifovski, and B. Pang. What happens after an ad click?: Quantifying the impact of landing pages in web advertising. In CIKM ’09, pages 57–66, New York, NY, USA, 2009. ACM.
 (6) C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
 (7) L. D. Brown, T. T. Cai, and A. DasGupta. Interval estimation for a binomial proportion. Statistical Science, 16(2):101–117, 2001.
 (8) N. Daswani and M. Stoppelman. The anatomy of clickbot.a. In HotBots’07, pages 11–11. USENIX Association, 2007.
 (9) C. Elkan. Mixture Models. http://cseweb.ucsd.edu/~elkan/250Bwinter2011/mixturemodels.pdf, March 2010.
 (10) M. Goldman and J. M. Rao. Experiments as instruments: Heterogeneous position effects in sponsored search auctions. https://ssrn.com/abstract=2524688, November 2014.
 (11) M. Grbovic, N. Djuric, V. Radosavljevic, F. Silvestri, R. Baeza-Yates, A. Feng, E. Ordentlich, L. Yang, and G. Owens. Scalable semantic matching of queries to ads in sponsored search advertising. In SIGIR ’16, pages 375–384, New York, NY, USA, 2016. ACM.
 (12) A. Jacobson. Preventing accidental clicks for a better mobile ads experience. https://adwords.googleblog.com/2016/05/preventing-accidental-clicks-for-better-mobile-ads.html, May 2016.
 (13) G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated, 2014.
 (14) A. Kae, K. Kan, V. K. Narayanan, and D. Yankov. Categorization of display ads using image and landing page features. In LDMTA ’11, pages 1:1–1:8, New York, NY, USA, 2011. ACM.
 (15) Y. Kim, A. Hassan, R. W. White, and I. Zitouni. Modeling dwell time to predict click-level satisfaction. In WSDM ’14, pages 193–202, New York, NY, USA, 2014. ACM.
 (16) M. Lalmas, J. Lehmann, G. Shaked, F. Silvestri, and G. Tolomei. Promoting positive post-click experience for in-stream Yahoo Gemini users. In KDD ’15, pages 1929–1938, New York, NY, USA, 2015. ACM.
 (17) B. G. Lindsay. Mixture Models: Theory, Geometry and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, Penn. State University, 1995.
 (18) C. Liu, R. W. White, and S. Dumais. Understanding web browsing behaviors through Weibull analysis of dwell time. In SIGIR ’10, pages 379–386, New York, NY, USA, 2010. ACM.
 (19) G. McLachlan and D. Peel. Finite Mixture Models. Wiley-Interscience, 1st edition, October 2000.
 (20) A. Papoulis and S. U. Pillai. Probability, Random Variables, and Stochastic Processes. McGraw-Hill Higher Education, New York, NY, USA, fourth edition, 2002.
 (21) H. Raghavan and D. Hillard. A relevance model based filter for improving ad quality. In SIGIR ’09, pages 762–763, New York, NY, USA, 2009. ACM.
 (22) R. Rosales, H. Cheng, and E. Manavoglu. Post-click conversion modeling and analysis for non-guaranteed delivery display advertising. In WSDM ’12, pages 293–302, New York, NY, USA, 2012. ACM.
 (23) P. Schlattmann. Medical Applications of Finite Mixture Models, Statistics for Biology and Health. Springer-Verlag Berlin Heidelberg, 2009.
 (24) G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6:461–464, 1978.
 (25) D. Sculley, R. G. Malkin, S. Basu, and R. J. Bayardo. Predicting bounce rates in sponsored search advertisements. In KDD ’09, pages 1325–1334, New York, NY, USA, 2009. ACM.
 (26) E. Sodomka, S. Lahaie, and D. Hillard. A predictive model for advertiser value-per-click in sponsored search. In WWW ’13, pages 1179–1190, New York, NY, USA, 2013. ACM.
 (27) C. Stewart, E. Hoggan, L. Haverinen, H. Salamin, and G. Jacucci. An exploration of inadvertent variations in mobile pressure input. In MobileHCI ’12, pages 35–38, New York, NY, USA, 2012. ACM.
 (28) B. Stone-Gross, R. Stevens, A. Zarras, R. Kemmerer, C. Kruegel, and G. Vigna. Understanding fraudulent activities in online ad exchanges. In IMC ’11, pages 279–294, New York, NY, USA, 2011. ACM.
 (29) X. Yi, L. Hong, E. Zhong, N. N. Liu, and S. Rajan. Beyond clicks: Dwell time for personalization. In RecSys ’14, pages 113–120, New York, NY, USA, 2014. ACM.
 (30) P. Yin, P. Luo, W.-C. Lee, and M. Wang. Silence is also evidence: Interpreting dwell time for recommendation from psychological perspective. In KDD ’13, pages 989–997, New York, NY, USA, 2013. ACM.