I Introduction
Headline testing is important for publishers and media sites. Visitors to the homepage of online publishers are usually presented with lists or groups of headlines, sometimes along with snippets (for example, on the Yahoo Homepage (https://www.yahoo.com/), Google News (https://news.google.com/), The New York Times (https://www.nytimes.com/), and The Wall Street Journal (https://www.wsj.com/)). A compelling headline encourages visitors to click it and read the whole article, and thus helps increase user engagement, social sharing, and revenue. At our company, we follow a test-rollout strategy for headline testing. Once an article is published, multiple title variations are displayed to randomized user buckets of equal size for a defined period to conduct A/B testing. When the test is complete, the headline variant that received the most clicks during the A/B testing period is selected and displayed for the rest of the article life. This strategy has been adopted as a common practice to select the best title variants in headline testing [13] and the best ad versions in online advertising [21].
There are two limitations to this test-rollout strategy. First, during the initial testing period, we have to show the title variants with lower click-through rates (CTRs) to a sizable fraction of the user population. For those users, we fail to optimize engagement. Further, a large proportion of an article's traffic clusters in its early life, because freshness is a key factor of article popularity [17]. The test-rollout strategy conducts A/B testing at the beginning of the article life, which may lead to significant click loss.
Second, the performance of headline variations usually varies over time. The conclusion drawn from the initial testing period may not hold throughout the whole article life. The test-rollout practice is not able to capture any changes subsequent to the A/B testing period.
To address the limitations mentioned above, we formulate headline testing as a multi-armed bandit (MAB) problem, and introduce a batched Thompson Sampling method to optimize user engagement while learning the performance of each headline variant. The MAB problem is defined as follows. There are K arms, each associated with an unknown reward distribution. The player iteratively plays one arm, observes the associated reward, and decides which arm to play in the next iteration [3]. There is a tension between selecting the currently best-performing arm to harvest immediate gain (exploitation) and discovering the optimal arm, i.e., the arm with the highest expected reward, at the risk of immediate loss (exploration). In the headline testing scenario, the headline variants of an article correspond to the arms, and the click count of each headline is the reward associated with each arm. Our goal is to maximize the sum of clicks across arms for each article.
There are many MAB algorithms, such as ε-greedy [18], Upper Confidence Bound (UCB) algorithms [6] [5] [4], Thompson Sampling [25], and the Gittins index [10]. Among them we select Thompson Sampling due to its strong empirical results [8] [22], solid theoretical guarantees [12] [20] [3] [15], and wide industrial application [11] [2] [1] [24] [14].
In Thompson Sampling, traffic allocation is updated for every single view event. This becomes a formidable computational burden, as our site has a high volume of incoming traffic at high velocity. It leads us to consider an algorithm that processes user responses after they arrive in batches over a certain time period. Thompson Sampling with batch updates has been studied in display advertising and news article recommendation to analyze how it performs under delayed feedback (i.e., observed reward) processing compared with other MAB algorithms [8]. Batch updates are incorporated in recent industrial applications of Thompson Sampling [14] [23]. Although neither discloses how the update frequency was selected, [14] updates traffic allocation once a day, and [23] updates twice a day.
In this paper, we present a batched Thompson Sampling (bTS) method that is tuned for optimal performance when user feedback is processed in batches. The performance evaluation is based on empirical impressions/clicks of articles at our site and user responses simulated from empirical CTRs; it shows that the bTS method is robust, converges quickly to the true optimal arms, and outperforms the test-rollout strategy. Our study is motivated by the headline testing problem, but one can apply this algorithm to other real-world problems that require optimizing while testing with high-volume, high-velocity incoming data.
The rest of the paper is organized as follows. Section II introduces the current headline testing practice at our company, as well as its limitations. Section III describes the batched Thompson Sampling (bTS) method that we propose to apply in headline testing to gain more reward. We evaluate the bTS method in Section IV. Section V concludes and discusses future work.
II Current Headline Testing Practice
Currently at our news website, headline testing follows the test-rollout strategy, which consists of two periods:

Testing period: the first hour after an article is published. During the testing period, each viewing request is randomly assigned to one of the headline variants with equal probability. At the end of the testing period, we deem the headline variant with the highest CTR the winner headline.

Post-testing period: the remaining article lifespan after the testing period, during which we display the winner headline to all traffic.
The test-rollout strategy for headline testing is intuitive and easy to implement in the system. However, it has the following two limitations:
II-A User Engagement Loss in the Testing Period
In the first-hour testing period, impressions are equally allocated to each headline variant, which means that for an article with K arms, only 1/K of the traffic is assigned to the headline variant with the highest underlying CTR. Showing inferior headline variants to the remaining (K-1)/K of the traffic sacrifices user engagement, e.g., article clicks, and user experience. The losses of user engagement and experience are both sizable, because traffic in the testing period covers as much as 24.36% of the total impressions across all articles. This result is based on empirical headline testing data from our news website, which will be described in Section III-B1.
II-B Arm Performance Discrepancy Between the Testing and Post-testing Periods
The test-rollout practice is unable to capture any performance change of headline variations beyond the testing period. Empirical data from our news website (the same data mentioned above) show that the headlines deemed best during the testing period changed their performance afterward: on average there is a 12% discrepancy in their CTR between the testing and post-testing periods. Thus, it is possible for the arm deemed optimal during the one-hour testing period to actually be suboptimal over the article lifespan. Such a performance change cannot be observed or handled in the current practice.
To enhance this test-rollout practice, we formulate headline testing as a MAB problem and present a batched Thompson Sampling approach. It is able to explore for the optimal variant, while at the same time gradually shifting traffic to the best-performing variant. The following section introduces the MAB headline testing methodology in detail.
III Methodology
This section describes the batched Thompson Sampling (bTS) method, which gradually allocates traffic towards the well-performing arms, while leaving some traffic to the other arms so as to explore for a possibly unobserved optimal arm. Section III-A formulates headline testing as a Bernoulli bandit problem, and introduces Thompson Sampling for Bernoulli bandits. In Section III-B, we explain the rationale for incorporating batched updates in Thompson Sampling, and introduce the factors associated with bTS. Section III-B also describes our empirical impressions/clicks data and how we simulate user responses based on them. Sections III-C to III-E present how we determine and tune the factors of bTS.
III-A Preliminaries
III-A1 Headline Testing as a Bernoulli Bandit Problem
Suppose an article has K headline variants written by editors. In the MAB framework, each headline variant is treated as an arm. Each headline, when displayed, yields either a click (success) or no click (failure) as the reward. The reward for headline k is Bernoulli distributed, with success probability (i.e., the probability of being clicked) θ_k. The reward distribution of arm k is fixed but unknown, in the sense that its parameter θ_k is unknown. At each time step t = 1, ..., T, we select an arm to display, and collect the reward observed from the selected arm, where T is the total number of impressions on which we decide to run the MAB experiment. Our goal is to maximize the total clicks for this article.

III-A2 Thompson Sampling
Thompson Sampling is a randomized Bayesian algorithm to solve the MAB problem of reward maximization [25]. The general idea of Thompson Sampling is to impose a prior distribution on the parameters of the reward distribution, update the posterior distribution using the observed rewards, and play an arm according to its posterior probability at each time step.

In Bernoulli bandit problems, Thompson Sampling uses the Beta distribution to model the success probability θ_k of arm k, because the observed reward follows a Bernoulli distribution, and the Beta distribution is a conjugate prior for the Bernoulli distribution [3]. Initially, the Thompson Sampling algorithm imposes a Beta(1,1) prior on the success probabilities of all arms. This is a reasonable initial prior, because Beta(1,1) is the uniform distribution on the interval (0,1) [3] [8]. At each time step, the Thompson Sampling algorithm draws a random sample from the Beta distribution of each arm, and displays the arm associated with the largest sampled value. Based on the observed feedback for the displayed arm, its Beta(α, β) distribution is updated to Beta(α + 1, β) if the feedback is a click, or to Beta(α, β + 1) otherwise.

Many studies have demonstrated the strong performance of the Thompson Sampling algorithm on the MAB problem, both theoretically and empirically. [12] investigates Thompson Sampling as a Bayesian Learning Automaton, and shows that in the two-armed Bernoulli bandit problem, Thompson Sampling converges to playing only the optimal arm with probability one. In [20], Thompson Sampling is proved to be a consistent estimator of the optimal arm. Further, [3] and [15] provide regret bounds for Thompson Sampling that are asymptotically optimal in the sense defined by [19], so that it has theoretical guarantees competitive with UCB algorithms. Empirically, [8] shows that the performance of Thompson Sampling is competitive with or better than that of alternative MAB algorithms, such as ε-greedy and UCB, on real-world problems like display advertising and news article recommendation. [8] also notes that Thompson Sampling can be implemented efficiently, in comparison with full Bayesian methods such as the Gittins index. Recently, adaptations of Thompson Sampling have been applied in many domains, such as revenue management [9], recommendation systems [16], online service experiments [24], website optimization [14], and online advertising [11] [2] [1].

III-B Batch Updates for Real-world High-volume Traffic
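As a concrete illustration of the Beta-Bernoulli update rule above, the following Python sketch (our own minimal implementation with hypothetical CTRs, not the production system) runs per-event Thompson Sampling on a toy two-armed bandit:

```python
import random

def select_arm(alpha, beta):
    """Draw one sample per arm from Beta(alpha_k, beta_k) and pick the largest."""
    samples = [random.betavariate(a, b) for a, b in zip(alpha, beta)]
    return max(range(len(samples)), key=lambda k: samples[k])

def update(alpha, beta, arm, clicked):
    """Posterior update: a click increments alpha, a non-click increments beta."""
    if clicked:
        alpha[arm] += 1
    else:
        beta[arm] += 1

random.seed(0)
true_ctr = [0.02, 0.05]          # hypothetical underlying CTRs; arm 1 is optimal
alpha, beta = [1, 1], [1, 1]     # Beta(1,1) uniform priors
for _ in range(20000):
    arm = select_arm(alpha, beta)
    update(alpha, beta, arm, random.random() < true_ctr[arm])
```

After enough events, most of the posterior mass, and hence most of the traffic, concentrates on the arm with the higher underlying CTR.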
In traditional Thompson Sampling for Bernoulli bandits, the Beta distribution of the selected arm is updated after every reward feedback is observed. In a real-world system, especially when both the volume and the velocity of incoming traffic are high, the feedback is typically processed in batches over a certain period of time. This is the case for the current infrastructure at our company. Thus, to implement Thompson Sampling, it is necessary to apply batched Thompson Sampling (bTS), which updates the posterior distributions after a fixed time period.
The general procedure of bTS is described as follows: within each fixed time interval (i.e., batch), the Beta distribution of each arm remains unchanged. We allocate traffic across arms based on their random Beta distribution samples drawn for each incoming view event. At the end of a time interval, we aggregate the data collected within this batch, namely the numbers of clicks and impressions for each arm, and use the aggregated data to update the Beta distribution of each arm.
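The batch procedure above can be sketched as follows; this is our own simplified rendering with made-up arm CTRs, not the production implementation. Within a batch the Beta parameters stay frozen; clicks and impressions are aggregated and applied only at the batch boundary:

```python
import random

def serve_batch(alpha, beta, impressions, true_ctr):
    """Serve one batch with frozen Beta parameters; per-event samples pick the arm.
    Returns per-arm (clicks, views) aggregated over the batch."""
    K = len(alpha)
    clicks, views = [0] * K, [0] * K
    for _ in range(impressions):
        samples = [random.betavariate(a, b) for a, b in zip(alpha, beta)]
        k = max(range(K), key=lambda i: samples[i])
        views[k] += 1
        clicks[k] += random.random() < true_ctr[k]
    return clicks, views

def batch_update(alpha, beta, clicks, views):
    """End-of-batch posterior update from the aggregated batch counts."""
    for k in range(len(alpha)):
        alpha[k] += clicks[k]
        beta[k] += views[k] - clicks[k]

random.seed(1)
alpha, beta = [1, 1, 1], [1, 1, 1]
for _ in range(50):   # 50 batches of 1,000 impressions each
    c, v = serve_batch(alpha, beta, 1000, true_ctr=[0.03, 0.05, 0.02])
    batch_update(alpha, beta, c, v)
```

Each impression increments exactly one Beta parameter at the batch boundary, so the posterior counts always reconcile with the total traffic served.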
There are three factors to be tuned for bTS. The first is how long the algorithm should run: we determine an algorithm stopping point after which the click gain is so little that the system cost is not worthwhile. The second is how we aggregate the feedback within each batch to update the posterior distributions. The last is the time interval between updates. These three factors are determined based on empirical as well as simulated data from our real-world headline testing platform, described below.
III-B1 Empirical Data
The empirical data cover the articles with headline testing at our news website on two weekdays and one weekend day. For the testing period, the data consist of impressions and clicks of all headline variants for each article. For the post-testing period, by contrast, we only have data on the headline variant deemed best in the preceding testing period of each article, because all traffic is shifted to this variant. The impressions and clicks are aggregated by minute.
III-B2 Simulation Practice
Similar to how [8] evaluates Thompson Sampling in display advertising, we evaluate the performance of bTS under different factor values in a simulated environment, where impressions and CTRs are real, but the user responses are simulated based on the empirical CTR of each headline.
In the simulation of Thompson Sampling [8], the reward probability of each arm is modeled by a Beta distribution that is updated after an arm is selected. For batched Thompson Sampling, the Beta distribution of each arm is only updated at the end of each batch, i.e., a fixed-length time interval. Accordingly, the number of events within a batch, referred to as the batch size, is determined by the count of empirical impressions that occurred in this fixed-length time interval. The batches of an article rarely have the same size, because the number of impressions usually varies considerably across time intervals. Table I illustrates how batch sizes are calculated from minute-level impressions when updates occur every 3 minutes.
Timestamp   Impressions*   Batch index   Batch size
12:48:00    615            1             9,945
12:49:00    4,568          1
12:50:00    4,762          1
12:51:00    5,282          2             16,028
12:52:00    5,412          2
12:53:00    5,334          2

*The numbers are for illustration purposes only; they are not actual impressions.
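The batching rule in Table I amounts to summing the minute-level impression counts over each fixed-length interval. A small sketch (the function name is our own):

```python
def batch_sizes(minute_impressions, interval_minutes):
    """Aggregate minute-level impression counts into fixed-interval batch sizes."""
    return [
        sum(minute_impressions[start:start + interval_minutes])
        for start in range(0, len(minute_impressions), interval_minutes)
    ]

# the illustrative counts from Table I, batched every 3 minutes
sizes = batch_sizes([615, 4568, 4762, 5282, 5412, 5334], 3)  # [9945, 16028]
```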
We simulate user clicks following the practice of [21]: upon a viewing request, suppose the algorithm selects to display arm k; then the user response is simulated from the Bernoulli(θ̂_k) distribution, where θ̂_k denotes the estimated success probability of arm k, defined by its empirical CTR during the testing period. Note that for the arm deemed best during the testing period, we still use its testing-period empirical CTR to calculate θ̂_k, although theoretically we could calculate its "overall" empirical CTR over both the testing and post-testing periods. This is because the "overall" empirical CTR is not comparable to the empirical CTRs of the other arms, which can only be calculated during the testing period. After the simulation is complete on all articles, we quantify the performance of a headline testing algorithm by the total clicks summed across all articles over their lifespans.
III-C Factor 1: Algorithm Stopping Point
Due to the observed CTR discrepancy between the testing and post-testing periods illustrated in Section II-B, we want to avoid stopping the algorithm too early, so as to cover any potential performance change among headline variations. One straightforward proposal is to run the algorithm throughout the whole article life. Although technically achievable, this proposal is not desirable, given that the distribution of impressions over time usually has a very long right tail. When impressions are very sparse, the click gain is so little that it is not worth the engineering overhead of running the algorithm. Thus, we would like to stop the algorithm when the majority of articles are no longer active.
We define the active lifespan of an article as the time it takes to reach 95% of its total impressions. Figure 1 demonstrates that the active lifespans of 95% of the articles are under 48 hours. Thus, we take 48 hours as the algorithm stopping point.
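The active-lifespan statistic can be computed directly from the minute-level impression counts; below is a minimal sketch with a made-up, front-loaded traffic curve:

```python
def active_lifespan(minute_impressions, coverage=0.95):
    """Number of minutes needed to accumulate `coverage` of total impressions."""
    target = coverage * sum(minute_impressions)
    running = 0
    for minute, n in enumerate(minute_impressions, start=1):
        running += n
        if running >= target:
            return minute
    return len(minute_impressions)

# 95% of these hypothetical views arrive within the first 4 minutes
lifespan = active_lifespan([50, 30, 10, 5, 3, 2])
```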
III-D Factor 2: Update Methods
In traditional Thompson Sampling with Bernoulli bandits, the Beta(α, β) distribution of the selected arm is updated to Beta(α + 1, β) if we observe a click; otherwise it is updated to Beta(α, β + 1). When observed responses come in batches, we need to aggregate the impression and click data for each arm within each batch before updating the corresponding Beta distribution. We consider two update methods to achieve this: summation update and normalization update.
To describe the two update methods precisely, we denote the Beta distribution of arm k in the i-th batch by Beta(α_k,i, β_k,i). Let c_k,i and f_k,i be the click and non-click counters for arm k in batch i, and let n_i denote the number of impressions in batch i.
Algorithm 1 explains the bTS method with summation update. It is a direct extension of the event-level update method of traditional Thompson Sampling, where α_k,i and β_k,i are updated by the raw counts of clicks and non-clicks.
We also consider another update method, named normalization update. As illustrated in Algorithm 2, it increments α_k,i and β_k,i by the numbers of normalized clicks and non-clicks respectively, assuming equal traffic allocation across arms.
The normalization update method addresses a side-effect of imbalanced traffic allocation across arms within a batch, which may fail to update in favor of the true winner arm. More specifically, if arm k has few clicks in a batch, it may be a reflection of a low θ_k, or the traffic allocated to this arm may simply not have been large enough to generate many clicks. Normalization update mitigates the second possibility by assuming each arm has equal traffic allocation. The downside of this method is that it lowers the noise tolerance of the algorithm, because rescaling the counts of a lightly served arm also scales up any noise in its data. Given that K is usually less than four, this method may magnify the potential noise.
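To make the contrast concrete, here is our own sketch of the two aggregation rules applied to one imbalanced batch (Algorithms 1 and 2 are not reproduced verbatim; the rescaling factor is our reading of the equal-allocation assumption):

```python
def summation_update(alpha, beta, clicks, views):
    """Algorithm 1 style: add raw click / non-click counts."""
    for k in range(len(alpha)):
        alpha[k] += clicks[k]
        beta[k] += views[k] - clicks[k]

def normalization_update(alpha, beta, clicks, views, batch_size):
    """Algorithm 2 style: rescale each arm's counts as if traffic had been
    split equally, i.e., batch_size / K views per arm."""
    K = len(alpha)
    for k in range(K):
        if views[k] == 0:
            continue  # no evidence for this arm in the batch
        scale = (batch_size / K) / views[k]
        alpha[k] += clicks[k] * scale
        beta[k] += (views[k] - clicks[k]) * scale

# an imbalanced batch: arm 0 got 900 of 1,000 views, arm 1 only 100
a_sum, b_sum = [1, 1], [1, 1]
summation_update(a_sum, b_sum, clicks=[45, 6], views=[900, 100])
a_norm, b_norm = [1.0, 1.0], [1.0, 1.0]
normalization_update(a_norm, b_norm, clicks=[45, 6], views=[900, 100], batch_size=1000)
```

Under normalization, each arm contributes the evidence of 500 views, so arm 1's 6 clicks are weighted five times as heavily, along with any noise they carry.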
We compare the performance of the two methods using the simulation practice described in Section III-B2. The algorithm runs for 48 hours as determined in Section III-C, and the performance of each update method is quantified by the total click counts of all articles over their lifespans.

Table II: Percentage click gain of summation update over normalization update, by update frequency.

Update frequency              1 min    3 min    5 min    10 min   30 min   60 min
Summation vs. normalization   +2.50%   +3.57%   +1.43%   +1.08%   +1.28%   +0.41%
According to the results shown in Table II, summation update consistently performs better than normalization update across different update frequencies. One explanation is that, in the case of headline testing, data noise has more impact on algorithm performance than the side-effect of imbalanced traffic allocation across arms. Thus, we use the summation update method described in Algorithm 1 for the bTS method.
III-E Factor 3: Update Frequency

Table III: Percentage gap in total clicks from the best-performing update frequency.

Update frequency   1 min   3 min   5 min   10 min   30 min   60 min
Gap from best      0       0.04%   0.06%   0.18%    0.76%    1.52%
Table III shows the percentage gap in total clicks between the best-performing update frequency and each remaining frequency. For the bTS method with summation update, more frequent updates lead to more clicks, but the additional gain becomes marginal for frequencies finer than 5 minutes. Our infrastructure is capable of updating as frequently as every 5 minutes with almost no cost. However, going beyond the 5-minute frequency creates a formidable challenge for the infrastructure, due to the network transformation cost among multiple components of the system. In consideration of the marginal benefit associated with more granular updates, choosing 5 minutes as the update frequency is a reasonable decision for our use case.
To summarize the factors selected for the bTS method in our real-world headline testing scenario, balancing click gain against system cost: we run the algorithm for 48 hours, aggregate the data in each batch using the summation update method described in Algorithm 1, and update the Beta distribution of each arm every 5 minutes.
IV Methodology Evaluation
This section evaluates the bTS method in the setting of headline testing. In Sections IV-A, IV-B, and IV-C, we evaluate its false convergence rate, speed of optimization, and robustness, respectively. Then, we show that bTS outperforms the test-rollout practice, in terms of click gain in Section IV-D and reduced exposure of suboptimal headlines in Section IV-E.
IV-A False Convergence Rate
We analyze the false convergence rate of the algorithm, defined as the proportion of articles that fail to allocate most traffic to the optimal arm (the arm with the highest θ̂_k) once their traffic allocation across arms is stable. Our evaluation shows that 99.25% of the articles converge correctly under the bTS method.
Figure 2 demonstrates the traffic proportion over time of some sample articles that converge correctly: initially all arms have equal traffic allocation. When the algorithm begins running, it starts to explore for the optimal arm, while simultaneously allocating more traffic to the arm that performs well.
The figure also illustrates how Thompson Sampling gracefully handles exploration at the level of individual arms, as pointed out in [24]. bTS explores the clearly inferior arms (Arm 2 in all sample articles) less frequently than arms that might be optimal, i.e., good arms (Arm 1 and Arm 3 in the sample articles). This increases the click gain by shifting traffic from a clearly inferior arm to arms with better performance. It also helps distinguish the optimal arm faster, because there are more samples to compare among the good arms.
IV-B Speed of Optimization
We show that the bTS method optimizes quickly, by analyzing the time it takes for the optimal arm to constantly have the largest traffic allocation. The histogram presented in Figure 4 shows the distribution of the time to optimize among articles that converge correctly. The 80th percentile of the time to optimize is 30 minutes, which means that after 30 minutes, 80% of the articles constantly have the largest traffic proportion allocated to their optimal arms.
IV-C Self-correction under Stress Test
We conducted a stress test via simulation to evaluate the robustness of our algorithm against unfavorable traffic allocation. In the stress test, we change the initial Beta distribution of each arm so that around 90% of the traffic is allocated to the arm with the lowest θ̂_k, and the remaining traffic is equally allocated to the other arms. We analyze the time it takes for the optimal arm to regain the highest likelihood of being displayed for five consecutive batches (equivalently, to have the highest Beta-distribution mean among the arms for five consecutive batches), which we refer to as self-correction, and illustrate the distribution of the time needed for self-correction in Figure 4. The red line in the figure shows that the 80th percentile is at 33 minutes: 80% of the articles are able to self-correct within 33 minutes.
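The unfavorable initialization can be emulated by skewing the initial Beta parameters; the specific parameter values below are our own hypothetical choice, picked so that the deliberately favored worst arm wins roughly 90% of the per-event sample draws at the start:

```python
import random

def skewed_priors(K, worst_arm):
    """Give the worst arm an optimistic prior and the rest a pessimistic one."""
    alpha = [5.0] * K
    beta = [95.0] * K                                # Beta(5, 95), mean 0.05
    alpha[worst_arm], beta[worst_arm] = 11.0, 89.0   # Beta(11, 89), mean 0.11
    return alpha, beta

random.seed(3)
alpha, beta = skewed_priors(3, worst_arm=2)
wins = 0
for _ in range(10000):
    samples = [random.betavariate(a, b) for a, b in zip(alpha, beta)]
    wins += max(range(3), key=lambda k: samples[k]) == 2
share = wins / 10000   # initial traffic share of the deliberately favored worst arm
```

Once real feedback starts flowing in, the posterior of the worst arm is pulled down toward its true CTR and the allocation self-corrects.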
IV-D Gain in Clicks
In this section, we illustrate that applying bTS to headline testing generates more clicks than the test-rollout strategy, especially during the testing period of an article, when bTS dynamically allocates more traffic to the well-performing arms while the test-rollout strategy allocates impressions equally across arms. The test-rollout baseline is set up as follows: for a given article with K arms, denote the total baseline click number by C, the click number for arm k during the testing period by c_k, and the click number of the winner arm during the post-testing period by d. Then

    C = Σ_{k=1}^{K} c_k + d,          (1)
    c_k ~ Binomial(N_1 / K, p_k),     (2)
    d ~ Binomial(N_2, p_{k*}),        (3)
    k* = argmax_k c_k,                (4)

where N_1 and N_2 are the article impressions in the testing period and the post-testing period, respectively, and p_k is the CTR of arm k in the testing period. Expressions (2) and (3) mean that c_k and d are simulated from the corresponding Binomial distributions.
We set up this test-rollout baseline instead of directly using the empirical click data in order to achieve a fair comparison with the bTS method. Suppose an article has two arms, with testing-period empirical CTRs denoted p_1 and p_2, and p_1 > p_2. Arm 1 is thus displayed to all traffic in the post-testing period, and we can calculate its "overall" empirical CTR, denoted q_1, as its empirical clicks divided by its empirical impressions over the overall article life. Note that q_1 is not necessarily equal to p_1, as illustrated in Section II-B.
The simulation for bTS uses p_1 and p_2 as the estimated success probabilities of the corresponding arms, as explained in Section III-B2, and is therefore not comparable with the empirical click data, where q_1 and p_2 act as the underlying success probabilities. The possible gap between p_1 and q_1 could lead to different click counts between bTS and the empirical click data even if bTS performed the same as the current practice. The test-rollout baseline set up in (1)-(4), on the other hand, uses p_1 and p_2 as the underlying success probabilities, and is thus comparable with bTS.
The gain in clicks for bTS over the test-rollout baseline is summarized in Table IV. The gain is split into the "first-hour" period and the "remaining-hours" period, which correspond to the testing and post-testing periods of the test-rollout strategy. On average, bTS yields a 13.54% increase in total clicks during the first hour. It also has competitive accuracy in selecting the correct optimal arm to exploit, indicated by a 1.00% increase in clicks over the test-rollout baseline during the post-testing period.
It is worth noting that this baseline is conservative; the actual gain after implementation is likely to be larger. The baseline setup assumes the CTRs of arms are consistent between the testing and post-testing periods. This assumption dilutes the click gain during the post-testing period: the baseline almost always displays the true optimal arm in the post-testing period thanks to the long period of exploration (it does so with a 97% accuracy rate), and bTS cannot beat the baseline when the baseline displays the optimal arm during the post-testing period. Thus, the 13.54% click gain harvested in the testing period is diluted to a 3.69% overall click gain. In practice, arm performance may change over time, as mentioned in Section II-B. The current test-rollout practice is not able to capture this fluctuation, while the bTS algorithm would detect the change and adjust the traffic accordingly, potentially gaining more clicks.
Table IV: Percentage increase in clicks of bTS over the test-rollout baseline.

                       First hour   Remaining hours   Total clicks
% Increase in Clicks   13.54%       1.00%             3.69%
IV-E Fewer Impressions on Suboptimal Headlines
Another benefit of bTS is that it improves user experience by serving far fewer impressions of suboptimal headlines. We analyze the overall percentage decrease in impressions on suboptimal headlines under the bTS method versus the test-rollout practice as the baseline. For the test-rollout strategy, we assume that no traffic is allocated to suboptimal headlines during the post-testing period. Even in this best-case scenario for the baseline, the suboptimal headline impressions of the bTS method are 71.53% fewer than those of the test-rollout practice.
V Conclusions and Future Work
In this paper, we propose applying a MAB approach to headline testing, where data are processed in batches due to the high volume and high velocity of incoming data. Although the bTS algorithm is developed in the news article headline testing scenario, the parameters used to tune our model can be generalized to other real-world MAB problems, including marketing campaigns, one-time event optimization, and user purchase/registration funnels.
The stationarity assumption beyond the first hour should be assessed after we have data on all arms beyond the first hour, which will become available once the bTS method is implemented. The current assumption is that θ_k, the success probability of arm k, is constant over time. It is possible for θ_k to change over time, which formulates a non-stationary bandit problem. Note that in non-stationary environments, the proposed bTS method is already an improvement over the test-rollout strategy, because it continuously tests headline variants throughout the active lifespan of news articles. There is room for further enhancement of the methodology to achieve better performance, especially when θ_k changes so significantly and rapidly that the ordering among θ_1, ..., θ_K flips multiple times during the article lifespan.
Clickbait detection and prevention are known challenges for media sites [7]. We are also motivated to explore metrics more sophisticated than raw clicks as the reward of the MAB. As an example, we may discount clicks with a short dwell time, so that the reward of each headline variation becomes a number between 0 and 1. We can then optimize this "click-dwell" reward using an adaptation of the Bernoulli Thompson Sampling algorithm introduced in [3], which supports rewards distributed in the interval [0,1]. With a properly selected dwell-time discounting strategy, this should help mitigate the clickbait problem.
At our news website, one module hosts multiple articles. When headline tests are conducted simultaneously on multiple articles within a module, they may interact with each other and lead to cannibalization. That is, one headline variant may be the optimal arm under one condition but not the others, depending on how we select the titles for the remaining articles in the same module. This motivates us to extend our research and develop a MAB testing framework that optimizes the article titles for the entire module, e.g., with a multivariate MAB algorithm introduced in [14].
References
[1] (2014) LASER: a scalable response prediction platform for online advertising. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, pp. 173–182.
[2] (2013) Computational advertising: the LinkedIn way. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 1585–1586.
[3] (2012) Analysis of Thompson sampling for the multi-armed bandit problem. In Conference on Learning Theory, pp. 39–1.
[4] (2010) Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research 11 (Oct), pp. 2785–2836.
[5] (2009) Exploration–exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science 410 (19), pp. 1876–1902.
[6] (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learning 47 (2–3), pp. 235–256.
[7] (2016) Stop clickbait: detecting and preventing clickbaits in online news media. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 9–16.
[8] (2011) An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, pp. 2249–2257.
[9] (2017) Online network revenue management using Thompson sampling.
[10] (2011) Multi-armed Bandit Allocation Indices. John Wiley & Sons.
[11] (2010) Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine.
[12] (2008) A Bayesian learning automaton for solving two-armed Bernoulli bandit problems. In 2008 Seventh International Conference on Machine Learning and Applications, pp. 23–30.
[13] Headline testing page at Optimizely.
[14] (2017) An efficient bandit algorithm for realtime multivariate optimization. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1813–1821.
[15] (2012) Thompson sampling: an asymptotically optimal finite-time analysis. In International Conference on Algorithmic Learning Theory, pp. 199–213.
[16] (2015) Efficient Thompson sampling for online matrix-factorization recommendation. In Advances in Neural Information Processing Systems, pp. 1297–1305.
[17] (2016) Predicting the shape and peak time of news article views. In 2016 IEEE International Conference on Big Data, pp. 2400–2409.
[18] (2014) Algorithms for multi-armed bandit problems. arXiv preprint arXiv:1402.6028.
[19] (1985) Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6 (1), pp. 4–22.
[20] (2012) Optimistic Bayesian sampling in contextual-bandit problems. Journal of Machine Learning Research 13 (Jun), pp. 2069–2106.
[21] (2017) Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Science 36 (4), pp. 500–522.
[22] (2010) A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry 26 (6), pp. 639–658.
[23] (2014) Overview of content experiments: multi-armed bandit experiments.
[24] (2015) Multi-armed bandit experiments in the online service economy. Applied Stochastic Models in Business and Industry 31 (1), pp. 37–45.
[25] (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25 (3/4), pp. 285–294.