Improving customer retention is an important task, especially for businesses that generate revenue from long-term relationships with customers (e.g., subscription-based services). Mobile gaming, which is the fastest-growing industry (Tom, 2019), is another business that depends on high retention rates. Most titles are free to play, and developers generate revenue from “in-game purchases” of items and characters. Developers therefore want to keep users engaged enough to buy in-game items and to keep them playing for a longer time.
With real-time online advertising technology (real-time bidding, RTB), it is easy to reach existing users and persuade them to return to the service through advertising. This marketing strategy is called “retargeting”. In this context, retargeting means reaching out to users who have stopped playing and encouraging them to return to the game. To win back churned users, the ads often emphasize new features of the game, including events, items, and characters. The ads must also be attractive enough to capture users’ attention in a very short time.
In this scenario, a DSP (Demand-Side Platform) for retargeting needs to choose the most effective ad creative for each user. Since developers continuously add new features to the game, new ad creatives are added every day, or even every hour. There is no time to test the effectiveness of each ad creative beforehand. A contextual bandit algorithm, which selects the “best” ad creative using side information while balancing exploration and exploitation, is well suited to this situation.
One concern for the algorithm is the negative effect caused by over-exposure to the ads. The DSP for retargeting keeps buying the ad views of specific users and shows the ad repeatedly until the users return to the game. As a result, users who do not return to the game may keep seeing the same or similar ad creatives many times. In the marketing science literature, it is widely recognized that repetitive advertising has two phases: in the early stage of repetitive ad exposure, the audience's “awareness” of the product increases; after a number of repetitions, however, the audience gets tired and starts feeling “fatigue”. The two phases are called “wear-in” and “wear-out” in the literature. The degree and timing of these effects are sensitive to the characteristics of the advertising, and it is hard to find the optimal number of repetitions for a specific product (Pechman and Stewart, 1988; Schmidt and Eisend, 2015).
Based on this theory, the DSP needs to consider the psychological status (awareness/fatigue) of the audience when choosing an ad creative. The DSP is expected to show the most attractive ad creative, but it should switch to a fresher creative to mitigate fatigue when the user gets tired. Formally, this is a bandit problem with non-stationary rewards, because the reward changes according to the past actions of the system. The existing literature tackles the non-stationary bandit problem by estimating the change in the reward (Komiyama and Qin, 2014; Levine et al., 2017). We extend this approach by explicitly defining the awareness/fatigue of each user at each time and estimating the change in the reward due to the user's psychological status. Using awareness/fatigue parameters, we set up a new contextual bandit-based ad creative selection algorithm and deploy it in a real-world production environment. A one-week experiment in the online environment shows that our proposed algorithm outperforms the baseline contextual bandit and a random algorithm in most of the KPIs. Moreover, we find that the wear-in and wear-out effects are not the same for clicks and conversions: wear-in seems to exist for conversions but not for clicks.
Our contributions are threefold. First, we propose a new ad creative selection algorithm that explicitly considers users’ psychological status, and we show that the algorithm is implementable in the harsh online advertising environment and is indeed effective. Second, we combine the theory of wear-in and wear-out with bandit algorithms. Third, we find that the wear-in/wear-out effects of ads differ between clicks and conversions.
The rest of the paper is organized as follows. In Section 2, we review related literature in both marketing science and machine learning. Section 3 explains our task and the existing system. Section 4 clarifies the concept of awareness and fatigue and defines its numerical measure. The new algorithm is proposed in Section 5. Section 6 describes the settings of the experiments and then shows the results. Section 7 examines the results. Section 8 concludes.
2. Related Works
We review related works in both marketing science and machine learning.
2.1. The Effect of Repetitive Exposure of Ads
The relationship between the number of repetitions and the effectiveness of ads has been actively studied in the marketing science literature for a long time. Extensive reviews and meta-analyses point out that the optimal number of repeated ad exposures depends on many conditions and is thus hard to find (Pechman and Stewart, 1988; Schmidt and Eisend, 2015). In the context of online advertising, an analysis of large-scale natural experiments finds that wear-out occurs in a heterogeneous manner (Lewis, 2015). A lab experiment reveals that high-quality ad creatives are immune to wear-out effects (Chen et al., 2016).
2.2. Bandit Algorithms in Online Ad/Recommendation System
Bandit algorithms have already been applied to a variety of advertising and recommendation systems (Li et al., 2010; Chapelle and Li, 2011). In particular, an application to ad format selection is closest to ours (Tang et al., 2013). Our research is also closely related to the non-stationary bandit problem, in which the reward changes with time and the optimal arm is not always the same. Among various algorithms, directly estimating the decay factor of the rewards has been shown to be effective (Komiyama and Qin, 2014; Levine et al., 2017). We basically follow this strategy. We also consider the continuous creation and deletion of arms, which is studied in (Chakrabarti et al., 2009). While the existing literature focuses on relatively simple mechanisms of reward change (e.g., decrease with time or with pulls), we introduce a numerical measure of users' ad awareness/fatigue to tackle the problem. Several studies explicitly consider user abandonment due to the users’ psychological status in contextual bandits (Lei et al., 2017; Cao and Sun, 2019). In these studies, profitable interventions are considered to have a negative effect on users' psychological status and possibly cause user churn. In this scenario, a good contextual bandit algorithm needs to limit the frequency of uncomfortable treatments to keep the users engaged. The proposed algorithms solve a constrained optimization problem to find the best policy (Lei et al., 2017) or calculate the optimal sequence of interventions (Cao and Sun, 2019) at each time of intervention. However, both studies are limited to simulation and are not applied to a real production environment.
3. Ad Creative Selection System
In this section, we define our problem and explain the currently deployed system.
3.1. Ad Creative Selection
A DSP provided by CyberAgent, Inc., a major Japan-based online advertising company, is employed by many mobile game developers. The DSP targets churned game users who have stopped playing. The DSP intensively buys ad views of these users on various websites and apps through Real-Time Bidding (RTB) and shows them ads to promote the game. To maximize the effect of the ads, the DSP prepares a wide range of “ad creatives” (combinations of image and text; a sample is shown in Fig. 1) for each campaign and chooses the best one for each impression.
3.2. Contextual Bandit-based Ad Selection
Since new ad creatives are continuously added and deployed, it is hard to find the best creative in advance. The DSP, therefore, employs a contextual bandit-based ad creative selection algorithm to balance exploration and exploitation. In particular, let $y_i \in \{0, 1\}$ be a binary outcome variable taking one if a click or conversion occurs at the $i$th impression and zero otherwise, $x_i$ be a binary $d$-dimensional vector of contextual features, and $a_i \in A_i$ be a choice of ad creative. Our contextual feature vector includes information from the bid request (e.g., device, site/app, and SSP) and the hour of the day. Then our CTR predictor estimates $P(y = 1 \mid a, x)$ for action $a$ (i.e., choice of ad creative) and feature $x$:

$$f(a, x) = \sigma\left(\mu^\top x + w_a^\top x\right) \qquad (1)$$

where $\mu$ is a $d$-dimensional weight vector that is invariant to the action, and $w_a$ is a $d$-dimensional action-specific weight vector. Both vectors include a bias term. $\sigma(\cdot)$ represents the sigmoid (logistic) function. When estimating the parameters, we use clicks as rewards, because conversions are rare events, which makes estimation hard.
Following (Chapelle and Li, 2011), the system adopts the Thompson sampling algorithm, which introduces randomization for exploration. That is, for each $a \in A_i$, the action-specific weight vector and bias $w_a$ are sampled from the distribution $\mathcal{N}(\hat{w}_a, \alpha \hat{\Sigma}_a)$, where $\hat{w}_a$ and $\hat{\Sigma}_a$ are the estimated mean and variance of the weights and the scalar $\alpha$ controls the balance between exploration and exploitation. The action-invariant vector $\mu$ is set to its point estimate $\hat{\mu}$. Then the system chooses the best ad creative from the candidate set $A_i$, namely, $a_i = \arg\max_{a \in A_i} f(a, x_i)$. The action space is different across impressions $i$: $A_i$ is determined by the system based on the characteristics of the user and the ad slot. Moreover, the set of available ad creatives changes over time. As explained in (Chakrabarti et al., 2009), ads are continuously added to and deleted from circulation. The algorithm chooses a newly added ad creative with a probability of $\epsilon$ for exploration. The whole procedure is described in Algorithm 1.
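The selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production implementation: the function name `choose_creative`, the dictionary-based weight storage, the default variance of 1.0 for weights that have never been estimated, and the independent (diagonal) Gaussian sampling are all hypothetical, and the forced exploration of newly added creatives is left out.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def choose_creative(active, candidates, mu, w_mean, w_var, alpha=1.0, rng=random):
    """Thompson-sampling choice among candidate creatives (illustrative only).

    active     : indices of the non-zero entries of the binary context vector
    candidates : creative ids available for this impression
    mu         : dict index -> action-invariant weight (point estimate)
    w_mean     : dict creative -> dict index -> mean of the action-specific weight
    w_var      : dict creative -> dict index -> variance of that weight
    alpha      : scales the sampling variance (0 means no randomization)
    """
    common = sum(mu.get(j, 0.0) for j in active)
    best, best_score = None, -1.0
    for a in candidates:
        sampled = 0.0
        for j in active:
            mean = w_mean[a].get(j, 0.0)
            var = w_var[a].get(j, 1.0)  # assumed default for unseen weights
            sampled += rng.gauss(mean, math.sqrt(alpha * var))
        score = sigmoid(common + sampled)
        if score > best_score:
            best, best_score = a, score
    return best, best_score
```

With `alpha=0` the sampling collapses to the point estimates, so the function returns the greedy choice; larger values spread the samples, which lets creatives with uncertain weights win more often.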
3.3. Parameter Estimation
Again following (Chapelle and Li, 2011), we derive the Gaussian posterior distribution of the weight vector from L2-regularized logistic regression or, equivalently, MAP estimation. The inverse of the Hessian of the objective function (negative log-likelihood plus L2 penalty) becomes the variance-covariance matrix of the weight vector, while the point estimates become the mean of the distribution (Bishop, 2006). Because our Hessian matrix is intractably large, we use only its diagonal elements.
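The diagonal approximation can be illustrated as follows. This is a sketch under assumptions that are not stated in the text: the penalty is taken to be $(\lambda/2)\lVert w \rVert^2$, features are binary, and the helper name `diagonal_posterior_variance` and its arguments are hypothetical. For logistic regression, the $j$th diagonal element of the Hessian is then $\lambda + \sum_i p_i (1 - p_i) x_{ij}^2$, and its reciprocal is used as the variance of weight $j$.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def diagonal_posterior_variance(rows, weights, l2=1.0):
    """Per-weight variance from the diagonal of the Hessian (illustrative).

    rows    : list of records, each a list of active (non-zero) feature indices
    weights : list of MAP point estimates, one per feature
    l2      : assumed L2 regularization strength (lambda)
    """
    hess_diag = [l2] * len(weights)
    for active in rows:
        p = sigmoid(sum(weights[j] for j in active))
        curvature = p * (1.0 - p)
        for j in active:
            hess_diag[j] += curvature  # x_ij squared is 1 for binary features
    return [1.0 / h for h in hess_diag]
```

Weights of features that appear in many records receive a small variance and are sampled close to their point estimate, whereas rarely seen features keep a variance close to the prior value `1 / l2`.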
To estimate both the common weight parameters $\mu$ and the action-specific weight parameters $w_a$ at once, we first expand the feature vector $x$ by interacting it with a binary $K$-dimensional action vector $e_a$, whose $k$th element takes one when the corresponding action is taken while the others take zero. That is, $z = e_a \otimes x$. Finally, by combining the action-invariant component $x$, we have $\tilde{x} = (x^\top, z^\top)^\top$. Each record is transformed to a $D$-dimensional binary vector by the hashing trick (Weinberger et al., 2009).
We apply stochastic gradient descent (SGDClassifier in the scikit-learn package for Python) to obtain an estimate of the stacked weight vector $\theta = (\mu^\top, w_1^\top, \ldots, w_K^\top)^\top$ with regularization parameter $\lambda$. The parameter vector is updated on a daily basis using a batch of data from the previous day.
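The feature expansion and hashing steps can be sketched as below. The sketch is only an illustration: the token format, the use of `zlib.crc32` as the hash function, the bucket count of `2**18`, and the function name `hashed_features` are assumptions made for the example and are not taken from the deployed system.

```python
import zlib


def hashed_features(context, creative_id, n_buckets=2**18):
    """Return the active indices of the hashed binary feature vector.

    context     : dict of categorical fields from the bid request
    creative_id : id of the candidate creative (the action)
    n_buckets   : assumed length of the hashed vector
    """
    active = set()
    tokens = ["bias"] + [f"{field}={value}" for field, value in context.items()]
    for token in tokens:
        # Action-invariant copy of the feature (shared by all creatives).
        active.add(zlib.crc32(token.encode("utf-8")) % n_buckets)
        # Action-specific copy: the feature interacted with the creative id.
        crossed = f"creative={creative_id}&{token}"
        active.add(zlib.crc32(crossed.encode("utf-8")) % n_buckets)
    return sorted(active)
```

Because the action-invariant tokens hash to the same buckets for every creative, a single linear model over the hashed vector learns the shared and the creative-specific weights jointly.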
4. Awareness and Fatigue
The DSP keeps buying ad views until the user starts playing the game again. The DSP can buy a virtually unlimited amount of ad views from almost anywhere on the internet through RTB, and as consumers’ screen time grows, a substantial share of targeted users see the same ads numerous times. Even with a frequency cap of 10 times an hour, a user can see ads up to 240 times in 24 hours. With the above bandit-based algorithm, the DSP tends to choose the same ad creatives because the feature vector does not change over a very short time. As discussed in Section 1, the effect of an ad changes with the number of ad exposures. There should therefore be an optimal number of exposures for the same ad.
To illustrate our motivation, Fig. 2 shows the hypothetical relationship between ad awareness/fatigue and the effect. Suppose there is a churned game user. When he watches an ad creative of the game for the first time, his “awareness” of the game increases, and the probability of a click or conversion (starting to play again) increases. The probability of conversion is expected to increase with his awareness of the game. Hence, repeated exposure to similar ads is beneficial at first. However, after a number of repeated exposures, his awareness peaks and he gets tired of seeing the same ad creative. Then his fatigue starts increasing, which leads to a decreased probability of conversion. If the DSP finds that the user is getting tired of similar ad creatives, it should show a totally different ad creative to regain his attention. Fortunately, the DSP usually has a wide range of ad creatives for each ad campaign, so it is easy to show one that is totally different from those previously shown. The problem is when to change the ad creative.
Our strategy is to directly estimate the change in the reward (conversion rate/click-through rate) due to awareness/fatigue. First, we define our measure of awareness/fatigue. Awareness/fatigue is a function of the current ad creative on the user’s screen and the user’s memory of past exposures to ad creatives of the same advertising campaign (the set of ad creatives for a specific game). For convenience, we simply call awareness/fatigue “fatigue”, since the main concern is the negative effect of excessive ad exposure.
4.1. The measure of fatigue
Formally, let $h_{u,t}$ be a vector representation of the history of ad exposures of user $u$ at time $t$. Each element of $h_{u,t}$ represents the number of exposures to each creative. For example, suppose user $u$ has been exposed three times to Creative 1, twice to Creative 2, and never to the others. Then $h_{u,t} = (3, 2, 0, \ldots, 0)$. We record the histories for each user-advertiser pair.
The expected level of fatigue with respect to ad creative $a$ is calculated as a weighted sum of the history: $F_{u,t}(a) = s_a^\top h_{u,t}$, where the weight vector $s_a = (s_{a,1}, \ldots, s_{a,K})$ measures how similar $a$ and the other creatives are, and $K$ is the number of creatives. That is,

$$F_{u,t}(a) = \sum_{k=1}^{K} s_{a,k}\, h_{u,t,k} \qquad (2)$$
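The weighted sum can be made concrete with the exposure example above (three exposures to Creative 1 and two to Creative 2). The similarity values in the usage note, the function name `fatigue`, and the convention that a creative has similarity 1.0 with itself are assumptions for illustration only.

```python
def fatigue(candidate, history, similarity):
    """Fatigue of a user toward `candidate` as a similarity-weighted exposure count.

    history    : dict creative -> number of past exposures
    similarity : dict (creative_a, creative_b) -> similarity score
    """
    total = 0.0
    for shown, count in history.items():
        if shown == candidate:
            sim = 1.0  # assumed: a creative is fully similar to itself
        else:
            sim = similarity.get(
                (candidate, shown), similarity.get((shown, candidate), 0.0)
            )
        total += sim * count
    return total
```

For instance, with `history = {"c1": 3, "c2": 2}` and hypothetical similarities of 0.5 between `c1` and `c2`, 0.2 between `c1` and `c3`, and 0.1 between `c2` and `c3`, the fatigue toward `c1` is 3 + 0.5 × 2 = 4, while the fatigue toward the never-shown `c3` is only 0.2 × 3 + 0.1 × 2 = 0.8.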
4.2. Calculation of Similarity
Each similarity score between a pair of creatives is calculated as a weighted average of the text similarity and the image similarity. The text similarity is calculated as the cosine similarity of the bag-of-words (BoW) representations of the description texts. We extracted the words from the texts by using MeCab (Matsumoto), which is the most commonly used morphological analyzer for the Japanese language, with the NEologd dictionary (Sato, 2017), which is a frequently updated Japanese dictionary containing up-to-date words. We used the gensim library (Řehůřek and Sojka, 2010) for the BoW calculation.
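The text-similarity step can be sketched with the standard library alone. This is not the pipeline described above: the sketch takes already tokenized text as input (standing in for the output of a morphological analyzer) and uses `collections.Counter` instead of gensim; the function name `bow_cosine` is hypothetical.

```python
import math
from collections import Counter


def bow_cosine(tokens_a, tokens_b):
    """Cosine similarity between the bag-of-words vectors of two token lists."""
    bow_a, bow_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(bow_a[t] * bow_b[t] for t in bow_a.keys() & bow_b.keys())
    norm_a = math.sqrt(sum(c * c for c in bow_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bow_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Two descriptions that share no words score 0, identical descriptions score 1, and partial overlap falls in between.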
The image similarity is also calculated as the cosine similarity of the vector representations of the images. The vector representation is extracted from each image by using a pre-trained MobileNetV2 (Sandler, 2018) implemented in Keras (Chollet and others, 2015). Here, we use the output of the last pooling layer as the representation. We put a three times higher weight on the text similarity because the text similarity (i.e., the BoW representation) is much easier to interpret than the image similarity (i.e., weights of the neural network).
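Combining the two scores with a 3:1 weight and precomputing every pair can be sketched as follows. The function names, the dictionary layout of the table, and the callables passed in for the text and image similarities are assumptions for illustration.

```python
from itertools import combinations


def combined_similarity(text_sim, image_sim, text_weight=3.0, image_weight=1.0):
    """Weighted average with three times more weight on the text similarity."""
    total = text_weight + image_weight
    return (text_weight * text_sim + image_weight * image_sim) / total


def build_similarity_table(creative_ids, text_sim, image_sim):
    """Precompute the combined score for every unordered pair of creatives.

    text_sim, image_sim : callables (id_a, id_b) -> similarity score
    """
    table = {}
    for a, b in combinations(sorted(creative_ids), 2):
        table[(a, b)] = combined_similarity(text_sim(a, b), image_sim(a, b))
    return table
```

A text similarity of 0.8 and an image similarity of 0.4, for example, combine to (3 × 0.8 + 0.4) / 4 = 0.7. Storing the table keyed by sorted pairs means that a lookup at serving time needs no similarity computation.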
Online advertising is an extremely harsh environment. To deliver thousands of ad creatives to numerous users without noticeable delay, the system can spare only a few milliseconds to calculate the fatigue for all the candidates of each impression. To save time, we calculate the similarity scores for each pair of creatives in advance and store them on the server. Another problem is the massive data volume. Since we cannot predict in advance how many users will see how many creatives each day, the ad exposure history data could be very large. We record the histories for each user-advertiser pair. Owing to the extremely high frequency of impressions, recording all the ad creative exposures for all the users could lead to an explosion of the data volume, which needs to be capped at some point. In addition, we need to fill the gap between the logged data and human perception. In the system, multiple impressions of the same ad creative are often recorded within a short time period; however, human users do not perceive them as separate exposures. Therefore, the system records only the first impression of a specific ad creative by a user in each minute. That is, if a user watches the same ad creative more than once within a minute, it is counted as one impression. (This data reduction affects less than 40% of the users.) To further reduce the data volume and to account for forgetting, histories older than 24 hours are deleted from the database.
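The two data-reduction rules (one impression per creative per minute, and a 24-hour retention window) can be sketched with an in-memory store. The class name `ExposureHistory`, the use of calendar-minute buckets, and the timestamps in seconds are assumptions; the production system presumably relies on a database rather than a Python dictionary.

```python
from collections import defaultdict

WINDOW_SECONDS = 24 * 60 * 60


class ExposureHistory:
    """Exposure counts per (user, advertiser), deduplicated per minute."""

    def __init__(self):
        # (user, advertiser) -> creative -> set of minute buckets with an impression
        self._minutes = defaultdict(lambda: defaultdict(set))

    def record(self, user, advertiser, creative, ts):
        # Repeated impressions within the same minute fall into the same bucket.
        self._minutes[(user, advertiser)][creative].add(int(ts // 60))

    def counts(self, user, advertiser, now):
        """Drop buckets older than 24 hours and return exposure counts."""
        cutoff = now - WINDOW_SECONDS
        fresh = {}
        for creative, minutes in self._minutes[(user, advertiser)].items():
            kept = {m for m in minutes if m * 60 >= cutoff}
            if kept:
                fresh[creative] = kept
        self._minutes[(user, advertiser)] = defaultdict(set, fresh)
        return {creative: len(kept) for creative, kept in fresh.items()}
```

Four impressions of one creative at seconds 0, 10, 59, and 60 are counted as two exposures, because the first three fall into the same minute.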
5. Fatigue Bandit
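Our proposed algorithm is a simple extension of the contextual bandit-based ad creative selection that incorporates fatigue parameters into its reward predictor:

$$f(a, x, F) = \sigma\left(\mu^\top x + w_a^\top x + \gamma_1 F + \gamma_2 F^2\right) \qquad (3)$$

where $\mu^\top x + w_a^\top x$ is the linear score of the predictor in equation (1), $F$ is the level of fatigue of the user with respect to creative $a$ defined in Section 4.1, and $\gamma_1$ and $\gamma_2$ are the weights of the fatigue terms.
To allow for the inverse-U shaped relationship described in Fig. 2, $F$ is included in the function in quadratic form. For convenience, we call the new algorithm Fatigue Bandit, or FB, hereinafter. FB chooses an action (ad creative) according to the expected CTRs calculated in (3). The whole procedure is described in Algorithm 2.
The only difference from equation (1) is the inclusion of $F$. Intuitively, with the fatigue parameter, the predictor lowers the predicted CTR when the level of fatigue is high (i.e., the chosen creative or similar ad creatives have already been seen by the user many times).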
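The effect of a quadratic fatigue term on the predicted CTR can be illustrated numerically. The coefficient names `gamma1` and `gamma2`, their values, and the assumption that the term is simply added to the linear score are hypothetical choices for this sketch and are not estimates from the experiment.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fatigue_aware_ctr(linear_score, fatigue, gamma1, gamma2):
    """Predicted CTR with linear and squared fatigue terms added to the score."""
    return sigmoid(linear_score + gamma1 * fatigue + gamma2 * fatigue ** 2)


def peak_fatigue(gamma1, gamma2):
    """Fatigue level maximizing the score; requires gamma2 < 0 (inverse-U)."""
    if gamma2 >= 0:
        raise ValueError("no interior maximum unless gamma2 is negative")
    return -gamma1 / (2.0 * gamma2)
```

With `gamma1 = 0.4` and `gamma2 = -0.05`, the predicted CTR rises until the fatigue level reaches 4 (wear-in) and falls afterwards (wear-out), so a creative that the user has seen too often loses the argmax to a fresher one.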
6. Experiments
The evaluation of non-stationary bandit algorithms is very hard. Since the reward is changed by the system's past interventions, offline evaluation is not very meaningful. Hence, we run an online experiment to examine the effectiveness of the proposed algorithm. In this section, we first describe the settings of the experiment and then show the results.
6.1. Settings
Three advertising campaigns for three different mobile game titles are selected for the experiment according to various conditions. (We used four campaigns, but it turned out that one of them had only one creative.) A part of the impressions for these campaigns is used for the experiment. The sample size of the training data reaches a maximum of one million a day with hundreds of thousands of features. To deal with the size and the sparsity of the training data, we employ negative down-sampling and the hashing trick. We set the sampling rate for negative data to 5% and limit the length of features to for the system without fatigue parameters (CB) and for FB.
These hyperparameters are tuned using the replay method (Li et al., 2010) for CB. FB shares all the above settings except for the inclusion of the fatigue parameter.
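The idea behind replay-style offline evaluation can be sketched as follows. This is a simplified version for a fixed policy, assuming the logged actions were chosen uniformly at random; the event layout and the function name `replay_ctr` are hypothetical, and the full method of Li et al. (2010) also feeds the matched events back to the learning algorithm.

```python
def replay_ctr(logged_events, policy):
    """Estimate a policy's CTR from logs collected by a random logging policy.

    logged_events : iterable of (context, logged_action, reward) tuples
    policy        : callable context -> action
    Only events where the policy agrees with the logged action are used.
    """
    matched, clicks = 0, 0
    for context, logged_action, reward in logged_events:
        if policy(context) == logged_action:
            matched += 1
            clicks += reward
    ctr = clicks / matched if matched else float("nan")
    return ctr, matched
```

Candidate hyperparameter settings can then be compared by the replayed CTR of the policies they produce, keeping in mind that the estimate becomes noisy when few events match.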
CB (Algorithm 1) and FB (Algorithm 2) are tested against randomly chosen ad creatives (Random algorithm) in a standard A/B test. That is, the three algorithms share the impressions equally based on user IDs and update their parameters based on the logs they generate. The experiment ran for one week.
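A split of traffic by user ID can be sketched with a stable hash, so that the same user always lands in the same arm. The use of MD5, the arm labels, and the function name `assign_arm` are assumptions; the text does not describe the actual assignment mechanism.

```python
import hashlib

ARMS = ("Random", "CB", "FB")


def assign_arm(user_id):
    """Deterministically map a user id to one of the three algorithms."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return ARMS[int(digest, 16) % len(ARMS)]
```

Because the assignment depends only on the user ID, each algorithm learns exclusively from the logs of its own users, and a user's exposure history is never mixed across algorithms.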
6.2. Main Results
Table 1 shows the overall results of the experiments. Each KPI is normalized with respect to the result of the Random algorithm (i.e., each KPI for FB and CB is divided by the corresponding KPI of Random). Fatigue bandit outperformed both the contextual bandit and the Random algorithm in CTR (click-through rate), CVR (conversion rate, the share of post-click conversions in all impressions), and post-impression CVR, which includes non-click conversions. The results clearly show that the proposed algorithm successfully increased both clicks and conversions. The contextual bandit algorithm outperforms FB in post-click CVR. That is, while CB collects fewer clicks than FB, CB is more likely than FB to get a conversion once it gets a click. However, the difference is not statistically significant.
Table 1. Overall results by algorithm (columns: Alg., Impressions, CTR, CVR, Post-Click CVR, Post-Imp CVR).
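The normalization used in Table 1 can be illustrated with made-up counts. The numbers in the usage note are purely hypothetical and are not results from the experiment, and the exact KPI definitions in the code (in particular, post-click CVR as conversions per click and post-impression CVR as all conversions per impression) are assumptions inferred from the description above.

```python
def kpis(impressions, clicks, post_click_conv, post_view_conv):
    """Compute the four KPIs from raw counts."""
    return {
        "CTR": clicks / impressions,
        "CVR": post_click_conv / impressions,
        "Post-Click CVR": post_click_conv / clicks,
        "Post-Imp CVR": (post_click_conv + post_view_conv) / impressions,
    }


def normalize(kpi, baseline):
    """Divide each KPI by the corresponding KPI of the baseline (Random)."""
    return {name: kpi[name] / baseline[name] for name in kpi}
```

For example, `normalize(kpis(1000, 15, 3, 3), kpis(1000, 10, 2, 1))` gives a normalized CTR and CVR of 1.5, a post-click CVR of 1.0, and a post-impression CVR of 2.0, so values above 1 indicate an improvement over Random.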
6.3. Heterogeneity in campaigns
To further examine the results, Table 2 shows the CTRs for each campaign. The detailed results reveal that the baseline contextual bandit (CB) is not always superior to Random. As for conversion rates, the performance of the contextual bandit is poor in campaign A. On the other hand, our fatigue bandit stably outperforms the contextual bandit and Random. The instability of the contextual bandit possibly comes from the data volume, as campaign A has the smallest amount of data. With the combination of a small training set and sparsity, it is hard to learn from the data correctly. In a real-world production environment, the data volume cannot be controlled by experimenters. Still, the fatigue bandit algorithm successfully learned from the data.
7. Discussion
To see how the fatigue bandit worked in the experiment, Fig. 3 shows the relation between the number of ad exposures and the level of fatigue. Apparently, users accumulate little fatigue under the Random algorithm (dashed line). The Random algorithm just distributes ad creatives evenly regardless of their performance, while the two bandit algorithms choose “better” ad creatives to maximize clicks and conversions. We see a clear difference between the fatigue bandit (dotted line) and the baseline contextual bandit (solid line). The fatigue bandit tries to limit the accumulation of fatigue by changing ad creatives when the level of fatigue is high. The fatigue bandit indeed works as we expected.
Next, we check the association between the level of fatigue and the KPIs (CTR and CVR). Fig. 4 shows the local linear regressions of the KPIs on fatigue. To eliminate the bias generated by the bandit algorithms, we use logs from the Random algorithm only. The two panels show the difference between CTR and CVR. While CTR constantly decreases with fatigue, CVR shows a more complex relationship with fatigue. In other words, we see only wear-out effects of ad exposures on CTR but find both wear-in and wear-out effects for CVR. A possible explanation is that, while the audience tends to click on new and fresh ad creatives out of simple curiosity, they do not log in to the game until they understand the message conveyed by the ad creative after repeated exposures. Hence, it is possible that the DSP should use different algorithms for different KPIs. This could explain why CB is better than FB in terms of post-click CVR: CB tends to show the same ad creatives consistently and has a better chance of getting conversions once it gets clicks. Overall performance (CTR and CVR), however, is in favor of FB.
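A local linear regression of the kind used for Fig. 4 can be sketched in plain Python. The Gaussian kernel, the bandwidth, and the function name `local_linear` are assumptions, since the text does not specify the kernel or bandwidth actually used.

```python
import math


def local_linear(xs, ys, x0, bandwidth):
    """Kernel-weighted linear fit around x0, evaluated at x0."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y + slope * (x0 - mean_x)
```

Evaluating the function on a grid of fatigue levels, with the per-impression click or conversion indicator as `ys`, traces a smooth curve; a monotonically decreasing curve corresponds to pure wear-out, while an inverse-U curve indicates wear-in followed by wear-out.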
8. Conclusion
In this paper, we proposed a new contextual bandit-based ad creative selection algorithm that explicitly considers the wear-in and wear-out effects caused by repetitive exposure to ads. We introduced a numerical measure of the level of awareness/fatigue and set up an easy-to-implement and efficient algorithm that changes the ad creative according to the user's level of fatigue. The proposed algorithm was applied to a running DSP in a production environment and experimentally deployed for one week. The results show the superiority of the proposed algorithm over the baseline contextual bandit algorithm.
Acknowledgements. The authors thank the Dynalyst team of CyberAgent, especially Hidetoshi Kawase, for providing immense support in performing the experiments, and Kazuki Taniguchi, Masahiro Nomura, Kota Yamaguchi and Mayu Otani (AILab), Takanori Maehara (RIKEN), and Jumpei Komiyama (U Tokyo) for helpful advice. They also thank the anonymous referees of a previous version of the manuscript.
References
- Bishop (2006). Pattern recognition and machine learning. Springer.
- Cao and Sun (2019). Dynamic Learning of Sequential Choice Bandit Problem under Marketing Fatigue. Proceedings of the AAAI Conference on Artificial Intelligence 33, pp. 3264–3271.
- Chakrabarti et al. (2009). Mortal multi-armed bandits. In Advances in Neural Information Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou (Eds.), pp. 273–280.
- Chapelle and Li (2011). An Empirical Evaluation of Thompson Sampling. In Advances in Neural Information Processing Systems, Granada, Spain, pp. 2249–2257.
- Chen et al. (2016). The effects of creativity on advertising wear-in and wear-out. Journal of the Academy of Marketing Science 44 (3), pp. 334–349.
- Chollet and others (2015). Keras.
- Komiyama and Qin (2014). Time-Decaying Bandits for Non-stationary Systems. In Web and Internet Economics, T. Liu, Q. Qi, and Y. Ye (Eds.), Vol. 8877, pp. 460–466.
- Lei et al. (2017). An Actor-Critic Contextual Bandit Algorithm for Personalized Mobile Health Interventions. arXiv:1706.09090 [cs, stat].
- Levine et al. (2017). Rotting Bandits. Advances in Neural Information Processing Systems 30.
- Lewis (2015). Worn-Out or Just Getting Started? The Impact of Frequency in Online Display Advertising. Boston, Massachusetts, USA.
- Li et al. (2010). A Contextual-Bandit Approach to Personalized News Article Recommendation. Proceedings of the 19th International Conference on World Wide Web - WWW ’10, pp. 661. arXiv:1003.0146.
- Matsumoto. Japanese morphological analysis system ChaSen version 2.0 manual.
- Pechman and Stewart (1988). Advertising Repetition: A Critical Review of Wearin and Wearout. Current Issues and Research in Advertising 11 (1-2), pp. 285–329.
- Řehůřek and Sojka (2010). Software framework for topic modelling with large corpora.
- Sandler (2018). MobileNetV2: Inverted residuals and linear bottlenecks. arXiv:1801.04381 [cs].
- Sato (2017). Implementation of a word segmentation dictionary called mecab-ipadic-neologd and study on how to use it effectively for information retrieval (in Japanese).
- Schmidt and Eisend (2015). Advertising Repetition: A Meta-Analysis on Effective Frequency in Advertising. Journal of Advertising 44 (4), pp. 415–428.
- Tang et al. (2013). Automatic ad format selection via contextual bandits. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management - CIKM ’13, San Francisco, California, USA, pp. 1587–1594.
- Tom (2019). The Global Games Market Will Generate $152.1 Billion in 2019 as the U.S. Overtakes China as the Biggest Market.
- Weinberger et al. (2009). Feature hashing for large scale multitask learning. In Proceedings of the 26th Annual International Conference on Machine Learning - ICML ’09, Montreal, Quebec, Canada, pp. 1–8.