
Adversarial Gradient Driven Exploration for Deep Click-Through Rate Prediction

by Kailun Wu, et al.

Data-driven deep neural models have shown remarkable progress on Click-Through Rate (CTR) prediction. Unfortunately, such models may become ineffective when data are insufficient. To handle this issue, researchers often adopt exploration strategies, e.g., UCB or Thompson Sampling, to examine items based on the estimated reward. In the context of Exploitation-and-Exploration for CTR prediction, recent studies have attempted to use the prediction uncertainty along with the model prediction as the reward score. However, we argue that such an approach may make the final ranking score deviate from the original distribution, and thereby affect model performance in the online system. In this paper, we propose a novel exploration method called Adversarial Gradient Driven Exploration (AGE). Specifically, we propose a Pseudo-Exploration Module to simulate the gradient updating process, which can approximate the influence that samples of to-be-explored items would have on the model. In addition, for better exploration efficiency, we propose a Dynamic Threshold Unit to eliminate the effects of samples with low potential CTR. The effectiveness of our approach was demonstrated on an open-access academic dataset. Meanwhile, AGE has also been deployed in a real-world display advertising platform, and all online metrics have been significantly improved.
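For readers unfamiliar with the baseline strategies the abstract mentions, the following is a minimal sketch of UCB-style and Thompson Sampling scoring on simple click counts. It is not the AGE method and not the authors' code; the item names, counts, the constant `c`, and the Beta(1, 1) prior are all assumptions chosen for illustration.

```python
import math
import random


def ucb_score(clicks, impressions, total_impressions, c=1.0):
    """UCB1-style score: empirical CTR plus an uncertainty bonus."""
    if impressions == 0:
        return float("inf")  # items never shown are explored first
    mean_ctr = clicks / impressions
    bonus = c * math.sqrt(2.0 * math.log(total_impressions) / impressions)
    return mean_ctr + bonus


def thompson_score(clicks, impressions, rng, alpha0=1.0, beta0=1.0):
    """Draw one plausible CTR from a Beta posterior over the click rate."""
    return rng.betavariate(alpha0 + clicks, beta0 + impressions - clicks)


# Hypothetical (clicks, impressions) per item.
items = {"ad_a": (50, 1000), "ad_b": (2, 20), "ad_c": (0, 0)}
total = sum(n for _, n in items.values())

ucb = {name: ucb_score(k, n, total) for name, (k, n) in items.items()}

rng = random.Random(0)  # fixed seed so the draws are repeatable
ts = {name: thompson_score(k, n, rng) for name, (k, n) in items.items()}

for name in items:
    print(name, ucb[name], ts[name])
```

In this sketch the UCB score of a rarely shown item can exceed 1, so it no longer reads as a click probability. This is one simple way to see the concern raised in the abstract that uncertainty-augmented scores may deviate from the original prediction distribution.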




Exploration in Online Advertising Systems with Deep Uncertainty-Aware Learning

Modern online advertising systems inevitably rely on personalization met...

Exploration with Model Uncertainty at Extreme Scale in Real-Time Bidding

In this work, we present a scalable and efficient system for exploring t...

Regularized Adversarial Sampling and Deep Time-aware Attention for Click-Through Rate Prediction

Improving the performance of click-through rate (CTR) prediction remains...

Action for Better Prediction

Good prediction is necessary for autonomous robotics to make informed de...

Contextual User Browsing Bandits for Large-Scale Online Mobile Recommendation

Online recommendation services recommend multiple commodities to users. ...

GuideBoot: Guided Bootstrap for Deep Contextual Bandits

The exploration/exploitation (E&E) dilemma lies at the core of interac...

Never Forget: Balancing Exploration and Exploitation via Learning Optical Flow

Exploration bonus derived from the novelty of the states in an environme...