Learning to Bid in Contextual First Price Auctions

09/07/2021
by Ashwinkumar Badanidiyuru, et al.

In this paper, we investigate the problem of how to bid in repeated contextual first price auctions. We consider a single bidder (learner) who repeatedly bids in first price auctions: at each time t, the learner observes a context x_t∈ℝ^d and decides her bid based on historical information and x_t. We assume a structured linear model for the maximum bid of all other bidders, m_t = α_0· x_t + z_t, where α_0∈ℝ^d is unknown to the learner and z_t is randomly sampled from a noise distribution ℱ with log-concave density function f. We consider both binary feedback (the learner observes only whether she wins or not) and full information feedback (the learner observes m_t) at the end of each time t. For binary feedback, when the noise distribution ℱ is known, we propose a bidding algorithm that uses the maximum likelihood estimation (MLE) method and achieves regret at most O(√(log(d) T)). Moreover, we generalize this algorithm to the binary feedback setting in which the noise distribution is unknown but belongs to a parametrized family of distributions. For full information feedback with an unknown noise distribution, we provide an algorithm that achieves regret at most O(√(dT)). Our approach combines an estimator for log-concave density functions with the MLE method to learn the noise distribution ℱ and the linear weight α_0 simultaneously. We also prove a lower bound: any bidding policy in a broad class must incur regret at least Ω(√(T)), even when the learner receives full information feedback and ℱ is known.
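The model in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it only simulates one round of m_t = α_0· x_t + z_t with binary feedback and computes the bid a learner would place if α_0 and ℱ were already known. Several pieces are assumptions made for illustration and do not come from the abstract: the learner has a private value v and utility (v − b) when she wins, the noise is standard logistic (one example of a log-concave density), and the function names are hypothetical.

```python
import math
import random


def logistic_cdf(z):
    """CDF of the standard logistic distribution (assumed noise model)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def expected_utility(bid, value, mean_other):
    """Assumed utility: (value - bid) times the probability that bid >= m_t."""
    return (value - bid) * logistic_cdf(bid - mean_other)


def best_bid(value, mean_other, iters=200):
    """Ternary search for the bid maximizing expected utility.

    A log-concave density gives a log-concave CDF, so the expected
    utility is unimodal in the bid for bids below the value.
    """
    lo, hi = value - 40.0, value
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if expected_utility(a, value, mean_other) < expected_utility(b, value, mean_other):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2.0


def simulate_round(context, alpha, bid, rng):
    """Draw m_t = alpha . x_t + z_t and return (m_t, win) for the given bid."""
    u = rng.uniform(1e-12, 1.0 - 1e-12)
    noise = math.log(u / (1.0 - u))  # inverse-CDF sample of logistic noise
    m_t = sum(a * x for a, x in zip(alpha, context)) + noise
    return m_t, bid >= m_t


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_0 = [0.5, 0.25]   # hypothetical weights, unknown to a real learner
    x_t = [1.0, 2.0]        # hypothetical context
    mean_other = sum(a * x for a, x in zip(alpha_0, x_t))
    bid = best_bid(value=3.0, mean_other=mean_other)
    m_t, win = simulate_round(x_t, alpha_0, bid, rng)
    # Binary feedback reveals only `win`; full information also reveals m_t.
    print(f"bid={bid:.3f} win={win} m_t={m_t:.3f}")
```

Under binary feedback the learner would see only the `win` flag from `simulate_round`, whereas under full information feedback she would also see `m_t`. A learning algorithm would replace the known `alpha_0` with an estimate, for example one obtained by MLE as the abstract describes.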

research
02/21/2023

Repeated Bilateral Trade Against a Smoothed Adversary

We study repeated bilateral trade where an adaptive σ-smooth adversary g...
research
03/03/2020

Contextual Search for General Hypothesis Classes

We study a general version of the problem of online learning under binar...
research
05/29/2021

On the Theory of Reinforcement Learning with Once-per-Episode Feedback

We study a theory of reinforcement learning (RL) in which the learner re...
research
05/19/2023

Tester-Learners for Halfspaces: Universal Algorithms

We give the first tester-learner for halfspaces that succeeds universall...
research
06/15/2022

Corruption-Robust Contextual Search through Density Updates

We study the problem of contextual search in the adversarial noise model...
research
02/20/2015

Contextual Semibandits via Supervised Learning Oracles

We study an online decision making problem where on each round a learner...
research
06/06/2019

Stochastic Bandits with Context Distributions

We introduce a novel stochastic contextual bandit model, where at each s...