On the Complexity of Adversarial Decision Making

06/27/2022
by Dylan J. Foster, et al.

A central problem in online learning and decision making – from bandits to reinforcement learning – is to understand what modeling assumptions lead to sample-efficient learning guarantees. We consider a general adversarial decision making framework that encompasses (structured) bandit problems with adversarial rewards and reinforcement learning problems with adversarial dynamics. Our main result is to show – via new upper and lower bounds – that the Decision-Estimation Coefficient, a complexity measure introduced by Foster et al. in the stochastic counterpart to our setting, is necessary and sufficient to obtain low regret for adversarial decision making. However, compared to the stochastic setting, one must apply the Decision-Estimation Coefficient to the convex hull of the class of models (or hypotheses) under consideration. This establishes that the price of accommodating adversarial rewards or dynamics is governed by the behavior of the model class under convexification, and recovers a number of existing results, both positive and negative. En route to obtaining these guarantees, we provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures, including the Information Ratio of Russo and Van Roy and the Exploration-by-Optimization objective of Lattimore and György.
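
For readers unfamiliar with the quantity named above, the following is a sketch of how the Decision-Estimation Coefficient is usually written in the stochastic framework of Foster et al. The abstract itself introduces no notation, so every symbol here is an assumption about notation: \mathcal{M} is the model class, \bar{M} a reference model, \Pi the decision space, \Delta(\Pi) the set of distributions over decisions, f^{M}(\pi) the expected reward of decision \pi under model M, \pi_{M} an optimal decision for M, D_{\mathrm{H}} the Hellinger distance, and \gamma > 0 a scale parameter.

```latex
\[
\mathsf{dec}_{\gamma}(\mathcal{M}, \bar{M})
  = \inf_{p \in \Delta(\Pi)} \, \sup_{M \in \mathcal{M}} \,
    \mathbb{E}_{\pi \sim p}\Big[
      f^{M}(\pi_{M}) - f^{M}(\pi)
      - \gamma \cdot D_{\mathrm{H}}^{2}\big(M(\pi), \bar{M}(\pi)\big)
    \Big]
\]
```

Read this way, the abstract's main claim is that in the adversarial setting the same quantity is evaluated with the convex hull \mathrm{co}(\mathcal{M}) in place of \mathcal{M}. The precise regret bounds and constants are given in the full paper and are not reproduced here.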


