Unified Algorithms for RL with Decision-Estimation Coefficients: No-Regret, PAC, and Reward-Free Learning

09/23/2022
by Fan Chen, et al.

Finding unified complexity measures and algorithms for sample-efficient learning is a central topic of research in reinforcement learning (RL). The Decision-Estimation Coefficient (DEC) was recently proposed by Foster et al. (2021) as a necessary and sufficient complexity measure for sample-efficient no-regret RL. This paper makes progress towards a unified theory for RL with the DEC framework. First, we propose two new DEC-type complexity measures: Explorative DEC (EDEC) and Reward-Free DEC (RFDEC). We show that they are necessary and sufficient for sample-efficient PAC learning and reward-free learning, respectively, thereby extending the original DEC, which only captures no-regret learning. Next, we design new unified sample-efficient algorithms for all three learning goals. Our algorithms instantiate variants of the Estimation-To-Decisions (E2D) meta-algorithm with a strong and general model estimation subroutine. Even in the no-regret setting, our algorithm E2D-TA improves upon the algorithms of Foster et al. (2021), which require either bounding a variant of the DEC that may be prohibitively large or designing problem-specific estimation subroutines. As applications, we recover existing sample-efficient learning results and obtain new ones for a wide range of tractable RL problems using essentially a single algorithm. We also generalize the DEC to give sample-efficient algorithms for all-policy model estimation, with applications to learning equilibria in Markov Games. Finally, as a connection, we re-analyze two existing optimistic model-based algorithms based on Posterior Sampling or Maximum Likelihood Estimation, showing that they enjoy regret bounds similar to those of E2D-TA under structural conditions similar to those of the DEC.

research
09/29/2022

Partially Observable RL with B-Stability: Unified Structural Condition and Sharp Sample-Efficient Algorithms

Partial Observability – where agents can only observe partial informatio...
research
11/16/2020

Reward Biased Maximum Likelihood Estimation for Reinforcement Learning

The principle of Reward-Biased Maximum Likelihood Estimate Based Adaptiv...
research
11/25/2022

A Note on Model-Free Reinforcement Learning with the Decision-Estimation Coefficient

We consider the problem of interactive decision making, encompassing str...
research
01/19/2023

Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient

A foundational problem in reinforcement learning and interactive decisio...
research
02/01/2021

Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms

Finding the minimal structural assumptions that empower sample-efficient...
research
06/27/2022

On the Complexity of Adversarial Decision Making

A central problem in online learning and decision making – from bandits ...
research
06/09/2023

A Unified Model and Dimension for Interactive Estimation

We study an abstract framework for interactive learning called interacti...
