Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature

02/08/2021
by   Kefan Dong, et al.
0

This paper studies model-based bandit and reinforcement learning (RL) with nonlinear function approximations. We propose to study convergence to approximate local maxima because we show that global convergence is statistically intractable even for one-layer neural net bandit with a deterministic reward. For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL), which provably converges to a local maximum with sample complexity that only depends on the sequential Rademacher complexity of the model class. Our results imply novel global or local regret bounds on several concrete settings such as linear bandit with finite or sparse model class, and two-layer neural net bandit. A key algorithmic insight is that optimism may lead to over-exploration even for two-layer neural net model class. On the other hand, for convergence to local maxima, it suffices to maximize the virtual return if the model can also reasonably predict the size of the gradient and Hessian of the real return.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/15/2021

PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration

Model-based Reinforcement Learning (RL) is a popular learning paradigm d...
research
05/15/2023

Uniform-PAC Guarantees for Model-Based RL with Bounded Eluder Dimension

Recently, there has been remarkable progress in reinforcement learning (...
research
07/14/2021

Going Beyond Linear RL: Sample Efficient Neural Function Approximation

Deep Reinforcement Learning (RL) powered by neural net approximation of ...
research
07/09/2021

Optimal Gradient-based Algorithms for Non-concave Bandit Optimization

Bandit problems with linear or concave reward have been extensively stud...
research
04/14/2023

Bandit-Based Policy Invariant Explicit Shaping for Incorporating External Advice in Reinforcement Learning

A key challenge for a reinforcement learning (RL) agent is to incorporat...
research
02/14/2012

Fractional Moments on Bandit Problems

Reinforcement learning addresses the dilemma between exploration to find...

Please sign up or login with your details

Forgot password? Click here to reset