Bayes-CPACE: PAC Optimal Exploration in Continuous Space Bayes-Adaptive Markov Decision Processes

10/06/2018
by   Gilwoo Lee, et al.
0

We present the first PAC optimal algorithm for Bayes-Adaptive Markov Decision Processes (BAMDPs) in continuous state and action spaces, to the best of our knowledge. The BAMDP framework elegantly addresses model uncertainty by incorporating Bayesian belief updates into long-term expected return. However, computing an exact optimal Bayesian policy is intractable. Our key insight is to compute a near-optimal value function by covering the continuous state-belief-action space with a finite set of representative samples and exploiting the Lipschitz continuity of the value function. We prove the near-optimality of our algorithm and analyze a number of schemes that boost the algorithm's efficiency. Finally, we empirically validate our approach on a number of discrete and continuous BAMDPs and show that the learned policy has consistently competitive performance against baseline approaches.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/01/2018

Bayesian Policy Optimization for Model Uncertainty

Addressing uncertainty is critical for autonomous systems to robustly ad...
research
12/07/2022

Tight Performance Guarantees of Imitator Policies with Continuous Actions

Behavioral Cloning (BC) aims at learning a policy that mimics the behavi...
research
11/09/2018

Performance Guarantees for Homomorphisms Beyond Markov Decision Processes

Most real-world problems have huge state and/or action spaces. Therefore...
research
06/30/2011

Finding Approximate POMDP solutions Through Belief Compression

Standard value function approaches to finding policies for Partially Obs...
research
04/05/2016

Bounded Optimal Exploration in MDP

Within the framework of probably approximately correct Markov decision p...
research
03/03/2019

State-Continuity Approximation of Markov Decision Processes via Finite Element Analysis for Autonomous System Planning

Motion planning under uncertainty for an autonomous system can be formul...
research
02/07/2020

Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts

Informed and robust decision making in the face of uncertainty is critic...

Please sign up or login with your details

Forgot password? Click here to reset