Revised Progressive-Hedging-Algorithm Based Two-layer Solution Scheme for Bayesian Reinforcement Learning

06/21/2019
by   Xin Huang, et al.
0

Stochastic control with both inherent random system noise and lack of knowledge on system parameters constitutes the core and fundamental topic in reinforcement learning (RL), especially under non-episodic situations where online learning is much more demanding. This challenge has been notably addressed in Bayesian RL recently where some approximation techniques have been developed to find suboptimal policies. While existing approaches mainly focus on approximating the value function, or on involving Thompson sampling, we propose a novel two-layer solution scheme in this paper to approximate the optimal policy directly, by combining the time-decomposition based dynamic programming (DP) at the lower layer and the scenario-decomposition based revised progressive hedging algorithm (PHA) at the upper layer, for a type of Bayesian RL problem. The key feature of our approach is to separate reducible system uncertainty from irreducible one at two different layers, thus decomposing and conquering. We demonstrate our solution framework more especially via the linear-quadratic-Gaussian problem with unknown gain, which, although seemingly simple, has been a notorious subject over more than half century in dual control.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/26/2019

Approximate Dynamic Programming For Linear Systems with State and Input Constraints

Enforcing state and input constraints during reinforcement learning (RL)...
research
11/14/2019

A Reduction from Reinforcement Learning to No-Regret Online Learning

We present a reduction from reinforcement learning (RL) to no-regret onl...
research
02/28/2020

Mixed Reinforcement Learning with Additive Stochastic Uncertainty

Reinforcement learning (RL) methods often rely on massive exploration da...
research
11/02/2020

Reinforcement Learning of Structured Control for Linear Systems with Unknown State Matrix

This paper delves into designing stabilizing feedback control gains for ...
research
04/04/2023

Online augmentation of learned grasp sequence policies for more adaptable and data-efficient in-hand manipulation

When using a tool, the grasps used for picking it up, reposing, and hold...
research
10/13/2015

Dual Control for Approximate Bayesian Reinforcement Learning

Control of non-episodic, finite-horizon dynamical systems with uncertain...
research
06/16/2020

Online Reinforcement Learning Control by Direct Heuristic Dynamic Programming: from Time-Driven to Event-Driven

In this paper time-driven learning refers to the machine learning method...

Please sign up or login with your details

Forgot password? Click here to reset