Smoothed Dual Embedding Control

12/29/2017
by Bo Dai, et al.

We revisit the Bellman optimality equation with Nesterov's smoothing technique and, using Fenchel duality, provide a unique saddle-point optimization perspective on the policy optimization problem in reinforcement learning. A new reinforcement learning algorithm, called Smoothed Dual Embedding Control (SDEC), is derived to solve the saddle-point reformulation with arbitrary learnable function approximators. The algorithm bypasses the policy evaluation step of policy optimization in a principled way and extends naturally to multi-step bootstrapping and eligibility traces. We provide a PAC-learning bound on the number of samples needed from a single off-policy sample path, and also characterize the convergence of the algorithm. Finally, we show that the algorithm compares favorably to state-of-the-art baselines on several benchmark control problems.
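
To make the smoothing idea concrete, here is a minimal tabular sketch. It is not SDEC itself and does not reproduce the saddle-point reformulation. It assumes the smoothing takes the form of an entropy regularizer with weight `lam`, which turns the hard max over actions into a log-sum-exp; the abstract does not state the regularizer, and the toy MDP (`R`, `P`, `gamma`) and all function names are made up for illustration.

```python
import numpy as np

def hard_backup(V, R, P, gamma):
    """Standard Bellman optimality backup: max over actions."""
    Q = R + gamma * P @ V                      # shape (S, A)
    return Q.max(axis=1)

def smoothed_backup(V, R, P, gamma, lam):
    """Entropy-smoothed backup (assumed form): lam * log sum_a exp(Q / lam)."""
    Q = R + gamma * P @ V                      # shape (S, A)
    m = Q.max(axis=1)                          # subtract the max for numerical stability
    return m + lam * np.log(np.exp((Q - m[:, None]) / lam).sum(axis=1))

def solve(backup, n_states, iters=500):
    """Fixed-point iteration; both backups are gamma-contractions."""
    V = np.zeros(n_states)
    for _ in range(iters):
        V = backup(V)
    return V

# Hypothetical 2-state, 2-action MDP used only for this illustration.
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])                     # R[s, a]
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.2, 0.8]]])       # P[s, a, s']
gamma = 0.9

V_hard = solve(lambda V: hard_backup(V, R, P, gamma), 2)
V_soft = {
    lam: solve(lambda V: smoothed_backup(V, R, P, gamma, lam), 2)
    for lam in (1.0, 0.1, 0.01)
}

for lam, V in V_soft.items():
    print(f"lam={lam}: max gap to hard-max solution = {np.max(V - V_hard):.4f}")
```

Because log-sum-exp lies between the max and the max plus `lam * log(A)`, the smoothed fixed point overestimates the unsmoothed one by at most `lam * log(A) / (1 - gamma)`, so the gap shrinks as `lam` goes to zero. The smoothed operator is differentiable in `V`, which is the property that makes a duality-based reformulation tractable.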


