Learning the Linear Quadratic Regulator from Nonlinear Observations

10/08/2020
by Zakaria Mhammedi, et al.

We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR. In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs, but the agent operates on high-dimensional, nonlinear observations such as images from a camera. To enable sample-efficient learning, we assume that the learner has access to a class of decoder functions (e.g., neural networks) that is flexible enough to capture the mapping from observations to latent states. We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class. RichID is oracle-efficient and accesses the decoder class only through calls to a least-squares regression oracle. Our results constitute the first provable sample complexity guarantee for continuous control with an unknown nonlinearity in the system model and general function approximation.
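To make the setting concrete, here is a minimal sketch in Python with NumPy of an environment of the kind the abstract describes: a low-dimensional latent state with linear dynamics and quadratic costs, seen only through a nonlinear, higher-dimensional observation. It is not RichID and does not learn anything. The matrices `A`, `B`, `Q`, `R`, the lift `W`, the cubic observation map `emit`, and the hand-written `decode` are all hypothetical choices made for illustration. The decoder is the exact inverse of the observation map, standing in for what a learned decoder would ideally recover, and the gain comes from a standard Riccati iteration for the known latent system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent system; every numeric value here is illustrative.
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # latent linear dynamics
B = np.array([[0.0], [0.1]])             # control input matrix
Q = np.eye(2)                            # quadratic state cost
R = np.array([[0.1]])                    # quadratic control cost
W = rng.standard_normal((8, 2))          # lifts the 2-D state to 8-D

def emit(x):
    """Nonlinear observation: elementwise cube of a linear lift."""
    return (W @ x) ** 3

def decode(y):
    """Exact inverse of emit; a stand-in for a learned decoder."""
    return np.linalg.pinv(W) @ np.cbrt(y)

def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion and return the gain."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ G)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def rollout(K, T=200, noise_std=0.01):
    """Average quadratic cost when the policy only ever sees observations."""
    x = np.array([1.0, 0.0])
    total = 0.0
    for _ in range(T):
        y = emit(x)                # the agent receives y, never x
        u = -K @ decode(y)         # linear feedback on the decoded state
        total += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + noise_std * rng.standard_normal(2)
    return total / T

K = lqr_gain(A, B, Q, R)
print("average cost:", rollout(K))
```

With a perfect decoder the problem reduces to ordinary LQR on the decoded state. The difficulty the abstract targets is that neither the decoder nor the latent dynamics is known in advance, and both must be learned from observations.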


research
05/27/2022

Provably Sample-Efficient RL with Side Information about Latent Dynamics

We study reinforcement learning (RL) in settings where observations are ...
research
12/30/2022

Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?

We study the task of learning state representations from potentially hig...
research
06/29/2020

Extracting Latent State Representations with Linear Dynamics from Rich Observations

Recently, many reinforcement learning techniques were shown to have prov...
research
06/09/2022

Sample-Efficient Reinforcement Learning in the Presence of Exogenous Information

In real-world reinforcement learning applications the learner's observat...
research
08/17/2022

Nearly Optimal Latent State Decoding in Block MDPs

We investigate the problems of model estimation and reward-free learning...
research
06/14/2019

Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle

Q-learning with function approximation is one of the most popular method...
research
10/17/2021

Provable RL with Exogenous Distractors via Multistep Inverse Dynamics

Many real-world applications of reinforcement learning (RL) require the ...
