Learning the Linear Quadratic Regulator from Nonlinear Observations

10/08/2020
by   Zakaria Mhammedi, et al.
4

We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR. In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs, but the agent operates on high-dimensional, nonlinear observations such as images from a camera. To enable sample-efficient learning, we assume that the learner has access to a class of decoder functions (e.g., neural networks) that is flexible enough to capture the mapping from observations to latent states. We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class. RichID is oracle-efficient and accesses the decoder class only through calls to a least-squares regression oracle. Our results constitute the first provable sample complexity guarantee for continuous control with an unknown nonlinearity in the system model and general function approximation.

READ FULL TEXT

page 1

page 2

page 3

page 4

05/27/2022

Provably Sample-Efficient RL with Side Information about Latent Dynamics

We study reinforcement learning (RL) in settings where observations are ...
06/29/2020

Extracting Latent State Representations with Linear Dynamics from Rich Observations

Recently, many reinforcement learning techniques were shown to have prov...
06/09/2022

Sample-Efficient Reinforcement Learning in the Presence of Exogenous Information

In real-world reinforcement learning applications the learner's observat...
08/17/2022

Nearly Optimal Latent State Decoding in Block MDPs

We investigate the problems of model estimation and reward-free learning...
10/17/2021

Provable RL with Exogenous Distractors via Multistep Inverse Dynamics

Many real-world applications of reinforcement learning (RL) require the ...
08/08/2017

Demixing Structured Superposition Signals from Periodic and Aperiodic Nonlinear Observations

We consider the demixing problem of two (or more) structured high-dimens...
11/17/2020

Revisiting the Sample Complexity of Sparse Spectrum Approximation of Gaussian Processes

We introduce a new scalable approximation for Gaussian processes with pr...