Multifidelity Reinforcement Learning with Control Variates

by Sami Khairy, et al.

In many computational science and engineering applications, the output of a system of interest corresponding to a given input can be queried at different levels of fidelity with different costs. Typically, low-fidelity data is cheap and abundant, while high-fidelity data is expensive and scarce. In this work we study the reinforcement learning (RL) problem in the presence of multiple environments with different levels of fidelity for a given control task. We focus on improving the RL agent's performance with multifidelity data. Specifically, a multifidelity estimator that exploits the cross-correlations between the low- and high-fidelity returns is proposed to reduce the variance in the estimation of the state-action value function. The proposed estimator, which is based on the method of control variates, is used to design a multifidelity Monte Carlo RL (MFMCRL) algorithm that improves the learning of the agent in the high-fidelity environment. The impacts of variance reduction on policy evaluation and policy improvement are theoretically analyzed by using probability bounds. Our theoretical analysis and numerical experiments demonstrate that for a finite budget of high-fidelity data samples, our proposed MFMCRL agent attains superior performance compared with that of a standard RL agent that uses only the high-fidelity environment data for learning the optimal policy.
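The control-variate idea behind the estimator can be illustrated with a minimal sketch. This is the generic textbook form of a control-variate mean estimate, not the paper's exact MFMCRL estimator. It assumes a small set of high-fidelity returns paired with low-fidelity returns for the same state-action pair, plus a larger pool of cheap low-fidelity returns. The function name `control_variate_mean` and its arguments are hypothetical names chosen for illustration.

```python
import numpy as np


def control_variate_mean(high, low_paired, low_all):
    """Estimate the mean high-fidelity return using a low-fidelity control variate.

    high       : scarce high-fidelity returns (hypothetical input)
    low_paired : low-fidelity returns paired one-to-one with `high`
    low_all    : a larger pool of low-fidelity returns, used to estimate
                 the low-fidelity mean more accurately
    """
    high = np.asarray(high, dtype=float)
    low_paired = np.asarray(low_paired, dtype=float)
    low_all = np.asarray(low_all, dtype=float)

    var_low = np.var(low_paired, ddof=1)
    if var_low == 0.0:
        # No variation in the control variate: fall back to the plain mean.
        return float(high.mean())

    # Coefficient that minimizes variance in the textbook setting:
    # alpha = Cov(high, low) / Var(low), estimated from the paired samples.
    cov = np.cov(high, low_paired, ddof=1)[0, 1]
    alpha = cov / var_low

    # Shift the high-fidelity sample mean by the discrepancy between the
    # accurate low-fidelity mean and the paired low-fidelity sample mean.
    return float(high.mean() + alpha * (low_all.mean() - low_paired.mean()))
```

When the paired low- and high-fidelity returns are strongly correlated, the correction term cancels much of the sampling noise in the high-fidelity mean; when they are uncorrelated, `alpha` is near zero and the estimate reduces to the ordinary sample mean.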




Multifidelity Deep Operator Networks

Operator learning for complex nonlinear operators is increasingly common...

One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

Humans are experts at high-fidelity imitation -- closely mimicking a dem...

Structural Properties of Optimal Fidelity Selection Policies for Human-in-the-loop Queues

We study optimal fidelity selection for a human operator servicing a que...

Multi-Fidelity Reinforcement Learning with Gaussian Processes

This paper studies the problem of Reinforcement Learning (RL) using as f...

Minimax Error of Interpolation and Optimal Design of Experiments for Variable Fidelity Data

Engineering problems often involve data sources of variable fidelity wit...

Learning to Reach, Swim, Walk and Fly in One Trial: Data-Driven Control with Scarce Data and Side Information

We develop a learning-based control algorithm for unknown dynamical syst...

High-fidelity Interpretable Inverse Rig: An Accurate and Sparse Solution Optimizing the Quartic Blendshape Model

We propose a method to fit arbitrarily accurate blendshape rig models by...
