Marginalized Importance Sampling for Off-Environment Policy Evaluation

09/04/2023
by Pulkit Katdare, et al.

Reinforcement Learning (RL) methods are typically sample-inefficient, making it challenging to train and deploy RL policies on real-world robots. Even a robust policy trained in simulation requires a real-world deployment to assess its performance. This paper proposes a new approach to evaluate the real-world performance of agent policies without deploying them in the real world. The proposed approach combines a simulator with real-world offline data to evaluate the performance of any policy using the framework of Marginalized Importance Sampling (MIS). Existing MIS methods face two challenges: (1) large density ratios that deviate from a reasonable range and (2) indirect supervision, where the ratio must be inferred indirectly, exacerbating estimation error. Our approach addresses these challenges by introducing the target policy's occupancy in the simulator as an intermediate variable and learning the density ratio as the product of two terms that can be learned separately. The first term is learned with direct supervision, and the second term has a small magnitude, making it easier to learn. We analyze the sample complexity as well as the error propagation of our two-step procedure. Furthermore, we empirically evaluate our approach on Sim2Sim environments such as Cartpole, Reacher, and Half-Cheetah. Our results show that our method generalizes well across a variety of Sim2Sim gaps, target policies, and offline data-collection policies. We also demonstrate our algorithm on a Sim2Real task: validating the performance of a 7-DOF robotic arm using offline data together with a Gazebo-based arm simulator.
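To make the factored-ratio idea concrete, here is a minimal sketch of a marginalized importance sampling estimate with synthetic data. Note the assumptions: the dataset, the placeholder ratios `w_sim` (target-policy simulator occupancy over the data distribution, which the paper learns with direct supervision) and `w_gap` (the small-magnitude simulator-to-real correction), and the self-normalized form of the estimator are all illustrative choices, not the paper's actual training procedure.

```python
import math
import random

random.seed(0)

# Synthetic offline dataset from the "real" environment: N reward samples.
N = 10_000
rewards = [random.gauss(1.0, 0.5) for _ in range(N)]

# Factored density ratio w(s, a) = w_sim(s, a) * w_gap(s, a):
#   w_sim: placeholder for the learned ratio between the target policy's
#          simulator occupancy and the offline data distribution
#   w_gap: placeholder sim-to-real correction, close to 1 by construction
w_sim = [math.exp(random.gauss(0.0, 0.3)) for _ in range(N)]
w_gap = [1.0 + random.gauss(0.0, 0.05) for _ in range(N)]

weights = [a * b for a, b in zip(w_sim, w_gap)]

# Self-normalized MIS estimate of the target policy's average reward:
# reweight each observed reward by the density ratio, then normalize.
j_hat = sum(w * r for w, r in zip(weights, rewards)) / sum(weights)
print(f"estimated value: {j_hat:.3f}")
```

Because the second factor stays near 1, the product inherits the stability of the directly supervised first factor, which is the intuition behind splitting the ratio in two.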
