Truncating Trajectories in Monte Carlo Reinforcement Learning

05/07/2023
by Riccardo Poiani, et al.

In Reinforcement Learning (RL), an agent acts in an unknown environment to maximize the expected cumulative discounted sum of an external reward signal, i.e., the expected return. In many tasks of interest, such as policy optimization, the agent typically spends its interaction budget collecting episodes of fixed length within a simulator (i.e., Monte Carlo simulation). However, given the discounted nature of the RL objective, this data collection strategy might not be the best option: rewards collected in early simulation steps weigh exponentially more than rewards collected later. Building on this intuition, in this paper we design an a-priori budget allocation strategy that leads to the collection of trajectories of different lengths, i.e., truncated trajectories. The proposed approach provably minimizes the width of the confidence intervals around the empirical estimates of the expected return of a policy. After discussing the theoretical properties of our method, we use our trajectory truncation mechanism to extend the Policy Optimization via Importance Sampling (POIS, Metelli et al., 2018) algorithm. Finally, we conduct a numerical comparison between our algorithm and POIS: the results are consistent with our theory and show that an appropriate truncation of the trajectories can improve performance.
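
To make the intuition concrete, the following minimal Python sketch shows how a fixed interaction budget could be spread unevenly over time steps and how the return could then be estimated from trajectories of different lengths. It is an illustration under stated assumptions, not the allocation derived in the paper: the rule that assigns samples roughly in proportion to the discount weight gamma^t is a hypothetical heuristic, and the function names `proportional_allocation` and `truncated_return_estimate` are invented for this example.

```python
import numpy as np


def discount_weights(gamma, horizon):
    """Weights gamma^t for t = 0, ..., horizon - 1."""
    return gamma ** np.arange(horizon)


def proportional_allocation(budget, gamma, horizon):
    """Number of reward samples to collect at each time step.

    ASSUMPTION: samples are assigned roughly in proportion to gamma^t.
    This is an illustrative heuristic, not the paper's strategy.
    The result is non-increasing in t, so it can be realized by
    trajectories that are truncated at different lengths.
    """
    if budget < horizon:
        raise ValueError("budget must allow at least one sample per step")
    w = discount_weights(gamma, horizon)
    # Reserve one sample per step so that every step is observed,
    # then split the remaining budget proportionally (rounded down).
    extra = np.floor((budget - horizon) * w / w.sum()).astype(int)
    return 1 + extra


def truncated_return_estimate(rewards, gamma):
    """Estimate the discounted return from trajectories of unequal length.

    `rewards` is a list of 1-D arrays, one per trajectory. The reward at
    step t is averaged over the trajectories that reach step t.
    """
    horizon = max(len(r) for r in rewards)
    total = 0.0
    for t in range(horizon):
        step_rewards = [r[t] for r in rewards if len(r) > t]
        total += gamma ** t * np.mean(step_rewards)
    return total


# Example: 100 interaction steps, discount 0.9, horizon 10.
allocation = proportional_allocation(100, 0.9, 10)
# Two toy trajectories with constant reward 1, truncated at lengths 3 and 1.
estimate = truncated_return_estimate([np.ones(3), np.ones(1)], 0.5)
```

With a fixed-length scheme the same budget would yield 10 samples at every step; here early steps, which dominate the discounted sum, receive more samples than late ones, and any budget lost to rounding is simply left unused.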


