An Analysis of Frame-skipping in Reinforcement Learning

02/07/2021
by Shivaram Kalyanakrishnan, et al.

In the practice of sequential decision making, agents are often designed to sense state at regular intervals of d time steps, d > 1, ignoring state information in between sensing steps. While it is clear that this practice can reduce sensing and compute costs, recent results indicate a further benefit. On many Atari console games, reinforcement learning (RL) algorithms deliver substantially better policies when run with d > 1 – in fact with d even as high as 180. In this paper, we investigate the role of the parameter d in RL; d is called the "frame-skip" parameter, since states in the Atari domain are images. For evaluating a fixed policy, we observe that under standard conditions, frame-skipping does not affect asymptotic consistency. Depending on other parameters, it can possibly even benefit learning. To use d > 1 in the control setting, one must first specify which d-step open-loop action sequences can be executed in between sensing steps. We focus on "action-repetition", the common restriction of this choice to d-length sequences of the same action. We define a task-dependent quantity called the "price of inertia", in terms of which we upper-bound the loss incurred by action-repetition. We show that this loss may be offset by the gain brought to learning by a smaller task horizon. Our analysis is supported by experiments on different tasks and learning algorithms.
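
The action-repetition scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation. The toy environment `CountingEnv`, the wrapper `ActionRepeat`, the helper `run_episode`, and the `(state, reward, done)` step interface are all hypothetical names chosen for this example. The wrapper repeats the chosen action for up to d steps, hides the intermediate states from the agent, and returns the (optionally discounted) sum of the rewards collected in between.

```python
from typing import Callable, Tuple


class CountingEnv:
    """Hypothetical toy task: the state is a running sum of actions and
    the reward at each step equals the action taken."""

    def __init__(self, horizon: int = 12) -> None:
        self.horizon = horizon
        self.t = 0
        self.state = 0

    def reset(self) -> int:
        self.t = 0
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.state += action
        self.t += 1
        return self.state, float(action), self.t >= self.horizon


class ActionRepeat:
    """Illustrative frame-skip wrapper: the agent senses state only every
    d steps, and the same action is repeated in between."""

    def __init__(self, env: CountingEnv, d: int, gamma: float = 1.0) -> None:
        if d < 1:
            raise ValueError("frame-skip d must be at least 1")
        self.env = env
        self.d = d
        self.gamma = gamma  # assumed per-step discount factor

    def reset(self) -> int:
        return self.env.reset()

    def step(self, action: int) -> Tuple[int, float, bool]:
        total, discount = 0.0, 1.0
        state, done = None, False
        for _ in range(self.d):
            # Intermediate states are discarded; only the last one is returned.
            state, reward, done = self.env.step(action)
            total += discount * reward
            discount *= self.gamma
            if done:
                break
        return state, total, done


def run_episode(env, policy: Callable[[int], int]) -> Tuple[int, float]:
    """Run one episode; return (number of decisions made, summed reward)."""
    state, done = env.reset(), False
    decisions, ret = 0, 0.0
    while not done:
        state, reward, done = env.step(policy(state))
        decisions += 1
        ret += reward
    return decisions, ret


if __name__ == "__main__":
    always_one = lambda s: 1
    print(run_episode(ActionRepeat(CountingEnv(12), d=1), always_one))
    print(run_episode(ActionRepeat(CountingEnv(12), d=4), always_one))
```

In this toy task the undiscounted return is the same for d = 1 and d = 4, but the number of decisions the agent makes drops from 12 to 3. This mirrors the smaller effective task horizon mentioned in the abstract. In tasks where the best action changes between sensing steps, repetition would instead cost some return, which is the loss the "price of inertia" is meant to bound.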


research
09/13/2023

Investigating the Impact of Action Representations in Policy Gradient Algorithms

Reinforcement learning (RL) is a versatile framework for learning to sol...
research
05/26/2023

Reinforcement Learning with Simple Sequence Priors

Everything else being equal, simpler models should be preferred over mor...
research
07/05/2021

The Least Restriction for Offline Reinforcement Learning

Many practical applications of reinforcement learning (RL) constrain the...
research
05/17/2016

Dynamic Frame skip Deep Q Network

Deep Reinforcement Learning methods have achieved state of the art perfo...
research
01/01/2019

Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds

Strong worst-case performance bounds for episodic reinforcement learning...
research
05/03/2018

Open Loop Execution of Tree-Search Algorithms

In the context of tree-search stochastic planning algorithms where a gen...
