On the Role of Discount Factor in Offline Reinforcement Learning

06/07/2022
by Hao Hu, et al.

Offline reinforcement learning (RL) enables effective learning from previously collected data without exploration, which makes it promising for real-world applications where exploration is expensive or even infeasible. The discount factor, γ, plays a vital role in improving sample efficiency and estimation accuracy in online RL, but its role in offline RL has not been well explored. This paper examines two distinct effects of γ in offline RL with theoretical analysis, namely the regularization effect and the pessimism effect. On the one hand, γ acts as a regularizer that trades off optimality against sample efficiency on top of existing offline techniques. On the other hand, a lower guidance γ can also be seen as a form of pessimism, in which the policy's performance is optimized under the worst possible models. We empirically verify these theoretical observations on tabular MDPs and standard D4RL tasks. The results show that the discount factor plays an essential role in the performance of offline RL algorithms, both in small-data regimes on top of existing offline methods and in large-data regimes without other forms of conservatism.
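The abstract describes γ as trading off optimality against sample efficiency. One standard way to see this is through the effective horizon, roughly 1/(1 - γ) steps. The sketch below is not the paper's algorithm; it is a minimal illustration under stated assumptions: a hypothetical three-state MDP given by a tiny offline dataset, a count-based empirical model, and plain value iteration. The names `estimate_model`, `plan`, and `effective_horizon`, and the choice to model unseen state-action pairs as zero-reward self-loops, are all illustrative assumptions. It only shows how the same offline data yields a far-sighted or myopic greedy policy depending on γ.

```python
from collections import defaultdict

# Hypothetical offline dataset of (state, action, reward, next_state) tuples.
# State 2 is absorbing; the pair (1, 1) never appears in the data.
dataset = [
    (0, 0, 1.0, 2),  # take a small reward now and terminate
    (0, 1, 0.0, 1),  # defer: move to state 1 with no immediate reward
    (1, 0, 2.0, 2),  # collect a larger, delayed reward, then terminate
    (2, 0, 0.0, 2),  # absorbing terminal state
]


def effective_horizon(gamma):
    """Rough number of steps that matter under discount gamma: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


def estimate_model(data, n_states, n_actions):
    """Count-based (maximum-likelihood) transition and reward estimates."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(float)
    visits = defaultdict(int)
    for s, a, r, s_next in data:
        counts[(s, a)][s_next] += 1
        reward_sums[(s, a)] += r
        visits[(s, a)] += 1

    P, R = {}, {}
    for s in range(n_states):
        for a in range(n_actions):
            n = visits[(s, a)]
            if n == 0:
                # Unseen pair: modeled as a zero-reward self-loop.
                # A simplistic placeholder (hypothetical), not the paper's pessimism mechanism.
                P[(s, a)] = {s: 1.0}
                R[(s, a)] = 0.0
            else:
                P[(s, a)] = {s2: c / n for s2, c in counts[(s, a)].items()}
                R[(s, a)] = reward_sums[(s, a)] / n
    return P, R


def plan(P, R, n_states, n_actions, gamma, iters=1000):
    """Value iteration on the estimated model; returns values and a greedy policy."""
    V = [0.0] * n_states

    def q(s, a):
        # One-step lookahead using the current value estimate V.
        return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())

    for _ in range(iters):
        # Synchronous update: the comprehension reads the old V before rebinding it.
        V = [max(q(s, a) for a in range(n_actions)) for s in range(n_states)]
    policy = [max(range(n_actions), key=lambda a: q(s, a)) for s in range(n_states)]
    return V, policy


if __name__ == "__main__":
    P, R = estimate_model(dataset, n_states=3, n_actions=2)
    for gamma in (0.9, 0.3):
        V, policy = plan(P, R, n_states=3, n_actions=2, gamma=gamma)
        print(f"gamma={gamma}: horizon~{effective_horizon(gamma):.1f}, "
              f"V(0)={V[0]:.2f}, action at s0={policy[0]}")
    # gamma=0.9: horizon~10.0, V(0)=1.80, action at s0=1
    # gamma=0.3: horizon~1.4, V(0)=1.00, action at s0=0
```

With γ = 0.9 the greedy policy defers for the larger delayed reward (2γ = 1.8 > 1); with γ = 0.3 it takes the immediate reward (2γ = 0.6 < 1). A smaller γ shortens the effective horizon, so fewer steps of the estimated model influence the decision. This is one common intuition for why lowering γ can reduce sensitivity to model error at the cost of optimality.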

research
01/25/2022

MOORe: Model-based Offline-to-Online Reinforcement Learning

With the success of offline reinforcement learning (RL), offline trained...
research
10/19/2021

Offline Reinforcement Learning with Value-based Episodic Memory

Offline reinforcement learning (RL) shows promise of applying RL to real...
research
12/15/2022

Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies

Reinforcement learning (RL) has shown great promise with algorithms lear...
research
02/23/2021

DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning

Thermal power generation plays a dominant role in the world's electricit...
research
10/31/2022

Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

Learning to control an agent from data collected offline in a rich pixel...
research
06/18/2019

Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning

In real-world applications of reinforcement learning (RL), noise from in...
research
06/01/2023

Improving Offline RL by Blending Heuristics

We propose Heuristic Blending (HUBL), a simple performance-improving tec...