Single-Trajectory Distributionally Robust Reinforcement Learning

01/27/2023
by Zhipeng Liang, et al.

As a framework for sequential decision-making, Reinforcement Learning (RL) has been regarded as an essential component on the path to Artificial General Intelligence (AGI). However, RL is often criticized for assuming that the training environment is the same as the test environment, an assumption that hinders its application in the real world. To mitigate this problem, Distributionally Robust RL (DRRL) has been proposed to improve the worst-case performance over a set of environments that may contain the unknown test environment. Due to the nonlinearity of the robustness goal, most previous work resorts to a model-based approach, learning with either an empirical distribution estimated from data or a simulator that can be sampled infinitely, which limits its applicability to environments with simple dynamics. In contrast, we attempt to design a DRRL algorithm that can be trained along a single trajectory, i.e., without repeated sampling from a state. Building on standard Q-learning, we propose distributionally robust Q-learning with a single trajectory (DRQ) and its average-reward variant, differential DRQ. We provide asymptotic convergence guarantees and experiments for both settings, demonstrating their superiority over non-robust counterparts in perturbed environments.
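
To make the single-trajectory constraint concrete, the following is a minimal sketch of standard (non-robust) tabular Q-learning, the baseline the abstract builds on. It is not the DRQ algorithm itself, whose update rule is not given here. The chain environment, its slip probability, and all hyperparameters are hypothetical choices made only for illustration.

```python
import random


def single_trajectory_q_learning(n_states=5, n_steps=20000, gamma=0.9,
                                 alpha=0.1, epsilon=0.2, seed=0):
    """Standard tabular Q-learning along ONE trajectory of a toy chain MDP.

    This is the non-robust baseline, not DRQ. The environment (a chain with
    a 10% chance that a move is reversed) is a hypothetical example.
    """
    rng = random.Random(seed)
    # q[s][a] with action 0 = left, action 1 = right.
    q = [[0.0, 0.0] for _ in range(n_states)]
    state = 0
    for _ in range(n_steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 0 if q[state][0] > q[state][1] else 1

        # Hypothetical dynamics: the intended move is reversed w.p. 0.1.
        move = 1 if action == 1 else -1
        if rng.random() > 0.9:
            move = -move
        next_state = min(max(state + move, 0), n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0

        # One transition sample per visit: the agent cannot re-sample
        # the same state, unlike a simulator-based (generative) setting.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])

        # Continue from where the trajectory actually went; no resets.
        state = next_state
    return q
```

Each update uses exactly one observed transition from the current state, and the next update starts from the state the trajectory reached. A robust variant would have to estimate a worst-case target from this same stream of single samples, which is the difficulty the abstract points to.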
