Regret Minimization for Partially Observable Deep Reinforcement Learning

10/31/2017
by Peter H. Jin, et al.

Deep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, such algorithms typically assume a fully observed state, and must compensate for partial or non-Markovian observations by using finite-length frame histories or recurrent networks. In this work, we propose a new deep reinforcement learning algorithm based on counterfactual regret minimization that iteratively updates an approximation to a cumulative clipped advantage function and is robust to partially observed state. We demonstrate that on several partially observed reinforcement learning tasks, this new class of algorithms can substantially outperform strong baseline methods: on Pong with single-frame observations, and on the challenging Doom (ViZDoom) and Minecraft (Malmö) first-person navigation benchmarks.
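
To make the regret-minimization idea concrete, here is a minimal tabular sketch of regret matching, the rule that counterfactual regret minimization is built on. It is not the paper's deep algorithm. The function names `regret_matching_policy` and `update_cumulative` are hypothetical, and the clip-then-accumulate update is an assumption modeled on CFR+-style clipping, not something taken from the abstract.

```python
def regret_matching_policy(cumulative_advantages):
    """Return action probabilities proportional to the positive part of each value.

    Falls back to a uniform policy when no action has a positive value.
    """
    clipped = [max(0.0, a) for a in cumulative_advantages]
    total = sum(clipped)
    n = len(cumulative_advantages)
    if total <= 0.0:
        return [1.0 / n] * n
    return [c / total for c in clipped]


def update_cumulative(cumulative_advantages, advantages):
    """Clip the running totals at zero, then add the new advantage estimates.

    Assumption: CFR+-style clipping, used here only for illustration.
    """
    return [max(0.0, c) + a for c, a in zip(cumulative_advantages, advantages)]


if __name__ == "__main__":
    totals = [0.0, 0.0, 0.0]
    # Hypothetical per-iteration advantage estimates for three actions.
    for estimate in ([1.0, -1.0, 0.0], [1.0, -2.0, 2.0]):
        totals = update_cumulative(totals, estimate)
    print(totals, regret_matching_policy(totals))
```

In this sketch an action whose running total was negative restarts from zero rather than staying penalized, so it can regain probability as soon as it looks advantageous again.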

research
04/17/2018

On Improving Deep Reinforcement Learning for POMDPs

Deep Reinforcement Learning (RL) recently emerged as one of the most com...
research
06/02/2022

Deep Transformer Q-Networks for Partially Observable Reinforcement Learning

Real-world reinforcement learning tasks often involve some form of parti...
research
10/26/2021

The Difficulty of Passive Learning in Deep Reinforcement Learning

Learning to act from observational data without active environmental int...
research
02/26/2019

Can Meta-Interpretive Learning outperform Deep Reinforcement Learning of Evaluable Game strategies?

World-class human players have been outperformed in a number of complex ...
research
12/31/2021

SimSR: Simple Distance-based State Representation for Deep Reinforcement Learning

This work explores how to learn robust and generalizable state represent...
research
09/19/2021

Dual Behavior Regularized Reinforcement Learning

Reinforcement learning has been shown to perform a range of complex task...
research
03/28/2017

Inverse Reinforcement Learning from Incomplete Observation Data

Inverse reinforcement learning (IRL) aims to explain observed strategic ...