Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning

12/10/2021
by Giseung Park, et al.

This paper proposes a new sequential model learning architecture for solving partially observable Markov decision problems. Rather than compressing sequential information at every timestep, as conventional recurrent neural network-based methods do, the proposed architecture generates a latent variable for each data block spanning multiple timesteps and passes the most relevant information to the next block for policy optimization. The blockwise sequential model is implemented with self-attention, which makes it capable of detailed sequential learning in partially observable settings. The model also includes an additional learning network that enables efficient gradient estimation through self-normalized importance sampling, so that model learning does not require complex blockwise input data reconstruction. Numerical results show that the proposed method significantly outperforms previous methods in various partially observable environments.
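The abstract names self-normalized importance sampling without describing it. The sketch below illustrates only the generic estimator, not the paper's learning network or its gradient computation: samples are drawn from a proposal `q`, weighted by the ratio of target to proposal density, and the weights are normalized to sum to one, so the target density needs to be known only up to a constant. The function name `snis_estimate` and the Gaussian target and proposal are assumptions chosen for illustration.

```python
import numpy as np


def snis_estimate(f_values, log_p_unnorm, log_q):
    """Self-normalized importance sampling estimate of E_p[f].

    f_values:     f(x_i) for samples x_i drawn from the proposal q
    log_p_unnorm: log of the target density at x_i, up to an additive constant
    log_q:        log of the proposal density at x_i, up to an additive constant
    Returns the estimate and the normalized weights.
    """
    log_w = np.asarray(log_p_unnorm, dtype=float) - np.asarray(log_q, dtype=float)
    log_w = log_w - log_w.max()      # subtract the max for numerical stability
    w = np.exp(log_w)
    w = w / w.sum()                  # self-normalization: weights sum to 1
    estimate = float(np.sum(w * np.asarray(f_values, dtype=float)))
    return estimate, w


# Illustrative setup (hypothetical): target N(1, 1), proposal N(0, 2^2), f(x) = x.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=2.0, size=20_000)   # samples from the proposal
log_p = -0.5 * (x - 1.0) ** 2                     # unnormalized log target
log_q = -0.5 * (x / 2.0) ** 2                     # unnormalized log proposal
estimate, weights = snis_estimate(x, log_p, log_q)
print(estimate)   # should be close to the target mean of 1.0
```

Because the weights are renormalized, any constant offset in either log density cancels, which is what makes the estimator usable when the target's normalizing constant is unknown. The price is a small bias that shrinks as the number of samples grows.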


research
09/09/2019

Off-Policy Evaluation in Partially Observable Environments

This work studies the problem of batch off-policy evaluation for Reinfor...
research
05/25/2016

A PAC RL Algorithm for Episodic POMDPs

Many interesting real world domains involve reinforcement learning (RL) ...
research
04/17/2018

On Improving Deep Reinforcement Learning for POMDPs

Deep Reinforcement Learning (RL) recently emerged as one of the most com...
research
03/11/2019

Deep Recurrent Q-Learning vs Deep Q-Learning on a simple Partially Observable Markov Decision Process with Minecraft

Deep Q-Learning has been successfully applied to a wide variety of tasks...
research
09/29/2022

Optimistic MLE – A Generic Model-based Algorithm for Partially Observable Sequential Decision Making

This paper introduces a simple, efficient learning algorithm for general...
research
09/03/2020

Learning to Infer User Hidden States for Online Sequential Advertising

To drive purchase in online advertising, it is of the advertiser's great...
research
04/26/2021

Universal Off-Policy Evaluation

When faced with sequential decision-making problems, it is often useful ...
