The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models

03/06/2023
by Raphaël Avalos, et al.

Partially Observable Markov Decision Processes (POMDPs) are useful tools to model environments in which the agent cannot perceive the full state. The agent must therefore reason over its past observations and actions. However, simply remembering the full history is generally intractable, because the history space grows exponentially with the horizon. Maintaining a probability distribution over the true state, the belief, provides a sufficient statistic of the history, but computing it requires access to the model of the environment and is also intractable. Current state-of-the-art algorithms use Recurrent Neural Networks (RNNs) to compress the observation-action history, aiming to learn a sufficient statistic, but they lack guarantees of success and can lead to suboptimal policies. To overcome this, we propose the Wasserstein-Belief-Updater (WBU), an RL algorithm that learns a latent model of the POMDP together with an approximation of the belief update. Our approach comes with theoretical guarantees on the quality of this approximation, ensuring that the outputted beliefs allow learning the optimal value function.
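For concreteness, the exact belief update that the abstract says is intractable without a model is the standard Bayes filter over a discrete POMDP. The sketch below is illustrative only and not the paper's method (WBU learns an approximation of this update in a latent model); the array layout and function name are our own assumptions.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Exact Bayes filter for a discrete POMDP.

    b'(s') is proportional to O[s', a, o] * sum_s T[s, a, s'] * b(s),
    where T[s, a, s'] is the transition model and O[s', a, o] the
    observation model (our assumed array layout).
    """
    pred = T[:, a, :].T @ b   # predictive distribution over next states s'
    post = O[:, a, o] * pred  # weight by likelihood of the observation o
    return post / post.sum()  # normalise (assumes P(o | b, a) > 0)

# Tiny 2-state, 1-action, 2-observation example: sticky transitions,
# noisy sensor. All numbers are made up for illustration.
T = np.zeros((2, 1, 2))
T[0, 0] = [0.9, 0.1]
T[1, 0] = [0.2, 0.8]
O = np.zeros((2, 1, 2))
O[0, 0] = [0.8, 0.2]
O[1, 0] = [0.3, 0.7]

b0 = np.array([0.5, 0.5])            # uniform prior belief
b1 = belief_update(b0, a=0, o=0, T=T, O=O)
```

Each update costs O(|S|^2) per step and requires the true T and O, which is why model-free methods resort to RNN compressions of the history and why WBU instead learns the update in a latent space.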


research
08/06/2022

Recurrent networks, hidden states and beliefs in partially observable environments

Reinforcement learning aims to learn optimal policies from interaction w...
research
06/09/2023

Approximate information state based convergence analysis of recurrent Q-learning

In spite of the large literature on reinforcement learning (RL) algorith...
research
11/06/2022

On learning history based policies for controlling Markov decision processes

Reinforcement learning (RL) folklore suggests that history-based function approx...
research
06/30/2011

Finding Approximate POMDP solutions Through Belief Compression

Standard value function approaches to finding policies for Partially Obs...
research
10/19/2020

Belief-Grounded Networks for Accelerated Robot Learning under Partial Observability

Many important robotics problems are partially observable in the sense t...
research
08/12/2020

Deceptive Kernel Function on Observations of Discrete POMDP

This paper studies the deception applied on agent in a partially observa...
research
02/27/2018

Human-in-the-Loop Synthesis for Partially Observable Markov Decision Processes

We study planning problems where autonomous agents operate inside enviro...
