Combining information-seeking exploration and reward maximization: Unified inference on continuous state and action spaces under partial observability

12/15/2022
by   Parvin Malekzadeh, et al.

Reinforcement learning (RL) has gained considerable attention by producing decision-making agents that maximise rewards received from fully observable environments. However, many real-world problems are partially or noisily observable by nature: agents do not receive the true, complete state of the environment. Such problems are formulated as partially observable Markov decision processes (POMDPs). Some studies have applied RL to POMDPs by recalling previous decisions and observations or by inferring the true state of the environment from received observations. Nevertheless, aggregating observations and decisions over time becomes impractical in environments with high-dimensional continuous state and action spaces. Moreover, these inference-based RL approaches require a large number of samples to perform well, since agents disregard the uncertainty in the inferred state when making decisions. Active inference is a framework that is naturally formulated in POMDPs and directs agents to select decisions by minimising the expected free energy (EFE). This supplements the reward-maximising (exploitative) behaviour of RL with an information-seeking (exploratory) behaviour. Despite this exploratory behaviour, the use of active inference has been limited to discrete state and action spaces because the EFE is computationally difficult to evaluate. We propose a unified principle for joint information seeking and reward maximisation that clarifies the theoretical connection between active inference and RL, unifies the two frameworks, and overcomes their aforementioned limitations. Our findings are supported by rigorous theoretical analysis. The proposed framework's superior exploration is also validated experimentally on partially observable tasks with high-dimensional continuous state and action spaces. Moreover, the results show that our model solves reward-free problems, making the design of task rewards optional.
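To make the link between exploitation and exploration concrete, the per-timestep EFE is commonly written in the active-inference literature in the following decomposed form. This is the standard textbook decomposition, not necessarily the exact objective used in this paper; here Q denotes the agent's predictive beliefs under policy \pi and P(o_\tau) encodes prior preferences over observations (the analogue of reward):

```latex
% Standard EFE decomposition (assumed form from the general
% active-inference literature, not taken from this paper):
G(\pi,\tau)
  = \underbrace{-\,\mathbb{E}_{Q(o_\tau \mid \pi)}\bigl[\ln P(o_\tau)\bigr]}
      _{\text{pragmatic value (reward seeking)}}
  \;\underbrace{-\,\mathbb{E}_{Q(o_\tau \mid \pi)}
      \Bigl[ D_{\mathrm{KL}}\bigl(Q(s_\tau \mid o_\tau, \pi)\,\big\|\,
             Q(s_\tau \mid \pi)\bigr) \Bigr]}
      _{\text{epistemic value (information seeking)}}
```

Minimising G therefore jointly maximises expected log-preference (exploitation) and expected information gain about the hidden state (exploration), which is the coupling the abstract refers to.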

