Goal-oriented inference of environment from redundant observations

05/08/2023
by Kazuki Takahashi, et al.

An agent learns to organize its decision behavior to achieve a behavioral goal, such as reward maximization, and reinforcement learning is often used for this optimization. Learning an optimal behavioral strategy is difficult when the events necessary for learning are only partially observable, a setting known as the Partially Observable Markov Decision Process (POMDP). However, real-world environments also present many events that are irrelevant to reward delivery and to an optimal behavioral strategy. Conventional POMDP methods, which attempt to infer transition rules among all observations, including irrelevant states, are ineffective in such environments. Assuming a Redundantly Observable Markov Decision Process (ROMDP), we propose a method for goal-oriented reinforcement learning that efficiently learns state transition rules among reward-related "core states" from redundant observations. Starting with a small number of initial core states, our model gradually adds new core states to the transition diagram until it achieves an optimal behavioral strategy consistent with the Bellman equation. We demonstrate that the resulting inference model outperforms the conventional method for POMDPs. We emphasize that our model, which contains only the core states, is highly explainable. Furthermore, the proposed method is suited to online learning because it reduces memory consumption and improves learning speed.
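The abstract does not say how consistency with the Bellman equation is checked. As a rough illustration only, and not the authors' implementation, the sketch below runs value iteration on a small tabular model over core states and reports the remaining Bellman residual. The model layout, the state and action names, the discount factor, and the helper names (`backup`, `value_iteration`, `bellman_residual`) are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# Hypothetical tabular model restricted to core states:
# model[(state, action)] = (expected_reward, {next_state: probability})
Model = Dict[Tuple[State, Action], Tuple[float, Dict[State, float]]]


def backup(model: Model, values: Dict[State, float], s: State, gamma: float) -> float:
    """One Bellman optimality backup for state s (0.0 if s has no actions)."""
    candidates = [
        r + gamma * sum(p * values[s2] for s2, p in nxt.items())
        for (s1, _a), (r, nxt) in model.items()
        if s1 == s
    ]
    return max(candidates) if candidates else 0.0


def value_iteration(model: Model, states: List[State], gamma: float = 0.9,
                    tol: float = 1e-8, max_iter: int = 10_000) -> Dict[State, float]:
    """Iterate Bellman backups until the largest change falls below tol."""
    values = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new_values = {s: backup(model, values, s, gamma) for s in states}
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            break
    return values


def bellman_residual(model: Model, values: Dict[State, float], gamma: float = 0.9) -> float:
    """Largest violation of the Bellman optimality equation over all states."""
    return max(abs(values[s] - backup(model, values, s, gamma)) for s in values)


# Toy core-state model (assumed for illustration): reward is delivered on cue -> goal.
core_states = ["start", "cue", "goal"]
toy_model: Model = {
    ("start", "go"): (0.0, {"cue": 1.0}),
    ("start", "stay"): (0.0, {"start": 1.0}),
    ("cue", "go"): (1.0, {"goal": 1.0}),
    ("cue", "stay"): (0.0, {"cue": 1.0}),
    ("goal", "stay"): (0.0, {"goal": 1.0}),
}

values = value_iteration(toy_model, core_states)
residual = bellman_residual(toy_model, values)
print(values, residual)
```

In a scheme like the one the abstract describes, a residual that stays large on observed data could serve as a signal that the current set of core states is insufficient and a new one should be added. That stopping rule is an interpretation for this sketch, not a detail given in the abstract.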

