Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration

02/08/2023
by Chentian Jiang, et al.

To generalize across tasks, an agent should acquire knowledge from past tasks that facilitates adaptation and exploration in future tasks. We focus on the problem of in-context adaptation and exploration, where an agent relies only on context, i.e., a history of states, actions and/or rewards, rather than on gradient-based updates. Posterior sampling (an extension of Thompson sampling) is a promising approach, but it requires Bayesian inference and dynamic programming, which often involve unknowns (e.g., a prior) and costly computations. To address these difficulties, we use a transformer to learn an inference process from training tasks and consider a hypothesis space of partial models, represented as small Markov decision processes that are cheap for dynamic programming. In our version of the Symbolic Alchemy benchmark, our method's adaptation speed and exploration-exploitation balance approach those of an exact posterior sampling oracle. We also show that even though partial models exclude relevant information from the environment, they can nevertheless lead to good policies.
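The posterior-sampling loop the abstract describes (infer a posterior over models, sample one, solve it by dynamic programming, act, repeat) can be sketched on a toy problem. Everything below is an illustrative assumption: a 2-state, 2-action environment with a discrete hypothesis set of small MDP reward models, not the paper's Symbolic Alchemy benchmark or its learned transformer inference.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Solve a small tabular MDP by dynamic programming (value iteration).
    P: (S, A, S) transition probabilities; R: (S, A) expected rewards."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * np.einsum("sat,t->sa", P, V)  # Bellman backup
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return Q.argmax(axis=1), Q  # greedy policy and Q-values
        V = V_new

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 2, 2, 0.9

# Shared deterministic transitions; two candidate reward hypotheses
# (illustrative stand-ins for "partial models" over a small MDP).
P = np.zeros((n_states, n_actions, n_states))
P[:, 0, 0] = 1.0   # action 0 always leads to state 0
P[:, 1, 1] = 1.0   # action 1 always leads to state 1
R_hyps = [np.array([[0.9, 0.1], [0.9, 0.1]]),   # hypothesis A: action 0 pays
          np.array([[0.1, 0.9], [0.1, 0.9]])]   # hypothesis B: action 1 pays
true_R = R_hyps[1]          # Bernoulli reward means of the true environment
posterior = np.array([0.5, 0.5])

state = 0
for t in range(50):
    # 1) Sample a model from the posterior (the posterior-sampling step).
    h = rng.choice(len(R_hyps), p=posterior)
    # 2) Plan in the sampled small MDP by dynamic programming (cheap here).
    policy, _ = value_iteration(P, R_hyps[h], gamma)
    # 3) Act and observe a Bernoulli reward from the true environment.
    a = policy[state]
    r = float(rng.random() < true_R[state, a])
    # 4) Exact Bayesian update of the posterior given (s, a, r).
    lik = np.array([R[state, a] if r else 1 - R[state, a] for R in R_hyps])
    posterior = posterior * lik
    posterior /= posterior.sum()
    state = int(np.argmax(P[state, a]))  # deterministic transitions

print(posterior)  # posterior mass concentrates on the true hypothesis B
```

The exact Bayes update in step 4 is what becomes expensive or intractable in realistic hypothesis spaces; the paper's contribution is to replace it with an inference process learned by a transformer from training tasks, while keeping the sampled models small so the planning step stays cheap.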

