Learning What To Do by Simulating the Past

04/08/2021
by David Lindner, et al.

Since reward functions are hard to specify, recent work has focused on learning policies from human feedback. However, such approaches are impeded by the expense of acquiring such feedback. Recent work proposed that agents have access to a source of information that is effectively free: in any environment that humans have acted in, the state will already be optimized for human preferences, and thus an agent can extract information about what humans want from the state. Such learning is possible in principle, but requires simulating all possible past trajectories that could have led to the observed state. This is feasible in gridworlds, but how do we scale it to complex tasks? In this work, we show that by combining a learned feature encoder with learned inverse models, we can enable agents to simulate human actions backwards in time to infer what they must have done. The resulting algorithm is able to reproduce a specific skill in MuJoCo environments given a single state sampled from the optimal policy for that skill.
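The backward-simulation idea in the abstract lends itself to a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes a reward that is linear in learned features, r(s) = theta . phi(s), and uses hypothetical stand-in functions for the learned feature encoder (`phi`), inverse policy, and inverse dynamics model, which in the paper would be learned networks. Backward rollouts from the single observed state replace the exhaustive enumeration of past trajectories, and the feature weights are nudged toward features the observed state exhibits more strongly than its inferred past.

```python
import numpy as np

# Minimal sketch of learning from a single observed state by simulating the
# past. All models below are toy stand-ins (assumptions, not the paper's
# components): in the actual approach they are learned from data.

def phi(state):
    """Learned feature encoder (stand-in: identity features)."""
    return state

def inverse_policy(state, rng):
    """Learned inverse policy: sample an action that plausibly led to
    `state` (stand-in: biased noise, so the demo infers nonzero weights)."""
    return rng.normal(loc=0.5, scale=1.0, size=state.shape)

def inverse_dynamics(state, action):
    """Learned inverse dynamics: reconstruct the previous state
    (stand-in: undo a small step in the action direction)."""
    return state - 0.1 * action

def simulate_past(observed_state, horizon, rng):
    """Roll one trajectory backwards in time from the observed state."""
    states = [observed_state]
    for _ in range(horizon):
        action = inverse_policy(states[-1], rng)
        states.append(inverse_dynamics(states[-1], action))
    return states[1:]  # the inferred past, most recent first

def update_reward(theta, observed_state, rng, horizon=10, n_samples=32, lr=0.1):
    """One update on theta: features that are more present in the observed
    state than in its simulated past get more positive weight."""
    grads = []
    for _ in range(n_samples):
        past = simulate_past(observed_state, horizon, rng)
        past_mean = np.mean([phi(s) for s in past], axis=0)
        grads.append(phi(observed_state) - past_mean)
    return theta + lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
s0 = np.array([1.0, -0.5, 2.0])  # single state from the (unknown) optimal policy
theta = np.zeros_like(s0)
for _ in range(100):
    theta = update_reward(theta, s0, rng)
print("inferred feature weights:", theta)
```

With these toy stand-ins the inferred weights simply reflect the bias built into the inverse policy; the point is the control flow: sampled backward rollouts through learned inverse models stand in for the exhaustive enumeration of past trajectories that is only tractable in gridworlds.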

Related research

- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training (06/09/2021)
- Maximum Causal Entropy Inverse Constrained Reinforcement Learning (05/04/2023)
- Guided Policy Search for Parameterized Skills using Adverbs (10/23/2021)
- Deep Reinforcement Learning from Policy-Dependent Human Feedback (02/12/2019)
- Preferences Implicit in the State of the World (02/12/2019)
- Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased (02/03/2023)
- Mitigating Negative Side Effects via Environment Shaping (02/13/2021)
