- Continuous Inverse Optimal Control with Locally Optimal Examples (Sergey Levine et al., 06/18/2012): Inverse optimal control, also known as inverse reinforcement learning, i...
- Towards Diverse Text Generation with Inverse Reinforcement Learning (Zhan Shi et al., 04/30/2018): Text generation is a crucial task in NLP. Recently, several adversarial ...
- Hierarchical Policy Search via Return-Weighted Density Estimation (Takayuki Osa et al., 11/28/2017): Learning an optimal policy from a multi-modal reward function is a chall...
- Learning convex bounds for linear quadratic control policy synthesis (Jack Umenberger et al., 06/01/2018): Learning to make decisions from observed data in dynamic environments re...
- Asking Easy Questions: A User-Friendly Approach to Active Reward Learning (Erdem Bıyık et al., 10/10/2019): Robots can learn the right reward function by querying a human expert. E...
- Impossibility of deducing preferences and rationality from human policy (Stuart Armstrong et al., 12/15/2017): Inverse reinforcement learning (IRL) attempts to infer human rewards or ...
- Learning to Emulate an Expert Projective Cone Scheduler (Neal Master et al., 01/30/2018): Projective cone scheduling defines a large class of rate-stabilizing pol...
Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods
In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior, assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The algorithm aims to find a reward function such that the resulting optimal policy closely matches the expert's observed behavior. The main difficulty is that the mapping from the parameters to policies is both nonsmooth and highly redundant. Resorting to subdifferentials solves the first difficulty, while the second one is overcome by computing natural gradients. We tested the proposed method in two artificial domains and found it to be more reliable and efficient than some previous methods.
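The recipe described in the abstract (parameterize the reward, compute the policy it induces, and adjust the reward parameters by gradient steps until that policy matches the expert's behavior) can be illustrated with a short sketch. The snippet below is a toy stand-in rather than the authors' algorithm: the random MDP, the feature matrix Phi, the softmax surrogate likelihood, and the finite-difference gradients are all assumptions made for illustration, used here in place of the paper's subdifferential and natural-gradient computations.

```python
# Toy sketch of gradient-based apprenticeship learning on a small random MDP.
# NOT the paper's algorithm: the subdifferential and natural-gradient machinery
# is replaced by finite-difference gradients on a smooth surrogate objective
# (log-likelihood of expert actions under a softmax of the induced Q-values).
import numpy as np

rng = np.random.default_rng(0)
S, A, K, gamma = 8, 3, 4, 0.9                # states, actions, reward features, discount

P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = distribution over next states
Phi = rng.normal(size=(S, K))                # state features; reward r = Phi @ theta


def q_values(theta, iters=200):
    """Value iteration for the linear reward Phi @ theta; returns Q[s, a]."""
    r, V = Phi @ theta, np.zeros(S)
    for _ in range(iters):
        Q = r[:, None] + gamma * (P @ V)
        V = Q.max(axis=1)
    return Q


def expert_loglik(theta, expert_actions, beta=2.0):
    """Log-likelihood of the expert's actions under a softmax policy on Q."""
    logits = beta * q_values(theta)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_pi = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return log_pi[np.arange(S), expert_actions].sum()


# The "expert" acts greedily with respect to a hidden reward parameter.
theta_true = rng.normal(size=K)
expert_actions = q_values(theta_true).argmax(axis=1)

# Ascend the surrogate likelihood with normalized finite-difference gradients.
theta, eps, step = np.zeros(K), 0.05, 0.2
for it in range(60):
    grad = np.array([
        (expert_loglik(theta + eps * e, expert_actions)
         - expert_loglik(theta - eps * e, expert_actions)) / (2 * eps)
        for e in np.eye(K)
    ])
    theta += step * grad / (np.linalg.norm(grad) + 1e-12)
    if it % 10 == 0:
        mismatch = np.mean(q_values(theta).argmax(axis=1) != expert_actions)
        print(f"iter {it:3d}  policy mismatch {mismatch:.2f}")
```

In the paper itself the mapping from reward parameters to the optimal policy is nonsmooth, which is why subdifferentials are needed, and the natural gradient then compensates for the redundancy of the reward parameterization; the sketch above sidesteps both issues by smoothing the policy with a softmax.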