Inverse Reinforcement Learning with the Average Reward Criterion

05/24/2023
by Feiyang Wu, et al.

We study the problem of Inverse Reinforcement Learning (IRL) with an average-reward criterion. The goal is to recover an unknown policy and a reward function when the agent only has samples of states and actions from an experienced (expert) agent. Previous IRL methods assume that the expert is trained in a discounted environment and that the discount factor is known. This work relaxes this assumption by proposing an average-reward framework with efficient learning algorithms. We develop novel stochastic first-order methods to solve the IRL problem under the average-reward setting, which requires solving an Average-reward Markov Decision Process (AMDP) as a subproblem. To solve the subproblem, we develop a Stochastic Policy Mirror Descent (SPMD) method for general state and action spaces that needs 𝒪(1/ε) steps of gradient computation. Equipped with SPMD, we propose the Inverse Policy Mirror Descent (IPMD) method for solving the IRL problem with 𝒪(1/ε^2) complexity. To the best of our knowledge, these complexity results are new in IRL. Finally, we corroborate our analysis with numerical experiments on the MuJoCo benchmark and additional control tasks.
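The abstract names policy mirror descent as the solver for the average-reward MDP subproblem. The following is only a minimal sketch of the general idea, assuming a tiny tabular MDP with known transitions and exact policy evaluation; it is not the authors' SPMD, which is stochastic and targets general state and action spaces. The MDP numbers, the KL (exponentiated-weights) mirror step, the step size eta, and the helper names evaluate, q_values and policy_mirror_descent are all hypothetical choices for illustration. Each iteration evaluates the current policy's average reward rho and differential values h, forms Q(s,a) = r(s,a) - rho + sum_t P(t|s,a) h(t), and reweights the policy by exp(eta * Q).

```python
import math

# Hypothetical toy average-reward MDP (illustrative numbers, not from the paper).
# P[s][a][t] = probability of moving from state s to state t under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.05, 0.95]],
]
R = [[1.0, 0.0], [0.0, 2.0]]  # R[s][a] = one-step reward
N_S, N_A = 2, 2


def evaluate(pi, iters=500):
    """Hypothetical helper: relative value iteration for a fixed policy.

    Returns the average reward rho and differential values h (with h[0] == 0)
    that approximately satisfy the Poisson equation
    h(s) = sum_a pi(a|s) * [R(s,a) - rho + sum_t P(t|s,a) h(t)].
    """
    h = [0.0] * N_S
    rho = 0.0
    for _ in range(iters):
        th = [
            sum(pi[s][a] * (R[s][a] + sum(P[s][a][t] * h[t] for t in range(N_S)))
                for a in range(N_A))
            for s in range(N_S)
        ]
        rho = th[0]  # value at reference state 0 converges to the average reward
        h = [v - rho for v in th]
    return rho, h


def q_values(rho, h):
    """Differential action values Q(s,a) = R(s,a) - rho + sum_t P(t|s,a) h(t)."""
    return [[R[s][a] - rho + sum(P[s][a][t] * h[t] for t in range(N_S))
             for a in range(N_A)] for s in range(N_S)]


def policy_mirror_descent(eta=1.0, steps=100):
    """Hypothetical sketch: deterministic PMD with a KL mirror map (exact evaluation)."""
    pi = [[1.0 / N_A] * N_A for _ in range(N_S)]  # start from the uniform policy
    history = []
    for _ in range(steps):
        rho, h = evaluate(pi)
        history.append(rho)
        q = q_values(rho, h)
        new_pi = []
        for s in range(N_S):
            qmax = max(q[s])  # shift for numerical stability; cancels when normalizing
            w = [pi[s][a] * math.exp(eta * (q[s][a] - qmax)) for a in range(N_A)]
            z = sum(w)
            new_pi.append([x / z for x in w])
        pi = new_pi
    return pi, history


pi, rhos = policy_mirror_descent()
print(round(rhos[-1], 4))  # prints 1.8824, i.e. about 32/17 (action 1 in both states)
```

In this toy MDP the best deterministic policy takes action 1 in both states, with average reward 32/17. With exact evaluation, the KL mirror step never decreases the average reward from one iteration to the next. A stochastic variant would replace evaluate with sampled estimates of Q.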
