Exploring compact reinforcement-learning representations with linear regression

05/09/2012
by Thomas J. Walsh, et al.

This paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm for learning compact reinforcement-learning representations. We show that KWIK linear regression can be used to learn the reward function of a factored MDP and the probabilities of action outcomes in Stochastic STRIPS and Object Oriented MDPs, none of which have been proven to be efficiently learnable in the RL setting before. We also combine KWIK linear regression with other KWIK learners to learn larger portions of these models, including experiments on learning factored MDP transition and reward functions together.
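As a rough illustration of what a KWIK-style online regressor looks like, the sketch below either returns a numeric prediction or declines with `None` ("I don't know") when the query lies in a direction it has seen too little of. This is not the algorithm from the paper: it is a generic ridge-regression learner with a simple uncertainty test, and the class name `KwikStyleRegressor`, the `threshold` and `ridge` parameters, and the uncertainty measure are all assumptions made for illustration.

```python
from typing import List, Optional


class KwikStyleRegressor:
    """Illustrative online ridge regressor that may answer None ("I don't know").

    Hypothetical sketch only; it does not reproduce the paper's algorithm
    or its complexity bounds.
    """

    def __init__(self, dim: int, threshold: float = 0.1, ridge: float = 1.0) -> None:
        self.dim = dim
        self.threshold = threshold  # assumed cutoff on the uncertainty score
        # Inverse of A = ridge * I + sum of x x^T over observed inputs.
        self.a_inv = [
            [(1.0 / ridge) if i == j else 0.0 for j in range(dim)]
            for i in range(dim)
        ]
        # Running sum of y * x over observed (input, label) pairs.
        self.b = [0.0] * dim

    def _a_inv_times(self, vec: List[float]) -> List[float]:
        return [sum(row[j] * vec[j] for j in range(self.dim)) for row in self.a_inv]

    def uncertainty(self, x: List[float]) -> float:
        """Score x^T A^{-1} x: large when x points in a poorly explored direction."""
        v = self._a_inv_times(x)
        return sum(x[i] * v[i] for i in range(self.dim))

    def predict(self, x: List[float]) -> Optional[float]:
        """Return a prediction, or None when the learner is too uncertain."""
        if self.uncertainty(x) > self.threshold:
            return None
        theta = self._a_inv_times(self.b)  # ridge estimate A^{-1} b
        return sum(theta[i] * x[i] for i in range(self.dim))

    def update(self, x: List[float], y: float) -> None:
        """Incorporate one labeled example via a Sherman-Morrison rank-one update."""
        v = self._a_inv_times(x)  # A^{-1} x (A^{-1} is symmetric)
        denom = 1.0 + sum(x[i] * v[i] for i in range(self.dim))
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= v[i] * v[j] / denom
        for i in range(self.dim):
            self.b[i] += y * x[i]
```

In use, an agent would call `predict` first and, on a `None` answer, treat the input as unknown and feed the observed label back through `update`. A real KWIK learner must also bound the number of such "I don't know" answers, which this sketch does not attempt to prove.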

research
01/26/2022

Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes

Reward-free reinforcement learning (RL) considers the setting where the ...
research
02/15/2021

Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

We study reinforcement learning in an infinite-horizon average-reward se...
research
02/16/2021

Inverse Reinforcement Learning in the Continuous Setting with Formal Guarantees

Inverse Reinforcement Learning (IRL) is the problem of finding a reward ...
research
07/25/2022

Online Reinforcement Learning for Periodic MDP

We study learning in periodic Markov Decision Process (MDP), a special ty...
research
10/11/2022

Multi-User Reinforcement Learning with Low Rank Rewards

In this work, we consider the problem of collaborative multi-user reinfo...
research
01/21/2022

Meta Learning MDPs with Linear Transition Models

We study meta-learning in Markov Decision Processes (MDP) with linear tr...
research
06/27/2012

Chi-square Tests Driven Method for Learning the Structure of Factored MDPs

SDYNA is a general framework designed to address large stochastic reinfo...
