T-Learning

12/31/2011
by Vincent Graziano, et al.

Traditional Reinforcement Learning (RL) has focused on problems involving many states and few actions, such as simple grid worlds. Most real-world problems, however, are of the opposite type, involving few relevant states and many actions. For example, to return home from a conference, humans identify only a few subgoal states, such as lobby, taxi, and airport. Each valid behavior connecting two such states can be viewed as an action, and there are trillions of them. Assuming the subgoal identification problem is already solved, the quality of any RL method in real-world settings depends less on how well it scales with the number of states than on how well it scales with the number of actions. This is where our new method, T-Learning, excels: it evaluates the relatively few possible transits from one state to another in a policy-independent way, rather than evaluating a huge number of state-action pairs, or states, in traditional policy-dependent ways. Illustrative experiments demonstrate that performance improvements of T-Learning over Q-learning can be arbitrarily large.
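The idea of valuing transits rather than state-action pairs can be illustrated with a minimal sketch. The abstract does not state the update rule, so the code below assumes a Q-learning-style temporal-difference update applied to a table indexed by (state, next state); the function names, the toy chain environment, and the parameters `alpha` and `gamma` are all hypothetical choices made for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict


def t_learning_update(T, s, s_next, reward, alpha=0.1, gamma=0.9, terminal=False):
    """Assumed TD update on the transit value T[s][s_next].

    The bootstrap term is the best transit value out of s_next, which
    mirrors the max over actions in Q-learning (an assumption here).
    """
    successors = T[s_next]
    future = 0.0 if terminal or not successors else max(successors.values())
    old = T[s].get(s_next, 0.0)
    T[s][s_next] = old + alpha * (reward + gamma * future - old)
    return T[s][s_next]


def step(state, action, goal=3):
    """Hypothetical toy dynamics: even actions advance, odd actions stay put."""
    next_state = min(state + 1, goal) if action % 2 == 0 else state
    reward = 1.0 if next_state == goal and state != goal else 0.0
    return next_state, reward


def train(n_actions=1000, episodes=200, goal=3, seed=0):
    """Learn transit values under a uniformly random behavior policy."""
    rng = random.Random(seed)
    T = defaultdict(dict)  # T[s][s'] holds the value of the transit s -> s'
    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.randrange(n_actions)  # many actions, few distinct transits
            s_next, r = step(s, a, goal)
            t_learning_update(T, s, s_next, r, terminal=(s_next == goal))
            s = s_next
    return T


if __name__ == "__main__":
    T = train()
    for s in sorted(T):
        for s_next in sorted(T[s]):
            print(f"T[{s}][{s_next}] = {T[s][s_next]:.3f}")
```

In this toy setting a Q-table would need one entry per state-action pair (thousands of entries for 1000 actions), while the transit table holds at most two entries per non-goal state regardless of `n_actions`. The sketch covers only value estimation; choosing an action that realizes a preferred transit is a separate problem that it does not address.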


