State Action Separable Reinforcement Learning

06/05/2020
by   Ziyao Zhang, et al.

Reinforcement Learning (RL) based methods have achieved notable success in solving sequential decision-making and control problems in recent years. In conventional RL formulations, the Markov Decision Process (MDP) and the state-action-value function form the basis for problem modeling and policy evaluation. However, several challenges remain. Among the most frequently cited, the enormity of the state/action space is an important factor that makes accurate approximation of the state-action-value function inefficient. We observe that although actions directly define the agents' behaviors, for many problems the next state after a state transition matters more than the action taken in determining the return of that transition. Accordingly, we propose a new learning paradigm, State Action Separable Reinforcement Learning (sasRL), in which the action space is decoupled from the value function learning process for higher efficiency. A light-weight transition model is then learned to help the agent determine the action that triggers the desired state transition. In addition, our convergence analysis reveals that, under certain conditions, the convergence time of sasRL is O(T^(1/k)), where T is the convergence time for updating the value function in the MDP-based formulation and k is a weighting factor. Experiments on several gaming scenarios show that sasRL outperforms state-of-the-art MDP-based RL algorithms by up to 75%.
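
The separation described in the abstract can be illustrated with a small tabular sketch. It is not the authors' implementation: the six-state chain environment, the Q-learning-style update over (state, next state) pairs, and the lookup-table transition model are all assumptions chosen for brevity, and every name (`step`, `train`, `greedy_action`) is hypothetical. The point is only to show a value table that never sees actions, with a separate model that recovers the action for a desired transition.

```python
import random
from collections import defaultdict

N_STATES = 6          # hypothetical chain: states 0..5, state 5 is the goal
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Deterministic toy dynamics; reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    value = defaultdict(float)      # value[(s, s_next)]: action-free transition value
    model = {}                      # model[(s, s_next)] -> action that produced it
    successors = defaultdict(set)   # next states observed from each state

    for _ in range(episodes):
        state = 0
        for _ in range(100):                    # cap on episode length
            action = rng.choice(ACTIONS)        # uniform random behavior policy
            next_state, reward, done = step(state, action)

            # Transition model: remember which action caused s -> s_next.
            model[(state, next_state)] = action
            successors[state].add(next_state)

            # Value update over state transitions only (no action index).
            future = 0.0 if done else max(
                (value[(next_state, s2)] for s2 in successors[next_state]),
                default=0.0,
            )
            key = (state, next_state)
            value[key] += alpha * (reward + gamma * future - value[key])

            state = next_state
            if done:
                break
    return value, model, successors


def greedy_action(state, value, model, successors):
    """Pick the best known next state, then ask the model for its action."""
    target = max(successors[state], key=lambda s2: value[(state, s2)])
    return model[(state, target)]
```

After `value, model, successors = train()`, calling `greedy_action(s, value, model, successors)` for a non-goal state `s` first selects the most valuable reachable next state and only then looks up the action. On this toy chain that should be a move to the right (`+1`) from every non-goal state. In a larger problem the lookup table would presumably be replaced by a learned model, as the abstract indicates.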
