Action Pick-up in Dynamic Action Space Reinforcement Learning

04/03/2023
by Jiaqi Ye, et al.

Most reinforcement learning algorithms are based on the key assumption that Markov decision processes (MDPs) are stationary. However, non-stationary MDPs with dynamic action spaces are ubiquitous in real-world scenarios. Although dynamic action space reinforcement learning has been studied in many previous works, how to choose valuable actions from new and unseen actions to improve learning efficiency remains unaddressed. To tackle this problem, we propose an intelligent Action Pick-up (AP) algorithm that autonomously chooses, from a set of new actions, the valuable actions most likely to boost performance. In this paper, we first show theoretically that a prior optimal policy plays an important role in action pick-up by providing useful knowledge and experience. We then design two AP methods based on the prior optimal policy: a frequency-based global method and a state clustering-based local method. Finally, we evaluate AP in two simulated but challenging environments where the action space varies over time. Experimental results demonstrate that our proposed AP outperforms baselines in learning efficiency.
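The abstract only names the two AP methods, so a minimal sketch may help make the frequency-based global idea concrete. Everything below is an assumption rather than the paper's implementation: the function name `frequency_based_pickup`, the per-action feature vectors, and the cosine-similarity scoring are hypothetical illustrations of one way that action visit frequencies under a prior optimal policy could be used to rank new, unseen actions.

```python
import numpy as np

def frequency_based_pickup(prior_trajectories, new_action_features,
                           old_action_features, k=3):
    """Hypothetical sketch of a frequency-based global Action Pick-up.

    Assumptions (not from the paper): each action has a feature vector,
    and a new action is scored by the visit frequency of the k old
    actions it most resembles under the prior optimal policy.
    """
    # Count how often each old action was chosen by the prior optimal policy.
    counts = np.zeros(len(old_action_features))
    for traj in prior_trajectories:
        for _state, action in traj:
            counts[action] += 1
    freq = counts / counts.sum()

    # Score each new action by the mean frequency of its k nearest old
    # actions in feature space (cosine similarity is an assumed metric).
    old_norms = np.linalg.norm(old_action_features, axis=1)
    scores = []
    for feat in new_action_features:
        sims = old_action_features @ feat / (old_norms * np.linalg.norm(feat) + 1e-8)
        nearest = np.argsort(sims)[-k:]
        scores.append(freq[nearest].mean())

    # Return the index of the new action most likely to be valuable.
    return int(np.argmax(scores))

if __name__ == "__main__":
    # Toy usage with random features and random prior-policy trajectories.
    rng = np.random.default_rng(0)
    old_feats = rng.normal(size=(5, 4))
    new_feats = rng.normal(size=(3, 4))
    trajs = [[(None, rng.integers(5)) for _ in range(20)] for _ in range(10)]
    print(frequency_based_pickup(trajs, new_feats, old_feats))
```

The state clustering-based local method named in the abstract would presumably replace the single global frequency table with per-cluster statistics, but the paper's details are needed to sketch that faithfully.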


Related Research

07/09/2020 · A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces
In this work, we propose KeRNS: an algorithm for episodic reinforcement ...

05/03/2017 · Answer Set Programming for Non-Stationary Markov Decision Processes
Non-stationary domains, where unforeseen changes happen, present a chall...

02/29/2012 · Fast Reinforcement Learning with Large Action Sets using Error-Correcting Output Codes for MDP Factorization
The use of Reinforcement Learning in real-world scenarios is strongly li...

06/05/2019 · Lifelong Learning with a Changing Action Set
In many real-world sequential decision making problems, the number of av...

06/12/2022 · Reinforcement Learning for Vision-based Object Manipulation with Non-parametric Policy and Action Primitives
The object manipulation is a crucial ability for a service robot, but it...

05/18/2021 · Sparsity Prior Regularized Q-learning for Sparse Action Tasks
In many decision-making tasks, some specific actions are limited in thei...

11/17/2018 · Autonomous Extraction of a Hierarchical Structure of Tasks in Reinforcement Learning, A Sequential Associate Rule Mining Approach
Reinforcement learning (RL) techniques, while often powerful, can suffer...
