Solving Continual Combinatorial Selection via Deep Reinforcement Learning

09/09/2019
by HyungSeok Song, et al.
We consider the Markov Decision Process (MDP) of selecting a subset of items at each step, termed the Select-MDP (S-MDP). The large state and action spaces of S-MDPs make them intractable to solve with typical reinforcement learning (RL) algorithms, especially when the number of items is huge. In this paper, we present a deep RL algorithm that addresses this issue through two key ideas. First, we convert the original S-MDP into an Iterative Select-MDP (IS-MDP), which is equivalent to the S-MDP in terms of optimal actions. The IS-MDP decomposes a joint action of selecting K items simultaneously into K iterative selections, which reduces the number of actions at the expense of an exponential increase in the number of states. Second, we overcome this state space explosion by exploiting a special symmetry in IS-MDPs with novel weight-shared Q-networks, which provably maintain sufficient expressive power. Various experiments demonstrate that our approach works well even when the item space is large and that it scales to environments with item spaces different from those used in training.
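
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single linear layer, and the mean-pooling term are all assumptions chosen for illustration. The sketch compares the number of joint actions for picking K of N items with the per-step action counts of an iterative selection, and scores items with one set of weights shared across all items, so that reordering the items only reorders the Q-values.

```python
import math


def joint_action_count(n_items, k):
    """Number of joint actions when K of N items are chosen at once."""
    return math.comb(n_items, k)


def per_step_action_counts(n_items, k):
    """Action counts when the K items are chosen one at a time."""
    return [n_items - step for step in range(k)]


def shared_q_values(items, selected, w_self, w_pool):
    """Hypothetical weight-shared scorer (an assumption, not the paper's network).

    items:    list of per-item feature lists
    selected: list of bools marking items already chosen
    w_self:   weights applied identically to every item
    w_pool:   weights applied to the mean over all items
    """
    # Append the "already selected" flag as one extra feature per item.
    feats = [list(x) + [1.0 if s else 0.0] for x, s in zip(items, selected)]
    dim = len(feats[0])
    # Mean pooling does not depend on item order.
    pooled = [sum(f[j] for f in feats) / len(feats) for j in range(dim)]
    context = sum(w * p for w, p in zip(w_pool, pooled))
    # In this one-layer sketch the context shifts all Q-values equally;
    # already selected items are masked out with -inf.
    return [
        -math.inf if s else sum(w * v for w, v in zip(w_self, f)) + context
        for f, s in zip(feats, selected)
    ]


def select_iteratively(items, k, w_self, w_pool):
    """Greedy K-step selection: one item per step, as in an iterative scheme."""
    selected = [False] * len(items)
    chosen = []
    for _ in range(k):
        q = shared_q_values(items, selected, w_self, w_pool)
        best = max(range(len(items)), key=q.__getitem__)
        selected[best] = True
        chosen.append(best)
    return chosen
```

Because the same weights are applied to every item, the parameter count here does not grow with the number of items, which is one plausible reading of how such a network could be reused on item spaces of a different size than in training.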

research
12/18/2020

Exact Reduction of Huge Action Spaces in General Reinforcement Learning

The reinforcement learning (RL) framework formalizes the notion of learn...
research
09/27/2020

Scalable Deep Reinforcement Learning for Ride-Hailing

Ride-hailing services, such as Didi Chuxing, Lyft, and Uber, arrange tho...
research
06/30/2020

MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

This paper introduces MDP homomorphic networks for deep reinforcement le...
research
07/18/2021

A note on the article "On Exploiting Spectral Properties for Solving MDP with Large State Space"

We improve a theoretical result of the article "On Exploiting Spectral P...
research
02/08/2023

Predictable MDP Abstraction for Unsupervised Model-Based RL

A key component of model-based reinforcement learning (RL) is a dynamics...
research
10/18/2020

DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs

We study an approach to offline reinforcement learning (RL) based on opt...
research
10/09/2019

Model-Based Reinforcement Learning Exploiting State-Action Equivalence

Leveraging an equivalence property in the state-space of a Markov Decisi...
