DCT: Dual Channel Training of Action Embeddings for Reinforcement Learning with Large Discrete Action Spaces

06/28/2023
by Pranavi Pathakota, et al.

The ability to learn robust policies while generalizing over large discrete action spaces is an open challenge for intelligent systems, especially in noisy environments that face the curse of dimensionality. In this paper, we present a novel framework to efficiently learn action embeddings that simultaneously allow us to reconstruct the original action and to predict the expected future state. We describe an encoder-decoder architecture for action embeddings with a dual channel loss that balances action reconstruction accuracy against state prediction accuracy. We use the trained decoder in conjunction with a standard reinforcement learning algorithm that produces actions in the embedding space. Our architecture outperforms two competitive baselines in two diverse environments: a 2D maze environment with more than 4000 discrete noisy actions, and a product recommendation task that uses real-world e-commerce transaction data. Empirical results show that the model produces cleaner action embeddings, and that the improved representations help learn better policies with earlier convergence.
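To make the idea of a dual channel loss concrete, here is a minimal sketch in plain Python. The abstract does not give the exact form of the loss, so everything specific here is an assumption made for illustration: the reconstruction channel is taken to be a cross-entropy over decoded action logits, the prediction channel a mean squared error on the next state, and the two are mixed by a hypothetical weight `eta`. The function names are likewise illustrative and not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruction_loss(action_logits, true_action):
    """Cross-entropy between the decoded action distribution and the
    index of the original action (assumed form of the first channel)."""
    probs = softmax(action_logits)
    return -math.log(probs[true_action])


def prediction_loss(pred_next_state, true_next_state):
    """Mean squared error between predicted and observed next state
    (assumed form of the second channel)."""
    n = len(true_next_state)
    return sum((p - t) ** 2 for p, t in zip(pred_next_state, true_next_state)) / n


def dual_channel_loss(action_logits, true_action,
                      pred_next_state, true_next_state, eta=0.5):
    """Weighted sum of the two channels.

    eta is a hypothetical mixing weight in [0, 1]: eta=1 trains only for
    action reconstruction, eta=0 only for state prediction.
    """
    rec = reconstruction_loss(action_logits, true_action)
    pred = prediction_loss(pred_next_state, true_next_state)
    return eta * rec + (1.0 - eta) * pred


if __name__ == "__main__":
    # Four candidate actions; the decoder favours action 2, which is correct.
    logits = [0.1, 0.2, 2.5, -0.3]
    loss = dual_channel_loss(logits, 2, [0.9, 1.1], [1.0, 1.0], eta=0.5)
    print(f"dual channel loss: {loss:.4f}")
```

In a full training loop, the logits and the predicted next state would both be computed from the same learned action embedding, so that minimizing this combined loss pushes the embedding to carry both kinds of information.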

research
10/09/2020

Joint State-Action Embedding for Efficient Reinforcement Learning

While reinforcement learning has achieved considerable successes in rece...
research
09/19/2022

MAN: Multi-Action Networks Learning

Learning control policies with large action spaces is a challenging prob...
research
02/01/2019

Learning Action Representations for Reinforcement Learning

Most model-free reinforcement learning methods leverage state representa...
research
09/12/2021

HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via Hybrid Action Representation

Discrete-continuous hybrid action space is a natural setting in many pra...
research
05/31/2023

Handling Large Discrete Action Spaces via Dynamic Neighborhood Construction

Large discrete action spaces remain a central challenge for reinforcemen...
research
05/06/2023

Learning Action Embeddings for Off-Policy Evaluation

Off-policy evaluation (OPE) methods allow us to compute the expected rew...
research
04/30/2020

Plan-Space State Embeddings for Improved Reinforcement Learning

Robot control problems are often structured with a policy function that ...
