Implicit Two-Tower Policies

08/02/2022
by Yunfan Zhao, et al.

We present a new class of structured reinforcement learning policy architectures, Implicit Two-Tower (ITT) policies, in which actions are chosen based on the attention scores between their learnable latent representations and those of the input states. By explicitly disentangling action processing from state processing in the policy stack, we achieve two main goals: substantial computational gains and better performance. Our architectures are compatible with both discrete and continuous action spaces. In tests on 15 environments from OpenAI Gym and the DeepMind Control Suite, we show that ITT architectures are particularly well suited to blackbox/evolutionary optimization, and that the corresponding policy training algorithms outperform both their vanilla unstructured implicit counterparts and commonly used explicit policies. We complement our analysis by showing how techniques such as hashing and lazy tower updates, which rely critically on the two-tower structure of ITTs, can be applied to obtain additional computational improvements.
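To make the two-tower idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a plain dot product as the attention score, small tanh MLPs as the two towers, and a uniformly sampled candidate set standing in for a continuous action space. The names `init_tower`, `mlp_tower`, and `itt_act` are hypothetical and chosen only for illustration.

```python
import numpy as np

def init_tower(rng, sizes):
    """Randomly initialise an MLP as a list of (W, b) pairs (illustrative only)."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_tower(x, weights):
    """Map inputs to latent vectors with a small tanh MLP."""
    h = x
    for W, b in weights:
        h = np.tanh(h @ W + b)
    return h

def itt_act(state, candidate_actions, state_tower, action_tower):
    """Pick the candidate action whose latent scores highest against the state latent.

    Assumption: the score is a dot product between the two latents.
    """
    s_lat = mlp_tower(state, state_tower)               # shape (d,)
    a_lat = mlp_tower(candidate_actions, action_tower)  # shape (k, d)
    scores = a_lat @ s_lat                              # shape (k,)
    return candidate_actions[np.argmax(scores)], scores

rng = np.random.default_rng(0)
state_dim, action_dim, latent_dim = 4, 2, 8
state_tower = init_tower(rng, [state_dim, 16, latent_dim])
action_tower = init_tower(rng, [action_dim, 16, latent_dim])

state = rng.standard_normal(state_dim)
# Assumed stand-in for a continuous action space: 32 uniformly sampled candidates.
candidates = rng.uniform(-1.0, 1.0, size=(32, action_dim))

action, scores = itt_act(state, candidates, state_tower, action_tower)
print(action, scores.shape)
```

In this sketch the action tower never sees the state, so for a fixed candidate set its latents could be computed once and reused across time steps. An unstructured implicit policy that feeds the concatenated state-action pair through a single network would not allow that reuse.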

