Transformers as Policies for Variable Action Environments

01/09/2023

In this project, we show that the transformer encoder is a viable architecture for policies in variable action environments. We use it to train an agent with Proximal Policy Optimisation (PPO) on multiple maps against scripted opponents in the Gym-μRTS environment. The final agent achieves a higher return than the next-best RL agent, which used the GridNet architecture, while using half the computational resources. The source code and pre-trained models are available at https://github.com/NiklasZ/transformers-for-variable-action-envs
