Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning

07/07/2023
by Seungyong Moon, et al.

Discovering achievements with a hierarchical structure in procedurally generated environments poses a significant challenge. This requires agents to possess a broad range of abilities, including generalization and long-term reasoning. Many prior methods are built upon model-based or hierarchical approaches, with the belief that an explicit module for long-term planning would be beneficial for learning hierarchical achievements. However, these methods require an excessive number of environment interactions or large model sizes, limiting their practicality. In this work, we identify that proximal policy optimization (PPO), a simple and versatile model-free algorithm, outperforms the prior methods when equipped with recent implementation practices. Moreover, we find that the PPO agent can predict the next achievement to be unlocked to some extent, though with low confidence. Based on this observation, we propose a novel contrastive learning method, called achievement distillation, that strengthens the agent's capability to predict the next achievement. Our method exhibits a strong capacity for discovering hierarchical achievements and shows state-of-the-art performance on the challenging Crafter environment, using fewer model parameters in a sample-efficient regime.
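
To make the idea concrete, below is a minimal sketch (not the paper's implementation) of the kind of contrastive, InfoNCE-style objective the abstract describes: each state representation is pulled toward the representation of the achievement unlocked next in its episode, and pushed away from the achievement representations of the other samples in the batch. The function name, tensor shapes, and temperature are illustrative assumptions; how positive pairs are collected from PPO rollouts and how the loss is combined with the policy objective follow the paper and are not reproduced here.

    import torch
    import torch.nn.functional as F

    def achievement_contrastive_loss(state_emb, next_achievement_emb, temperature=0.1):
        # state_emb:            (B, D) encoder features for sampled states
        # next_achievement_emb: (B, D) embedding of the achievement unlocked next
        #                       in each state's episode (the positive pair)
        # The other achievements in the batch serve as negatives.
        state_emb = F.normalize(state_emb, dim=-1)
        next_achievement_emb = F.normalize(next_achievement_emb, dim=-1)
        # (B, B) cosine-similarity matrix; the diagonal holds the positive pairs.
        logits = state_emb @ next_achievement_emb.t() / temperature
        targets = torch.arange(logits.size(0), device=logits.device)
        return F.cross_entropy(logits, targets)

    # Example usage with random placeholder embeddings (batch of 32, dim 128):
    states = torch.randn(32, 128)
    achievements = torch.randn(32, 128)
    loss = achievement_contrastive_loss(states, achievements)

Minimizing this loss encourages state features that are predictive of the upcoming achievement, which is the capability the abstract reports the plain PPO agent has only weakly.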

Related research

11/06/2019
MBCAL: A Simple and Efficient Reinforcement Learning Method for Recommendation Systems
It has been widely regarded that only considering the immediate user fee...

01/06/2019
What Should I Do Now? Marrying Reinforcement Learning and Symbolic Planning
Long-term planning poses a major difficulty to many reinforcement learni...

10/27/2022
Meta-Reinforcement Learning Using Model Parameters
In meta-reinforcement learning, an agent is trained in multiple differen...

06/11/2020
From proprioception to long-horizon planning in novel environments: A hierarchical RL model
For an intelligent agent to flexibly and efficiently operate in complex ...

03/31/2021
DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation
In imitation learning from observation (IfO), a learning agent seeks to im...

10/28/2022
Using Contrastive Samples for Identifying and Leveraging Possible Causal Relationships in Reinforcement Learning
A significant challenge in reinforcement learning is quantifying the com...

09/14/2021
Benchmarking the Spectrum of Agent Capabilities
Evaluating the general abilities of intelligent agents requires complex ...
