Emergence of Communication in an Interactive World with Consistent Speakers

09/03/2018
by   Ben Bogin, et al.
0

Training agents to communicate with one another given task-based supervision only has attracted considerable attention recently, due to the growing interest in developing models for human-agent interaction. Prior work on the topic focused on simple environments, where training using policy gradient was feasible despite the non-stationarity of the agents during training. In this paper, we present a more challenging environment for testing the emergence of communication from raw pixels, where training using policy gradient fails. We propose a new model and training algorithm, that utilizes the structure of a learned representation space to produce more consistent speakers at the initial phases of training, which stabilizes learning. We empirically show that our algorithm substantially improves performance compared to policy gradient. We also propose a new alignment-based metric for measuring context-independence in emerged communication and find our method increases context-independence compared to policy gradient and other competitive baselines.

READ FULL TEXT
research
10/20/2020

Proximal Policy Gradient: PPO with Policy Gradient

In this paper, we propose a new algorithm PPG (Proximal Policy Gradient)...
research
10/29/2020

Low-Variance Policy Gradient Estimation with World Models

In this paper, we propose World Model Policy Gradient (WMPG), an approac...
research
08/04/2023

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

Recent months have seen the emergence of a powerful new trend in which l...
research
07/14/2020

Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting

Policy gradient methods have shown success in learning control policies ...
research
06/08/2018

Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing

Dynamic oracles provide strong supervision for training constituency par...
research
07/02/2018

Improving Goal-Oriented Visual Dialog Agents via Advanced Recurrent Nets with Tempered Policy Gradient

Learning goal-oriented dialogues by means of deep reinforcement learning...
research
06/04/2019

Unsupervised Emergence of Egocentric Spatial Structure from Sensorimotor Prediction

Despite its omnipresence in robotics application, the nature of spatial ...

Please sign up or login with your details

Forgot password? Click here to reset