The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives

09/03/2019
by Elena Voita, et al.

We seek to understand how the representations of individual tokens and the structure of the learned feature space evolve between layers in deep neural networks under different learning objectives. We focus on Transformers for our analysis, as they have been shown effective on various tasks, including machine translation (MT), standard left-to-right language modeling (LM), and masked language modeling (MLM). Previous work used black-box probing tasks to show that the representations learned by the Transformer differ significantly depending on the objective. In this work, we use canonical correlation analysis and mutual information estimators to study how information flows across Transformer layers and how this process depends on the choice of learning objective. For example, as we move from bottom to top layers, information about the past in left-to-right language models vanishes and predictions about the future take shape. In contrast, for MLM, representations initially acquire information about the context around the token, partially forgetting the token identity and producing a more generalized token representation; the token identity is then recreated at the top MLM layers.
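The abstract names canonical correlation analysis (CCA) as one of the tools for comparing token representations across layers. Below is a minimal sketch, not taken from the paper's code, of how a mean CCA similarity between the activations of two layers could be computed; the layer names, matrix shapes, and random data are illustrative assumptions.

```python
# Minimal sketch of layer-to-layer similarity via canonical correlation analysis.
# Not the authors' implementation; shapes and data below are illustrative only.
import numpy as np

def cca_similarity(x, y, eps=1e-10):
    """Mean canonical correlation between two sets of activations.

    x: (n_tokens, d1) activations of one layer
    y: (n_tokens, d2) activations of another layer
    """
    # Center each feature dimension.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Orthonormal bases for the column spaces via SVD (whitening step).
    u_x, s_x, _ = np.linalg.svd(x, full_matrices=False)
    u_y, s_y, _ = np.linalg.svd(y, full_matrices=False)

    # Drop directions with negligible variance to keep the problem well-posed.
    u_x = u_x[:, s_x > eps]
    u_y = u_y[:, s_y > eps]

    # Canonical correlations are the singular values of U_x^T U_y.
    rho = np.linalg.svd(u_x.T @ u_y, compute_uv=False)
    return rho.mean()

# Illustrative usage: compare hypothetical activations of two adjacent layers
# for the same 1,000 tokens (hidden size 512).
rng = np.random.default_rng(0)
layer_3 = rng.normal(size=(1000, 512))
layer_4 = 0.8 * layer_3 + 0.2 * rng.normal(size=(1000, 512))
print(f"mean CCA(layer 3, layer 4) = {cca_similarity(layer_3, layer_4):.3f}")
```

The mean canonical correlation is 1.0 for identical subspaces and tends toward 0 for unrelated ones, so tracking it between consecutive layers indicates how much the representation changes as depth grows. The paper additionally uses mutual information estimators (between a token's identity and its layer representations), which this sketch does not cover.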


