Word Order Matters When You Increase Masking

11/08/2022
by Karim Lasri et al.

Word order, an essential property of natural languages, is injected into Transformer-based neural language models through position encoding. However, recent experiments have shown that explicit position encoding is not always useful, since some models without such a feature still achieve state-of-the-art performance on certain tasks. To better understand this phenomenon, we examine the effect of removing position encodings on the pre-training objective itself (i.e., masked language modelling), to test whether models can reconstruct position information from co-occurrences alone. We do so by controlling the proportion of masked tokens in the input sentence, as a proxy for how much the task depends on position information. We find that the need for position information increases with the amount of masking, and that masked language models without position encodings are unable to reconstruct this information on the task. These findings point towards a direct relationship between the amount of masking and the ability of Transformers to capture order-sensitive aspects of language using position encoding.
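The sketch below is a minimal illustration of this kind of experimental setup, not the authors' code: a toy PyTorch masked language model whose learned position embeddings can be switched off, evaluated while the masking rate is swept. All hyperparameters, the random toy data, and the helper names (TinyMLM, mask_tokens) are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy constants (illustrative, not from the paper)
VOCAB_SIZE, MASK_ID = 1000, 1
D_MODEL, MAX_LEN = 128, 64

class TinyMLM(nn.Module):
    def __init__(self, use_position_embeddings: bool):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)
        # Position embeddings can be switched off to test whether the model must
        # recover order information from co-occurrences alone.
        self.pos_emb = nn.Embedding(MAX_LEN, D_MODEL) if use_position_embeddings else None
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, ids):
        x = self.tok_emb(ids)
        if self.pos_emb is not None:
            positions = torch.arange(ids.size(1), device=ids.device)
            x = x + self.pos_emb(positions)
        return self.lm_head(self.encoder(x))

def mask_tokens(ids, mask_rate):
    # Replace a fraction `mask_rate` of tokens with [MASK];
    # labels are set to -100 elsewhere so the loss covers masked positions only.
    labels = ids.clone()
    masked = torch.rand_like(ids, dtype=torch.float) < mask_rate
    labels[~masked] = -100
    inputs = ids.clone()
    inputs[masked] = MASK_ID
    return inputs, labels

# Sweep the masking rate to change how much the task depends on position information.
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
for mask_rate in (0.15, 0.4, 0.75):
    for use_pos in (True, False):
        model = TinyMLM(use_pos)
        batch = torch.randint(2, VOCAB_SIZE, (8, MAX_LEN))  # random toy sentences
        inputs, labels = mask_tokens(batch, mask_rate)
        logits = model(inputs)
        loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), labels.reshape(-1))
        print(f"mask_rate={mask_rate:.2f} pos_emb={use_pos} loss={loss.item():.3f}")
```

In an actual experiment the two model variants would be pre-trained at each masking rate before comparing their masked-token prediction performance; this snippet only shows how varying the masking rate and toggling position embeddings slot into an otherwise standard MLM pipeline.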

Related research:

The Curious Case of Absolute Position Embeddings (10/23/2022)
Transformer language models encode the notion of word order using positi...

Improve Transformer Pre-Training with Decoupled Directional Relative Position Encoding and Representation Differentiations (10/09/2022)
In this work, we revisit the Transformer-based pre-trained language mode...

Demystifying the Better Performance of Position Encoding Variants for Transformer (04/18/2021)
Transformers are state of the art models in NLP that map a given input s...

Position Information in Transformers: An Overview (02/22/2021)
Transformers are arguably the main workhorse in recent Natural Language ...

Transformer Language Models without Positional Encodings Still Learn Positional Information (03/30/2022)
Transformers typically require some form of positional encoding, such as...

Learning to Encode Position for Transformer with Continuous Dynamical Model (03/13/2020)
We introduce a new way of learning to encode position information for no...

Dynamic Position Encoding for Transformers (04/18/2022)
Recurrent models have been dominating the field of neural machine transl...
