Neural sequence models, such as the Transformer [vaswani2017attention], long short-term memory [hochreiter1997long], and gated recurrent neural networks [chung2014empirical], have been firmly established as state-of-the-art approaches in sequence modelling and machine translation [sutskever2014sequence, bahdanau2014neural, cho2014learning]. Without a single exception, these models use distributed vector representations of words, referred to as word embeddings, as their cornerstone. Furthermore, research has shown that a higher-quality word embedding set benefits the whole model [kocmi2017exploration], and "meta-embedding" methods, first proposed by [yin2016learning], can yield an embedding set of improved quality. Therefore, meta-embedding can benefit language modelling.
Several meta-embedding methods have been proposed to yield an embedding set of better quality. For example, 1toN+ [yin2016learning] takes an ensemble of pre-trained embedding sets and uses a neural network to recover each word's corresponding vector within every source embedding set. An unsupervised approach is employed by [bollegala2017think]: for each word, a representation is learnt as a linear combination of its nearest neighbours. Other methods, despite their simplicity, such as concatenation [yin2016learning] or averaging the source word embeddings [coates2018frustratingly], provide strong baselines for meta-embedding. In this work, we explore a new way of meta-embedding, namely the Duo. To the best of our knowledge, our model is the first meta-embedding method based on the self-attention mechanism.
Our meta-embedding language model uses the Transformer as its backbone, a model architecture that eschews recurrence and relies only on attention, whose mechanism draws global dependencies between input and output. As recurrence is removed, parallelization is greatly enhanced. Because we learn from twice the embeddings, the number of heads in the Transformer doubles and better performance is gained. Moreover, we use weight sharing in the duo multi-head attention, so the number of parameters is reduced.
The mechanism of the Duo is that, instead of merely enlarging the dimension of the word embedding, which leads to an enormous increase in the number of parameters, we use separately trained embeddings as the key and the value for each word in the Transformer's self-attention mechanism. As the number of word embeddings doubles, the information available to attention also doubles. The discrepancy between the two pieces of independent embedding describes two different aspects of the same information. As our results demonstrate, this independence is quite beneficial to the training of the model.
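The key-and-value split above can be sketched in a few lines of NumPy. This is our minimal illustration, not the paper's implementation: keys are drawn from one pre-trained embedding ("Spongebob") and values from a second, independently trained one ("Patrick"), so the attention map and the attended content carry independent information.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def duo_attention(q, k, v):
    """Scaled dot-product attention where the keys come from one embedding
    set and the values from another (the Duo idea in its simplest form)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)   # (seq, seq) attention logits
    return softmax(scores, axis=-1) @ v

# toy example: 4 tokens; queries/keys from embedding A, values from embedding B
rng = np.random.default_rng(0)
q = k = rng.standard_normal((4, 8))   # embedding A stream
v = rng.standard_normal((4, 8))       # embedding B stream
out = duo_attention(q, k, v)
print(out.shape)  # (4, 8)
```

In the vanilla Transformer, `k` and `v` would be projections of the same embedding; here they are genuinely independent vectors for the same tokens.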
Moreover, recent research [tang2018self] has shown that the Transformer has shortcomings in long-sequence learning; the Transformer-XL and other methods [child2019generating, sukhbaatar2019adaptive] have therefore been proposed to address the long-sequence problem. The good news is that our model is general enough that the Transformer-XL, along with other language models based on the Transformer, can employ the Duo mechanism to perform meta-embedding learning.
We examine our model on two representative tasks: text classification and machine translation. For text classification, the Duo mechanism exploits the information in two separately trained pieces of word embedding, e.g., GloVe [pennington2014glove] and fastText [joulin2016bag]. This meta-embedding model gives the language model more prior knowledge along independent aspects, thus leading to a better result. The machine translation task is trickier for meta-embedding learning, as it requires devising a decoder. In this paper, we propose a sequence-to-sequence meta-embedding language model to handle this problem, and the experiments show that learning in such a way leads to better performance and faster convergence.
All in all, our contributions are threefold:
We propose an attention-based way of meta-embedding for better language modelling.
To the best of our knowledge, we devise the first sequence-to-sequence encoder-decoder language model which directly uses two independent embeddings.
The Duo mechanism we propose is very general and can be employed on any language model based on the Transformer.
For deep learning methods in text classification, word embedding has been a focus of much research [mikolov2013distributed, pennington2014glove], as several studies have shown that text classification depends enormously on the effectiveness of the word embedding [shen2018baseline, wang2018joint]. The first part of our work focuses on combining different pre-trained word embeddings for text classification. In other words, the Duo mechanism enables two different pieces of pre-trained word embedding to perform on the same stage.
Methods for meta-embedding [yin2016learning] concern conducting a complementary combination of information from an ensemble of distinct word embedding sets, each trained with different methods and resources, to yield an embedding set of improved overall quality [kiela-etal-2018-dynamic, neill2018angular, coates2018frustratingly, muromagi2017linear, artetxe2018uncovering]. We believe this is one of the benefits of applying Duo.
There have also been extensive studies on refining the architecture of the Transformer. Adversarial training has been proven beneficial to language modelling [wang2019improving]. Additionally, to deal with the fixed-length problem, [dai2019transformer] extend the vanilla Transformer with recurrent units, which greatly enhances the original model. The Duo mechanism is a meta-embedding approach built on the Transformer.
3 Model Architecture
3.1 Duo Classifier
3.1.1 Duo Word Embedding
As demonstrated in Figure 1, we use two different embeddings, named Spongebob and Patrick, to represent the word embeddings of the same input sequence, whose length is not fixed across sentences. Hereafter, Spongebob and Patrick denote separately trained word embeddings; e.g., Spongebob can be GloVe 300d and Patrick can be Word2Vec 30d. For simplicity of notation, we use these names to refer to the text under each word embedding.
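The duo word embedding amounts to looking up the same token ids in two independently trained tables. The sketch below is ours (the vocabulary and dimensions are toy values, and the tables stand in for pre-trained sets such as GloVe and Word2Vec):

```python
import numpy as np

# Two independently trained embedding tables for the same vocabulary.
vocab = {"the": 0, "duo": 1, "wins": 2}
rng = np.random.default_rng(1)
spongebob = rng.standard_normal((len(vocab), 6))  # embedding set A, 6-d
patrick = rng.standard_normal((len(vocab), 4))    # embedding set B, 4-d

ids = [vocab[w] for w in ["the", "duo", "wins"]]  # the same token ids...
x_a = spongebob[ids]   # ...looked up in embedding A -> shape (3, 6)
x_b = patrick[ids]     # ...and in embedding B       -> shape (3, 4)
print(x_a.shape, x_b.shape)
```

Note that the two streams need not share a dimensionality; the fusion step described later reconciles them.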
3.1.2 Duo Classifier Attention
We simply let the model learn the attention parameters that balance the weights of different dimensions in our text classifier. In practice, we initialize two parameters, one for each embedding.
In our later experiments, we drop the softmax function, since doing so yields even faster computation while maintaining satisfying results.
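One plausible reading of this classifier attention (the paper's exact formula is elided in the text, so the scoring scheme below is our assumption) is that a learned parameter vector scores each token and the sentence vector is the weighted sum, with the softmax made optional:

```python
import numpy as np

def seq_attention(x, u, use_softmax=True):
    """Learned-vector attention over a token sequence.
    x: (seq, dim) token embeddings; u: (dim,) learned attention parameter.
    With use_softmax=False the raw scores serve directly as weights,
    which is cheaper to compute."""
    scores = x @ u                          # (seq,) one score per token
    if use_softmax:
        e = np.exp(scores - scores.max())
        scores = e / e.sum()
    return scores @ x                       # (dim,) weighted token sum

rng = np.random.default_rng(2)
x = rng.standard_normal((5, 6))             # 5 tokens, 6-d embedding
u = rng.standard_normal(6)                  # learned attention parameter
s_soft = seq_attention(x, u, use_softmax=True)
s_raw = seq_attention(x, u, use_softmax=False)
print(s_soft.shape, s_raw.shape)
```

In the duo setting this runs once per embedding stream, with a separate parameter vector for each.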
3.1.3 Duo Sentence Embedding
The duo sentence embedding is a fusion of the two attended representations, so we introduce another fusion parameter.
where denotes the concatenation operation.
This gives the final representation of the sentence embedding. Its value is the weighted sum of the token representations based on their attention to each other. In other words, we learn the attention and the value separately by giving them different embeddings.
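A minimal fusion sketch, under our assumptions (the symbol names `W_f`, `s_a`, `s_b` are ours, not the paper's): the two attended sentence vectors are concatenated and projected by a learned fusion matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
s_a = rng.standard_normal(6)            # sentence vector from embedding A
s_b = rng.standard_normal(4)            # sentence vector from embedding B
W_f = rng.standard_normal((6 + 4, 8))   # fusion parameter, learned in training

# concatenate the two streams, then project to the final sentence embedding
s = np.concatenate([s_a, s_b]) @ W_f
print(s.shape)  # (8,)
```

The projection also reconciles the (possibly different) dimensionalities of the two source embeddings.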
3.1.4 Model Complexity
The number of parameters to be learned in our model comes from the Duo Classifier Attention layer, the Duo Sentence Embedding layer, and the final softmax layer. With the settings above and the number of labels set to 20, the total is no more than 0.4M parameters. Running on a machine with 8 GPUs, we achieve a state-of-the-art result on the 20NG text classification task in less than half an hour.
3.2 Duo Transformer
3.2.1 Duo Attention
Having reviewed the duo classifier, the Duo multi-head attention is simple and straightforward.
We have multi-head attention:
We can use a similar formulation to calculate the Duo multi-head attention.
From Figure 2, it seems that the number of parameters has doubled compared to vanilla attention. To eschew this over-complexity, we share weights in the multi-head attention of each layer: the corresponding projections of the two embedding streams share the same parameters. So the final multi-head attention has only slightly more projection parameters, and our experiments show that the weight sharing results in faster convergence.
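One way the weight sharing can work, sketched under our assumptions (the exact pairing of shared projections is elided in the text): both streams reuse the same query and key projections `W_q`, `W_k`, so only the value projections `W_va`, `W_vb` are duplicated.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def duo_head(x_a, x_b, W_q, W_k, W_va, W_vb):
    """One duo attention head with shared Q/K projections: a single
    attention map is computed on stream A, then reads values projected
    from each embedding stream separately."""
    q, k = x_a @ W_q, x_a @ W_k
    att = softmax(q @ k.T / np.sqrt(W_q.shape[1]))
    return att @ (x_a @ W_va), att @ (x_b @ W_vb)

rng = np.random.default_rng(4)
x_a = rng.standard_normal((4, 6))    # stream from embedding A
x_b = rng.standard_normal((4, 6))    # stream from embedding B
W_q, W_k, W_va, W_vb = (rng.standard_normal((6, 3)) for _ in range(4))
h_a, h_b = duo_head(x_a, x_b, W_q, W_k, W_va, W_vb)
print(h_a.shape, h_b.shape)  # (4, 3) (4, 3)
```

Compared with duplicating the whole head, only the value projections add parameters, which matches the parameter counts reported in the ablation.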
3.3 Duo Decoder
The Duo Decoder is quite similar to the original Transformer decoder, except that the original Transformer uses the same source for its keys and values, while the Duo Decoder uses different ones. Our interpretation is that each key and value encodes different information from each word embedding, so they need to be decoded separately.
The vanilla Transformer shares the same weight matrix between the two embedding layers and the pre-softmax linear transformation, similar to [press2016using]. As we have a fusion layer, we still share weights, but only after a linear projection of the concatenated duo embedding layer, and the parameters of this projection are learnt during training.
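The shared-weight idea can be sketched as follows (names `E_a`, `E_b`, `P` are ours for illustration): the concatenated duo embedding tables are passed through the learned projection, and the projected matrix serves as the tied pre-softmax output weights.

```python
import numpy as np

rng = np.random.default_rng(5)
vocab, d_a, d_b, d_model = 10, 6, 4, 8
E_a = rng.standard_normal((vocab, d_a))      # embedding table A
E_b = rng.standard_normal((vocab, d_b))      # embedding table B
P = rng.standard_normal((d_a + d_b, d_model))  # learned fusion projection

# tie the output weights to the projected, concatenated duo embeddings
E_tied = np.concatenate([E_a, E_b], axis=1) @ P   # (vocab, d_model)
h = rng.standard_normal(d_model)                  # a decoder output state
logits = E_tied @ h                               # pre-softmax scores, (vocab,)
print(E_tied.shape, logits.shape)
```

This keeps the parameter savings of weight tying while letting each source embedding contribute to the output distribution.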
3.4 Duo Layer Normalization
Another intriguing part is the Duo Layer Normalization. The output of traditional layer normalization [ba2016layer, he2016deep] is LayerNorm(x + Sublayer(x)) in each unit. However, considering the dimensional difference between the word embeddings, while guaranteeing a more fluid cross-information flow, we modify the original LayerNorm to the following formula:
This mechanism is used in the decoder layer between the masked multi-head attention and the feed-forward unit, as demonstrated in Figure 2.
3.5 Why Duo
Attention is undoubtedly a good idea in natural language processing. However, even the authors of the Transformer are aware of the limitation of equating the attention value with the word value. Thus, multi-head attention deals with the problem by linearly transforming the embedding before feeding it into scaled dot-product attention. However, such a linear transformation may not add much information; after all, it is linear, meaning there are still unbreakable constraints between the attention value and the word value.
For example, when we think of the abstract word 'duo', concrete words such as 'Spongebob', 'Patrick', 'Tom', and 'Jerry' are among the things we come up with. However, for the noun 'duo' to pay close attention to these concrete examples, this word would have to sit within the 'name cluster' of the embedding space. Yet 'duo' is undoubtedly not a name; it should be near these names in the abstract attention space, but not in the value space.
As another example, we can obtain the word vector for 'Spongebob' by adding a certain offset vector to the word vector for 'Patrick', and we get 'Tom' by adding the same vector to 'Jerry', thanks to the linear substructure of the embedding space. However, we have no idea what we would get by adding this vector to the word 'duo'.
Loosening the attention-value constraints enables the model to have diversity in the concrete embedding space while maintaining homogeneity in the abstract embedding space, which we use to calculate attention.
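The linear-offset argument above can be made concrete with toy, made-up 2-d vectors (these are illustrative values, not real word embeddings): the name pairs differ by the same offset, but adding that offset to the abstract word 'duo' lands on no meaningful neighbour.

```python
import numpy as np

patrick = np.array([1.0, 0.0])
offset = np.array([0.5, 0.5])       # 'Patrick' -> 'Spongebob' direction
spongebob = patrick + offset
jerry = np.array([0.0, 1.0])
tom = jerry + offset                 # the same offset links the second pair

duo = np.array([-2.0, -2.0])         # abstract word, far from the name cluster
mystery = duo + offset               # no word vector to match this point
print(spongebob, tom, mystery)
```

In the Duo model, 'duo' can still attend strongly to the names (attention space) without being forced to live among them (value space).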
4 Experiments
In this section, we first demonstrate the performance of our Duo Classifier on public text classification tasks. Then we show the results of running our model on machine translation tasks. We ran our models on 8 NVIDIA RTX 2080 Ti GPUs.
4.1 Duo Classifier
We compare our model with multiple state-of-the-art baselines on several public datasets in terms of accuracy. We use GloVe 50d and GloVe 300d as the pre-trained embeddings, which we find to be the best duo couple. We then run a series of self-comparison experiments on different combinations of word embeddings.
We explored a variety of duo couples, and it turns out that GloVe 50d and GloVe 300d yield the best results. Other hyper-parameters, including dropout and learning rate, are the same as in the original Transformer. We randomly selected 10% of the training set as a validation set. We trained our model for a maximum of 200 epochs using Adam [kingma2014adam] and stopped if the validation loss did not decrease for ten consecutive epochs. The results of other models on the same datasets are from [yao2019graph]. We ran our model ten times and report the mean. We then further explore the results of different combinations of duo couples.
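The early-stopping protocol above can be sketched as a minimal loop (the loss values here are synthetic, purely to exercise the logic): stop when the validation loss has not decreased for ten consecutive epochs, up to 200 epochs.

```python
def train_with_early_stopping(val_losses, patience=10, max_epochs=200):
    """Return the best validation loss and the epoch at which training
    stopped (None if the patience budget was never exhausted)."""
    best, since_best, stopped_at = float("inf"), 0, None
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, since_best = loss, 0   # improvement: reset the counter
        else:
            since_best += 1
            if since_best >= patience:   # ten epochs without improvement
                stopped_at = epoch
                break
    return best, stopped_at

# validation loss improves for 5 epochs, then plateaus
losses = [1.0, 0.9, 0.8, 0.7, 0.6] + [0.65] * 20
best, stopped_at = train_with_early_stopping(losses)
print(best, stopped_at)  # 0.6 15
```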
We ran our experiments on five popular benchmark corpora: 20-Newsgroups (20NG, http://qwone.com/ jason/20Newsgroups/), Ohsumed (http://disi.unitn.it/moschitti/corpora.htm), R52 and R8 of Reuters 21578 (https://www.cs.umb.edu/ smimarog/textmining/datasets/), and Movie Review (MR, http://www.cs.cornell.edu/people/pabo/movie-review-data/). These datasets are widely used and recognized in recent publications, so we skip their details; readers can consult [yao2019graph] for more detailed settings.
As it turns out, our model achieves the best results on 4 out of 5 benchmarks (Table 1). It still ranks second on the R8 dataset, which we attribute to this dataset having fewer words than the others (only 7,688 words), so that information from word embeddings alone is not enough.
The main reasons why the Duo model works well are twofold. Firstly, we use separately trained embeddings, and previous research has shown that this meta-embedding technique can greatly improve performance. Secondly, we use the Transformer to combine these embeddings, a model proven more efficient than traditional RNN-based models. This holds even though, for text classification, we simply average the word embeddings in each document, in the spirit of meta-embedding.
TF-IDF + LR: a bag-of-words model with term frequency-inverse document frequency weighting; Logistic Regression is used as the classifier.
|Model||20NG||R8||R52||Ohsumed||MR|
|CNN-non-static (uses pre-trained word embeddings)||82.15||95.71||87.59||58.44||77.75|
|Text GCN [yao2019graph]||86.34||97.07||93.56||68.36||76.74|
Which Couple Is The Best
We explored various couples of word embeddings on the datasets, including different dimensions of GloVe [pennington2014glove], CBOW [mikolov2013distributed], and fastText [joulin2016bag]; the results are demonstrated in Tables 2, 3, and 4. It turns out that the GloVe 50d and GloVe 300d duo wins the competition. The results are obtained by running each couple ten times and averaging its performance on the 20NG, Ohsumed, and MR datasets. Without exception, the duo couple of GloVe 50d and GloVe 300d achieves the best results on all tasks. These results further prove the advantages of GloVe word embeddings. Additionally, it is no surprise to us that the diagonals of the tables show relatively less satisfying results: because those duo entries employ the same embedding twice, they reduce to simple one-layer single-head Transformer models.
|Word Embedding||GloVe 50d||GloVe 300d||fastText 300d||CBOW 50d||CBOW 300d|
|Word Embedding||GloVe 50d||GloVe 300d||fastText 300d||CBOW 50d||CBOW 300d|
|Word Embedding||GloVe 50d||GloVe 300d||fastText 300d||CBOW 50d||CBOW 300d|
4.2 Duo Machine Translation
After exploring the performance of Duo on the text classification task, we further investigate whether this meta-embedding mechanism can be applied to machine translation. The potential is considerable, as good performance on classification tasks means such a mechanism can encode a sentence much better. However, the real difficulty lies in the design of the decoder. We devised a meta-embedding decoder architecture based on the backbone of the Transformer, as demonstrated in Section 3.2. In this part, we examine the Duo Translator in terms of its BLEU score and its convergence speed.
For the machine translation models, we followed the same hyper-parameter setup described in [vaswani2017attention]; in particular, the feed-forward dimension was set to 2048. The number of layers for both the encoder and the decoder was set to 8. Additionally, we use weight sharing in the Duo multi-head attention to decrease the model complexity. Worth mentioning, we use GloVe 300d word embeddings followed by a feed-forward network to fix the discrepancy in dimensionality.
On the machine translation task, we report results on two mainstream benchmark datasets: WMT 2014 English to German (En-De), consisting of about 4.5 million sentence pairs, and WMT 2014 English to French (En-Fr), with 36M sentences. We used byte-pair encoding [britz2017massive] with vocabularies of 32K and 40K tokens for the respective tasks.
We demonstrate the effectiveness of our model in Table 5, which shows that meta-embedding clearly benefits translation. Specifically, our model achieves a state-of-the-art score on the WMT 2014 En-De benchmark and remains competitive on the WMT 2014 En-Fr benchmark. Notably, the meta-embedding Duo Transformer outperforms the vanilla Transformer by 1.3 and 1.1 BLEU on the respective tasks, further proving the advantage of the meta-embedding mechanism.
|Model||Param||WMT En-De||WMT En-Fr|
|Transformer big [vaswani2017attention]||213M||28.4||41.0|
|Weighted Transformer [ahmed2017weighted]||213M||28.9||41.4|
|Transformer with RPP [shaw2018self]||-||29.2||41.5|
|TaLK Convolution [lioutas2020time]||209M||29.6||43.2|
Figure 3, along with Table 8, also demonstrates the faster convergence, as well as the better performance, brought by meta-embeddings. The results are obtained by averaging three separate runs of each model on the WMT 2014 En-De validation set.
|Model||Param|
|+ Meta Embeddings||246M|
|+ Weight Sharing in Duo Multihead||220M|
|+ Duo Normalization||220M|
|+ Fusion Layer||220M|
To evaluate the function of the different parts of our architecture, we conducted an ablation study on the WMT 2014 En-De validation set. We used the same hyper-parameters as before, and the results are reported in Table 6. Initially, we add the meta embeddings to the vanilla Transformer model, and this appears to give the most salient improvement in performance. However, the number of parameters increases considerably, and the improvement may merely come from the additional parameters. Therefore, we shrink the model's size by weight sharing in the Duo multi-head attention. It turns out that this operation not only reduces the number of parameters but also improves performance. The subsequent Duo Normalization and Fusion layer also prove beneficial.
In this work, we presented the Duo model, the first meta-embedding mechanism based on self-attention, which improves the performance of language modelling by exploiting more than one word embedding.
For text classification tasks, a single-layer Duo Classifier achieves state-of-the-art results on many public benchmarks. Moreover, for machine translation tasks, we introduce the first encoder-decoder model with more than one embedding. Furthermore, we show that this meta-embedding mechanism benefits the vanilla Transformer in terms of not only better performance but also faster convergence.
Nowadays, although more and more attention is paid to meta-embeddings in natural language processing, we believe this mechanism has potential beyond the text classification task. We sincerely expect more investigations into this field.