Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax

02/22/2019
by   Yinfei Yang, et al.
0

In this paper, we present an approach to learn multilingual sentence embeddings using a bi-directional dual-encoder with additive margin softmax. The embeddings are able to achieve state-of-the-art results on the United Nations (UN) parallel corpus retrieval task. In all the languages tested, the system achieves P@1 of 86 train NMT models that achieve similar performance to models trained on gold pairs. We explore simple document-level embeddings constructed by averaging our sentence embeddings. On the UN document-level retrieval task, document embeddings achieve around 97 Lastly, we evaluate the proposed model on the BUCC mining task. The learned embeddings with raw cosine similarity scores achieve competitive results compared to current state-of-the-art models, and with a second-stage scorer we achieve a new state-of-the-art level on this task.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/20/2019

Hierarchical Document Encoder for Parallel Corpus Mining

We explore using multilingual document embeddings for nearest neighbor m...
research
07/31/2018

Effective Parallel Corpus Mining using Bilingual Sentence Embeddings

This paper presents an effective approach for parallel corpus mining usi...
research
07/09/2019

Multilingual Universal Sentence Encoder for Semantic Retrieval

We introduce two pre-trained retrieval focused multilingual sentence enc...
research
11/03/2018

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings

Machine translation is highly sensitive to the size and quality of the t...
research
09/20/2019

Learning to Create Sentence Semantic Relation Graphs for Multi-Document Summarization

Linking facts across documents is a challenging task, as the language us...
research
02/16/2023

LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with Knowledge Distillation

Large-scale language-agnostic sentence embedding models such as LaBSE (F...
research
03/27/2023

Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining

Dual encoder models are ubiquitous in modern classification and retrieva...

Please sign up or login with your details

Forgot password? Click here to reset