Extracting Parallel Sentences with Bidirectional Recurrent Neural Networks to Improve Machine Translation

06/13/2018
by Francis Grégoire, et al.

Parallel sentence extraction addresses the data sparsity problem found in multilingual natural language processing applications. We propose an approach based on bidirectional recurrent neural networks to extract parallel sentences from collections of multilingual texts. Our experiments with noisy parallel corpora show that we can achieve promising results against a competitive baseline while removing the need for specific feature engineering or additional external resources. To justify the utility of our approach, we extract sentence pairs from Wikipedia articles to train machine translation systems and show significant improvements in translation performance.
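
The abstract does not spell out the architecture, so the following is only a minimal sketch of how a bidirectional recurrent encoder could score a candidate sentence pair. Everything in it is an assumption made for illustration: the plain Elman recurrence, the toy vocabularies, the random untrained weights, and the feature combination (element-wise product and absolute difference) are hypothetical choices, not the authors' confirmed method.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SimpleRNN:
    """Plain Elman layer (assumed): h_t = tanh(W x_t + U h_{t-1})."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.W = make_matrix(hidden_dim, input_dim, rng)
        self.U = make_matrix(hidden_dim, hidden_dim, rng)

    def final_state(self, vectors):
        h = [0.0] * self.hidden_dim
        for x in vectors:
            wx = matvec(self.W, x)
            uh = matvec(self.U, h)
            h = [math.tanh(a + b) for a, b in zip(wx, uh)]
        return h


class BiRNNEncoder:
    """Reads a sentence in both directions and concatenates the final states."""

    def __init__(self, vocab, embed_dim, hidden_dim, rng):
        self.embed = {
            word: [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
            for word in vocab
        }
        self.unknown = [0.0] * embed_dim  # out-of-vocabulary words map to zeros
        self.forward_rnn = SimpleRNN(embed_dim, hidden_dim, rng)
        self.backward_rnn = SimpleRNN(embed_dim, hidden_dim, rng)

    def encode(self, tokens):
        vectors = [self.embed.get(t, self.unknown) for t in tokens]
        left_to_right = self.forward_rnn.final_state(vectors)
        right_to_left = self.backward_rnn.final_state(vectors[::-1])
        return left_to_right + right_to_left  # list concatenation


def pair_probability(src_vec, tgt_vec, weights, bias):
    """Sigmoid over element-wise product and absolute-difference features."""
    features = [a * b for a, b in zip(src_vec, tgt_vec)]
    features += [abs(a - b) for a, b in zip(src_vec, tgt_vec)]
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
hidden = 4
src_encoder = BiRNNEncoder(["the", "cat", "sleeps"], 3, hidden, rng)
tgt_encoder = BiRNNEncoder(["le", "chat", "dort"], 3, hidden, rng)
# Two feature groups, each of size 2 * hidden (forward + backward states).
clf_weights = [rng.uniform(-0.1, 0.1) for _ in range(4 * hidden)]

src = src_encoder.encode(["the", "cat", "sleeps"])
tgt = tgt_encoder.encode(["le", "chat", "dort"])
score = pair_probability(src, tgt, clf_weights, 0.0)
```

With untrained weights the score carries no meaning; in a trained system of this kind, candidate pairs whose score exceeds a chosen decision threshold would be kept as parallel sentences.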


research
09/28/2017

A Deep Neural Network Approach To Parallel Sentence Extraction

Parallel sentence extraction is a task addressing the data sparsity prob...
research
06/25/2018

Neural Machine Translation for Low Resource Languages using Bilingual Lexicon Induced from Comparable Corpora

Resources for the non-English languages are scarce and this paper addres...
research
12/16/2021

Idiomatic Expression Paraphrasing without Strong Supervision

Idiomatic expressions (IEs) play an essential role in natural language. ...
research
05/24/2018

Filtering and Mining Parallel Data in a Joint Multilingual Space

We learn a joint multilingual sentence embedding and use the distance be...
research
05/14/2011

Semantic Vector Machines

We first present our work in machine translation, during which we used a...
research
08/19/2020

Transformer based Multilingual document Embedding model

One of the current state-of-the-art multilingual document embedding mode...
research
04/29/2020

Bilingual Text Extraction as Reading Comprehension

In this paper, we propose a method to extract bilingual texts automatica...