A Deep Neural Network Approach To Parallel Sentence Extraction

09/28/2017
by Francis Grégoire, et al.

Parallel sentence extraction is a task that addresses the data sparsity problem found in multilingual natural language processing applications. We propose an end-to-end deep neural network approach to detect translational equivalence between sentences in two different languages. In contrast to previous approaches, which typically rely on multiple models and various word alignment features, we leverage continuous vector representations of sentences and thereby remove the need for any domain-specific feature engineering. Using a siamese bidirectional recurrent neural network, we show a significant improvement over a strong baseline based on a state-of-the-art parallel sentence extraction system, both in the quality of the extracted parallel sentences and in the translation performance of statistical machine translation systems. We believe this study is the first to investigate deep learning for the parallel sentence extraction task.
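
To make the idea of a siamese bidirectional recurrent encoder concrete, here is a minimal, untrained sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the recurrent cell, the scoring function, or the vocabulary handling, so the simple tanh cell, the dot-product-plus-sigmoid score, the shared token-id vocabulary, and all function names (`make_params`, `encode`, `parallel_probability`) are hypothetical choices made for this example.

```python
import math
import random


def make_params(vocab_size, emb_dim, hid_dim, seed=0):
    """Randomly initialise one set of weights (assumed shapes, untrained)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "emb": mat(vocab_size, emb_dim),                      # token embeddings
        "fwd": (mat(hid_dim, emb_dim), mat(hid_dim, hid_dim)),  # left-to-right RNN
        "bwd": (mat(hid_dim, emb_dim), mat(hid_dim, hid_dim)),  # right-to-left RNN
    }


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def run_rnn(w_in, w_rec, inputs):
    """Run a simple tanh recurrent cell over the inputs; return the last state."""
    hidden = [0.0] * len(w_rec)
    for x in inputs:
        from_input = matvec(w_in, x)
        from_state = matvec(w_rec, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
    return hidden


def encode(params, token_ids):
    """Encode a sentence as the concatenated final forward and backward states."""
    vectors = [params["emb"][t] for t in token_ids]
    h_fwd = run_rnn(*params["fwd"], vectors)
    h_bwd = run_rnn(*params["bwd"], vectors[::-1])
    return h_fwd + h_bwd


def parallel_probability(params, src_ids, tgt_ids):
    """Score a sentence pair. Both sides use the SAME weights (the siamese part).

    Assumption: the score is a sigmoid over the dot product of the two
    sentence vectors; the paper may combine the vectors differently.
    """
    h_src = encode(params, src_ids)
    h_tgt = encode(params, tgt_ids)
    score = sum(a * b for a, b in zip(h_src, h_tgt))
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    # Token ids are assumed to come from one vocabulary covering both languages.
    params = make_params(vocab_size=10, emb_dim=4, hid_dim=3)
    print(parallel_probability(params, [1, 2, 3], [4, 5]))
```

In a trained system, the weights would be learned from labelled parallel and non-parallel pairs, and a pair would be extracted when its probability exceeds a chosen threshold. With the random weights used here the output carries no meaning; the sketch only shows how a single shared encoder maps both sentences into one vector space without hand-built alignment features.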


