DICTDIS: Dictionary Constrained Disambiguation for Improved NMT

10/13/2022
by Ayush Maheshwari, et al.

Domain-specific neural machine translation (NMT) systems (e.g., in educational applications) are socially significant, with the potential to make information accessible to a diverse set of users in multilingual societies. Such NMT systems should ideally be lexically constrained and draw from domain-specific dictionaries. Because words are polysemous, a dictionary may present multiple candidate translations for a source word or phrase. The onus is then on the NMT model to choose the contextually most appropriate candidate. Prior work has largely ignored this problem and focused on the single-candidate setting, where the target word or phrase is replaced by a single constraint. In this work, we present DICTDIS, a lexically constrained NMT system that disambiguates between multiple candidate translations derived from dictionaries. We achieve this by augmenting the training data with multiple dictionary candidates to actively encourage disambiguation during training. We demonstrate the utility of DICTDIS via extensive experiments on English-Hindi sentences in a variety of domains, including news, finance, medicine and engineering. Compared with existing approaches for lexically constrained and unconstrained NMT, we obtain superior disambiguation performance on all domains, with fluency improvements of up to 4 BLEU points in some domains.
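The abstract does not specify how the multiple dictionary candidates are attached to a training example. As a minimal sketch, assuming a simple source-side augmentation in which every candidate for a matched source phrase is appended to the source sentence behind marker tokens, the idea could look like the code below. The function name `augment_source`, the greedy longest-match lookup, and the `<sep>`/`<c>` markers are hypothetical illustrations, not the DICTDIS implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical marker tokens; the format actually used by DICTDIS may differ.
SEP = "<sep>"   # introduces a matched source phrase
CAND = "<c>"    # introduces one dictionary candidate for that phrase


def augment_source(source: str, dictionary: Dict[str, List[str]], max_ngram: int = 3) -> str:
    """Append ALL dictionary candidates for each matched source phrase (sketch)."""
    tokens = source.split()
    matches: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(tokens):
        # Greedy longest match: try n-grams of up to max_ngram words starting at i.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                matches.append((phrase, dictionary[phrase]))
                i += n
                break
        else:
            # No dictionary entry starts at this token; move on.
            i += 1
    if not matches:
        return source
    parts = [source]
    for phrase, candidates in matches:
        # Every candidate is kept, so the model has to pick one from context
        # instead of copying a single constraint.
        parts.append(f"{SEP} {phrase} " + " ".join(f"{CAND} {c}" for c in candidates))
    return " ".join(parts)


if __name__ == "__main__":
    toy_dict = {"bank": ["बैंक", "किनारा"], "interest rate": ["ब्याज दर"]}
    print(augment_source("he sat on the river bank", toy_dict))
```

Because both senses of "bank" (the financial "बैंक" and the riverside "किनारा") appear in the augmented input while the reference translation contains only one of them, a model trained on such pairs cannot simply copy a single constraint; it has to use the sentence context to choose.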
