Towards Accurate Translation via Semantically Appropriate Application of Lexical Constraints

06/21/2023
by Yujin Baek, et al.

Lexically-constrained NMT (LNMT) aims to incorporate user-provided terminology into translations. Despite its practical advantages, existing work has not evaluated LNMT models under challenging real-world conditions. In this paper, we focus on two important but under-studied issues in the current evaluation process of LNMT studies: the model needs to cope with challenging lexical constraints that are "homographs" or "unseen" during training. To this end, we first design a homograph disambiguation module to differentiate the meanings of homographs. Moreover, we propose PLUMCOT, which integrates contextually rich information about unseen lexical constraints from pre-trained language models and strengthens the copy mechanism of the pointer network via direct supervision of a copying score. We also release HOLLY, an evaluation benchmark for assessing a model's ability to cope with "homographic" and "unseen" lexical constraints. Experiments on HOLLY and the previous test setup show the effectiveness of our method; the gains from PLUMCOT are particularly pronounced on "unseen" constraints. Our dataset is available at https://github.com/papago-lab/HOLLY-benchmark.
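To illustrate the kind of copy-score supervision the abstract refers to, the following is a minimal sketch in PyTorch of a pointer-network style copy mechanism whose copying score receives a direct auxiliary loss. It is not the authors' implementation; all names (SupervisedCopyGate, copy_labels, step_losses) and the exact labeling scheme are assumptions made for illustration.

```python
# Hypothetical sketch of a pointer-network copy mechanism with direct
# supervision of the copying score (not the PLUMCOT code itself).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SupervisedCopyGate(nn.Module):
    """Mixes the decoder's generation distribution with a copy distribution
    over source/constraint tokens and exposes the copy score so it can be
    trained with an auxiliary loss."""

    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, 1)        # produces the copying score
        self.generator = nn.Linear(hidden_dim, vocab_size)

    def forward(self, dec_state, src_attn, src_token_ids):
        # dec_state:     (batch, hidden)   decoder hidden state at this step
        # src_attn:      (batch, src_len)  attention weights over source tokens
        # src_token_ids: (batch, src_len)  vocabulary ids of the source tokens
        p_copy = torch.sigmoid(self.gate(dec_state))             # (batch, 1)
        p_gen = F.softmax(self.generator(dec_state), dim=-1)     # (batch, vocab)
        # Scatter attention mass onto the vocabulary ids of the source tokens.
        copy_dist = torch.zeros_like(p_gen).scatter_add(1, src_token_ids, src_attn)
        mixed = (1 - p_copy) * p_gen + p_copy * copy_dist
        return mixed, p_copy.squeeze(-1)


def step_losses(mixed, p_copy, target_ids, copy_labels):
    # Standard NLL on the mixed distribution, plus direct binary supervision
    # of the copying score: copy_labels is 1 where the gold target token is
    # supposed to be copied from a lexical constraint, 0 otherwise
    # (an assumed labeling scheme).
    nll = F.nll_loss(torch.log(mixed + 1e-9), target_ids)
    copy_loss = F.binary_cross_entropy(p_copy, copy_labels.float())
    return nll + copy_loss
```

The design intent of such a setup is that, rather than letting the copy gate learn only indirectly from the translation loss, the model is told explicitly at which target positions a constraint token must be copied, which is one plausible way to strengthen copying for constraints never seen during training.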


