Better Neural Machine Translation by Extracting Linguistic Information from BERT

04/07/2021
by Hassan S. Shavarani, et al.

Efforts to add linguistic information (syntax or semantics) to neural machine translation (NMT) have mostly relied on point estimates from pre-trained models. Directly exploiting the capacity of massive pre-trained contextual word embedding models such as BERT (Devlin et al., 2019) has been only marginally useful in NMT, because it is difficult to fine-tune BERT for NMT without making training brittle and unreliable. We instead augment NMT by extracting dense, fine-tuned, vector-based linguistic information from BERT rather than point estimates. Experimental results show that our method of incorporating linguistic information helps NMT generalize better in a variety of training contexts and is no more difficult to train than a conventional Transformer-based NMT model.
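The full paper describes the extraction architecture in detail; as a rough illustration of the general idea (dense per-token BERT vectors projected into the NMT model width, rather than fixed point estimates), a minimal sketch follows. It assumes Hugging Face transformers and PyTorch; the bert-base-cased checkpoint, the 512-dimensional NMT model width, and the frozen-BERT setup are illustrative assumptions, not the authors' configuration.

    # Minimal illustrative sketch, NOT the paper's exact method: extract
    # per-token contextual vectors from a pre-trained BERT and project them
    # to the NMT encoder width so they can be fused with source embeddings.
    # Assumptions: Hugging Face transformers, bert-base-cased, d_model = 512.
    import torch
    from transformers import BertModel, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
    bert = BertModel.from_pretrained("bert-base-cased")
    proj = torch.nn.Linear(bert.config.hidden_size, 512)  # 512 = assumed NMT d_model

    def bert_features(sentence: str) -> torch.Tensor:
        """Return (seq_len, 512) projected contextual vectors for one sentence."""
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():  # BERT kept frozen in this sketch
            hidden = bert(**inputs).last_hidden_state  # (1, seq_len, 768)
        return proj(hidden.squeeze(0))  # the projection layer remains trainable

    feats = bert_features("The cat sat on the mat.")
    print(feats.shape)  # torch.Size([9, 512]), including [CLS] and [SEP]

In the paper's setting the BERT-side information is fine-tuned for the translation task, so a real system would train the extraction jointly with the NMT encoder rather than freezing BERT as above.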

Related research:

02/17/2020
Incorporating BERT into Neural Machine Translation
The recently proposed BERT has shown great power on a variety of natural...

09/09/2021
BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation
The success of bidirectional encoders using masked language models, such...

08/15/2019
Towards Making the Most of BERT in Neural Machine Translation
GPT-2 and BERT demonstrate the effectiveness of using pre-trained langua...

06/05/2018
How Do Source-side Monolingual Word Embeddings Impact Neural Machine Translation?
Using pre-trained word embeddings as input layer is a common practice in...

12/17/2022
Better Datastore, Better Translation: Generating Datastores from Pre-Trained Models for Nearest Neural Machine Translation
Nearest Neighbor Machine Translation (kNN-MT) is a simple and effective m...

04/17/2018
When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?
The performance of Neural Machine Translation (NMT) systems often suffer...

05/09/2020
It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations
Training on only perfect Standard English corpora predisposes pre-traine...
