Fast Vocabulary Projection Method via Clustering for Multilingual Machine Translation on GPU

08/14/2022
by hossam-amer, et al.

Multilingual Neural Machine Translation has shown great success using transformer models. Deploying these models is challenging because they usually require large vocabulary (vocab) sizes to cover various languages. This limits the speed of predicting the output tokens in the last layer, the vocab projection. To alleviate these challenges, this paper proposes a fast vocabulary projection method via clustering that can be used for multilingual transformers on GPUs. First, we split the vocab search space offline into disjoint clusters given the hidden context vector of the decoder output, which results in much smaller vocab columns for the vocab projection. Second, at inference time, the proposed method predicts the clusters and candidate active tokens for the hidden context vectors at the vocab projection. This paper also includes an analysis of different ways of building these clusters in multilingual settings. Our results show end-to-end speed gains in float16 GPU inference of up to 25% while maintaining the BLEU score and slightly increasing memory cost. The proposed method speeds up the vocab projection step itself by up to 2.6x. We also conduct an extensive human evaluation to verify that the proposed method preserves the quality of the translations from the original model.
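The two steps described in the abstract (offline clustering, then projecting only onto a cluster's candidate tokens at inference) can be illustrated with a toy, CPU-only sketch. It is not the authors' implementation. It assumes plain k-means over sampled decoder hidden vectors, gives each cluster the union of the full model's top-n tokens for the vectors that fall in it, and predicts a cluster by nearest centroid. All function names (`build_clusters`, `fast_predict`, and the helpers) and parameters (`k`, `top_n`) are hypothetical, and the paper's actual clustering and candidate selection may differ. The point is only that inference computes logits over a cluster's candidate rows instead of the full vocabulary matrix.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(h: Sequence[float], centroids: List[Vector]) -> int:
    # Index of the centroid closest to h (squared Euclidean distance).
    return min(range(len(centroids)), key=lambda c: sq_dist(h, centroids[c]))


def full_logits(h: Sequence[float], weights: List[Vector], bias: List[float]) -> List[float]:
    # Dense vocab projection: one logit per vocabulary row (the step being sped up).
    return [dot(w, h) + b for w, b in zip(weights, bias)]


def top_k(scores: List[float], k: int) -> List[int]:
    # Token ids of the k highest scores, best first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    # Plain Lloyd's k-means; the clusters are disjoint regions of the hidden space.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for c, g in enumerate(groups):
            if g:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def build_clusters(hiddens: List[Vector], weights: List[Vector], bias: List[float],
                   k: int, top_n: int) -> Tuple[List[Vector], List[List[int]]]:
    """Offline step (hypothetical): cluster decoder outputs, attach candidate tokens."""
    centroids = kmeans(hiddens, k)
    candidates = [set() for _ in range(k)]
    for h in hiddens:
        # Assumption: a cluster's candidates are the union of the full model's
        # top-n tokens over the hidden vectors assigned to that cluster.
        scores = full_logits(h, weights, bias)
        candidates[nearest(h, centroids)].update(top_k(scores, top_n))
    return centroids, [sorted(s) for s in candidates]


def fast_predict(h: Vector, centroids: List[Vector], candidates: List[List[int]],
                 weights: List[Vector], bias: List[float]) -> int:
    """Inference step (hypothetical): predict a cluster, project onto its rows only."""
    cand = candidates[nearest(h, centroids)]
    if not cand:  # cluster saw no training vectors: fall back to the full projection
        cand = list(range(len(weights)))
    scores = [dot(weights[t], h) + bias[t] for t in cand]
    return cand[max(range(len(cand)), key=scores.__getitem__)]


if __name__ == "__main__":
    rng = random.Random(1)
    V, d = 50, 8  # toy vocab size and hidden size
    W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(V)]
    b = [rng.gauss(0, 0.1) for _ in range(V)]
    H = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(200)]
    centroids, candidates = build_clusters(H, W, b, k=4, top_n=5)
    print("candidate rows per cluster:", [len(c) for c in candidates], "of", V)
    print("top-1 token:", fast_predict(H[0], centroids, candidates, W, b))
```

On the vectors used to build the clusters, the reduced projection returns the same top-1 token as the full one, because each vector's top-1 token is in its own cluster's candidate set by construction. Unseen hidden vectors carry no such guarantee, which is why translation quality has to be checked empirically, as the abstract's human evaluation does.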


