Learning Light-Weight Translation Models from Deep Transformer

12/27/2020
by Bei Li, et al.

Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory intensive. In this paper, we take a natural step towards learning strong but light-weight NMT systems. We propose a novel group-permutation based knowledge distillation approach to compress the deep Transformer model into a shallow model. Experimental results on several benchmarks validate the effectiveness of our method. Our compressed model is 8X shallower than the deep model, with almost no loss in BLEU. To further enhance the teacher model, we present a Skipping Sub-Layer method that randomly omits sub-layers to introduce perturbation into training, which achieves a BLEU score of 30.63 on English-German newstest2014. The code is publicly available at https://github.com/libeineu/GPKD.
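A minimal sketch of the Skipping Sub-Layer idea, assuming a PyTorch-style module: the class name SkipSubLayer, the skip probability, and the plain residual wiring are illustrative assumptions here, not the paper's exact implementation.

    import torch
    import torch.nn as nn

    class SkipSubLayer(nn.Module):
        """Wrap one Transformer sub-layer (self-attention or FFN) and, during
        training only, omit it with probability `skip_prob`, letting the
        residual stream pass through unchanged (illustrative sketch)."""
        def __init__(self, sublayer: nn.Module, skip_prob: float = 0.2):
            super().__init__()
            self.sublayer = sublayer
            self.skip_prob = skip_prob

        def forward(self, x):
            if self.training and torch.rand(()).item() < self.skip_prob:
                return x                     # skip: inject training perturbation
            return x + self.sublayer(x)      # usual residual computation

The distillation step that transfers the deep teacher into the shallow student can likewise be sketched as a standard soft-target objective; this is generic word-level knowledge distillation, not the group-permutation procedure itself, whose actual implementation is in the GPKD repository.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=1.0):
        # The shallow student matches the deep teacher's output distribution
        # over the vocabulary (KL divergence on temperature-softened logits).
        t = temperature
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        return F.kl_div(student_log_probs, teacher_probs,
                        reduction="batchmean") * (t * t)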

Related Research

10/08/2020 · Shallow-to-Deep Training for Neural Machine Translation
Deep encoders have been proven to be effective in improving neural machi...

09/16/2021 · The NiuTrans System for WNGT 2020 Efficiency Task
This paper describes the submissions of the NiuTrans Team to the WNGT 20...

09/16/2021 · The NiuTrans System for the WMT21 Efficiency Task
This paper describes the NiuTrans system for the WMT21 translation effic...

06/04/2019 · Exploiting Sentential Context for Neural Machine Translation
In this work, we present novel approaches to exploit sentential context ...

12/22/2021 · Joint-training on Symbiosis Networks for Deep Neural Machine Translation models
Deep encoders have been proven to be effective in improving neural machi...

09/08/2021 · What's Hidden in a One-layer Randomly Weighted Transformer?
We demonstrate that, hidden within one-layer randomly weighted neural ne...

06/28/2021 · R-Drop: Regularized Dropout for Neural Networks
Dropout is a powerful and widely used technique to regularize the traini...

Code Repositories

GPKD

The codebase of the paper "Learning Light-Weight Translation Models from Deep Transformer", accepted at the AAAI 2021 conference.
