Learning Light-Weight Translation Models from Deep Transformer

12/27/2020
by Bei Li, et al.

Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory intensive. In this paper, we take a natural step towards learning strong but light-weight NMT systems. We propose a novel group-permutation based knowledge distillation approach that compresses a deep Transformer model into a shallow one. Experimental results on several benchmarks validate the effectiveness of our method: the compressed model is 8X shallower than the deep model, with almost no loss in BLEU. To further enhance the teacher model, we present a Skipping Sub-Layer method that randomly omits sub-layers to introduce perturbation into training, achieving a BLEU score of 30.63 on English-German newstest2014. The code is publicly available at https://github.com/libeineu/GPKD.

Related Research

10/08/2020 · Shallow-to-Deep Training for Neural Machine Translation
Deep encoders have been proven to be effective in improving neural machi...

09/16/2021 · The NiuTrans System for WNGT 2020 Efficiency Task
This paper describes the submissions of the NiuTrans Team to the WNGT 20...

05/27/2021 · Selective Knowledge Distillation for Neural Machine Translation
Neural Machine Translation (NMT) models achieve state-of-the-art perform...

12/22/2021 · Joint-training on Symbiosis Networks for Deep Neural Machine Translation Models
Deep encoders have been proven to be effective in improving neural machi...

09/16/2021 · The NiuTrans System for the WMT21 Efficiency Task
This paper describes the NiuTrans system for the WMT21 translation effic...

09/08/2021 · What's Hidden in a One-layer Randomly Weighted Transformer?
We demonstrate that, hidden within one-layer randomly weighted neural ne...

06/02/2023 · Binary and Ternary Natural Language Generation
Ternary and binary neural networks enable multiplication-free computatio...

Code Repositories

GPKD

The codebase of the paper "Learning Light-Weight Translation Models from Deep Transformer", accepted at AAAI 2021.

