Normalization of Input-output Shared Embeddings in Text Generation Models

01/22/2020
by Jinyang Liu, et al.

Neural network based models are state of the art for many Natural Language Processing tasks; however, the input and output dimension problem in these networks has not been fully resolved, especially in text generation tasks (e.g. machine translation, text summarization), where both input and output involve very large vocabularies. Input-output embedding weight sharing has therefore been introduced and widely adopted, yet it still leaves room for improvement. Drawing on linear algebra and statistical theory, this paper locates a shortcoming of the existing input-output embedding weight sharing method and proposes methods for improving it, among which normalization of the embedding weight matrices performs best. These methods are nearly free of computational cost, can be combined with other embedding techniques, and prove effective when applied to state-of-the-art neural network models. For Transformer-big models, the normalization techniques yield up to 0.6 BLEU improvement over the original model on the WMT'16 En-De dataset, with similar BLEU improvements on the IWSLT'14 datasets. For DynamicConv models, a 0.5 BLEU improvement is attained on the WMT'16 En-De dataset, and a 0.41 BLEU improvement is achieved on the IWSLT'14 De-En translation task.
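To illustrate the general idea, here is a minimal numpy sketch of input-output weight tying with a normalized embedding matrix. Note this is only an illustration under an assumed row-wise L2 normalization; the paper's exact normalization scheme, model code, and variable names (`normalize_rows`, `W_hat`, etc.) are not taken from the source.

```python
import numpy as np

def normalize_rows(W, eps=1e-8):
    """L2-normalize each row of W (one embedding vector per vocabulary token).
    This is an assumed normalization, used here purely for illustration."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / (norms + eps)

# Hypothetical tiny setup: a shared input/output embedding matrix.
vocab, d_model = 5, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(vocab, d_model))

W_hat = normalize_rows(W)

# Input side: look up a token's embedding row.
token_id = 2
x = W_hat[token_id]

# Output side: with weight tying, the same matrix projects the decoder's
# hidden state back to vocabulary logits.
hidden = rng.normal(size=d_model)
logits = W_hat @ hidden
```

After normalization every embedding row has unit norm, so the output logits become (up to the hidden state's norm) cosine similarities between the hidden state and each token's embedding, rather than being skewed by per-token norm differences.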


