Improving Neural Language Modeling via Adversarial Training

06/10/2019
by Dilin Wang, et al.

Recently, substantial progress has been made in language modeling by using deep neural networks. However, in practice, large-scale neural language models have been shown to be prone to overfitting. In this paper, we present a simple yet highly effective adversarial training mechanism for regularizing neural language models. The idea is to introduce adversarial noise to the output embedding layer while training the models. We show that the optimal adversarial noise yields a simple closed-form solution, thus allowing us to develop a simple and time-efficient algorithm. Theoretically, we show that our adversarial mechanism effectively encourages the diversity of the embedding vectors, helping to increase the robustness of models. Empirically, we show that our method improves on the single-model state-of-the-art results for language modeling on Penn Treebank (PTB) and Wikitext-2, achieving test perplexity scores of 46.01 and 38.07, respectively. When applied to machine translation, our method improves over various transformer-based translation baselines in BLEU scores on the WMT14 English-German and IWSLT14 German-English tasks.
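The abstract describes perturbing the output (softmax) embedding layer with adversarial noise that admits a closed-form solution. The sketch below is a minimal, hedged illustration of that idea in PyTorch, not the authors' implementation: it assumes the standard result that, under an L2 norm bound, the loss-maximizing perturbation of the target word's output embedding is a rescaled negative of the context vector, which lowers the target logit by eps * ||h||. The class and parameter names (AdversarialSoftmaxLoss, adv_eps) are illustrative.

```python
# Illustrative sketch of adversarial noise on the output embedding layer.
# Assumption: noise delta with ||delta|| <= eps applied to the target word's
# embedding; the worst case is delta = -eps * h / ||h||, i.e. the target
# logit drops by eps * ||h||. Exact correspondence to the paper is not claimed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdversarialSoftmaxLoss(nn.Module):
    """Cross-entropy over output word embeddings, with the target word's
    embedding adversarially perturbed (in closed form) at training time."""

    def __init__(self, embedding: nn.Embedding, adv_eps: float = 1.0):
        super().__init__()
        self.embedding = embedding  # (tied) output embedding, shape (V, d)
        self.adv_eps = adv_eps      # norm bound on the adversarial noise

    def forward(self, hidden: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, d) context vectors; targets: (batch,) word indices
        weight = self.embedding.weight          # (V, d)
        logits = hidden @ weight.t()            # (batch, V)

        if self.training:
            # Closed-form worst-case perturbation of the target embedding:
            # subtract eps * ||h|| from the target logit only.
            adv_drop = self.adv_eps * hidden.norm(dim=-1)        # (batch,)
            logits = logits.scatter_add(
                1, targets.unsqueeze(1), -adv_drop.unsqueeze(1))

        return F.cross_entropy(logits, targets)


# Example usage with placeholder sizes (hypothetical, for illustration only):
vocab, dim = 10000, 256
emb = nn.Embedding(vocab, dim)
criterion = AdversarialSoftmaxLoss(emb, adv_eps=1.0)
hidden = torch.randn(32, dim)                  # stand-in for RNN/transformer states
targets = torch.randint(0, vocab, (32,))
loss = criterion(hidden, targets)
loss.backward()
```

Because the perturbation only shifts the target logit by a scalar, this regularizer adds negligible cost per step, which is consistent with the "simple and time-efficient algorithm" claim in the abstract.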


