Fraternal Dropout

10/31/2017
by Konrad Zolna, et al.

Recurrent neural networks (RNNs) are an important class of neural network architectures for language modeling and sequential prediction. However, optimizing RNNs is known to be harder than optimizing feed-forward neural networks, and a number of techniques have been proposed in the literature to address this problem. In this paper we propose a simple technique called fraternal dropout that takes advantage of dropout to achieve this goal. Specifically, we propose to train two identical copies of an RNN (that share parameters) with different dropout masks while minimizing the difference between their (pre-softmax) predictions. In this way our regularization encourages the representations of the RNN to be invariant to the particular dropout mask, and thus more robust. We show that our regularization term is upper bounded by the expectation-linear dropout objective, which has been shown to address the gap between the train and inference phases of dropout. We evaluate our model and achieve state-of-the-art results in sequence modeling tasks on two benchmark datasets, Penn Treebank and Wikitext-2. We also show that our approach leads to significant performance improvements in image captioning (Microsoft COCO) and semi-supervised (CIFAR-10) tasks.
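Below is a minimal sketch of the idea described in the abstract, assuming a PyTorch-style LSTM language model. The class and function names, the hyperparameters (emb_dim, hidden_dim, the regularization weight kappa), and the single-layer architecture are illustrative assumptions, not the paper's exact experimental setup: two forward passes through the same parameters sample two independent dropout masks, and the squared difference of the pre-softmax logits is added to the loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LSTMLanguageModel(nn.Module):
    """Toy LSTM language model with dropout on the recurrent outputs (assumed setup)."""

    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # Each forward call samples a fresh dropout mask, so calling the model
        # twice gives the two "fraternal" copies that share all parameters.
        hidden, _ = self.lstm(self.embed(tokens))
        return self.decoder(self.dropout(hidden))  # pre-softmax logits


def fraternal_dropout_loss(model, tokens, targets, kappa=0.1):
    """Average cross-entropy of the two passes plus the fraternal penalty."""
    logits_a = model(tokens)  # first dropout mask
    logits_b = model(tokens)  # second dropout mask, same parameters
    vocab = logits_a.size(-1)
    ce_a = F.cross_entropy(logits_a.reshape(-1, vocab), targets.reshape(-1))
    ce_b = F.cross_entropy(logits_b.reshape(-1, vocab), targets.reshape(-1))
    # Penalize disagreement between the two pre-softmax predictions so the
    # learned representation becomes invariant to the dropout mask.
    penalty = (logits_a - logits_b).pow(2).mean()
    return 0.5 * (ce_a + ce_b) + kappa * penalty


# Illustrative usage with dummy data:
model = LSTMLanguageModel(vocab_size=10000)
tokens = torch.randint(0, 10000, (8, 35))   # (batch, sequence length)
targets = torch.randint(0, 10000, (8, 35))  # next-token targets
loss = fraternal_dropout_loss(model, tokens, targets)
loss.backward()
```

The value of kappa here is only a placeholder; in practice it would be tuned, and the base model would typically be a stronger regularized LSTM than this sketch.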


Related research

04/22/2019 - Adversarial Dropout for Recurrent Neural Networks
Successful application processing sequential data, such as text and spee...

09/26/2016 - Dropout with Expectation-linear Regularization
Dropout, a simple and effective way to train deep neural networks, has l...

10/22/2018 - An Exploration of Dropout with RNNs for Natural Language Inference
Dropout is a crucial regularization technique for the Recurrent Neural N...

07/31/2017 - Bayesian Sparsification of Recurrent Neural Networks
Recurrent neural networks show state-of-the-art results in many text ana...

10/21/2014 - Regularizing Recurrent Networks - On Injected Noise and Norm-based Methods
Advancements in parallel processing have lead to a surge in multilayer p...

11/04/2013 - On Fast Dropout and its Applicability to Recurrent Networks
Recurrent Neural Networks (RNNs) are rich models for the processing of s...

01/23/2013 - Regularization and nonlinearities for neural language models: when are they needed?
Neural language models (LMs) based on recurrent neural networks (RNN) ar...
