1 Introduction
Recently, embedding representations have been widely used in almost all AI-related fields, from feature maps Krizhevsky et al. (2012) in computer vision, to word embeddings Mikolov et al. (2013); Pennington et al. (2014) in natural language processing, to user/item embeddings Mnih and Salakhutdinov (2008); Hu et al. (2008) in recommender systems. Usually, the embeddings are high-dimensional vectors. Take language models for example: in GPT Radford et al. (2018) and the BERT-Base model Devlin et al. (2018), 768-dimensional vectors are used to represent words; the BERT-Large model utilizes 1024-dimensional vectors, and GPT-2 Radford et al. (2019) may have used even higher dimensions in its unreleased large models. In recommender systems, things are slightly different: the dimension of user/item embeddings is usually set to be reasonably small, such as 50 or 100, but the number of users and items is on a much bigger scale. While the size of a word vocabulary normally ranges from 50,000 to 150,000, the number of users and items can be millions or even billions in large-scale real-world commercial recommender systems Bennett et al. (2007).
Given the massive number of parameters in modern neural networks with embedding layers, mitigating over-parameterization can play a big role in preventing overfitting in deep learning. We propose a regularization method, Stochastically Shared Embeddings (SSE), that uses prior information about similarities between embeddings, such as semantically and grammatically related words in natural languages or real-world users who share social relationships. Critically, SSE proceeds by stochastically transitioning between embeddings, as opposed to more brute-force regularization such as graph-based Laplacian regularization and ridge regularization. Thus, SSE integrates seamlessly with existing stochastic optimization methods, and the resulting regularization is data-driven.
We begin the paper with the mathematical formulation of the problem, propose SSE, and provide the motivations behind it. We provide a theoretical analysis of SSE that can be compared with excess risk bounds based on empirical Rademacher complexity. We then conduct experiments on a total of 6 tasks, from simple neural networks with one hidden layer in recommender systems to the Transformer and BERT in natural language processing, and find that when used along with widely used regularization methods such as weight decay and dropout, our proposed method can further reduce overfitting, which often leads to more favorable generalization results.
2 Related Work
Regularization techniques are used to control model complexity and avoid overfitting. ℓ2 regularization Hoerl and Kennard (1970) is the most widely used approach and has been used in many matrix factorization models in recommender systems; ℓ1 regularization Tibshirani (1996) is used when a sparse model is preferred. For deep neural networks, it has been shown that ℓ2 regularization is often too weak, while dropout Hinton et al. (2012); Srivastava et al. (2014) is more effective in practice. There are many other regularization techniques, including parameter sharing Goodfellow et al. (2016) and max-norm regularization Srebro et al. (2005); Pascanu et al. (2013).
Our proposed SSE-Graph is very different from graph Laplacian regularization Cai et al. (2011), in which the distances between any two embeddings connected over the graph are directly penalized. Hard parameter sharing uses one embedding to replace all distinct embeddings in the same group, which inevitably introduces a significant bias. Soft parameter sharing Nowlan and Hinton (1992) is similar to the graph Laplacian, penalizing the distances between any two embeddings. These methods have no dependence on the loss, while the proposed SSE-Graph method is data-driven in that the loss influences the effect of regularization. Unlike graph Laplacian regularization and hard and soft parameter sharing, our method is stochastic by nature. This allows our model to enjoy similar advantages as dropout Srivastava et al. (2014).
Interestingly, in the original BERT model's pre-training stage Devlin et al. (2018), a variant of SSE-SE is already implicitly used for token embeddings, but for a different reason. In Devlin et al. (2018), the authors masked 15% of words, and 10% of the time replaced the [mask] token with a random token. In the next section, we discuss how SSE-SE differs from this heuristic. Another closely related technique to ours is label smoothing Szegedy et al. (2016), which is widely used in the computer vision community. We find that in the classification setting, if we apply SSE-SE to the one-hot encodings associated with the output only, our SSE-SE is closely related to label smoothing, which can be treated as a special case of our proposed method.
3 Stochastically Shared Embeddings
Throughout this paper, the network input x_i and label y_i will be encoded into indices j_1^i, …, j_M^i, which are elements of I_1 × ⋯ × I_M, the index sets of the embedding tables. A typical choice is that the indices are the encoding of a dictionary for words in natural language applications, or user and item tables in recommendation systems. Each index, j_l, within the l-th table, is associated with an embedding E_l[j_l], which is a trainable vector in R^{d_l}. The embeddings associated with the label y_i are usually non-trainable one-hot vectors corresponding to label look-up tables, while embeddings associated with the input x_i are trainable embedding vectors for embedding look-up tables. In natural language applications, we appropriately modify this framework to accommodate sequences such as sentences.
The loss function can be written as a function of the embeddings:

(1)   min_θ Σ_{i=1}^{n} ℓ(x_i, y_i | θ)

where y_i is the label and θ encompasses all trainable parameters, including the embeddings {E_l[j_l] : j_l ∈ I_l, l = 1, …, M}. The loss function ℓ is a mapping from embedding spaces to the reals. For text input, each E_l[j_l^i] is a word embedding vector in the input sentence or document. For recommender systems, there are usually two embedding look-up tables: one for users and one for items He et al. (2017), so the objective function, such as the mean squared loss or some ranking loss, will comprise both user and item embeddings for each input. We can more succinctly write the matrix of all embeddings for the i-th sample as E[j^i] = (E_1[j_1^i], …, E_M[j_M^i]), where j^i = (j_1^i, …, j_M^i). By an abuse of notation we write the loss as a function of the embedding matrix, ℓ(E[j^i] | θ).
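As a concrete toy instance of writing the loss as a function of the looked-up embeddings, consider the squared loss of matrix factorization for one (user, item, rating) sample. The sketch below is purely illustrative (the function name and plain-list representation of embeddings are ours, not the paper's):

```python
def mf_loss(user_emb, item_emb, rating):
    """Squared loss for one explicit-feedback sample, written as a
    function of the two embeddings it looks up: the model parameters
    enter the loss only through these two vectors."""
    pred = sum(u * v for u, v in zip(user_emb, item_emb))  # inner product
    return (pred - rating) ** 2
```

Replacing either embedding vector (as SSE does stochastically) changes the loss only through this look-up, which is what makes the formulation in terms of embedding matrices convenient.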
Suppose that we have access to knowledge graphs Miller (1995); Lehmann et al. (2015) over embeddings, and we have a prior belief that two embeddings will share information and replacing one with the other should not incur a significant change in the loss distribution. For example, if two movies are both comedies and they are starred by the same actors, it is very likely that for the same user, replacing one comedy movie with the other comedy movie will result in little change in the loss distribution. In stochastic optimization, we can replace the loss gradient for one movie’s embedding with the other similar movie’s embedding, and this will not significantly bias the gradient if the prior belief is accurate. On the other hand, if this exchange is stochastic, then it will act to smooth the gradient steps in the long run, thus regularizing the gradient updates.
3.1 General SSE with Knowledge Graphs: SSE-Graph
Instead of optimizing the objective function in (1), SSE-Graph, described in Algorithm 1, Figure 1, and Figure 2, approximately optimizes the objective function below:

(2)   S_n(θ) = Σ_{i=1}^{n} Σ_{k ∈ I} p(j^i, k | Φ) · ℓ(E[k] | θ)
where p(j, k | Φ) is the transition probability (with parameters Φ) of exchanging the encoding vector E[j] with a new encoding vector E[k] in the Cartesian product index set I = I_1 × ⋯ × I_M of all embedding tables. When there is a single embedding table (M = 1), there are no hard restrictions on the transition probabilities p(·, · | Φ), but when there are multiple tables (M > 1) we will enforce that p(·, · | Φ) takes a tensor product form (see (4)). When we are assuming that there is only a single embedding table (M = 1), we will not bold j, E[j] and will suppress their indices.
In the single embedding table case, M = 1, there are many ways to define the transition probability from j to k. One simple and effective way is to use a random walk (with random restart and self-loops) on a knowledge graph, i.e. when embedding j is connected with k but not with l, we can set the ratio of p(j, k | Φ) and p(j, l | Φ) to be a constant greater than 1. In more formal notation, we have
(3)   j ~ k,  j ≁ l  ⟹  p(j, k | Φ) / p(j, l | Φ) = ρ,

where "~" denotes connection in the knowledge graph and ρ > 1 is a tuning parameter. This is motivated by the fact that embeddings connected with each other in knowledge graphs should bear more resemblance and thus be more likely to be replaced by each other. Also, we let Σ_{k ≠ j} p(j, k | Φ) = p_0, where p_0 is called the SSE probability, and the embedding retainment probability is 1 − p_0. We treat both ρ and p_0 as tuning hyper-parameters in experiments. With (3) and Σ_k p(j, k | Φ) = 1, we can derive the transition probabilities between any two embeddings to fill out the transition probability table.
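To make the construction concrete, the sketch below (our own illustrative code, not the authors' implementation) fills out one such transition table from an undirected knowledge graph: indices connected to j get a weight ρ times larger than unconnected ones, the total off-diagonal mass is the SSE probability p_0, and index j keeps its own embedding with probability 1 − p_0:

```python
def transition_probs(num_embeddings, edges, rho=100.0, p0=0.01):
    """Build an SSE-Graph transition table (list of per-index rows).

    Neighbors in the knowledge graph are rho times more likely to be
    swapped in than non-neighbors, the off-diagonal mass of each row
    sums to p0, and each row sums to 1. Assumes num_embeddings >= 2.
    """
    neighbors = {j: set() for j in range(num_embeddings)}
    for j, k in edges:  # undirected edges
        neighbors[j].add(k)
        neighbors[k].add(j)
    table = []
    for j in range(num_embeddings):
        # unnormalized weights over k != j: rho for neighbors, 1 otherwise
        w = [0.0 if k == j else (rho if k in neighbors[j] else 1.0)
             for k in range(num_embeddings)]
        z = sum(w)
        row = [p0 * wk / z for wk in w]  # off-diagonal mass sums to p0
        row[j] = 1.0 - p0                # embedding retainment probability
        table.append(row)
    return table
```

By construction, the ratio of a neighbor's probability to a non-neighbor's probability in any row is exactly ρ, matching the random-walk intuition above.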
When there are multiple embedding tables, M > 1, we will force the transition from j to k to decompose into independent transitions from j_l to k_l within embedding table l (and index set I_l). Each table may have its own knowledge graph, resulting in its own transition probabilities p_l(·, · | Φ_l). The more general form of the SSE-Graph objective is given below:

(4)   S_n(θ) = Σ_{i=1}^{n} Σ_{k_1, …, k_M} p_1(j_1^i, k_1 | Φ_1) ⋯ p_M(j_M^i, k_M | Φ_M) · ℓ(E_1[k_1], …, E_M[k_M] | θ)
Optimizing (4) with SGD or its variants (Adagrad Duchi et al. (2011), Adam Kingma and Ba (2014)) is simple: we just need to randomly switch each original embedding tensor with another embedding tensor randomly sampled according to the transition probability (see Algorithm 1). This is equivalent to having a randomized embedding look-up layer, as shown in Figure 1.
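A minimal sketch of the resulting randomized look-up (hypothetical helper; a real implementation would operate on index tensors inside the framework's embedding layer): at each SGD iteration, every index is re-sampled from its row of a transition-probability table before the embedding is fetched, so gradient updates are stochastically shared among related embeddings.

```python
import random

def sse_lookup(j, table, rng=random):
    """Stochastically replace index j before the embedding look-up,
    sampling from row j of a transition table whose rows sum to 1.
    Called fresh at every training iteration."""
    u, acc = rng.random(), 0.0
    for k, p in enumerate(table[j]):
        acc += p
        if u < acc:
            return k
    return j  # numerical fallback if the row sums to slightly under 1
```

With probability 1 − p_0 the sampled index is j itself, so most iterations behave exactly like a standard look-up; the occasional swap is what injects the regularization.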
We can also accommodate sequences of embeddings, which commonly occur in natural language applications, by considering a tuple of indices instead of the single index j_l^i for the l-th embedding table in (4), with one entry per embedding in table l associated with the i-th sample. When there is more than one embedding look-up table, we sometimes prefer to use a different p_0 and ρ for different look-up tables in (3) and the SSE probability constraint. For example, in recommender systems, we would use (ρ_u, p_u) for the user embedding table and (ρ_i, p_i) for the item embedding table.
We find that SSE with knowledge graphs, i.e., SSE-Graph, can force similar embeddings to cluster when compared to the original neural network without SSE-Graph. In Figure 3, one can easily see that more embeddings tend to cluster into 2 black holes after applying SSE-Graph, when embeddings are projected into 3D spaces using PCA. Interestingly, a similar phenomenon occurs when assuming the knowledge graph is a complete graph, which leads to the SSE-SE variant introduced below.
3.2 Simplified SSE with Complete Graph: SSE-SE
One clear limitation of applying SSE-Graph is that not every dataset comes with good-quality knowledge graphs on embeddings. For those cases, we could assume there is a complete graph over all embeddings, so there is a small transition probability between every pair of different embeddings:

(5)   p(j, k | Φ) = p_0 / (N − 1)  for all k ≠ j,

where N is the size of the embedding table and, as before, p(j, j | Φ) = 1 − p_0. The SGD procedure in Algorithm 1 can still be applied, and we call this algorithm SSE-SE (Stochastic Shared Embeddings - Simple and Easy). It is worth noting that SSE-Graph and SSE-SE are applied not only to embeddings associated with the input but also to those associated with the output. Unless there are considerably more embeddings than data points and the model is significantly overfitting, a small SSE probability p_0 normally gives reasonably good results.
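Under the complete-graph assumption, sampling a replacement index needs no transition table at all. A minimal sketch of SSE-SE's index replacement, with hypothetical names (real implementations live inside each framework's embedding layer and operate on whole index tensors):

```python
import random

def sse_se(j, num_embeddings, p0=0.01, rng=random):
    """SSE-SE replacement following (5): keep index j with probability
    1 - p0, otherwise swap in one of the other N - 1 indices uniformly
    at random. Applied independently at every training iteration."""
    if rng.random() < p0:
        k = rng.randrange(num_embeddings - 1)
        return k if k < j else k + 1  # skip j itself
    return j
```

Because the draw is repeated at every iteration, the same training example stochastically spreads its gradient over many embeddings across the course of training, unlike a one-time data augmentation.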
Interestingly, we found that the SSE-SE framework is related to several techniques used in practice. For example, BERT pre-training unintentionally applied a method similar to SSE-SE to the input by replacing the masked word with a random word. This implicitly introduces an SSE layer for the input in Figure 1, because embeddings associated with the input are now stochastically mapped according to (5). The main difference between this and SSE-SE is that the BERT heuristic merely augments the input once, while SSE introduces randomization at every iteration, and we can also accommodate label embeddings. In Section 4.4, we will show that SSE-SE improves both the original BERT pre-training procedure and the fine-tuning procedure.
3.3 Theoretical Guarantees
We explain why SSE can reduce the variance of estimators and thus lead to better generalization performance. For simplicity, we consider the SSE-Graph objective (2) in which there is no transition associated with the label y_i, and only the embeddings associated with the input x_i undergo a transition. When this is the case, we can think of the loss as a function of the embedding and the label, ℓ(E[j^i], y_i | θ). We take this approach because it is more straightforward to compare our resulting theory to existing excess risk bounds.
The SSE objective in the case of only input transitions can be written as

(6)   S_n(θ) = (1/n) Σ_{i=1}^{n} Σ_{k} p(j^i, k | Φ) · ℓ(E[k], y_i | θ),

and there may be some constraint on θ. Let θ̂ denote the minimizer of S_n(θ) subject to this constraint. We will show in the subsequent theory that minimizing S_n(θ) will get us close to a minimizer θ⋆ of S(θ) = E S_n(θ), and that under some conditions this will get us close to the Bayes risk. We will use the standard definitions of empirical and true risk, R_n(θ) = (1/n) Σ_{i=1}^{n} ℓ(x_i, y_i | θ) and R(θ) = E R_n(θ).
Our results depend on the following decomposition of the risk:

(7)   R(θ̂) − R(θ⋆) = [R(θ̂) − S(θ̂)] + [S(θ̂) − S(θ⋆)] + [S(θ⋆) − R(θ⋆)] ≤ 2B + E(θ̂),

where B = sup_θ |S(θ) − R(θ)| and E(θ̂) = S(θ̂) − S(θ⋆). We can think of B as representing the bias due to SSE, and E(θ̂) as an SSE form of excess risk. Then, by another application of similar bounds and the optimality of θ̂ for the empirical objective S_n,

(8)   E(θ̂) ≤ [S(θ̂) − S_n(θ̂)] + [S_n(θ⋆) − S(θ⋆)] ≤ 2 sup_θ |S_n(θ) − S(θ)|.

The high-level idea behind the following results is that when the SSE protocol reflects the underlying distribution of the data, the bias term B is small, and if the SSE transitions are well mixing, then the SSE excess risk E(θ̂) will be of smaller order than the standard Rademacher complexity. This results in a small excess risk.
Theorem 1.
Consider SSE-Graph with only input transitions. Let L̄ denote the expected loss conditional on the input and L̃ = ℓ − L̄ the residual loss. Define the conditional and residual SSE empirical Rademacher complexities to be
(9)  
(10) 
respectively, where σ is a Rademacher random vector in {−1, +1}^n. Then we can decompose the SSE empirical risk into
(12) 
Remark 1.
The transition probabilities in (9), (10) act to smooth the empirical Rademacher complexity. To see this, notice that we can write the inner term of (9) as an inner product of σ with a vector of transition-averaged conditional losses, where the transition probabilities have been collected into a transition matrix. Transition matrices are contractive and will induce dependencies between the Rademacher random variables, thereby stochastically reducing the supremum. In the case of no label noise, namely when the conditional distribution of the label given the input is a point mass, the residual loss L̃ is identically 0 and so is the residual complexity (10). The use of the conditional losses L̄, as opposed to the losses ℓ, will also make (9) of smaller order than the standard empirical Rademacher complexity. We demonstrate this with a partial simulation on the Movielens1m dataset in Figure 5 of the Appendix.
Theorem 2.
Let the SSE-bias be defined as B = sup_θ |S(θ) − R(θ)|. Suppose that 0 ≤ ℓ(·, · | θ) ≤ b for some b > 0; then, with high probability, the excess risk R(θ̂) − R(θ⋆) is bounded by 2B plus stochastic terms controlled by the complexities of Theorem 1 and a deviation term of order b/√n.
Remark 2.
The price for 'smoothing' the Rademacher complexity in Theorem 1 is that SSE may introduce a bias. This will be particularly prominent when the SSE transitions have little to do with the underlying distribution of (x_i, y_i). On the other extreme, suppose that the transition probability p(j^i, ·) is non-zero only over a neighborhood of j^i, and that for data with encodings in that neighborhood the samples are identically distributed with (x_i, y_i); then B = 0. In all likelihood, the SSE transition probabilities will not be supported over neighborhoods of iid random pairs, but with a well-chosen SSE protocol the neighborhoods contain approximately iid pairs and B is small.
Table 1: Movielens1m and Movielens10m. Parameters are (ρ_u, ρ_i, p_u, p_i) for the user and item embedding tables; "-" means not applicable.

| Model | ML1m RMSE | ρ_u | ρ_i | p_u | p_i | ML10m RMSE | ρ_u | ρ_i | p_u | p_i |
|---|---|---|---|---|---|---|---|---|---|---|
| SGD-MF | 1.0984 | - | - | - | - | 1.9490 | - | - | - | - |
| Graph Laplacian + ALS-MF | 1.0464 | - | - | - | - | 1.9755 | - | - | - | - |
| SSE-Graph + SGD-MF | 1.0145 | 500 | 200 | 0.005 | 0.005 | 1.9019 | 1 | 500 | 0.01 | 0.01 |
| SSE-SE + SGD-MF | 1.0150 | 1 | 1 | 0.005 | 0.005 | 1.9085 | 1 | 1 | 0.01 | 0.01 |
Table 2: Douban, Movielens10m, and Netflix. p_d is the dropout probability and p_0 is the SSE probability; "-" means not applicable.

| Model | Douban RMSE | p_d | p_0 | ML10m RMSE | p_d | p_0 | Netflix RMSE | p_d | p_0 |
|---|---|---|---|---|---|---|---|---|---|
| MF | 0.7339 | - | - | 0.8851 | - | - | 0.8941 | - | - |
| Dropout + MF | 0.7296 | 0.1 | - | 0.8813 | 0.1 | - | 0.8897 | 0.1 | - |
| SSE-SE + MF | 0.7201 | - | 0.008 | 0.8715 | - | 0.008 | 0.8842 | - | 0.008 |
| SSE-SE + Dropout + MF | 0.7185 | 0.1 | 0.005 | 0.8678 | 0.1 | 0.005 | 0.8790 | 0.1 | 0.005 |
Table 3: Movielens1m, Yahoo Music, and Foursquare (three ranking metrics per dataset).

| Model | P@1 | P@5 | P@10 | P@1 | P@5 | P@10 | P@1 | P@5 | P@10 |
|---|---|---|---|---|---|---|---|---|---|
| SQL-Rank Wu et al. (2018) | 0.7369 | 0.6717 | 0.6183 | 0.4551 | 0.3614 | 0.3069 | 0.0583 | 0.0194 | 0.0170 |
| BPR | 0.6977 | 0.6568 | 0.6257 | 0.3971 | 0.3295 | 0.2806 | 0.0437 | 0.0189 | 0.0143 |
| Dropout + BPR | 0.7031 | 0.6548 | 0.6273 | 0.4080 | 0.3315 | 0.2847 | 0.0437 | 0.0184 | 0.0146 |
| SSE-SE + BPR | 0.7254 | 0.6813 | 0.6469 | 0.4297 | 0.3498 | 0.3005 | 0.0609 | 0.0262 | 0.0155 |
4 Experiments
We have conducted extensive experiments on 6 tasks, including 3 recommendation tasks (explicit feedback, implicit feedback, and sequential recommendation) and 3 NLP tasks (neural machine translation, BERT pre-training, and BERT fine-tuning for sentiment classification), and found that our proposed SSE can effectively improve generalization performance on a wide variety of tasks. Details about the datasets and parameter settings can be found in the appendix.
4.1 Neural Networks with One Hidden Layer (Matrix Factorization and BPR)
The Matrix Factorization algorithm (MF) Mnih and Salakhutdinov (2008) and the Bayesian Personalized Ranking algorithm (BPR) Rendle et al. (2009) can be viewed as neural networks with one hidden layer (latent features) and are quite popular in recommendation tasks. MF uses the squared loss, designed for explicit feedback data, while BPR uses a pairwise ranking loss, designed for implicit feedback data.
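For reference, the pairwise BPR loss for a single (user, preferred item, unobserved item) triple can be sketched as follows (illustrative code with plain-list embeddings, not the experimental implementation):

```python
import math

def bpr_loss(user_emb, pos_item_emb, neg_item_emb):
    """Pairwise BPR loss for one triple: -log sigmoid(s_pos - s_neg),
    encouraging the observed item to score higher than the unobserved one."""
    s_pos = sum(u * v for u, v in zip(user_emb, pos_item_emb))
    s_neg = sum(u * v for u, v in zip(user_emb, neg_item_emb))
    return -math.log(1.0 / (1.0 + math.exp(-(s_pos - s_neg))))
```

SSE applies identically in both cases, since each loss touches the parameters only through the embeddings it looks up.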
First, we conduct experiments on two explicit feedback datasets: Movielens1m and Movielens10m. For these datasets, we can construct graphs based on the actors/actresses starring in the movies. We compare SSE-Graph and the popular graph Laplacian regularization (GLR) method Rao et al. (2015) in Table 1. The results show that SSE-Graph consistently outperforms GLR. This indicates that our SSE-Graph has greater potential than graph Laplacian regularization, as we do not explicitly penalize the distances across embeddings but rather implicitly penalize the effects of similar embeddings on the loss. Furthermore, we show that even without existing knowledge graphs of embeddings, our SSE-SE performs only slightly worse than SSE-Graph but still much better than GLR and MF.
In general, SSE-SE is a good alternative when graph information is not available. We then show that our proposed SSE-SE can be used together with standard regularization techniques, such as dropout and weight decay, to improve recommendation results regardless of the loss function and the dimensionality of embeddings. This is evident in Table 2 and Table 3. With the help of SSE-SE, BPR can perform better than the state-of-the-art listwise approach SQL-Rank Wu et al. (2018) in most cases. We include the optimal SSE parameters in the tables for reference and leave other experimental details to the appendix. In the rest of the paper, we mostly focus on SSE-SE, as we do not have high-quality graphs of embeddings for most datasets.
Table 4: Movielens1m. SSE-SE parameters are the SSE probabilities for input and output embeddings; "-" means not applicable.

| Model | NDCG | Hit Ratio | Dimension | # of Blocks | p_0 (input) | p_0 (output) |
|---|---|---|---|---|---|---|
| SASRec | 0.5941 | 0.8182 | 100 | 2 | - | - |
| SASRec | 0.5996 | 0.8272 | 100 | 6 | - | - |
| SSE-SE + SASRec | 0.6092 | 0.8250 | 100 | 2 | 0.1 | 0 |
| SSE-SE + SASRec | 0.6085 | 0.8293 | 100 | 2 | 0 | 0.1 |
| SSE-SE + SASRec | 0.6200 | 0.8315 | 100 | 2 | 0.1 | 0.1 |
| SSE-SE + SASRec | 0.6265 | 0.8364 | 100 | 6 | 0.1 | 0.1 |
4.2 Transformer Encoder Model for Sequential Recommendation
SASRec Kang and McAuley (2018) is the state-of-the-art algorithm for the sequential recommendation task. It applies the transformer model Vaswani et al. (2017): a sequence of items purchased by a user can be viewed as a sentence in the transformer, and next-item prediction is equivalent to next-word prediction in a language model. In Table 4, we apply SSE-SE to input embeddings only, to output embeddings only, and to both, and observe that all of these variants significantly improve over the state-of-the-art SASRec baseline. The regularization effect of SSE-SE is even more obvious when we increase the number of self-attention blocks from 2 to 6, as this leads to a more sophisticated model with many more parameters, which overfits terribly even with dropout and weight decay. We can see in Table 4 that when both methods use dropout and weight decay, SSE-SE + SASRec does much better than SASRec without SSE-SE.
Table 5: Test BLEU on WMT newstest sets.

| Model | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Transformer | 21.0 | 20.7 | 22.7 | 20.6 | 20.6 | 25.3 | 26.2 | 28.4 | 32.1 | 27.2 | 38.8 |
| SSE-SE + Transformer | 21.4 | 21.1 | 23.0 | 21.0 | 20.8 | 25.2 | 27.2 | 29.2 | 33.1 | 27.9 | 39.9 |
4.3 Neural Machine Translation
We use the transformer model Vaswani et al. (2017) as the backbone for our experiments. The baseline model is the standard 6-layer transformer architecture, and we apply SSE-SE to both the encoder and the decoder by replacing the corresponding vocabularies' embeddings in the source and target sentences. We train on the standard WMT 2014 English-to-German dataset, which consists of roughly 4.5 million parallel sentence pairs, and test on the WMT 2008 to 2018 newstest sets. We use the OpenNMT implementation in our experiments, with the same dropout rate of 0.1 and label smoothing value of 0.1 for both the baseline model and our SSE-enhanced model. The only difference between the two models is whether or not we use our proposed SSE-SE, as in (5), for both encoder and decoder embedding layers. We evaluate both models' performance on the test datasets using BLEU scores Post (2018).
We summarize our results in Table 5 and find that SSE-SE helps improve accuracy and BLEU scores on both dev and test sets in 10 out of 11 years from 2008 to 2018. In particular, on the last 5 years' test sets from 2014 to 2018, the transformer model with SSE-SE improves BLEU scores by 0.92 on average compared to the baseline model without SSE-SE.
4.4 BERT for Sentiment Classification
BERT's model architecture Devlin et al. (2018) is a multi-layer bidirectional Transformer encoder based on the Transformer model in neural machine translation. Although SSE-SE can be used in both the pre-training and fine-tuning stages of BERT, we mainly focus on pre-training, as fine-tuning bears more similarity to the previous section. We use an SSE probability of 0.015 for embeddings (one-hot encodings) associated with labels and an SSE probability of 0.015 for embeddings (word-piece embeddings) associated with inputs. One thing worth noting is that even in the original BERT model's pre-training stage, SSE-SE is already implicitly used for token embeddings: the authors masked 15% of words, for a maximum of 80 words in sequences of maximum length 512, and 10% of the time replaced the [mask] token with a random token. That is roughly equivalent to an SSE probability of 0.015 for replacing input word-piece embeddings.
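The equivalence is a simple product of rates: only the randomly-replaced fraction of the selected tokens behaves like an SSE transition.

```python
# BERT pre-training heuristic: 15% of tokens are selected, and 10% of
# those are replaced by a random token rather than the [mask] token.
mask_rate = 0.15
random_replace_rate = 0.10
effective_sse_prob = mask_rate * random_replace_rate  # ~0.015
```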
We continue to pre-train the Google pre-trained BERT model on our crawled IMDB movie reviews with and without SSE-SE and compare downstream task performance. In Table 6, we find that the SSE-SE pre-trained BERT base model helps us achieve state-of-the-art results on the IMDB sentiment classification task, better than the previous best in Howard and Ruder (2018). We report a test set accuracy of 0.9542 after fine-tuning for one epoch only. For the similar SST-2 sentiment classification task in Table 7, we also find that SSE-SE helps BERT pre-train better: our SSE-SE pre-trained model achieves 94.3% accuracy on the SST-2 test set after 3 epochs of fine-tuning, while the standard pre-trained BERT model only reaches 93.8% after fine-tuning. Furthermore, we show that SSE-SE with SSE probability 0.01 can also improve dev and test accuracy in the fine-tuning stage. If we use SSE-SE in both the pre-training and fine-tuning stages of the BERT base model, we can achieve 94.5% accuracy on the SST-2 test set, approaching the 94.9% accuracy of the BERT large model. We are optimistic that our SSE-SE can be applied to the BERT large model as well in the future.

Table 6: IMDB Test Set.
| Model | AUC | Accuracy | F1 Score |
|---|---|---|---|
| ULMFiT Howard and Ruder (2018) | - | 0.9540 | - |
| Google Pre-trained Model + Fine-tuning | 0.9415 | 0.9415 | 0.9419 |
| Pre-training + Fine-tuning | 0.9518 | 0.9518 | 0.9523 |
| (SSE-SE + Pre-training) + Fine-tuning | 0.9542 | 0.9542 | 0.9545 |
Table 7: SST-2 Dev and Test Sets.

| Model | Dev AUC | Dev Accuracy | Dev F1 Score | Test Accuracy (%) |
|---|---|---|---|---|
| Google Pre-trained + Fine-tuning | 0.9230 | 0.9232 | 0.9253 | 93.6 |
| Pre-training + Fine-tuning | 0.9265 | 0.9266 | 0.9281 | 93.8 |
| (SSE-SE + Pre-training) + Fine-tuning | 0.9276 | 0.9278 | 0.9295 | 94.3 |
| (SSE-SE + Pre-training) + (SSE-SE + Fine-tuning) | 0.9323 | 0.9323 | 0.9336 | 94.5 |
4.5 Speed and Convergence Comparisons
In Figure 4, it is clear that our one-hidden-layer neural networks with SSE-SE achieve much better generalization results than their respective standalone versions. One can also easily see that the SSE versions of the algorithms converge at much faster speeds with the same learning rate.
5 Conclusion
We have proposed Stochastically Shared Embeddings, a data-driven approach to regularization that stands in contrast to brute-force regularization such as Laplacian and ridge regularization. Our theory is a first step towards explaining the regularization effect of SSE, particularly by 'smoothing' the Rademacher complexity. Extensive experimentation demonstrates that SSE can be fruitfully integrated into existing deep learning applications.
References
 Bennett et al. (2007) James Bennett, Stan Lanning, et al. The Netflix prize. In Proceedings of KDD Cup and Workshop, volume 2007, page 35. New York, NY, USA, 2007.
 Cai et al. (2011) Deng Cai, Xiaofei He, Jiawei Han, and Thomas S Huang. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1548–1560, 2011.
 Devlin et al. (2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
 Duchi et al. (2011) John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.
 Goodfellow et al. (2016) Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning, volume 1. MIT Press, Cambridge, 2016.
 He et al. (2017) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, pages 173–182. International World Wide Web Conferences Steering Committee, 2017.
 Hinton et al. (2012) Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
 Hoerl and Kennard (1970) Arthur E Hoerl and Robert W Kennard. Ridge regression: Biased estimation for non-orthogonal problems. Technometrics, 12(1):55–67, 1970.
 Howard and Ruder (2018) Jeremy Howard and Sebastian Ruder. Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146, 2018.
 Hu et al. (2008) Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In ICDM, volume 8, pages 263–272. Citeseer, 2008.
 Kang and McAuley (2018) Wang-Cheng Kang and Julian McAuley. Self-attentive sequential recommendation. arXiv preprint arXiv:1808.09781, 2018.
 Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 Krizhevsky et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
 Lehmann et al. (2015) Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick Van Kleef, Sören Auer, et al. DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 6(2):167–195, 2015.
 Mikolov et al. (2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
 Miller (1995) George A Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.
 Mnih and Salakhutdinov (2008) Andriy Mnih and Ruslan R Salakhutdinov. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems, pages 1257–1264, 2008.
 Nowlan and Hinton (1992) Steven J Nowlan and Geoffrey E Hinton. Simplifying neural networks by soft weight-sharing. Neural Computation, 4(4):473–493, 1992.

 Pascanu et al. (2013) Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pages 1310–1318, 2013.
 Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
 Post (2018) Matt Post. A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 186–191, Brussels, Belgium, October 2018. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/W18-6319.
 Radford et al. (2018) Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. OpenAI technical report, 2018.
 Radford et al. (2019) Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. URL https://openai.com/blog/better-language-models, 2019.
 Rao et al. (2015) Nikhil Rao, Hsiang-Fu Yu, Pradeep K Ravikumar, and Inderjit S Dhillon. Collaborative filtering with graph information: Consistency and scalable methods. In Advances in Neural Information Processing Systems, pages 2107–2115, 2015.

 Rendle et al. (2009) Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 452–461. AUAI Press, 2009.
 Srebro et al. (2005) Nathan Srebro, Jason Rennie, and Tommi S Jaakkola. Maximum-margin matrix factorization. In Advances in Neural Information Processing Systems, pages 1329–1336, 2005.
 Srivastava et al. (2014) Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.

 Szegedy et al. (2016) Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016.
 Tibshirani (1996) Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
 Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.
 Wu et al. (2017) Liwei Wu, Cho-Jui Hsieh, and James Sharpnack. Large-scale collaborative ranking in near-linear time. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 515–524. ACM, 2017.
 Wu et al. (2018) Liwei Wu, Cho-Jui Hsieh, and James Sharpnack. SQL-Rank: A listwise approach to collaborative ranking. In Proceedings of Machine Learning Research (35th International Conference on Machine Learning), volume 80, 2018.
6 Appendix
For the experiments in Section 4.1, we use Julia and C++ to implement SGD. For the experiments in Sections 4.2 and 4.4, we use TensorFlow and the SGD/Adam optimizers. For the experiments in Section 4.3, we use PyTorch and Adam with the noam decay scheme and warm-up. We find that none of these choices affect the strong empirical results supporting the effectiveness of our proposed methods, especially SSE-SE. In any deep learning framework, we can introduce stochasticity into the original embedding look-up behavior and easily implement the SSE layer in Figure 1 as a custom operator.

6.1 Neural Networks with One Hidden Layer
To run SSE-Graph, we need to construct good-quality knowledge graphs on embeddings. We managed to match movies in the Movielens1M and Movielens10M datasets to their IMDB pages, so we can extract plentiful information for each movie, such as the cast, user reviews, and so on. For simplicity, we construct the knowledge graph on item-side embeddings using the cast of movies: two items are connected by an edge when they share one or more actors/actresses. On the user side, we do not have good-quality graphs: we are only able to create a graph on users in the Movielens1M dataset based on their age groups, and we do not have any side information on users in the Movielens10M dataset. When running experiments, we do a parameter sweep for the weight decay parameter and then fix it before tuning the parameters for SSE-Graph and SSE-SE. We use different SSE parameters for the user-side and item-side embedding tables. The optimal parameters are stated in Table 1 and Table 2. We use a learning rate of 0.01 in all SGD experiments.
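The edge rule above (connect two movies when they share cast members) can be sketched in a few lines; the function name `build_item_graph` and the toy cast data are our illustration, not the actual preprocessing code used for the experiments:

```python
from collections import defaultdict
from itertools import combinations

def build_item_graph(movie_cast):
    """Connect two movies with an edge whenever they share at least
    one actor/actress, as in the item-side knowledge graph above."""
    by_actor = defaultdict(set)
    for movie, cast in movie_cast.items():
        for actor in cast:
            by_actor[actor].add(movie)
    edges = set()
    for movies in by_actor.values():
        # every pair of movies sharing this actor gets an edge
        for a, b in combinations(sorted(movies), 2):
            edges.add((a, b))
    return edges

# toy example with hypothetical cast lists
cast = {
    "Heat": {"Al Pacino", "Robert De Niro"},
    "The Godfather Part II": {"Al Pacino", "Robert De Niro"},
    "Taxi Driver": {"Robert De Niro"},
}
print(build_item_graph(cast))
```

Grouping movies by actor first keeps the construction near-linear in the cast lists, rather than comparing all movie pairs directly.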
In the first leg of experiments, we examine users with fewer than 60 ratings in the Movielens1M and Movielens10M datasets. In this scenario, the graph should carry higher importance. One can easily see from Table 1 that without using graph information, our proposed SSE-SE is the best-performing matrix factorization algorithm among all methods, including the popular ALS-MF and SGD-MF, in terms of RMSE. With graph information, our proposed SSE-Graph performs significantly better than the graph Laplacian regularized matrix factorization method. This indicates that SSE-Graph has great potential over graph Laplacian regularization, as we do not explicitly penalize the distances between embeddings but rather implicitly penalize the effects of similar embeddings on the loss.
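For contrast, graph Laplacian regularization adds an explicit distance penalty to the loss; a standard formulation (the notation $e_i$ for the embedding of node $i$ is ours) is:

```latex
% Graph Laplacian regularization explicitly penalizes the distance
% between embeddings of neighboring nodes (i, j) in the graph G:
\Omega(E) \;=\; \lambda \sum_{(i,j) \in \mathcal{E}(G)} \lVert e_i - e_j \rVert_2^2
% SSE-Graph instead leaves the loss unchanged and stochastically
% swaps e_i for e_j during training with some probability.
```

The difference is that the Laplacian term pulls connected embeddings together regardless of the loss, while SSE only penalizes embedding differences to the extent that they change the loss.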
In the second leg of experiments, we remove the constraint on the maximum number of ratings per user. We want to show that SSE-SE can be a good alternative when graph information is not available. We follow the same procedures as in Wu et al. (2017, 2018). In Table 2, we can see that SSE-SE can be used together with dropout to achieve the smallest RMSE across the Douban, Movielens10M, and Netflix datasets. In Table 3, one can see that SSE-SE is more effective than dropout in this case and can perform better than the state-of-the-art listwise approach SQL-Rank Wu et al. (2018) on 2 out of 3 datasets.
In Table 2, SSE-SE has two tuning parameters, the probability of replacing user-side embeddings and the probability of replacing item-side embeddings, because there are two embedding tables; here, for simplicity, we use a single tuning parameter. We use the same dropout probability, user/item embedding dimension, weight decay, and learning rate for all experiments, with the exception that the learning rate is reduced when both SSE-SE and dropout are applied. We use one value of the SSE parameter for the Douban dataset and a different value for the Movielens10M and Netflix datasets.
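The SSE layer described above, stochastic replacement of indices before the usual embedding lookup, with separate probabilities for the two tables, can be sketched as follows. This is a NumPy illustration under our own naming (`sse_se_lookup`, the table sizes, and the probabilities are hypothetical), not the exact operator used in the experiments:

```python
import numpy as np

def sse_se_lookup(table, ids, p, rng):
    """SSE-SE: each index is replaced by a uniformly random index
    with probability p before the usual embedding lookup.
    Applied at training time only; inference uses the plain lookup."""
    ids = np.asarray(ids)
    mask = rng.random(ids.shape) < p                      # which ids to replace
    rand = rng.integers(0, table.shape[0], size=ids.shape)
    return table[np.where(mask, rand, ids)]

rng = np.random.default_rng(0)
user_table = rng.normal(size=(6040, 100))   # e.g. Movielens1M has 6040 users
item_table = rng.normal(size=(3706, 100))
user_ids = np.array([0, 17, 42])
item_ids = np.array([5, 5, 1000])
# separate replacement probabilities for the two embedding tables
u_emb = sse_se_lookup(user_table, user_ids, p=0.01, rng=rng)
v_emb = sse_se_lookup(item_table, item_ids, p=0.05, rng=rng)
print(u_emb.shape, v_emb.shape)  # (3, 100) (3, 100)
```

Setting `p=0` recovers the ordinary embedding lookup, which is one way to check an implementation.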
6.2 Neural Machine Translation
We use the transformer model Vaswani et al. (2017) as the backbone for our experiments. The control group is the standard transformer encoder-decoder architecture with self-attention. In the experiment group, we apply SSE-SE to both the encoder and the decoder by replacing the corresponding vocabularies' embeddings in the source and target sentences. We trained on the standard WMT 2014 English-to-German dataset, which consists of roughly 4.5 million parallel sentence pairs, and tested on the WMT 2008 to 2018 newstest sets. Sentences were encoded using byte-pair encoding with a vocabulary of 32,000 tokens. We use the SentencePiece, OpenNMT, and SacreBLEU implementations in our experiments. We trained the 6-layer transformer base model on a single machine with 4 NVIDIA V100 GPUs for 20,000 steps. We use the same dropout rate of 0.1 and label smoothing value of 0.1 for the baseline model and our SSE-enhanced model, and both models use the same embedding dimensionality. When decoding, we use beam search with a beam size of 4 and a length penalty of 0.6, and replace unknown words using attention. For both models, we average the last 5 checkpoints (we save checkpoints every 10,000 steps) and evaluate performance on the test datasets using BLEU scores. The only difference between the two models is whether we use our proposed SSE-SE, with the replacement probability of Equation 5, for both the encoder and decoder embedding layers.
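Applying SSE-SE to both sides of the transformer amounts to perturbing the source and target token ids independently, each within its own vocabulary, before the embedding lookup. The sketch below (function name `sse_se_batch` and all values are our illustration, not the authors' code) shows this shape-preserving preprocessing step:

```python
import numpy as np

def sse_se_batch(src_ids, tgt_ids, src_vocab, tgt_vocab, p, rng):
    """With probability p, replace each source/target token id by a
    uniformly random id from its own vocabulary, so both encoder and
    decoder embedding layers receive SSE-SE during training."""
    def perturb(ids, vocab_size):
        ids = np.asarray(ids)
        mask = rng.random(ids.shape) < p
        rand = rng.integers(0, vocab_size, size=ids.shape)
        return np.where(mask, rand, ids)
    return perturb(src_ids, src_vocab), perturb(tgt_ids, tgt_vocab)

rng = np.random.default_rng(42)
src = np.array([[5, 9, 31999]])   # one source sentence of BPE ids
tgt = np.array([[2, 7, 11]])      # the corresponding target ids
noisy_src, noisy_tgt = sse_se_batch(src, tgt, 32000, 32000, p=0.1, rng=rng)
print(noisy_src.shape, noisy_tgt.shape)  # (1, 3) (1, 3)
```

Because only the ids change, the rest of the transformer pipeline (positional encodings, attention masks) is untouched.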
6.3 BERT
In the first leg of experiments, we crawled one million user reviews from IMDB and pretrained the BERT-Base model (12 blocks) using sequences of maximum length 512 and a batch size of 8, with the same learning rate for both models, on one NVIDIA V100 GPU. Then we pretrained on a mixture of our crawled reviews and the reviews in the IMDB sentiment classification task (250K reviews in train and 250K reviews in test) before continuing pretraining on the IMDB sentiment classification reviews only. In total, both models are pretrained on the same datasets for the same number of steps, with the only difference being that our model utilizes SSE-SE. In the second leg of experiments, we fine-tuned the two models obtained in the first-leg experiments on two sentiment classification tasks: the IMDB sentiment classification task and the SST-2 sentiment classification task. The goal of pretraining on the IMDB dataset but fine-tuning for the SST-2 task is to explore whether SSE-SE can play a role in transfer learning.
The results for the IMDB sentiment task are summarized in Table 6. In these experiments, we use a maximum sequence length of 512 with the same learning rate and dropout probability across models, and we run fine-tuning for 1 epoch for the two pretrained models we obtained before. For the Google pretrained BERT-Base model, we find that we need to run a minimum of 2 epochs. This shows that our additional pretraining can speed up fine-tuning. We find that the Google pretrained model performs worst in accuracy because it was pretrained only on Wikipedia and the Books corpus, while ours have additionally seen many user reviews. We also find that the SSE-SE pretrained model can achieve an accuracy of 0.9542 after fine-tuning for only one epoch. In contrast, the accuracy is only 0.9518 without SSE-SE applied to the output embeddings.
For the SST-2 task, we use a maximum sequence length of 128 and a dropout probability of 0.1, and we run fine-tuning for 3 epochs for all 3 models in Table 7. We report AUC, accuracy, and F1 score on the dev data. For test results, we submitted our predictions to the GLUE website for the official evaluation. We find that even in transfer learning, our SSE-SE pretrained model still enjoys advantages over the Google pretrained model and our pretrained model without SSE-SE. Our SSE-SE pretrained model achieves 94.3% accuracy on the SST-2 test set, versus 93.6% and 93.8% respectively. If we use SSE-SE for both pretraining and fine-tuning, we can achieve 94.5% accuracy on the SST-2 test set, which approaches the 94.9% reported for the BERT-Large model. An SSE probability of 0.01 is used for fine-tuning.
6.4 Proofs
Throughout this section, we will suppress the probability parameters.
Proof of Theorem 1.
Consider the following variability term,
(13) 
Let us break the variability term into two components
where the pair represents the random input and label. To control the first term, we introduce a ghost dataset whose samples are drawn independently and identically from the same distribution. Define
(14) 
be the empirical SSE risk with respect to this ghost dataset.
We will rewrite the first term in terms of the ghost dataset and apply Jensen's inequality and the law of iterated conditional expectation:
(15)  
(16)  
(17)  
(18) 
Notice that
Because the samples are independent, the term in question is a vector of symmetric independent random variables. Thus its distribution is not affected by multiplication by arbitrary Rademacher vectors.
But this is bounded by
For the second term,
we will introduce a second ghost dataset drawn i.i.d. from the same distribution. Because we are augmenting the input, this results in a new ghost encoding. Let
(19) 
be the empirical risk with respect to this ghost dataset. Then we have that
Thus,
(20)  
(21)  
(22)  
(23) 
Notice that we may write,
Again we may introduce a second set of Rademacher random variables, which results in
And this is bounded by
by Jensen’s inequality again. ∎
Proof of Theorem 2.
The claim about the expectation is immediate. It remains to show our concentration inequality. Consider changing a single sample, which results in a perturbed SSE empirical risk. Then,
Then the result follows from McDiarmid’s inequality.
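For reference, the bounded-differences form of McDiarmid's inequality used here states (a standard result; the constants $c_i$ are whatever bound the single-sample perturbation above yields):

```latex
% McDiarmid: if changing any single sample (x_i, y_i) changes the
% value of f by at most c_i, then for all t > 0,
P\bigl(f - \mathbb{E} f \ge t\bigr)
  \;\le\; \exp\!\Bigl(-\frac{2 t^2}{\sum_{i=1}^{n} c_i^2}\Bigr).
% With a uniform bound c_i = c/n for every i, this gives the
% familiar rate exp(-2 n t^2 / c^2).
```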
∎