LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization

05/08/2023
by Peng Lu, et al.

Regularization techniques are crucial to improving the generalization performance and training efficiency of deep neural networks. Many deep learning algorithms rely on weight decay, dropout, and batch/layer normalization to converge faster and generalize better. Label Smoothing (LS) is another simple, versatile and efficient regularization technique that can be applied to various supervised classification tasks. Conventional LS, however, assumes that each non-target class is equally likely, regardless of the training instance. In this work, we present a general framework for training with label regularization, which includes conventional LS but can also model instance-specific variants. Based on this formulation, we propose an efficient way of learning LAbel regularization by devising a Bi-level Optimization (LABO) problem. We derive a deterministic and interpretable solution to the inner loop as the optimal label smoothing, without the need to store the parameters or the outputs of a trained model. Finally, we conduct extensive experiments and demonstrate that LABO consistently yields improvements over conventional label regularization on various fields, including seven machine translation and three image classification tasks across various…
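For context, the sketch below shows the conventional LS loss the abstract contrasts against: the gold class keeps probability 1 − ε and the remaining ε is spread uniformly over the non-target classes. This is a minimal NumPy illustration; the function names and the ε value are illustrative and not taken from the paper. An instance-specific variant, of the kind LABO is designed to learn, would replace the uniform mass over non-target classes with a learned, per-instance distribution.

```python
import numpy as np

def smoothed_targets(labels, num_classes, epsilon=0.1):
    """Conventional label smoothing: keep 1 - epsilon on the gold class and
    spread the remaining epsilon uniformly over the non-target classes.
    An instance-specific variant would replace the uniform row below with a
    learned distribution per training example."""
    n = len(labels)
    targets = np.full((n, num_classes), epsilon / (num_classes - 1))
    targets[np.arange(n), labels] = 1.0 - epsilon
    return targets

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def smoothed_cross_entropy(logits, labels, epsilon=0.1):
    """Cross-entropy between the smoothed target distribution and the
    model's predicted distribution."""
    targets = smoothed_targets(labels, logits.shape[-1], epsilon)
    return -(targets * log_softmax(logits)).sum(axis=-1).mean()

# Example usage with a toy batch of 4 examples and 10 classes.
logits = np.random.randn(4, 10)
labels = np.array([2, 7, 0, 3])
print(smoothed_cross_entropy(logits, labels, epsilon=0.1))
```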


