Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction

07/30/2021
by Yun Yue, et al.

We develop a novel framework that adds the regularizers of the sparse group lasso to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, and AdaHessian, creating a new class of optimizers named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad, and Group AdaHessian, respectively. We establish theoretically proven convergence guarantees in the stochastic convex setting, based on primal-dual methods. We evaluate the regularization effect of our new optimizers on three large-scale real-world ad click datasets with state-of-the-art deep learning models. The experimental results reveal that, compared with the original optimizers followed by a post-processing step that uses magnitude pruning, our models perform significantly better at the same sparsity level. Furthermore, compared with the cases without magnitude pruning, our methods achieve extremely high sparsity with significantly better or highly competitive performance.
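To make the idea concrete, the sketch below shows one illustrative "Group Adam"-style step: a standard Adam update applied per parameter group, followed by the proximal (group soft-thresholding) operator of the group lasso penalty, which zeroes out entire groups whose norm falls below a threshold. This is a minimal sketch under assumed hyperparameters and a simplified update rule, not the paper's published algorithm or its primal-dual analysis.

```python
import numpy as np

def group_soft_threshold(w, threshold):
    """Proximal operator of the group lasso penalty: shrink the whole
    group toward zero, and zero it out entirely if its L2 norm is
    below the threshold (this is what induces group-wise sparsity)."""
    norm = np.linalg.norm(w)
    if norm <= threshold:
        return np.zeros_like(w)
    return (1.0 - threshold / norm) * w

def group_adam_step(w_groups, g_groups, m, v, t, lr=1e-3,
                    beta1=0.9, beta2=0.999, eps=1e-8, lam=1e-2):
    """One assumed 'Group Adam'-style update: an Adam step on each
    parameter group, then a group-lasso proximal step.  All names and
    constants here are illustrative assumptions."""
    for i, (w, g) in enumerate(zip(w_groups, g_groups)):
        # standard Adam moment estimates with bias correction
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        # proximal step for the group lasso regularizer
        w_groups[i] = group_soft_threshold(w, lr * lam)
    return w_groups
```

In this simplified view, sparsity comes from the proximal step rather than from post-hoc magnitude pruning, so groups that stay small during training are driven exactly to zero.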


Related research

- sparsegl: An R Package for Estimating Sparse Group Lasso (08/05/2022)
- Hierarchical Adaptive Lasso: Learning Sparse Neural Networks with Shrinkage via Single Stage Training (08/24/2020)
- Error Bounds for Generalized Group Sparsity (08/08/2020)
- Modified Regularized Dual Averaging Method for Training Sparse Convolutional Neural Networks (07/11/2018)
- C-HiLasso: A Collaborative Hierarchical Sparse Modeling Framework (06/07/2010)
- A comprehensive study of spike and slab shrinkage priors for structurally sparse Bayesian neural networks (08/17/2023)
- Sparsely Grouped Input Variables for Neural Networks (11/29/2019)
