Drill the Cork of Information Bottleneck by Inputting the Most Important Data

05/15/2021
by Xinyu Peng et al.

Deep learning has become the most powerful machine learning tool of the last decade. However, how to train deep neural networks efficiently is still not thoroughly solved, and the widely used minibatch stochastic gradient descent (SGD) still needs to be accelerated. As a promising tool for better understanding the learning dynamics of minibatch SGD, the information bottleneck (IB) theory claims that the optimization process consists of an initial fitting phase followed by a compression phase. Based on this principle, we further study typicality sampling, an efficient data selection method, and propose a new explanation of how it accelerates the training of deep networks. We show that, when typicality sampling is appropriately adopted, the fitting phase depicted in the IB theory is boosted by a higher signal-to-noise ratio of the gradient approximation. This finding also implies that prior information about the training set is critical to the optimization process and that better use of the most important data helps information flow through the bottleneck faster. Both theoretical analysis and experimental results on synthetic and real-world datasets support our conclusions.
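To make the idea concrete, below is a minimal sketch (Python/NumPy) of how a typicality-biased minibatch could be formed: each sample is scored by a simple density proxy and drawn with probability proportional to that score, so the minibatch gradient is dominated by "typical" points rather than outliers, in line with the signal-to-noise argument above. The function names typicality_scores and typical_batch, the k-nearest-neighbour density proxy, and all parameter values are illustrative assumptions, not the paper's actual algorithm.

import numpy as np

def typicality_scores(features, k=10):
    # Hypothetical typicality measure: inverse mean distance to the k nearest
    # neighbours in feature space, so samples in dense regions score high.
    # (Illustrative only; the paper's actual criterion may differ.)
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # ignore self-distance
    knn = np.sort(d, axis=1)[:, :k]      # distances to the k nearest neighbours
    return 1.0 / (knn.mean(axis=1) + 1e-12)

def typical_batch(features, batch_size, rng=None):
    # Draw a minibatch biased toward typical samples; compared with uniform
    # sampling, this is meant to raise the signal-to-noise ratio of the
    # minibatch gradient estimate, as the abstract describes.
    if rng is None:
        rng = np.random.default_rng(0)
    p = typicality_scores(features)
    p = p / p.sum()
    return rng.choice(len(features), size=batch_size, replace=False, p=p)

# Example: pick 64 indices from 1,000 samples with 16-dimensional features.
X = np.random.randn(1000, 16)
batch_idx = typical_batch(X, batch_size=64)

In practice the typicality scores would be computed on learned representations (and refreshed periodically) rather than on raw inputs, but the sampling step itself stays the same.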

Related research

12/24/2022  Visualizing Information Bottleneck through Variational Inference
The Information Bottleneck theory provides a theoretical and computation...

07/01/2023  Residual-based attention and connection to information bottleneck theory in PINNs
Driven by the need for more efficient and seamless integration of physic...

11/09/2019  Information Bottleneck Methods on Convolutional Neural Networks
In recent years, many studies have attempted to open the black box of deep neura...

04/12/2018  Asynchronous Parallel Sampling Gradient Boosting Decision Tree
With the development of big data technology, Gradient Boosting Decision ...

04/12/2018  Asynch-SGBDT: Asynchronous Parallel Stochastic Gradient Boosting Decision Tree based on Parameters Server
Gradient Boosting Decision Tree (GBDT) has become one of the most impo...

09/20/2018  Sparsified SGD with Memory
Huge-scale machine learning problems are nowadays tackled by distributed...

10/15/2019  Extracting robust and accurate features via a robust information bottleneck
We propose a novel strategy for extracting features in supervised learni...
