PoF: Post-Training of Feature Extractor for Improving Generalization

07/05/2022
by Ikuro Sato, et al.

The local shape of the loss landscape near a minimum, especially its flatness, has been intensively investigated and is considered to play an important role in the generalization of deep models. We developed a training algorithm called PoF: Post-Training of Feature Extractor, which updates the feature extractor part of an already-trained deep model to search for a flatter minimum. It has two characteristics: 1) the feature extractor is trained under parameter perturbations in the higher-layer parameter space, based on observations that suggest flattening the higher-layer parameter space, and 2) the perturbation range is determined in a data-driven manner, aiming to reduce the part of the test loss caused by positive loss curvature. We provide a theoretical analysis showing that the proposed algorithm implicitly reduces the target Hessian components as well as the loss. Experimental results show that PoF improved model performance over baseline methods on both the CIFAR-10 and CIFAR-100 datasets with only 10 epochs of post-training, and on the SVHN dataset with 50 epochs of post-training. Source code is available at: <https://github.com/DensoITLab/PoF-v1>
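The idea of training the feature extractor under perturbations of the higher-layer parameters can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that); it only shows the general mechanism on a scalar model under stated assumptions. The model `y_hat = a * (w * x)`, the function names, the fixed `radius` (the abstract describes a data-driven range, which is not reproduced here), and the use of symmetric perturbation pairs `a + eps` and `a - eps` are all hypothetical choices made for illustration.

```python
import random


def loss(w, a, data):
    """Mean squared error of the toy model y_hat = a * (w * x)."""
    return sum((a * w * x - y) ** 2 for x, y in data) / len(data)


def grad_w(w, a, data):
    """Gradient of the loss with respect to the feature-extractor weight w."""
    return sum(2 * (a * w * x - y) * a * x for x, y in data) / len(data)


def head_curvature(w, data):
    """Second derivative of the loss with respect to the head parameter a."""
    return sum(2 * (w * x) ** 2 for x, _ in data) / len(data)


def post_train_extractor(w, a, data, radius=1.0, lr=0.01, steps=200, seed=0):
    """Update only w while the head a is frozen but perturbed.

    Assumption: the perturbation range `radius` is a fixed constant here,
    and each step averages the gradient over a symmetric pair a +/- eps.
    """
    rng = random.Random(seed)
    for _ in range(steps):
        eps = rng.uniform(-radius, radius)
        g = 0.5 * (grad_w(w, a + eps, data) + grad_w(w, a - eps, data))
        w -= lr * g
    return w


# Toy data generated by y = 2x; (w0, a) = (0.5, 4.0) fits it exactly.
data = [(x, 2.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
w0, a = 0.5, 4.0
w1 = post_train_extractor(w0, a, data)

print("head curvature before:", head_curvature(w0, data))
print("head curvature after: ", head_curvature(w1, data))
```

In this toy, the curvature of the loss along the head parameter `a` shrinks after post-training, which mirrors the flattening effect the abstract describes. Because the starting point is an exact minimum, the unperturbed loss rises slightly here; the sketch makes no claim about the behaviour of the actual algorithm on deep models.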


