Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training

by Dominic Masters, et al.

Much recent research has been dedicated to improving the efficiency of training and inference for image classification. This effort has commonly focused on explicitly improving theoretical efficiency, often measured as ImageNet validation accuracy per FLOP. These theoretical savings have, however, proven challenging to achieve in practice, particularly on high-performance training accelerators. In this work, we focus on improving the practical efficiency of the state-of-the-art EfficientNet models on a new class of accelerator, the Graphcore IPU. We do this by extending this family of models in the following ways: (i) generalising depthwise convolutions to group convolutions; (ii) adding proxy-normalized activations to match batch normalization performance with batch-independent statistics; (iii) reducing compute by lowering the training resolution and inexpensively fine-tuning at higher resolution. We find that these three methods improve the practical efficiency for both training and inference. Our code will be made available online.
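
To make points (i) and (iii) concrete, the sketch below counts weights and multiply-accumulates for a convolution as the group size changes from 1 (depthwise) to the full channel count (dense), and shows how compute scales with spatial resolution. The helper names `conv_params` and `conv_macs`, the channel count of 64, the 3x3 kernel, the group size of 16 and the feature-map sizes are illustrative assumptions, not values taken from the paper.

```python
def conv_params(c_in, c_out, kernel, groups):
    """Weight count of a 2D convolution split into `groups` groups (no bias).

    Each output channel only sees c_in // groups input channels.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return kernel * kernel * (c_in // groups) * c_out


def conv_macs(c_in, c_out, kernel, groups, out_h, out_w):
    """Multiply-accumulates: one pass over the weights per output position."""
    return conv_params(c_in, c_out, kernel, groups) * out_h * out_w


channels = 64  # hypothetical layer width

# Depthwise convolution: one group per channel, i.e. group size 1.
depthwise = conv_params(channels, channels, 3, groups=channels)
# Group convolution with group size 16 (hypothetical choice): 4 groups.
group16 = conv_params(channels, channels, 3, groups=channels // 16)
# Dense convolution: a single group spanning all channels.
dense = conv_params(channels, channels, 3, groups=1)

print(depthwise, group16, dense)  # 576 9216 36864

# Compute scales with the number of output positions, so a smaller
# training resolution reduces cost quadratically in the side length.
full_res = conv_macs(channels, channels, 3, channels, 56, 56)
low_res = conv_macs(channels, channels, 3, channels, 40, 40)
ratio = low_res / full_res
```

A depthwise convolution is the special case where the group size is 1. Larger group sizes add weights and compute per layer, which is the trade-off the abstract refers to when it generalises depthwise convolutions to group convolutions.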



Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence

We investigate the reasons for the performance degradation incurred with...

High-Performance Large-Scale Image Recognition Without Normalization

Batch normalization is a key component of most image classification mode...

Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models

Batch Normalization is quite effective at accelerating and improving the...

Group Whitening: Balancing Learning Efficiency and Representational Capacity

Batch normalization (BN) is an important technique commonly incorporated...

Group Normalization

Batch Normalization (BN) is a milestone technique in the development of ...

Stochastic Normalizations as Bayesian Learning

In this work we investigate the reasons why Batch Normalization (BN) imp...

Dense Prediction on Sequences with Time-Dilated Convolutions for Speech Recognition

In computer vision pixelwise dense prediction is the task of predicting ...