SplitNet: Divide and Co-training

by Shuai Zhao, et al.

The width of a neural network matters, since increasing the width necessarily increases the model capacity. However, the performance of a network does not improve linearly with its width and soon saturates. To tackle this problem, we propose increasing the number of networks rather than purely scaling up the width. To demonstrate this, we divide one large network into several small ones, each with a fraction of the original network's parameters. We then train these small networks together and show them different views of the same data so that they learn different and complementary knowledge. During this co-training process, the networks can also learn from each other. As a result, the ensemble of small networks can outperform the large network with few or no extra parameters or FLOPs. This reveals that the number of networks is a new dimension of effective model scaling, besides depth, width, and resolution. The small networks can also achieve faster inference than the large one by running concurrently on different devices. We validate this idea with different network architectures on common benchmarks through extensive experiments. The code is available at <https://github.com/mzhaoshuai/SplitNet-Divide-and-Co-training>.
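
The parameter budget behind the division step can be illustrated with a rough sketch. It is not the authors' implementation. It assumes that a convolution's weight count grows with the product of its input and output channels, so shrinking the width by a factor of sqrt(S) leaves each of S small networks with about 1/S of the parameters. The helper names (`conv_params`, `split_width`, `ensemble_average`) and the sqrt(S) rule are hypothetical choices made for illustration; the abstract does not specify how the split is done.

```python
import math


def conv_params(c_in, c_out, k=3):
    """Weight count of a k x k convolution without bias."""
    return k * k * c_in * c_out


def split_width(width, num_networks):
    """Assumed split rule (not taken from the abstract): divide the width
    by sqrt(S) so each small network keeps roughly 1/S of the parameters."""
    return max(1, round(width / math.sqrt(num_networks)))


def ensemble_average(predictions):
    """Average per-class probabilities produced by several networks."""
    n = len(predictions)
    num_classes = len(predictions[0])
    return [sum(p[i] for p in predictions) / n for i in range(num_classes)]


# One wide layer versus S narrower layers under the same parameter budget.
large_width = 64
S = 4
small_width = split_width(large_width, S)
large_total = conv_params(large_width, large_width)
small_total = S * conv_params(small_width, small_width)

# Ensemble prediction from two hypothetical small networks.
averaged = ensemble_average([[0.25, 0.75], [0.75, 0.25]])
```

With these illustrative numbers, four layers of width 32 hold the same number of weights as one layer of width 64, which is the sense in which the small networks add few or no extra parameters. Plain averaging is only one possible way to combine the networks' outputs.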




EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Convolutional Neural Networks (ConvNets) are commonly developed at a fix...

Revisiting Residual Networks for Adversarial Robustness: An Architectural Perspective

Efforts to improve the adversarial robustness of convolutional neural ne...

Universally Slimmable Networks and Improved Training Techniques

Slimmable networks are a family of neural networks that can instantly ad...

Fast and Accurate Model Scaling

In this work we analyze strategies for convolutional neural network scal...

Truncating Wide Networks using Binary Tree Architectures

Recent study shows that a wide deep network can obtain accuracy comparab...

Scaled-YOLOv4: Scaling Cross Stage Partial Network

We show that the YOLOv4 object detection neural network based on the CSP...
