Training wide residual networks for deployment using a single bit for each weight

by Mark D. McDonnell, et al.

For fast and energy-efficient deployment of trained deep neural networks on resource-constrained embedded hardware, each learned weight parameter should ideally be represented and stored using a single bit. Error rates usually increase when this requirement is imposed. Here, we report large improvements in error rates on multiple datasets for deep convolutional neural networks deployed with 1-bit-per-weight. Using wide residual networks as our main baseline, our approach simplifies existing methods that binarize weights by applying the sign function in training; we apply scaling factors for each layer with constant unlearned values equal to the layer-specific standard deviations used for initialization. For CIFAR-10, CIFAR-100 and ImageNet, and models with 1-bit-per-weight requiring less than 10 MB of parameter memory, we achieve error rates of 3.9% [...]. We also considered MNIST, SVHN and ImageNet32, achieving 1-bit-per-weight test results of 0.27% [...]. [...] error rates halve previously reported values, and are within about 1% [...] error rates for the same network with full-precision weights. For networks that overfit, we also show significant improvements in error rate by not learning batch normalization scale and offset parameters. This applies to both full-precision and 1-bit-per-weight networks. Using a warm-restart learning-rate schedule, we found that training for 1-bit-per-weight is just as fast as for full-precision networks, with better accuracy than standard schedules, and achieved about 98% [...] CIFAR-10/100. Full training code and trained models are available in MATLAB, Keras and PyTorch.
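As a rough illustration of the weight representation described in the abstract, the sketch below applies the sign function to a layer's full-precision weights and multiplies the result by a constant, unlearned per-layer scale. It is not the authors' code: the function names `he_std` and `binarize_layer` are hypothetical, and the choice of He-normal initialization (standard deviation sqrt(2 / fan_in)) is an assumption, since the abstract only says the scale equals the layer-specific standard deviation used for initialization.

```python
import math
import random


def he_std(fan_in: int) -> float:
    """Standard deviation of He-normal initialization (assumed here): sqrt(2 / fan_in)."""
    return math.sqrt(2.0 / fan_in)


def binarize_layer(weights, fan_in):
    """Return sign(w) times a constant per-layer scale, plus the scale itself.

    The scale is fixed at the layer's initialization standard deviation
    and is never learned.
    """
    scale = he_std(fan_in)
    # Map zero to +1 so that every weight takes one of exactly two values (one bit).
    signs = [1.0 if w >= 0.0 else -1.0 for w in weights]
    return [scale * s for s in signs], scale


# Toy layer: a 3x3 convolution with 16 input channels, so fan_in = 3 * 3 * 16.
fan_in = 3 * 3 * 16
rng = random.Random(0)
# Full-precision weights, as they might look right after initialization.
latent = [rng.gauss(0.0, he_std(fan_in)) for _ in range(8)]
deployed, scale = binarize_layer(latent, fan_in)
print(scale, deployed)
```

Because the scale is a single constant per layer, only the signs need to be stored for deployment, which is what makes the one-bit-per-weight memory budget possible.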



S^3: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks

Shift neural networks reduce computation complexity by removing expensiv...

Efficient Stochastic Inference of Bitwise Deep Neural Networks

Recently published methods enable training of bitwise neural networks wh...

Towards Explainable Bit Error Tolerance of Resistive RAM-Based Binarized Neural Networks

Non-volatile memory, such as resistive RAM (RRAM), is an emerging energy...

Binary Input Layer: Training of CNN models with binary input data

For the efficient execution of deep convolutional neural networks (CNN) ...

Magnetoresistive RAM for error resilient XNOR-Nets

We trained three Binarized Convolutional Neural Network architectures (L...

Single-bit-per-weight deep convolutional neural networks without batch-normalization layers for embedded systems

Batch-normalization (BN) layers are thought to be an integrally importan...

SiMaN: Sign-to-Magnitude Network Binarization

Binary neural networks (BNNs) have attracted broad research interest due...
