No Multiplication? No Floating Point? No Problem! Training Networks for Efficient Inference

09/24/2018
by Shumeet Baluja, et al.

For successful deployment of deep neural networks on highly resource-constrained devices (hearing aids, earbuds, wearables), we must simplify the types of operations and the memory/power resources used during inference. Completely avoiding inference-time floating-point operations is one of the simplest ways to design networks for these highly constrained environments. By discretizing both our in-network non-linearities and our network weights, we can move to simple, compact networks without floating-point operations, without multiplications, and without any non-linear function computations. Our approach allows us to explore the spectrum of possible networks, ranging from fully continuous versions down to networks with bi-level weights and activations. Our results show that discretization can be done without loss of performance, and that we can train a network that operates successfully without floating point, without multiplication, and with less RAM, on both regression tasks (autoencoding) and multi-class classification tasks (ImageNet). The memory needed to deploy our discretized networks is less than a third of that required by the equivalent architecture that uses floating-point operations.
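To make the inference-time arithmetic concrete, below is a minimal NumPy sketch of a single layer with bi-level weights in {-1, +1} and a discretized (step) non-linearity. It is an illustration of the kind of multiplication-free, float-free computation described above, not the authors' training procedure; the names binarize, step_activation, and layer_forward, and the particular thresholds, are hypothetical.

```python
import numpy as np

def binarize(w):
    """Map real-valued training weights to {-1, +1} for inference (hypothetical helper)."""
    return np.where(w >= 0, 1, -1).astype(np.int32)

def step_activation(z, thresholds):
    """Discretized non-linearity: count how many thresholds z exceeds.
    Replaces a smooth activation with integer comparisons only."""
    return np.searchsorted(thresholds, z).astype(np.int32)

def layer_forward(x_int, w_bin, thresholds):
    """Multiplication-free forward pass for one layer.
    x_int: integer activations; w_bin: {-1, +1} weights.
    With bi-level weights, each term of the dot product is either
    +x_i or -x_i, so the layer reduces to additions and subtractions."""
    z = np.where(w_bin > 0, x_int[:, None], -x_int[:, None]).sum(axis=0)
    return step_activation(z, thresholds)

# Toy usage (assumed shapes and values, for illustration only)
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4))           # real-valued weights from training
x = rng.integers(0, 4, size=8)            # discretized input activations
y = layer_forward(x, binarize(w), thresholds=np.array([-2, 0, 2]))
print(y)  # small integers; no floats and no multiplications at inference time
```

The design point the sketch highlights is that once weights are restricted to two levels and activations to a small set of integers, the multiply-accumulate of a standard layer collapses to signed integer accumulation followed by a threshold lookup.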
