Backprop with Approximate Activations for Memory-efficient Network Training
Larger and deeper neural network architectures deliver improved accuracy on a variety of tasks, but also require a large amount of memory for training to store intermediate activations for back-propagation. We introduce an approximation strategy to significantly reduce this memory footprint, with minimal effect on training performance and negligible computational cost. Our method replaces intermediate activations with lower-precision approximations to free up memory, after the full-precision versions have been used for computation in subsequent layers in the forward pass. Only these approximate activations are retained for use in the backward pass. Compared to naive low-precision computation, our approach limits the accumulation of errors across layers and allows the use of much lower-precision approximations without affecting training accuracy. Experiments on CIFAR and ImageNet show that our method yields performance comparable to full-precision training, while storing activations at a fraction of the memory cost with 8- and even 4-bit fixed-point precision.
READ FULL TEXT