Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More

03/06/2021
by Shabnam Daghaghi, et al.

Deep learning implementations on CPUs (Central Processing Units) are gaining traction. Enhanced AI capabilities on commodity x86 architectures are commercially appealing because they reuse existing hardware and are easy to virtualize. A notable work in this direction is the SLIDE system. SLIDE is a C++ implementation of sparse, hash-table-based back-propagation, which was shown to be significantly faster than GPUs in training neural models with hundreds of millions of parameters. In this paper, we argue that SLIDE's current implementation is sub-optimal and does not exploit several opportunities available in modern CPUs. In particular, we show how SLIDE's computations allow for a unique possibility of vectorization via AVX-512 (Advanced Vector Extensions 512). Furthermore, we highlight opportunities for different kinds of memory optimization and quantization. Combining all of them, we obtain up to a 7x speedup in the computations on the same hardware. Our experiments focus on large recommendation and NLP models with hundreds of millions of parameters. Our work highlights several novel perspectives and opportunities for implementing randomized algorithms for deep learning on modern CPUs. We provide the code and benchmark scripts at https://github.com/RUSH-LAB/SLIDE
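
To make the idea of hash-table-based sparsity concrete, the following is a minimal C++ sketch, not SLIDE's actual code. It assumes a toy sign-bit hash and hypothetical names (`SparseLayer`, `sign_hash`, `make_layer`, `forward_sparse`) that do not come from the paper or its repository. The layer hashes each neuron's weight row into a bucket once, then at inference time computes activations only for neurons that share the input's bucket. Each weight row is stored contiguously, so the inner dot product walks sequential memory, which is the access pattern that SIMD instruction sets such as AVX-512 are designed for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical layer layout: one contiguous weight row per neuron (row-major).
struct SparseLayer {
    std::size_t in_dim = 0;
    std::size_t num_neurons = 0;
    std::size_t bits = 0;                 // number of hash bits
    std::vector<float> weights;           // num_neurons * in_dim floats
    // bucket code -> ids of neurons whose weight row hashed to that code
    std::unordered_map<std::uint32_t, std::vector<std::size_t>> table;
};

// Toy hash (an assumption for illustration): one bit per leading coordinate,
// set when that coordinate is positive.
std::uint32_t sign_hash(const float* v, std::size_t dim, std::size_t bits) {
    assert(bits < 32 && bits <= dim);
    std::uint32_t code = 0;
    for (std::size_t b = 0; b < bits; ++b) {
        code |= (v[b] > 0.0f ? 1u : 0u) << b;
    }
    return code;
}

// Dense dot product over contiguous memory. Whether a compiler vectorizes
// this loop depends on its flags; hand-written intrinsics are another option.
float dot(const float* a, const float* b, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

// Build the layer and index every neuron by the hash of its weight row.
SparseLayer make_layer(std::size_t in_dim, std::size_t num_neurons,
                       std::vector<float> weights, std::size_t bits) {
    assert(weights.size() == in_dim * num_neurons);
    SparseLayer layer;
    layer.in_dim = in_dim;
    layer.num_neurons = num_neurons;
    layer.bits = bits;
    layer.weights = std::move(weights);
    for (std::size_t n = 0; n < num_neurons; ++n) {
        const float* row = layer.weights.data() + n * in_dim;
        layer.table[sign_hash(row, in_dim, bits)].push_back(n);
    }
    return layer;
}

// Compute activations only for neurons in the input's bucket.
// Returns (neuron id, pre-activation value) pairs in insertion order.
std::vector<std::pair<std::size_t, float>>
forward_sparse(const SparseLayer& layer, const std::vector<float>& x) {
    assert(x.size() == layer.in_dim);
    std::vector<std::pair<std::size_t, float>> out;
    auto it = layer.table.find(sign_hash(x.data(), layer.in_dim, layer.bits));
    if (it == layer.table.end()) {
        return out;  // no neuron shares the input's bucket
    }
    for (std::size_t n : it->second) {
        const float* row = layer.weights.data() + n * layer.in_dim;
        out.emplace_back(n, dot(row, x.data(), layer.in_dim));
    }
    return out;
}
```

In this sketch the cost of a forward pass scales with the number of neurons in the matched bucket rather than with the full layer width. A real system would use a proper locality-sensitive hash family and multiple tables; the single-table sign hash here is only a stand-in.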

research
03/07/2019

SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems

Deep Learning (DL) algorithms are the central focus of modern machine le...
research
02/25/2022

PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine

Machine learning algorithms must be able to efficiently cope with massiv...
research
09/16/2020

WarpCore: A Library for fast Hash Tables on GPUs

Hash tables are ubiquitous. Properties such as an amortized constant tim...
research
08/11/2020

Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations

Accelerating deep model training and inference is crucial in practice. E...
research
07/02/2020

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

Machine learning (ML) models are widely used in many domains including m...
research
10/08/2022

Bottleneck Analysis of Dynamic Graph Neural Network Inference on CPU and GPU

Dynamic graph neural network (DGNN) is becoming increasingly popular bec...
research
05/28/2018

NengoDL: Combining deep learning and neuromorphic modelling methods

NengoDL is a software framework designed to combine the strengths of neu...
