BOLT: An Automated Deep Learning Framework for Training and Deploying Large-Scale Neural Networks on Commodity CPU Hardware

03/30/2023 · by Nicholas Meisburger, et al.

Efficient large-scale neural network training and inference on commodity CPU hardware is of immense practical significance in democratizing deep learning (DL) capabilities. Presently, training massive models consisting of hundreds of millions to billions of parameters requires extensive use of specialized hardware accelerators, such as GPUs, which are accessible only to a limited number of institutions with considerable financial resources. Moreover, there is often an alarming carbon footprint associated with training and deploying these models. In this paper, we address these challenges by introducing BOLT, a sparse deep learning library for training massive neural network models on standard CPU hardware. BOLT provides a flexible, high-level API for constructing models that will be familiar to users of existing popular DL frameworks. By automatically tuning specialized hyperparameters, BOLT also abstracts away the algorithmic details of sparse network training. We evaluate BOLT on a number of machine learning tasks drawn from recommendations, search, natural language processing, and personalization. We find that BOLT achieves performance competitive with state-of-the-art techniques at a fraction of the cost and energy consumption, with an order-of-magnitude faster inference. BOLT has also been successfully deployed by multiple businesses to address critical problems, and we highlight one customer deployment case study in the field of e-commerce.
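The abstract notes that BOLT abstracts away the algorithmic details of sparse network training. As background, the core idea in the SLIDE line of work that BOLT builds on is to use locality-sensitive hashing to select a small "active set" of neurons for each input, rather than computing every neuron in a wide layer. The sketch below is a minimal NumPy illustration of that idea using signed random projections (SimHash); all names here (`SparseLayer`, `num_hash_bits`, `fallback_size`) are hypothetical and do not reflect BOLT's actual API.

```python
# Minimal sketch of LSH-based sparse neuron selection (SLIDE-style).
# Illustrative only; not BOLT's real API or implementation.
import numpy as np

class SparseLayer:
    """One fully connected layer with LSH-based sparse neuron selection."""

    def __init__(self, in_dim, out_dim, num_hash_bits=8, fallback_size=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(out_dim, in_dim))
        self.b = np.zeros(out_dim)
        self.fallback_size = fallback_size
        # Random hyperplanes shared between inputs and neuron weight vectors.
        self.planes = rng.normal(size=(num_hash_bits, in_dim))
        self._rebuild_table()

    def _hash(self, v):
        # SimHash: the sign pattern of v projected onto the random
        # hyperplanes, packed into an integer bucket id.
        bits = (self.planes @ v) > 0
        return int(bits @ (1 << np.arange(bits.size)))

    def _rebuild_table(self):
        # Bucket every neuron by the hash of its weight vector. In a real
        # system this table is rebuilt periodically as the weights drift.
        self.table = {}
        for j, w in enumerate(self.W):
            self.table.setdefault(self._hash(w), []).append(j)

    def forward(self, x):
        # Retrieve only the neurons whose weight vectors collide with the
        # input, then compute activations for that small active set instead
        # of all out_dim neurons.
        active = self.table.get(self._hash(x), [])
        if len(active) < self.fallback_size:  # top up with random neurons
            rng = np.random.default_rng()
            extra = rng.choice(len(self.W), size=self.fallback_size, replace=False)
            active = sorted(set(active) | {int(i) for i in extra})
        z = self.W[active] @ x + self.b[active]
        return active, np.maximum(z, 0.0)  # ReLU over the active set only

layer = SparseLayer(in_dim=128, out_dim=10_000)
x = np.random.default_rng(1).normal(size=128)
active, acts = layer.forward(x)
print(f"computed {len(active)} of 10,000 neurons")
```

Because only the active set is touched in the forward pass (and, symmetrically, in the gradient update), per-example cost scales with the number of retrieved neurons rather than the full layer width, which is what makes very wide output layers tractable on CPUs.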

