Parallel training of linear models without compromising convergence

11/05/2018
by Nikolas Ioannou, et al.

In this paper we analyze, evaluate, and improve the performance of training generalized linear models on modern CPUs. We start with a state-of-the-art asynchronous parallel training algorithm, identify system-level performance bottlenecks, and apply optimizations that improve data parallelism, cache-line locality, and cache-line prefetching. These modifications reduce the per-epoch run-time significantly, but take a toll on convergence in terms of the number of epochs required. To alleviate this shortcoming of our systems-optimized version, we propose a novel dynamic data partitioning scheme across threads that allows us to approach the convergence of the sequential version. The combined set of optimizations results in a consistent bottom-line speedup in convergence of up to 12× compared to the initial asynchronous parallel training algorithm, and up to 42× compared to state-of-the-art implementations (scikit-learn and h2o), on a range of multi-core CPU architectures.
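To make the setting concrete, below is a minimal sketch of the kind of training loop the abstract refers to: Hogwild-style asynchronous SGD for logistic regression, with the training data statically sharded across threads. This is not the authors' implementation; the logistic-regression objective, hyper-parameters, synthetic data, and constants are illustrative assumptions. The paper's dynamic partitioning scheme would re-assign the per-thread shards between epochs rather than fixing them up front.

```cpp
// Minimal sketch (not the paper's code): Hogwild-style asynchronous SGD for
// logistic regression, with each thread sweeping its own shard of the data.
// Compile with: g++ -O2 -std=c++17 -pthread async_sgd_sketch.cpp
#include <cmath>
#include <cstdio>
#include <functional>
#include <random>
#include <thread>
#include <vector>

constexpr int kFeatures = 16;       // illustrative sizes and hyper-parameters
constexpr int kExamples = 4096;
constexpr int kThreads  = 4;
constexpr int kEpochs   = 10;
constexpr double kLearningRate = 0.05;

struct Example {
    double x[kFeatures];
    double y;  // label in {0, 1}
};

// Shared model, updated without locks (unsynchronized, Hogwild-style updates;
// production code would typically use relaxed atomics here).
static double w[kFeatures];

static double dot(const double* a, const double* b) {
    double s = 0.0;
    for (int j = 0; j < kFeatures; ++j) s += a[j] * b[j];
    return s;
}

// Each thread sweeps its contiguous shard [begin, end). A dynamic partitioning
// scheme, as described in the abstract, would re-assign these ranges across
// epochs instead of keeping them static.
static void worker(const std::vector<Example>& data, size_t begin, size_t end) {
    for (int epoch = 0; epoch < kEpochs; ++epoch) {
        for (size_t i = begin; i < end; ++i) {
            const Example& ex = data[i];
            double p = 1.0 / (1.0 + std::exp(-dot(w, ex.x)));  // sigmoid
            double g = p - ex.y;                               // gradient scale
            for (int j = 0; j < kFeatures; ++j)
                w[j] -= kLearningRate * g * ex.x[j];           // lock-free update
        }
    }
}

int main() {
    // Synthetic data: labels from a random "true" weight vector plus noise.
    std::mt19937 rng(42);
    std::normal_distribution<double> gauss(0.0, 1.0);
    double w_true[kFeatures];
    for (int j = 0; j < kFeatures; ++j) w_true[j] = gauss(rng);

    std::vector<Example> data(kExamples);
    for (auto& ex : data) {
        for (int j = 0; j < kFeatures; ++j) ex.x[j] = gauss(rng);
        ex.y = (dot(w_true, ex.x) + 0.1 * gauss(rng)) > 0.0 ? 1.0 : 0.0;
    }

    // Static partitioning: each thread gets one contiguous block, which also
    // favors cache-line locality and hardware prefetching.
    std::vector<std::thread> threads;
    size_t shard = kExamples / kThreads;
    for (int t = 0; t < kThreads; ++t) {
        size_t begin = t * shard;
        size_t end = (t == kThreads - 1) ? kExamples : begin + shard;
        threads.emplace_back(worker, std::cref(data), begin, end);
    }
    for (auto& th : threads) th.join();

    std::printf("first weights: %.3f %.3f %.3f\n", w[0], w[1], w[2]);
    return 0;
}
```

The contiguous per-thread shards and lock-free updates are what make the per-epoch time drop on multi-core CPUs; the same unsynchronized updates are also why the number of epochs to convergence can grow, which is the gap the paper's dynamic data partitioning scheme aims to close.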


Related research

SySCD: A System-Aware Parallel Coordinate Descent Algorithm (11/18/2019)
ASAGA: Asynchronous Parallel SAGA (06/15/2016)
Stochastic Gradient Descent on Highly-Parallel Architectures (02/24/2018)
Breadth-first, Depth-next Training of Random Forests (10/15/2019)
ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines (10/22/2015)
Accelerating Barnes-Hut t-SNE Algorithm by Efficient Parallelization on Multi-Core CPUs (12/22/2022)
MixML: A Unified Analysis of Weakly Consistent Parallel Learning (05/14/2020)
