
CoSA: Scheduling by Constrained Optimization for Spatial Accelerators
Recent advances in Deep Neural Networks (DNNs) have led to active develo...

Avoiding Communication in Logistic Regression
Stochastic gradient descent (SGD) is one of the most widely used optimiz...

Training EfficientNets at Supercomputer Scale: 83% Accuracy in One Hour
EfficientNets are a family of state-of-the-art image classification mode...

The Limit of the Batch Size
Large-batch training is an efficient approach for current distributed de...

Communication-Optimal Tilings for Projective Nested Loops with Arbitrary Bounds
Reducing communication, either between levels of a memory hierarchy or ...

Auto-Precision Scaling for Distributed Deep Learning
In recent years, large-batch optimization is becoming the key of distrib...

An improved analysis and unified perspective on deterministic and randomized low rank matrix approximations
We introduce a Generalized LU-Factorization (GLU) for low-rank matrix ap...

A Generalized Randomized Rank-Revealing Factorization
We introduce a Generalized Randomized QR-decomposition that may be appli...

Reducing BERT Pre-Training Time from 3 Days to 76 Minutes
Large-batch training is key to speeding up deep neural network training ...

Large-Batch Training for LSTM and Beyond
Large-batch training approaches have enabled researchers to utilize larg...

A 3D Parallel Algorithm for QR Decomposition
Interprocessor communication often dominates the runtime of large matrix...

Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems
We propose two new methods to address the weak scaling problems of KRR: ...

Communication-Optimal Convolutional Neural Nets
Efficiently executing convolutional neural nets (CNNs) is important in m...

Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization
Parallel computing has played an important role in speeding up convex op...

Avoiding Communication in Proximal Methods for Convex Optimization Problems
The fast iterative soft thresholding algorithm (FISTA) is used to solve ...

ImageNet Training in Minutes
Finishing 90-epoch ImageNet-1k training with ResNet-50 on an NVIDIA M40 G...

Communication Lower Bounds of Bilinear Algorithms for Symmetric Tensor Contractions
Accurate numerical calculations of electronic structure are often domina...
James Demmel
EECS Department Chair and Professor of Mathematics and Computer Science at the University of California, Berkeley