
Sparse sketches with small inversion bias
For a tall n × d matrix A and a random m × n sketching matrix S, the sketc...

HAWQ-V3: Dyadic Neural Network Quantization
Quantization is one of the key techniques used to make Neural Networks (...

A Statistical Framework for Low-bitwidth Training of Deep Neural Networks
Fully quantized training (FQT), which uses low-bitwidth hardware by quan...

Fast Distributed Training of Deep Neural Networks: Dynamic Communication Thresholding for Model and Data Parallelism
Data Parallelism (DP) and Model Parallelism (MP) are two common paradigm...

MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
Phrase localization is a task that studies the mapping from textual phra...

Sparse Quantized Spectral Clustering
Given a large data matrix, sparsifying, quantizing, and/or performing ot...

Benchmarking Semi-supervised Federated Learning
Federated learning promises to use the computational power of edge devic...

Boundary thickness and robustness in learning models
Robustness of machine learning models to various adversarial and non-adv...

Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization
In distributed second order optimization, a standard strategy is to aver...

Good linear classifiers are abundant in the interpolating regime
Within the machine learning community, the widely-used uniform convergen...

Precise expressions for random projections: Low-rank approximation and randomized Newton
It is often desirable to reduce the dimensionality of a large dataset by...

Multiplicative noise and heavy tails in stochastic optimization
Although stochastic optimization is central to modern machine learning, ...

A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent
This article characterizes the exact asymptotics of random Fourier featu...

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
We introduce AdaHessian, a second order stochastic optimization algorith...

Determinantal Point Processes in Randomized Numerical Linear Algebra
Randomized Numerical Linear Algebra (RandNLA) uses randomness to develop...

Error Estimation for Sketched SVD via the Bootstrap
In order to compute fast approximations to the singular value decomposit...

Forecasting Sequential Data using Consistent Koopman Autoencoders
Recurrent neural networks are widely used on time series data, yet such ...

Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms
The statistical analysis of Randomized Numerical Linear Algebra (RandNLA...

Stochastic Normalizing Flows
We introduce stochastic normalizing flows, an extension of continuous no...

Improved guarantees and a multiple-descent curve for the Column Subset Selection Problem and the Nyström method
The Column Subset Selection Problem (CSSP) and the Nyström method are am...

Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data
In many applications, one works with deep neural network (DNN) models tr...

ZeroQ: A Novel Zero Shot Quantization Framework
Quantization is a promising approach for reducing the inference time and...

Exact expressions for double descent and implicit regularization via surrogate random design
Double descent refers to the phase transition that is exhibited by the g...

LSAR: Efficient Leverage Score Sampling Algorithm for the Analysis of Big Time Series Data
We apply methods from randomized numerical linear algebra (RandNLA) to d...

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
Quantization is an effective method for reducing memory footprint and in...

Running Alchemist on Cray XC and CS Series Supercomputers: Dask and PySpark Interfaces, Deployment Options, and Data Transfer Times
Newly developed interfaces for Python, Dask, and PySpark enable the use ...

Limit theorems for out-of-sample extensions of the adjacency and Laplacian spectral embeddings
Graph embeddings, a class of dimensionality reduction techniques designe...

Bootstrapping the Operator Norm in High Dimensions: Error Estimation for Covariance Matrices and Sketching
Although the operator (spectral) norm is one of the most widely used met...

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Transformer based architectures have become de-facto models used for a r...

The Difficulties of Addressing Interdisciplinary Challenges at the Foundations of Data Science
The National Science Foundation's Transdisciplinary Research in Principl...

On Linear Convergence of Weighted Kernel Herding
We provide a novel convergence analysis of two popular sampling algorith...

Statistical guarantees for local graph clustering
Local graph clustering methods aim to find small clusters in very large ...

Bayesian experimental design using regularized determinantal point processes
In experimental design, we are given n vectors in d dimensions, and our ...

Residual Networks as Nonlinear Systems: Stability Analysis using Linearization
We regard pre-trained residual networks (ResNets) as nonlinear systems a...

Distributed estimation of the inverse Hessian by determinantal averaging
In distributed optimization and distributed numerical linear algebra, we...

Physics-informed Autoencoders for Lyapunov-stable Fluid Flow Prediction
In addition to providing high-profile successes in computer vision and n...

JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks
It has been demonstrated that very simple attacks can fool highly-sophis...

OverSketched Newton: Fast Convex Optimization for Serverless Systems
Motivated by recent developments in serverless systems for large-scale m...

Inefficiency of K-FAC for Large Batch Size Training
In stochastic optimization, large batch training can leverage parallel r...

Shallow Learning for Fluid Flow Reconstruction with Limited Sensors and Limited Data
In many applications, it is important to reconstruct a fluid flow field,...

Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression
In experimental design, we are given a large collection of vectors, each...

Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks
Given two or more Deep Neural Networks (DNNs) with the same or similar a...

Traditional and Heavy-Tailed Self Regularization in Neural Network Models
Random Matrix Theory (RMT) is applied to analyze the weight matrices of ...

On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
Increasing the minibatch size for stochastic gradient descent offers si...

Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
Random Matrix Theory (RMT) is applied to analyze weight matrices of Deep...

Newton-MR: Newton's Method Without Smoothness or Convexity
Establishing global convergence of the classical Newton's method has lon...

Distributed Second-order Convex Optimization
Convex optimization problems arise frequently in diverse machine learnin...

Alchemist: An Apache Spark <=> MPI Interface
The Apache Spark framework for distributed computation is popular in the...

Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist
Apache Spark is a popular system aimed at the analysis of large data set...

Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap
Over the course of the past decade, a variety of randomized algorithms h...
Michael W. Mahoney
Associate Professor at UC Berkeley, Adviser and Data Scientist at Vieu Labs, Research Scientist at Stanford University