
Learning from learning machines: a new generation of AI technology to meet the needs of science
We outline emerging opportunities and challenges to enhance the utility ...

Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information
We present a novel adaptive optimization algorithm for large-scale machi...

What's Hidden in a One-layer Randomly Weighted Transformer?
We demonstrate that, hidden within one-layer randomly weighted neural ne...

Characterizing possible failure modes in physics-informed neural networks
Recent work in scientific machine learning has developed so-called physi...

Generalization Properties of Stochastic Optimizers via Trajectory Analysis
Despite the ubiquitous use of stochastic optimization algorithms in mach...

Taxonomizing local versus global structure in neural network loss landscapes
Viewing neural network models in terms of their loss landscapes has a lo...

Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update
In second-order optimization, a potential bottleneck can be computing th...

Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics
To understand better the causes of good generalization performance in st...

MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models
Pruning is an effective method to reduce the memory footprint and comput...

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
The increasing size of neural network models has been critical for impro...

Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition
End-to-end neural network models achieve improved performance on various...

A Survey of Quantization Methods for Efficient Neural Network Inference
As soon as abstract mathematical computations were adapted to computatio...

Hessian Eigenspectra of More Realistic Nonlinear Models
Given an optimization problem, the Hessian matrix and its eigenspectrum ...

Hessian-Aware Pruning and Optimal Neural Implant
Pruning is an effective method to reduce the memory footprint and FLOPs ...

I-BERT: Integer-only BERT Quantization
Transformer based models, like BERT and RoBERTa, have achieved state-of...

Sparse sketches with small inversion bias
For a tall n × d matrix A and a random m × n sketching matrix S, the sketc...

HAWQ-V3: Dyadic Neural Network Quantization
Quantization is one of the key techniques used to make Neural Networks (...

A Statistical Framework for Low-bitwidth Training of Deep Neural Networks
Fully quantized training (FQT), which uses low-bitwidth hardware by quan...

Fast Distributed Training of Deep Neural Networks: Dynamic Communication Thresholding for Model and Data Parallelism
Data Parallelism (DP) and Model Parallelism (MP) are two common paradigm...

MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
Phrase localization is a task that studies the mapping from textual phra...

Sparse Quantized Spectral Clustering
Given a large data matrix, sparsifying, quantizing, and/or performing ot...

Benchmarking Semi-supervised Federated Learning
Federated learning promises to use the computational power of edge devic...

Boundary thickness and robustness in learning models
Robustness of machine learning models to various adversarial and non-adv...

Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization
In distributed second order optimization, a standard strategy is to aver...

Good linear classifiers are abundant in the interpolating regime
Within the machine learning community, the widely used uniform convergen...

Precise expressions for random projections: Low-rank approximation and randomized Newton
It is often desirable to reduce the dimensionality of a large dataset by...

Multiplicative noise and heavy tails in stochastic optimization
Although stochastic optimization is central to modern machine learning, ...

A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent
This article characterizes the exact asymptotics of random Fourier featu...

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
We introduce AdaHessian, a second order stochastic optimization algorith...

Determinantal Point Processes in Randomized Numerical Linear Algebra
Randomized Numerical Linear Algebra (RandNLA) uses randomness to develop...

Error Estimation for Sketched SVD via the Bootstrap
In order to compute fast approximations to the singular value decomposit...

Forecasting Sequential Data using Consistent Koopman Autoencoders
Recurrent neural networks are widely used on time series data, yet such ...

Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms
The statistical analysis of Randomized Numerical Linear Algebra (RandNLA...

Stochastic Normalizing Flows
We introduce stochastic normalizing flows, an extension of continuous no...

Improved guarantees and a multiple-descent curve for the Column Subset Selection Problem and the Nyström method
The Column Subset Selection Problem (CSSP) and the Nyström method are am...

Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data
In many applications, one works with deep neural network (DNN) models tr...

ZeroQ: A Novel Zero Shot Quantization Framework
Quantization is a promising approach for reducing the inference time and...

Exact expressions for double descent and implicit regularization via surrogate random design
Double descent refers to the phase transition that is exhibited by the g...

LSAR: Efficient Leverage Score Sampling Algorithm for the Analysis of Big Time Series Data
We apply methods from randomized numerical linear algebra (RandNLA) to d...

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
Quantization is an effective method for reducing memory footprint and in...

Running Alchemist on Cray XC and CS Series Supercomputers: Dask and PySpark Interfaces, Deployment Options, and Data Transfer Times
Newly developed interfaces for Python, Dask, and PySpark enable the use ...

Limit theorems for out-of-sample extensions of the adjacency and Laplacian spectral embeddings
Graph embeddings, a class of dimensionality reduction techniques designe...

Bootstrapping the Operator Norm in High Dimensions: Error Estimation for Covariance Matrices and Sketching
Although the operator (spectral) norm is one of the most widely used met...

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Transformer based architectures have become de-facto models used for a r...

The Difficulties of Addressing Interdisciplinary Challenges at the Foundations of Data Science
The National Science Foundation's Transdisciplinary Research in Principl...

On Linear Convergence of Weighted Kernel Herding
We provide a novel convergence analysis of two popular sampling algorith...

Statistical guarantees for local graph clustering
Local graph clustering methods aim to find small clusters in very large ...

Bayesian experimental design using regularized determinantal point processes
In experimental design, we are given n vectors in d dimensions, and our ...

Residual Networks as Nonlinear Systems: Stability Analysis using Linearization
We regard pre-trained residual networks (ResNets) as nonlinear systems a...

Distributed estimation of the inverse Hessian by determinantal averaging
In distributed optimization and distributed numerical linear algebra, we...
Michael W. Mahoney
Associate Professor at UC Berkeley, Adviser and Data Scientist at Vieu Labs, Research Scientist at Stanford University