
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
Quantization is an effective method for reducing memory footprint and in...

Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression
In experimental design, we are given a large collection of vectors, each...

ZeroQ: A Novel Zero Shot Quantization Framework
Quantization is a promising approach for reducing the inference time and...

Inefficiency of K-FAC for Large Batch Size Training
In stochastic optimization, large batch training can leverage parallel r...

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
We introduce AdaHessian, a second order stochastic optimization algorith...

Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks
Given two or more Deep Neural Networks (DNNs) with the same or similar a...

DCAR: A Discriminative and Compact Audio Representation to Improve Event Detection
This paper presents a novel two-phase method for audio representation, D...

Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
We describe an approach to understand the peculiar and counterintuitive ...

GIANT: Globally Improved Approximate Newton Method for Distributed Optimization
For distributed computing environments, we consider the canonical machin...

Capacity Releasing Diffusion for Speed and Locality
Diffusions and related random walk procedures are of central importance ...

Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study
The resurgence of deep learning, as a highly effective machine learning ...

Scalable Kernel K-Means Clustering with Nystrom Approximation: Relative-Error Bounds
Kernel k-means clustering can correctly identify and extract a far more ...

Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction
The increasing size and complexity of scientific data could dramatically...

Mapping the Similarities of Spectra: Global and Locally-biased Approaches to SDSS Galaxy Data
We apply a novel spectral graph technique, that of locally-biased semis...

Lecture Notes on Spectral Graph Methods
These are lecture notes that are based on the lectures from a class I ta...

Lecture Notes on Randomized Linear Algebra
These are lecture notes that are based on the lectures from a class I ta...

Subsampled Newton Methods with Nonuniform Sampling
We consider the problem of finding the minimizer of a convex function F:...

FLAG n' FLARE: Fast Linearly-Coupled Adaptive Gradient Methods
We consider first order gradient methods for effectively optimizing a co...

Sub-Sampled Newton Methods II: Local Convergence Rates
Many data-fitting applications require the solution of an optimization p...

Sub-Sampled Newton Methods I: Globally Convergent Algorithms
Large scale optimization problems are ubiquitous in machine learning and...

Optimal Subsampling Approaches for Large Sample Linear Regression
A significant hurdle for analyzing large sample data is the lack of effe...

Structured Block Basis Factorization for Scalable Kernel Matrix Evaluation
Kernel matrices are popular in machine learning and scientific computing...

Weighted SGD for ℓ_p Regression with Randomized Preconditioning
In recent years, stochastic gradient descent (SGD) methods and randomize...

Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments
In this era of large-scale data, distributed systems built on top of clu...

Fast Randomized Kernel Methods With Statistical Guarantees
One approach to improving the running time of kernel-based machine learn...

A Statistical Perspective on Algorithmic Leveraging
One popular method for dealing with large-scale data sets is sampling. F...

Quantile Regression for Large-scale Applications
Quantile regression is a method to estimate the quantiles of the conditi...

Semi-supervised Eigenvectors for Large-scale Locally-biased Learning
In many applications, one has side information, e.g., labels that are pr...

The Fast Cauchy Transform and Faster Robust Linear Regression
We provide fast algorithms for overconstrained ℓ_p regression and relate...

Approximating Higher-Order Distances Using Random Projections
We provide a simple method and relevant theoretical analysis for efficie...

Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis
Database theory and database practice are typically the domain of comput...

Regularized Laplacian Estimation and Fast Eigenvector Approximation
Recently, Mahoney and Orecchia demonstrated that popular diffusion-based...

CUR from a Sparse Optimization Viewpoint
The CUR decomposition provides an approximation of a matrix X that has l...

Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have beg...

Implementing regularization implicitly via approximate eigenvector computation
Regularization is a powerful technique for extracting useful information...

Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging
We address the statistical and optimization impacts of using classical s...

A Berkeley View of Systems Challenges for AI
With the increasing commoditization of computer vision, speech recogniti...

Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information
We consider variants of trust-region and cubic regularization methods fo...

Out-of-sample extension of graph adjacency spectral embedding
Many popular dimensionality reduction procedures have out-of-sample exte...

Lectures on Randomized Numerical Linear Algebra
This chapter is based on lectures on Randomized Numerical Linear Algebra...

GPU Accelerated Sub-Sampled Newton's Method
First order methods, which solely rely on gradient information, are comm...

Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization
Parallel computing has played an important role in speeding up convex op...

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Large batch size training of Neural Networks has been shown to incur acc...

A Bootstrap Method for Error Estimation in Randomized Matrix Multiplication
In recent years, randomized methods for numerical linear algebra have re...

Distributed Second-order Convex Optimization
Convex optimization problems arise frequently in diverse machine learnin...

Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap
Over the course of the past decade, a variety of randomized algorithms h...

Newton-MR: Newton's Method Without Smoothness or Convexity
Establishing global convergence of the classical Newton's method has lon...

Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
Random Matrix Theory (RMT) is applied to analyze weight matrices of Deep...

Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist
Apache Spark is a popular system aimed at the analysis of large data set...

Alchemist: An Apache Spark <=> MPI Interface
The Apache Spark framework for distributed computation is popular in the...
Michael W. Mahoney
Associate Professor at UC Berkeley, Adviser and Data Scientist at Vieu Labs, Research Scientist at Stanford University