
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
Quantization is an effective method for reducing memory footprint and in...
Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression
In experimental design, we are given a large collection of vectors, each...
ZeroQ: A Novel Zero Shot Quantization Framework
Quantization is a promising approach for reducing the inference time and...
Inefficiency of K-FAC for Large Batch Size Training
In stochastic optimization, large batch training can leverage parallel r...
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
We introduce AdaHessian, a second order stochastic optimization algorith...
Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks
Given two or more Deep Neural Networks (DNNs) with the same or similar a...
DCAR: A Discriminative and Compact Audio Representation to Improve Event Detection
This paper presents a novel two-phase method for audio representation, D...
Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
We describe an approach to understand the peculiar and counterintuitive ...
GIANT: Globally Improved Approximate Newton Method for Distributed Optimization
For distributed computing environments, we consider the canonical machin...
Capacity Releasing Diffusion for Speed and Locality
Diffusions and related random walk procedures are of central importance ...
Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study
The resurgence of deep learning, as a highly effective machine learning ...
Scalable Kernel K-Means Clustering with Nystrom Approximation: Relative-Error Bounds
Kernel k-means clustering can correctly identify and extract a far more ...
Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction
The increasing size and complexity of scientific data could dramatically...
Mapping the Similarities of Spectra: Global and Locally-biased Approaches to SDSS Galaxy Data
We apply a novel spectral graph technique, that of locally-biased semi-s...
Lecture Notes on Spectral Graph Methods
These are lecture notes that are based on the lectures from a class I ta...
Lecture Notes on Randomized Linear Algebra
These are lecture notes that are based on the lectures from a class I ta...
Subsampled Newton Methods with Nonuniform Sampling
We consider the problem of finding the minimizer of a convex function F:...
FLAG n' FLARE: Fast Linearly-Coupled Adaptive Gradient Methods
We consider first order gradient methods for effectively optimizing a co...
Sub-Sampled Newton Methods II: Local Convergence Rates
Many data-fitting applications require the solution of an optimization p...
Sub-Sampled Newton Methods I: Globally Convergent Algorithms
Large scale optimization problems are ubiquitous in machine learning and...
Optimal Subsampling Approaches for Large Sample Linear Regression
A significant hurdle for analyzing large sample data is the lack of effe...
Structured Block Basis Factorization for Scalable Kernel Matrix Evaluation
Kernel matrices are popular in machine learning and scientific computing...
Weighted SGD for ℓ_p Regression with Randomized Preconditioning
In recent years, stochastic gradient descent (SGD) methods and randomize...
Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments
In this era of large-scale data, distributed systems built on top of clu...
Fast Randomized Kernel Methods With Statistical Guarantees
One approach to improving the running time of kernel-based machine learn...
A Statistical Perspective on Algorithmic Leveraging
One popular method for dealing with large-scale data sets is sampling. F...
Quantile Regression for Large-scale Applications
Quantile regression is a method to estimate the quantiles of the conditi...
Semi-supervised Eigenvectors for Large-scale Locally-biased Learning
In many applications, one has side information, e.g., labels that are pr...
The Fast Cauchy Transform and Faster Robust Linear Regression
We provide fast algorithms for overconstrained ℓ_p regression and relate...
Approximating Higher-Order Distances Using Random Projections
We provide a simple method and relevant theoretical analysis for efficie...
Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis
Database theory and database practice are typically the domain of comput...
Regularized Laplacian Estimation and Fast Eigenvector Approximation
Recently, Mahoney and Orecchia demonstrated that popular diffusion-based...
CUR from a Sparse Optimization Viewpoint
The CUR decomposition provides an approximation of a matrix X that has l...
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have beg...
Implementing regularization implicitly via approximate eigenvector computation
Regularization is a powerful technique for extracting useful information...
Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging
We address the statistical and optimization impacts of using classical s...
A Berkeley View of Systems Challenges for AI
With the increasing commoditization of computer vision, speech recogniti...
Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information
We consider variants of trust-region and cubic regularization methods fo...
Out-of-sample extension of graph adjacency spectral embedding
Many popular dimensionality reduction procedures have out-of-sample exte...
Lectures on Randomized Numerical Linear Algebra
This chapter is based on lectures on Randomized Numerical Linear Algebra...
GPU Accelerated Sub-Sampled Newton's Method
First order methods, which solely rely on gradient information, are comm...
Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization
Parallel computing has played an important role in speeding up convex op...
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Large batch size training of Neural Networks has been shown to incur acc...
A Bootstrap Method for Error Estimation in Randomized Matrix Multiplication
In recent years, randomized methods for numerical linear algebra have re...
Distributed Second-order Convex Optimization
Convex optimization problems arise frequently in diverse machine learnin...
Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap
Over the course of the past decade, a variety of randomized algorithms h...
Newton-MR: Newton's Method Without Smoothness or Convexity
Establishing global convergence of the classical Newton's method has lon...
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
Random Matrix Theory (RMT) is applied to analyze weight matrices of Deep...
Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist
Apache Spark is a popular system aimed at the analysis of large data set...
Alchemist: An Apache Spark <=> MPI Interface
The Apache Spark framework for distributed computation is popular in the...
Michael W. Mahoney
Associate Professor at UC Berkeley, Adviser and Data Scientist at Vieu Labs, Research Scientist Stanford University