Michael W. Mahoney

08/30/2023

Surrogate-based Autotuning for Randomized Sketching Algorithms in Regression Problems

Algorithms from Randomized Numerical Linear Algebra (RandNLA) are known ...

Younghyun Cho, et al.

07/15/2023

The Interpolating Information Criterion for Overparameterized Models

The problem of model selection is considered for the setting of interpol...

Liam Hodgkinson, et al.

07/07/2023

GEANN: Scalable Graph Augmentations for Multi-Horizon Time Series Forecasting

Encoder-decoder deep neural networks have been increasingly studied for ...

Sitan Yang, et al.

06/15/2023

A Heavy-Tailed Algebra for Probabilistic Programming

Despite the successes of probabilistic models based on passing noise thr...

Feynman Liang, et al.

06/13/2023

SqueezeLLM: Dense-and-Sparse Quantization

Generative Large Language Models (LLMs) have demonstrated remarkable res...

Sehoon Kim, et al.

05/28/2023

A Three-regime Model of Network Pruning

Recent work has highlighted the complex influence training hyperparamete...

Yefan Zhou, et al.

05/28/2023

Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching

We consider solving equality-constrained nonlinear, nonconvex optimizati...

Ilgee Hong, et al.

05/21/2023

When are ensembles really effective?

Ensembling has a long history in statistical data analysis, with many im...

Ryan Theisen, et al.

02/27/2023

Full Stack Optimization of Transformer Inference: a Survey

Recent advances in state-of-the-art DNN architecture design have been mo...

Sehoon Kim, et al.

02/21/2023

Learning Physical Models that Can Respect Conservation Laws

Recent work in scientific machine learning (SciML) has focused on incorp...

Derek Hansen, et al.

02/15/2023

Big Little Transformer Decoder

The recent emergence of Large Language Models based on the Transformer a...

Sehoon Kim, et al.

11/29/2022

Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems

We propose a trust-region stochastic sequential quadratic programming al...

Yuchen Fang, et al.

10/14/2022

Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes

The quality of many modern machine learning models improves as model com...

Liam Hodgkinson, et al.

10/02/2022

Gradient Gating for Deep Multi-Rate Learning on Graphs

We present Gradient Gating (G^2), a novel framework for improving the pe...

T. Konstantin Rusch, et al.

07/08/2022

Adaptive Self-supervision Algorithms for Physics-informed Neural Networks

Physics-informed neural networks (PINNs) incorporate physical knowledge ...

Shashank Subramanian, et al.

06/12/2022

Neurotoxin: Durable Backdoors in Federated Learning

Due to their decentralized nature, federated learning (FL) systems have ...

Zhengming Zhang, et al.

06/02/2022

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

The recently proposed Conformer model has become the de facto backbone m...

Sehoon Kim, et al.

05/27/2022

Asymptotic Convergence Rate and Statistical Inference for Stochastic Sequential Quadratic Programming

We apply a stochastic sequential quadratic programming (StoSQP) algorith...

Sen Na, et al.

05/16/2022

Fat-Tailed Variational Inference with Anisotropic Tail Adaptive Flows

While fat-tailed densities commonly arise as posterior and marginal dist...

Feynman Liang, et al.

05/14/2022

The Sky Above The Clouds

Technology ecosystems often undergo significant transformations as they ...

Sarah Chasins, et al.

04/20/2022

Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence

We consider minimizing a smooth and strongly convex objective function u...

Sen Na, et al.

02/28/2022

Fast Feature Selection with Fairness Constraints

We study the fundamental problem of selecting optimal features for model...

Francesco Quinzan, et al.

02/06/2022

Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data

The search for effective and robust generalization metrics has been the ...

Yaoqing Yang, et al.

11/27/2021

Learning from learning machines: a new generation of AI technology to meet the needs of science

We outline emerging opportunities and challenges to enhance the utility ...

Luca Pion-Tonachini, et al.

09/11/2021

Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information

We present a novel adaptive optimization algorithm for large-scale machi...

Majid Jahani, et al.

09/08/2021

What's Hidden in a One-layer Randomly Weighted Transformer?

We demonstrate that, hidden within one-layer randomly weighted neural ne...

Sheng Shen, et al.

09/02/2021

Characterizing possible failure modes in physics-informed neural networks

Recent work in scientific machine learning has developed so-called physi...

Aditi S. Krishnapriyan, et al.

08/02/2021

Generalization Properties of Stochastic Optimizers via Trajectory Analysis

Despite the ubiquitous use of stochastic optimization algorithms in mach...

Liam Hodgkinson, et al.

07/23/2021

Taxonomizing local versus global structure in neural network loss landscapes

Viewing neural network models in terms of their loss landscapes has a lo...

Yaoqing Yang, et al.

07/15/2021

Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update

In second-order optimization, a potential bottleneck can be computing th...

Michał Dereziński, et al.

06/01/2021

Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics

To understand better the causes of good generalization performance in st...

Charles H. Martin, et al.

05/30/2021

MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models

Pruning is an effective method to reduce the memory footprint and comput...

Zhewei Yao, et al.

04/29/2021

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training

The increasing size of neural network models has been critical for impro...

Jianfei Chen, et al.

03/31/2021

Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition

End-to-end neural network models achieve improved performance on various...

Sehoon Kim, et al.

03/25/2021

A Survey of Quantization Methods for Efficient Neural Network Inference

As soon as abstract mathematical computations were adapted to computatio...

Amir Gholami, et al.

03/02/2021

Hessian Eigenspectra of More Realistic Nonlinear Models

Given an optimization problem, the Hessian matrix and its eigenspectrum ...

Zhenyu Liao, et al.

01/22/2021

Hessian-Aware Pruning and Optimal Neural Implant

Pruning is an effective method to reduce the memory footprint and FLOPs ...

Shixing Yu, et al.

01/05/2021

I-BERT: Integer-only BERT Quantization

Transformer-based models, like BERT and RoBERTa, have achieved state-of-...

Sehoon Kim, et al.

11/21/2020

Sparse sketches with small inversion bias

For a tall n × d matrix A and a random m × n sketching matrix S, the sketc...

Michał Dereziński, et al.

11/20/2020

HAWQV3: Dyadic Neural Network Quantization

Quantization is one of the key techniques used to make Neural Networks (...

Zhewei Yao, et al.

10/27/2020

A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Fully quantized training (FQT), which uses low-bitwidth hardware by quan...

Jianfei Chen, et al.

10/18/2020

Fast Distributed Training of Deep Neural Networks: Dynamic Communication Thresholding for Model and Data Parallelism

Data Parallelism (DP) and Model Parallelism (MP) are two common paradigm...

Vipul Gupta, et al.

10/12/2020

MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding

Phrase localization is a task that studies the mapping from textual phra...

Qinxin Wang, et al.

10/03/2020

Sparse Quantized Spectral Clustering

Given a large data matrix, sparsifying, quantizing, and/or performing ot...

Zhenyu Liao, et al.

08/26/2020

Benchmarking Semi-supervised Federated Learning

Federated learning promises to use the computational power of edge devic...

Zhengming Zhang, et al.

07/09/2020

Boundary thickness and robustness in learning models

Robustness of machine learning models to various adversarial and non-adv...

Yaoqing Yang, et al.

07/02/2020

Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization

In distributed second order optimization, a standard strategy is to aver...

Michał Dereziński, et al.

06/22/2020

Good linear classifiers are abundant in the interpolating regime

Within the machine learning community, the widely-used uniform convergen...

Ryan Theisen, et al.

06/18/2020

Precise expressions for random projections: Low-rank approximation and randomized Newton

It is often desirable to reduce the dimensionality of a large dataset by...

Michał Dereziński, et al.

06/11/2020

Multiplicative noise and heavy tails in stochastic optimization

Although stochastic optimization is central to modern machine learning, ...

Liam Hodgkinson, et al.
