
Learning from learning machines: a new generation of AI technology to meet the needs of science
We outline emerging opportunities and challenges to enhance the utility ...
Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information
We present a novel adaptive optimization algorithm for large-scale machi...
What's Hidden in a One-layer Randomly Weighted Transformer?
We demonstrate that, hidden within one-layer randomly weighted neural ne...
Characterizing possible failure modes in physics-informed neural networks
Recent work in scientific machine learning has developed so-called physi...
Generalization Properties of Stochastic Optimizers via Trajectory Analysis
Despite the ubiquitous use of stochastic optimization algorithms in mach...
Taxonomizing local versus global structure in neural network loss landscapes
Viewing neural network models in terms of their loss landscapes has a lo...
Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update
In second-order optimization, a potential bottleneck can be computing th...
Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics
To understand better the causes of good generalization performance in st...
MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models
Pruning is an effective method to reduce the memory footprint and comput...
ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
The increasing size of neural network models has been critical for impro...
Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition
End-to-end neural network models achieve improved performance on various...
A Survey of Quantization Methods for Efficient Neural Network Inference
As soon as abstract mathematical computations were adapted to computatio...
Hessian Eigenspectra of More Realistic Nonlinear Models
Given an optimization problem, the Hessian matrix and its eigenspectrum ...
Hessian-Aware Pruning and Optimal Neural Implant
Pruning is an effective method to reduce the memory footprint and FLOPs ...
I-BERT: Integer-only BERT Quantization
Transformer based models, like BERT and RoBERTa, have achieved state-of...
Sparse sketches with small inversion bias
For a tall n × d matrix A and a random m × n sketching matrix S, the sketc...
HAWQ-V3: Dyadic Neural Network Quantization
Quantization is one of the key techniques used to make Neural Networks (...
A Statistical Framework for Low-bitwidth Training of Deep Neural Networks
Fully quantized training (FQT), which uses low-bitwidth hardware by quan...
Fast Distributed Training of Deep Neural Networks: Dynamic Communication Thresholding for Model and Data Parallelism
Data Parallelism (DP) and Model Parallelism (MP) are two common paradigm...
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
Phrase localization is a task that studies the mapping from textual phra...
Sparse Quantized Spectral Clustering
Given a large data matrix, sparsifying, quantizing, and/or performing ot...
Benchmarking Semi-supervised Federated Learning
Federated learning promises to use the computational power of edge devic...
Boundary thickness and robustness in learning models
Robustness of machine learning models to various adversarial and non-adv...
Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization
In distributed second order optimization, a standard strategy is to aver...
Good linear classifiers are abundant in the interpolating regime
Within the machine learning community, the widely-used uniform convergen...
Precise expressions for random projections: Low-rank approximation and randomized Newton
It is often desirable to reduce the dimensionality of a large dataset by...
Multiplicative noise and heavy tails in stochastic optimization
Although stochastic optimization is central to modern machine learning, ...
A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent
This article characterizes the exact asymptotics of random Fourier featu...
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
We introduce AdaHessian, a second order stochastic optimization algorith...
Determinantal Point Processes in Randomized Numerical Linear Algebra
Randomized Numerical Linear Algebra (RandNLA) uses randomness to develop...
Error Estimation for Sketched SVD via the Bootstrap
In order to compute fast approximations to the singular value decomposit...
Forecasting Sequential Data using Consistent Koopman Autoencoders
Recurrent neural networks are widely used on time series data, yet such ...
Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms
The statistical analysis of Randomized Numerical Linear Algebra (RandNLA...
Stochastic Normalizing Flows
We introduce stochastic normalizing flows, an extension of continuous no...
Improved guarantees and a multiple-descent curve for the Column Subset Selection Problem and the Nyström method
The Column Subset Selection Problem (CSSP) and the Nyström method are am...
Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data
In many applications, one works with deep neural network (DNN) models tr...
ZeroQ: A Novel Zero Shot Quantization Framework
Quantization is a promising approach for reducing the inference time and...
Exact expressions for double descent and implicit regularization via surrogate random design
Double descent refers to the phase transition that is exhibited by the g...
LSAR: Efficient Leverage Score Sampling Algorithm for the Analysis of Big Time Series Data
We apply methods from randomized numerical linear algebra (RandNLA) to d...
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
Quantization is an effective method for reducing memory footprint and in...
Running Alchemist on Cray XC and CS Series Supercomputers: Dask and PySpark Interfaces, Deployment Options, and Data Transfer Times
Newly developed interfaces for Python, Dask, and PySpark enable the use ...
Limit theorems for out-of-sample extensions of the adjacency and Laplacian spectral embeddings
Graph embeddings, a class of dimensionality reduction techniques designe...
Bootstrapping the Operator Norm in High Dimensions: Error Estimation for Covariance Matrices and Sketching
Although the operator (spectral) norm is one of the most widely used met...
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Transformer based architectures have become de-facto models used for a r...
The Difficulties of Addressing Interdisciplinary Challenges at the Foundations of Data Science
The National Science Foundation's Transdisciplinary Research in Principl...
On Linear Convergence of Weighted Kernel Herding
We provide a novel convergence analysis of two popular sampling algorith...
Statistical guarantees for local graph clustering
Local graph clustering methods aim to find small clusters in very large ...
Bayesian experimental design using regularized determinantal point processes
In experimental design, we are given n vectors in d dimensions, and our ...
Residual Networks as Nonlinear Systems: Stability Analysis using Linearization
We regard pretrained residual networks (ResNets) as nonlinear systems a...
Distributed estimation of the inverse Hessian by determinantal averaging
In distributed optimization and distributed numerical linear algebra, we...
