Since its inception in "Attention Is All You Need", transformer architec...
Determining the memory capacity of two-layer neural networks with m hidd...
Supervised contrastive loss (SCL) is a competitive and often superior alternative...
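To make the supervised contrastive objective mentioned above concrete, here is a minimal NumPy sketch of the standard SupCon loss on L2-normalized embeddings. The function name `supcon_loss`, the temperature `tau`, and the toy interface are assumptions for illustration; this sketches the loss itself, not the analysis in the paper.

```python
import numpy as np

def supcon_loss(embeddings, labels, tau=0.1):
    """Standard SupCon loss sketch: each anchor's positives are the other
    samples sharing its label; all other samples form the contrast set."""
    labels = np.asarray(labels)
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / tau                              # temperature-scaled similarities
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)          # never contrast an anchor with itself
    m = sim.max(axis=1, keepdims=True)               # row-wise log-softmax, computed stably
    log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()

# toy usage with random embeddings and binary labels
rng = np.random.default_rng(0)
loss = supcon_loss(rng.standard_normal((8, 16)), np.array([0, 0, 1, 1, 0, 1, 0, 1]))
```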
Prompt-tuning is an emerging strategy to adapt large language models (LL...
In this paper, we investigate the memorization capabilities of multi-hea...
The popularity of bi-level optimization (BO) in deep learning has spurre...
Normalized gradient descent has shown substantial success in speeding up...
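As a quick illustration of the normalized gradient descent idea referenced here, the sketch below scales each gradient to unit norm so that the learning rate alone controls the step length. The function name, toy quadratic objective, and hyperparameters are hypothetical placeholders, not the scheme analyzed in the paper.

```python
import numpy as np

def normalized_gd(grad_fn, w0, lr=0.1, steps=200, eps=1e-12):
    """Normalized gradient descent sketch: w <- w - lr * g / ||g||."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        g = grad_fn(w)
        w = w - lr * g / (np.linalg.norm(g) + eps)   # unit-norm step direction
    return w

# toy usage on f(w) = 0.5 * ||w||^2, whose gradient is simply w
w_final = normalized_gd(lambda w: w, w0=np.ones(5))
```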
Modern machine learning models are often over-parameterized and as a res...
Various logit-adjusted parameterizations of the cross-entropy (CE) loss ...
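To make "logit-adjusted cross-entropy" concrete: one common parameterization adds class-prior-dependent offsets to the logits before the softmax. The sketch below (function name, `tau`, and interface are assumptions) shows that additive variant only; the paper concerns a broader family of such parameterizations.

```python
import numpy as np

def logit_adjusted_ce(logits, y, class_priors, tau=1.0):
    """Additive logit adjustment: shift logits by tau * log(prior) before the
    softmax cross-entropy, which counteracts the bias toward frequent classes."""
    adj = logits + tau * np.log(np.asarray(class_priors))[None, :]
    adj = adj - adj.max(axis=1, keepdims=True)                    # numerical stability
    log_softmax = adj - np.log(np.exp(adj).sum(axis=1, keepdims=True))
    return -log_softmax[np.arange(len(y)), y].mean()
```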
We investigate the generalization and optimization of k-homogeneous shal...
Decentralized learning offers privacy and communication efficiency when ...
Neural Collapse refers to the remarkable structural properties character...
Overparameterized models fail to generalize well in the presence of data...
Driven by the empirical success and wide use of deep neural networks, un...
In this work we investigate meta-learning (or learning-to-learn) approac...
Standard federated optimization methods successfully apply to stochastic...
Imbalanced datasets are commonplace in modern machine learning problems....
We consider a general class of regression models with normally distribut...
The growing literature on "benign overfitting" in overparameterized mode...
Safety in reinforcement learning has become increasingly important in re...
Out of the rich family of generalized linear bandits, perhaps the most w...
Label-imbalanced and group-sensitive classification seeks to appropriate...
Deep networks are typically trained with many more parameters than the s...
We study decentralized stochastic linear bandits, where a network of N agents...
Deep neural networks generalize well despite being exceedingly overparameterized...
Contemporary machine learning applications often involve classification ...
It is widely known that several machine learning models are susceptible ...
We study stage-wise conservative linear stochastic bandits: an instance ...
We study the problem of recovering an unknown signal x given measurement...
Model pruning is an essential procedure for building compact and computationally...
Empirical Risk Minimization (ERM) algorithms are widely used in a variet...
Many applications require a learner to make sequential decisions given u...
We study convex empirical risk minimization for high-dimensional inferen...
Extensive empirical evidence reveals that, for a wide range of different...
We consider a model for logistic regression where only a subset of featu...
The design and performance analysis of bandit algorithms in the presence...
Bandit algorithms have various applications in safety-critical systems, w...
We study the performance of a wide class of convex optimization-based estimators...
The deployment of massive MIMO systems has revived much of the interest ...
In the problem of structured signal recovery from high-dimensional linea...
We study algorithms for solving quadratic systems of equations based on ...
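For context on the kind of problem described in the previous entry: a quadratic system asks for x such that (a_i^T x)^2 = y_i for all i. Below is a minimal gradient-descent sketch on the natural least-squares objective, in the spirit of Wirtinger flow; the initialization, step size, and the method itself are illustrative assumptions, not the algorithms studied in the paper.

```python
import numpy as np

def quad_system_gd(A, y, steps=500, lr=None, seed=0):
    """Gradient descent on f(x) = (1/4m) * sum_i ((a_i^T x)^2 - y_i)^2,
    a toy solver for the quadratic system (a_i^T x)^2 = y_i."""
    y = np.asarray(y, dtype=float)
    m, n = A.shape
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x *= np.sqrt(y.mean()) / np.linalg.norm(x)     # crude scale for the starting point
    lr = lr if lr is not None else 0.1 / y.mean()  # step size tied to the signal energy
    for _ in range(steps):
        Ax = A @ x
        grad = A.T @ ((Ax ** 2 - y) * Ax) / m      # gradient of the quartic loss
        x = x - lr * grad
    return x
```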
The ability to see around corners, i.e., recover details of a hidden sce...
We study the problem of recovering a structured signal x_0 from high-dimensional...
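As a concrete instance of structured signal recovery from linear measurements, the sketch below runs ISTA on the LASSO objective 0.5*||y - A x||_2^2 + lam*||x||_1, using sparsity as the example structure. The regularizer, step size, and iteration count are assumptions for illustration; the setting and estimators considered in these papers are more general.

```python
import numpy as np

def ista_lasso(A, y, lam=0.1, steps=500):
    """ISTA for min_x 0.5*||y - A x||_2^2 + lam*||x||_1 (sparse recovery sketch)."""
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        z = x - A.T @ (A @ x - y) / L              # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-thresholding
    return x
```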
The maximum-likelihood (ML) decoder for symbol detection in large multip...
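To illustrate why ML symbol detection is hard at scale, the brute-force sketch below searches all BPSK symbol vectors x in {-1, +1}^n for the minimizer of ||y - H x||_2, assuming a linear model y = H x + noise. It is exponential in n and only meant to make the decoding problem concrete, not to represent the efficient decoders analyzed here.

```python
import numpy as np
from itertools import product

def ml_decode_bpsk(H, y):
    """Exhaustive maximum-likelihood detection of BPSK symbols (tiny n only)."""
    best_x, best_cost = None, np.inf
    for bits in product((-1.0, 1.0), repeat=H.shape[1]):   # all 2^n candidate symbol vectors
        x = np.array(bits)
        cost = np.linalg.norm(y - H @ x)
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x
```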