
Teacher's pet: understanding and mitigating biases in distillation
Knowledge distillation is widely used as a means of improving the perfor...

Demystifying the Better Performance of Position Encoding Variants for Transformer
Transformers are state of the art models in NLP that map a given input s...

Understanding Robustness of Transformers for Image Classification
Deep Convolutional Neural Networks (CNNs) have long been the architectur...

On the Reproducibility of Neural Network Predictions
Standard training techniques for neural networks involve multiple source...

Modifying Memories in Transformer Models
Large Transformer models have achieved impressive performance in many na...

An efficient nonconvex reformulation of stagewise convex optimization problems
Convex optimization problems with staged structure appear in several con...

Coping with Label Shift via Distributionally Robust Optimisation
The label shift problem refers to the supervised learning setting where ...

O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Transformer networks use pairwise attention to compute contextual embedd...

Does label smoothing mitigate label noise?
Label smoothing is commonly used in training deep learning models, where...

Low-Rank Bottleneck in Multi-head Attention Models
Attention-based Transformer architecture has enabled significant advance...

Are Transformers universal approximators of sequence-to-sequence functions?
Despite the widespread adoption of Transformer models for NLP tasks, the...

Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks
Despite existing work on ensuring generalization of neural networks in t...

Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form
Semidefinite programs (SDP) are important in learning and combinatorial ...

Provable quantum state tomography via nonconvex methods
With today's steadily growing quantum processors, it is required to dev...

Implicit Regularization in Matrix Factorization
We study implicit regularization when optimizing an underdetermined quad...

Stabilizing GAN Training with Multiple Random Projections
Training generative adversarial networks is unstable in high dimensions ...

Single Pass PCA of Matrix Products
In this paper we present a new algorithm for computing a low rank approx...

Provable Burer-Monteiro factorization for a class of norm-constrained matrix problems
We study the projected gradient descent method on low-rank matrix proble...

Global Optimality of Local Search for Low Rank Matrix Recovery
We show that there are no spurious local minima in the non-convex factor...

Dropping Convexity for Faster Semidefinite Optimization
We study the minimization of a convex function f(X) over the set of n × n...

A New Sampling Technique for Tensors
In this paper we propose new techniques to sample arbitrary third-order ...

Tighter Low-rank Approximation via Sampling the Leveraged Element
In this work, we propose a new randomized algorithm for computing a low...

Universal Matrix Completion
The problem of low-rank matrix completion has recently generated a lot o...

Completing Any Low-rank Matrix, Provably
Matrix completion, i.e., the exact and provable recovery of a low-rank m...
Srinadh Bhojanapalli
Research Assistant Professor at the Toyota Technological Institute at Chicago; Ph.D. in Electrical and Computer Engineering from The University of Texas at Austin (2015); former intern at Microsoft Research India and eBay Research Labs.