
Compressing Gradient Optimizers via Count-Sketches
Many popular first-order optimization methods (e.g., Momentum, AdaGrad, ...

Statistical inference using SGD
We present a novel method for frequentist statistical inference in Mest...

Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach
We consider the non-square matrix sensing problem, under restricted isom...

Provable Burer-Monteiro factorization for a class of norm-constrained matrix problems
We study the projected gradient descent method on low-rank matrix proble...

A simple and provable algorithm for sparse diagonal CCA
Given two sets of variables, derived from a common set of samples, spars...

Algorithms for Learning Sparse Additive Models with Interactions in High Dimensions
A function f: R^d → R is a Sparse Additive Model (SPAM), if it is of the ...

Learning Sparse Additive Models with Interactions in High Dimensions
A function f: R^d → R is referred to as a Sparse Additive Model (SPAM), i...

Trading-off variance and complexity in stochastic gradient descent
Stochastic gradient descent is the method of choice for large-scale mach...

Convex block-sparse linear regression with expanders - provably
Sparse matrices are favorable objects in machine learning and optimizati...

Bipartite Correlation Clustering - Maximizing Agreements
In Bipartite Correlation Clustering (BCC) we are given a complete bipart...

A single-phase, proximal path-following framework
We propose a new proximal, path-following framework for a class of const...

Dropping Convexity for Faster Semidefinite Optimization
We study the minimization of a convex function f(X) over the set of n × n...

Sparse PCA via Bipartite Matchings
We consider the following multi-component sparse PCA problem: given a se...

Stay on path: PCA along graph paths
We introduce a variant of (sparse) PCA in which the set of feasible supp...

Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain
Real-world data typically contain repeated and periodic patterns. This s...

Scalable sparse covariance estimation via self-concordance
We consider the class of convex minimization problems, composed of a sel...

Provable Deterministic Leverage Score Sampling
We explain theoretically a curious empirical phenomenon: "Approximating ...

Approximate Matrix Multiplication with Application to Linear Embeddings
In this paper, we study the problem of approximately computing the produ...

Non-uniform Feature Sampling for Decision Tree Ensembles
We study the effectiveness of non-uniform randomized feature selection i...

Composite Self-Concordant Minimization
We propose a variable metric framework for minimizing the sum of a self...

Group-Sparse Model Selection: Hardness and Relaxations
Group-based sparsity models are proven instrumental in linear regression...

A proximal Newton framework for composite minimization: Graph learning without Cholesky decompositions and matrix inversions
We propose an algorithmic framework for convex minimization problems of ...

Provable quantum state tomography via non-convex methods
With nowadays steadily growing quantum processors, it is required to dev...

Implicit regularization and solution uniqueness in over-parameterized matrix sensing
We consider whether algorithmic choices in over-parameterized linear mat...

Run Procrustes, Run! On the convergence of accelerated Procrustes Flow
In this work, we present theoretical results on the convergence of nonc...

Approximate Newton-based statistical inference using only stochastic gradients
We present a novel inference framework for convex empirical risk minimiz...

Minimum norm solutions do not always generalize well for over-parameterized problems
Stochastic gradient descent is the de facto algorithm for training deep ...

IHT dies hard: Provable accelerated Iterative Hard Thresholding
We study both in theory and practice the use of momentum motions in ...

Simple and practical algorithms for ℓ_p-norm low-rank approximation
We propose practical algorithms for entrywise ℓ_p-norm low-rank approxim...

SysML: The New Frontier of Machine Learning Systems
Machine learning (ML) techniques are enjoying rapidly increasing adoptio...

Distributed Learning of Deep Neural Networks using Independent Subnet Training
Stochastic gradient descent (SGD) is the method of choice for distribute...

Decaying momentum helps neural network training
Momentum is a simple and popular technique in deep learning for gradient...

Learning Sparse Distributions using Iterative Hard Thresholding
Iterative hard thresholding (IHT) is a projected gradient descent algori...

Negative sampling in semi-supervised learning
We introduce Negative Sampling in Semi-Supervised Learning (NS3L), a sim...

Optimal Mini-Batch Size Selection for Fast Gradient Descent
This paper presents a methodology for selecting the mini-batch size that...

FourierSAT: A Fourier Expansion-Based Algebraic Framework for Solving Hybrid Boolean Constraints
The Boolean SATisfiability problem (SAT) is of central importance in com...

Bayesian Coresets: An Optimization Perspective
Bayesian coresets have emerged as a promising approach for implementing ...
Anastasios Kyrillidis
Assistant Professor of Computer Science, RICE; Goldstine Fellow at IBM Watson Research Center