
REX: Revisiting Budgeted Training with an Improved Schedule
Deep learning practitioners often operate on a computational and monetar...
ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
We propose ResIST, a novel distributed training protocol for Residual Networks...
Mitigating deep double descent by concatenating inputs
The double descent curve is one of the most intriguing properties of dee...
Momentum-inspired Low-Rank Coordinate Descent for Diagonally Constrained SDPs
We present a novel, practical, and provable approach for solving diagona...
Fast quantum state reconstruction via accelerated nonconvex programming
We propose a new quantum state reconstruction method that combines ideas...
GIST: Distributed Training for Large-Scale Graph Convolutional Networks
The graph convolutional network (GCN) is a go-to solution for machine le...
Rank-One Measurements of Low-Rank PSD Matrices Have Small Feasible Sets
We study the role of the constraint set in determining the solution to l...
On Continuous Local BDD-Based Search for Hybrid SAT Solving
We explore the potential of continuous local search (CLS) in SAT solving...
On Generalization of Adaptive Methods for Overparameterized Linear Regression
Overparameterization and adaptive methods have played a crucial role in...
ImCLR: Implicit Contrastive Learning for Image Classification
Contrastive learning is an effective method for learning visual represen...
Bayesian Coresets: An Optimization Perspective
Bayesian coresets have emerged as a promising approach for implementing ...
FourierSAT: A Fourier Expansion-Based Algebraic Framework for Solving Hybrid Boolean Constraints
The Boolean SATisfiability problem (SAT) is of central importance in com...
Optimal Mini-Batch Size Selection for Fast Gradient Descent
This paper presents a methodology for selecting the minibatch size that...
Negative sampling in semi-supervised learning
We introduce Negative Sampling in Semi-Supervised Learning (NS3L), a sim...
Learning Sparse Distributions using Iterative Hard Thresholding
Iterative hard thresholding (IHT) is a projected gradient descent algori...
Decaying momentum helps neural network training
Momentum is a simple and popular technique in deep learning for gradient...
Distributed Learning of Deep Neural Networks using Independent Subnet Training
Stochastic gradient descent (SGD) is the method of choice for distribute...
SysML: The New Frontier of Machine Learning Systems
Machine learning (ML) techniques are enjoying rapidly increasing adoptio...
Compressing Gradient Optimizers via Count-Sketches
Many popular first-order optimization methods (e.g., Momentum, AdaGrad, ...
Minimum norm solutions do not always generalize well for overparameterized problems
Stochastic gradient descent is the de facto algorithm for training deep ...
Implicit regularization and solution uniqueness in overparameterized matrix sensing
We consider whether algorithmic choices in overparameterized linear mat...
Run Procrustes, Run! On the convergence of accelerated Procrustes Flow
In this work, we present theoretical results on the convergence of nonc...
Simple and practical algorithms for ℓ_p-norm low-rank approximation
We propose practical algorithms for entrywise ℓ_p-norm low-rank approxim...
Approximate Newton-based statistical inference using only stochastic gradients
We present a novel inference framework for convex empirical risk minimiz...
IHT dies hard: Provable accelerated Iterative Hard Thresholding
We study, both in theory and practice, the use of momentum motions in ...
Provable quantum state tomography via nonconvex methods
With nowadays steadily growing quantum processors, it is required to dev...
Statistical inference using SGD
We present a novel method for frequentist statistical inference in M-est...
Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach
We consider the non-square matrix sensing problem, under restricted isom...
Provable Burer-Monteiro factorization for a class of norm-constrained matrix problems
We study the projected gradient descent method on low-rank matrix proble...
A simple and provable algorithm for sparse diagonal CCA
Given two sets of variables, derived from a common set of samples, spars...
Algorithms for Learning Sparse Additive Models with Interactions in High Dimensions
A function f: R^d → R is a Sparse Additive Model (SPAM), if it is of the ...
Learning Sparse Additive Models with Interactions in High Dimensions
A function f: R^d → R is referred to as a Sparse Additive Model (SPAM), i...
Trading-off variance and complexity in stochastic gradient descent
Stochastic gradient descent is the method of choice for large-scale mach...
Convex block-sparse linear regression with expanders - provably
Sparse matrices are favorable objects in machine learning and optimizati...
Bipartite Correlation Clustering - Maximizing Agreements
In Bipartite Correlation Clustering (BCC) we are given a complete bipart...
A single-phase, proximal path-following framework
We propose a new proximal, path-following framework for a class of const...
Dropping Convexity for Faster Semidefinite Optimization
We study the minimization of a convex function f(X) over the set of n × n...
Sparse PCA via Bipartite Matchings
We consider the following multi-component sparse PCA problem: given a se...
Stay on path: PCA along graph paths
We introduce a variant of (sparse) PCA in which the set of feasible supp...
Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain
Real-world data typically contain repeated and periodic patterns. This s...
Scalable sparse covariance estimation via self-concordance
We consider the class of convex minimization problems, composed of a sel...
Provable Deterministic Leverage Score Sampling
We explain theoretically a curious empirical phenomenon: "Approximating ...
Approximate Matrix Multiplication with Application to Linear Embeddings
In this paper, we study the problem of approximately computing the produ...
Nonuniform Feature Sampling for Decision Tree Ensembles
We study the effectiveness of nonuniform randomized feature selection i...
Composite Self-Concordant Minimization
We propose a variable metric framework for minimizing the sum of a self...
Group-Sparse Model Selection: Hardness and Relaxations
Group-based sparsity models are proven instrumental in linear regression...
A proximal Newton framework for composite minimization: Graph learning without Cholesky decompositions and matrix inversions
We propose an algorithmic framework for convex minimization problems of ...
Anastasios Kyrillidis
Assistant Professor of Computer Science, RICE; Goldstine Fellow at IBM Watson Research Center