
Memory-Efficient Differentiable Transformer Architecture Search
Differentiable architecture search (DARTS) is successfully applied in ma...

Directional Convergence Analysis under Spherically Symmetric Distribution
We consider the fundamental problem of learning linear predictors (i.e.,...

Nonasymptotic Performances of Robust Markov Decision Processes
In this paper, we study the nonasymptotic performance of optimal policy...

Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent
We propose Meta-Regularization, a novel approach for the adaptive choice...

Lower Complexity Bounds of Finite-Sum Optimization Problems: The Results and Construction
The contribution of this paper includes two aspects. First, we study the...

DIPPA: An Improved Method for Bilinear Saddle Point Problems
This paper studies bilinear saddle point problems min_x max_y g(x) + x^⊤A...

Landscape of Sparse Linear Network: A Brief Investigation
Network pruning, or sparse network, has a long history and practical sign...

Optimal Quantization for Batch Normalization in Neural Network Deployments and Beyond
Quantized Neural Networks (QNNs) use low bit-width fixed-point numbers f...

Intervention Generative Adversarial Networks
In this paper we propose a novel approach for stabilizing the training p...

An Asymptotically Optimal Multi-Armed Bandit Algorithm and Hyperparameter Optimization
The evaluation of hyperparameters, neural architectures, or data augment...

Communication Efficient Decentralized Training with Multiple Local Updates
Communication efficiency plays a significant role in decentralized optim...

Distillation ≈ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval For Overparameterized Neural Network
Distillation is a method to transfer knowledge from one model to another...

A Stochastic Proximal Point Algorithm for Saddle-Point Problems
We consider saddle point problems whose objective functions are the aver...

A General Analysis Framework of Lower Complexity Bounds for Finite-Sum Optimization
This paper studies the lower bound complexity for the optimization probl...

Towards Better Generalization: BP-SVRG in Training Deep Neural Networks
Stochastic variance-reduced gradient (SVRG) is a classical optimization ...

On the Convergence of FedAvg on Non-IID Data
Federated learning enables a large number of edge computing devices to l...

A Gram-Gauss-Newton Method Learning Overparameterized Deep Neural Networks for Regression Problems
First-order methods such as stochastic gradient descent (SGD) are curren...

A Unified Framework for Regularized Reinforcement Learning
We propose and study a general framework for regularized Markov decision...

Lipschitz Generative Adversarial Nets
In this paper we study the convergence of generative adversarial network...

Do Subsampled Newton Methods Work for High-Dimensional Data?
Subsampled Newton methods approximate Hessian matrices through subsampli...

Hierarchical Attention: What Really Counts in Various NLP Tasks
Attention mechanisms in sequence to sequence models have shown great abi...

Interpolatron: Interpolation or Extrapolation Schemes to Accelerate Optimization for Deep Neural Networks
In this paper we explore acceleration techniques for large scale nonconv...

A Unifying Framework for Convergence Analysis of Approximate Newton Methods
Many machine learning models are reformulated as optimization problems. ...

An Efficient Character-Level Neural Machine Translation
Neural machine translation aims at building a single large neural networ...

A Proximal Stochastic Quasi-Newton Algorithm
In this paper, we discuss the problem of minimizing the sum of two conve...

Wishart Mechanism for Differentially Private Principal Components Analysis
We propose a new input perturbation mechanism for publishing a covarianc...

Nonconvex Penalization in Sparse Estimation: An Approach Based on the Bernstein Function
In this paper we study nonconvex penalization using Bernstein functions ...

A Parallel Algorithm for X-Armed Bandits
The target of the X-armed bandit problem is to find the global maximum of an...

A Scalable and Extensible Framework for Superposition-Structured Models
In many learning tasks, structural models usually lead to better interpr...

Adjusting Leverage Scores by Row Weighting: A Practical Approach to Coherent Matrix Completion
Low-rank matrix completion is an important problem with extensive real-w...

Group Orbit Optimization: A Unified Approach to Data Normalization
In this paper we propose and study an optimization problem over a matrix...

The Bernstein Function: A Unifying Framework of Nonconvex Penalization in Sparse Estimation
In this paper we study nonconvex penalization using Bernstein functions....

The Matrix Ridge Approximation: Algorithms and Applications
We are concerned with an approximation problem for a symmetric positive ...

Compound Poisson Processes, Latent Shrinkage Priors and Bayesian Nonconvex Penalization
In this paper we discuss Bayesian nonconvex penalization for sparse lear...

Kinetic Energy Plus Penalty Functions for Sparse Estimation
In this paper we propose and study a family of sparsity-inducing penalty...

A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound
The CUR matrix decomposition is an important extension of Nyström approx...

Bayesian Multicategory Support Vector Machines
We show that the multiclass support vector machine (MSVM) proposed by L...

EP-GIG Priors and Applications in Bayesian Sparse Learning
In this paper we propose a novel framework for the construction of spars...

Coherence Functions with Applications in Large-Margin Classification Methods
Support vector machines (SVMs) naturally embody sparseness due to their ...