
Memory-Efficient Differentiable Transformer Architecture Search
Differentiable architecture search (DARTS) is successfully applied in ma...
Directional Convergence Analysis under Spherically Symmetric Distribution
We consider the fundamental problem of learning linear predictors (i.e.,...
Non-Asymptotic Performances of Robust Markov Decision Processes
In this paper, we study the non-asymptotic performance of optimal policy...
Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent
We propose Meta-Regularization, a novel approach for the adaptive choice...
Lower Complexity Bounds of Finite-Sum Optimization Problems: The Results and Construction
The contribution of this paper includes two aspects. First, we study the...
DIPPA: An Improved Method for Bilinear Saddle Point Problems
This paper studies bilinear saddle point problems min_x max_y g(x) + x^⊤A...
Landscape of Sparse Linear Network: A Brief Investigation
Network pruning, or sparse network, has a long history and practical sign...
Optimal Quantization for Batch Normalization in Neural Network Deployments and Beyond
Quantized Neural Networks (QNNs) use low-bitwidth fixed-point numbers f...
Intervention Generative Adversarial Networks
In this paper we propose a novel approach for stabilizing the training p...
An Asymptotically Optimal Multi-Armed Bandit Algorithm and Hyperparameter Optimization
The evaluation of hyperparameters, neural architectures, or data augment...
Communication Efficient Decentralized Training with Multiple Local Updates
Communication efficiency plays a significant role in decentralized optim...
Distillation ≈ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval For Overparameterized Neural Network
Distillation is a method to transfer knowledge from one model to another...
A Stochastic Proximal Point Algorithm for Saddle-Point Problems
We consider saddle point problems whose objective functions are the aver...
A General Analysis Framework of Lower Complexity Bounds for Finite-Sum Optimization
This paper studies the lower bound complexity for the optimization probl...
Towards Better Generalization: BP-SVRG in Training Deep Neural Networks
Stochastic variance-reduced gradient (SVRG) is a classical optimization ...
On the Convergence of FedAvg on Non-IID Data
Federated learning enables a large number of edge computing devices to l...
A Gram-Gauss-Newton Method Learning Overparameterized Deep Neural Networks for Regression Problems
First-order methods such as stochastic gradient descent (SGD) are curren...
A Unified Framework for Regularized Reinforcement Learning
We propose and study a general framework for regularized Markov decision...
Lipschitz Generative Adversarial Nets
In this paper we study the convergence of generative adversarial network...
Do Subsampled Newton Methods Work for High-Dimensional Data?
Subsampled Newton methods approximate Hessian matrices through subsampli...
Hierarchical Attention: What Really Counts in Various NLP Tasks
Attention mechanisms in sequence-to-sequence models have shown great abi...
Interpolatron: Interpolation or Extrapolation Schemes to Accelerate Optimization for Deep Neural Networks
In this paper we explore acceleration techniques for large-scale nonconv...
A Unifying Framework for Convergence Analysis of Approximate Newton Methods
Many machine learning models are reformulated as optimization problems. ...
An Efficient Character-Level Neural Machine Translation
Neural machine translation aims at building a single large neural networ...
A Proximal Stochastic Quasi-Newton Algorithm
In this paper, we discuss the problem of minimizing the sum of two conve...
Wishart Mechanism for Differentially Private Principal Components Analysis
We propose a new input perturbation mechanism for publishing a covarianc...
Nonconvex Penalization in Sparse Estimation: An Approach Based on the Bernstein Function
In this paper we study nonconvex penalization using Bernstein functions ...
A Parallel Algorithm for X-Armed Bandits
The target of the X-armed bandit problem is to find the global maximum of an...
A Scalable and Extensible Framework for Superposition-Structured Models
In many learning tasks, structural models usually lead to better interpr...
Adjusting Leverage Scores by Row Weighting: A Practical Approach to Coherent Matrix Completion
Low-rank matrix completion is an important problem with extensive real-w...
Group Orbit Optimization: A Unified Approach to Data Normalization
In this paper we propose and study an optimization problem over a matrix...
The Bernstein Function: A Unifying Framework of Nonconvex Penalization in Sparse Estimation
In this paper we study nonconvex penalization using Bernstein functions....
The Matrix Ridge Approximation: Algorithms and Applications
We are concerned with an approximation problem for a symmetric positive ...
Compound Poisson Processes, Latent Shrinkage Priors and Bayesian Nonconvex Penalization
In this paper we discuss Bayesian nonconvex penalization for sparse lear...
Kinetic Energy Plus Penalty Functions for Sparse Estimation
In this paper we propose and study a family of sparsity-inducing penalty...
A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound
The CUR matrix decomposition is an important extension of Nyström approx...
Bayesian Multicategory Support Vector Machines
We show that the multiclass support vector machine (MSVM) proposed by L...
EP-GIG Priors and Applications in Bayesian Sparse Learning
In this paper we propose a novel framework for the construction of spars...
Coherence Functions with Applications in Large-Margin Classification Methods
Support vector machines (SVMs) naturally embody sparseness due to their ...
