
-
Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model
Masked Language Model (MLM) framework has been widely adopted for self-s...
-
Jointly Modeling Intra- and Inter-transaction Dependencies with Hierarchical Attentive Transaction Embeddings for Next-item Recommendation
A transaction-based recommender system (TBRS) aims to predict the next i...
-
Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization
We consider the setting of distributed empirical risk minimization where...
-
Statistical Adaptive Stochastic Gradient Methods
We propose a statistical adaptive procedure called SALSA for automatical...
-
Understanding the Role of Momentum in Stochastic Gradient Methods
The use of momentum in stochastic gradient methods has become a widespre...
-
Joint Computation and Communication Design for UAV-Assisted Mobile Edge Computing in IoT
Unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) syste...
-
Using Statistics to Automate Stochastic Optimization
Despite the development of numerous adaptive optimizers, tuning the lear...
-
Multi-Level Composite Stochastic Optimization via Nested Variance Reduction
We consider multi-level composite optimization problems where each mappi...
-
A Stochastic Composite Gradient Method with Incremental Variance Reduction
We consider the problem of minimizing the composition of a smooth (nonco...
-
Hyperbolic Interaction Model For Hierarchical Multi-Label Classification
Different from the traditional classification tasks which assume mutual ...
-
Label-aware Document Representation via Hybrid Attention for Extreme Multi-Label Text Classification
Extreme multi-label text classification (XMTC) aims at tagging a documen...
-
Secrecy Energy Efficiency Maximization for UAV-Enabled Mobile Relaying
This paper investigates the secrecy energy efficiency (SEE) maximization...
-
Learning SMaLL Predictors
We present a new machine learning technique for training small resource-...
-
Smoothed Dual Embedding Control
We revisit the Bellman optimality equation with Nesterov's smoothing tec...
-
DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization
Machine learning with big data often involves large optimization models....
-
Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms
We consider empirical risk minimization of linear predictors with convex...
-
Stochastic Variance Reduction Methods for Policy Evaluation
Policy evaluation is a crucial step in many reinforcement-learning proce...
-
Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss
We consider distributed convex optimization problems originated from sam...
-
A Proximal Stochastic Gradient Method with Progressive Variance Reduction
We consider the problem of minimizing the sum of two convex functions: o...
-
A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming
We propose a randomized nonmonotone block proximal gradient (RNBPG) meth...
-
On the Complexity Analysis of Randomized Block-Coordinate Descent Methods
In this paper we analyze the randomized block-coordinate descent (RBCD) ...
-
A Proximal-Gradient Homotopy Method for the Sparse Least-Squares Problem
We consider solving the ℓ_1-regularized least-squares (ℓ_1-LS) problem i...