
Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
It is well-known that stochastic gradient noise (SGN) acts as implicit r...
Amata: An Annealing Mechanism for Adversarial Training Acceleration
Despite the empirical success in various domains, it has been revealed t...
Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher
Knowledge distillation is a strategy of training a student network with ...
Neural Approximate Sufficient Statistics for Implicit Models
We consider the fundamental problem of how to automatically construct su...
Informative Dropout for Robust Representation Learning: A Shape-bias Perspective
Convolutional Neural Networks (CNNs) are known to rely more on local tex...
Spherical Motion Dynamics of Deep Neural Networks with Batch Normalization and Weight Decay
We comprehensively reveal the learning dynamics of deep neural networks ...
Classify and Generate Reciprocally: Simultaneous Positive-Unlabelled Learning and Conditional Generation with Extra Data
The scarcity of class-labeled data is a ubiquitous bottleneck in a wide ...
Global Robustness Verification Networks
The wide deployment of deep neural networks, though achieving great succ...
Black-Box Certification with Randomized Smoothing: A Functional Optimization Based Framework
Randomized classifiers have been shown to provide a promising approach f...
Patch-level Neighborhood Interpolation: A General and Effective Graph-based Regularization Strategy
Regularization plays a crucial role in machine learning models, especial...
Towards Making Deep Transfer Learning Never Hurt
Transfer learning has been frequently used to improve deep neural netwo...
Spatiotemporal Manifold Learning for Human Motions via Long-horizon Modeling
Data-driven modeling of human motions is ubiquitous in computer graphics...
AdaGCN: Adaboosting Graph Convolutional Networks into Deep Models
The design of deep graph models still remains to be investigated and the...
The Multiplicative Noise in Stochastic Gradient Descent: Data-Dependent Regularization, Continuous and Discrete Approximation
The randomness in Stochastic Gradient Descent (SGD) is considered to pla...
Differentiable Neural Architecture Search via Proximal Iterations
Neural architecture search (NAS) recently attracts much research attenti...
On the Learning Dynamics of Two-layer Nonlinear Convolutional Neural Networks
Convolutional neural networks (CNNs) have achieved remarkable performanc...
Interpreting Adversarially Trained Convolutional Neural Networks
We attempt to interpret how adversarially trained convolutional neural n...
Bayesian Optimized Continual Learning with Attention Mechanism
Though neural networks have achieved much progress in various applicatio...
You Only Propagate Once: Accelerating Adversarial Training Using Maximal Principle
Deep learning achieves state-of-the-art results in many areas. However r...
You Only Propagate Once: Painless Adversarial Training Using Maximal Principle
Deep learning achieves state-of-the-art results in many areas. However r...
ST-UNet: A Spatio-Temporal U-Network for Graph-structured Time Series Modeling
The spatiotemporal graph learning is becoming an increasingly important...
3D Graph Convolutional Networks with Temporal Graphs: A Spatial Information Free Framework For Traffic Forecasting
Spatiotemporal prediction plays an important role in many application a...
Virtual Adversarial Training on Graph Convolutional Networks in Node Classification
The effectiveness of Graph Convolutional Networks (GCNs) has been demons...
Multi-Stage Self-Supervised Learning for Graph Convolutional Networks
Graph Convolutional Networks (GCNs) play a crucial role in graph learning...
Enhancing the Robustness of Deep Neural Networks by Boundary Conditional GAN
Deep neural networks have been widely deployed in various machine learni...
Towards Understanding Adversarial Examples Systematically: Exploring Data Size, Task and Model Factors
Most previous works usually explained adversarial examples from several ...
Quasipotential as an implicit regularizer for the loss function in the stochastic gradient descent
We interpret the variational inference of the Stochastic Gradient Descen...
Tangent-Normal Adversarial Regularization for Semi-supervised Learning
The ever-increasing size of modern datasets combined with the difficulty...
Neural Control Variates for Variance Reduction
In statistics and machine learning, approximation of an intractable inte...
Reinforced Continual Learning
Most artificial intelligence models have limited ability to solve new t...
The Regularization Effects of Anisotropic Noise in Stochastic Gradient Descent
Understanding the generalization of deep learning has raised lots of con...
Understanding and Enhancing the Transferability of Adversarial Examples
State-of-the-art deep neural networks are known to be vulnerable to adve...
Spatiotemporal Graph Convolutional Neural Network: A Deep Learning Framework for Traffic Forecasting
The goal of traffic forecasting is to predict the future vital indicator...
Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes
It is widely observed that deep learning models with learned parameters ...
Learning with Noise: Enhance Distantly Supervised Relation Extraction with Dynamic Transition Matrix
Distant supervision significantly reduces human efforts in building trai...
Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks
Minimizing non-convex and high-dimensional objective functions is challe...
Stochastic Parallel Block Coordinate Descent for Large-scale Saddle Point Problems
We consider convex-concave saddle point problems with a separable struct...
Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling
Monte Carlo sampling for Bayesian posterior inference is a common approa...
Adaptive Stochastic Primal-Dual Coordinate Descent for Separable Saddle Point Problems
We consider a generic convex-concave saddle point problem with separable...
Zhanxing Zhu
