
R-Drop: Regularized Dropout for Neural Networks
Dropout is a powerful and widely used technique to regularize the traini...
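The abstract is truncated here, but R-Drop's core idea is well known: run each input through the network twice with independent dropout masks and add a symmetric KL term that pushes the two predictions toward each other. A minimal NumPy sketch of that consistency term — all shapes, names, and the toy linear model are illustrative assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p, rng):
    # Inverted dropout: zero each unit with probability p, rescale survivors.
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # KL(p || q) with a small epsilon for numerical safety.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

# Toy "model": one linear layer over a dropped-out feature vector.
x = rng.standard_normal(8)           # a single input's features
W = rng.standard_normal((8, 3))      # weights for 3 classes

p1 = softmax(dropout(x, 0.5, rng) @ W)   # first stochastic forward pass
p2 = softmax(dropout(x, 0.5, rng) @ W)   # second stochastic forward pass

# R-Drop consistency term: symmetric KL between the two predictions,
# added (scaled by a coefficient alpha) to the ordinary task loss.
alpha = 1.0
consistency = alpha * 0.5 * (kl(p1, p2) + kl(p2, p1))
```

In training, `consistency` would be averaged over the batch and summed with the cross-entropy of both passes; the sketch only shows the regularizer itself.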

PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Driven Adaptive Prior
Denoising diffusion probabilistic models have been recently proposed to ...

Incorporating NODE with Pretrained Neural Differential Operator for Learning Dynamics
Learning dynamics governed by differential equations is crucial for pred...

Machine-Learning Non-Conservative Dynamics for New-Physics Detection
Energy conservation is a basic physics principle, the breakdown of which...

UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost
Transformer architecture achieves great success in abundant natural lang...

Towards Accelerating Training of Batch Normalization: A Manifold Perspective
Batch normalization (BN) has become a crucial component across diverse d...

The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks
Despite their overwhelming capacity to overfit, deep neural networks tra...

Constructing Basis Path Set by Eliminating Path Dependency
The way the basis path set works in neural networks remains mysterious, a...

Dynamics of Stochastic Gradient Descent with State-Dependent Noise
Stochastic gradient descent (SGD) and its variants are mainstream method...

Interpreting Basis Path Set in Neural Networks
Based on basis path set, the G-SGD algorithm significantly outperforms conve...

Reinforcement Learning with Dynamic Boltzmann Softmax Updates
Value function estimation is an important task in reinforcement learning...
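The operator behind this line of work is the Boltzmann softmax, an exponentially weighted average of action values that interpolates between the mean (small β) and the max (large β); a dynamic schedule raises β over time. A short NumPy sketch of the operator itself — the variable names and the two β settings are illustrative assumptions:

```python
import numpy as np

def boltzmann_softmax(q, beta):
    # boltz_beta(q) = sum_i q_i * exp(beta * q_i) / sum_i exp(beta * q_i)
    # Recovers the mean of q as beta -> 0 and the max of q as beta -> inf.
    z = beta * (q - q.max())        # shift for numerical stability
    w = np.exp(z)
    return float(np.dot(q, w) / w.sum())

q = np.array([1.0, 2.0, 3.0])       # toy action values for one state
low = boltzmann_softmax(q, 0.001)   # small beta: close to the mean
high = boltzmann_softmax(q, 100.0)  # large beta: close to the max
```

In a value-update rule, `boltzmann_softmax` would replace the hard `max` over next-state action values, with β increased across iterations so the update approaches the standard Bellman backup.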

Positively Scale-Invariant Flatness of ReLU Neural Networks
It was empirically confirmed by Keskar et al. [SharpMinima] that flatter mi...

Target Transfer Q-Learning and Its Convergence Analysis
Q-learning is one of the most popular methods in Reinforcement Learning ...

Capacity Control of ReLU Neural Networks by Basis-path Norm
Recently, path norm was proposed as a new capacity measure for neural ne...

Differential Equations for Modeling Asynchronous Algorithms
Asynchronous stochastic gradient descent (ASGD) is a popular parallel op...

Optimizing Neural Networks in the Equivalent Class Space
It has been widely observed that many activation functions and pooling m...

Convergence Analysis of Distributed Stochastic Gradient Descent with Shuffling
When using stochastic gradient descent to solve large-scale machine lear...

Generalization Error Bounds for Optimization Algorithms via Stability
Many machine learning tasks can be formulated as Regularized Empirical R...
Qi Meng