It is believed that Gradient Descent (GD) induces an implicit bias towar...
Learning mappings between infinite-dimensional function spaces has achie...
It is well-known that modern neural networks are vulnerable to adversari...
The study of accelerated gradient methods in Riemannian optimization has...
Distributionally robust optimization (DRO) is a widely-used approach to ...
In recent years, the success of deep learning has inspired many research...
Gradient clipping is commonly used in training deep neural networks part...