Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses

07/21/2023
by Eloi Tanguy, et al.

Optimal Transport has attracted keen interest in recent years, in particular thanks to the Wasserstein distance, which provides a geometrically sensible and intuitive way of comparing probability measures. For computational reasons, the Sliced Wasserstein (SW) distance was introduced as an alternative to the Wasserstein distance, and has been used for training generative Neural Networks (NNs). While convergence of Stochastic Gradient Descent (SGD) has been observed in practice in such a setting, there is, to our knowledge, no theoretical guarantee for this observation. Leveraging recent work on the convergence of SGD on non-smooth and non-convex functions by Bianchi et al. (2022), we aim to bridge that knowledge gap and provide a realistic context under which fixed-step SGD trajectories for the SW loss on NN parameters converge. More precisely, we show that the trajectories approach the set of solutions of the (sub)-gradient flow equations as the step size decreases. Under stricter assumptions, we show a much stronger convergence result for noised and projected SGD schemes, namely that the long-run limits of the trajectories approach a set of generalised critical points of the loss function.
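For readers unfamiliar with the SW distance mentioned above, the following is a minimal illustrative sketch, not the authors' implementation. It assumes two point clouds with the same number of points and uniform weights, and estimates the squared 2-SW distance by Monte Carlo over random directions. The function name `sliced_wasserstein_sq` and the parameter `n_projections` are hypothetical choices made for this example.

```python
import numpy as np


def sliced_wasserstein_sq(x, y, n_projections=64, rng=None):
    """Monte Carlo estimate of the squared 2-Sliced Wasserstein distance
    between two uniform empirical measures with the same number of points.

    Illustrative sketch only: names and defaults are assumptions.
    """
    rng = np.random.default_rng(rng)
    n, d = x.shape
    if y.shape != (n, d):
        raise ValueError("x and y must have the same shape (n, d)")

    # Draw random directions uniformly on the unit sphere of R^d.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project both clouds on each direction, then sort along the point axis.
    # In 1D, optimal transport between equal-size uniform samples pairs
    # the sorted points, so the 1D squared cost is a mean squared difference.
    x_proj = np.sort(x @ theta.T, axis=0)
    y_proj = np.sort(y @ theta.T, axis=0)

    # Average over points and over directions.
    return float(np.mean((x_proj - y_proj) ** 2))
```

Each projection costs only a sort, which is the computational motivation for slicing. The sort also makes the loss non-smooth in the point positions, which is why convergence results for non-smooth, non-convex SGD are relevant in this setting.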


research
07/19/2023

Properties of Discrete Sliced Wasserstein Losses

The Sliced Wasserstein (SW) distance has become a popular alternative to...
research
04/18/2023

Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks

We extend the global convergence result of Chatterjee by consider...
research
02/16/2021

Convergence of stochastic gradient descent schemes for Lojasiewicz-landscapes

In this article, we consider convergence of stochastic gradient descent ...
research
03/06/2023

Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss

We consider a deep matrix factorization model of covariance matrices tra...
research
07/11/2023

Measure transfer via stochastic slicing and matching

This paper studies iterative schemes for measure transfer and approximat...
research
06/12/2020

Projection Robust Wasserstein Distance and Riemannian Optimization

Projection robust Wasserstein (PRW) distance, or Wasserstein projection ...
research
08/13/2019

On the Convergence of AdaBound and its Connection to SGD

Adaptive gradient methods such as Adam have gained extreme popularity du...
