Wasserstein GANs with Gradient Penalty Compute Congested Transport
Wasserstein GANs with Gradient Penalty (WGAN-GP) are an extremely popular method for training generative models to produce high-quality synthetic data. While WGAN-GP were initially developed to calculate the Wasserstein-1 distance between generated and real data, recent works (e.g., Stanczuk et al. (2021)) have provided empirical evidence that this does not occur, and have argued that WGAN-GP perform well not in spite of this issue, but because of it. In this paper we show for the first time that WGAN-GP compute the minimum of a different optimal transport problem, the so-called congested transport (Carlier et al. (2008)). Congested transport determines the cost of moving one distribution to another under a transport model that penalizes congestion. For WGAN-GP, we find that the congestion penalty has a spatially varying component, determined by the sampling strategy used in Gulrajani et al. (2017), which acts like a local speed limit, making congestion less costly in some regions than in others. This aspect of the congested transport problem is new in that the congestion penalty turns out to be unbounded and to depend on the distributions to be transported, and so we provide the necessary mathematical proofs for this setting. We use our discovery to show that the gradients of solutions to the optimization problem in WGAN-GP determine the time-averaged momentum of the optimal mass flow. This is in contrast to the gradients of Kantorovich potentials for the Wasserstein-1 distance, which only determine the normalized direction of flow. This may explain, in support of Stanczuk et al. (2021), the success of WGAN-GP, since the training of the generator is based on these gradients.
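For readers unfamiliar with the sampling strategy the abstract refers to, the following is a minimal sketch of the gradient penalty term from Gulrajani et al. (2017), not code from this paper. It assumes the standard formulation: points are drawn uniformly on line segments between paired real and generated samples, and the critic is penalized by the squared deviation of its gradient norm from 1, weighted by a coefficient (10 is the default in that work). The function names `interpolate` and `gradient_penalty` are hypothetical, and the critic gradient is supplied by hand as `grad_critic` rather than computed by automatic differentiation, purely to keep the sketch self-contained.

```python
import math
import random


def interpolate(x_real, x_fake, t):
    """Return the point t * x_real + (1 - t) * x_fake on the segment between the two samples."""
    return [t * r + (1.0 - t) * f for r, f in zip(x_real, x_fake)]


def gradient_penalty(grad_critic, real_batch, fake_batch, lam=10.0, rng=None):
    """Monte Carlo estimate of lam * E[(||grad f(x_hat)|| - 1)^2].

    grad_critic: hypothetical callable returning the critic gradient at a point
                 (hand-written here; a real implementation would use autodiff).
    real_batch, fake_batch: equally sized lists of points (lists of floats).
    lam: penalty weight (assumed default of 10).
    """
    rng = rng or random.Random(0)
    total = 0.0
    for x_real, x_fake in zip(real_batch, fake_batch):
        t = rng.random()                       # uniform position along the segment
        x_hat = interpolate(x_real, x_fake, t)
        g = grad_critic(x_hat)
        norm = math.sqrt(sum(c * c for c in g))
        total += (norm - 1.0) ** 2             # two-sided penalty around norm 1
    return lam * total / len(real_batch)


# Toy data: two paired samples in the plane.
real = [[1.0, 0.0], [0.0, 1.0]]
fake = [[0.0, 0.0], [2.0, 2.0]]

# A linear critic f(x) = 3*x0 + 4*x1 has constant gradient (3, 4) with norm 5.
penalty_linear = gradient_penalty(lambda x: [3.0, 4.0], real, fake)

# A quadratic critic f(x) = 0.5 * ||x||^2 has gradient x, so the penalty
# depends on where the interpolated points fall.
penalty_quadratic = gradient_penalty(lambda x: list(x), real, fake)
```

Because the penalty is only evaluated at the interpolated points, the regions of space where the gradient norm is constrained depend on both distributions. This is the dependence that the abstract describes as a spatially varying congestion penalty.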