Sparse Continuous Distributions and Fenchel-Young Losses

08/04/2021
by André F. T. Martins, et al.

Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, there has been recent work on sparse alternatives to softmax (e.g., sparsemax, α-entmax, and fusedmax) and corresponding losses, which have varying support. This paper expands that line of work in several directions: first, we extend Ω-regularized prediction maps and Fenchel-Young losses to arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When Ω is a Tsallis negentropy with parameter α, we obtain "deformed exponential families," which include α-entmax and sparsemax (α = 2) as particular cases. For quadratic energy functions in continuous domains, the resulting densities are β-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When Ω is a total variation or Sobolev regularizer, we obtain a continuous version of fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for α ∈ {1, 4/3, 3/2, 2}. Using these, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions.
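The moment-matching property is easiest to see in the finite case. Below is a minimal NumPy sketch of our own (not the paper's code; the function names `sparsemax` and `fy_loss_sparsemax` are illustrative): sparsemax computed by Euclidean projection onto the simplex, the Fenchel-Young loss it induces when Ω is the Gini negentropy Ω(p) = ½‖p‖² − ½ (the α = 2 case), and a finite-difference check that the loss gradient equals p̂ − y, i.e., fitted statistics minus observed statistics.

```python
import numpy as np

def sparsemax(theta):
    # Euclidean projection of the score vector theta onto the
    # probability simplex (Martins & Astudillo, 2016).
    z = np.sort(theta)[::-1]
    cssv = np.cumsum(z) - 1.0
    k = np.arange(1, len(theta) + 1)
    support = z * k > cssv            # coordinates kept in the support
    rho = k[support][-1]
    tau = cssv[support][-1] / rho     # threshold subtracted from the scores
    return np.maximum(theta - tau, 0.0)

def fy_loss_sparsemax(theta, y):
    # Fenchel-Young loss induced by the Gini negentropy
    # Omega(p) = 0.5 * ||p||^2 - 0.5 (alpha = 2, the sparsemax loss):
    # L(theta, y) = Omega*(theta) + Omega(y) - <theta, y>.
    p = sparsemax(theta)
    return theta @ (p - y) - 0.5 * p @ p + 0.5 * y @ y

theta = np.array([1.0, 0.5, -1.0])
y = np.array([0.0, 1.0, 0.0])         # one-hot target
grad = sparsemax(theta) - y           # fitted minus observed statistics

# Finite-difference check that the loss gradient is exactly that mismatch.
eps = 1e-6
grad_fd = np.array([
    (fy_loss_sparsemax(theta + eps * e, y)
     - fy_loss_sparsemax(theta - eps * e, y)) / (2 * eps)
    for e in np.eye(3)])
print(np.allclose(grad, grad_fd, atol=1e-5))  # True
```

At the minimum the gradient vanishes, so p̂ = y: the predicted distribution matches the observed statistics, which is the property the paper generalizes beyond exponential families.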
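In continuous domains with a one-dimensional quadratic energy f(t) = −(t − μ)²/(2σ²), the α = 2 case yields a truncated parabola (an Epanechnikov-shaped density with compact support). The sketch below is our own illustration under that assumption, with the support radius obtained by requiring the density [f(t) − τ]₊ to integrate to one; `two_gaussian_density` is a hypothetical helper name.

```python
import numpy as np

def two_gaussian_density(t, mu=0.0, sigma2=1.0):
    # alpha = 2 (sparsemax) density for the quadratic energy
    # f(t) = -(t - mu)^2 / (2 * sigma2): a truncated parabola.
    # The support radius a solves the normalization condition
    # integral_{-a}^{a} (a^2 - u^2) / (2 * sigma2) du = 1,
    # which gives a = (3 * sigma2 / 2)^(1/3).
    a = (1.5 * sigma2) ** (1.0 / 3.0)
    return np.maximum((a ** 2 - (t - mu) ** 2) / (2.0 * sigma2), 0.0)

t = np.linspace(-5.0, 5.0, 100001)
p = two_gaussian_density(t, mu=0.0, sigma2=1.0)
print(np.trapz(p, t))                   # ~1.0: the density is normalized
print(np.trapz(t * p, t))               # ~0.0: the mean equals mu
print(t[p > 0].min(), t[p > 0].max())   # compact support, unlike a Gaussian
```

The compact support is what makes these densities useful for attention: probability mass is assigned to a bounded interval and exactly zero elsewhere.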


