Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning

10/05/2020
by Tsvetomila Mihaylova, et al.

Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, which has null gradient. In this paper, we focus on surrogate gradients, a popular strategy to deal with this problem. We explore latent structure learning through the angle of pulling back the downstream learning objective. In this paradigm, we discover a principled motivation for both the straight-through estimator (STE) and the recently proposed SPIGOT, a variant of STE for structured models. Our perspective leads to new algorithms in the same family. We empirically compare the known and the novel pulled-back estimators against the popular alternatives, yielding new insight for practitioners and revealing intriguing failure cases.
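The null gradient of argmax is the crux of the problem the abstract describes. As a minimal illustrative sketch (PyTorch-style Python; the helper name `ste_argmax` and the softmax surrogate are our assumptions, not the paper's code), the straight-through estimator can be written so that the forward pass emits a hard one-hot vector while the backward pass borrows the gradient of a differentiable relaxation:

```python
import torch

def ste_argmax(scores: torch.Tensor) -> torch.Tensor:
    """Straight-through argmax: hard forward pass, surrogate backward pass."""
    soft = torch.softmax(scores, dim=-1)  # differentiable surrogate
    hard = torch.nn.functional.one_hot(
        scores.argmax(dim=-1), num_classes=scores.shape[-1]
    ).to(scores.dtype)
    # Forward value is exactly `hard`; backprop sees only `soft`.
    return (hard - soft).detach() + soft

# Toy check: surrogate gradients reach `scores` despite the discrete output.
scores = torch.randn(3, 5, requires_grad=True)
downstream = (ste_argmax(scores) * torch.arange(5.0)).sum()  # toy objective
downstream.backward()
print(scores.grad)  # finite, non-null gradients
```

The classical STE instead treats argmax as the identity on the backward pass; replacing `soft` with `scores` in the return line gives that variant. SPIGOT extends the same idea to structured argmax (e.g., over parse trees), additionally projecting the surrogate update back onto the feasible structured polytope.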

Related research

07/24/2019 · Notes on Latent Structure Models and SPIGOT
These notes aim to shed light on the recently proposed structured projec...

01/18/2023 · Discrete Latent Structure in Neural Networks
Many types of data from fields including natural language processing, co...

09/03/2018 · Towards Dynamic Computation Graphs via Sparse Latent Structure
Deep NLP models benefit from underlying structures in the data---e.g., p...

01/03/2022 · Learning with Latent Structures in Natural Language Processing: A Survey
While end-to-end learning with fully differentiable models has enabled t...

07/28/2022 · Latent Properties of Lifelong Learning Systems
Creating artificial intelligence (AI) systems capable of demonstrating l...

10/28/2021 · Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces
Structured latent variables allow incorporating meaningful prior knowled...

06/15/2020 · Gradient Estimation with Stochastic Softmax Tricks
The Gumbel-Max trick is the basis of many relaxed gradient estimators. T...