Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning

10/05/2020
by   Tsvetomila Mihaylova, et al.

Latent structure models are a powerful tool for modeling language data: they can mitigate error propagation and the annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, whose gradient is null almost everywhere. In this paper, we focus on surrogate gradients, a popular strategy for dealing with this problem. We explore latent structure learning through the lens of pulling back the downstream learning objective. In this paradigm, we discover a principled motivation for both the straight-through estimator (STE) and the recently proposed SPIGOT, a variant of STE for structured models. Our perspective leads to new algorithms in the same family. We empirically compare the known and the novel pulled-back estimators against popular alternatives, yielding new insight for practitioners and revealing intriguing failure cases.
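To make the core idea concrete, here is a minimal numpy sketch of the straight-through estimator the abstract refers to (this is an illustration of the general technique, not the paper's code): the forward pass takes a hard argmax, whose true gradient is zero almost everywhere, while the backward pass pretends the argmax was the identity and passes the downstream gradient straight through to the scores. The loss `L(z) = -z . w` and the names below are hypothetical, chosen only for the example.

```python
import numpy as np

def one_hot_argmax(s):
    """Forward pass: hard argmax as a one-hot vector.
    Its true gradient w.r.t. s is zero almost everywhere."""
    z = np.zeros_like(s)
    z[np.argmax(s)] = 1.0
    return z

def ste_backward(grad_z):
    """Straight-through estimator: treat the forward map as the identity,
    so the gradient w.r.t. z is passed through unchanged to the scores s."""
    return grad_z

# Toy example (hypothetical): scores s from an encoder, downstream
# loss L(z) = -z . w, so dL/dz = -w.
s = np.array([0.2, 1.5, 0.3])
w = np.array([1.0, -1.0, 0.5])

z = one_hot_argmax(s)          # forward: one-hot at index 1
grad_z = -w                    # downstream gradient dL/dz
grad_s = ste_backward(grad_z)  # surrogate gradient for the scores
```

SPIGOT refines this recipe for structured outputs by projecting the straight-through update back onto the feasible polytope rather than passing it through unchanged.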


Related research

07/24/2019: Notes on Latent Structure Models and SPIGOT
These notes aim to shed light on the recently proposed structured projec...

01/18/2023: Discrete Latent Structure in Neural Networks
Many types of data from fields including natural language processing, co...

09/03/2018: Towards Dynamic Computation Graphs via Sparse Latent Structure
Deep NLP models benefit from underlying structures in the data---e.g., p...

01/03/2022: Learning with Latent Structures in Natural Language Processing: A Survey
While end-to-end learning with fully differentiable models has enabled t...

07/28/2022: Latent Properties of Lifelong Learning Systems
Creating artificial intelligence (AI) systems capable of demonstrating l...

10/28/2021: Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces
Structured latent variables allow incorporating meaningful prior knowled...

02/27/2019: Alternating Synthetic and Real Gradients for Neural Language Modeling
Training recurrent neural networks (RNNs) with backpropagation through t...
