Scaling Matters in Deep Structured-Prediction Models

02/28/2019
by Aleksandr Shevchenko, et al.

Deep structured-prediction energy-based models combine the expressive power of learned representations with the ability to embed knowledge about the task at hand into the system. A common way to learn the parameters of such models is a multistage procedure in which different combinations of components are trained at different stages, and joint end-to-end training of the whole system is performed only as the final fine-tuning stage. This multistage approach is time-consuming and cumbersome, as it requires multiple runs until convergence and multiple rounds of hyperparameter tuning. From this point of view, it would be preferable to train the whole system jointly from the start. However, such approaches often fail unexpectedly and deliver worse results than multistage training. In this paper, we hypothesize that one reason joint training of deep energy-based models fails is incorrect relative normalization of the different components of the energy function. We propose online and offline scaling algorithms that fix joint training and demonstrate their efficacy on three different tasks.
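To make "relative normalization of components" concrete, here is a minimal Python sketch. It assumes the energy is a sum of named components (the names `unary` and `pairwise` are hypothetical) and uses a generic inverse-standard-deviation rule, in an offline form (statistics from a batch of sampled values) and an online form (exponential moving average during training). The function names, the `momentum` parameter and the scaling rule itself are illustrative assumptions, not the specific online and offline algorithms proposed in the paper.

```python
import math
import statistics
from typing import Dict, Iterable, List


def offline_scales(samples: Dict[str, List[float]], eps: float = 1e-8) -> Dict[str, float]:
    """Hypothetical offline rule: scale each energy component by the inverse of
    its population standard deviation over a batch of sampled values."""
    return {name: 1.0 / (statistics.pstdev(values) + eps) for name, values in samples.items()}


class OnlineScaler:
    """Hypothetical online rule: track an exponential moving average of each
    component's squared value and rescale by the inverse of its square root."""

    def __init__(self, names: Iterable[str], momentum: float = 0.9, eps: float = 1e-8):
        self.momentum = momentum
        self.eps = eps
        self.sq_avg: Dict[str, float] = {}
        self.names = list(names)

    def update(self, values: Dict[str, float]) -> Dict[str, float]:
        scales = {}
        for name in self.names:
            sq = values[name] ** 2
            prev = self.sq_avg.get(name)
            # First observation initializes the average; later ones are blended in.
            self.sq_avg[name] = sq if prev is None else self.momentum * prev + (1.0 - self.momentum) * sq
            scales[name] = 1.0 / (math.sqrt(self.sq_avg[name]) + self.eps)
        return scales


def total_energy(components: Dict[str, float], scales: Dict[str, float]) -> float:
    """Energy as a scaled sum of its components."""
    return sum(scales[name] * value for name, value in components.items())


if __name__ == "__main__":
    # A unary term on the order of 100s would otherwise swamp a pairwise term on the order of 0.1.
    scales = offline_scales({"unary": [100.0, 300.0], "pairwise": [0.1, 0.3]})
    print(scales)  # roughly {'unary': 0.01, 'pairwise': 10.0}
    print(total_energy({"unary": 200.0, "pairwise": 0.2}, scales))  # roughly 4.0
```

After rescaling, both components contribute about equally to the total energy, so neither one's gradients are drowned out by the other's raw magnitude. Dividing by a standard deviation is only one plausible normalization choice; the paper should be consulted for the rule it actually uses.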
