Stabilized training of joint energy-based models and their practical applications

03/07/2023
by Martin Sustek, et al.

The recently proposed Joint Energy-based Model (JEM) interprets a discriminatively trained classifier p(y|x) as an energy-based model, which is simultaneously trained as a generative model of the distribution of the input observations p(x). JEM training relies on "positive examples" (i.e., examples from the training data set) as well as on "negative examples", which are samples from the modeled distribution p(x) generated by means of Stochastic Gradient Langevin Dynamics (SGLD). Unfortunately, SGLD often fails to deliver negative samples of sufficient quality during standard JEM training, which leads to a highly unbalanced contribution of the positive and negative examples to the gradients used for JEM updates. As a consequence, standard JEM training is quite unstable, requiring careful tuning of hyper-parameters and frequent restarts when training starts to diverge. This makes it difficult to apply JEM to different neural network architectures, modalities, and tasks. In this work, we propose a training procedure that stabilizes SGLD-based JEM training (ST-JEM) by balancing the contributions from the positive and negative examples. We also propose adding a "regularization" term to the training objective: the mutual information (MI) between the input observations x and the output labels y, which encourages the JEM classifier to make more certain decisions about the output labels. We demonstrate the effectiveness of our approach on the CIFAR10 and CIFAR100 tasks. We also consider the task of classifying phonemes in a speech signal, for which we were unable to train JEM without the proposed stabilization. We show that convincing speech can be generated from the trained model. Alternatively, corrupted speech can be de-noised by bringing it closer to the modeled speech distribution with a few SGLD iterations. We also propose and discuss further applications of the trained model.
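To make the two ingredients mentioned in the abstract concrete, here is a minimal PyTorch sketch of SGLD sampling of "negative examples" from the JEM energy E(x) = -logsumexp_y f(x)[y]. This is not the authors' code: the function name `sgld_negative_samples` and the default step size, noise level, and step count are illustrative assumptions, and the paper's actual hyper-parameters may differ.

```python
import torch

def sgld_negative_samples(f, x_init, n_steps=20, step_size=1.0, noise_std=0.01):
    """Approximately sample from p(x) proportional to exp(-E(x)) via SGLD,
    where E(x) = -logsumexp_y f(x)[y] is the JEM energy and f maps a
    batch of inputs to classifier logits."""
    x = x_init.clone().detach()
    for _ in range(n_steps):
        x.requires_grad_(True)
        energy = -torch.logsumexp(f(x), dim=1).sum()  # JEM energy, summed over the batch
        grad = torch.autograd.grad(energy, x)[0]      # dE/dx
        # Langevin update: descend the energy surface and inject Gaussian noise
        x = (x - step_size * grad + noise_std * torch.randn_like(x)).detach()
    return x
```

The same loop, started from a corrupted input rather than from noise, corresponds to the de-noising application described above: a few SGLD iterations pull the input closer to the modeled distribution p(x).

The MI "regularization" term can likewise be estimated from a batch of classifier outputs as MI(x; y) = H(y) - H(y|x): maximizing it rewards confident per-example decisions (low conditional entropy H(y|x)) while keeping the predicted labels diverse across the batch (high marginal entropy H(y)). Again a hypothetical sketch under those assumptions, not the paper's implementation:

```python
def mi_regularizer(logits, eps=1e-8):
    """Batch estimate of MI(x; y) = H(y) - H(y|x) from classifier logits."""
    p = torch.softmax(logits, dim=1)                           # p(y|x_i) per example
    h_y_given_x = -(p * torch.log(p + eps)).sum(dim=1).mean()  # average conditional entropy
    p_y = p.mean(dim=0)                                        # batch estimate of the marginal p(y)
    h_y = -(p_y * torch.log(p_y + eps)).sum()                  # marginal entropy
    return h_y - h_y_given_x
```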
