Learning Invariances for Interpretability using Supervised VAE

07/15/2020
by An-phi Nguyen, et al.

We propose to learn model invariances as a means of interpreting a model. This is motivated by a reverse-engineering principle: if we understand a problem, we may introduce inductive biases in our model in the form of invariances; conversely, when interpreting a complex supervised model, we can study its invariances to understand how that model solves a problem. To this end, we propose a supervised form of variational auto-encoder (VAE). Crucially, only a subset of the dimensions in the latent space contributes to the supervised task, allowing the remaining dimensions to act as nuisance parameters. By sampling solely the nuisance dimensions, we are able to generate samples that have undergone transformations that leave the classification unchanged, revealing the invariances of the model. Our experimental results show the capability of the proposed model both in terms of classification and of generating invariantly transformed samples. Finally, we show that by combining our model with feature attribution methods it is possible to reach a more fine-grained understanding of the model's decision process.
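The abstract describes an architecture in which a classifier head reads only a task-relevant slice of the VAE latent code, and invariant samples are obtained by resampling the remaining nuisance dimensions. The following minimal PyTorch sketch illustrates that idea under stated assumptions: the MLP encoder/decoder, layer sizes, latent split (`task_dims`), and loss weights (`beta`, `gamma`) are illustrative choices, not the authors' exact configuration.

```python
# Sketch of a supervised VAE with a latent space split into task-relevant and
# nuisance dimensions. Layer sizes, the MLP backbone, and loss weights are
# illustrative assumptions, not the paper's reported setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupervisedVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16, task_dims=4, n_classes=10):
        super().__init__()
        self.task_dims = task_dims  # only these latent dims feed the classifier
        self.encoder = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)
        self.logvar = nn.Linear(256, z_dim)
        self.decoder = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, x_dim))
        self.classifier = nn.Linear(task_dims, n_classes)

    def encode(self, x):
        h = self.encoder(x)
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        x_hat = self.decoder(z)
        logits = self.classifier(z[:, :self.task_dims])  # nuisance dims excluded
        return x_hat, logits, mu, logvar

def loss_fn(x, x_hat, logits, y, mu, logvar, beta=1.0, gamma=1.0):
    # Reconstruction + KL (standard VAE terms) + classification on task dims.
    rec = F.mse_loss(x_hat, x, reduction="sum") / x.size(0)
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    ce = F.cross_entropy(logits, y)
    return rec + beta * kl + gamma * ce

@torch.no_grad()
def invariant_samples(model, x, n=8):
    """Keep task dims fixed, resample nuisance dims from the prior, decode."""
    mu, _ = model.encode(x)            # x: a single input of shape (1, x_dim)
    z = mu.repeat(n, 1)
    z[:, model.task_dims:] = torch.randn(
        n, z.size(1) - model.task_dims, device=z.device)
    return model.decoder(z)  # transformed inputs the classifier should treat alike
```

Passing a single input through `invariant_samples` yields decodings that the classifier head should label identically, which is how the learned invariances can be inspected directly.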

