Variational Encoder-based Reliable Classification

02/19/2020 ∙ by Chitresh Bhushan, et al.

Machine learning models provide statistically impressive results that may nevertheless be unreliable for individual predictions. To provide reliability, we propose an Epistemic Classifier (EC) that can justify its belief using support from the training dataset as well as the quality of reconstruction. Our approach is based on a modified variational auto-encoder that identifies a semantically meaningful low-dimensional space in which perceptually similar instances are also close in ℓ_2-distance. Our results demonstrate improved reliability of predictions and robust identification of adversarially attacked samples compared to a baseline of softmax-based thresholding.




1 Introduction

Individual prediction reliability is key in safety-critical applications of machine learning (ML) in healthcare, industrial controls, and autonomy. To provide this reliability, the notion of epistemic classifiers (EC) was recently introduced in [16]. An EC is a classifier that can justify its belief using support/evidence from neighborhoods in multiple layers, and it additionally provides exemplar-based interpretability using those supporting instances. In this paper, we propose epistemic encoders, where we co-train a variational auto-encoder (VAE) and a classifier to construct a low-dimensional, semantically meaningful embedding. The neighborhood support from training instances is then computed in that embedding to overcome the curse of dimensionality and to enforce agreement between ℓ_2-distance and semantic similarity. The VAE also provides a reconstruction score at inference time, which we use as additional support in the justification process.

MagNet [12] uses autoencoder reconstruction error to either reject or reform potentially adversarial examples before the example is provided to a classifier. Unlike MagNet, where the autoencoder is trained independently of the classifier, our approach performs joint training. In [7], a classifier was trained on the latent space of a VAE to generate adversarial attacks for generative models. Here, we use a similar architecture with co-training to defend the network. Since the support operator in ECs is non-differentiable, ECs are less vulnerable to, and computationally more expensive for, white-box attacks compared to MagNet.

We make the following contributions: (a) an approach to identify a semantically meaningful low-dimensional space for computing support using ℓ_2-distance; (b) the introduction of reconstruction quality as an additional justification mechanism to identify uncertainties that neighborhood support cannot resolve by itself.

2 Method

2.1 Epistemic Classifiers (EC)

Figure 1: Illustration of region of: trust (IK0, IK1), confusion (IMK), and extrapolation (IDK) for 2D-input binary classification with Epistemic Classifier using a base NN classifier.

EC provides an approach to enhance prediction reliability for a classifier; it builds on the theory of justified true belief from epistemology [5] and extends it to neural networks (NN) [16]. Specifically, ECs link the reliability of predictions on a test input to characteristics of the support gathered from hidden layers of the network. For a given test sample x, ECs generate support for x using training data as a mechanism to justify the class prediction for x. The support enables ECs to partition the input space into: regions of extrapolation (“I don’t know” or IDK), regions of confusion (“I may know” or IMK), and regions of trust (“I know” or IK). This enables annotating the classifier output (i.e., its belief) with IK, IMK, and IDK assertions (see Fig. 1 for an illustration). The traditional EC uses neighborhood-based support across multiple layers of a NN to obtain this justification. The support in layer l is defined as [16]:

S_l(x) = { y(z) : z ∈ N_l(A_l(x), T) }        (1)

where y(·) is the function that maps a training input to its training label, A_l(x) is the activation value in the l-th layer, and N_l is the neighborhood operator over the training data-set T.
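As a concrete illustration, the support operator of eq. (1) can be sketched as a brute-force k-NN search over stored training activations. This is a minimal sketch, not the paper's implementation; the names `support`, `train_acts`, and `k` are illustrative:

```python
import numpy as np

def support(x_act, train_acts, train_labels, k=3):
    """Support set S_l(x): labels of the k nearest training
    activations (l2 distance) to the test activation x_act."""
    d = np.linalg.norm(train_acts - x_act, axis=1)  # l2 distances to all training points
    nearest = np.argsort(d)[:k]                     # indices of the k nearest neighbors
    return set(train_labels[nearest])

# Toy activations: two well-separated clusters with labels 0 and 1.
train_acts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
train_labels = np.array([0, 0, 1, 1])
print(support(np.array([0.05, 0.0]), train_acts, train_labels, k=2))
```

A singleton support set indicates neighborhood agreement on one class; a multi-label set signals confusion that the justification step must resolve.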

The neighborhood operator is generally defined using a computationally tractable ℓ_2-norm distance. However, as most state-of-the-art classification networks use cascaded convolutional layers, an ℓ_2-norm based distance metric on layer activations does not necessarily reflect semantic or perceptual distance, especially in layers away from the output layer [2, 13]. Hence, use of ℓ_2-norm based support from early layers can lead to an ill-informed justification of the belief (or output), causing the epistemic classifier to assert IMK or IDK frequently in real-world applications. In response, we propose ECs that use a VAE to construct a semantically meaningful embedding for support generation and that augment support with the reconstruction loss of the autoencoder. Next, we formally introduce this extension to ECs and show how it imparts better prediction reliability.

2.2 Joint encoder-classifier approach

Figure 2: Outline of our modified VAE architecture that is co-trained with the classifier. ‘RB’ stands for the standard residual block as described in [4].

Owing to VAEs’ excellent generative properties, we hypothesize that the VAE’s latent space is a perceptually meaningful space for support computation and retains the ability to identify uncertainties without frequent IMK or IDK assertions. VAEs are also known to produce unexpected reconstructions under adversarial attack [7], which further motivated the use of a VAE in this work. Our EC shares the ‘encoder’ layers of the VAE, as shown in Fig. 2, and is jointly trained with the modified VAE.

VAEs use learned approximate Bayesian inference to generate samples that are similar to the training set [2, 1, 6]. A VAE indirectly maximizes the model distribution by minimizing an upper bound on its negative log-likelihood [6, 1, 11]. In our approach, we add another term to this upper bound that captures the classification decoding capacity of the learned model distribution. As shown in Fig. 2, let z be the latent code vector of the VAE, q(z|x) the encoding distribution, p(x|z) the decoding distribution, and p(z) the normal prior distribution imposed in the VAE; we then minimize the following modified loss function when training the model:

L = E_{z∼q(z|x)}[ −log p(x|z) ]  (Reconstruction)  +  CrossEntropy(q(z|x), p(z))  −  H(q(z|x))  +  λ · CrossEntropy(p(y|z), y)        (2)

where p(y|z) is the classification decoding distribution, y is the true classification decoding (known for training samples), and λ is a scalar weight for the classification loss. The first three terms are from [6]: the first term can be interpreted as the reconstruction loss of the VAE, and the remaining terms can be seen as regularization terms for model optimization [11]. The second term, CrossEntropy(q(z|x), p(z)), encourages the posterior and prior, which are chosen to be multivariate Gaussians in the VAE, to approach each other. The third term, the negative entropy −H(q(z|x)), encourages the posterior to have non-zero variance, which helps avoid badly scaled gradients during back-propagation. The fourth term captures the categorical cross-entropy for classification and can be seen as another regularization term that pushes the VAE to parameterize latent codes that are semantically meaningful and suitable for classification.
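For a diagonal Gaussian posterior against a standard normal prior, the cross-entropy and entropy terms above combine into the familiar VAE KL regularizer. This numerical sanity check (not part of any training code; the values of `mu` and `sigma2` are arbitrary) verifies that identity:

```python
import numpy as np

# Diagonal Gaussian posterior q = N(mu, diag(sigma2)), prior p = N(0, I).
mu = np.array([0.5, -1.0])
sigma2 = np.array([0.8, 1.5])

# Cross-entropy H(q, p) = 0.5 * sum(log 2*pi + mu^2 + sigma^2)
ce = 0.5 * np.sum(np.log(2 * np.pi) + mu**2 + sigma2)
# Entropy of q: H(q) = 0.5 * sum(log(2*pi*e*sigma^2))
ent = 0.5 * np.sum(np.log(2 * np.pi * np.e * sigma2))
# Their difference is exactly KL(q || p), the usual VAE regularizer.
kl = 0.5 * np.sum(mu**2 + sigma2 - np.log(sigma2) - 1)
print(np.isclose(ce - ent, kl))  # True
```

This is why minimizing the second term while subtracting the third is equivalent to the standard KL penalty of [6].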

2.3 Justification: Support and Reconstruction

We consider both the quality of reconstructed outputs and the support for justification in our EC. Unlike [16], we use only the latent code of our VAE to compute the support for an input x. Specifically, we use eq. (1) to compute the support S(x), where l takes only one value, corresponding to the encoder output with the latent code z. Our neighborhood operator is defined using a k-NN neighborhood operation that identifies the k nearest (ℓ_2-norm) training samples for an input.

We use two different image dissimilarity metrics to estimate the loss in the reconstructed output x̂ as compared to the input: mean-square error (MSE) and the Structural Similarity Index (SSIM) [17]. A function R is used to identify the quality of the reconstruction x̂ using thresholds τ_MSE and τ_SSIM. Reconstruction quality is identified as ‘Good’ by R if both losses are lower than their corresponding thresholds; otherwise it is considered ‘Bad’. Next, we construct the justification operator as

J(x) = S(x),            if R(x̂) = ‘Good’
J(x) = S(x) ∪ {ω},      otherwise        (3)

where ω is an arbitrary element that reflects bad reconstruction quality. In other words, when the reconstruction error is low, J(x) mirrors S(x) and the justification set remains unchanged. However, when the reconstruction loss is high, uncertainty in the support is increased by adding the arbitrary element.
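A minimal sketch of the justification operator in eq. (3), assuming an MSE-only gate for brevity (the paper also gates on SSIM); the threshold `tau_mse` and the sentinel `omega` are illustrative stand-ins for τ_MSE and ω:

```python
import numpy as np

def justification(support_set, x, x_hat, tau_mse=0.02, omega="BAD_RECON"):
    """J(x) per eq. (3): pass the support set through unchanged when
    reconstruction is 'Good', else inject the arbitrary element omega."""
    mse = np.mean((x - x_hat) ** 2)  # reconstruction loss (MSE only here)
    good = mse < tau_mse
    return support_set if good else support_set | {omega}

x = np.zeros(4)
print(justification({1}, x, x + 0.01))  # good reconstruction: support unchanged
print(justification({1}, x, x + 0.5))   # bad reconstruction: omega added
```

Because ω matches no class label, its presence forces the justification set away from a clean singleton, demoting the assertion from IK.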

2.4 Algorithm and Implementation

Algorithm 1 – Training VAE-based Epistemic Classifier
Require: training set T, validation set V, trained modified VAE network F, distance metric for latent code
1: Extract latent codes for training set T
2: Build NeighborSearchTree over the latent codes using the ℓ_2 metric
3: (k, τ) ← JustificationParameters(V)

Algorithm 2 – Inference with our Epistemic Classifier
Require: test input x, epistemic classifier
1: (ŷ, x̂) ← F(x)    ▷ class & reconstruction predictions
2: z ← encoder output    ▷ extract latent code for input
3: S(x) ← support of x in latent code
4: Get justification J(x) using R(x̂) and Eq. (3)
5: if J(x) = {ŷ} then
6:     output ← IK
7: else if {ŷ} ⊂ J(x) then    ▷ proper subset
8:     output ← IMK
9: else    ▷ implies ŷ ∉ J(x)
10:     output ← IDK
11: end if
12: return output
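The IK/IMK/IDK decision at the end of Algorithm 2 can be sketched as follows (a minimal sketch; the function name is illustrative):

```python
def assert_belief(y_hat, J):
    """IK/IMK/IDK assertion from Algorithm 2:
    J == {y_hat}               -> IK  (trust)
    {y_hat} proper subset of J -> IMK (confusion)
    y_hat not in J             -> IDK (extrapolation)"""
    if J == {y_hat}:
        return "IK"
    elif y_hat in J:
        return "IMK"
    else:
        return "IDK"

print(assert_belief(1, {1}))      # IK
print(assert_belief(1, {1, 2}))   # IMK
print(assert_belief(1, {2, 3}))   # IDK
```

Note that a bad reconstruction always adds the element ω to J(x), so it can downgrade an IK assertion to IMK but never upgrade one.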

Our EC is built using a trained modified-VAE network F. As shown in Fig. 2, for an input x the network output has two elements, i.e., F(x) = (ŷ, x̂), where ŷ is the label output and x̂ is the reconstructed output from the latent code z. After the network is trained using the loss function described in Sec. 2.2, the evidence for justification is derived from the training set itself, as described in Algorithm 1. We extract latent codes across the training set and construct a ball-tree using a defined distance metric, represented by the NeighborSearchTree function. The ball-tree is used for nearest-neighbor search and uses the ℓ_2 metric for computing distances. JustificationParameters is a function that selects parameters for the support operators: it selects the value of k that defines the neighborhood for the support in latent space, and it computes the set of thresholds τ using the p-th percentile of each metric across the validation set. The validation set is used to select values for k and p.
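The threshold-selection step inside JustificationParameters can be sketched as a percentile computation over validation-set reconstruction losses (the loss values below are synthetic stand-ins; `p = 99` mirrors the MNIST setting reported in Sec. 3):

```python
import numpy as np

# Stand-in validation-set MSE losses (the real values would come from
# running the trained VAE over the validation set).
rng = np.random.default_rng(0)
val_mse = rng.exponential(scale=0.01, size=1000)

p = 99  # percentile; per Sec. 3, p = 99% is used for MNIST
tau_mse = np.percentile(val_mse, p)

# A reconstruction is 'Good' when its loss falls below the threshold,
# so roughly p% of nominal validation samples pass the gate.
frac_good = np.mean(val_mse < tau_mse)
print(round(frac_good, 2))
```

The same percentile rule would be applied independently to the SSIM-based loss to obtain τ_SSIM.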

During the inference stage, a test sample x is processed by the proposed EC according to Algorithm 2. The belief of the classifier is the same as the classifier output ŷ. The support of the input is computed by the Support function, which uses the input’s latent code and k to find its nearest neighbors in the training set. The function R quantifies the reconstruction quality using SSIM and MSE as described above. The justification of the belief is then computed as per eq. (3) using the support and the reconstruction quality. This justification and belief are used to obtain the justified belief of our classifier. Note that the justification set can be used to provide interpretable exemplars as evidence for the belief.

We use residual blocks [15, 4] in the encoder and decoder parts of our modified VAE, shown in Fig. 2. The classifier output is obtained by adding a single dense layer with ReLU activation to the latent vector obtained from the VAE. For the classification part of the network, we use only the mean part of the latent code, which is a length-16 vector in our modified VAE.

3 Experiments and Results

Figure 3: t-SNE visualization of the latent code (length-16) for (left) MNIST, (middle) Fashion-MNIST and (right) GTSRB dataset.
Figure 4: Example of reconstruction along with input’s computed support from training set. Support’s reconstruction is also shown in bottom row.
Figure 5: Examples of (a) confusion (IMK) along with their EC predictions, and (b) inputs affected by adversarial attacks along with corresponding reconstructed images.
Table 1: Performance of Epistemic Classifier on different dataset with different perturbation. Baseline is the performance using softmax thresholding. Base model test accuracy on nominal data is provided in the first column.

We demonstrate the usefulness of our approach on several data-sets (MNIST [9], Fashion-MNIST [18], and the German Traffic Sign Recognition Benchmark (GTSRB) [14]) and test prediction reliability under various perturbations and adversarial attacks [8]. Fig. 3 shows t-SNE [10] visualizations of the latent code learned by our modified VAE for the different datasets. It shows good separability across classes, which makes computing support with an ℓ_2-norm distance metric meaningful and provides an effective computing space irrespective of the dimensionality of the inputs. Fig. 4 shows an example of reconstruction and support for an input, demonstrating the quality of the computed support. Fig. 5 shows a few IMK examples from the MNIST test set and reconstructions for some inputs that were perturbed by the BIM attack [8]. In the adversarial-attack cases, the reconstruction loss is high, which allows our EC to detect the attack.

Similar to [16], we use the augmented confusion matrix (ACM) to quantify the performance of an EC on a test dataset. An ACM consists of three sub-matrices, where each sub-matrix is a confusion matrix of predicted label versus true label under the assertion of IK (top), IMK (middle), and IDK (bottom). In this work, we use the following three metrics derived from the ACM to quantify performance: coverage, or the fraction of IK samples; accuracy over IK samples; and accuracy over non-IK samples. We use values of (k, p%) as follows — MNIST: (10, 99%), Fashion-MNIST: (50, 90%), GTSRB: (50, 70%). Our justification approach is also compared to a baseline technique that uses thresholding on the softmax outputs of the classifier. For fair comparison, baseline thresholds were chosen to match our combined approach. For adversarial attacks, we use FGSM [3] and BIM [8] with an attack magnitude of 0.2 on the classifier outputs. We also study the effect of perturbing the inputs with uniform noise. For training, MNIST and Fashion-MNIST images were rescaled. Similar to [16], we group the GTSRB traffic-sign dataset into eight types of traffic signs: speed limit, no passing, end of restriction, warning, priority, yield, stop, and direction; yielding 34799 training, 4410 validation, and 12630 testing images.
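The three ACM-derived metrics can be computed directly from per-sample assertions, predictions, and true labels. A sketch with toy values (all names and numbers are illustrative):

```python
import numpy as np

# Toy per-sample EC outputs: assertion, predicted label, true label.
asserts = np.array(["IK", "IK", "IMK", "IDK", "IK"])
preds   = np.array([0, 1, 1, 0, 1])
labels  = np.array([0, 1, 0, 1, 0])

ik = asserts == "IK"
coverage = ik.mean()                          # fraction of samples asserted IK
acc_ik   = (preds[ik] == labels[ik]).mean()   # accuracy over IK samples
acc_nik  = (preds[~ik] == labels[~ik]).mean() # accuracy over non-IK samples
print(coverage, acc_ik, acc_nik)
```

A reliable EC should drive accuracy over IK samples high while keeping coverage high; low accuracy over non-IK samples is desirable under attack, since it means the rejected samples were indeed the misclassified ones.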

Figure 6: Augmented confusion matrices (ACM) for expanded MNIST test-set using (a) Baseline, (b) only support-based, (c) only reconstruction-based, and (d) combined support & reconstruction based justification. Half of the samples of expanded MNIST test-set were perturbed by BIM-attack of 0.2 magnitude (see text for more details).

Fig. 6 shows ACMs with different forms of justification for an expanded MNIST test set. The expanded MNIST test set was created by first perturbing each sample from the original MNIST test set with the adversarial BIM attack of 0.2 magnitude [8] and then appending these BIM-perturbed samples to the original MNIST test set. This resulted in an expanded MNIST test set with twice as many samples as the original, where one half of the samples were BIM-attacked. A reliable classifier should achieve high accuracy over IK samples along with high coverage. The baseline justification approach using softmax thresholding (Fig. 6a) achieves the highest coverage of 73%, however with a low IK-accuracy of 0.68. It also shows relatively high accuracy over non-IK samples (0.43), which indicates that this justification approach is sub-optimal, possibly resulting in several false negatives when identifying IK samples. We achieve a slightly better IK-accuracy of 0.79 when only support is used for justification (Fig. 6b); however, it is unable to identify several attacks. Reconstruction-based justification (Fig. 6c) achieves higher IK-accuracy, but we achieve the highest IK-accuracy of 0.99 when both support and reconstruction are used for justification (Fig. 6d). Our combined approach also achieves a coverage of 53%, which is reasonable given that half of the samples of the expanded MNIST test set were BIM-attacked. Note that when only reconstruction is used for justification, we define the support in eq. (3) as S(x) = {ŷ}, where ŷ is the class prediction for the input x. This implies that we cannot assert any belief as IDK when using reconstruction-only justification, as seen in Fig. 6c.

Table 1 compares performance on all datasets with and without adversarial attack. The baseline achieves good coverage and IK-accuracy for the nominal test set but performs poorly under attacks (high coverage with low IK-accuracy), which indicates that several attack samples were incorrectly classified with high confidence. Our combined approach achieves high IK-accuracy with good coverage for the nominal test set and shows excellent identification of attacks by asserting samples as IMK/IDK (almost 100% for FGSM and highest for BIM) across all datasets. The results for the BIM attack show that the combined approach performs better than support-based or reconstruction-based justification alone, which suggests that support and reconstruction provide complementary information, as hypothesized. When inputs are perturbed by uniform noise, the baseline performs well; however, the support-based approach achieves higher coverage with similar accuracy.

4 Discussion and Conclusion

In practice, an untrustworthy (IDK/IMK) assertion from an EC can be used to seek help from an expert, and the parameters can be tuned to match the desired frequency of seeking the expert’s help. Further, separating untrustworthy classifications into IMK and IDK can reduce the expert’s effort in identifying challenging cases. The computed support also provides a mechanism for obtaining training examples that the classifier believes are similar to the test sample, which can be used for interpretability purposes.

Our current framework does not enforce any posterior distribution for each class in the latent space, which can change the ideal choice of support size per class. We plan to use the approach described in [11] to enforce a similar distribution for each class, for a more uniform effect of the neighborhood size. Our reconstructed images are smooth in nature, similar to those of other VAEs [2], which results in a large SSIM loss and explains the low coverage in the presence of large noise in Table 1. In future work, we will explore other dissimilarity metrics to address this issue. The presented results show somewhat robust performance against gray-box or semi-white-box attacks on the classifier. In future work, white-box attacks on both classification and reconstruction will be studied.

In conclusion, we propose an Epistemic Classifier (EC) that can assert its belief based on justification from the training set and that shows robust performance under adversarial attacks. Our EC obtains a semantically meaningful latent space using a modified VAE for support generation and uses reconstruction as an additional justification mechanism.


  • [1] C. Doersch (2016) Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908. Cited by: §2.2.
  • [2] I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT press. Cited by: §2.1, §2.2, §4.
  • [3] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §3.
  • [4] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: Figure 2, §2.4.
  • [5] J. J. Ichikawa and M. Steup (2001) The analysis of knowledge. The Stanford encyclopedia of philosophy. External Links: Link Cited by: §2.1.
  • [6] D. P. Kingma and M. Welling (2013) Auto-Encoding Variational Bayes. In Proc. of the International Conference on Learning Representations, Cited by: §2.2.
  • [7] J. Kos, I. Fischer, and D. Song (2018-05) Adversarial examples for generative models. In IEEE Security and Privacy Workshops (SPW), pp. 36–42. External Links: Document, ISSN null Cited by: §1, §2.2.
  • [8] A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236. Cited by: §3, §3, §3.
  • [9] Y. LeCun (1998) The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/. Cited by: §3.
  • [10] L. v. d. Maaten and G. Hinton (2008) Visualizing data using t-SNE. Journal of machine learning research 9 (Nov), pp. 2579–2605. Cited by: §3.
  • [11] A. Makhzani, J. Shlens, N. Jaitly, and I. Goodfellow (2016) Adversarial Autoencoders. In International Conference on Learning Representations, External Links: Link Cited by: §2.2, §4.
  • [12] D. Meng and H. Chen (2017) Magnet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 135–147. Cited by: §1.
  • [13] W. Rawat and Z. Wang (2017) Deep convolutional neural networks for image classification: a comprehensive review. Neural computation 29 (9), pp. 2352–2449. Cited by: §2.1.
  • [14] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel (2012) Man vs. computer: benchmarking machine learning algorithms for traffic sign recognition. Neural networks 32, pp. 323–332. Cited by: §3.
  • [15] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-first AAAI conference on artificial intelligence. Cited by: §2.4.
  • [16] N. Virani, N. Iyer, and Z. Yang (2019) Justification-based reliability in machine learning. arXiv preprint arXiv:1911.07391. Cited by: §1, §2.1, §2.3, §3.
  • [17] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13 (4), pp. 600–612. Cited by: §2.3.
  • [18] H. Xiao, K. Rasul, and R. Vollgraf (2017) Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747. Cited by: §3.