Individual prediction reliability is key in safety-critical applications of machine learning (ML) such as healthcare, industrial controls, and autonomy. To provide this reliability, the notion of epistemic classifiers (EC) was recently introduced. An EC is a classifier that can justify its belief using support/evidence from neighborhoods in multiple layers, and it additionally provides exemplar-based interpretability through those supporting instances. In this paper, we propose epistemic encoders, where we co-train a variational auto-encoder (VAE) and a classifier to construct a low-dimensional, semantically meaningful embedding. The neighborhood support from training instances is then computed in that embedding, which overcomes the curse of dimensionality and enforces agreement between distance and semantic similarity. Since the VAE also provides a reconstruction score at inference time, we use that score as additional support in the justification process.
MagNet uses autoencoder reconstruction error to either reject or reform potentially adversarial examples before the example is provided to a classifier. Unlike MagNet, where the autoencoder is trained independently of the classifier, our approach performs joint training. In prior work on adversarial examples for generative models, a classifier was trained on the latent space of a VAE in order to generate adversarial attacks against generative models. Here, we use a similar architecture, with co-training, to defend the network. Since the support operator in ECs is non-differentiable, our approach is less vulnerable to, and computationally more expensive for, white-box attacks compared to MagNet.
We make the following contributions: (a) an approach to identify a semantically meaningful low-dimensional space for computing support using the ℓ2 distance; (b) the introduction of reconstruction quality as an additional justification mechanism, to identify uncertainties that neighborhood support cannot resolve by itself.
2.1 Epistemic Classifiers (EC)
An EC provides an approach to enhance prediction reliability for a classifier; it builds on the theory of justified true belief from epistemology and extends it to neural networks (NN). Specifically, ECs link the reliability of predictions on a test input to characteristics of the support gathered from hidden layers of the network. For a given test sample x, ECs generate support for x using the training data as a mechanism to justify the class prediction for x. The support enables ECs to partition the input space into: regions of extrapolation (“I don’t know” or IDK), regions of confusion (“I may know” or IMK), and regions of trust (“I know” or IK). This enables annotating the classifier output (i.e., its belief) with IK, IMK, and IDK assertions (see Fig. 1 for illustration). A traditional EC uses neighborhood-based support across multiple layers of a NN to obtain this justification. The support in layer l is defined as:

S_l(x) = f( N_l( a_l(x), T ) )    (1)

where f is the function that maps a training input to its training label, a_l(x) is the activation value in layer l, and N_l is the neighborhood operator over the training data-set T.
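As a concrete illustration, the support operator of eq. (1) can be sketched with a brute-force neighbor search (a minimal numpy sketch; the function names and the k-NN neighborhood choice here are ours, not the authors' implementation):

```python
import numpy as np

def layer_support(act_x, train_acts, train_labels, k=3):
    """Sketch of S_l(x) in eq. (1): the labels of the k nearest
    training activations (l2 distance) to the activation of x."""
    dists = np.linalg.norm(train_acts - act_x, axis=1)
    nearest = np.argsort(dists)[:k]             # neighborhood N_l(a_l(x), T)
    return set(train_labels[nearest].tolist())  # f mapped over the neighbors
```

A unanimous support set that agrees with the classifier's belief backs a trusted (IK) assertion, while a mixed support set signals confusion (IMK).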
The neighborhood operator is generally defined using a computationally tractable ℓp-norm distance. However, as most state-of-the-art classification networks use cascaded convolutional layers, an ℓp-norm distance metric on layer activations does not necessarily reflect semantic or perceptual distance, especially in layers away from the output layer [2, 13]. Hence, the use of ℓp-norm based support from early layers can lead to an ill-informed justification of the belief (or output), causing the epistemic classifier to assert IMK or IDK frequently in real-world applications. In response, we propose ECs that use a VAE to construct a semantically meaningful embedding for support generation and that augment the support with the reconstruction loss of the autoencoder. Next, we formally introduce this extension to ECs and show how it imparts better prediction reliability.
2.2 Joint encoder-classifier approach
Owing to the VAE’s excellent generative properties, we hypothesize that the VAE’s latent space is a perceptually meaningful space for support computation that retains the ability to identify uncertainties without frequent IMK or IDK assertions. VAEs are also known to produce unexpected reconstructions under adversarial attack, which further motivated the use of a VAE in this work. Our EC shares the ‘encoder’ layers of the VAE, as shown in Fig. 2, and is jointly trained with the modified VAE.
VAEs use learned approximate Bayesian inference to generate samples that are similar to the training set [2, 1, 6]. A VAE indirectly maximizes the model distribution by minimizing an upper bound on its negative log-likelihood [6, 1, 11]. In our approach, we add another term to this upper bound that captures the classification decoding capacity of the learned model distribution. As shown in Fig. 2, let z be the latent code vector of the VAE, q(z|x) the encoding distribution, and p(x|z) the decoding distribution. The loss we minimize is

L = E_{q(z|x)}[ -log p(x|z) ] + CrossEntropy( q(z|x), p(z) ) - H( q(z|x) ) + w · CrossEntropy( q(y|z), y )    (2)

where q(y|z) is the classification decoding distribution, y is the true classification decoding (known for training samples), and w is a scalar weight for the classification loss. The first three terms are the standard VAE bound, where the first term can be interpreted as the reconstruction loss of the VAE; the remaining terms can be seen as regularization terms for model optimization. The second term, CrossEntropy(q(z|x), p(z)), encourages the posterior and the prior, both chosen to be multivariate Gaussian in a VAE, to approach each other. The third term, the negative entropy of the posterior, encourages the posterior to have non-zero variance, which helps avoid badly scaled gradients during back-propagation. The fourth term captures the categorical cross-entropy for classification and can be seen as another regularization term that pushes the VAE to parameterize latent codes that are semantically meaningful and suitable for classification.
2.3 Justification: Support and Reconstruction
We consider both the quality of the reconstructed output and the support for justification in our EC. Unlike traditional ECs, we use only the latent code of our VAE to compute the support for an input x. Specifically, we use eq. (1) to compute the support S(x), where the set of layers has only one element, corresponding to the encoder output with latent code z. Our neighborhood operator is defined using a k-NN operation that identifies the k nearest (ℓ2-norm) training samples for an input.
We use two different image dissimilarity metrics to estimate the loss in the reconstructed output relative to the input: mean-square error (MSE) and the Structural Similarity Index (SSIM). A function R labels the quality of the reconstruction using thresholds on these losses: the reconstruction quality is ‘Good’ if both losses are lower than their corresponding thresholds, and ‘Bad’ otherwise. Next, we construct the justification operator as

J(x) = S(x)          if R(x) = Good
J(x) = S(x) ∪ {e}    if R(x) = Bad    (3)

where e is an arbitrary element reflecting bad reconstruction quality. In other words, when the reconstruction error is low, J(x) mirrors S(x) and the justification set remains unchanged. When the reconstruction loss is high, uncertainty in the support is increased by adding the arbitrary element.
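This two-step justification can be sketched as follows (the function names and the sentinel value standing in for the arbitrary element e are our own, hypothetical choices):

```python
def reconstruction_quality(mse, ssim_loss, tau_mse, tau_ssim):
    # 'Good' only if both dissimilarity losses fall below their thresholds.
    return "Good" if (mse < tau_mse and ssim_loss < tau_ssim) else "Bad"

def justify(support_set, quality, arbitrary_element="e"):
    # Eq. (3): a bad reconstruction injects an arbitrary element into the
    # justification set, increasing its uncertainty.
    if quality == "Good":
        return set(support_set)
    return set(support_set) | {arbitrary_element}
```

Because the injected element never matches any class label, a bad reconstruction can never leave a unanimous justification set.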
2.4 Algorithm and Implementation
Our EC is built on a trained modified-VAE network. As shown in Fig. 2, for an input x the network output has two elements: the label output and the reconstructed output from the latent code z. After the network is trained using the loss function described in Sec. 2.2, the evidence for justification is derived from the training set itself, as described in Algorithm 1. We extract latent codes across the training set and construct a ball-tree using a defined distance metric, represented by the NeighborSearchTree function. The ball-tree is used for nearest-neighbor search with the ℓ2 metric for computing distances. JustificationParameters is a function that selects parameters for the support operator: it selects the value of k that defines the neighborhood for support in the latent space, and it computes the set of reconstruction thresholds using a percentile of each metric across the validation set. The validation set is used to select the values of k and the percentile.
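The offline stage of Algorithm 1 might look like the sketch below (assumed details: a brute-force index stands in for the ball-tree, and the parameter names are ours):

```python
import numpy as np

def neighbor_search_index(z_train):
    # Stand-in for NeighborSearchTree: a real implementation would build a
    # ball-tree over the training latent codes for fast l2 neighbor queries.
    return np.asarray(z_train)

def justification_parameters(val_mse, val_ssim, k=10, percentile=99.0):
    # Reconstruction thresholds: a percentile of each dissimilarity metric
    # computed over the validation set; k sizes the latent-space neighborhood.
    tau_mse = np.percentile(val_mse, percentile)
    tau_ssim = np.percentile(val_ssim, percentile)
    return k, tau_mse, tau_ssim
```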
During the inference stage, a test sample x is processed by the proposed EC according to Algorithm 2. The belief of the classifier is its class output. The support of the input is computed by the support function, which uses the latent code z and the ball-tree to find the k nearest neighbors in the training set. The reconstruction-quality function quantifies the reconstruction quality using SSIM and MSE as described above. The justification of the belief is then computed as per eq. (3) using the support and the reconstruction quality. This justification and belief together yield the justified belief of our classifier. Note that the justification set can also be used to provide interpretable exemplars as evidence for the belief.
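Putting the pieces together, the final assertion at inference time could look like this (our reading of Algorithm 2, with hypothetical names): IK when the justification unanimously supports the belief, IMK when the belief is merely among the justifying labels, and IDK when the justification excludes the belief.

```python
def justified_belief(belief, justification_set):
    """Map a belief plus its justification set J(x) to an assertion."""
    if justification_set == {belief}:
        return "IK"   # unanimous justification: trust the prediction
    if belief in justification_set:
        return "IMK"  # mixed justification: confusion
    return "IDK"      # justification excludes the belief: extrapolation
```

Note that, consistent with the reconstruction-only discussion above, injecting the arbitrary element e into the justification set makes unanimity impossible, so a bad reconstruction always demotes an IK assertion.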
The classifier output is obtained by adding a single dense layer with ReLU activation to the latent vector obtained from the VAE. For the classification part of the network, we use only the mean part of the latent code, which is a 16-length vector in our modified VAE.
3 Experiments and Results
We demonstrate the usefulness of our approach on several data-sets (MNIST, Fashion-MNIST, and the German Traffic Sign Recognition Benchmark (GTSRB)) and test prediction reliability under various perturbations and adversarial attacks. Fig. 3 shows t-SNE visualizations of the latent code learned by our modified VAE for the different datasets. It shows good separability across classes, which makes the ℓ2-norm distance metric meaningful when computing support and provides an effective computing space irrespective of the dimensionality of the inputs. Fig. 4 shows an example of reconstruction and support for an input, demonstrating the quality of the computed support. Fig. 5 shows a few IMK examples from the MNIST test set and reconstructions for some inputs that were perturbed by the BIM attack. In the adversarial-attack cases, the reconstruction loss is high, which allows our EC to detect the attack.
Similar to prior work on ECs, we use the augmented confusion matrix (ACM) to quantify the performance of an EC on a test dataset. The ACM consists of three sub-matrices, where each sub-matrix is a confusion matrix of predicted label versus true label under the assertion IK (top), IMK (middle), or IDK (bottom). In this work, we use the following three metrics derived from the ACM to quantify performance: coverage, or the fraction of IK samples; accuracy over IK samples; and accuracy over non-IK samples. We use values of (k, percentile) as follows. MNIST: (10, 99%); Fashion-MNIST: (50, 90%); GTSRB: (50, 70%). Our justification approach is also compared to a baseline technique that uses thresholding on the softmax outputs of the classifier; for a fair comparison, the baseline thresholds were chosen to match our combined approach. For adversarial attacks, we use FGSM and BIM with an attack magnitude of 0.2 on the classifier outputs. We also study the effect of perturbing the inputs with uniform noise. For training, MNIST and Fashion-MNIST images were rescaled to a fixed range. Similar to prior work, we group the GTSRB traffic-sign dataset into eight types of traffic signs: speed limit, no passing, end of restriction, warning, priority, yield, stop, and direction; yielding 34799 training, 4410 validation, and 12630 testing images.
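The three ACM-derived metrics can be read off the per-sample assertion labels directly; a small sketch (our notation, not the paper's code):

```python
import numpy as np

def acm_metrics(assertions, preds, truths):
    """Coverage (fraction of IK samples), accuracy over IK samples,
    and accuracy over non-IK samples, from assertion labels."""
    assertions = np.asarray(assertions)
    correct = np.asarray(preds) == np.asarray(truths)
    ik = assertions == "IK"
    coverage = ik.mean()
    acc_ik = correct[ik].mean() if ik.any() else float("nan")
    acc_non_ik = correct[~ik].mean() if (~ik).any() else float("nan")
    return coverage, acc_ik, acc_non_ik
```

A reliable EC should drive the IK accuracy high (trusted predictions are correct) while keeping coverage high and non-IK accuracy low (rejected samples really were unreliable).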
Fig. 6 shows ACMs with different forms of justification for an expanded MNIST test set. The expanded test set was created by first perturbing each sample from the original MNIST test set with an adversarial BIM attack of magnitude 0.2 and then appending these BIM-perturbed samples to the original MNIST test set. This results in an expanded test set with twice as many samples as the original, where half of the samples are BIM-attacked. A reliable classifier should achieve high accuracy on IK samples along with high coverage. The baseline justification approach using softmax thresholding (Fig. 6a) achieves the highest coverage of 73%, but with a low IK-accuracy of 0.68. It also shows relatively high accuracy over non-IK samples (0.43), which indicates that this justification approach is sub-optimal, possibly producing several false negatives when identifying IK samples. We achieve a slightly better IK-accuracy of 0.79 when only support is used for justification (Fig. 6b); however, that approach fails to identify several attacks. Reconstruction-based justification (Fig. 6c) achieves higher IK-accuracy, but the highest IK-accuracy of 0.99 is achieved when both support and reconstruction are used for justification (Fig. 6d). Our combined approach also achieves a coverage of 53%, which is reasonable given that half of the samples in the expanded test set are BIM-attacked. Note that when only reconstruction is used for justification, we define the support in eq. (3) as the singleton set containing the class prediction for the input. This implies that no belief can be asserted as IDK under reconstruction-only justification, as seen in Fig. 6c.
Table 1 compares performance on all datasets with and without adversarial attacks. The baseline achieves good coverage and IK-accuracy on the nominal test set but performs poorly under attack (high coverage with low IK-accuracy), indicating that several attack samples were incorrectly classified with high confidence. Our combined approach achieves high IK-accuracy with good coverage on the nominal test set and shows excellent identification of attacks by asserting samples as IMK/IDK (almost 100% for FGSM, and the highest rate for BIM) across all data-sets. The results for the BIM attack show that the combined approach outperforms support-only and reconstruction-only justification, which suggests that support and reconstruction provide complementary information, as hypothesized. When inputs are perturbed by uniform noise, the baseline performs well; however, the support-based approach achieves higher coverage with similar accuracy.
4 Discussion and Conclusion
In practice, an un-trustworthy (IDK/IMK) assertion from an EC can be used to seek help from an expert, and the EC's parameters can be tuned to match a desired frequency of seeking that help. Further, separating un-trustworthy classifications into IMK and IDK can reduce the expert's effort in identifying challenging cases. The computed support also provides a mechanism for retrieving training examples that the classifier believes are similar to the test sample, which can be used for interpretability purposes.
Our current framework does not enforce any posterior distribution for each class in the latent space, which can change the ideal choice of support size per class. We plan to use the adversarial-autoencoder approach to enforce a similar distribution for each class, yielding a more uniform effect of the neighborhood size. Our reconstructed images are smooth, as is typical of VAEs, which results in a large SSIM loss and explains the low coverage in the presence of large noise in Table 1. In the future, we will explore other dissimilarity metrics to address this issue. The presented results show reasonably robust performance under gray-box (semi-white-box) attacks on the classifier; in future work, white-box attacks on both classification and reconstruction will be studied.
In conclusion, we propose an Epistemic Classifier (EC) that asserts its belief based on justification from the training set and shows robust performance under adversarial attacks. Our EC obtains a semantically meaningful latent space from a modified VAE for support generation and uses reconstruction quality as an additional justification mechanism.
- (2016) Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908. Cited by: §2.2.
- (2016) Deep learning. MIT Press. Cited by: §2.1, §2.2, §4.
- (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §3.
- (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: Figure 2, §2.4.
- (2001) The analysis of knowledge. The Stanford Encyclopedia of Philosophy. Cited by: §2.1.
- (2013) Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations. Cited by: §2.2.
- (2018) Adversarial examples for generative models. In IEEE Security and Privacy Workshops (SPW), pp. 36–42. Cited by: §1, §2.2.
- (2016) Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236. Cited by: §3.
- The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/. Cited by: §3.
- (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), pp. 2579–2605. Cited by: §3.
- (2016) Adversarial autoencoders. In International Conference on Learning Representations. Cited by: §2.2, §4.
- (2017) MagNet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 135–147. Cited by: §1.
- (2017) Deep convolutional neural networks for image classification: a comprehensive review. Neural Computation 29 (9), pp. 2352–2449. Cited by: §2.1.
- (2012) Man vs. computer: benchmarking machine learning algorithms for traffic sign recognition. Neural Networks 32, pp. 323–332. Cited by: §3.
- (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence. Cited by: §2.4.
- (2019) Justification-based reliability in machine learning. arXiv preprint arXiv:1911.07391. Cited by: §1, §2.1, §2.3, §3.
- (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612. Cited by: §2.3.
- (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747. Cited by: §3.