Class Anchor Clustering: a Distance-based Loss for Training Open Set Classifiers

04/06/2020, by Dimity Miller, et al.

Existing open set classifiers distinguish between known and unknown inputs by measuring distance in a network's logit space, assuming that known inputs cluster closer to the training data than unknown inputs. However, this approach is typically applied post-hoc to networks trained with cross-entropy loss, which neither guarantees nor encourages the hoped-for clustering behaviour. To overcome this limitation, we introduce the Class Anchor Clustering (CAC) loss. CAC is an entirely distance-based loss that explicitly encourages training data to form tight clusters around class-dependent anchor points in the logit space. We show that an open set classifier trained with CAC loss outperforms all state-of-the-art techniques on the challenging TinyImageNet dataset, achieving a 2.4% increase in AUROC over the next best method, and outperforms other state-of-the-art distance-based approaches on a number of further relevant datasets. We will make the code for CAC publicly available.


1 Introduction

(a) Training data.
(b) Network trained with cross-entropy loss.
(c) Network trained with our proposed CAC loss. Grey regions: recognised as unknown.
Figure 4: Consider a binary classification task with the training data shown in (a). A network trained with cross-entropy loss achieves high classification accuracy, but confidently classifies areas of input space far from the training data as a known class ((b)-left). In addition, its logit space ((b)-right) does not exhibit the unimodal clustering behaviour assumed by current distance-based open set methods [3, 30, 31]. In contrast, a network trained with our CAC loss produces tight clusters in the logit space ((c)-right), while still maintaining high classification accuracy. When combined with our distance-based measure, our network is able to identify areas of input and logit space falling far from the training data as unknown (grey regions in (c)-left and middle).

Many practically relevant applications require the deployment of trained models for visual perception under open set conditions, where object classes unseen during training can be encountered [19]. This is especially true for applications in autonomous systems, driverless cars, and robotics. Deep convolutional neural networks (CNNs) have been shown to degrade in performance under open set conditions, tending to misclassify unknown inputs as a known class with high confidence [16, 8, 3, 4]. This has raised serious concerns about the safety of using CNNs in open set environments [1], particularly on autonomous systems where perception failures may lead to serious consequences [26, 4].

The concept of open set recognition was introduced to extend object recognition to open set environments [19]. An open set classifier is expected to identify whether an input belongs to a known or unknown class, while maintaining the classification accuracy of a closed set classifier. In this paper, we propose a new distance-based loss that achieves state-of-the-art performance for distance-based open set classification.

Many recent open set classifiers model the distribution of known training data in the final layer, or logit space, of a CNN [3, 30, 31]. At test time, inputs that fall far from all known class distributions are considered to belong to an unknown class, as illustrated in Figure 4. While distance-based measures are a promising approach for open set classification, current methods typically apply this idea post-hoc to networks trained as closed set classifiers with cross-entropy loss [3, 30, 31]. This is problematic, as cross-entropy loss does not use a distance-based metric and thus neither encourages nor guarantees the clustering behaviour these methods seek to exploit. On the contrary, for complex datasets, the known training data may be widely spread across the logit space, form multiple clusters per class, or form clusters with shapes that are difficult to model, as illustrated with a simple example in Figure 4. These effects can degrade the open set performance of existing methods.

In this work, we introduce the Class Anchor Clustering (CAC) loss. CAC is a distance-based loss function that explicitly encourages the training data to form tight clusters around fixed, class-specific anchor points in the logit space, while maintaining large distances to all other class anchors. Our paper makes the following contributions:

  1. We show that training with a distance-based loss is highly beneficial to open set classification when using a distance-based measure to distinguish between known and unknown inputs during deployment.

  2. We introduce the concept of class anchors in logit space as an effective and simple strategy to train distance-based open set classifiers.

  3. A novel combination of distance-based loss terms is formulated as the Class Anchor Clustering loss and is shown to reach a new state-of-the-art open set classification result on the complex TinyImageNet dataset.

  4. We demonstrate that replacing the cross-entropy loss with CAC during training maintains closed set performance, while significantly increasing open set performance.

  5. We show that CAC is not sensitive to the choice of its two hyperparameters.

2 Related Work

Open set classification was introduced to extend closed set classification to a more realistic scenario, where not all testing classes are assumed to be known during training [19]. It was formalised as the task of minimising open space risk, the portion of classification space that is labelled as 'known' but lies far from any known training data [19]. Related areas, such as out-of-distribution detection and novelty detection, exist in the literature as relaxed forms of open set classification, where known and unknown classes are from different distributions [8] or multi-class classification is not required [5]. For this work, we focus specifically on open set classification.

Early work on open set classification involved SVM-based approaches to model the known areas of a feature space [19, 20, 9], or distance-based approaches, such as Nearest Neighbor, to identify outliers [10]. These approaches rely on feature engineering and therefore do not scale to complex image datasets. As deep learning gained momentum in the computer vision community, recent work has moved towards open set classification with CNNs.

OpenMax was one of the first CNN-based open set classifiers, using the network's final-layer logits, or logit space, as the classification space containing open space risk [2]. A Weibull distribution (often used in Extreme Value Theory [21]) is used to model the distance from known class mean activation vectors to true positive training points [2]. At test time, an input's distance to each known class mean is combined with its softmax score to provide final class confidences [2]. We refer to this as a 'distance-based' approach, where a distance measure is used to decay the class confidences for inputs falling far from the known class means and thus minimise open space risk. Another early CNN-based open set classifier performed open set text classification by using a sigmoid final layer and class-specific Gaussian fitting [24]; however, this work fails to consider unknown inputs that may have logit activations greater than the known training data.

Several following works employed ‘known unknown’ data to augment the training dataset, either using the data to improve the feature representation for distance-based measures [31, 7] or to bound the known classification space with an ‘other’ class [14, 22]. Both [31] and [14] utilise generative networks to synthesise realistic ‘known unknown’ data points. In [22], atypical training samples from the existing training data are treated as ‘known unknowns’, whereas [7] uses unrelated auxiliary datasets.

Other recent open set classifiers use combined classifier and autoencoder network architectures [30, 17]. In [30], OpenMax (a distance-based approach) is applied to the classifier logit space and the autoencoder latent space, with the additional reconstruction-learnt features improving the overall feature representation. In contrast to previous work, [17] does not use a distance measure or bound open space risk with an 'other' class in the classification space. Instead, the reconstruction error from a class-conditioned autoencoder-classifier is used to distinguish between known and unknown inputs. While this work currently has state-of-the-art performance for open set image classification, it does not explicitly minimise open space risk. In addition, other works have observed that reconstruction error alone may not be suitable as a measure of class novelty [23, 6], as some inputs from unknown classes can be reconstructed with low error, and combination with a distance-based measure can improve performance [6].

Existing distance-based open set classifiers [3, 30, 31] apply distance-based measures to the logit space of networks trained with a cross-entropy loss. However, cross-entropy loss does not utilise a distance-based measure during training, and thus does not explicitly force training inputs to form tight, class-specific clusters in the logit space. Our work addresses this limitation by introducing an entirely distance-based loss function that encourages training inputs to cluster in the logit space.

In the field of metric learning, approaches exist that use distance-based measures for learning meaningful feature embeddings. Deep metric learning was recently applied to the open set classification task by [13], however only for fine-grained image classification, where classes are very similar. Metric learning approaches compute distances between individual instances of the training data, and the sampling technique used to achieve this can have a significant effect on convergence speed and training stability [29]. Additionally, this sampling typically makes metric learning computationally intractable on large-scale datasets, such as CIFAR10, CIFAR100 or ImageNet [18]. While recent work has adapted metric learning approaches for large-scale datasets, the classification accuracy is not comparable to training with cross-entropy loss [18], as is required for open set classification.

Center Loss [28] was proposed as a distance-based loss to encourage clustering in a feature space for improved performance on face recognition tasks. The approach learns class centres and penalises the squared Euclidean distance from an input to its correct class centre [28]. While this approach shares similar aims to our proposed loss, it must be used alongside cross-entropy loss for effective training [28], whereas our proposed loss is standalone. In addition, Center Loss is not formulated to be applied to the network's final layer for clustering in the logit space. We attempted to adapt Center Loss to encourage clustering in the logit space, but found that learning the class centres and the feature extractor simultaneously was too unstable (see the supplementary material for details).

3 Class Anchor Clustering (CAC) for Open Set Classification

Our proposed method produces an open set classifier trained for distance-based distinction between known and unknown inputs by introducing a Class Anchor Clustering (CAC) loss. CAC can be applied to existing classification networks, with only slight modifications to the architecture and training procedure. We now detail our open set classifier architecture, training procedure and testing procedure.

3.1 Distance-based classifier architecture

Our proposed open set classifier has three main components:

  1. A network, $f$, that reduces an input $\mathbf{x}$ to a vector of class logits $\mathbf{z}$

  2. A non-trainable parameter, $\mathbf{C}$, representing the set of class anchor points

  3. A new layer, $e$, that outputs a vector of Euclidean distances $\mathbf{d}$ between a given logit vector and the set of class anchors

The first component of the network can be any existing classifier with an $N$-dimensional logit space, where $N$ is the number of known classes. The formulation of the class anchor points is detailed in the following section. The transformation of an input to a distance-based output from our network is shown below, where $\lVert \cdot \rVert_2$ denotes the Euclidean norm.

$\mathbf{z} = f(\mathbf{x})$  (1)
$\mathbf{d} = e(\mathbf{z}) = \left[\, \lVert \mathbf{z} - \mathbf{c}_1 \rVert_2, \dots, \lVert \mathbf{z} - \mathbf{c}_N \rVert_2 \,\right]$  (2)
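To make this concrete, the following is a minimal PyTorch sketch of the three components, assuming an arbitrary backbone with an $N$-dimensional logit layer; the class and variable names are our own illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class CACClassifier(nn.Module):
    """Distance-based classifier: backbone f, fixed anchors C, distance layer e."""

    def __init__(self, encoder: nn.Module, num_classes: int, alpha: float = 10.0):
        super().__init__()
        self.encoder = encoder  # f: maps input x to an N-dimensional logit vector z
        # Non-trainable class anchors C: scaled one-hot vectors (see Eqs. 3-5),
        # stored as a buffer so they are saved with the model but never updated.
        self.register_buffer("anchors", alpha * torch.eye(num_classes))

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)               # Eq. (1): z = f(x)
        d = torch.cdist(z, self.anchors)  # Eq. (2): d_i = ||z - c_i||_2 per class
        return z, d
```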

3.2 Training with a Distance-based Loss Function

During training, we wish to learn a logit space embedding where training inputs form tight, class-specific clusters. This should provide desirable behaviour for the two different uses of our open set classifier: distance-based separation of known and unknown inputs and distance-based classification of known inputs.

We introduce the concept of non-trainable class anchor points as a method of anchoring the cluster centre for each class in the logit space. This is in contrast to other clustering losses, such as Center loss [28], which must learn a centre point for each class during training. For each known class $i$, our network has a class anchor $\mathbf{c}_i$ in the logit space. Given an $N$-dimensional logit space for $N$ known classes, we place the anchor for each known class at a point along its class coordinate axis. This is equivalent to a scaled standard basis vector, or scaled one-hot vector, for each class. The magnitude of the anchored point, $\alpha$, is a hyperparameter of our method (explored in Section 5.3). We summarise this below.

$(\mathbf{e}_i)_j = 1$ if $j = i$, and $0$ otherwise  (3)
$\mathbf{c}_i = \alpha \cdot \mathbf{e}_i$  (4)
$\mathbf{C} = \{\mathbf{c}_1, \dots, \mathbf{c}_N\}$  (5)

Given an input $\mathbf{x}$ and its ground truth class $y$, we penalise the Euclidean distance between the training logit $f(\mathbf{x})$ and the ground truth class anchor $\mathbf{c}_y$. We use this Anchor loss term to enforce tight clustering around the class anchors in the logit space.

$\mathcal{L}_{A}(\mathbf{x}, y) = \lVert f(\mathbf{x}) - \mathbf{c}_y \rVert_2 = d_y$  (6)

Alone, this Anchor loss term does not explicitly force training inputs to maximise their distance from the other anchored class centres. This behaviour is important for achieving high classification accuracy; thus we require another loss term to further encourage discriminative learning. For this purpose, we use a modified Tuplet loss term [25], defined below. The Tuplet loss attempts to maximise the margin between the distance to the correct class anchor and the distances to all incorrect class anchors.

$\mathcal{L}_{T}(\mathbf{x}, y) = \log\left(1 + \sum_{j \ne y} e^{\,d_y - d_j}\right)$  (7)

We combine the Anchor and Tuplet loss terms to form our final distance-based loss, which we refer to as the Class Anchor Clustering (CAC) loss. A final hyperparameter of our method is $\lambda$, which balances the two individual loss terms (explored in Section 5.3). By combining the Anchor and Tuplet loss terms, our loss encourages training inputs to minimise the distance to their ground-truth class anchor, while maximising the distance to all other class anchors.

$\mathcal{L}_{CAC}(\mathbf{x}, y) = \mathcal{L}_{T}(\mathbf{x}, y) + \lambda \cdot \mathcal{L}_{A}(\mathbf{x}, y)$  (8)
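Below is a sketch of how Eqs. (6)-(8) could be implemented, assuming the distance output $\mathbf{d}$ of the classifier sketched in Section 3.1; the tensor manipulations are our own reading of the loss, not the authors' implementation. The default $\lambda$ of 0.1 matches Section 4.1.

```python
import torch

def cac_loss(d: torch.Tensor, target: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """CAC loss (Eq. 8): Tuplet term plus lambda-weighted Anchor term.

    d:      (batch, N) distances to all class anchors
    target: (batch,) ground-truth class indices
    """
    batch = d.size(0)
    d_true = d.gather(1, target.unsqueeze(1)).squeeze(1)  # d_y per sample
    anchor = d_true.mean()                                # Eq. (6): Anchor loss term
    # Eq. (7): mask out the true class, then log(1 + sum_{j != y} exp(d_y - d_j))
    mask = torch.ones_like(d, dtype=torch.bool)
    mask.scatter_(1, target.unsqueeze(1), False)
    d_other = d[mask].view(batch, -1)                     # distances to wrong anchors
    tuplet = torch.log1p(torch.exp(d_true.unsqueeze(1) - d_other).sum(dim=1)).mean()
    return tuplet + lam * anchor                          # Eq. (8)
```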

3.3 Using Distance-based Measures during Testing

After training is completed, we reconfigure the class anchors $\mathbf{C}$ with the means of the correctly classified training data logits. This allows us to more accurately model the class cluster centres for complex datasets, where visual and semantic similarities between classes can cause clustering to diverge from the original class anchors. With the updated class anchors, a test input is passed through our network and distance layer to produce the distance vector $\mathbf{d}$. We pass this distance vector through a SoftMin function $S$, which yields a distribution of known class confidences $\mathbf{c}$, where smaller distances are assigned higher confidences.

$\mathbf{c} = S(\mathbf{d})$  (9)
$S(\mathbf{d})_i = \dfrac{e^{-d_i}}{\sum_{j=1}^{N} e^{-d_j}}$  (10)

While the SoftMin confidences provide meaningful information about the confidence of an input belonging to any given known class, they do not consider the possibility that an input belongs to an unknown class. The SoftMin function only considers the differences between the distances, rather than the absolute value of each distance. For example, a distance vector of [1, 4, 5] and a distance vector of [101, 104, 105] both yield the same SoftMin confidences of [0.94, 0.05, 0.02]. To overcome this, we combine the class confidences with the original distance output to produce class 'rejection' scores $\mathbf{r}$, as shown below ($\circ$ denotes the element-wise product).

$\mathbf{r} = \mathbf{d} \circ (1 - \mathbf{c})$  (11)
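A minimal sketch of this test-time scoring follows, assuming our reconstruction of Eq. (11) above and realising SoftMin as a softmax over negated distances; the function names are illustrative.

```python
import torch

def rejection_scores(d: torch.Tensor) -> torch.Tensor:
    """Class rejection scores from a (batch, N) distance matrix."""
    c = torch.softmax(-d, dim=1)  # Eqs. (9)-(10): SoftMin confidences
    return d * (1.0 - c)          # Eq. (11): low score = confidently known

def predict(d: torch.Tensor, tau: float):
    """Classify as the minimum-rejection class, or reject as unknown (Sec. 3.3)."""
    r = rejection_scores(d)
    score, label = r.min(dim=1)
    return label, score < tau     # True marks inputs accepted as a known class
```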

We use these class rejection scores to identify unknown inputs and to classify known inputs. If the minimum class rejection score, $\min_i r_i$, for an input is below a defined rejection threshold, $\tau$, the input is classified as that known class. Otherwise, the input has high rejection scores for all known classes and is rejected as belonging to an unknown class. By using a distance-based rejection score, an input's likelihood of being rejected as 'unknown' increases as the input moves away from the training data. In this way, we minimise the open space risk [19] of our classifier.

This testing protocol is summarised as an algorithm in the supplementary material.

4 Experimental Setup

We follow the evaluation protocol defined in [14]. To evaluate open set classification performance, a number of standard closed set datasets are adapted to an open set task. This is achieved by randomly splitting the existing dataset classes into ‘known’ or ‘unknown’ classes, where only the known classes can be used during training and all classes are present during testing. Depending on the proportion of known and unknown classes, [19] defined the openness of the classification task as

$O = 1 - \sqrt{\dfrac{2 \times N_{train}}{N_{target} + N_{test}}}$  (12)

where $N_{train}$ is the number of classes during training, $N_{target}$ is the number of classes requiring classification during testing and $N_{test}$ is the total number of classes during testing (known and unknown). In general, a higher openness indicates a more difficult problem setup, but other factors such as the input image size or the number of available training images per class influence the difficulty as well. For each dataset, we evaluate performance over 5 trials with random class splits.
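For concreteness, the openness values in Table 1 can be reproduced with a small helper; this is our own illustration of Eq. (12):

```python
import math

def openness(n_train: int, n_target: int, n_test: int) -> float:
    """Openness O (Eq. 12) of an open set classification task."""
    return 1.0 - math.sqrt(2.0 * n_train / (n_target + n_test))

# TinyImageNet split: 20 known classes, 20 targeted at test time, 200 total
assert abs(openness(20, 20, 200) - 0.574) < 1e-3  # matches Table 1
```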

Dataset  Input Size  Training Ims/Class  # Known/Unknown  Openness O
MNIST  28×28×1  6,000  6/4  13.4%
SVHN  32×32×3  7,326  6/4  13.4%
CIFAR10  32×32×3  5,000  6/4  13.4%
CIFAR+10  32×32×3  5,000  4/10  33.3%
CIFAR+50  32×32×3  5,000  4/50  62.9%
TinyImNet  64×64×3  500  20/180  57.4%
Table 1: Information about the datasets. Note that MNIST and SVHN have different numbers of training images per class; we list the average.

4.0.1 Datasets:

We evaluate on 6 different datasets as in [14]. We summarise the datasets in Table 1 and give a short description below.

  • MNIST [12] contains grayscale images of handwritten digits. It has 10 total classes, with an open set configuration of 6 known classes, 4 unknown classes (O = 13.39%).

  • SVHN [15] contains RGB images of street view house digits. It has 10 total classes, with an open set configuration of 6 known classes, 4 unknown classes (O = 13.39%).

  • CIFAR10 [11] consists of RGB images of animal and non-animal objects. It has 10 total classes, with an open set configuration of 6 known classes, 4 unknown classes (O = 13.39%).

  • CIFAR+10/+50 considers the 4 non-animal classes of CIFAR10 as known, and 10 and 50 randomly sampled animal classes from CIFAR100 [11] as unknown (O = 33.33% and 62.86%).

  • TinyImageNet [27] contains RGB images of animal and non-animal objects. It has a total of 200 classes, with an open set configuration of 20 known classes, 180 unknown classes (O = 57.35%). Images can contain significant background information unrelated to the object class, a number of classes are very visually and semantically related (e.g. different breeds of dogs), and there is high variation within individual classes. Some examples of this are provided in the supplementary material.

TinyImageNet is of particular importance and interest for our evaluation as it is the most difficult dataset for open set classification in this benchmark. With its limited number of only 500 training images per class, the comparatively large image size of 64×64, the high openness score of 57.35%, and the inclusion of visually and semantically very similar classes, it represents a very challenging dataset. As we will show, our approach achieves a new state-of-the-art result on TinyImageNet.

4.0.2 Metrics:

As established by [14], we use two different metrics to assess the performance of an open set classifier.

  • Classification Accuracy measures the accuracy of the classifier when applied only to the known classes in the dataset, as is done for closed set classification. An open set classifier should not degrade the classification accuracy achievable with a closed set classifier.

  • Area Under the ROC Curve (AUROC) is a calibration-free measure of the open set performance of a classifier. The Receiver Operating Characteristic (ROC) curve represents the trade-off between the true positive rate (unknown inputs correctly rejected as 'unknown') and the false positive rate (known inputs incorrectly rejected as 'unknown') when applying varying thresholds to a given score. To compute the ROC curve for our open set classifier, we vary the threshold $\tau$ and compare it to the minimum class rejection score $\min_i r_i$ (a computation sketch follows this list).
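The AUROC can be computed directly from the minimum class rejection scores, treating 'unknown' as the positive class. The following scikit-learn sketch is our own illustration, not the authors' evaluation code:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def open_set_auroc(scores_known: np.ndarray, scores_unknown: np.ndarray) -> float:
    """AUROC over minimum class rejection scores (higher score = more 'unknown')."""
    y_true = np.concatenate([np.zeros(len(scores_known)), np.ones(len(scores_unknown))])
    y_score = np.concatenate([scores_known, scores_unknown])
    return roc_auc_score(y_true, y_score)  # threshold-free, as described above
```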

4.0.3 State-of-the-art Methods for Comparison:

We compare to a range of existing state-of-the-art open set classifiers. The core details of each method and the score used for open set identification are listed below.

  1. SoftMax [8]: The maximum class SoftMax score from a closed set classifier is used for open set identification.

  2. OpenMax [2]: A Weibull-calibrated distance measure, combined with the maximum class SoftMax score from a closed set classifier, is used for open set identification.

  3. Generative OpenMax (G-OpenMax) [31]: ‘Unknown’ samples are generated by a generative network and used to augment the training dataset. At test time, OpenMax is used as a distance-based measure for open set identification.

  4. Open Set Recognition with Counterfactual Images (OSRCI) [14]: ‘Unknown’ samples are generated by a generative network and used to augment the training dataset. The classifier is trained with an ‘unknown’ class. The difference between the ‘unknown’ class SoftMax score and maximum known class SoftMax score is used for open set identification.

  5. Classification-Reconstruction learning for Open Set Recognition (CROSR) [30]: An autoencoder and classifier are jointly trained for closed set classification and image reconstruction. OpenMax is applied to the logits and autoencoder latent space to produce a distance-based measure for open set identification.

  6. Class Conditioned Auto-Encoder (C2AE) [17]: An autoencoder is added to an existing closed set classifier and trained in a class-conditioned approach to reconstruct images. At test time, reconstruction error is used for open set identification.

4.1 Implementation Details

For fair comparison, we use the same network architecture as specified by the evaluation protocol in [14]. During training, we use a Stochastic Gradient Descent (SGD) optimiser with a learning rate of 0.01 and train until convergence. For the more complex datasets (CIFAR10 and TinyImageNet), we then complete another training cycle with a lower learning rate of 0.001. More specific details about the training procedure and network architecture can be found in the supplementary material. For all datasets, we use an Anchor loss weight $\lambda$ of 0.1 and a logit anchor magnitude $\alpha$ of 10.

5 Results and Discussion

Method MNIST SVHN CIFAR10 CIFAR+10/+50 TinyImNet
Softmax[8] 0.978 0.886 0.677 0.816/0.805 0.577
* OpenMax[3] 0.981 0.894 0.695 0.817/0.796 0.576
* G-OpenMax[31] 0.984 0.896 0.675 0.827/0.819 0.580
OSRCI[14] 0.988 0.910 0.699 0.838/0.827 0.586
* CROSR[30] 0.991 0.899 - - 0.589
C2AE[17] 0.989 0.922 0.895 0.955/0.937 0.748
* CAC (Ours) 0.985 0.938 0.803 0.863/0.872 0.772
Table 2: AUROC for state-of-the-art open set classifiers and our proposed approach. Best and second best performance are bolded and italicised respectively. Distance-based open set approaches are indicated by an asterisk (*). Values for each method have been taken from their published results.

5.1 Comparison with State-of-the-Art Open Set Classifiers

The performance of our open set classifier trained with CAC compared to the state-of-the-art methods is shown in Table 2.

For TinyImageNet and SVHN, we outperform all state-of-the-art open set classifiers, increasing AUROC by 2.4% and 1.9% compared to the next best method. TinyImageNet represents the most difficult dataset for open set classification in this benchmark. It has limited training data (500 images per class), the largest input dimensionality (3x64x64) and a high openness score (O = 57.35%). In addition to this, it has object classes that are very visually and semantically similar, e.g. six different breeds of dogs. As a result, our best performance on this dataset indicates the scalability of our approach to more complex open set classification settings.

5.1.1 Comparison to other distance-based approaches:

Compared to other open set classifiers using distance-based approaches (indicated by an asterisk * in Table 2), we achieve the best performance on TinyImageNet, CIFAR10, CIFAR+10, and CIFAR+50. By training with a distance-based loss, we improve the use of distance-based measures for open set classification by 18.3% on TinyImageNet, 12.8% on CIFAR10, and 3.6% and 4.5% on CIFAR+10/+50. This demonstrates the importance of training with a distance-based loss such as CAC when using distance-based measures to distinguish between known and unknown inputs.

5.1.2 Comparison to non-distance-based approaches:

Compared to non-distance-based open set classifiers, we achieve the best performance on TinyImageNet and SVHN, and come second to the class-conditioned auto-encoder (C2AE) approach [17] on CIFAR10 and CIFAR+10/+50. We also do not achieve the best performance on MNIST; however, performance on this simple toy dataset appears to be saturated, with even the naive baseline already obtaining a performance of 97.8%.

The lower performance on the CIFAR10 variations highlights a potential limitation of our approach on datasets with low numbers of known classes. The ability of a network to separate known and unknown inputs in the logit space depends on the quality of the feature representation it learns. When presented with a small number of known classes, such as only 4 for CIFAR+10/+50, the network may not be able to learn a rich enough feature representation to ensure that known and unknown inputs do not project to the same region in the logit space. Despite this, our approach has other advantages over the C2AE approach. C2AE uses the reconstruction error from a class-conditioned auto-encoder to distinguish between known and unknown inputs. Reconstruction error does not explicitly measure distance from the training data, and thus cannot guarantee that inputs far from the training data will be identified as unknown. In another work, [6] showed that autoencoders are able to reconstruct unknown inputs with low error. [23] also highlighted that reconstruction error can fail to distinguish known and unknown inputs, particularly for more complex datasets.

5.2 CAC Maintains Closed Set Classification Accuracy

Loss MNIST SVHN CIFAR10 CIFAR+10/+50 TinyImNet
Cross-Entropy 0.997 0.963 0.927 0.947 0.733
CAC (Ours) 0.998 0.970 0.934 0.952 0.759
Table 3: Classification accuracy of a standard closed set classifier trained with cross-entropy loss and our open set classifier trained with CAC loss. It is important that an open set approach at least maintains the closed set accuracy of a closed set classifier.

While the previous section discussed the performance of CAC at distinguishing between known and unknown inputs, we now show that a network trained with the proposed CAC loss maintains closed set performance compared to training with cross-entropy loss. This is an important result, as it shows that there is not necessarily a trade-off between open set and closed set performance.

To obtain the results shown in Table 3, we trained the same network architecture with cross-entropy loss and with CAC loss. The open set datasets described in Section 4 were used for training. Testing was only performed on the known classes. For the closed set cross-entropy classifier, inputs were classified as the class with the maximum SoftMax confidence. For our open set CAC classifier, inputs were classified as the class with the minimum class rejection score. For all datasets, our CAC-trained network slightly improved upon the classification accuracy of the closed set cross-entropy network.

5.3 Analysis of Hyperparameters and Components of CAC Loss

In this section, we explore the sensitivity of CAC to the value of its two hyperparameters: the Anchor loss term weight $\lambda$ and the class anchor magnitude $\alpha$. As shown in Figure 7, a wide range of CAC hyperparameter values achieve very similar results: across the tested ranges of $\lambda$ and $\alpha$, both the classification accuracy and the open set AUROC vary by 4% or less. Note that the results in Figure 7 were produced by training one network per hyperparameter configuration on TinyImageNet, and that the general trend should be considered rather than the absolute values.

(a) Classification Accuracy
(b) Open set AUROC
Figure 7: Effect of CAC hyperparameters on the performance of our open set classifier. Data is generated from 1 trial with a random split of known/unknown classes for TinyImageNet.

We also explore how the two components of CAC loss, Anchor loss and Tuplet loss terms, affect the performance of an open set classifier. As shown in Figure 10, the Anchor loss term alone is able to achieve equivalent closed set classification accuracy to our final CAC loss. However, for open set performance, neither the Anchor loss term nor the Tuplet loss term in isolation achieve the same performance as the CAC loss. This validates that both loss terms are important for distance-based open set classification.

(a) Closed Set Classification Accuracy
(b) Open set AUROC
Figure 10: Effect of the two components of CAC loss on performance. Data was generated from 5 trials with random splits of known/unknown classes for TinyImageNet.

5.4 Understandable Errors with Our CAC Open Set Classifier

Figure 11: If an unknown image was falsely recognised as known, which known class was it classified as? A network trained with CAC tends to make interpretable mistakes. For example, of all car images that were falsely recognised as known, 83% were classified as truck. The confusion matrix shows high correlation between semantically similar classes. Data was generated from 5 trials with random splits of known/unknown classes for CIFAR10.

For many practical applications, graceful performance degradation is important. In the case of open set classification, this would mean that when the system misclassifies an unknown as a known class, it should be somewhat comprehensible. Ideally, the system would confuse unknown classes only with semantically close known classes.

In Figure 11, we analyse the open set errors made by a network trained with CAC on CIFAR10. As can be seen in the figure, the network's mistakes are understandable. Whenever an unknown class is mistaken for a known class, it is most likely to be misclassified as a known class that is visually or semantically similar. For example, of all images of the unknown 'car' class that were mistaken for a known input, 83% were classified as the known class 'truck'. Unknown animal classes are predominantly confused with known animal classes, rather than non-animal classes, and vice-versa. This is a desirable property for many applications such as autonomous systems or robotics. In Figure 14, we present some additional qualitative examples of success and failure cases on TinyImageNet.

(a) Correctly Rejected Unknowns
(b) Failure cases
Figure 14: Examples of unknown images from TinyImageNet. In (a), our classifier trained with CAC correctly rejects the bannister classified as a picket fence with a rejection score of 6.17 and the jellyfish classified as an umbrella with a rejection score of 5.72. In (b), our open set classifier fails on difficult examples: the lion is classified as a brown bear with a low rejection score of 0.036 and the egyptian cat is classified as a tabby cat with a rejection score of 0.004.

6 Conclusions

The application of deep neural network models under open set conditions remains an important and difficult challenge for computer vision. Our paper provided empirical evidence that using a distance-based loss function during training is beneficial for distance-based open set classifiers. Many existing distance-based open set classifiers train only with cross-entropy loss, but then hope to exploit clustering and separation behaviour in the logit space during open set testing, applying distance-based known/unknown recognition methods in a post-hoc manner. We specifically highlight the importance of training with a distance-based loss instead. To this end, we introduced the Class Anchor Clustering (CAC) loss. CAC explicitly drives the network to learn a mapping from input to logit space that results in tight, well-separated, class-specific clusters. This property in turn enables a distance-based decision function to distinguish known from unknown inputs, without sacrificing closed set classification performance. We demonstrated that a CAC-trained network achieves a new state-of-the-art performance on the most complex dataset in the benchmark, TinyImageNet.

References

  • [1] D. Amodei, C. Olah, J. Steinhardt, P. Christiano, J. Schulman, and D. Mané (2016) Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.
  • [2] A. Bendale and T. E. Boult (2016) Towards open set deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1563–1572.
  • [3] A. Bendale and T. Boult (2015) Towards open world recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1893–1902.
  • [4] H. Blum, P. Sarlin, J. Nieto, R. Siegwart, and C. Cadena (2019) The Fishyscapes benchmark: measuring blind spots in semantic segmentation. arXiv preprint arXiv:1904.03215.
  • [5] T. Boult, S. Cruz, A. Dhamija, M. Gunther, J. Henrydoss, and W. Scheirer (2019) Learning and the unknown: surveying steps toward open world recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 9801–9807.
  • [6] T. Denouden, R. Salay, K. Czarnecki, V. Abdelzad, B. Phan, and S. Vernekar (2018) Improving reconstruction autoencoder out-of-distribution detection with Mahalanobis distance. arXiv preprint arXiv:1812.02765.
  • [7] A. R. Dhamija, M. Günther, and T. Boult (2018) Reducing network agnostophobia. In Advances in Neural Information Processing Systems (NeurIPS), pp. 9157–9168.
  • [8] D. Hendrycks and K. Gimpel (2017) A baseline for detecting misclassified and out-of-distribution examples in neural networks. In International Conference on Learning Representations (ICLR).
  • [9] L. P. Jain, W. J. Scheirer, and T. E. Boult (2014) Multi-class open set recognition using probability of inclusion. In European Conference on Computer Vision, pp. 393–409.
  • [10] P. R. M. Júnior, R. M. De Souza, R. d. O. Werneck, B. V. Stein, D. V. Pazinato, W. R. de Almeida, O. A. Penatti, R. d. S. Torres, and A. Rocha (2017) Nearest neighbors distance ratio open-set classifier. Machine Learning 106 (3), pp. 359–386.
  • [11] A. Krizhevsky, G. Hinton, et al. (2009) Learning multiple layers of features from tiny images.
  • [12] Y. LeCun, C. Cortes, and C. Burges (2010) MNIST handwritten digit database.
  • [13] B. J. Meyer and T. Drummond (2019) The importance of metric learning for robotic vision: open set recognition and active learning. In 2019 International Conference on Robotics and Automation (ICRA), pp. 2924–2931.
  • [14] L. Neal, M. Olson, X. Fern, W. Wong, and F. Li (2018) Open set learning with counterfactual images. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 613–628.
  • [15] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng (2011) Reading digits in natural images with unsupervised feature learning.
  • [16] A. Nguyen, J. Yosinski, and J. Clune (2015) Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 427–436.
  • [17] P. Oza and V. M. Patel (2019) C2AE: class conditioned auto-encoder for open-set recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2307–2316.
  • [18] Q. Qian, J. Tang, H. Li, S. Zhu, and R. Jin (2018) Large-scale distance metric learning with uncertainty. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8542–8550.
  • [19] W. J. Scheirer, A. de Rezende Rocha, A. Sapkota, and T. E. Boult (2013) Toward open set recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (7), pp. 1757–1772.
  • [20] W. J. Scheirer, L. P. Jain, and T. E. Boult (2014) Probability models for open set recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 36 (11), pp. 2317–2324.
  • [21] W. J. Scheirer, A. Rocha, R. J. Micheals, and T. E. Boult (2011) Meta-recognition: the theory and practice of recognition score analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (8), pp. 1689–1695.
  • [22] P. Schlachter, Y. Liao, and B. Yang (2019) Open-set recognition using intra-class splitting. In 2019 27th European Signal Processing Conference (EUSIPCO), pp. 1–5.
  • [23] A. Shafaei, M. Schmidt, and J. J. Little (2018) A less biased evaluation of out-of-distribution sample detectors. arXiv preprint arXiv:1809.04729.
  • [24] L. Shu, H. Xu, and B. Liu (2017) DOC: deep open classification of text documents. arXiv preprint arXiv:1709.08716.
  • [25] K. Sohn (2016) Improved deep metric learning with multi-class N-pair loss objective. In Advances in Neural Information Processing Systems, pp. 1857–1865.
  • [26] N. Sünderhauf, O. Brock, W. Scheirer, R. Hadsell, D. Fox, J. Leitner, B. Upcroft, P. Abbeel, W. Burgard, M. Milford, et al. (2018) The limits and potentials of deep learning for robotics. The International Journal of Robotics Research 37 (4-5), pp. 405–420.
  • [27] TinyImageNet: Tiny ImageNet Visual Recognition Challenge. https://tiny-imagenet.herokuapp.com/, accessed 2020-03-01.
  • [28] Y. Wen, K. Zhang, Z. Li, and Y. Qiao (2016) A discriminative feature learning approach for deep face recognition. In European Conference on Computer Vision, pp. 499–515.
  • [29] C. Wu, R. Manmatha, A. J. Smola, and P. Krahenbuhl (2017) Sampling matters in deep embedding learning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2840–2848.
  • [30] R. Yoshihashi, W. Shao, R. Kawakami, S. You, M. Iida, and T. Naemura (2019) Classification-reconstruction learning for open-set recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4016–4025.
  • [31] Z. Ge, S. Demyanov, and R. Garnavi (2017) Generative OpenMax for multi-class open set classification. In Proceedings of the British Machine Vision Conference (BMVC), pp. 42.1–42.12.

Appendix 0.A Appendices

0.A.1 Experiments with Center loss

To implement Center loss [28], we used the public PyTorch implementation available at https://github.com/KaiyangZhou/pytorch-center-loss and followed its instructions for applying Center loss to our own project. Contrary to the original paper [28], we applied Center loss to the final logit layer, the same layer that cross-entropy acts upon; the original paper applies Center loss to the penultimate layer.

Center loss has two hyperparameters: the learning rate for the centres and the weighting of Center loss against cross-entropy loss. We tested with the recommended centre learning rate and tried a range of weighting values. Even when training on the simple MNIST dataset, training destabilises within the first epoch and produces a 'NaN' loss. We theorise that applying both cross-entropy and Center loss to the final layer, while learning the class centres from scratch, is too unstable for training. In contrast, our distance-based loss has been designed to be applied to the final layer of the network.

0.A.2 CAC open set classifier testing protocol

We summarise the testing protocol for a CAC-trained open set classifier in Algorithm 1.

Require: f and e, the base encoder and distance layer of our CAC-trained network
Require: the logit vectors of correctly classified training images for each class i, Z_i
Require: the test images, X
Require: the SoftMin function, S
Require: the rejection score threshold, τ
1: for each known class i do
2:     initialise the class anchor as c_i ← mean(Z_i)
3: end for
4: for each test image x in X do
5:     pass x through the network for the distance output d = e(f(x))
6:     apply SoftMin for the confidences c = S(d)
7:     calculate the class rejection scores r = d ∘ (1 − c)
8:     r* ← min_i r_i
9:     y* ← argmin_i r_i
10:    if r* < τ then
11:        x belongs to known class y*
12:    else
13:        x belongs to an unknown class
14:    end if
15: end for
Algorithm 1 Testing a distance-based open set classifier trained with CAC

When forming the variables Z_i, a training image is considered correctly classified if the maximum logit in its logit vector corresponds to the ground truth class.

0.A.3 TinyImageNet Examples

TinyImageNet is the most difficult dataset in the open set classification benchmark used in this paper. Images can contain significant background information that is unrelated to the object class, and there can be huge visual variations within a single class (see Figure 17). This can make learning difficult and can cause classes that would typically not be considered related to present similar features, e.g. both the 'Umbrella' and 'Nail' classes can feature humans in the images. In addition to this, there are a number of classes that are visually and semantically related, such as the six different breeds of dogs shown in Figure 18. This can be particularly difficult in open set classification when some dog breeds are known classes and other dog breeds are unknown classes.

(a) ‘Nail’ Class
(b) ‘Umbrella’ Class
Figure 17: Examples of the large intra-class variations present in TinyImageNet. While ‘nail’ and ‘umbrella’ might typically be considered unrelated classes, they can contain humans in the background.
Figure 18: Images from six different, but visually and semantically related, classes in Tiny ImageNet: chihuahua, german shepherd, golden retriever, labrador retriever, standard poodle and yorkshire terrier.

0.A.4 Training and network architecture details

The learning rate, number of training epochs and network dropout rate for each dataset are shown in Table 4. For CIFAR* (CIFAR10, CIFAR+10 and CIFAR+50) and TinyImageNet, we first trained with a learning rate of 0.01 for 150/500 epochs and then continued training with a lower learning rate of 0.001 for 50/300 epochs respectively.

Dataset Learning Rate Epochs Dropout Rate
MNIST 0.01 35 0.2
SVHN 0.01 50 0.2
CIFAR* 0.01 150 0.2
0.001 50
TinyImageNet 0.01 500 0.3
0.001 300
Table 4: Training details for each dataset.

The base network architecture f was consistent with the architecture established by [14]. It is detailed in the code submitted, which will also be made public.