Hybrid Models for Open Set Recognition

03/27/2020 ∙ by Hongjie Zhang, et al. ∙ Google ∙ Nanjing University

Open set recognition requires a classifier to detect samples not belonging to any of the classes in its training set. Existing methods fit a probability distribution to the training samples in their embedding space and detect outliers according to this distribution. The embedding space is often obtained from a discriminative classifier. However, such a discriminative representation focuses only on known classes, and may therefore miss features that are critical for distinguishing the unknown classes. We argue that the representation space should be jointly learned from the inlier classifier and the density estimator (serving as an outlier detector). We propose the OpenHybrid framework, which is composed of an encoder to encode the input data into a joint embedding space, a classifier to classify samples into inlier classes, and a flow-based density estimator to detect whether a sample belongs to the unknown category. A typical problem of existing flow-based models is that they may assign a higher likelihood to outliers. However, we empirically observe that this issue does not occur in our experiments when learning a joint representation for the discriminative and generative components. Experiments on standard open set benchmarks also reveal that an end-to-end trained OpenHybrid model significantly outperforms state-of-the-art methods and flow-based baselines.




1 Introduction

Image classification is a core problem in computer vision. However, most of the existing research is based on the closed-set assumption, i.e., the training set is assumed to cover all classes that appear in the test set. This is an unrealistic assumption. Even with a large-scale image dataset, such as ImageNet [14], it is impossible to cover all scenarios in the real world. When a closed-set model encounters an out-of-distribution sample, it is forced to identify it as a known class, which can cause issues in many real-world applications. We instead study the “open-set” problem where the test set is assumed to contain both known and unknown classes, so the model has to classify samples into either known (inlier) classes or the unknown (outlier) category. Figure 1 illustrates the difference between classification decision boundaries under the open set and closed set assumptions.

Figure 1: Decision boundaries of a closed set classifier (a) and an open set classifier (b). Green symbols indicate known samples (different shapes represent different classes), and orange question marks indicate unknown samples. Dashed lines indicate the decision boundaries. (a) The closed set assumption leads to unbounded decision boundaries for a typical 4-class classifier. Unknown samples are forced to be classified into one of the known classes. (b) The open set assumption results in bounded decision boundaries for a 5-class classifier, which can classify both known and unknown samples.

Identifying unknown samples is naturally challenging because they are not observed during training. Existing approaches fit a probability distribution to the training samples in their embedding space, and detect unknown samples according to this distribution. Since the feature representation of unknown classes is unknown, most methods operate on a discriminative feature space obtained from a supervised classifier trained on known classes. A threshold on this probability distribution is then used to detect samples from unknown classes. A common approach along this direction is to threshold SoftMax responses, but [2] has shown experimentally that it reaches only sub-optimal solutions to open set recognition. Some variants have been proposed to better utilize the SoftMax scores [6, 20, 20, 30]. These methods modify the SoftMax scores to perform unknown detection while maintaining classification accuracy. It is extremely challenging to find a single score measure on the SoftMax layer that performs well on both the generative and discriminative tasks. We believe the discriminative feature space learned by classifying inlier classes may not be sufficiently effective for identifying outlier classes. We therefore propose to employ a flow-based generative model for outlier detection, and to learn a joint feature space in an end-to-end manner from both the classifier and the density estimator.

Flow-based models have recently emerged [4, 5, 12, 1, 3], allowing a neural network to be invertible. They can fit a probability distribution to training samples in an unsupervised manner via maximum likelihood estimation. A flow model can predict the probability density of each example. When the probability density of an input sample is large, the sample is likely to be part of the training distribution (known classes), whereas outlier samples (unknown classes) usually have a small probability density. The advantage of flow-based models is that they do not require the intervention of a classifier when fitting a probability distribution, and one can directly apply a threshold to these probability values without modifying the scores of any known classes.

Flow-based models have been adopted to solve out-of-distribution detection [18, 17, 9], but have not yet been considered for the open set recognition problem. Most related to our approach, [18] proposed a deep invertible generalized linear model (DIGLM), which comprises a generalized linear model (GLM) stacked on top of a flow-based model. They use the model’s natural rejection rule, based on the probability generated by the flow-based model, to detect unknown inputs, and directly classify known samples with the features used to fit the probability distribution. Our work differs in that, instead of adding a classifier on top of the flow model’s embedding, we propose to learn a joint embedding for both the flow model and the classifier. Our insight is that an embedding space learned from a flow-based model alone may not have sufficient discriminative expressiveness.

We empirically observe in our experiments that learning a joint embedding space resolves a common issue of flow-based models, namely that they may assign higher likelihood to OOD inputs (mentioned in [9, 24, 17]). This issue was considered in [11], where the underlying factor is believed to be the inconsistency between a uni-modal prior distribution and a multi-modal data distribution. In our framework, the deep network can represent the multi-modal distribution of the input data well, which is probably the reason for the improved performance of flow models.

We perform extensive experiments on various benchmarks including MNIST, SVHN, CIFAR10 and TinyImageNet. The proposed OpenHybrid model outperforms both state-of-the-art methods [2, 6, 20, 22, 33] and hybrid model baselines [18, 9] in these benchmarks. We further compare our method with an additional baseline which uses a pre-trained encoder and the result suggests the importance of jointly training both the classifier and the flow-based model.

1.0.1 Contribution.

The contribution of this paper can be summarized as follows:

  1. To the best of our knowledge, we are the first to incorporate a generative flow-based model with a discriminative classifier to address the open set recognition problem, while most of the existing open set approaches focus on either using the softmax logits or adversarial training.

  2. We propose a hybrid model (called OpenHybrid) that learns a joint representation between the classifier and flow density estimator. Our approach ensures that the inlier classification is not affected by outlier detection. We find that joint training is an important contributing factor, according to a comparison with a baseline using a separated training strategy.

  3. A known issue of flow-based models and their hybrid forms is that they may assign higher likelihood to unknown inputs. We observe that this phenomenon does not occur when using OpenHybrid. A possible reason is that the deep neural encoder maps the multi-modal input distribution to a latent space that is better suited to the unimodal assumption of flow models.

  4. We conducted extensive experiments on various open set image classification datasets and compared our approach against state-of-the-art open set methods and flow-based baseline models. Our approach achieves significant improvement over these baseline methods.

2 Related Work

2.1 Open Set Recognition

Open set recognition has been surprisingly overlooked, though it has more practical value than the common closed set setting. The few models investigated on this topic can be broadly classified into two categories: discriminative models and generative models.

Discriminative methods. Before the deep learning era, most of the approaches [29, 28, 10, 34] were based on traditional classification models such as Support Vector Machines (SVMs), Nearest Neighbors, Sparse Representation, etc. These methods usually do not scale well without careful feature engineering. Recently, deep learning based models have shown more appealing results. The first among them is probably [2], which introduced Weibull-based calibration to augment the SoftMax layer of a deep network, called OpenMax. Since then, OpenMax has been further developed in [25, 6]. [33] presented the classification-reconstruction learning algorithm for open set recognition (CROSR), which utilizes latent representations for reconstruction and enables robust unknown detection without harming the known classification accuracy. [22] proposed the C2AE model for open set recognition, using class conditioned auto-encoders with novel training and testing methodology. Several methods [31, 30] have also been proposed to apply open set models to text classification.

Generative methods. Unlike discriminative models, generative approaches generate unknown samples with a Generative Adversarial Network (GAN) [8] to help the classifier learn the decision boundary between known and unknown samples. [6] proposed the Generative OpenMax (G-OpenMax) algorithm, which uses a conditional GAN to synthesize mixtures of known classes and finetune the closed-set classification model. G-OpenMax improves the performance of both SoftMax and OpenMax based deep networks. Although G-OpenMax effectively detects unknowns in monochrome digit datasets, it fails to produce a significant performance improvement on natural images. Different from G-OpenMax, [20] introduced a novel dataset augmentation technique, called counterfactual image generation (OSRCI). OSRCI adopts an encoder-decoder GAN architecture to generate synthetic open set examples that are close to the known samples. They further reformulated the open set problem as classification with one additional class containing those newly generated samples. GAN-based methods have also been used recently to solve the open set domain adaptation problem [35, 27].

Out-of-distribution detection. Open set recognition is naturally related to other problem settings such as out-of-distribution detection [32], outlier detection [26], and novelty detection [23]. They can be incorporated into the concept of open set classification as an unknown detector. However, they do not require open-set classifiers because those models do not have discriminative power within known classes. We focus in this paper on the broader open set recognition problem.

2.2 Flow-Based Methods

Flow-based (also called invertible) models have shown promise in density estimation. The original representative models are NICE [4], RealNVP [5] and Glow [12]. The design ideas of these flow-based models are similar. By design, the inverse transformation of each layer is relatively simple, and the Jacobian matrix is triangular, so the Jacobian determinant is easy to compute. Such models are elegant in theory, but there is an issue in practice: the nonlinear transformation ability of each layer becomes weak. Apart from these flow-based models, [1] proposed the Invertible Residual Network (I-ResNet), which adds some constraints to the ordinary ResNet structure to make the model invertible. The I-ResNet model still retains the basic structure of a ResNet and most of its original fitting ability, so previous experience in ResNet design can largely be re-used. Unfortunately, the density evaluation requires computing an infinite series. The fixed truncation estimator used by [1] leads to substantial bias, which is tightly coupled with the expressiveness of the network. It cannot be used to perform maximum likelihood because the bias is introduced in the objective and gradients. [3] improved I-ResNet and introduced Residual Flows, a flow-based generative model that produces an unbiased estimate of the log density. Residual Flows allow memory-efficient backpropagation through the log density computation. This allows the model to use expressive architectures and to train via maximum likelihood in many tasks, such as classification, density estimation and generation. Our work differs from existing flow-based models in that we explicitly address a broader open-set problem, where the flow model is a sub-component.
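To make the triangular-Jacobian argument of this subsection concrete, the following is a minimal sketch of an affine coupling layer in the spirit of RealNVP [5]. The functions `toy_scale` and `toy_shift` are hypothetical stand-ins for the small networks used in practice; they are assumptions for illustration and are not taken from any of the cited implementations. Because the second half of the input is transformed elementwise, conditioned only on the first half, the Jacobian is triangular and its log-determinant reduces to a sum of log-scales.

```python
import math

def coupling_forward(x, scale_net, shift_net):
    """Affine coupling: copy the first half, transform the second half."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s = scale_net(x_a)  # log-scales depend only on x_a
    t = shift_net(x_a)      # shifts depend only on x_a
    y_b = [xb * math.exp(ls) + ti for xb, ls, ti in zip(x_b, log_s, t)]
    # Triangular Jacobian: log|det| is just the sum of the log-scales.
    log_det = sum(log_s)
    return x_a + y_b, log_det

def coupling_inverse(y, scale_net, shift_net):
    """Exact inverse: the untouched half lets us recompute scale and shift."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s = scale_net(y_a)
    t = shift_net(y_a)
    x_b = [(yb - ti) * math.exp(-ls) for yb, ls, ti in zip(y_b, log_s, t)]
    return y_a + x_b

# Hypothetical toy "networks" (assumptions, for illustration only).
def toy_scale(x_a):
    return [0.5 * math.tanh(v) for v in x_a]

def toy_shift(x_a):
    return [2.0 * v for v in x_a]

x = [0.3, -1.2, 0.7, 2.0]
y, log_det = coupling_forward(x, toy_scale, toy_shift)
x_rec = coupling_inverse(y, toy_scale, toy_shift)
print(y, log_det, x_rec)
```

The inverse never has to invert `toy_scale` or `toy_shift` themselves, which is why each layer can stay cheap; the price, as noted in this subsection, is that a single layer has limited nonlinear capacity.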

2.3 Flow-Based Methods for Out-of-Distribution Detection

Flow-based models have been applied to out-of-distribution (OOD) detection, which is relevant to the open set problem. Nalisnick et al. [18] presented a neural hybrid model created by combining deep invertible features and GLMs to filter OOD inputs, using the model’s natural “reject” rule based on the density estimation of the flow-based component. However, this rejection rule is not guaranteed to work in all settings. The main reason is that deep generative models can assign higher likelihood to OOD inputs. Nalisnick et al. [17] find that the density learned by flow-based models cannot distinguish images of common objects such as dogs, trucks, and horses (i.e., CIFAR-10) from those of house numbers (i.e., SVHN), assigning a higher likelihood to the latter when the model is trained on the former. [24] also observed that the likelihood learned by deep generative models can be confounded by background statistics (e.g., an OOD input with the same background but a different semantic component). [9] proposed Outlier Exposure (OE), a simple technique that uses out-of-distribution samples to teach a network heuristics for detecting out-of-distribution examples. However, this improvement is limited and sensitive to the selection of the OE dataset.

[11] showed that a factor underlying this phenomenon is a mismatch between the nature of the prior distribution and that of the data distribution. They proposed the use of a mixture distribution as a prior to make likelihoods assigned by deep generative models sensitive to out-of-distribution inputs. [19] explained the phenomenon through typicality and proposed a typicality test based on batches of inputs, which solves many of the failure modes. While we follow the same hybrid modeling direction, our work differs from [18] in that we share a common visual representation between the classifier and the flow model, whereas [18] uses the output of the flow model as the input to the classifier. Our experiments show that the proposed representation sharing approach is effective in our setup.

Figure 2: Proposed architecture for open set recognition. During the training phase (left), images are mapped into a latent feature space by the encoder, then the encoded features are fed into two branches for learning: one is typical classification learning with a classifier via cross entropy loss, and the other is density estimation with a flow-based model via its log likelihood. The whole architecture is trained in an end-to-end manner. In the testing phase (right), the log-likelihood $\log p(x)$ of each image is computed and then compared with the minimum $\log p(x)$ taken over the training set. If it is greater than the threshold $\tau$, the image is sent to the classifier to identify its specific known class; otherwise it is rejected as an unknown sample.

3 Our Approach

We start this section by defining the open set problem and introducing the notation. We then give an overview of our proposed approach, which we call “OpenHybrid”. After explaining the details of each module, we describe how to achieve open set recognition using OpenHybrid.

3.1 Problem Statement and Notation

For open set recognition, given a labeled training set of instances $X = \{x_1, \dots, x_N\}$ with $x_i \in \mathbb{R}^d$ and their corresponding labels $Y = \{y_1, \dots, y_N\}$ with $y_i \in \{1, \dots, k\}$, where $k$ is the number of known classes, $N$ is the total number of instances and $d$ is the dimension of each instance, we learn a model $f$ such that the model accurately classifies an unseen instance (in the test set, not in $X$) into one of the $k$ classes or an unknown class (or the “none of the above” class) indexed by $k+1$.

3.2 Overview

Figure 2 overviews the training and testing procedures for the proposed method. The OpenHybrid framework consists of three modules: an encoder $E$ for learning latent representations with parameters $\theta_e$, a classifier $C$ for classifying known classes with parameters $\theta_c$, and a flow-based module $F$ for density estimation with parameters $\theta_f$. Existing flow-based models and their hybrid variants feed the original image data directly into the flow-based model for density estimation. Different from these works, our OpenHybrid framework uses the latent representation (the output of the encoder $E$) as the input to the flow model $F$. The reason is that density estimation directly on the original image is susceptible to population-level background statistics (e.g., in MNIST, the background pixels that account for most of the image are similar), which makes it hard to detect unknown samples via exact marginal likelihood. Even in some settings with different backgrounds, unknown samples are assigned higher likelihoods than known samples; this behavior has not been explained so far. We propose to estimate the density of latent representations instead of the original input. We find our method to be effective on all of our experimental benchmarks, and we do not observe the “higher outlier likelihood” issue with this framework.

For classification, the classifier $C$ is directly connected to the output of the encoder $E$ instead of the output of the invertible transformation $F$. We remove the dependency of the classifier on the flow model because we believe the output of the invertible transform loses discriminative power. We find that this design makes both the detection of unknown classes and the classification of known classes effective.

3.3 Training

We define the training loss function in this section.

3.3.1 Classification Loss.

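Consider images $\{x_1, \dots, x_n\}$ in a batch and their corresponding labels $\{y_1, \dots, y_n\}$, where $n$ is the batch size and $y_i \in \{1, \dots, k\}$. The encoder and the classifier are trained using the following cross entropy loss,

$$L_c = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{k} \mathbb{I}_{y_i}(j)\,\log p_j(x_i), \tag{1}$$

where $\mathbb{I}_{y_i}$ is an indicator function for label $y_i$, and $p_j(x_i)$ is the probability of the $j$-th class from the probability score vector predicted by $C(E(x_i))$.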

3.3.2 Density Estimation Loss.

For unknown detection, unlike general open set methods, flow-based models directly fit the distribution of the training set and compute the probability $p(x)$ of each training sample under the training distribution (which can also be treated as the distribution of known classes) through maximum likelihood estimation, then use the model’s natural reject rule based on $p(x)$ to filter unknown inputs. Although this is intuitively feasible, there are still problems, as mentioned above. We suspect these problems come from the difficulty flow models have in representing the original input space, so we perform density estimation on learned latent representations instead of the original images.

Flow-based models are a key building block in our approach. They are high-capacity, bijective transformations with a tractable Jacobian matrix and inverse. The bijective nature of these transforms is crucial, as it allows us to employ the change-of-variables formula for exact density evaluation:

$$\log p(x) = \log p(z) + \log\left|\det\frac{\partial z}{\partial x}\right|, \qquad z = F(x). \tag{2}$$

A simple base distribution such as a standard normal distribution is often used for $p(z)$. Tractable evaluation of Equation 2 allows flow-based models to be trained using maximum likelihood with the loss function:

$$L_g = -\frac{1}{n}\sum_{i=1}^{n}\log p(x_i). \tag{3}$$

In training, we report the loss in bits per dimension by normalizing it by the dimensionality of the flow input. In our OpenHybrid framework, there are multiple choices for the flow-based module. Considering the stability of the density estimation, we use residual flow [3], which provides a tractable unbiased estimate of the log density.
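The change-of-variables computation and the bits-per-dimension normalization described in this subsection can be sketched with a deliberately simple flow. The elementwise affine map below is an assumption chosen for illustration (the paper uses residual flows [3]); all function names are hypothetical. With a standard normal base distribution, the resulting density must coincide with a diagonal Gaussian, which gives an easy sanity check.

```python
import math

def standard_normal_logpdf(z):
    """Log density of a standard normal base distribution, summed over dims."""
    return sum(-0.5 * (v * v + math.log(2.0 * math.pi)) for v in z)

def affine_flow(x, mu, sigma):
    """Toy bijection z = (x - mu) / sigma and its log|det dz/dx|."""
    z = [(xi - m) / s for xi, m, s in zip(x, mu, sigma)]
    log_det = -sum(math.log(s) for s in sigma)  # diagonal Jacobian
    return z, log_det

def log_density(x, mu, sigma):
    """Change of variables: log p(x) = log p(z) + log|det dz/dx|."""
    z, log_det = affine_flow(x, mu, sigma)
    return standard_normal_logpdf(z) + log_det

def nll_bits_per_dim(batch, mu, sigma):
    """Mean negative log-likelihood in nats, converted to bits per dimension."""
    d = len(mu)
    nll = -sum(log_density(x, mu, sigma) for x in batch) / len(batch)
    return nll / (d * math.log(2.0))

def diag_gaussian_logpdf(x, mu, sigma):
    """Closed-form reference used only to check the flow-based density."""
    return sum(
        -0.5 * ((xi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for xi, m, s in zip(x, mu, sigma)
    )

mu, sigma = [1.0, -2.0], [0.5, 3.0]
batch = [[1.2, -1.0], [0.4, 2.5]]
bpd = nll_bits_per_dim(batch, mu, sigma)
print(bpd, log_density(batch[0], mu, sigma), diag_gaussian_logpdf(batch[0], mu, sigma))
```

Dividing by `d * ln 2` converts nats to bits and removes the dependence on input dimensionality, which makes losses comparable across flow inputs of different sizes.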

3.3.3 Full Loss.

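The complete loss function of our method is:

$$L = L_c + \lambda L_g, \tag{4}$$

where $L_c$ is the classification loss of Section 3.3.1, $L_g$ is the density estimation loss of Section 3.3.2, and $\lambda$ is a scaling factor on the contribution of $L_g$. In all of our experiments in this paper, we empirically set it to 1.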
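As a rough sketch of how the two training signals combine, the snippet below computes a cross entropy term from predicted class probabilities and a negative log-likelihood term from flow log-likelihoods, then adds them with a scaling factor. It assumes that the scaling factor multiplies the density term; with the value of 1 used in the experiments the choice makes no difference. The function names and the toy numbers are hypothetical, and in the real model both terms would be differentiated through a shared encoder.

```python
import math

def cross_entropy(probs, labels):
    """Mean of -log p_{y_i}(x_i); labels are 0-based class indices here."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def density_loss(log_likelihoods):
    """Mean negative log-likelihood reported by the flow module."""
    return -sum(log_likelihoods) / len(log_likelihoods)

def hybrid_loss(probs, labels, log_likelihoods, lam=1.0):
    """Classification loss plus lam times the density loss (assumed form)."""
    return cross_entropy(probs, labels) + lam * density_loss(log_likelihoods)

# Toy batch of two samples over three known classes (made-up numbers).
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [0, 1]
log_likelihoods = [-3.0, -5.0]
total = hybrid_loss(probs, labels, log_likelihoods, lam=1.0)
print(total)
```

Because both terms depend on the encoder output, a single backward pass through this sum is what makes the representation jointly shaped by the classifier and the density estimator.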

3.4 Inference

3.4.1 Outlier Threshold.

At test time, we use the probability density estimated by the flow-based module to detect unknown samples. This value corresponds to the probability of a sample being generated from the distribution of the training (known) classes. Theoretically, the minimum of this density over the training set is the maximum value of the outlier threshold. Assuming that the known samples of the training set and the test set are from the same domain, the outlier threshold is calculated as:

$$\tau = \min_{x \in X} \log p(x) - \delta, \tag{5}$$

where $X$ is the training set and $\delta$ is a free parameter providing slack in the margin. We estimate the outlier threshold using training samples without data augmentation.

3.4.2 Open Set Recognition.

Open set recognition is a classification over $k+1$ class labels, where the first $k$ labels are from the known classes the classifier is trained on, and the $(k+1)$-st label represents the unknown class, signifying that an instance does not belong to any of the known classes. This is performed using the outlier threshold in Equation 5 and the score estimated in Equation 2. The outlier threshold $\tau$ is first calculated on the training data. If the estimated log-likelihood is smaller than the outlier threshold, the test instance is classified as $k+1$, which in our case corresponds to the unknown class; otherwise the appropriate class label is assigned to the instance from among the known classes. More formally, the prediction for a sample $x$ is defined as

$$\hat{y} = \begin{cases} k+1, & \text{if } \log p(x) < \tau, \\ \arg\max_{j \in \{1, \dots, k\}} p_j(x), & \text{otherwise.} \end{cases} \tag{6}$$
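The inference procedure of this section can be sketched in a few lines. The snippet assumes the threshold is the minimum training log-likelihood minus a slack value, and that labels are 1-based so that `k + 1` denotes the unknown class; the function names and the example log-likelihoods are hypothetical.

```python
def outlier_threshold(train_log_likelihoods, slack):
    """Threshold = minimum training log-likelihood minus a slack margin."""
    return min(train_log_likelihoods) - slack

def predict(class_probs, log_likelihood, threshold):
    """Return a 1-based known class label, or k + 1 for the unknown class."""
    k = len(class_probs)
    if log_likelihood < threshold:
        return k + 1  # rejected by the density estimator
    best = max(range(k), key=lambda j: class_probs[j])
    return best + 1   # accepted: defer to the classifier

# Made-up log-likelihoods of training samples (no data augmentation).
train_ll = [-950.0, -1010.0, -980.0]
tau = outlier_threshold(train_ll, slack=80.0)

known_pred = predict([0.1, 0.7, 0.2], log_likelihood=-1000.0, threshold=tau)
unknown_pred = predict([0.1, 0.7, 0.2], log_likelihood=-1200.0, threshold=tau)
print(tau, known_pred, unknown_pred)
```

Note that the classifier output is ignored entirely for rejected samples, so the rejection step cannot change which known class an accepted sample receives.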


4 Experiments

We evaluate our OpenHybrid framework and compare it with state-of-the-art non-flow-based and flow-based open set methods. We follow the protocols of the other methods for fair comparison. That is, we compare with non-flow-based open set methods without considering the operating threshold, while we set a unified threshold value when comparing with flow-based methods.

4.1 Implementation

In our experiments, the encoder, decoder, and classifier architectures are similar to those used in [20]. The last layer of the encoder in [20] maps 512d to 100d. We moved this layer to the classifier in our model since we do not want the input dimension of the flow model to be too small, so the output of our encoder is 512d instead. For the flow-based model, we use the standard setup of passing the data through a logit transform [5], followed by residual blocks. We use activation normalization [12] before and after every residual block. Each residual connection consists of 6 layers (i.e., LipSwish → InducedNormLinear → LipSwish → InducedNormLinear → LipSwish → InducedNormLinear) with hidden dimensions of 256 (the first 6 blocks) and 128 (the next 4 blocks) [18]. We use the Adam optimizer with a learning rate of 0.0001 for the encoder and the flow-based module to learn the log probability distribution. For training the classifier, we use Stochastic Gradient Descent (SGD) with momentum 0.9 and a learning rate of 0.01 for digit data and 0.1 for natural image data. The parameter $\delta$ is empirically set to 80. Another important factor affecting open-set performance is the openness of the problem. We define the openness based on the ratio of the numbers of unique classes in the training and test sets, i.e., $\text{openness} = 1 - \sqrt{K_{\text{train}} / K_{\text{test}}}$, where $K_{\text{train}}$ and $K_{\text{test}}$ are the number of classes in the training set and the test set, respectively. In the following experiments, we evaluate performance over multiple openness values depending on the dataset settings.

4.2 Datasets

We evaluate open set classification performance using multiple common benchmarks, such as MNIST [16], SVHN [21], CIFAR10 [13], CIFAR+10, CIFAR+50 and TinyImageNet [15] datasets. We briefly describe these datasets below.

  • MNIST, SVHN, CIFAR10: All three datasets contain 10 categories. MNIST consists of monochrome images of hand-written digits, with 60,000 28×28 grayscale images for training and 10,000 for testing. SVHN consists of street view house numbers, with ten digit classes each containing between 9,981 and 11,379 32×32 color images. To validate our method on non-digit images, we use the CIFAR10 dataset, which has 50,000 32×32 natural color images for training and 10,000 for testing. Each dataset is partitioned at random into 6 known and 4 unknown classes, so in these settings the openness score is fixed at 22.54%.

  • CIFAR+10, CIFAR+50: To test the method in a range of greater openness scores, we perform CIFAR+U experiments using the CIFAR10 and CIFAR100 [13]. 4 known classes are sampled from CIFAR10 and U unknown classes are drawn randomly from the more diverse CIFAR100 dataset. The openness scores of CIFAR+10 and CIFAR+50 are 46.54% and 72.78% respectively.

  • TinyImageNet: For the larger TinyImagenet dataset, which is a 200-class subset of ImageNet, we randomly sampled 20 classes as known and the remaining classes as unknown. In this setting, the openness score is 68.37%.
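The openness scores quoted for these settings are consistent with the definition openness = 1 − sqrt(K_train / K_test), where K_train and K_test count the classes seen in training and at test time. The short sketch below recomputes them under that assumption; the function and dictionary names are hypothetical.

```python
import math

def openness(n_train_classes, n_test_classes):
    """Openness = 1 - sqrt(K_train / K_test)."""
    return 1.0 - math.sqrt(n_train_classes / n_test_classes)

# (known classes, known + unknown classes at test time) per setting.
settings = {
    "MNIST / SVHN / CIFAR10": (6, 10),
    "CIFAR+10": (4, 14),
    "CIFAR+50": (4, 54),
    "TinyImageNet": (20, 200),
}

scores = {name: openness(k_train, k_test) for name, (k_train, k_test) in settings.items()}
for name, value in scores.items():
    print(f"{name}: {100.0 * value:.2f}%")
```

A larger value means that a larger share of the test-time classes was never seen during training, so CIFAR+50 and TinyImageNet are the most open settings considered here.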

4.3 Metrics

Open set classification performance can be characterized by the F-score or by AUROC (Area Under the ROC Curve) [7]. Open set recognition methods usually require thresholds, and their performance may be sensitive to these thresholds. The non-flow-based open set methods often threshold in different ways, so we mainly use AUROC to compare with these methods, since it sweeps the operating point from zero recall (no input is labeled as open set) to complete recall (all inputs are labeled as open set). For comparison with flow-based open set methods, the thresholds for detecting unknown samples are all selected from the probability distribution of the same flow module, so we use the F-score to evaluate their performance. For both metrics, higher values correspond to better performance.
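Both metrics can be written down compactly. The sketch below computes AUROC as the probability that a randomly chosen known sample receives a higher score than a randomly chosen unknown sample (ties count one half), and a per-class F-score as the harmonic mean of precision and recall. Treating known samples as the positive class and averaging the F-score over classes are assumptions made for illustration; the exact averaging used in the experiments is not specified in this section.

```python
def auroc(pos_scores, neg_scores):
    """Threshold-free AUROC via pairwise comparison of scores."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def f_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

def macro_f_score(y_true, y_pred):
    """Unweighted mean of per-class F-scores (an assumed averaging scheme)."""
    classes = sorted(set(y_true))
    return sum(f_score(y_true, y_pred, c) for c in classes) / len(classes)

# Made-up log-likelihood scores for known and unknown test samples.
known = [-900.0, -950.0, -1000.0]
unknown = [-1100.0, -980.0, -1200.0]
area = auroc(known, unknown)

# Made-up labels; class 3 plays the role of the unknown class.
y_true = [1, 1, 2, 3, 3]
y_pred = [1, 2, 2, 3, 1]
macro = macro_f_score(y_true, y_pred)
print(area, macro)
```

The pairwise form is quadratic in the number of samples, so it is meant only to make the definition explicit rather than to serve as an efficient implementation.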

4.4 Comparison with Non-flow-based Methods

We compare OpenHybrid against the following non-flow-based baseline approaches:

  1. SoftMax: A standard confidence-based method for open-set recognition by using SoftMax score of a predicted class.

  2. OpenMax [2]: This approach augments the baseline classifier with a new OpenMax layer replacing the SoftMax at the final layer of the network.

  3. G-OpenMax [6]: A direct extension of OpenMax method, which trains networks with synthesized unknown data by using a Conditional GAN.

  4. OSRCI [20]: An improved version of G-OpenMax, which uses a specific data augmentation technique called counterfactual image generation to train the classifier for the $(k+1)$-st class.

  5. C2AE [22]: This approach uses class conditioned auto-encoders with novel training and testing methodologies for open set recognition.

  6. CROSR [33]: A deep open set classifier augmented by latent representation learning which jointly classifies and reconstructs the input data.
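For reference, the simplest baseline in this list, SoftMax thresholding, can be sketched as follows: the maximum SoftMax probability is used as a confidence score, and a sample is rejected as unknown when that score falls below a threshold. The threshold of 0.5 and the example logits are hypothetical values chosen for illustration, and labels are 1-based so that `k + 1` denotes the unknown class.

```python
import math

def softmax(logits):
    """Numerically stable SoftMax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_open_set_predict(logits, threshold):
    """Predict a 1-based known class, or k + 1 if the confidence is too low."""
    probs = softmax(logits)
    k = len(probs)
    best = max(range(k), key=lambda j: probs[j])
    return best + 1 if probs[best] >= threshold else k + 1

confident = softmax_open_set_predict([4.0, 0.5, 0.1], threshold=0.5)
uncertain = softmax_open_set_predict([1.0, 0.9, 1.1], threshold=0.5)
print(confident, uncertain)
```

Here a single score has to serve both classification and rejection, which is the coupling that the hybrid design with a separate density estimator avoids.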

Method MNIST SVHN CIFAR10 CIFAR+10 CIFAR+50 TinyImageNet
SoftMax 0.978 0.886 0.677 0.816 0.805 0.577
OpenMax [2] 0.981 0.894 0.695 0.817 0.796 0.576
G-OpenMax [6] 0.984 0.896 0.675 0.827 0.819 0.580
OSRCI [20] 0.988 0.910 0.699 0.838 0.827 0.586
C2AE [22] 0.989 0.922 0.895 0.955 0.937 0.748
CROSR [33] 0.991 0.899 0.883 0.912 0.905 0.589
OpenHybrid (ours) 0.995 0.947 0.950 0.962 0.955 0.793
Table 1: AUROC for comparisons of our method with recent open set methods. Results averaged over 5 random class partitions. The best results are highlighted in bold.

Following the evaluation protocol defined in [20], we use AUROC as the evaluation metric. AUROC provides a calibration-free metric and characterizes the performance of a given score by varying the threshold. Following [20], we report the average AUROC over 5 randomized trials. The exact splits into in-distribution and OOD classes are the same as those of [20].

Table 1 presents the open set recognition performance of our method and the non-flow-based baselines on six datasets. OpenHybrid outperforms all of the baseline methods, which demonstrates the effectiveness of our approach. On MNIST, our method produces only a minor improvement over the other methods, mainly because MNIST is relatively simple and the results of all methods on it are almost saturated. On the other, more complex datasets, our method performs significantly better than the baseline methods, especially on natural images such as CIFAR (6% better than the second best) and TinyImageNet (5% better than the second best).

4.5 Comparison with Flow-based Methods

We compare our approach against the following flow-based baseline approaches:

  1. DIGLM [18]: A neural hybrid model consisting of a linear model defined on a set of features computed by a deep invertible transformation. It uses the model’s natural reject rule based on the generative component to detect unknown inputs. The threshold is set as $\tau = \min_{x \in X} \log p(x) - \delta$, where the minimum is taken over the training set $X$ and $\delta$ is a free parameter providing slack in the margin.

  2. OE [9]: A training method leveraging an auxiliary dataset of unknown samples to improve unknown detection. The framework is the same as DIGLM, except that during training, a margin ranking loss on the log probabilities of training and outlier exposure samples is used to update the flow-based model. In this experiment, we use counterfactual images generated by [20] from training samples as its outlier exposure dataset.

  3. OpenHybrid with pre-trained encoder: In addition to the above methods, we further compare with a different training strategy of our approach based on alternative training. The framework is still the same. However, during training, the encoder and classifier are pretrained first on the training data. The flow-based model was then trained separately with both encoder and classifier being frozen. We expect to use this baseline to show the importance of joint training in our OpenHybrid framework.

Method MNIST SVHN CIFAR10 CIFAR+10 CIFAR+50 TinyImageNet
DIGLM ($\delta$ = 80) 0.656 0.687 0.673 0.644 0.583 0.511
OE ($\delta$ = 80) 0.723 0.776 0.701 0.683 0.653 0.531
DIGLM (best $\delta$) 0.670 0.737 0.702 0.694 0.633 0.565
OE (best $\delta$) 0.741 0.802 0.731 0.712 0.699 0.576
OpenHybrid (ours, $\delta$ = 80) + pretrained encoder 0.847 0.842 0.791 0.783 0.761 0.674
OpenHybrid (ours, $\delta$ = 80) + joint training 0.942 0.912 0.865 0.903 0.888 0.753
Table 2: F-scores for our methods and flow-based baselines. $\delta$ is the slack parameter for outlier thresholding. We use $\delta = 80$ for our method. For the baselines, we also show their performance upper bound, obtained by sweeping the value of $\delta$ using test labels. Results are averaged over 5 random class partitions. The best results are highlighted in bold.

Table 2 shows the F-scores (the harmonic mean of precision and recall) of our method and the three flow-based baselines on different datasets. We choose the threshold slack parameter $\delta = 80$ for all methods. Additionally, we sweep this parameter for the baseline methods using the test labels and report their performance upper bound. We observe that our method outperforms the baseline methods significantly in all cases. The highest F-score is observed on MNIST, where a large number of background pixels in each image are almost the same. As reported by [24], the number of pixels belonging to the background in an image is a confounding factor for the likelihood score. If the background pixels of known and unknown samples are the same, and these background pixels occupy most of the image, it is difficult for flow-based models to detect these unknown samples from the likelihood value, because the likelihood value is dominated by the background pixels. The second best F-score is on SVHN, with the natural image data of CIFAR and TinyImageNet trailing far behind, indicating that natural images are significantly more challenging.

Figure 3 shows the histograms of log-likelihoods for MNIST (0-5 as known classes and 6-9 as unknown classes) produced by DIGLM, OE, OpenHybrid with pretrained encoder, and OpenHybrid with joint training. For DIGLM, the three histograms almost overlap, so it is impossible to detect the unknown class by setting a threshold. For OE, the log-likelihoods of unknown samples are slightly smaller than those of known samples in the training and test sets, but there is still a large area of overlap, which makes the detection of unknown samples inaccurate. OpenHybrid with pretrained encoder looks better than the above two, but it is still not ideal for detecting unknown samples. In contrast, with joint training, the histogram of unknown samples is well separated from those of known samples, and the actual minimum likelihood is almost equal to what is observed in Figure 3(d). The main reason for this large improvement is that we project both known and unknown data into a latent feature space, which better highlights the semantic information of the data and effectively prevents the likelihood value from being dominated by the background term.

Figure 3: Histograms of log-likelihoods for MNIST (0-5 as known classes and 6-9 as unknown classes) produced by DIGLM (a), OE (b), OpenHybrid with pretrained encoder (c) and ours (d). Blue indicates training samples, pink indicates known samples in the test set, and green indicates unknown samples.
Figure 4: Histograms of log-likelihoods for CIFAR10 (known samples) and SVHN (unknown samples) produced by DIGLM (a), OE (b) and ours (c). Blue indicates training samples, pink indicates known samples in the test set, and green indicates unknown samples.

Nalisnick et al. [17] raised the issue that a flow-based model trained on CIFAR10 will assign a higher log-likelihood value to SVHN. We therefore conduct a further experiment on this setting, where we use the full 10 classes of CIFAR10 as known classes and SVHN as the unknown class. In this setting, the openness is 29.29%. Figure 4 shows the histograms of log-likelihoods for this setting. Similar to the observation made by [17], in Figure 4(a) the histogram of unknown samples (green) is shifted further to the right than that of known samples (blue and pink), i.e., unknown samples are assigned a larger log-likelihood value than known samples. In Figure 4(b), OE shifts the histogram of unknown samples slightly to the left, but it does not solve the problem either. Our method, shown in Figure 4(c), clearly separates the two distributions. The histogram of unknown samples lies almost entirely to the left of that of known samples, with only a very small area of overlap. The unknown samples can be detected easily by setting a threshold.
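The 29.29% figure can be reproduced under assumptions that this section does not state explicitly: that openness is defined as one minus the square root of the ratio K/M, where K is the number of known classes seen in training and M is the total number of classes seen at test time, and that the ten SVHN digit classes are counted individually. Under that reading:

```latex
\text{openness} = 1 - \sqrt{\frac{K}{M}}
               = 1 - \sqrt{\frac{10}{10 + 10}}
               = 1 - \frac{1}{\sqrt{2}}
               \approx 0.2929
```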

The recall of each known and unknown class (unk), and the overall accuracy (all), on the CIFAR10-SVHN setting are shown in Figure 5. Our method outperforms the other baselines significantly in all categories, especially in “unk”. The reason is that both DIGLM and OE assign higher log-likelihood values to the unknown samples than to the training samples, so detecting the unknown samples through a minimum likelihood threshold cannot succeed. The unknown accuracies of DIGLM and OE are 0.3% and 0.5%, respectively. In contrast, our unknown accuracy reaches 98%.
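The thresholding logic discussed above can be sketched as follows. This is an illustration under stated assumptions, not the exact rule used in the experiments: the threshold is assumed to be the lowest training log-likelihood minus a slack value, and the names `likelihood_threshold`, `open_set_predict`, `per_class_recall`, and the toy scores are all hypothetical.

```python
UNKNOWN = "unk"


def likelihood_threshold(train_log_likelihoods, slack):
    # ASSUMPTION: the threshold is the lowest training log-likelihood
    # minus a slack term; how the slack enters is not given in this section.
    return min(train_log_likelihoods) - slack


def open_set_predict(log_likelihood, class_probs, threshold):
    """Reject as UNKNOWN below the threshold, else pick the top known class."""
    if log_likelihood < threshold:
        return UNKNOWN
    return max(class_probs, key=class_probs.get)


def per_class_recall(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + int(t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Toy data: (log-likelihood, classifier probabilities, true label).
train_ll = [-3.0, -2.5, -2.8]
threshold = likelihood_threshold(train_ll, slack=0.5)
test_samples = [
    (-2.7, {"cat": 0.8, "dog": 0.2}, "cat"),
    (-3.2, {"cat": 0.3, "dog": 0.7}, "dog"),
    (-9.0, {"cat": 0.9, "dog": 0.1}, UNKNOWN),
    (-3.4, {"cat": 0.6, "dog": 0.4}, UNKNOWN),  # just above the threshold
]
y_true = [label for _, _, label in test_samples]
y_pred = [open_set_predict(ll, probs, threshold) for ll, probs, _ in test_samples]
recall = per_class_recall(y_true, y_pred)
print(y_pred)
print(recall)
```

The last toy sample shows the failure mode described above: an unknown sample whose log-likelihood is not below the threshold is passed to the classifier and counted as a known class, which lowers the recall of "unk". When a model assigns unknown samples higher likelihoods than training samples, almost every unknown sample falls into this case.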

Figure 5: Recognition accuracy for CIFAR10 (known samples) and SVHN (unknown samples) achieved by DIGLM (blue), OE (green) and ours (orange).

5 Conclusion

We presented the OpenHybrid framework for open set recognition. Our approach is built upon a flow-based model for density estimation, together with a discriminative classifier. The flow model and the classifier share the same feature representation. Our extensive experiments show that our approach outperforms both non-flow-based and flow-based state-of-the-art approaches. A common issue of flow-based models is that they often assign higher likelihoods to out-of-distribution samples. We empirically observe on various datasets that this issue is resolved by learning a joint feature space. Our ablation study also suggests that joint training and sharing a common representation space are key contributing factors to the improved performance on open set recognition.

6 Acknowledgement

We would like to thank Balaji Lakshminarayanan for meaningful discussions.


References

  • [1] J. Behrmann, W. Grathwohl, R. T. Chen, D. Duvenaud, and J. Jacobsen (2018) Invertible residual networks. arXiv preprint arXiv:1811.00995.
  • [2] A. Bendale and T. E. Boult (2016) Towards open set deep networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1563–1572.
  • [3] T. Q. Chen, J. Behrmann, D. K. Duvenaud, and J. Jacobsen (2019) Residual flows for invertible generative modeling. In Advances in Neural Information Processing Systems, pp. 9913–9923.
  • [4] L. Dinh, D. Krueger, and Y. Bengio (2014) Nice: non-linear independent components estimation. arXiv preprint arXiv:1410.8516.
  • [5] L. Dinh, J. Sohl-Dickstein, and S. Bengio (2016) Density estimation using real nvp. arXiv preprint arXiv:1605.08803.
  • [6] Z. Ge, S. Demyanov, Z. Chen, and R. Garnavi (2017) Generative openmax for multi-class open set classification. arXiv preprint arXiv:1707.07418.
  • [7] C. Geng, S. Huang, and S. Chen (2018) Recent advances in open set recognition: a survey. arXiv preprint arXiv:1811.08581.
  • [8] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680.
  • [9] D. Hendrycks, M. Mazeika, and T. Dietterich (2018) Deep anomaly detection with outlier exposure. arXiv preprint arXiv:1812.04606.
  • [10] P. R. M. Júnior, R. M. De Souza, R. d. O. Werneck, B. V. Stein, D. V. Pazinato, W. R. de Almeida, O. A. Penatti, R. d. S. Torres, and A. Rocha (2017) Nearest neighbors distance ratio open-set classifier. Machine Learning 106 (3), pp. 359–386.
  • [11] R. Kamoi and K. Kobayashi (2019) Likelihood assignment for out-of-distribution inputs in deep generative models is sensitive to prior distribution choice. arXiv preprint arXiv:1911.06515.
  • [12] D. P. Kingma and P. Dhariwal (2018) Glow: generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pp. 10215–10224.
  • [13] A. Krizhevsky, G. Hinton, et al. (2009) Learning multiple layers of features from tiny images.
  • [14] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105.
  • [15] Y. Le and X. Yang (2015) Tiny imagenet visual recognition challenge. CS 231N.
  • [16] Y. LeCun, C. Cortes, and C. Burges (2010) MNIST handwritten digit database.
  • [17] E. Nalisnick, A. Matsukawa, Y. W. Teh, D. Gorur, and B. Lakshminarayanan (2018) Do deep generative models know what they don’t know?. arXiv preprint arXiv:1810.09136.
  • [18] E. Nalisnick, A. Matsukawa, Y. W. Teh, D. Gorur, and B. Lakshminarayanan (2019) Hybrid models with deep and invertible features. arXiv preprint arXiv:1902.02767.
  • [19] E. Nalisnick, A. Matsukawa, Y. W. Teh, and B. Lakshminarayanan (2019) Detecting out-of-distribution inputs to deep generative models using typicality. arXiv preprint arXiv:1906.02994.
  • [20] L. Neal, M. Olson, X. Fern, W. Wong, and F. Li (2018) Open set learning with counterfactual images. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 613–628.
  • [21] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng (2011) Reading digits in natural images with unsupervised feature learning.
  • [22] P. Oza and V. M. Patel (2019) C2ae: class conditioned auto-encoder for open-set recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2307–2316.
  • [23] P. Perera, R. Nallapati, and B. Xiang (2019) Ocgan: one-class novelty detection using gans with constrained latent representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2898–2906.
  • [24] J. Ren, P. J. Liu, E. Fertig, J. Snoek, R. Poplin, M. Depristo, J. Dillon, and B. Lakshminarayanan (2019) Likelihood ratios for out-of-distribution detection. In Advances in Neural Information Processing Systems, pp. 14680–14691.
  • [25] A. Rozsa, M. Günther, and T. E. Boult (2017) Adversarial robustness: softmax versus openmax. arXiv preprint arXiv:1708.01697.
  • [26] L. Ruff, R. Vandermeulen, N. Goernitz, L. Deecke, S. A. Siddiqui, A. Binder, E. Müller, and M. Kloft (2018) Deep one-class classification. In International conference on machine learning, pp. 4393–4402.
  • [27] K. Saito, S. Yamamoto, Y. Ushiku, and T. Harada (2018) Open set domain adaptation by backpropagation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 153–168.
  • [28] W. J. Scheirer, L. P. Jain, and T. E. Boult (2014) Probability models for open set recognition. IEEE transactions on pattern analysis and machine intelligence 36 (11), pp. 2317–2324.
  • [29] B. Schölkopf, R. C. Williamson, A. J. Smola, J. Shawe-Taylor, and J. C. Platt (2000) Support vector method for novelty detection. In Advances in neural information processing systems, pp. 582–588.
  • [30] L. Shu, H. Xu, and B. Liu (2017) Doc: deep open classification of text documents. arXiv preprint arXiv:1709.08716.
  • [31] V. M. Venkataram (2018) Open set text classification using neural networks. Ph.D. Thesis, University of Colorado Colorado Springs. Kraemer Family Library.
  • [32] S. Vernekar, A. Gaurav, V. Abdelzad, T. Denouden, R. Salay, and K. Czarnecki (2019) Out-of-distribution detection in classifiers via generation. arXiv preprint arXiv:1910.04241.
  • [33] R. Yoshihashi, W. Shao, R. Kawakami, S. You, M. Iida, and T. Naemura (2019) Classification-reconstruction learning for open-set recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4016–4025.
  • [34] H. Zhang and V. M. Patel (2016) Sparse representation-based open set recognition. IEEE transactions on pattern analysis and machine intelligence 39 (8), pp. 1690–1696.
  • [35] H. Zhang, A. Li, X. Han, Z. Chen, Y. Zhang, and Y. Guo (2019) Improving open set domain adaptation using image-to-image translation. In 2019 IEEE International Conference on Multimedia and Expo (ICME), pp. 1258–1263.