We propose a novel deep convolutional neural network (CNN) based multi-task learning approach for open-set visual recognition. We combine a classifier network and a decoder network with a shared feature extractor network within a multi-task learning framework, and show that this approach results in better open-set recognition accuracy. In our approach, reconstruction errors from the decoder network are used for open-set rejection. In addition, we model the tail of the reconstruction error distribution from the known classes using statistical Extreme Value Theory to improve the overall performance. Experiments on multiple image classification datasets show that this method performs significantly better than many competitive open-set recognition algorithms available in the literature. The code will be made available at: github.com/otkupjnoz/mlosr.
Recent developments in deep convolutional neural networks (CNNs) have shown impressive performance on various computer vision tasks such as image classification and object recognition alexnet12, resnet16, senet17. The top-5 error rate of recent image classification methods on the ImageNet dataset
imagenet15 has dropped from 25% winner11 to 2.25% winner17. Even though the performance of these systems looks very promising, their setting is not realistic. A vast majority of these algorithms follow a closed-set setting, where knowledge of all the test classes is assumed to be available during training. However, real-world applications contain many more categories than any of the current datasets, e.g., ImageNet (1000 classes). Hence, when deployed, these systems are highly likely to observe test samples belonging to unknown classes, i.e., classes not observed during training. Because of the closed-set assumption, these systems will wrongly recognize a test sample from an unknown class as belonging to one of the known closed-set classes.
Open-set recognition was introduced to tackle this problem, extending the closed-set classification task to a more realistic scenario scheirer2013toward. In open-set recognition, incomplete knowledge of the world is assumed during training, i.e., test samples can come from unknown classes not observed during training (see Fig. 1). The goal of an open-set recognition system is to reject test samples from unknown classes while maintaining the performance on known classes. Since only the known classes are accessible during the training phase, it is challenging for a model trained in a closed-set setting to identify unknown classes. Scheirer et al. scheirer2013toward proposed a new framework that combines empirical risk minimization with open space risk minimization for open-set recognition problems. Open space risk is defined as the risk of labeling unknown samples as known. Following this framework, many open-set recognition algorithms have been proposed over the past few years that try to directly or indirectly minimize open space risk when training open-set recognition models. Jain et al. jain2014multi showed the vulnerability of Support Vector Machine (SVM) based classification (Fig. 1 in jain2014multi) in the presence of unknown class test samples and proposed an approach to improve the identification of unknown classes. Many other extensions of the traditional statistical classification approaches for open-set recognition have also been proposed in the literature
scheirer2013toward, jain2014multi, scheirer2014probability, junior2016specialized, rudd2018extreme, junior2017nearest, bendale2015towards, zhang2017sparse.
As mentioned earlier, CNNs are a powerful tool for learning discriminative representations for image classification. However, they are fundamentally limited in identifying unknown samples due to their closed-set training (refer to Fig. 2 for details). Hence, it is important to make CNN-based image classification algorithms capable of performing open-set recognition. Several methods have been proposed over the years to tackle the presence of unknown classes by extending deep neural networks to open-set settings bendale2016towards, shu2017doc, ge2017generative, neal2018open, yoshihashi2018classification. Bendale et al. bendale2016towards
proposed to use penultimate-layer activations from a pretrained neural network and extreme value modeling to update the SoftMax probability values for open-set recognition (referred to as OpenMax). Ge et al. ge2017generative used synthetic unknown classes generated with a Generative Adversarial Network (GAN) goodfellow2014generative and trained a neural network to classify those samples as unknown. Shu et al. shu2017doc proposed a novel loss function by replacing the SoftMax layer with sigmoid activations (referred to as a one-vs-rest layer) to train a neural network for open-set recognition. Neal et al. neal2018open introduced another GAN-based data augmentation approach, which generates synthetic unknown class images, referred to as counterfactual images, for open-set recognition. Yoshihashi et al. yoshihashi2018classification proposed a novel neural network architecture that involves hierarchical reconstruction blocks and an extreme value model for open-set recognition. Though there has been a fair amount of research on CNN-based open-set recognition algorithms, the performance of these systems on challenging object recognition datasets is still far from optimal.
In this paper, we present a CNN-based multi-task learning algorithm for open-set recognition. The proposed Multi-task Learning Based Open-Set Recognition (MLOSR) method consists of a shared feature extractor network along with a decoder network and a classifier network for reconstruction and classification, respectively. All these networks are trained in a multi-task learning framework caruana1997multitask. We show that such multi-task training yields a better model for open-set recognition by improving the identification of samples from the unknown classes. Additionally, we use extreme value theory (EVT) modeling techniques to model the reconstruction error distribution from the network, which further enhances the performance. Extensive experiments on multiple image classification datasets show that MLOSR performs better than existing open-set algorithms. In summary, the main contributions of this paper are as follows:
We propose a CNN-based multi-task learning algorithm, called MLOSR, for open-set recognition.
Extensive experiments on various datasets show that the proposed multi-task training helps to reject out-of-distribution data as well as samples from the in-distribution unknown classes.
This paper is organized as follows. Section 2 gives a brief review of open-set recognition and related problems such as out-of-distribution detection, anomaly detection, and EVT. Section 3 introduces the proposed approach and presents training and testing details of the MLOSR algorithm. Experiments and results are presented in Section 4, and Section 5 concludes the paper with a brief summary and discussion.
In this section, we review related work on open-set recognition, out-of-distribution detection, EVT and novelty detection.
In recent years, a few attempts have been made to create a classifier with a rejection option bartlett2008classification, yuan2010classification, da2014learning. Inspired by these earlier methods, Scheirer et al. scheirer2013toward formally defined the open-set recognition problem and introduced a framework to train and evaluate such algorithms. Scheirer et al. scheirer2013toward also introduced a simple slab model-based approach to address this problem. In follow-up works, Scheirer et al. scheirer2014probability and Jain et al. jain2014multi both proposed to leverage extreme value models on the SVM decision scores to extend SVM-based classification to the open-set setting. While Jain et al. jain2014multi used the decision scores only from a One-vs-All multi-class SVM, Scheirer et al. scheirer2014probability combined the scores from a multi-class SVM with class-specific one-class RBF-SVMs to get a better open-set model. Junior et al. junior2017nearest proposed a nearest neighbor-based classification approach based on similarity scores calculated using the ratio of distances between the nearest neighbors, and identified any test sample with low similarity as unknown. Zhang and Patel zhang2017sparse proposed another approach by extending sparse representation-based classification (SRC) to the open-set setting. They also discovered that the residual errors from SRC contain discriminative information for identifying known and unknown classes. These residual errors are modeled using EVT as match and non-match to identify unknown test samples by hypothesis testing.
Following these extensions of traditional classification algorithms for open-set recognition, Bendale et al. bendale2016towards were the first to address the open-set recognition problem for deep neural networks. Since thresholding on SoftMax probability does not yield a good model for open-set recognition bendale2016towards (also shown in Fig. 2), an alternative solution was proposed for adapting deep neural networks to open-set settings. Bendale et al. bendale2016towards used the activation vectors from the penultimate layer of a pretrained deep neural network. By modeling the distance of these activation vectors from the mean of each class with EVT, an updated penultimate vector is generated (referred to as OpenMax). This updated vector yields a better model for identifying unknown class test samples. Ge et al. ge2017generative introduced the G-OpenMax algorithm, which combines OpenMax with data augmentation using GANs. Ge et al. generated unknown samples from the known class data using GANs and later used them for training a CNN along with the known classes. This data augmentation technique was shown to improve unknown class identification. In another approach, Shu et al. shu2017doc argued that OpenMax inherently assumes that hard-to-classify samples are more likely to be from the unknown classes, and proposed a sigmoid activation-based method to overcome that issue. The sigmoid activation method replaces the SoftMax layer to train the network with a novel loss function. Neal et al. neal2018open introduced another GAN-based data augmentation method. Instead of considering the misclassified samples generated using a GAN as unknown classes, as G-OpenMax does, Neal et al. proposed a method to search for such examples, referred to as counterfactual images. These counterfactual samples are later added to the original dataset as unknown class samples and are used to fine-tune the classification network. This technique was shown to be a better data augmentation approach than G-OpenMax for open-set recognition. Recently, Yoshihashi et al. yoshihashi2018classification proposed a novel neural network architecture for open-set recognition which consists of hierarchical reconstruction modules combined with extreme value modeling. To the best of our knowledge, it is the best performing open-set algorithm in the literature.
Recently, some concerns have been raised regarding the safety of AI systems amodei2016concrete. These include adversarial attacks szegedy2013intriguing, yuan2017adversarial and distribution shifts hendrycks2016baseline, lee2017training. One concern regarding distribution shift is the detection of out-of-distribution (OOD) examples. Hendrycks et al. hendrycks2016baseline first introduced the OOD detection problem and defined it as the detection of samples that do not belong to the training set but can appear during testing. Several approaches have been proposed in the literature that address the OOD problem hendrycks2016baseline, liang2017enhancing, hendrycks2018deep, lee2017training, liu2018open. The experimental setup used for evaluating OOD detection usually includes two datasets: a clean set with a finite number of categories, available during both training and testing, and an OOD test set containing samples from a completely different distribution. For example, classification on CIFAR-10 krizhevsky2014cifar can have OOD examples from LSUN yu2015lsun. The OOD and open-set problems are studied separately even though the OOD problem setting resembles that of open-set recognition. Furthermore, the evaluation protocols followed for the OOD and open-set recognition problems are very similar. Hence, we also explore the capability of the proposed open-set recognition method in detecting out-of-distribution samples.
Problems such as discovering outliers xia2015learning, you2017provable, niu2018learning, identifying novel classes abati2018and, perera2018learning and detecting anomalies golan2018deep, sabokrou2018adversarially, chalapathy2017robust also have some overlap with open-set recognition. Though all of these problems involve identifying abnormality or novelty, their problem setting differs from the open-set problem. Although anomaly/novelty detection problems do not in principle have access to abnormal/novel classes during the training phase, several works assume the availability of abnormal classes during training hendrycks2018deep, and they are mainly limited to one-class recognition problems. Outlier detection allows access to outlier data during the training phase. Open-set recognition, on the other hand, has no access to unknown class data and also deals with a multi-class classification problem, and hence is more challenging than outlier, novelty or anomaly detection.
Extreme Value Theory has proven to be useful in many vision applications shi2008modeling, gibert2017deep, scheirer2017extreme, scheirer2011meta, fragoso2013evsac, including open-set recognition scheirer2014probability, rudd2018extreme, yoshihashi2018classification, bendale2016towards, zhang2017sparse. This popularity is attributed to the fact that extreme value modeling of decision scores yields better performance than directly using the raw score values scheirer2011meta, scheirer2017extreme. Extreme value modeling has become one of the most popular approaches for post-recognition score analysis to improve the performance of open-set recognition. Inspired by these methods, the proposed approach also uses EVT to obtain better recognition scores.
In this section, we describe the proposed approach in detail. The proposed system architecture is composed of four modules:
1. Feature Extractor (or Encoder) ($\mathcal{F}$),
2. Decoder ($\mathcal{G}$),
3. Classifier ($\mathcal{C}$), and
4. Extreme Value Model.
Fig. 3 shows these modules of the proposed system. The feature extractor (or encoder) network ($\mathcal{F}$) is modeled by a CNN architecture which maps an input image onto a latent space representation. The decoder ($\mathcal{G}$), modeled by another CNN, and the classifier ($\mathcal{C}$), modeled by a fully-connected neural network, take this latent representation as input and produce a reconstructed image and its label as the outputs, respectively. Both the decoder network and the classifier network share the feature extractor module. After the models $\mathcal{F}$, $\mathcal{G}$ and $\mathcal{C}$ are trained, the reconstruction errors are modeled using EVT. In the following sections, we present the training procedure to learn the parameters ($\Theta_f$, $\Theta_g$, $\Theta_c$) and discuss the recognition score analysis using EVT.
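The feature extractor network can be represented by a function, $\mathcal{F}: \mathcal{X} \rightarrow \mathcal{Z}$. Similarly, let the decoder and the classifier networks be represented by functions $\mathcal{G}: \mathcal{Z} \rightarrow \mathcal{X}$ and $\mathcal{C}: \mathcal{Z} \rightarrow \mathcal{Y}$, respectively. Here, $\mathcal{Z}$ is the space of latent representations, $\mathcal{X}$ is the space of all images, and $\mathcal{Y}$ is the space of all possible image labels. Let $\Theta_f$, $\Theta_g$ and $\Theta_c$ be the parameters of $\mathcal{F}$, $\mathcal{G}$ and $\mathcal{C}$, respectively. The classification loss, denoted by $\mathcal{L}_c$, penalizes the network for misclassifying known class samples. The reconstruction loss, denoted by $\mathcal{L}_r$, penalizes the network for generating images away from the known class samples. Let $X_i$ and $y_i$ be a sample from any of the known classes and its corresponding label. Let $\tilde{X}_i = \mathcal{G}(\mathcal{F}(X_i))$ be the reconstructed input from the encoder-decoder pipeline, $\mathcal{F}$+$\mathcal{G}$. Also, $\tilde{\mathbf{y}}_i = \mathcal{C}(\mathcal{F}(X_i))$ is the class probability vector predicted by the encoder-classifier pipeline, $\mathcal{F}$+$\mathcal{C}$. The loss function $\mathcal{L}_c$ depends on the parameters $\Theta_f$, $\Theta_c$, which are associated with the networks $\mathcal{F}$ and $\mathcal{C}$, respectively. Similarly, the loss function $\mathcal{L}_r$ depends on the parameters $\Theta_f$, $\Theta_g$, which are associated with the networks $\mathcal{F}$ and $\mathcal{G}$, respectively. We can formulate the losses for the input images with a batch size of $N$ as follows
$$\mathcal{L}_c(\{\Theta_f, \Theta_c\}) = \frac{1}{N} \sum_{i=1}^{N} \ell_c(\mathbf{y}_i, \tilde{\mathbf{y}}_i) \quad (1)$$
$$\mathcal{L}_r(\{\Theta_f, \Theta_g\}) = \frac{1}{N} \sum_{i=1}^{N} \ell_r(X_i, \tilde{X}_i) \quad (2)$$
where $\mathbf{y}_i$ is a one-hot vector for label $y_i$. Also, $\ell_c$ and $\ell_r$ can be implemented using any valid classification and reconstruction loss functions, respectively. For this method we consider cross-entropy for the classification loss ($\ell_c$) and the norm of the difference between the vectorized images for the reconstruction loss ($\ell_r$). $\ell_c$ and $\ell_r$ are defined as follows,
$$\ell_c(\mathbf{y}_i, \tilde{\mathbf{y}}_i) = -\sum_{j=1}^{k} \mathbf{y}_i^{(j)} \log \tilde{\mathbf{y}}_i^{(j)} \quad (3)$$
$$\ell_r(X_i, \tilde{X}_i) = \lVert \mathrm{vec}(X_i) - \mathrm{vec}(\tilde{X}_i) \rVert \quad (4)$$
The final loss to train the overall network is as follows
$$\min_{\Theta_f, \Theta_g, \Theta_c} \; \lambda_c \, \mathcal{L}_c(\{\Theta_f, \Theta_c\}) + \lambda_r \, \mathcal{L}_r(\{\Theta_f, \Theta_g\}) \quad (5)$$
Here, $\lambda_c$ and $\lambda_r$ are two constants and $k$ is the total number of classes known during training. After training, the learned parameters $\Theta_f$, $\Theta_g$, $\Theta_c$ will yield an open-set recognition model.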
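The weighted combination of the classification and reconstruction losses can be illustrated with a small pure-Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the reconstruction term uses an L1 norm (the order of the norm is an assumption), and both weights default to the value of 0.5 reported in the experimental settings.

```python
import math

def cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and a predicted probability vector."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

def reconstruction_error(image, recon):
    """L1 norm of the difference between two vectorized images (assumed norm)."""
    return sum(abs(a - b) for a, b in zip(image, recon))

def multitask_loss(labels, probs, images, recons, w_cls=0.5, w_rec=0.5):
    """Weighted sum of the batch-averaged classification and reconstruction losses."""
    n = len(labels)
    l_cls = sum(cross_entropy(y, p) for y, p in zip(labels, probs)) / n
    l_rec = sum(reconstruction_error(x, r) for x, r in zip(images, recons)) / n
    return w_cls * l_cls + w_rec * l_rec

# Toy batch of two samples with two classes and two "pixels" each (made-up values).
labels = [[1, 0], [0, 1]]
probs = [[0.5, 0.5], [0.5, 0.5]]
images = [[0.0, 1.0], [1.0, 1.0]]
recons = [[0.0, 0.5], [1.0, 0.0]]
loss = multitask_loss(labels, probs, images, recons)
print(f"multi-task loss on the toy batch: {loss:.4f}")
```

In a real training loop both terms would be backpropagated through the shared encoder, which is what couples the two tasks.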
As discussed in Section 2, EVT is useful in many vision applications. Some open-set recognition algorithms scheirer2014probability, rudd2018extreme, yoshihashi2018classification, bendale2016towards, zhang2017sparse also use this tool to model the tail of the distribution of the recognition scores. Fig. 4 shows histograms of the reconstruction errors for known (shown in green) and unknown (shown in red) class samples, computed using the COIL-100 dataset. As can be seen from this figure, the reconstruction errors contain information that discriminates between unknown and known classes. Since we do not have access to samples from unknown classes during training, the optimal decision threshold must lie somewhere within the set of extreme values of the known class reconstruction errors (the overlapped region). With this observation, we use EVT to model the tail of the reconstruction error distribution to achieve a better estimate of the tail data.
There are two widely used theorems to model statistical extremes, namely, the Fisher-Tippett-Gnedenko theorem (FTG) scheirer2017extreme and the Pickands-Balkema-de Haan formulation or Generalized Extreme Value theorem (GEV) pickands1975statistical, balkema1974residual. Some works follow FTG scheirer2014probability, bendale2016towards, yoshihashi2018classification, while others follow GEV zhang2017sparse. In this method, we consider the extreme value formulation by Pickands-Balkema-de Haan, or GEV. It states that, for a large enough threshold $u$ and for a large class of distributions, denoted as $W$, with $\{w_1, \ldots, w_n\}$, $n$ IID samples, the following conditional excess probability is well approximated by a Generalized Pareto Distribution (GPD), denoted as $G$,
$$P(W - u \le w \mid W > u) \approx G(w; \zeta, \mu),$$
where,
$$G(w; \zeta, \mu) = \begin{cases} 1 - \left(1 + \dfrac{\zeta w}{\mu}\right)^{-1/\zeta}, & \zeta \neq 0, \\ 1 - e^{-w/\mu}, & \zeta = 0, \end{cases}$$
such that $-\infty < \zeta < +\infty$, $0 < \mu < +\infty$, and $w > 0$ with $\zeta w > -\mu$. $G(\cdot)$ is the CDF of the GPD and hence $G(w)$ denotes the probability that the extremes of the distribution $W$ are less than some value $w$. This probability score is useful in deciding whether a given test sample is from the known classes or not. The shape and scale parameters $\zeta$ and $\mu$ can be estimated from the given tail data using the maximum likelihood estimation procedure provided by Grimshaw et al. grimshaw1993computing. Here, there are two user-defined parameters: the tail size and the threshold value used to decide between known and unknown classes (more information is provided in Section 4). The MLOSR training and testing procedures are summarized in Algorithms 1 and 2, respectively.
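A minimal sketch of how the GPD score could be turned into a known/unknown decision is given below. The CDF is the standard Generalized Pareto form with a shape and a scale parameter; the decision rule (treat one minus the CDF of the excess over the tail threshold as the probability of being known) and all function and argument names are assumptions made for illustration, not a transcription of the paper's Algorithm 2. Parameter fitting by maximum likelihood is not shown.

```python
import math

def gpd_cdf(w, shape, scale):
    """CDF of the Generalized Pareto Distribution evaluated at an excess w."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    if w <= 0:
        return 0.0
    if shape == 0:
        return 1.0 - math.exp(-w / scale)
    base = 1.0 + shape * w / scale
    if base <= 0:
        # Beyond the upper end of the support (can only happen for shape < 0).
        return 1.0
    return 1.0 - base ** (-1.0 / shape)

def is_unknown(error, tail_threshold, shape, scale, decision=0.5):
    """Assumed rule: flag a sample as unknown when P(known) falls below `decision`."""
    p_known = 1.0 - gpd_cdf(error - tail_threshold, shape, scale)
    return p_known < decision

# Hypothetical fitted parameters and reconstruction errors.
print(is_unknown(3.0, tail_threshold=1.0, shape=0.0, scale=1.0))
print(is_unknown(1.2, tail_threshold=1.0, shape=0.0, scale=1.0))
```

Errors below the tail threshold get a known-class probability of one under this rule, so only errors in the modeled tail can be rejected.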
In this section, we demonstrate the effectiveness of the proposed MLOSR approach by conducting various experiments on the COIL-100 nene1996columbia, MNIST lecun2010mnist, SVHN netzer2011reading, CIFAR-10 krizhevsky2009learning and TinyImageNet le2015tiny datasets. In particular, we first present an analysis of the proposed approach on the COIL-100 dataset. Then, we compare the performance of MLOSR with recent state-of-the-art open-set recognition algorithms on four image classification datasets (MNIST, SVHN, CIFAR-10, TinyImageNet). In these experiments, unknown classes are sampled from within the dataset by dividing the total number of classes into known and unknown categories. In the last set of experiments, we test the ability of MLOSR to detect out-of-distribution unknown samples. In this experiment, the in-distribution samples are from CIFAR-10 and the out-of-distribution samples are from ImageNet and LSUN, as provided by liang2017enhancing.
The networks are trained using the Adam optimizer kingma2015adam with a learning rate of 0.0003 and a batch size of 64. We stop training when the loss becomes sufficiently small. The weights for the reconstruction and classification losses are both set to 0.5. For EVT modeling, we keep a tail size of 20 for all experiments. The decision threshold during testing is set to 0.5, i.e., any sample whose reconstruction error has a probability of less than 0.5 of coming from a known class is identified as unknown.
In this experiment, we perform the quantitative and the qualitative analysis to give insights into the proposed MLOSR algorithm. For quantitative analysis, we measure the performance gain contributed by each module of the overall algorithm. The qualitative study provides visual examples of the reconstructed known and unknown test samples.
We perform ablation experiments showing the contribution of each module of MLOSR on the COIL-100 dataset. The COIL-100 dataset has 100 different object categories with varying pose. Out of the 100 classes, 15 randomly sampled classes are used as known classes and the remaining are used as unknown. The dataset contains color images of size 128 × 128. For this experiment, each image is converted to grayscale and resized, and the intensity values are normalized. The network architectures used for this experiment are as follows,
Encoder: Conv(32)-ReLU-Conv(64)-ReLU-Conv(128)-FC(512).
and stride 2 respectively, ReLU and Tanh are activation units, and FC(L) denotes a fully connected layer with L neurons.
For the ablation analysis, the performance is measured using the F-measure (or F1 score) against varying openness scheirer2013toward of the problem. Openness is defined as follows
$$\text{Openness} = 1 - \sqrt{\frac{2 \times N_{train}}{N_{test} + N_{target}}} \quad (6)$$
Here, $N_{target}$ is the number of target classes, $N_{train}$ is the number of train classes and $N_{test}$ is the number of test classes scheirer2013toward. For this experiment, openness is varied by keeping 15 classes as known and changing the number of unknown classes from 15 to 85. With 15 unknown classes, $N_{train} = N_{target} = 15$ and $N_{test} = 30$, which gives $1 - \sqrt{30/45} \approx 18\%$; with 85 unknown classes, $N_{test} = 100$ and the openness is $1 - \sqrt{30/115} \approx 49\%$. The performance and the corresponding errors for each openness are calculated as the average and standard deviation of five randomized trials.
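The openness values quoted for the ablation can be checked with a few lines of Python. The sketch assumes the definition from scheirer2013toward, openness = 1 - sqrt(2 * N_train / (N_test + N_target)), with the target classes equal to the known classes; the function and variable names are illustrative.

```python
import math

def openness(n_train, n_target, n_test):
    """Openness of a problem: 0 for closed-set, approaching 1 as unknown classes dominate."""
    return 1.0 - math.sqrt(2.0 * n_train / (n_test + n_target))

n_known = 15
for n_unknown in (15, 85):
    # All known classes are both trained on and targeted; test covers known + unknown.
    value = openness(n_known, n_known, n_known + n_unknown)
    print(f"{n_unknown} unknown classes -> openness {100 * value:.2f}%")
```

The same function applied with 6 known and 4 unknown classes gives roughly 13.4%, matching the figure used for the 10-class datasets.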
We consider the following methods as baselines for comparison:
1. DCN+SoftMax: Encoder and classifier networks are trained using the classification loss, and the SoftMax scores are used for closed-set classification. A test sample is identified as unknown if the leading SoftMax score (between 0 and 1) is less than 0.5. This baseline is a traditional closed-set model with a threshold over the SoftMax scores.
2. DCN+AE: Encoder and classifier networks are trained using the classification loss, and SoftMax scores are used for classification. However, to identify a test sample as unknown, an autoencoder is used, with the encoder-decoder architectures described above. This encoder-decoder pipeline is trained with the reconstruction loss. A test sample is identified as unknown when the reconstruction error is more than 50% of the maximum reconstruction error observed on training samples. Another difference to note is that the encoder network is not shared across the encoder-classifier and encoder-decoder pipelines. Instead, these pipelines are trained separately with two different encoder networks having the same architecture. This baseline shows the use of reconstruction error, rather than SoftMax scores, as the score for identifying unknown classes. It also provides a baseline to compare with multi-task training having a shared encoder network.
3. MLOSR (without EVT): Encoder-classifier and encoder-decoder pipelines are trained with a shared encoder network using the classification and reconstruction loss functions in a multi-task fashion. SoftMax scores and reconstruction errors are used for closed-set classification and for identifying unknown classes, respectively. The method for identifying a test sample as unknown is the same as in the previous baseline. This method provides a baseline to compare the performance with and without extreme value modeling and shows the benefits of multi-task training.
4. MLOSR (Proposed): This is the method proposed in this paper, where, after multi-task training of the encoder, classifier and decoder networks, EVT models the tail of the reconstruction errors from known classes as described in Section 3.2. A test sample is identified as unknown when the reconstruction error has less than 0.5 probability of coming from a known class. The extreme value model provides the probability score.
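The two non-EVT rejection rules used by the baselines above can be written down directly. This is a sketch with hypothetical function names; the 0.5 SoftMax threshold and the 50%-of-maximum-training-error rule come from the baseline descriptions, while the sample values are made up.

```python
def reject_by_softmax(softmax_scores, threshold=0.5):
    """DCN+SoftMax rule: unknown if the leading SoftMax score is below the threshold."""
    return max(softmax_scores) < threshold

def reject_by_reconstruction(error, train_errors, fraction=0.5):
    """DCN+AE and MLOSR (without EVT) rule: unknown if the reconstruction error
    exceeds a fraction of the largest error seen on training samples."""
    return error > fraction * max(train_errors)

# Made-up scores: a confident prediction and an ambiguous one.
print(reject_by_softmax([0.9, 0.05, 0.05]))
print(reject_by_softmax([0.4, 0.3, 0.3]))

# Made-up training reconstruction errors; the cutoff is 0.5 * 2.0 = 1.0.
train_errors = [0.2, 0.8, 2.0]
print(reject_by_reconstruction(1.5, train_errors))
print(reject_by_reconstruction(0.6, train_errors))
```

The proposed method replaces the fixed fraction of the maximum training error with a probability obtained from the extreme value model.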
From Fig. 5 it is clear that DCN+SoftMax is not an optimal model for open-set recognition and has the worst performance among all the baselines. DCN+AE shows that using reconstruction errors from an encoder-decoder trained on known classes improves the performance of the open-set recognition model, i.e., the reconstruction errors are better than SoftMax scores for identifying unknown classes. Furthermore, MLOSR (without EVT) shows that the performance of DCN+AE can be further improved by a multi-task training strategy with a shared encoder. Finally, MLOSR applies the extreme value model to the known classes to better model the tail of the reconstruction error distribution. This in turn gives improvements over MLOSR (without EVT) under varying openness.
Fig. 6 shows the qualitative analysis of the MLOSR algorithm in cases where the test input is from known and unknown classes. Models trained with MLOSR produce output that correctly reconstructs the test input if they are from the known classes, resulting in low reconstruction errors. On the other hand, for the test samples from unknown classes, MLOSR produces distorted outputs resulting in high reconstruction errors.
For the open-set recognition experiments, we use the testing protocol followed in neal2018open. The architectures of the encoder, decoder and classifier networks are also the same as in neal2018open, and the same architecture is used for all other methods compared in this experiment. All images are resized, and the intensity values are normalized for each color channel. The following experimental protocols are followed:
(i) MNIST: MNIST lecun2010mnist has a total of 10 digit classes containing images of handwritten digits. Of these, 6 randomly chosen classes are used as known and the remaining 4 as unknown. This choice results in an openness of 13.39%.
(ii) SVHN: SVHN netzer2011reading is a digit dataset in which the image of each digit is cropped from house number images collected from Google Street View. SVHN has a total of 10 digit classes. Similar to MNIST, 6 randomly chosen known classes and 4 unknown classes are used, giving an openness of 13.39%.
(iii) CIFAR-10: CIFAR-10 krizhevsky2014cifar consists of 10 object categories, of which we randomly choose 6 known and 4 unknown classes, resulting in an openness of 13.39%.
(iv) CIFAR+10: CIFAR+10 uses the 4 non-animal classes from CIFAR-10 as known classes and 10 classes randomly sampled from the animal categories of CIFAR-100 krizhevsky2014cifar as unknown classes. This results in an openness of 33.33%.
(v) CIFAR+50: Similar to CIFAR+10, the 4 non-animal CIFAR-10 categories are used as known classes and 50 animal categories from CIFAR-100 are sampled as unknown, resulting in an openness of 62.86%.
(vi) TinyImageNet: The TinyImageNet le2015tiny dataset is derived from ImageNet imagenet15 by reducing the number of classes and the image sizes. It has a total of 200 categories with 500 images per category for training and 50 for testing. From the 200 categories, 20 known and 180 unknown classes are randomly sampled, resulting in an openness of 57.35%.
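For the protocols that split a single dataset into known and unknown classes (MNIST, SVHN, CIFAR-10, TinyImageNet), the random class partition can be sketched as follows. The helper name, the use of the trial index as the random seed, and the choice of five trials here are illustrative assumptions; the CIFAR+10 and CIFAR+50 protocols draw known and unknown classes from different datasets and are not covered by this sketch.

```python
import random

def split_classes(class_ids, n_known, n_unknown, seed):
    """Randomly partition class ids into disjoint known and unknown sets."""
    rng = random.Random(seed)  # seeded so that each trial is reproducible
    chosen = rng.sample(list(class_ids), n_known + n_unknown)
    return sorted(chosen[:n_known]), sorted(chosen[n_known:])

# MNIST-style protocol: 10 classes, 6 known and 4 unknown, repeated over trials.
splits = [split_classes(range(10), 6, 4, seed=trial) for trial in range(5)]
for trial, (known, unknown) in enumerate(splits):
    print(f"trial {trial}: known={known} unknown={unknown}")
```

Only samples from the known classes would be used for training; test samples from the unknown classes serve as the rejection targets.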
Table 1: AUROC for open-set recognition (n/a: not reported).
| Method | MNIST | SVHN | CIFAR-10 | CIFAR+10 | CIFAR+50 | TinyImageNet |
|---|---|---|---|---|---|---|
| SoftMax | 0.978 | 0.886 | 0.677 | 0.816 | 0.805 | 0.577 |
| OpenMax bendale2016towards (CVPR'16) | 0.981 | 0.894 | 0.695 | 0.817 | 0.796 | 0.576 |
| G-OpenMax ge2017generative (BMVC'17) | 0.984 | 0.896 | 0.675 | 0.827 | 0.819 | 0.580 |
| OSRCI neal2018open (ECCV'18) | 0.988 | 0.910 | 0.699 | 0.838 | 0.827 | 0.586 |
| CROSR yoshihashi2018classification (CVPR'19) | 0.998 | 0.955 | n/a | n/a | n/a | 0.670 |
| MLOSR | 0.989 | 0.921 | 0.845 | 0.895 | 0.877 | 0.718 |

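Table 2: F-measure for out-of-distribution detection with CIFAR-10 as the in-distribution dataset.
| Backbone | Method | ImageNet-crop | ImageNet-resize | LSUN-crop | LSUN-resize |
|---|---|---|---|---|---|
| VGGNet | SoftMax | 0.639 | 0.653 | 0.642 | 0.647 |
| VGGNet | OpenMax | 0.660 | 0.684 | 0.657 | 0.668 |
| VGGNet | LadderNet + SoftMax | 0.640 | 0.646 | 0.644 | 0.647 |
| VGGNet | LadderNet + OpenMax | 0.653 | 0.670 | 0.652 | 0.659 |
| VGGNet | DHRNet + SoftMax | 0.645 | 0.649 | 0.650 | 0.649 |
| VGGNet | DHRNet + OpenMax | 0.655 | 0.675 | 0.656 | 0.664 |
| VGGNet | DHRNet + CROSR | 0.721 | 0.735 | 0.720 | 0.749 |
| VGGNet | MLOSR | 0.837 | 0.826 | 0.783 | 0.801 |
| DenseNet | SoftMax | 0.693 | 0.685 | 0.697 | 0.722 |
| DenseNet | OpenMax | 0.696 | 0.688 | 0.700 | 0.726 |
| DenseNet | DHRNet + SoftMax | 0.691 | 0.726 | 0.688 | 0.700 |
| DenseNet | DHRNet + OpenMax | 0.729 | 0.760 | 0.712 | 0.728 |
| DenseNet | DHRNet + CROSR | 0.733 | 0.763 | 0.714 | 0.731 |
| DenseNet | MLOSR | 0.903 | 0.896 | 0.871 | 0.929 |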
The performance of a method is measured by its ability to identify unknown classes. Following the protocol from neal2018open, the Area Under the ROC curve (AUROC) is used to measure the performance of different methods. The values reported in Table 1 are averaged over five randomized trials. The numbers corresponding to CROSR are taken from yoshihashi2018classification. The results for all other methods except MLOSR (the proposed method) are taken from neal2018open. CROSR yoshihashi2018classification did not report its performance on the CIFAR-10, CIFAR+10, and CIFAR+50 datasets, hence those numbers are not included here. The results on the digit datasets are mostly saturated, and almost all methods perform similarly. CROSR achieves the best performance on the digit datasets, followed by MLOSR. However, on TinyImageNet, which is a much more challenging object classification dataset, MLOSR performs better than CROSR, with an improvement of about 5% (0.718 vs. 0.670).
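As a reminder of what AUROC measures in this setting, the sketch below computes it by direct pairwise comparison: the probability that a randomly drawn unknown sample receives a higher "unknown" score than a randomly drawn known sample, with ties counted as one half. Using the reconstruction error as that score is an assumption for illustration, and the numbers are made up; the quadratic loop is meant for clarity, not for large test sets.

```python
def auroc(known_scores, unknown_scores):
    """AUROC via pairwise comparison; higher scores should indicate 'unknown'."""
    wins = 0.0
    for u in unknown_scores:
        for k in known_scores:
            if u > k:
                wins += 1.0
            elif u == k:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(known_scores) * len(unknown_scores))

# Made-up reconstruction errors for known and unknown test samples.
known_errors = [0.1, 0.4]
unknown_errors = [0.3, 0.9]
print(auroc(known_errors, unknown_errors))
```

Because AUROC depends only on the ordering of the scores, it evaluates the rejection score without committing to a particular decision threshold.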
In this experiment, we test the ability of MLOSR to identify OOD samples. We follow the protocol defined by Yoshihashi et al. in yoshihashi2018classification, which uses in-distribution samples from CIFAR-10 and samples from four different datasets (ImageNet-crop, ImageNet-resize, LSUN-crop and LSUN-resize) as OOD samples. These four OOD datasets were developed specifically for CIFAR-10 by liang2017enhancing. Following the setup of yoshihashi2018classification, the OOD experiments use two backbone network architectures, VGGNet (referred to in that paper as Plain CNN) and DenseNet. VGGNet, which consists of 13 layers, is a modified version of the VGG architecture defined in simonyan2014very. DenseNet follows the network architecture defined by huang2017densely for CIFAR-10. It has a depth of 92 and a growth rate of 24. The decoder architecture for all experiments is the same as that used in the open-set experiments (modified to accommodate the image size), and the classifier architecture for all OOD experiments is a simple one-layer fully-connected network with 10 neurons corresponding to the 10 categories of the CIFAR-10 dataset. We consider LadderNet valpola2015neural and DHRNet baselines, where DHRNet is the novel open-set architecture proposed by yoshihashi2018classification. The performance is measured using the F-measure (or F1 score). All the images are of size 32 × 32. For training we use all 50,000 training samples of CIFAR-10 and evaluate the trained model on 10,000 OOD samples for each experiment, i.e., ImageNet-resize, LSUN-resize, ImageNet-crop and LSUN-crop. All the reported numbers except those for MLOSR are taken from yoshihashi2018classification.
Table 2 shows that for all OOD experiments MLOSR performs significantly better than the other open-set algorithms. The second best performing method is DHRNet with CROSR, both proposed in yoshihashi2018classification. With the VGGNet backbone, MLOSR improves the F-measure by 11.6%, 9.1%, 6.3% and 5.2% (absolute) for OOD samples from ImageNet-crop, ImageNet-resize, LSUN-crop and LSUN-resize, respectively. On average, MLOSR with the VGGNet backbone performs better than the next best method by 8.05%. Furthermore, MLOSR with the DenseNet backbone improves the F-measure by 17.0%, 13.3%, 15.7% and 19.8% for OOD samples from ImageNet-crop, ImageNet-resize, LSUN-crop and LSUN-resize, respectively. On average, MLOSR with the DenseNet backbone performs better than the next best method by a significant 16.45%. Overall, MLOSR achieves a 12.25% improvement on average over the next best method (in both cases DHRNet+CROSR). Even though no comparison with CROSR is available for the CIFAR-10 dataset in the open-set recognition experiments of Section 4.2, the results from the OOD experiments show that, for this object dataset, MLOSR identifies unknown samples better than DHRNet+CROSR.
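The per-dataset and average gains over DHRNet+CROSR can be recomputed from the Table 2 entries with a short script. The dictionaries simply copy the DHRNet+CROSR and MLOSR rows for each backbone (column order: ImageNet-crop, ImageNet-resize, LSUN-crop, LSUN-resize); the helper names are illustrative.

```python
# F-measure rows copied from Table 2.
crosr = {"VGGNet": [0.721, 0.735, 0.720, 0.749],
         "DenseNet": [0.733, 0.763, 0.714, 0.731]}
mlosr = {"VGGNet": [0.837, 0.826, 0.783, 0.801],
         "DenseNet": [0.903, 0.896, 0.871, 0.929]}

def gains(backbone):
    """Absolute F-measure gain of MLOSR over DHRNet+CROSR, in percentage points."""
    return [round(100 * (m - c), 1) for m, c in zip(mlosr[backbone], crosr[backbone])]

def mean(values):
    return sum(values) / len(values)

vgg, dense = gains("VGGNet"), gains("DenseNet")
overall = mean([mean(vgg), mean(dense)])
print("VGGNet gains:", vgg, "average:", round(mean(vgg), 2))
print("DenseNet gains:", dense, "average:", round(mean(dense), 2))
print("overall average:", round(overall, 2))
```

Note that these are absolute differences in F-measure, not relative improvements.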
We presented an open-set recognition algorithm for deep neural networks called Multi-task Learning for Open-Set Recognition (MLOSR). MLOSR uses encoder, decoder and classifier networks trained in a multi-task framework. We compared the performance of MLOSR with state-of-the-art open-set algorithms and showed better overall performance. Furthermore, we validated the ability of MLOSR to handle out-of-distribution unknown samples by conducting experiments with the CIFAR-10 dataset, where MLOSR emerges as the best performing algorithm, with significant improvements over the baselines. The experiments show that MLOSR deals with unknown samples better than recent competitive methods on most of the open-set recognition benchmarks and on all of the OOD benchmarks considered.
Journal of Machine Learning Research 9(Aug), 1823–1840 (2008)
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1893–1902 (2015)
In: Twenty-Eighth AAAI Conference on Artificial Intelligence (2014)
In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning, vol. 2011, p. 5 (2011)
Perronnin, F., Liu, Y., Sánchez, J., Poirier, H.: Large-scale image retrieval with compressed Fisher vectors. In: Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 3384–3391. IEEE (2010)
Valpola, H.: From neural PCA to deep unsupervised learning. In: Advances in Independent Component Analysis and Learning Machines, pp. 143–171. Elsevier (2015)