Robotic applications demand special considerations when designing a visual recognition system. The open set nature of robotics means that a robot will encounter observations belonging to novel, out-of-distribution classes. Any classifier prediction in a robotic environment can trigger some sort of costly robotic action. As such, a recognition system must not silently fail when observing a novel example by incorrectly predicting a label from the training set distribution, as shown in Figure 1a. Additionally, a robotic vision system should not cease learning after the initial training phase. The distribution of data in the training set will undoubtedly vary from the true distribution of data in the robot’s operating environment. By sampling data from the environment and interactively querying a human user about novel observations, a robotic vision system can continue to improve its understanding of the real-world data distribution, as shown in Figure 1b.
Detecting out-of-distribution observations is known as novelty detection and the problem of both classifying in-distribution observations with the correct class label and detecting novel examples is known as open set recognition. The task of interactively querying a user for labels is referred to as active learning. If a recognition system can select the most informative observations for labelling, the model can efficiently learn from a small number of labelled examples. This is important for robotics, as the number of observations may be very large but the labelling budget is likely to be small. In this work we focus on the active learning of novel classes, also referred to as open set active learning. A model trained on known classes is deployed in an environment containing both known and novel classes. The active learning algorithm aims to learn about the novel class distribution from as few human-labelled examples as possible.
Conventional classification approaches, such as convolutional neural networks (CNNs) with softmax classifiers [1, 2, 3, 4], are closed set by design. As such, these commonly used methods are limited in their ability to detect novel examples. Softmax-based CNNs are not only forced to predict a known label for out-of-distribution examples, but often do so with high confidence [5]. Approaches that are limited in their ability to detect novel examples are also limited in their ability to learn from and improve their understanding of the corresponding unknown classes. As a result, these conventional softmax-based approaches are not suitable for open set robotic vision problems.
Deep metric learning algorithms learn a transformation from the image space to a feature embedding space, in which distance is a measure of semantic similarity. State-of-the-art deep metric learning models demonstrate an impressive aptitude for transfer learning [6, 7, 8, 9, 10, 11, 12], meaning that features are likely to be co-located based on class, even when those classes are outside of the training distribution. This not only allows for the reliable detection of novel examples, but also provides a meaningful way of determining an observation's informativeness about the true class distribution. This knowledge enables efficient querying in an active learning setting, allowing the model to learn about novel classes from a small number of labelled examples.
An overview of our approach is shown in Figure 1 and the main contributions of this paper are as follows:
We propose an open set active learning approach using metric spaces, which allows a model to efficiently learn about observed novel classes (Section III-C).
We show that our proposed approach to active learning significantly outperforms comparable methods at small labelling budgets (Section IV-C).
For a labelling budget of zero, we investigate if the representation of observed novel classes can be improved using unsupervised pseudo-labels (Section IV-D).
II Related Work
II-A Novelty Detection and Open Set Recognition
Classic approaches to novelty detection include statistical and distance-based outlier detection methods [14, 15, 16, 17, 18], one-class support vector machines [19] and methods that analyse the information content of data [20, 21]. Novelty detection is related to the task of anomaly and outlier detection [13, 22].
Recent works that make use of deep CNNs include a Generative Adversarial Network approach [23], in which a multi-class discriminator is trained with a generator that creates data from both known and novel distributions. Mandelbaum and Weinshall [24] propose a density-based confidence score that can be applied to novelty detection, as an alternative to confidence scores based on softmax probabilities [5].
Bendale and Boult [25] propose an open set version of a softmax classifier named OpenMax. Class probabilities are revised using a meta-recognition Weibull model fitted on distances between activation vectors and per-class mean activation vectors. A pseudo-class representing unknown classes is introduced, allowing direct measurement of novelty.
Liang et al. [26] introduce an out-of-distribution detector called ODIN that operates on pre-trained softmax-based networks. The authors use softmax temperature scaling and input pre-processing to push softmax scores from known and novel classes further apart. This method requires a forward pass, backward pass and second forward pass through the network to perform novelty detection.
II-B Active Learning
Classic methods of active learning include uncertainty approaches [29, 30, 31] and decision-theoretic approaches [32, 33, 34]. A comprehensive review of these methods can be found in Settles' survey [35]. Recent works have investigated active learning with CNNs [36, 37, 38, 39, 40]. Approaches include framing active learning as a reinforcement learning problem [39], generative adversarial active learning [38] and a core-set based approach [40]. These methods aim to select a subset that best represents the entire set of unlabelled examples, for the purpose of initial training. Our method is focused on learning from observed novel classes that are not present in the existing training set. This means that rather than selecting a subset that best represents the entire unlabelled set, we want to select a subset that best represents the novel classes in the unlabelled set. In other words, we don't want to waste our limited labelling budget on classes that are already well learned.
III Metric Learning for Open Set Problems
We first introduce the notation used for the remainder of this paper. For a given observation x with label y, a trained neural network produces a D-dimensional feature embedding f(x) in a metric space ℝ^D. The set of labelled training feature embeddings is denoted as Z = {z_1, …, z_N}, with z_i corresponding to the feature embedding vector of the i-th labelled training image.
III-A Deep Metric Learning
Deep metric learning refers to metric learning approaches that make use of deep convolutional neural networks (CNNs). Unlike conventional classification models, such as a CNN with a softmax classifier, metric learning algorithms aim to learn a transformation from the image space to a feature embedding space, in which distance is a measure of semantic similarity. Deep metric learning approaches learn feature embeddings that are amenable to transfer learning [6, 7, 8, 9, 10, 11, 12]. This suggests that such models are suitable for detecting novel examples.
The deep metric learning approach we use in this paper is described in [12]. A Gaussian kernel, or radial basis function, is centred on each training feature embedding. The probability that example x_i has class label c is computed as:

p(y_i = c | x_i) = Σ_{j ∈ N_i : y_j = c} exp(−‖f(x_i) − z_j‖² / 2σ²) / Σ_{j ∈ N_i} exp(−‖f(x_i) − z_j‖² / 2σ²), (1)

where z_j is the j-th labelled training embedding, σ is a shared standard deviation and N_i is the set of training nearest neighbours for example x_i.
During training the Gaussian kernels pull examples of the same class together and push examples of different classes apart. The loss for a given training example is the negative logarithm of the true class probability. The approach is made scalable to large numbers of classes and examples through the use of fast approximate nearest neighbour search. Training is made feasible and efficient by periodic asynchronous updates of the training embeddings and nearest neighbours, negating the need to do so after every network update.
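To make the mechanics concrete, Equation 1 can be sketched in a few lines of numpy. This is an illustrative reimplementation under simplifying assumptions (precomputed embeddings, exact rather than approximate nearest neighbour search); the function name and signature are ours, not the paper's code.

```python
import numpy as np

def class_probabilities(query, train_emb, train_labels, sigma, n_classes, k=8):
    """Class probabilities from Gaussian kernels centred on the k nearest
    training embeddings (a sketch of Equation 1)."""
    d2 = np.sum((train_emb - query) ** 2, axis=1)   # squared distances to all kernel centres
    nn = np.argsort(d2)[:k]                         # indices of the k nearest neighbours
    w = np.exp(-d2[nn] / (2.0 * sigma ** 2))        # Gaussian kernel responses
    probs = np.zeros(n_classes)
    for weight, label in zip(w, train_labels[nn]):
        probs[label] += weight                      # accumulate kernel density per class
    return probs / probs.sum()                      # normalise to a probability distribution
```

A query embedding close to training embeddings of one class receives high probability for that class; the same densities are reused below for novelty detection.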
Although many deep metric learning approaches perform well on transfer learning tasks, the feature embeddings learned by commonly used triplet approaches are not well suited to classification [12]. In contrast, the Gaussian kernel approach performs well for transfer learning as well as classification, outperforming softmax classification on several datasets [12]. This makes the model suitable for our open set recognition and active learning setting.
III-B Novelty Detection and Open Set Recognition
We now describe the problem of detecting out-of-distribution observations. Our open set recognition system should determine whether an observation is drawn from the known training distribution or from an unknown distribution outside of it. If the observation is from the known distribution, the classifier should predict a class label; otherwise it should be labelled as unknown/novel.
The deep metric learning model used by our approach [12] stores all training set feature embeddings and computes the Euclidean distance between an example embedding and its set of nearest neighbours for the purpose of classification. Distance between examples in the metric space can be used as a measure of semantic similarity. We expect that most observed examples from a known class will be located nearby training set embeddings of the same class. Observed examples that are not located nearby any training set embeddings are novel to the model and are likely from the unknown distribution. Since it is known that the metric learning model transfers well to novel classes, we expect the model to be well suited to novelty detection. This assumption is evaluated in Section IV-B.
Our open set classifier and novelty detector predicts a class label ŷ for example x as follows:

ŷ(x) = argmax_c p(y = c | x) if ν(x) ≤ τ, and ŷ(x) = unknown otherwise,

where ν(x) is a novelty function and τ is a threshold.
We investigate the effectiveness of several simple distance-based novelty measures for use in a deep metric space. Distance-based measures are appropriate since we know that the metric learning model transfers well to novel classes.
III-B1 Nearest neighbour distance (NN dist.)
The NN dist. measure is the Euclidean distance between an observed example's feature embedding and its nearest training set embedding.
III-B2 Maximum class density (density)

The density measure is the largest per-class kernel density at an observed example; a low maximum class density indicates novelty. Since the metric learning model's class probability distribution is computed based on class densities, the density measure is equivalent to measuring novelty based on the maximum class probability. Density is suggested as a suitable measure for novelty detection in [24] and we adapt the method for our approach, using the shared Gaussian σ.

III-B3 Shannon entropy (entropy)

The entropy measure is the Shannon entropy of the class density distribution from Equation 1.
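The three novelty measures can be sketched together with the thresholded open set prediction rule. This is an illustration under our own sign conventions (maximum class density is negated and entropy left positive, so that a larger score always means a more novel observation); names, defaults and the threshold value are ours, not the paper's.

```python
import numpy as np

def novelty_scores(query, train_emb, train_labels, sigma, n_classes, k=8):
    """The three distance-based novelty measures of Section III-B,
    signed so that larger means more novel."""
    d2 = np.sum((train_emb - query) ** 2, axis=1)
    nn = np.argsort(d2)[:k]
    w = np.exp(-d2[nn] / (2.0 * sigma ** 2))
    dens = np.zeros(n_classes)
    for weight, label in zip(w, train_labels[nn]):
        dens[label] += weight                        # per-class kernel density
    probs = dens / dens.sum()
    return {
        "nn_dist": float(np.sqrt(d2[nn[0]])),        # nearest neighbour distance
        "density": float(-dens.max()),               # negated maximum class density
        "entropy": float(-np.sum(probs * np.log(probs + 1e-12))),  # Shannon entropy
    }

def open_set_predict(query, train_emb, train_labels, sigma, n_classes,
                     measure="nn_dist", tau=1.0, k=8):
    """Predict a known class label if the novelty score is at most tau,
    otherwise return None for unknown/novel."""
    scores = novelty_scores(query, train_emb, train_labels, sigma, n_classes, k)
    if scores[measure] > tau:
        return None                                  # flagged as novel
    d2 = np.sum((train_emb - query) ** 2, axis=1)
    nn = np.argsort(d2)[:k]
    w = np.exp(-d2[nn] / (2.0 * sigma ** 2))
    dens = np.zeros(n_classes)
    for weight, label in zip(w, train_labels[nn]):
        dens[label] += weight
    return int(np.argmax(dens))                      # most probable known class
```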
Table I: Novelty detection and open set recognition results on Cars196 (left block), Flowers102 (middle block) and Birds200 (right block). Each block reports AUROC, AUPR, novelty detection F-measure and open set accuracy.

| Method | AUROC | AUPR | F | Acc. | AUROC | AUPR | F | Acc. | AUROC | AUPR | F | Acc. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline [5] Max Pr. | 0.8331 | 0.8116 | 0.7832 | 0.7334 | 0.8509 | 0.8051 | 0.8004 | 0.7873 | 0.7311 | 0.6933 | 0.7277 | 0.6473 |
| Baseline [5] Entropy | 0.8512 | 0.8374 | 0.7865 | 0.7395 | 0.8559 | 0.8206 | 0.8015 | 0.7907 | 0.7397 | 0.7017 | 0.7280 | 0.6422 |
| OpenMax [25] | - | - | 0.8055 | 0.7515 | - | - | 0.7985 | 0.7588 | - | - | 0.7628 | 0.6893 |
| ODIN [26] Max Pr. | 0.8613 | 0.8443 | 0.8021 | 0.7531 | 0.8712 | 0.8471 | 0.8021 | 0.7531 | 0.7383 | 0.7031 | 0.7271 | 0.6404 |
| ODIN [26] Entropy | 0.8668 | 0.8469 | 0.8089 | 0.7623 | 0.8690 | 0.8447 | 0.8085 | 0.7848 | 0.7400 | 0.7034 | 0.7260 | 0.6389 |
| Ours: DML Density | 0.8901 | 0.8671 | 0.8263 | 0.7878 | 0.9043 | 0.8741 | 0.8442 | 0.8255 | 0.7838 | 0.7475 | 0.7419 | 0.6895 |
| Ours: DML Entropy | 0.9013 | 0.8710 | 0.8454 | 0.8033 | 0.9084 | 0.8718 | 0.8477 | 0.8299 | 0.7981 | 0.7601 | 0.7559 | 0.7012 |
| Ours: DML NN Dist. | 0.9028 | 0.8706 | 0.8502 | 0.8041 | 0.9078 | 0.8884 | 0.8543 | 0.8397 | 0.7961 | 0.7489 | 0.7652 | 0.6840 |
III-C Active Learning of Novel Classes
When deployed, a deep metric learning model trained on the known class distribution observes new unlabelled data drawn from a mixture of the known and unknown distributions. Let U = {u_1, …, u_M} represent the set of feature embeddings from unlabelled observations, with u_i the embedding of the i-th unlabelled observation. In addition to detecting observations belonging to the unknown distribution, our system should select the most informative examples in U for labelling by a user. Obtaining a label is referred to as a query. The selected examples should be those that allow the model to learn the most about the unknown distribution, from the fewest queries. Metric spaces that transfer knowledge to novel examples enable such efficient label querying. Our proposed open set active learning approach is outlined in Algorithm 1.
We define b as the labelling budget, that is, the number of labels that can be obtained from a user. Since b may be significantly smaller than the total number of observations, it is important that examples are selected for querying based on an informativeness measure. This selection process is known as query selection. Once queried, labelled observations are included in the set of Gaussian kernel centres. The network is fine-tuned with the new labelled examples, as well as the original training examples, to ensure that knowledge about previously learned classes is not lost.
Observations should be selected for querying based on two criteria. The first is the novelty of the observation with respect to the labelled training examples. The second is the potential informativeness of a given observation about the set of all unlabelled observations. These criteria mean that query selection should favour examples that are both in regions of high unlabelled example density and regions of low labelled example density. In other words, we should select feature embeddings that are far from labelled embeddings and nearby many unlabelled embeddings.
Our proposed approach to query selection, shown in Equation 6 (Line 18 of Algorithm 1), is the ratio of unlabelled density to labelled density:

ULDR(u) = Σ_{u′ ∈ U, u′ ≠ u} exp(−‖u − u′‖² / 2σ²) / Σ_{z ∈ Z} exp(−‖u − z‖² / 2σ²), (6)

where U is the set of unlabelled embeddings, Z is the set of labelled embeddings and the value of σ is the same as in Equation 1. The next example selected for querying is that with the largest unlabelled to labelled density ratio (ULDR). When an example is labelled, it is removed from the set of unlabelled observations U and included in the set of labelled examples Z, which includes the original training data. The ULDR query selection method ensures examples that are both far from labelled examples and in regions of high unlabelled density are favoured. This is important because a lone unlabelled observation is less informative than one surrounded by high unlabelled density. It is likely that a cluster of novel observations indicates the presence of a class that is outside of the known distribution, but common in the unknown distribution. Such observations should be the first to be queried.
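A minimal greedy sketch of ULDR query selection, assuming precomputed embeddings: the brute-force double loop stands in for the approximate nearest neighbour search used in practice, and all names are illustrative.

```python
import numpy as np

def uldr_query_order(unlabelled, labelled, sigma, budget):
    """Greedy ULDR query selection: repeatedly pick the unlabelled embedding
    with the largest unlabelled-to-labelled density ratio, then move it to
    the labelled set (a sketch of the selection loop in Algorithm 1)."""
    U = list(range(len(unlabelled)))                 # indices still unlabelled
    L = [np.asarray(z) for z in labelled]            # labelled embeddings
    queries = []
    for _ in range(min(budget, len(U))):
        best, best_ratio = None, -np.inf
        for i in U:
            u = unlabelled[i]
            # Gaussian kernel density from the remaining unlabelled embeddings.
            du = sum(np.exp(-np.sum((unlabelled[j] - u) ** 2) / (2 * sigma ** 2))
                     for j in U if j != i)
            # Density from the labelled embeddings (training + already queried).
            dl = sum(np.exp(-np.sum((z - u) ** 2) / (2 * sigma ** 2)) for z in L)
            ratio = du / (dl + 1e-12)
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        queries.append(best)
        U.remove(best)
        L.append(unlabelled[best])                   # the queried example becomes labelled
    return queries
```

With a dense cluster of novel embeddings far from the labelled set, the first query lands inside that cluster rather than on an isolated outlier near labelled data, which is the behaviour argued for above.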
As discussed in Section III-A, fast approximate nearest neighbour search and periodic asynchronous updates of the training embeddings can be utilised to make query selection and network fine-tuning scalable to large numbers of classes and training examples. Nearest neighbours are computed to classify an observation and can be used to consider only a local neighbourhood of training examples for the ULDR computation. These details are discussed in depth in [12].
IV Experiments

IV-A Experimental Set-up
We evaluate our deep metric learning approaches to open set recognition and active learning of novel classes on three datasets: Stanford Cars196 [42], Oxford Flowers102 [43] and CUB Birds200 2011 [44]. For each dataset, the first half of classes, that is, the first 98, 51 and 100 classes respectively, are taken as known classes. The remaining half are taken as novel classes. The datasets are split into training, observed and test sets. The training sets contain only known class images, while the observed and test sets contain an equal number of known and novel class images.
A VGG16 [2] architecture is used for all experiments, as we find this network configuration performs well for transfer learning tasks. The second fully connected layer, FC7, is taken as the embedding layer. This produces a 4096-dimensional feature embedding for a given input image, and therefore, a 4096-dimensional metric space. The network is trained on the training set of known classes, following the methodology described in [12]. Training data is augmented using random cropping and horizontal mirroring. A learning rate of 0.00001, weight decay of 0.0005 and momentum of 0.9 are used, with a shared Gaussian kernel σ of 91, 75 and 103 for Cars196, Flowers102 and Birds200, respectively. All hyperparameters are selected as described in [12].
Figure 2: Open set active learning results. (a) Cars196, novel only; (b) Flowers102, novel only; (c) Birds200, novel only; (d) Cars196, novel and known; (e) Flowers102, novel and known; (f) Birds200, novel and known.
IV-B Novelty Detection and Open Set Recognition Results
The network is first trained on the training set of known classes. The observed set, containing examples from both known and novel classes, is then used for the evaluation. We compare a deep metric learning (DML) approach with differing novelty measures (NN dist., density and entropy, as discussed in Section III-B) with the following baselines and state-of-the-art novelty detectors and open set classifiers:
Baseline [5] Max Pr.: A baseline softmax uncertainty novelty detector with maximum class probability thresholding.
Baseline [5] Entropy: A baseline softmax uncertainty novelty detector with Shannon entropy thresholding.
OpenMax [25]: An open set version of softmax classification.
ODIN [26] Max Pr.: An out-of-distribution detector with maximum class probability thresholding.
ODIN [26] Entropy: An out-of-distribution detector with Shannon entropy thresholding.
We evaluate novelty detection with the Area Under ROC Curve (AUROC) and Area Under Precision-Recall Curve (AUPR) measures, avoiding threshold selection. We further evaluate with a fixed threshold, analysing the novelty detection F-measure, i.e. the harmonic mean of precision and recall, and open set recognition accuracy, which is the standard classification accuracy with a single unknown/novel superclass for all observations from the unknown distribution. Both our approach and the softmax baseline [5] have only one tunable parameter (the novelty threshold), while OpenMax [25] and ODIN [26] each have three. A withheld set of images is used to tune parameters such that the withheld set F-measure is maximised, as suggested in [25]. Note that no parameter tuning is needed for the AUROC and AUPR measures for our approach or the softmax baseline [5]. Since OpenMax [25] explicitly includes a novel pseudo-class probability, AUROC and AUPR measures cannot be computed. As such, we report only the F-measure and open set accuracy for this approach. Results are shown in Table I. Our approach outperforms the compared methods on all evaluation measures and datasets, in most cases by a significant margin.
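AUROC itself needs no threshold: it equals the probability that a randomly chosen novel example receives a higher novelty score than a randomly chosen known one (the Mann–Whitney statistic). A small numpy sketch of this identity, not the evaluation code used in the experiments:

```python
import numpy as np

def auroc(novel_scores, known_scores):
    """AUROC as the probability that a random novel example scores higher
    than a random known example, counting ties as half."""
    novel = np.asarray(novel_scores)
    known = np.asarray(known_scores)
    wins = (novel[:, None] > known[None, :]).sum()   # novel outranks known
    ties = (novel[:, None] == known[None, :]).sum()  # equal scores count 0.5
    return (wins + 0.5 * ties) / (novel.size * known.size)
```

A perfect detector that scores every novel example above every known one yields 1.0; scores drawn from identical distributions yield 0.5.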
Figure 3: t-SNE visualisations of the novel class test set metric space. (a) No fine-tune on observed set; (b) pseudo-label fine-tune; (c) AL with b = 10%; (d) AL with b = 100%.
IV-C Open Set Active Learning Results
The network is first trained on the training set of known classes. Query selection is then carried out on the observed set, which contains examples from both known and novel classes. The network is then fine-tuned with the selected observed set examples, together with the original training set examples. In our experiments, labels are provided to the model automatically in response to a query. This simulates the process of a human user providing labels to a robot. We evaluate on the test set, containing unseen examples from both the original known class set and the novel class set. Classification accuracy on both the novel classes only and the combined novel and previously known classes is reported. Note that individual novel classes are used for the accuracy calculation in this section, not a single superclass, as in Section IV-B. The following approaches are compared:
Softmax w/ Uncert.: A conventional softmax approach with a typical query selection method based on classifier uncertainty. The observation with the largest Shannon entropy is queried.
DML w/ Random: Deep metric learning approach with random query selection.
Ours: DML w/ ULDR: Deep metric learning with our unlabelled to labelled density ratio (ULDR) query selection. The observation with the largest ULDR is queried.
Results are shown in Figure 2. Note that the metric learning model from [12] is used for all DML methods. Our experiments aim to show two important points: that deep metric learning is better suited to open set active learning than softmax-based networks, and that our proposed ULDR approach to query selection is efficient and effective. The softmax-based approach is outperformed by even random query selection with a deep metric learning model. Our proposed ULDR query selection method significantly outperforms the compared approaches for small labelling budgets. This shows how our method would allow a robot to efficiently query a user for labels, minimising the number of queries needed to achieve a given performance and therefore minimising the required human effort. For labelling budgets of less than 10%, our approach outperforms the nearest compared method by an average of 16.3% on Cars196.
A t-SNE visualisation [45] of the novel class test set metric space is shown in Figure 3. Novel test classes are already quite well clustered before any learning has taken place with novel examples. This shows the transfer learning capabilities of deep metric learning that motivate our approach.
Table II: Recall@m of novel examples in the Cars196 test set and classification accuracy on known test classes.

| Novel R@1 | Novel R@2 | Novel R@4 | Novel R@8 | Known Acc. |
IV-D Improving Novel Class Representation with Zero Labelling Budget
We further investigate whether a model can improve its representation of observed novel classes with a labelling budget of zero. We use spatial relationships in the metric space to generate pseudo-labels for observed examples. In other words, the knowledge that deep metric spaces transfer well to novel classes is used to generate a training signal. We use k-means clustering [47], with k-means++ initialisation [48], to obtain pseudo-labels for each observed example. The network is fine-tuned using the observed examples with pseudo-labels together with the original training examples. The value of k is selected such that the Silhouette Score [49] is maximised, indicating that the cluster assignments are tight. A value of 240 is used for the Cars196 dataset. Since the true labels of the observed set are not known in this case, we evaluate how well the network has learned to represent the novel set of classes using a recall measure on the test set examples. Recall@m (R@m) is the fraction of test examples that have the same true class label as at least one of their m nearest neighbours in the metric space. Experiments are run several times and the results are averaged.
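The pseudo-labelling step can be sketched with a compact numpy k-means. This is an illustrative stand-in for a standard implementation: k-means++ seeding and the Silhouette Score are simplified, convergence handling is minimal, and all names are ours.

```python
import numpy as np

def kmeans(X, k, iters=50, rng=np.random.default_rng(0)):
    """Plain k-means with k-means++ seeding; returns pseudo-labels."""
    # k-means++ initialisation: spread the initial centres out,
    # sampling each new centre proportionally to squared distance.
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centres = np.array(centres)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centres[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centres[c] = X[labels == c].mean(axis=0)  # recompute centroid
    return labels

def silhouette(X, labels):
    """Mean silhouette coefficient: tight, well-separated clusters score near 1."""
    scores = []
    for i, x in enumerate(X):
        d = np.sqrt(((X - x) ** 2).sum(1))
        same = (labels == labels[i])
        if same.sum() < 2:
            continue                                  # singleton clusters are skipped
        a = d[same].sum() / (same.sum() - 1)          # mean intra-cluster distance
        b = min(d[labels == c].mean() for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def pseudo_labels(X, candidate_ks):
    """Pick the clustering whose k maximises the silhouette score."""
    return max((kmeans(X, k) for k in candidate_ks),
               key=lambda lab: silhouette(X, lab))
```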
Table II shows the Recall@m of the novel examples in the Cars196 test set. Note that the test metric space contains examples from both known and novel classes. The classification accuracy of the test set examples from known classes (Known Acc.) is also shown. The pseudo-label approach is compared to a model that has not been fine-tuned on the observed set (initial). This is the lower bound on performance. Active learning (AL) results are also included, with labelling budgets of 10% and 100%. The 100% labelling budget is the upper bound on performance, as the entire observed set is labelled. Figure 3 shows a t-SNE visualisation [45] of the novel test set metric space. Compared to the initial metric space, novel classes are better clustered. Interestingly, these results indicate that although no true labels are available for the observed set, there is merit in allowing observed examples to be pushed into a better region of the metric space. We do not expect known class accuracy to improve with this method, but importantly, it does not deteriorate (see final column of Table II).
V Conclusion

In this paper, the suitability of deep metric learning to open set robotic vision problems was investigated. We showed how a deep metric learning classification model is well suited to novelty detection and open set recognition. A novel approach to the active learning of previously unknown classes was also proposed. At small labelling budgets, our approach significantly outperforms comparable methods. This would allow a robotic vision system to efficiently and effectively extend its understanding of the environment beyond the original training distribution.
References

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[2] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv preprint arXiv:1409.1556, 2014.
[3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[4] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[5] D. Hendrycks and K. Gimpel, "A baseline for detecting misclassified and out-of-distribution examples in neural networks," arXiv preprint arXiv:1610.02136, 2016.
[6] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 815–823.
[7] H. O. Song, Y. Xiang, S. Jegelka, and S. Savarese, "Deep Metric Learning via Lifted Structured Feature Embedding," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 4004–4012.
[8] K. Sohn, "Improved Deep Metric Learning with Multi-class N-pair Loss Objective," in Advances in Neural Information Processing Systems 29, 2016, pp. 1857–1865.
[9] V. B. G. Kumar, G. Carneiro, and I. Reid, "Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimizing Global Loss Functions," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 5385–5394.
[10] H. O. Song, S. Jegelka, V. Rathod, and K. Murphy, "Learnable Structured Clustering Framework for Deep Metric Learning," arXiv preprint arXiv:1612.01213, 2016.
[11] V. B. G. Kumar, B. Harwood, G. Carneiro, I. Reid, and T. Drummond, "Smart Mining for Deep Metric Learning," arXiv preprint arXiv:1704.01285, 2017.
[12] B. J. Meyer, B. Harwood, and T. Drummond, "Deep metric learning and image classification with nearest neighbour gaussian kernels," in International Conference on Image Processing, 2018.
[13] M. A. F. Pimentel, D. A. Clifton, L. Clifton, and L. Tarassenko, "A review of novelty detection," Signal Processing, vol. 99, pp. 215–249, 2014.
[14] F. E. Grubbs, "Procedures for detecting outlying observations in samples," Technometrics, vol. 11, no. 1, pp. 1–21, 1969.
[15] C. C. Aggarwal and P. S. Yu, "Outlier detection with uncertain data," in Proceedings of the 2008 SIAM International Conference on Data Mining, 2008, pp. 483–493.
[16] F. Angiulli and C. Pizzuti, "Fast Outlier Detection in High Dimensional Spaces," in Principles of Data Mining and Knowledge Discovery, 2002, pp. 15–27.
[17] V. Hautamaki, I. Karkkainen, and P. Franti, "Outlier detection using k-nearest neighbour graph," in Proceedings of the 17th International Conference on Pattern Recognition, vol. 3, 2004, pp. 430–433.
[18] C. P. Diehl and J. B. Hampshire, "Real-time object classification and novelty detection for collaborative video surveillance," in Proceedings of the 2002 International Joint Conference on Neural Networks, vol. 3, 2002, pp. 2620–2625.
[19] B. Schölkopf, R. C. Williamson, A. J. Smola, J. Shawe-Taylor, and J. C. Platt, "Support vector method for novelty detection," in Advances in Neural Information Processing Systems, 2000, pp. 582–588.
[20] Z. He, S. Deng, and X. Xu, "An Optimization Model for Outlier Detection in Categorical Data," in Advances in Intelligent Computing, 2005, pp. 400–409.
[21] S. Ando, "Clustering Needles in a Haystack: An Information Theoretic Analysis of Minority and Outlier Detection," in Seventh IEEE International Conference on Data Mining (ICDM 2007), 2007, pp. 13–22.
[22] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Computing Surveys (CSUR), vol. 41, no. 3, p. 15, 2009.
[23] M. Kliger and S. Fleishman, "Novelty Detection with GAN," arXiv preprint arXiv:1802.10560, 2018.
[24] A. Mandelbaum and D. Weinshall, "Distance-based Confidence Score for Neural Network Classifiers," arXiv preprint arXiv:1709.09844, 2017.
[25] A. Bendale and T. E. Boult, "Towards open set deep networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1563–1572.
[26] S. Liang, Y. Li, and R. Srikant, "Enhancing the reliability of out-of-distribution image detection in neural networks," in International Conference on Learning Representations, 2018.
[27] M. Masana, I. Ruiz, J. Serrat, J. van de Weijer, and A. M. Lopez, "Metric learning for novelty and anomaly detection," arXiv preprint arXiv:1808.05492, 2018.
[28] K. Lee, H. Lee, K. Lee, and J. Shin, "Training confidence-calibrated classifiers for detecting out-of-distribution samples," in International Conference on Learning Representations, 2018.
[29] D. D. Lewis and W. A. Gale, "A Sequential Algorithm for Training Text Classifiers," in SIGIR '94, 1994, pp. 3–12.
[30] T. Scheffer, C. Decomain, and S. Wrobel, "Active Hidden Markov Models for Information Extraction," in Advances in Intelligent Data Analysis, 2001, pp. 309–318.
[31] A. J. Joshi, F. Porikli, and N. Papanikolopoulos, "Multi-class active learning for image classification," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 2372–2379.
[32] B. Settles, M. Craven, and S. Ray, "Multiple-instance active learning," in Advances in Neural Information Processing Systems, 2008, pp. 1289–1296.
[33] N. Roy and A. McCallum, "Toward Optimal Active Learning Through Sampling Estimation of Error Reduction," in Proceedings of the Eighteenth International Conference on Machine Learning, 2001, pp. 441–448.
[34] X. Zhu, J. Lafferty, and Z. Ghahramani, "Combining active learning and semi-supervised learning using gaussian fields and harmonic functions," in ICML 2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, vol. 3, 2003.
[35] B. Settles, "Active Learning Literature Survey," University of Wisconsin–Madison, Computer Sciences Technical Report 1648, 2009.
[36] K. Wang, D. Zhang, Y. Li, R. Zhang, and L. Lin, "Cost-Effective Active Learning for Deep Image Classification," IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 12, pp. 2591–2600, 2017.
[37] F. Stark, C. Hazirbas, R. Triebel, and D. Cremers, "CAPTCHA Recognition with Active Deep Learning," in GCPR Workshop on New Challenges in Neural Computation, Aachen, Germany, 2015.
[38] J.-J. Zhu and J. Bento, "Generative Adversarial Active Learning," arXiv preprint arXiv:1702.07956, 2017.
[39] M. Fang, Y. Li, and T. Cohn, "Learning how to Active Learn: A Deep Reinforcement Learning Approach," in EMNLP, 2017.
[40] O. Sener and S. Savarese, "Active Learning for Convolutional Neural Networks: A Core-Set Approach," arXiv preprint arXiv:1708.00489, 2017.
[41] O. Rippel, M. Paluri, P. Dollar, and L. Bourdev, "Metric learning with adaptive density discrimination," in International Conference on Learning Representations, 2016.
[42] J. Krause, M. Stark, J. Deng, and L. Fei-Fei, "3D object representations for fine-grained categorization," in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2013, pp. 554–561.
[43] M.-E. Nilsback and A. Zisserman, "Automated Flower Classification over a Large Number of Classes," in Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, 2008.
[44] P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona, "Caltech-UCSD Birds 200," California Institute of Technology, Tech. Rep. CNS-TR-2010-001, 2010.
[45] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.
[46] J. Wang, E. Sung, and W. Yau, "Active learning for solving the incomplete data problem in facial age classification by the furthest nearest-neighbor criterion," IEEE Transactions on Image Processing, vol. 20, no. 7, pp. 2049–2062, 2011.
[47] S. Lloyd, "Least squares quantization in PCM," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, 1982.
[48] D. Arthur and S. Vassilvitskii, "K-means++: The Advantages of Careful Seeding," in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2007, pp. 1027–1035.
[49] P. J. Rousseeuw, "Silhouettes: a graphical aid to the interpretation and validation of cluster analysis," Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65, 1987.