1 Introduction
The open world recognition framework was introduced in 2015 by Bendale et al. [2] as an attempt to move beyond the dominant classification methods, which assume a static setting where both the number of training images and the number of classes a model can handle are fixed. Its aim is to address the intrinsically dynamic nature of recognition in unconstrained settings, i.e. scenarios where it is not possible to predict a priori how many objects, and which ones, the system will have to recognize. This is the case for robots equipped with cameras deployed in hospitals or public spaces, for automatic tagging systems that have to deal with dynamically growing datasets, and so forth.
Open world recognition systems differ from standard, static visual classification algorithms in three key features: (a) their ability to incrementally update the model of the known categories as new data arrives; (b) their ability to learn new categories, not seen initially during training, without the need to retrain the whole system from scratch; and (c) their ability to detect whether an incoming image depicts a known category, or something new that needs to be learned. The requirement of adding new classes on the fly favours metric learning approaches (like nearest neighbour and nearest class mean classifiers) over SVMs [2]. Several metric learning methods presenting some or all of these features have been proposed so far [24, 27, 2]. Still, all these methods estimate the metric they use, and the threshold for novelty detection, on an initial closed set of classes, and keep both fixed as the problem evolves. This conflicts with the very definition of open world recognition, where the structure of the problem is progressively revealed as more data are observed, and the optimal parameters are likely to change over time.
In this paper we argue that to properly model the dynamics of the challenging open world recognition scenario, it is necessary to learn the metric and the novelty threshold online, as new instances and new classes arrive, rather than estimating them from an initial, closed set of classes as done so far [24, 27, 2]. This objective is similar to that of online learning [31] and stream mining [12, 7]. Therefore we learn our classifiers online, incrementally updating the model whenever new data is available, while at the same time staying up to date for the prediction of both known (previously learned) classes and unknown classes (Figure 1). Our experiments with incremental metric learning demonstrate that continuously updating the metric as new data and new classes arrive leads to better performance in terms of both closed set and open set accuracy. Furthermore, we introduce a method to incrementally learn the threshold for novelty detection, which uses the current internal confidences of the classifier for the known classes. This continuous tuning of the rejection threshold yields better performance as new classes are added to the classifier, compared to a fixed threshold as previously used in [2]. Our third contribution is a nonlinear local metric learning approach that adapts to the local complexity of the space with respect to the classes. Experimentally we show that this is especially beneficial in the open world recognition setting, since it is more flexible in modeling the border between known and unknown classes.
Our findings are general, and applicable to a large class of algorithms. We demonstrate this by proposing online and incremental learning extensions of three nonparametric methods: (i) the Nearest Class Mean classifier (NCM) [24], previously used for incrementally adding novel classes in [27]; (ii) the Nearest Non-Outlier classifier (NNO) [2], an extension of NCM proposed for open world recognition; and (iii) the Nearest Ball Classifier (NBC) [7], a local learning method that incrementally adds balls (prototypes) and has been used in the streaming context before. For all three algorithms, experiments show that the proposed extensions lead to a sizable advantage.

2 Related Work
Our work is at the intersection of incremental and online learning, scalable learning, open set learning and open world recognition. In the following we review previous work in these fields.
Incremental Learning.
There is a huge literature on incremental learning, including various extensions of SVMs [25, 39, 26]. However, incremental SVMs suffer from several drawbacks, the most important being their extremely expensive update [19]. More efficient implementations exist [5, 32], but, like other incremental classifiers [37, 21], they do not permit the addition of new classes in the multiclass setting. Kuzborskij et al. [18] proposed a max-margin based approach for incremental learning of novel classes that exploits prior knowledge from previous classes, but the method has a conservative behavior, tending, performance-wise, to privilege older classes over the new one.
Scalable Learning.
The goal of scalable systems is to achieve a good trade-off between prediction efficiency at test time and classification accuracy. Among these methods, tree-based approaches [23, 22, 8] showed some success in addressing test-time scalability on large scale visual recognition challenges [9, 3]. Recently, these challenges have become dominated by deep learning methods [17, 34, 33]. Again, the main drawback of these approaches is the need for a priori knowledge of the categories, and for the availability of the whole training data during the learning phase.

Open Set Learning.
Open set recognition considers the incompleteness of the knowledge of the world when learning a classifier, and the possible lack of knowledge of new classes during testing [20, 29]. Scheirer et al. [29] formulated the problem of open set recognition in a static one-vs-all setting, balancing open space risk and empirical error. The setting was then extended [30, 14] by introducing the compact abating probability model. This line of work offers robust methods to handle unseen classes; however, as it relies on SVM decision scores, it does not scale. Fragoso et al. [11] proposed a scalable version for modeling the matching scores, but did not contextualize it in a general recognition problem. A scalable incremental method on which we build is the NCM classifier [24]. Recently, NCM has been adapted to larger scale vision problems [24, 35, 36, 27], with the most recent approaches combining NCM with metric learning [24] and with random forests [27]. In contrast to the linear NCM classifier, the nearest ball classifier (NBC) [7] is a nonlinear local classifier: it adapts to the problem by incrementally adding new balls (prototypes). The NBC classifier has been used for classification in data streams [7] and for action recognition in videos [6]. To the best of our knowledge, the NBC has been used neither with metric learning nor in the open set recognition setting of this paper.

Open World Recognition.
Bendale and Boult further extended the notion of open set recognition to include incremental and scalable learning, leading to a more comprehensive problem that they called “open world recognition” [2]. To address it, the NCM algorithm was coupled with a module limiting the open space risk for model combinations and transformed spaces, resulting in a new model, the Nearest Non-Outlier (NNO) classifier described in Section 3.2.
3 Online Open World Recognition
In this section we introduce the online and incremental metric learning extensions of three recent nonparametric classifiers. These classifiers are then used within our open world online learning template, described in Algorithm 1, to predict the label of each incoming sample.
3.1 Closed Set MultiClass Prediction
For the closed-set multi-class prediction we focus on Nearest Class Mean classifiers (NCM). They assign an instance $x$ to the class $c^* = \arg\min_{c \in C} d(x, \mu_c)$ with the nearest mean vector $\mu_c$, where $C$ is the set of possible classes [38]. Following [24, 2], we use a multi-class probabilistic interpretation of NCM, and define the probability for class $c$ as:

$p(c \mid x) = \dfrac{\exp\left(-\tfrac{1}{2} d_W(x, \mu_c)\right)}{\sum_{c' \in C} \exp\left(-\tfrac{1}{2} d_W(x, \mu_{c'})\right)}$   (1)

this is a softmax function over the instance-to-class (squared) low-rank Mahalanobis distances $d_W$, parameterized by $W$:

$d_W(x, \mu_c) = \lVert W (x - \mu_c) \rVert_2^2 = (x - \mu_c)^\top W^\top W \, (x - \mu_c)$   (2)

where $x$ and $\mu_c$ are $D$-dimensional vectors and $W \in \mathbb{R}^{m \times D}$, with $m \le D$ acting as regularizer (also referred to in the literature as the intrinsic dimension of the space), which improves computational efficiency. Metric learning is used to find the best low-rank Mahalanobis distance, by optimizing the log-likelihood for correct classification over a training dataset $\{(x_i, y_i)\}_{i=1}^{N}$:

$\mathcal{L}(W) = \frac{1}{N} \sum_{i=1}^{N} \ln p(y_i \mid x_i)$   (3)
Once a metric has been learned on a large set of classes, the obtained distance function has been shown to generalize to classifying novel classes [24]. However, novel instances are used only to set the class mean vectors, and the metric is not updated for those novel classes. In contrast, below we describe a method which incrementally learns both the class means $\mu_c$ and the metric $W$.
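As a sketch of the closed-set NCM model of Eqs. (1)-(2), the following hedged example computes the softmax over low-rank Mahalanobis distances; the class means, the matrix $W$, and the query point are toy values, not learned ones:

```python
import numpy as np

def ncm_probs(x, means, W):
    """Softmax over squared low-rank Mahalanobis distances (Eqs. 1-2)."""
    # d_W(x, mu_c) = ||W (x - mu_c)||^2 for each class mean mu_c
    dists = np.array([np.sum((W @ (x - mu)) ** 2) for mu in means])
    logits = -0.5 * dists
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Toy example: two classes in 2-D, with a 1x2 projection W (m = 1 < D = 2)
means = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
W = np.array([[1.0, 0.0]])
p = ncm_probs(np.array([0.5, 0.0]), means, W)
```

The query lies much closer to the first mean, so the first class receives most of the probability mass.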
Incremental learning.
In our scenario, the number of classes is unknown upfront and may change over time; therefore we learn the metric in an online fashion. Given an example $(x_t, y_t)$, we update the NCM classifier as follows:

$\mu_{y_t}^{t} = \mu_{y_t}^{t-1} + \frac{1}{n_{y_t}} \left( x_t - \mu_{y_t}^{t-1} \right)$   (4)

$W^{t} = W^{t-1} + \lambda \, \nabla_W \ln p(y_t \mid x_t) \big\rvert_{W = W^{t-1}}$   (5)

where $n_{y_t}$ denotes the number of instances assigned to class $y_t$ (including the example of time step $t$) and $\lambda$ is a fixed learning rate. Note that the initial mean of a class always equals the first observation of that class: $\mu_{y_t}^{0} = x_t$. The gradient of $\ln p(y_t \mid x_t)$ w.r.t. the model is given by:

$\nabla_W \ln p(y_t \mid x_t) = \sum_{c \in C} \left( p(c \mid x_t) - [\![ c = y_t ]\!] \right) W (x_t - \mu_c)(x_t - \mu_c)^\top$   (6)
where we use Iverson brackets $[\![ \cdot ]\!]$ to denote the indicator function. The matrix $W$ is initialized with the truncated identity matrix, so that it initially resembles the Euclidean distance. The metric update can be seen as a single step of the stochastic gradient descent used in the large-scale closed set setting [24].

The NCM classifier is not designed to predict whether an instance comes from an unknown class or from the set of known classes. To accommodate novelty prediction, we next describe the Nearest Non-Outlier algorithm for the open world classification scenario.
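The online NCM updates of Eqs. (4)-(6) can be sketched as follows; this is a hedged toy implementation (learning rate, data, and initialization are illustrative), in which the gradient follows the softmax model of Eq. (1):

```python
import numpy as np

def oncm_update(x, y, means, counts, W, lr=0.01):
    """One online NCM step: running mean (Eq. 4) and SGD metric step (Eqs. 5-6)."""
    if y not in means:                  # first observation initializes the mean
        means[y] = x.copy()
        counts[y] = 1
    else:
        counts[y] += 1
        means[y] += (x - means[y]) / counts[y]       # incremental class mean
    labels = list(means)
    d = np.array([np.sum((W @ (x - means[c])) ** 2) for c in labels])
    p = np.exp(-0.5 * (d - d.min()))
    p /= p.sum()                        # softmax class probabilities (Eq. 1)
    # gradient of ln p(y|x): sum_c (p_c - [c == y]) * W (x - mu_c)(x - mu_c)^T
    grad = np.zeros_like(W)
    for pc, c in zip(p, labels):
        diff = x - means[c]
        grad += (pc - float(c == y)) * np.outer(W @ diff, diff)
    return W + lr * grad                # gradient ascent on the log-likelihood

means, counts = {}, {}
W = np.eye(2)
rng = np.random.default_rng(0)
for _ in range(200):
    c = int(rng.integers(2))
    x = rng.normal([0, 0] if c == 0 else [3, 0], 0.3)
    W = oncm_update(x, c, means, counts, W)
```

After the loop, points near the first cluster center are closer, under the learned metric, to the first class mean than to the second.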
3.2 Open World Classification
The Nearest Non-Outlier method is an extension of NCM for the open world scenario [2], where NCM is adjusted to define class boundaries, and instances beyond the class boundaries are assigned to the unknown class (Figure 2). Instead of using the multi-class probability defined in Eq. (1), in NNO the confidence score for class $c$ is given by:

$s_c(x) = Z \left( 1 - \frac{1}{\tau} \, d_W(x, \mu_c) \right)$   (7)

where $\tau$ is a threshold value determining a ball around each class mean, and $Z = \frac{\Gamma(\frac{m}{2} + 1)}{\pi^{m/2} \, \tau^{m}}$ is a normalization factor assuring that $s_c$ integrates to 1 on its domain (using the standard gamma function $\Gamma(\cdot)$). An example $x$ is rejected for class $c$ when $s_c(x) \le 0$, and assigned to the unknown class when it is rejected by all classes. In [2] the metric of NNO is learned offline on an initial set of known classes.
Incremental learning and rejection.
We extend NNO to allow for incremental learning of the metric and automatic tuning of the class-rejection threshold $\tau$. We formulate the prediction confidence similarly to an RBF kernel:

$s_c^{t}(x_t) = \exp\left( - \frac{d_{W^t}(x_t, \mu_c)}{2 \sigma_t^2} \right)$   (8)

This assigns a confidence value between 0 and 1 to the sample $x_t$ at time step $t$ for class $c$, using the current metric $W^t$. The advantage of this RBF formulation is that the function is strictly bounded. Using Eq. (8) also reduces the open space risk as defined in [2], since it obeys the abating property [30]: the function value decreases in areas away from the observed training data. The bandwidth parameter $\sigma_t$ is learned incrementally, as the expected value of the distances to all class means (initialized with $\sigma_0$):

$\sigma_t = \sigma_{t-1} + \frac{1}{t} \left( \frac{1}{|C|} \sum_{c \in C} d_{W^t}(x_t, \mu_c) - \sigma_{t-1} \right)$   (9)

The threshold parameter $\tau$ is used to determine that an instance does not belong to one of the known classes: we assign an instance to the unknown class if the confidence of the nearest class satisfies $s_{c^*}^{t}(x_t) < \tau_t$. We also learn $\tau$ incrementally from the data, as the mean of the confidence values observed since the last added novel class:

$\tau_t = \tau_{t-1} + \frac{1}{n} \left( s_{y_t}^{t}(x_t) - \tau_{t-1} \right)$   (10)

where $(x_t, y_t)$ is the current training sample and $n$ is the number of training samples since the last addition of a novel class. The value of $\tau_t$ can be seen as the expected value of the internal confidence associated with the observed training data.
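The RBF confidence and the running-mean threshold of Eqs. (8) and (10) can be sketched as follows; a hedged toy rendering in which names, the bandwidth, and the data points are illustrative:

```python
import numpy as np

def rbf_confidence(x, mu, W, sigma):
    """RBF-style confidence of Eq. (8): strictly bounded in (0, 1]."""
    d = np.sum((W @ (x - mu)) ** 2)
    return np.exp(-d / (2.0 * sigma ** 2))

class RunningThreshold:
    """Mean of the confidences observed since the last added class (Eq. 10)."""
    def __init__(self):
        self.tau, self.n = 0.0, 0

    def update(self, confidence):
        self.n += 1
        self.tau += (confidence - self.tau) / self.n   # incremental mean
        return self.tau

    def reset(self):            # called whenever a novel class is added
        self.tau, self.n = 0.0, 0

thr = RunningThreshold()
W, mu = np.eye(2), np.zeros(2)
for x in [np.array([0.1, 0.0]), np.array([0.5, 0.5]), np.array([1.0, 0.0])]:
    thr.update(rbf_confidence(x, mu, W, sigma=1.0))
```

A sample far from the class mean then scores below the learned threshold and would be routed to the unknown class.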
For learning the means $\mu_c$ and the metric $W$, we resort to the incremental NCM updates defined in Eqs. (4)-(5). A known limitation of class-mean models is the limited flexibility of the representation, which results in linear classifiers. In the next section we introduce a local learning approach which allows for nonlinear classification.
3.3 Local Learning in the Open World
To achieve nonlinearity through local learning, we use a nearest ball classifier (see Figure 2, right), where balls are added incrementally, and combine it with incremental metric learning. A ball $B$ is defined by its center $\mu_B$ and its radius $r_B$. It has a local class probability $p(y \mid B) = n_{B,y} / n_B$, where $n_{B,y}$ is the number of (training) samples within this ball assigned to class $y$ and $n_B$ is the total number of samples assigned to this ball. For predicting the class label of an example $x$, the ball classifier uses the local class probability of the nearest ball $B^* = \arg\min_{B \in \mathcal{B}} d(x, \mu_B)$, where $\mathcal{B}$ is the current set of covering balls. To learn the set of balls we follow [7], which uses the Euclidean distance (i.e. $d_W$ with the identity matrix for $W$). During training, the sequence of observed training examples is used to incrementally build a set of balls that cover the region of the feature space they span. At time step $t$, let $B^*$ denote the nearest ball of training example $(x_t, y_t)$; then the updates are:

If $d(x_t, \mu_{B^*}) > r_{B^*}$:

The example falls beyond the nearest ball and is used to create a new ball $B'$, which is added to the current set of balls. This ball is initialized with:

$\mu_{B'} = x_t$   (11)

$r_{B'} = d(x_t, \mu_{B^*})$   (12)

the radius is set to the distance to the nearest current ball, in order to span the full space between $x_t$ and $\mu_{B^*}$. The label $y_t$ is used to initialize the local class probability $p(y \mid B')$.

Otherwise:

The example is considered to belong to the ball $B^*$, and the local class probability $p(y \mid B^*)$ is updated using $y_t$. The mean and radius are updated depending on the predicted class label $\hat{y}_t$:

$\mu_{B^*} \leftarrow \mu_{B^*} + \frac{1}{n_{B^*}} (x_t - \mu_{B^*}) \quad \text{if } \hat{y}_t = y_t$   (13)

$r_{B^*} \leftarrow r_{B^*}^{0} \cdot e_{B^*}^{-1/(2+m)}$   (14)

where $m$ is the intrinsic dimension of the space (which we fix to the rank $m$ of the low-rank matrix $W$ in the experiments). The mean is updated using only correctly predicted samples ($\hat{y}_t = y_t$). The radius is updated using the initial radius $r_{B^*}^{0}$ and a count $e_{B^*}$ of the number of errors made within this ball so far.
While this training procedure incrementally adds novel balls, it is not designed to predict unknown classes, and it uses the standard distance metric.
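The ball-set construction above can be sketched as follows; this is a simplified, hedged rendering rather than the exact updates of [7], and the initial radius and the radius-shrinking factor are placeholders:

```python
import numpy as np

class Ball:
    def __init__(self, center, radius, label):
        self.center, self.radius = center.astype(float), radius
        self.counts = {label: 1}            # local class histogram -> p(y|B)

    def majority(self):
        return max(self.counts, key=self.counts.get)

def nbc_train_step(balls, x, y):
    """Create a new ball if x falls outside the nearest one, else update it."""
    if not balls:
        balls.append(Ball(x, radius=1.0, label=y))   # illustrative initial radius
        return
    nearest = min(balls, key=lambda b: np.linalg.norm(x - b.center))
    dist = np.linalg.norm(x - nearest.center)
    if dist > nearest.radius:               # new ball, cf. Eqs. (11)-(12)
        balls.append(Ball(x, radius=dist, label=y))
    else:                                   # update the ball, cf. Eqs. (13)-(14)
        nearest.counts[y] = nearest.counts.get(y, 0) + 1
        n = sum(nearest.counts.values())
        if nearest.majority() == y:         # move center on correct predictions
            nearest.center += (x - nearest.center) / n
        else:                               # shrink radius on errors
            nearest.radius *= 0.9           # placeholder shrinking schedule

balls = []
rng = np.random.default_rng(1)
for _ in range(100):
    c = int(rng.integers(2))
    nbc_train_step(balls, rng.normal([0, 0] if c == 0 else [4, 4], 0.5), c)
```

With two well-separated clusters, the balls covering each cluster end up dominated by that cluster's label.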
Novelty detection.
The ball classifier has two important local properties, the local class probability and the ball radius. The latter could be seen as an indicator of the local complexity in the feature space: if the feature space is locally smooth with respect to the class labels, the radius is likely to be large for this ball, while for a complex, nonsmooth feature space the ball radius will be small. We combine these two properties for the estimation of the prediction confidence.
Given the nearest ball $B^*$ for the example $x_t$, we estimate the prediction confidence as follows:

$s^{t}(x_t) = p(y \mid B^*) \cdot \exp\left( - \frac{d_{W^t}(x_t, \mu_{B^*})}{2 \, (2 r_{B^*})^2} \right)$   (15)

which combines the local class probability $p(y \mid B^*)$ with an RBF kernel estimate, where the local bandwidth is set to twice the radius of the ball, $2 r_{B^*}$. Intuitively, it assigns the highest confidence to examples close to a ball with a pure class distribution. As opposed to the global bandwidth $\sigma_t$ in NNO, we use local bandwidths defined by the ball radii.
The threshold parameter $\tau_{B^*}$, used to assign instances to the unknown class, is learned incrementally in a similar manner to Eq. (10), albeit only using samples assigned to ball $B^*$, and using the confidence function of Eq. (15). Since the NBC uses more class centroids (compared to NCM/NNO), the estimate converges slowly to the true value. To mitigate this problem, we use the Hoeffding bound [13]: since we consider the input samples i.i.d. and the confidence of Eq. (15) is bounded in $[0, 1]$, the bound is defined as:

$\epsilon = \sqrt{\frac{\ln(1/\delta)}{2 \, n_{B^*}}}$   (16)

where $\delta$ is the desired confidence level, which we set inversely proportional to the time $t$ and the number of current classes $|C|$: $\delta = \frac{1}{t \, |C|}$. This bound becomes tighter with increasingly more training examples, and less tight when the number of classes increases. For novelty prediction we assign an instance to the unknown class when $s^{t}(x_t) < \tau_{B^*} - \epsilon$.
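A hedged sketch of the per-ball confidence and the Hoeffding-corrected rejection test; the bound form is the standard one for i.i.d. samples bounded in $[0, 1]$, and all argument values below are illustrative:

```python
import math

def ball_confidence(dist, p_local, radius):
    """Cf. Eq. (15): local class probability times an RBF term, bandwidth 2*radius."""
    return p_local * math.exp(-dist ** 2 / (2.0 * (2.0 * radius) ** 2))

def hoeffding_epsilon(n_samples, t, n_classes):
    """Cf. Eq. (16) with delta = 1/(t*|C|): shrinks as n grows, widens with |C|."""
    delta = 1.0 / (t * n_classes)
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n_samples))

def is_unknown(dist, p_local, radius, tau, n_samples, t, n_classes):
    """Reject to the unknown class when confidence < tau - epsilon."""
    eps = hoeffding_epsilon(n_samples, t, n_classes)
    return ball_confidence(dist, p_local, radius) < tau - eps
```

A far-away sample is rejected even under a loose bound, while a sample at the ball center with a pure local distribution is always accepted.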
Metric learning.
For learning the metric $W$, we use a nonlinear variant of the NCM classifier. We define the class probability of class $c$ as:

$p(c \mid x) = \dfrac{\sum_{B \in \mathcal{B}_c} \exp\left( -\tfrac{1}{2} d_W(x, \mu_B) \right)}{\sum_{B \in \mathcal{B}} \exp\left( -\tfrac{1}{2} d_W(x, \mu_B) \right)}$   (17)

where $\mathcal{B}_c$ denotes the set of balls assigned to class $c$; for this assignment we use a majority vote, i.e. $c_B = \arg\max_y p(y \mid B)$. At each time step we perform a single SGD update of the metric w.r.t. the log-likelihood of this model, similar to Eq. (5).
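The per-ball class probability of Eq. (17) replaces class means with ball centers; a minimal hedged sketch with toy balls and labels:

```python
import numpy as np

def nbc_class_probs(x, centers, ball_labels, W, n_classes):
    """Cf. Eq. (17): softmax over ball distances, aggregated per class label."""
    d = np.array([np.sum((W @ (x - mu)) ** 2) for mu in centers])
    w = np.exp(-0.5 * (d - d.min()))
    w /= w.sum()                        # softmax weight of each ball
    probs = np.zeros(n_classes)
    for weight, label in zip(w, ball_labels):
        probs[label] += weight          # sum the ball mass assigned to each class
    return probs

# Toy example: three balls, two classes (class 0 owns two balls)
centers = [np.array([0.0, 0.0]), np.array([0.0, 2.0]), np.array([5.0, 5.0])]
labels = [0, 0, 1]
p = nbc_class_probs(np.array([0.0, 1.0]), centers, labels, np.eye(2), 2)
```

The query point sits between the two class-0 balls, so class 0 collects almost all of the probability mass.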
This formulation is similar to the nonlinear NCM variant proposed in [24], albeit with a fixed number of centroids per class and with k-means used to determine these centroids a priori. In contrast, our method learns the number of balls, the number of balls per class, and the centroid of each ball incrementally.

4 Experiments
In this section we validate our online metric learning approaches in three different scenarios. We show that all three proposed extensions (online metric learning, incremental updating of the thresholds, and the local ball classifier) lead to better predictions on two different datasets. We will make the used features, evaluation protocols and data available upon publication.
4.1 Datasets
ImageNet ILSVRC’10 [3].
The first dataset we use is the subset of ImageNet used for the ILSVRC’10 challenge. It contains about 1.2M images for training (with a varying number of images per class), 50K images for validation and 150K images for testing. For this dataset we use the densely sampled SIFT features, clustered into visual words, provided in [3]. Though more advanced features are available [28, 17, 34], this combination of dataset and features allows for a fair comparison with the NCM-Forest [27] and original NNO [2] methods.

Places2 [40].
The second dataset we consider is the recent Places2 dataset, which contains over 10M images of 400 different scene types. The dataset features 5,000 to 30,000 training images per class, consistent with real-world frequencies of occurrence. For this dataset, we use deep learning features, obtained by training a GoogLeNet-style ConvNet [34] on all 15K ImageNet classes with more than 200 images, using Caffe [15]. Subsequently we process the images of the Places2 dataset and extract the last 1024-dimensional layer as image representation.

Table 1: Top-1 accuracy (%) on ILSVRC’10 for an increasing number of classes.

method \ # of classes | 50 | 100 | 200 | 500 | 1000

Baselines — results from [27]
Multiclass SVM [1] | 42 | 34 | 22 | 10 | 5
SVMForest [27] | 47 | 38 | 29 | 19 | 14
NCM [24] | 44 | 36 | 27 | 19 | 14
Incremental learning — results from [27]
NCM (fixed metric) | 32 | – | – | 9 | 6
NCMForest | 41 | – | – | 16 | 11
SVMForest | 45 | – | – | 19 | 14
Online learning — this paper
oNCM | 42 | 37 | 32 | 24 | 19
oNBC | 42 | 34 | 30 | 21 | 16
4.2 Scenario 1: LargeScale Incremental Learning
In this experiment we follow the large-scale incremental learning scenario used by [27]. The experimental setup is as follows:

- parameters and the metric (if relevant) are learned on an initial set of 20 classes;
- the remaining classes are incrementally added in batches;
- performance is evaluated on the test set as the number of known classes grows (see Table 1).
We use the best performing incremental methods from [27] for comparison, specifically: NCM with the initial metric, NCMForest, and SVMForest. We also compare against three non-incremental baselines: multiclass SVMs [1], metric learning NCM [24], and SVMForest [27]. We use our online oNCM and oNBC (without novelty detection) in this comparison. Our methods are learned incrementally from the start, while shuffling the data within each batch before learning. For the whitening of the features (to avoid numerical instabilities), we use the mean and standard deviation calculated on the initial set of 20 classes. Performance is measured using the top-1 accuracy, as commonly used on the ILSVRC dataset.
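Whitening with statistics estimated on the initial class set only, as described above, might look as follows; a sketch, where the small epsilon guard against zero variance is our own addition:

```python
import numpy as np

def fit_whitening(initial_features, eps=1e-8):
    """Compute per-dimension mean/std on the initial set of classes only."""
    mean = initial_features.mean(axis=0)
    std = initial_features.std(axis=0) + eps   # guard against zero variance
    return mean, std

def whiten(features, mean, std):
    """Apply the fixed transform to any later batch (no re-estimation)."""
    return (features - mean) / std

rng = np.random.default_rng(0)
initial = rng.normal(3.0, 2.0, size=(1000, 4))   # stand-in for the initial 20 classes
mean, std = fit_whitening(initial)
z = whiten(initial, mean, std)
```

Later batches reuse the same fixed mean and std, so the transform never drifts as new classes arrive.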
Results are shown in Table 1; we highlight two findings. First, among metric learning approaches, the NCM variants are on par with SVM approaches. Second, the performance of all algorithms decreases as the number of classes increases. This is to be expected, as the classification problem becomes harder as the number of classes grows. Still, the decrease is notably more graceful when the metric is learned incrementally, as for oNCM and oNBC. We believe this is mainly due to the incremental learning of the metric, which continuously adapts to the new classes rather than relying only on the initial, limited knowledge of the problem.
4.3 Scenario 2: Open World Recognition
In this experiment we follow the open world protocol proposed in [2], where methods are tested on both known and unknown classes. The experimental setup is as follows:

- parameters and the metric are learned on an initial set of 50 classes;
- images of 50 additional classes are added in each iteration;
- performance is evaluated on a test set of both known and unknown classes.
The open world performance is measured considering the unknown classes as a single new category. This allows us to calculate the standard multiclass top-1 accuracy, as in [2].
We compare our proposed methods against several baselines. First, we evaluate against a standard linear SVM [10] and the 1vSet SVM [29]. The latter is designed for open set recognition, and can thus classify images of unknown classes; note, however, that it cannot incrementally learn new classes. We also compare against NCM [24], NNO [2] and NBC [7], which can all adjust to new classes in an incremental way. Of these three methods, only NNO is designed to assign images to an unknown class. NCM and NNO train their metric on the initial set, and NBC uses the Euclidean metric with the incremental ball set construction.
We use our online oNCM, oNNO, and oNBC in this comparison, all trained incrementally from the start. Both oNNO and oNBC are able to assign images to unknown classes, while oNCM does not have this property.
To assess performance in the open world recognition setting one has to consider two variables: the number of known categories during incremental learning, and the number of unknown categories during testing. We visualize our results in Figure 3. On the left, we show the top-1 accuracy as the number of known training classes grows, in the case of 0 unknown classes. On the right, we show how the top-1 accuracy changes as the number of unknown test classes increases, for a fixed number of known classes (set to 50).
Our main observation is that our online approaches clearly outperform all the others in both the closed set and open world settings. The inability to reject images from unknown classes yields the almost random performance of the NCM method. Note that oNBC adapts to the classification problem and rejects images from unknown classes; indeed, prediction becomes easier when the numbers of unknown and known classes are unbalanced. In Figure 4, we show a surface plot over a range of known and unknown classes for our proposed online methods.
4.4 Scenario 3: Online Image Stream Prediction
In this experiment, we aim to simulate an online image stream prediction setting, for which we introduce a novel evaluation protocol. We believe it is a more realistic protocol, as it fully represents the dynamic behavior of the algorithm during the simultaneous updating and testing phases. The experimental setup we consider follows Algorithm 1, where we consider a stream of incoming images. At time $t$ the learner:

- predicts the label $\hat{y}_t$ for sample $x_t$ using the current models;
- updates the online accuracy using $\hat{y}_t$ and the ground-truth label $y_t$;
- updates the current models using the training tuple $(x_t, y_t)$.
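The predict-evaluate-update protocol above is the classic prequential scheme; a minimal sketch, with a trivial running-class-mean learner standing in for our actual models:

```python
import numpy as np

def prequential_run(stream, predict, update):
    """For each sample: predict first, score, then train on the revealed label."""
    correct = 0
    accuracies = []
    for t, (x, y) in enumerate(stream, start=1):
        correct += int(predict(x) == y)     # test before train
        accuracies.append(correct / t)      # running online accuracy
        update(x, y)
    return accuracies

# Stand-in learner: running class means with nearest-mean prediction
means, counts = {}, {}

def predict(x):
    if not means:
        return None
    return min(means, key=lambda k: float(np.linalg.norm(x - means[k])))

def update(x, y):
    counts[y] = counts.get(y, 0) + 1
    means[y] = x.copy() if counts[y] == 1 else means[y] + (x - means[y]) / counts[y]

rng = np.random.default_rng(2)
stream = [(rng.normal([0, 0] if c == 0 else [5, 5], 0.3), c)
          for c in rng.integers(0, 2, size=300)]
acc = prequential_run(stream, predict, update)
```

Because every sample is scored before the model sees its label, the recorded accuracy reflects genuine online performance.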
For practical reasons we generate the stream from 1200 images of each of the 200 most frequent classes from ILSVRC’10 and Places2, with 100 classes treated as known and 100 as unknown. In this way the final number of instances for the closed set and open set classes is fully balanced. The data stream is generated as follows:

- the stream is divided into 40 stream-segments;
- the first 20 segments each introduce 5 known and 5 unknown classes;
- the learner is given 60 images per active class per segment;
- any introduced class dries up after 20 segments;
- the number of images per segment varies, with a peak halfway;
- the online accuracy is recorded after each of the 40 stream-segments.
We believe this setting is interesting because the evaluated known and unknown classes evolve over time, both by increasing the number of classes as well as reducing the number of classes.
For evaluating performance on the stream, we use the online accuracy [12] of the harmonic mean (also known as the F-score) between the closed set accuracy $a_C^t$ and the open set accuracy $a_O^t$:

$H_t = \frac{2 \, a_C^t \, a_O^t}{a_C^t + a_O^t}$

computed over the predictions observed up to time $t$. We coin this measure the online harmonic top-1 accuracy. It weights the closed set and open set accuracies equally; moreover, a method which performs well on one of the two accuracies and poorly on the other obtains a low harmonic mean, which is a favorable property.
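The online harmonic top-1 accuracy combines the two accuracies as a standard harmonic mean; a one-function sketch:

```python
def harmonic_top1(closed_acc, open_acc):
    """Harmonic mean (F-score form) of the closed set and open set accuracies."""
    if closed_acc + open_acc == 0:
        return 0.0
    return 2.0 * closed_acc * open_acc / (closed_acc + open_acc)
```

A method that excels on one accuracy but fails on the other is penalized: for instance, 0.9 closed set accuracy paired with 0.1 open set accuracy scores well below their arithmetic mean.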
For this experiment we use the NNO and NBC methods on the ILSVRC’10 and Places2 datasets. The results are presented in Figure 5, in which we compare oNNO and oNBC to variants using just an initially learned metric, obtained in an online learning phase over the first 5 stream-segments (indicated by NNO/NBC in the figure). In the top-row figures, we show the online harmonic accuracy; once again the incremental metric learning methods oNNO and oNBC have a clear benefit over their fixed-metric counterparts. This becomes clearer as more images and classes are added in later stream-segments. Moreover, the local learning NBC classifier can adjust more precisely to the added classes and therefore outperforms the linear NNO classifier. Notice that once no new classes are added, both methods start to gain performance as they keep learning the already explored categories. Finally, the significant difference in performance between the ILSVRC’10 and Places2 datasets, obtained with the same number of classes and images, is likely due to the more powerful features used for the Places2 dataset.
In the bottom-row figures of Figure 5, we show the mean of the confidence values assigned to the closed set (CC) and the open set (OC), together with the mean of the thresholds (Thr) estimated by our methods within each stream-segment. In order to achieve good performance on both the open and the closed set, the threshold for rejecting an image into the unknown class should lie between the closed set and open set confidences. From the results, it can be observed that the open set and closed set confidences are almost identical for the oNNO classifier, which makes finding a good threshold value almost impossible. For the oNBC method, where the confidence function and the estimated threshold depend on more local information, the open set and closed set confidences are well set apart. We remark that using a fixed threshold tuned on an initial set (as the literature methods do) cannot lead to good performance, since the confidences change over time.
5 Conclusions
In this paper we addressed the open world recognition problem and proposed three extensions to its current formulation: online metric learning, incremental updating of the thresholds for novelty detection, and local learning through nearest ball classification. We evaluated the effect of these extensions on three different existing algorithms (NCM, NNO and NBC) in three different experimental scenarios: large-scale incremental learning, open world recognition and online image stream prediction. This last setting is a new protocol for the evaluation of online open world recognition, which we believe better mimics out-of-the-lab applications. In all three scenarios, our proposed methods performed substantially better than the baselines, showcasing the importance of fully embracing online learning for open world recognition.
Future work will focus on studying the suitability of active learning in this scenario [16], where an interaction module has to balance the number of true label requests and the performance at any query rate. Another setting we will investigate is the bandit one [4], where the learner can access the labels only when it makes correct predictions.

References
 [1] Z. Akata, F. Perronnin, Z. Harchaoui, and C. Schmid. Good practice in largescale learning for image classification. IEEE Trans. PAMI, 2013.
 [2] A. Bendale and T. Boult. Towards open world recognition. In CVPR, 2015.
 [3] A. Berg, J. Deng, and L. Fei-Fei. The ImageNet large scale visual recognition challenge 2010–2015. http://www.imagenet.org/challenges/LSVRC/2015, 2010.
 [4] S. Bubeck and N. CesaBianchi. Regret analysis of stochastic and nonstochastic multiarmed bandit problems. arXiv preprint arXiv:1204.5721, 2012.

 [5] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551–585, 2006.
 [6] R. De Rosa, N. Cesa-Bianchi, I. Gori, and F. Cuzzolin. Online action recognition via nonparametric incremental learning. In BMVC, 2014.
 [7] R. De Rosa, F. Orabona, and N. CesaBianchi. The ABACOC algorithm: a novel approach for nonparametric classification of data streams. In ICDM, 2015.
 [8] J. Deng, S. Satheesh, A. C Berg, and F. Li. Fast and balanced: Efficient label tree learning for large scale object recognition. In NIPS, 2011.
 [9] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. IJCV, 88(2):303–338, 2010.
 [10] R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.
 [11] V. Fragoso, P. Sen, S. Rodriguez, and M. Turk. EVSAC: accelerating hypotheses generation by modeling matching scores with extreme value theory. In ICCV, 2013.
 [12] J. Gama, R. Sebastiao, and P. Rodrigues. On evaluating stream learning algorithms. Machine Learning, 2013.

 [13] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
 [14] L. P. Jain, W. J. Scheirer, and T. E. Boult. Multi-class open set recognition using probability of inclusion. In ECCV, 2014.
 [15] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
 [16] A. J. Joshi, F. Porikli, and N. Papanikolopoulos. Multi-class active learning for image classification. In CVPR, 2009.

 [17] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
 [18] I. Kuzborskij, F. Orabona, and B. Caputo. From N to N+1: Multiclass transfer incremental learning. In CVPR, 2013.
 [19] P. Laskov, C. Gehl, S. Krüger, and K. Müller. Incremental support vector learning: Analysis, implementation and applications. Journal of Machine Learning Research, 7:1909–1936, 2006.

 [20] F. Li and H. Wechsler. Open set face recognition using transduction. IEEE Trans. PAMI, 27(11):1686–1697, 2005.
 [21] L. Li and L. Fei-Fei. OPTIMOL: automatic online picture collection via incremental model learning. IJCV, 88(2):147–168, 2010.
 [22] B. Liu, F. Sadeghi, M. Tappen, O. Shamir, and C. Liu. Probabilistic label trees for efficient large scale image classification. In CVPR, 2013.
 [23] M. Marszałek and C. Schmid. Constructing category hierarchies for visual recognition. In ECCV, 2008.
 [24] T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka. Distancebased image classification: Generalizing to new classes at nearzero cost. IEEE Trans. PAMI, 35(11):2624–2637, 2013.

 [25] T. Poggio. Incremental and decremental support vector machine learning. In NIPS, 2001.
 [26] A. Pronobis and B. Caputo. The more you learn, the less you store: memory-controlled incremental SVM. Technical report, IDIAP, 2006.
 [27] M. Ristin, M. Guillaumin, J. Gall, and L. van Gool. Incremental learning of NCM forests for large-scale image classification. IEEE Trans. PAMI, 2016.
 [28] J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek. Image classification with the Fisher vector: Theory and practice. IJCV, 2013.
 [29] W. J Scheirer, A. de Rezende Rocha, A. Sapkota, and T. E. Boult. Toward open set recognition. IEEE Trans. PAMI, 35(7):1757–1772, 2013.
 [30] W. J Scheirer, L. P Jain, and T. E Boult. Probability models for open set recognition. IEEE Trans. PAMI, 36(11):2317–2324, 2014.
 [31] S. ShalevShwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2), 2011.
 [32] S. ShalevShwartz, Y. Singer, N. Srebro, and A. Cotter. Pegasos: Primal estimated subgradient solver for svm. Mathematical programming, 127(1):3–30, 2011.
 [33] K. Simonyan and A. Zisserman. Very deep convolutional networks for largescale image recognition. Technical report, arXiv preprint arXiv:1409.1556, 2014.
 [34] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. Technical report, arXiv preprint arXiv:1409.4842, 2014.
 [35] C. Veenman and M. Reinders. The nearest subclass classifier: A compromise between the nearest mean and nearest neighbor classifier. IEEE Trans. PAMI, 27(9):1417–1429, 2005.
 [36] C. Veenman and D. Tax. A weighted nearest mean classifier for sparse subspaces. In CVPR, 2005.
 [37] Z. Wang, K. Crammer, and S. Vucetic. Multiclass pegasos on a budget. In ICML, 2010.
 [38] A. R. Webb. Statistical pattern recognition. Wiley, NewYork, NY, USA, 2002.
 [39] T. Yeh and T. Darrell. Dynamic visual category learning. In CVPR, 2008.

 [40] B. Zhou, A. Khosla, A. Lapedriza, A. Torralba, and A. Oliva. Places2: A large-scale database for scene understanding. Technical report, arXiv preprint, 2015.