Online Open World Recognition

As we enter the big data age and an avalanche of images becomes readily available, recognition systems face the need to move from closed, lab settings, where the number of classes and the training data are fixed, to dynamic scenarios where the number of categories to be recognized grows continuously over time, and where new data provide useful information to update the system. Recent attempts, like the open world recognition framework, tried to inject dynamics into the system by incrementally adding new classes and detecting instances from unknown classes, while at the same time continuously updating the models for the known classes. In this paper we argue that to properly capture the intrinsic dynamics of open world recognition, it is necessary to add to these aspects (a) the incremental learning of the underlying metric, (b) the incremental estimation of confidence thresholds for the unknown classes, and (c) the use of local learning to precisely describe the space of classes. We extend three existing metric learning algorithms towards these goals by using online metric learning. Experimentally we validate our approach on two large-scale datasets in different learning scenarios. For all these scenarios our proposed methods outperform their non-online counterparts. We conclude that local and online learning are important to capture the full dynamics of open world recognition.


1 Introduction

Figure 1:

Our proposed online open world recognition workflow: as labeled data are presented continuously to the model, they are used to predict with the current classifiers, then to compute the accuracy, and finally to update the Mahalanobis metric, class centroids, bandwidths, and novelty thresholds incrementally. The resulting model is able to continuously update the internal representation of the known classes, as well as to detect new ones and add them to the system on the fly.

The open world recognition framework was introduced in 2015 by Bendale et al. [2] as an attempt to move beyond the dominant classification methods, which assume a static setting where both the number of training images and the number of classes that a model can handle are fixed. Its aim is to address the intrinsically dynamic nature of recognition in unconstrained settings, i.e. scenarios where it is not possible to predict a priori how many, and which, objects the system will have to recognize. This is true for robots equipped with cameras deployed in hospitals or public spaces, for automatic tagging systems that have to deal with dynamically growing datasets, and so forth.

Open world recognition systems differ from standard, static visual classification algorithms in three key features: (a) their ability to incrementally update the models of the known categories as new data arrive; (b) their ability to learn new categories, not seen initially during training, without the need to retrain the whole system from scratch; and (c) their ability to detect whether an incoming image depicts a known category, or whether it is something new that needs to be learned. The requirement of adding new classes on the fly favours metric learning approaches (like $k$-nearest neighbours and nearest class mean classifiers) over SVMs [2]. Several metric learning methods have been proposed so far, presenting some or all of these features [24, 27, 2]. Still, all these methods estimate the used metric, and the threshold for novelty detection, on an initial closed set of classes, and keep the metric and threshold fixed as the problem evolves. This conflicts with the very definition of open world recognition, where the structure of the problem is progressively revealed as more data are observed, and the optimal parameters are likely to change over time.

In this paper we argue that to properly model the dynamics of the challenging open world recognition scenario, it is necessary to learn online the metric and the novelty threshold as new instances and new classes arrive, rather than estimating them from an initial, closed set of classes as done so far [24, 27, 2]. This objective is similar to those of online learning [31] and stream mining [12, 7]. Therefore we learn our classifiers online, incrementally updating the model whenever new data are available, while at the same time staying up-to-date for the predictions of both known (previously learned) classes and unknown classes (Figure 1). Experimentally, our incremental metric learning approaches demonstrate that continuously updating the metric as new data and new classes arrive leads to better performance in terms of both closed set and open set accuracy. Furthermore, we introduce a method to incrementally learn the threshold for novelty detection, which uses the current internal confidences of the classifier for the known classes. This continuous tuning of the rejection threshold yields better performance as new classes are added to the classifier, compared to the fixed threshold previously used in [2]. Our third contribution is a non-linear local metric learning approach which adapts to the local complexity of the space with respect to the classes. Experimentally we show that this is especially beneficial in the open world recognition setting, since it is more flexible in modeling the border between known and unknown classes.

Our findings are general, and applicable to a large class of algorithms. We demonstrate this by proposing online and incremental learning extensions of three non-parametric methods: (i) the Nearest Class Mean classifier (NCM) [24], previously used for incrementally adding novel classes in [27]; (ii) the Nearest Non-Outlier classifier (NNO) [2], which is an extension of NCM proposed for open world recognition; and (iii) the Nearest Ball Classifier (NBC) [7], a local learning method which incrementally adds balls (prototypes), and which has been used in the streaming context before. For all three algorithms, experiments show that the proposed extensions lead to a sizable advantage.

0:  Initialise the online performances and the sample stream
1:  for $t = 1, 2, \ldots$ do
2:     Receive sample $x_t$
3:     Predict $\hat{y}_t$ using the current models and metric $W$
4:     Receive true label $y_t$
5:     Output the online performances
6:     Update the model and the metric $W$ using $(x_t, y_t)$
7:  end for
Algorithm 1 Open World Online Learning Template
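
As a reference, the following is a minimal Python sketch of this template; the predict/update interface stands in for any of the classifiers introduced below, and all names are illustrative assumptions rather than the authors' implementation.

def open_world_online_learning(stream, model):
    # stream yields (x_t, y_t) pairs; model exposes predict/update (illustrative interface)
    n_correct = 0
    for t, (x_t, y_t) in enumerate(stream, start=1):
        y_pred = model.predict(x_t)        # uses the current means, metric, and thresholds
        n_correct += int(y_pred == y_t)
        print(f"t={t}  online accuracy={n_correct / t:.3f}")
        model.update(x_t, y_t)             # incremental update of the model and the metric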

2 Related Work

Our work is at the intersection of incremental and online learning, scalable learning, open set learning, and open world recognition. In the following we review previous work in these fields.

Incremental Learning.

There is a huge literature on incremental learning, such as various extensions of SVMs [25, 39, 26]. However, incremental SVMs suffer from several drawbacks, the most important of which is the extremely expensive update [19]. There are more efficient implementations [5, 32], but multi-class incremental learning does not permit the addition of new classes, nor do other incremental classifiers [37, 21]. Kuzborskij et al. [18] proposed a max-margin based approach for the incremental learning of novel classes that exploits prior knowledge from previous classes, but the method has a conservative behavior, tending to privilege, performance-wise, older classes with respect to the new one.

Scalable Learning.

The goal of scalable systems is to achieve a good trade-off between prediction efficiency at test time and classification accuracy. Among these methods, tree-based approaches [23, 22, 8] showed some success in addressing scalability at test time on large-scale visual recognition challenges [9, 3]. Recently, these challenges have become dominated by deep learning methods [17, 34, 33]. Again, the main drawback of these approaches is the need for a priori knowledge of the categories and for the availability of the whole training data during the learning phase.

Open Set Learning.

Open set recognition considers the incompleteness of the knowledge of the world when learning a classifier, and the possible lack of knowledge of new classes during testing [20, 29]. Scheirer et al. [29] formulated the problem of open set recognition in a static one-vs-all setting, balancing open space risk and empirical error. The setting was then extended [30, 14] by introducing the compact abating probability model. This work offers robust methods to handle unseen classes; however, as it relies on the SVM decision scores, it does not scale. Fragoso et al. [11] proposed a scalable version for modeling the matching scores, but they did not contextualize it in a general recognition problem. A scalable incremental method which we leverage is the NCM classifier [24]. Recently, NCM has been adapted for larger-scale vision problems [24, 35, 36, 27], with the most recent approaches combining NCM with metric learning [24] and with random forests [27]. In contrast to the linear NCM classifier, the nearest ball classifier (NBC) [7] is a non-linear local classifier. This incremental learning method adapts to the problem by adding new balls (prototypes). The NBC classifier has been used for classification in data streams [7] and for action recognition in videos [6]. To the best of our knowledge, the NBC has been applied neither with metric learning nor to the open set recognition setting of this paper.

Open World Recognition.

Bendale and Boult further extended the notion of open set recognition to include incremental and scalable learning, leading to a more comprehensive problem that they called “open world recognition” [2]. To address it, the NCM algorithm was coupled with a module limiting the open space risk for model combinations and transformed spaces, resulting in a new model, the Nearest Non-Outlier (NNO), described in Section 3.2.

3 Online Open World Recognition

In this section we introduce the online and incremental metric learning extensions to three recent non-parametric classifiers. These classifiers are then used within our open world online learning template, described in Algorithm 1, to predict the label of each incoming sample.

3.1 Closed Set Multi-Class Prediction

For the closed-set multi-class prediction we focus on Nearest Class Mean classifiers (NCM). They assign an instance $x$ to the class $c^* = \operatorname{argmin}_{c \in \mathcal{C}} d(x, \mu_c)$, where $\mathcal{C}$ is the set of possible classes and $\mu_c$ is the nearest mean vector [38]. Following [24, 2], we use a multi-class probabilistic interpretation of NCM, and define the probability for class $c$ as:

$p(c \mid x) = \frac{\exp\left(-\frac{1}{2}\, d_W(x, \mu_c)\right)}{\sum_{c' \in \mathcal{C}} \exp\left(-\frac{1}{2}\, d_W(x, \mu_{c'})\right)}$   (1)

this is a soft-max function over the instance-to-class (squared) low-rank Mahalanobis distances $d_W$, parameterized by $W$:

$d_W(x, \mu_c) = \lVert W (x - \mu_c) \rVert_2^2 = (x - \mu_c)^\top W^\top W\, (x - \mu_c)$   (2)

where $x$ and $\mu_c$ are $D$-dimensional vectors and $W \in \mathbb{R}^{d \times D}$, with $d \le D$ acting as a regularizer (also referred to in the literature as the intrinsic dimension of the space), which improves computational efficiency. Metric learning is used to find the best low-rank Mahalanobis distance, by optimizing the log-likelihood for correct classification over a training data-set:

$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \ln p(y_i \mid x_i)$   (3)

Once a metric has been learned on a large set of classes, the obtained distance function has been shown to generalize to classifying novel classes [24]. However, novel instances are only used to set the class mean vectors, and the metric is not updated for those novel classes. In contrast, below we describe a method which learns incrementally both the class means $\mu_c$ and the metric $W$.

Incremental learning.

In our scenario, the number of classes is unknown upfront and may change over time; therefore we learn the metric in an online fashion. Given an example $(x_t, y_t)$, we update the NCM classifier as follows:

$\mu_{y_t} \leftarrow \frac{(n_{y_t} - 1)\, \mu_{y_t} + x_t}{n_{y_t}}$   (4)
$W \leftarrow W + \lambda\, \nabla_W \ln p(y_t \mid x_t)$   (5)

where $n_{y_t}$ denotes the number of instances assigned to class $y_t$ (including the example of time step $t$) and $\lambda$ is a fixed learning rate. Note that the initial mean of a class always equals the first observation of that class: $\mu_{y_t} = x_t$. The gradient of $\ln p(y_t \mid x_t)$ w.r.t. the model $W$ is given by:

$\nabla_W \ln p(y_t \mid x_t) = W \sum_{c \in \mathcal{C}} \big( [\![ c = y_t ]\!] - p(c \mid x_t) \big)\, (x_t - \mu_c)(x_t - \mu_c)^\top$   (6)

where we use Iverson brackets $[\![ \cdot ]\!]$ to denote the indicator function. The matrix $W$ is initialized with the truncated identity matrix, so that it initially resembles the Euclidean distance. The metric update can be seen as a single step of the stochastic gradient descent used in the large-scale closed set setting [24].
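
To make the update concrete, here is a minimal NumPy sketch of the oNCM updates of Eqs. (4)-(6); the class name, the learning rate value, and the interface are illustrative assumptions, not our exact implementation.

import numpy as np

class OnlineNCM:
    def __init__(self, d_low, d_feat, lr=0.01):
        self.W = np.eye(d_low, d_feat)        # truncated identity: starts as Euclidean
        self.means, self.counts = {}, {}
        self.lr = lr

    def probs(self, x):
        # Eq. (1): soft-max over squared low-rank Mahalanobis distances
        classes = list(self.means)
        dists = np.array([np.sum((self.W @ (x - self.means[c])) ** 2) for c in classes])
        s = np.exp(-0.5 * dists)
        return classes, s / s.sum()

    def update(self, x, y):
        if y not in self.means:               # first observation sets the class mean
            self.means[y], self.counts[y] = x.astype(float), 1
            return
        self.counts[y] += 1
        self.means[y] += (x - self.means[y]) / self.counts[y]    # Eq. (4)
        classes, p = self.probs(x)
        grad = np.zeros_like(self.W)
        for c, pc in zip(classes, p):                            # Eq. (6)
            diff = (x - self.means[c])[None, :]
            grad += (float(c == y) - pc) * (self.W @ diff.T) @ diff
        self.W += self.lr * grad                                 # Eq. (5): one SGD step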

The NCM classifier is not designed to predict whether an instance comes from an unknown class or from the set of known classes. To accommodate novelty prediction, we next describe the Nearest Non-Outlier algorithm for the open world classification scenario.

Figure 2: Illustration of different learning settings. In closed-set recognition (left) the whole space is assigned to a specific class, while in open recognition (middle and right) classes have clear boundaries. Local learning (right) allows for more flexible class boundaries which are useful in the open world recognition setting.

3.2 Open World Classification

The Nearest Non-Outlier method is an extension of NCM to the open world scenario [2], where NCM is adjusted to define class boundaries, and instances beyond the class boundaries are assigned to the unknown class (Figure 2). Instead of using the multi-class probability defined in Eq. (1), in NNO the confidence score for class $c$ is given by:

$s_c(x) = Z_\tau \left( 1 - \frac{d_W(x, \mu_c)}{\tau} \right)$   (7)

where $\tau$ is a threshold value determining a ball around each class mean, and $Z_\tau$ is a normalization factor assuring that $s_c$ integrates to 1 on its domain (computed using the standard gamma function $\Gamma$). An example $x$ is rejected for class $c$ when $s_c(x) \le 0$, and assigned to the unknown class when it is rejected by all classes. In [2] the metric of NNO is learned offline on an initial set of known classes.

Incremental learning and rejection.

We extend NNO to allow for incremental learning of the metric and automatic tuning of the class-rejection threshold $\tau$. We formulate the prediction confidence similarly to the RBF kernel:

$s_c(x_t) = \exp\left( - \frac{d_W(x_t, \mu_c)}{2 \sigma^2} \right)$   (8)

This assigns a confidence value between $0$ and $1$ to the sample $x_t$ at time step $t$ for class $c$, using the current metric $W$. The advantage of this RBF formulation is that the function is strictly bounded. Using Eq. (8) also reduces the open space risk as defined in [2], since it obeys the abating property [30], given that the function value decreases in areas away from the observed training data. The bandwidth parameter $\sigma^2$ is learned incrementally, using the expected value of the distances to all class means (initialized with $\sigma^2 = 0$):

$\sigma^2 \leftarrow \sigma^2 + \frac{1}{t} \left( \frac{1}{|\mathcal{C}|} \sum_{c \in \mathcal{C}} d_W(x_t, \mu_c) - \sigma^2 \right)$   (9)

The threshold parameter $\tau$ is used to determine that an instance does not belong to one of the known classes. We assign an instance to the unknown class if the confidence of the nearest class $s_{c^*}(x_t) < \tau$. We also learn $\tau$ incrementally from the data, as the mean of the confidence values observed since the last added novel class:

$\tau \leftarrow \tau + \frac{1}{n} \left( s_{y_t}(x_t) - \tau \right)$   (10)

where $(x_t, y_t)$ is the current training sample and $n$ is the number of training samples since the last addition of a novel class. The value of $\tau$ can be seen as the expected value of the internal confidence associated with the observed training data.
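
A minimal sketch of these running-mean updates, under the reconstruction of Eqs. (8)-(10) given above (function and variable names are illustrative):

import numpy as np

def update_bandwidth(sigma2, t, dists_to_means):
    # Eq. (9): running mean of the average distance to all class means
    return sigma2 + (np.mean(dists_to_means) - sigma2) / t

def update_threshold(tau, n_since_new_class, conf_true_class):
    # Eq. (10): running mean of the true-class confidence since the last new class
    return tau + (conf_true_class - tau) / n_since_new_class

def confidence(dist_to_mean, sigma2):
    # Eq. (8): RBF-style confidence, bounded in (0, 1]
    return np.exp(-dist_to_mean / (2.0 * sigma2))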

For learning the means $\mu_c$ and the metric $W$, we resort to the incremental NCM updates defined in Eqs. (4)-(5). A known limitation of class-mean models is the limited flexibility of the representation, which results in linear classifiers. In the next section we introduce a local learning approach which allows for non-linear classification.

3.3 Local Learning in the Open World

To achieve non-linearity through local learning, we use a nearest ball classifier, see Figure 2 (right), where balls are added incrementally, and combine it with incremental metric learning. A ball $b$ is defined by its center $c_b$ and its radius $r_b$. It has a local class probability $p(y \mid b) = n_b^y / n_b$, where $n_b^y$ is the number of (training) samples within this ball assigned to class $y$ and $n_b$ is the total number of samples assigned to this ball. For predicting the class label of an example $x_t$, the ball classifier uses the local class probability $p(y \mid \hat{b})$ of the nearest ball $\hat{b} = \operatorname{argmin}_{b \in B} d_W(x_t, c_b)$, where $B$ is the current set of covering balls. To learn the set of balls we follow [7], which uses the Euclidean distance (i.e. Eq. (2) with the identity matrix for $W$). During training, the sequence of observed training examples is used to incrementally build a set of balls that covers the region of the feature space they span. At time step $t$, let $\hat{b}$ denote the nearest ball of training example $(x_t, y_t)$; then the updates are:

if $d_W(x_t, c_{\hat{b}}) > r_{\hat{b}}$:

The example falls beyond the nearest ball and is used to create a new ball $b'$, which is added to the current set of balls. This ball is initialized with:

$c_{b'} = x_t$   (11)
$r_{b'} = d_W(x_t, c_{\hat{b}})$   (12)

the radius is set to the distance to the nearest current ball in order to span the full space between $x_t$ and $c_{\hat{b}}$. The label $y_t$ is used to initialize the local class probability $p(y \mid b')$.

otherwise:

The example is considered to belong to the ball $\hat{b}$, and the local class probability is updated using $y_t$. The mean and radius are updated depending on the predicted class label $\hat{y}_t$:

$c_{\hat{b}} \leftarrow c_{\hat{b}} + \frac{1}{n_{\hat{b}}}\,(x_t - c_{\hat{b}}) \quad \text{if } \hat{y}_t = y_t$   (13)
$r_{\hat{b}} \leftarrow r_{\hat{b}}^{(0)}\, (1 + e_{\hat{b}})^{-1/d} \quad \text{if } \hat{y}_t \neq y_t$   (14)

where $d$ is the intrinsic dimension of the space (which we fix to the rank $d$ of the low-rank matrix $W$ in the experiments). The mean is updated using only correctly predicted samples $x_t$. The radius is updated using the initial radius $r_{\hat{b}}^{(0)}$ and a count $e_{\hat{b}}$ of the number of errors made within this ball so far.

While this training procedure incrementally adds novel balls, it is not designed to predict unknown classes, and it uses the standard Euclidean distance metric.
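
The following Python sketch summarizes the ball creation and update rules of Eqs. (11)-(14) as read above; the handling of the very first ball and the exact radius-shrinking schedule are simplifying assumptions:

import numpy as np

class Ball:
    def __init__(self, center, radius, label, dim):
        self.center = center.astype(float)
        self.r0, self.dim = radius, dim       # initial radius, intrinsic dimension d
        self.counts = {label: 1}              # local class counts, giving p(y | b)
        self.n_correct, self.errors = 1, 0

    @property
    def radius(self):                         # Eq. (14): shrinks with the error count
        return self.r0 * (1 + self.errors) ** (-1.0 / self.dim)

    def predict(self):                        # majority-vote label of the ball
        return max(self.counts, key=self.counts.get)

def nbc_update(balls, x, y, dist, dim, r_init=1.0):
    if not balls:                             # first ball: radius is a free parameter (assumption)
        balls.append(Ball(x, r_init, y, dim))
        return
    b = min(balls, key=lambda b: dist(x, b.center))      # nearest ball
    if dist(x, b.center) > b.radius:          # outside: create a new ball, Eqs. (11)-(12)
        balls.append(Ball(x, dist(x, b.center), y, dim))
    else:
        if b.predict() == y:                  # Eq. (13): mean moves on correct predictions
            b.n_correct += 1
            b.center += (x - b.center) / b.n_correct
        else:
            b.errors += 1                     # shrinks the radius via Eq. (14)
        b.counts[y] = b.counts.get(y, 0) + 1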

Novelty detection.

The ball classifier has two important local properties, the local class probability and the ball radius. The latter could be seen as an indicator of the local complexity in the feature space: if the feature space is locally smooth with respect to the class labels, the radius is likely to be large for this ball, while for a complex, non-smooth feature space the ball radius will be small. We combine these two properties for the estimation of the prediction confidence.

Given the nearest ball $\hat{b}$ for the example $x_t$, we estimate the prediction confidence as follows:

$s_{\hat{b}}(x_t) = p(\hat{y}_t \mid \hat{b}) \cdot \exp\left( - \frac{d_W(x_t, c_{\hat{b}})}{2\, (2 r_{\hat{b}})^2} \right)$   (15)

which combines the local class probability $p(\hat{y}_t \mid \hat{b})$ with an RBF kernel estimate, where the local bandwidth is set to twice the radius of the ball $\hat{b}$. Intuitively, it assigns the highest confidence to the examples closest to a ball with a pure distribution. As opposed to the global bandwidth in NNO, we use local bandwidths defined by the ball radii.

The threshold parameter $\tau_b$, used to assign instances to the unknown class, is learned incrementally similarly to Eq. (10), albeit only using the samples which are assigned to ball $b$ (i.e. $\hat{b} = b$), and using the confidence function of Eq. (15). Since the NBC uses more class centroids (compared to NCM/NNO), the estimate converges slowly to its true value. To mitigate this problem, we use the Hoeffding bound [13], since we consider the input samples i.i.d. and the confidence of Eq. (15) is bounded in $[0, 1]$; the bound is defined as:

$\epsilon_t = \sqrt{\frac{\ln(1/\delta)}{2\, n_b}}$   (16)

where $\delta$ is the desired confidence level, which we set inversely proportional to the time $t$ and to the number of current classes $|\mathcal{C}|$, i.e. $\delta = \frac{1}{t\, |\mathcal{C}|}$. This bound becomes tighter with increasingly more training examples, and less tight when the number of classes increases. For novelty prediction we assign an instance to the unknown class when $s_{\hat{b}}(x_t) < \tau_{\hat{b}} - \epsilon_t$.
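
A sketch of the resulting novelty test (the rejection rule combining the threshold and the Hoeffding bound reflects our reading of Eq. (16)):

import math

def hoeffding_bound(n_ball, t, n_classes):
    delta = 1.0 / (t * n_classes)             # confidence level, shrinking over time
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n_ball))

def is_unknown(confidence, tau_ball, n_ball, t, n_classes):
    eps = hoeffding_bound(n_ball, t, n_classes)
    return confidence < tau_ball - eps        # reject only with enough statistical evidence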

Metric learning.

For learning the metric $W$, we use a non-linear variant of the NCM classifier. We define the probability of class $c$ as:

$p(c \mid x) = \frac{\sum_{b \in B_c} \exp\left( -\frac{1}{2}\, d_W(x, c_b) \right)}{\sum_{b \in B} \exp\left( -\frac{1}{2}\, d_W(x, c_b) \right)}$   (17)

where $B_c$ denotes the set of balls which are assigned to class $c$; for this assignment we use a majority vote, i.e. $b \in B_c \iff c = \operatorname{argmax}_y p(y \mid b)$. At each time step we do a single SGD update of the metric $W$ w.r.t. the log-likelihood of this model, similarly to Eq. (5).

This formulation is similar to the non-linear NCM variant proposed in [24], albeit they used a fixed number of centroids per class and k-means to determine these centroids a priori. In contrast, our method learns the number of balls, the number of balls per class, and the centroid of each ball incrementally.
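
For illustration, Eq. (17) can be computed as follows, assuming each ball exposes its center and a majority-vote label as in the earlier sketch (names are hypothetical):

import numpy as np

def class_probs(x, balls, dist):
    scores = np.array([np.exp(-0.5 * dist(x, b.center)) for b in balls])
    labels = [b.predict() for b in balls]      # majority-vote class of each ball
    total = scores.sum()
    return {c: scores[np.array([l == c for l in labels])].sum() / total
            for c in set(labels)}              # Eq. (17), per known class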

4 Experiments

In this section we validate our online metric learning approaches in three different scenarios. We show that all three proposed extensions (online metric learning, incremental updating of the thresholds, and the local ball classifier) lead to better predictions on two different datasets. We will make the used features, evaluation protocols, and data available upon publication.

4.1 Datasets

ImageNet ILSVRC’10 [3].

The first dataset we use is the subset of ImageNet used for the ILSVRC’10 challenge. It contains about 1.2M images for training (with a varying number of images per class), 50K images for validation, and 150K images for testing. For this dataset we use the densely sampled SIFT features clustered into visual words provided by [3]. Though more advanced features are available [28, 17, 34], this combination of dataset and features allows for a fair comparison to the performance of the NCM-Forests [27] and the original NNO [2] methods.

Places-2 [40].

The second dataset we consider is the recent Places-2 dataset, which contains over 10M images of 400 different scene types. The dataset features 5,000 to 30,000 training images per class, consistent with real-world frequencies of occurrence. For this dataset, we use deep learning features obtained by training a GoogLeNet-style ConvNet [34] on all 15K ImageNet classes which have more than 200 images, using Caffe [15]. Subsequently we process the images of the Places-2 dataset and extract the last 1024-dimensional layer as image representation.

method \ # of classes                  50    100    200    500    1000
Baselines (results from [27])
Multi-class SVM [1]                    42    34     22     10     5
SVM-Forest [27]                        47    38     29     19     14
NCM [24]                               44    36     27     19     14
Incremental learning (results from [27])
NCM-Fix metric                         32    -      -      9      6
NCM-Forest                             41    -      -      16     11
SVM-Forest                             45    -      -      19     14
Online learning (this paper)
oNCM                                   42    37     32     24     19
oNBC                                   42    34     30     21     16
Table 1: Comparison of incremental learning on the ILSVRC’10 dataset, all using the same features (Top-1 accuracy, in %). The bottom two rows show our proposed incremental metric learning approaches; the other results are taken from [27]. The two incremental metric learning algorithms clearly outperform the other methods as the number of classes increases.

4.2 Scenario 1: Large-Scale Incremental Learning

In this experiment we follow a large-scale incremental learning scenario as used by [27]. The experimental setup is as follows:

  • Parameters and metric (if relevant) are learned on an initial set of 20 classes;

  • Classes are then incrementally added in batches;

  • Performance is evaluated on the test set after 50, 100, 200, 500, and 1000 classes.

We use the best performing incremental methods from [27] for comparison, specifically: NCM with initial metric, NCM-Forest, and SVM-Forest. We compare against three non-incremental baselines: multi-class SVMs [1], metric learning NCM [24], and SVM-Forest [27]. We use our online oNCM and oNBC (without novelty detection) in this comparison. Our methods are learned incrementally from the start, while shuffling the data within each batch before learning. For the whitening of the features (to avoid numerical instabilities), we use the mean and standard deviation calculated on the initial set of 20 classes. Performance is measured using the Top-1 accuracy, as commonly used on the ILSVRC dataset.

Results are shown in Table 1; we highlight two findings. First, we observe that among the metric learning approaches, the NCM variants are on par with the SVM approaches. Second, we notice that the performance of all algorithms decreases as the number of classes increases. This is to be expected, as the classification problem becomes harder as the number of classes grows. Still, the decrease is definitely more graceful when the metric is learned incrementally, as for oNCM and oNBC. We believe this is mainly due to the incremental learning of the metric, which leads to continuous adaptation to the new classes, rather than relying only on the initial, limited knowledge of the problem.

Figure 3: Comparison of results for open world recognition on the ILSVRC’10 dataset. The proposed incremental/online algorithms oNCM, oNNO, and oNBC clearly outperform their non-incremental counterparts.
Figure 4: Surface plots of the proposed open world online metric learning methods. The local learning oNBC method clearly outperforms the other methods when the number of unknown test classes increases.

4.3 Scenario 2: Open World Recognition

In this experiment we follow the open world protocol proposed in [2], where methods are tested on both known and unknown classes. The experimental setup is as follows:

  • Parameters and metric are learned on an initial set of 50 classes;

  • Images of 50 classes are added in each iteration;

  • Performance is evaluated on a test-set of known and unknown classes.

The open world performance is measured considering the unknown classes as a single new category. This allows us to calculate the standard multi-class top-1 accuracy, as in [2].

We compare our proposed methods against several baselines. First, we evaluate against a standard linear SVM [10] and the 1vSet SVM [29]. The latter is designed for open set recognition, and can thus classify images of unknown classes; note that this method is not able to incrementally learn new classes. We also compare against NCM [24], NNO [2], and NBC [7], which can all adjust to new classes in an incremental way. Of these three methods, only NNO is designed to assign images to an unknown class. NCM and NNO train their metric on the initial set, and NBC uses the Euclidean metric with the incremental ball set construction.

We use our online oNCM, oNNO, and oNBC in this comparison, all trained incrementally from the start. Both oNNO and oNBC are able to assign images to unknown classes, while oNCM does not have this property.

To assess performance in the open world recognition setting one has to consider two variables: the number of known categories during incremental learning, and the number of unknown categories during testing. We visualize our results in Figure 3. On the left, we show the top-1 accuracy as the number of known training classes grows, in the case of 0 unknown classes. On the right, we show how the top-1 accuracy changes as the number of unknown test classes increases, for a fixed number of known classes (set to 50).

Our main observation is that our online approaches clearly outperform all the others in both the closed set and open world settings. The inability to reject images from unknown classes yields the almost-random performance of the NCM method. Note that oNBC adapts to the classification problem and rejects images from unknown classes; indeed, prediction becomes easier when the numbers of unknown and known classes are unbalanced. In Figure 4, we show a surface plot over different ranges of known and unknown classes for our proposed online methods.

Figure 5: Results on the ImageNet and Places-2 datasets.

4.4 Scenario 3: Online Image Stream Prediction

In this experiment we aim to simulate an online image stream prediction setting, for which we introduce a novel evaluation protocol. We believe it is a more realistic protocol, as it permits to fully represent the dynamic behavior of the algorithm during the updating and testing phases simultaneously. The experimental setup follows Algorithm 1, where we consider a stream of incoming images. At time $t$ the learner:

  1. Predicts the label $\hat{y}_t$ for sample $x_t$ using the current models;

  2. Updates the online accuracy using $\hat{y}_t$ and the ground-truth label $y_t$;

  3. Updates the current models using the training tuple $(x_t, y_t)$.

For practical reasons we generate the stream from 1200 images of each of the 200 most frequent classes of ILSVRC’10 and Places-2, with 100 classes treated as known and 100 as unknown. In this way the final number of instances for the closed and open set classes is fully balanced. The data stream is generated as follows:

  1. The stream is divided into 40 stream-segments;

  2. The first 20 segments introduce 5 known and 5 unknown classes each;

  3. The learner is given 60 images per active class per segment;

  4. Any introduced class dries up after 20 segments;

  5. The number of images per segment varies, with a peak half-way;

  6. The online accuracy is recorded after each of the 40 stream-segments.

We believe this setting is interesting because the evaluated known and unknown classes evolve over time, both by increasing and by reducing the number of active classes.

For evaluating the performance on the stream, we use the online accuracy [12] of the harmonic mean (also known as the F-score) between the closed set accuracy $a_C$ and the open set accuracy $a_O$:

$H(a_C, a_O) = \frac{2\, a_C\, a_O}{a_C + a_O}$

computed over the predictions observed so far, for $a_C, a_O \in [0, 1]$. We coin this measure the online harmonic top-1 accuracy. It equally weights the performance on the closed set and on the open set. Moreover, a method which performs well on one of the two accuracies and poorly on the other obtains a low harmonic mean, which is a desirable property.
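
A minimal sketch of this measure (the running counts are assumed to be maintained by the evaluation loop):

def harmonic_top1(closed_correct, closed_total, open_correct, open_total):
    a_c = closed_correct / max(closed_total, 1)   # closed set accuracy so far
    a_o = open_correct / max(open_total, 1)       # open set accuracy so far
    return 2 * a_c * a_o / (a_c + a_o) if (a_c + a_o) > 0 else 0.0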

For this experiment we use the NNO and NBC methods on the ILSVRC’10 and Places-2 datasets. The results are presented in Figure 5, in which we compare oNNO and oNBC to variants using just an initially learned metric; the latter are learned in an online learning phase of 5 stream-segments (and are indicated by NNO/NBC in the figure). In the top-row figures, we show the online harmonic accuracy: once again, the incremental metric learning methods oNNO and oNBC have a clear benefit over their fixed-metric counterparts. This becomes clearer as more images and classes are added in later stream-segments. Moreover, the local learning oNBC classifier can adjust more precisely to the added classes and therefore outperforms the linear oNNO classifier. Notice that once no new classes are added, both methods start to gain performance, as they keep learning the already explored categories. Finally, the significant difference in performance between the ILSVRC’10 and Places-2 datasets, despite using the same number of classes and images, is likely due to the more powerful features used for the Places-2 dataset.

In the bottom-row figures of Figure 5, we show the mean of the confidence values assigned to the closed set (CC) and the open set (OC), together with the mean of the thresholds (Thr) estimated by our methods within each stream-segment. In order to achieve good performance on both the open and closed sets, the threshold for rejecting an image into the unknown class should lie between the closed set and the open set confidence. From the results, it can be observed that the open set and closed set confidences are almost identical for the oNNO classifier; therefore finding a good threshold value is almost impossible. For the oNBC method, where the confidence function and the estimated threshold depend on more local information, the open set and closed set confidences are well set apart. We remark that using a fixed threshold tuned on an initial set (as the literature methods do) cannot lead to good performance, as the confidences change over time.

5 Conclusions

In this paper we addressed the open world recognition problem and proposed three extensions to its current formulation: online metric learning, incremental updating of the thresholds for novelty detection, and local learning through nearest ball classification. We applied these extensions to three existing algorithms, NCM, NNO, and NBC, and assessed their effects in three different experimental scenarios: large-scale incremental learning, open world recognition, and online image stream prediction. This last setting is a new protocol for evaluating online open world recognition, which we believe better mimics out-of-the-lab applications. In all three scenarios, our proposed methods performed substantially better than the baselines, showcasing the importance of fully embracing online learning for open world recognition.

Future work will focus on studying the suitability of active learning in this scenario [16], where an interaction module has to balance the number of true-label requests against the performance at any query rate. Another setting we will investigate is the bandit one [4], where the learner can access the labels only when it makes correct predictions.

References

  • [1] Z. Akata, F. Perronnin, Z. Harchaoui, and C. Schmid. Good practice in large-scale learning for image classification. IEEE Trans. PAMI, 2013.
  • [2] A. Bendale and T. Boult. Towards open world recognition. In CVPR, 2015.
  • [3] A. Berg, J. Deng, and L. Fei-Fei. The ImageNet large scale visual recognition challenge 2010-2015. http://www.image-net.org/challenges/LSVRC/2015, 2010.
  • [4] S. Bubeck and N. Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. arXiv preprint arXiv:1204.5721, 2012.
  • [5] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551–585, 2006.
  • [6] R. De Rosa, N. Cesa-Bianchi, I. Gori, and F. Cuzzolin. Online action recognition via nonparametric incremental learning. In BMVC, 2014.
  • [7] R. De Rosa, F. Orabona, and N. Cesa-Bianchi. The ABACOC algorithm: a novel approach for nonparametric classification of data streams. In ICDM, 2015.
  • [8] J. Deng, S. Satheesh, A. C. Berg, and F. Li. Fast and balanced: Efficient label tree learning for large scale object recognition. In NIPS, 2011.
  • [9] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. IJCV, 88(2):303–338, 2010.
  • [10] R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.
  • [11] V. Fragoso, P. Sen, S. Rodriguez, and M. Turk. EVSAC: accelerating hypotheses generation by modeling matching scores with extreme value theory. In ICCV, 2013.
  • [12] J. Gama, R. Sebastiao, and P. Rodrigues. On evaluating stream learning algorithms. Machine Learning, 2013.
  • [13] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
  • [14] L. P. Jain, W. J. Scheirer, and T. E. Boult. Multi-class open set recognition using probability of inclusion. In ECCV, 2014.
  • [15] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
  • [16] A. J. Joshi, F. Porikli, and N. Papanikolopoulos. Multi-class active learning for image classification. In CVPR, 2009.
  • [17] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
  • [18] I. Kuzborskij, F. Orabona, and B. Caputo. From N to N+1: Multiclass transfer incremental learning. In CVPR, 2013.
  • [19] P. Laskov, C. Gehl, S. Krüger, and K. Müller. Incremental support vector learning: Analysis, implementation and applications. Journal of Machine Learning Research, 7:1909–1936, 2006.
  • [20] F. Li and H. Wechsler. Open set face recognition using transduction. IEEE Trans. PAMI, 27(11):1686–1697, 2005.
  • [21] L. Li and L. Fei-Fei. OPTIMOL: automatic online picture collection via incremental model learning. IJCV, 88(2):147–168, 2010.
  • [22] B. Liu, F. Sadeghi, M. Tappen, O. Shamir, and C. Liu. Probabilistic label trees for efficient large scale image classification. In CVPR, 2013.
  • [23] M. Marszałek and C. Schmid. Constructing category hierarchies for visual recognition. In ECCV, 2008.
  • [24] T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka. Distance-based image classification: Generalizing to new classes at near-zero cost. IEEE Trans. PAMI, 35(11):2624–2637, 2013.
  • [25] G. Cauwenberghs and T. Poggio. Incremental and decremental support vector machine learning. In NIPS, 2001.
  • [26] A. Pronobis and B. Caputo. The more you learn, the less you store: memory-controlled incremental SVM. Technical report, IDIAP, 2006.
  • [27] M. Ristin, M. Guillaumin, J. Gall, and L. Van Gool. Incremental learning of NCM forests for large-scale image classification. IEEE Trans. PAMI, 2016.
  • [28] J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek. Image classification with the Fisher vector: Theory and practice. IJCV, 2013.
  • [29] W. J. Scheirer, A. de Rezende Rocha, A. Sapkota, and T. E. Boult. Toward open set recognition. IEEE Trans. PAMI, 35(7):1757–1772, 2013.
  • [30] W. J. Scheirer, L. P. Jain, and T. E. Boult. Probability models for open set recognition. IEEE Trans. PAMI, 36(11):2317–2324, 2014.
  • [31] S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2), 2011.
  • [32] S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter. Pegasos: Primal estimated sub-gradient solver for SVM. Mathematical Programming, 127(1):3–30, 2011.
  • [33] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [34] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.
  • [35] C. Veenman and M. Reinders. The nearest subclass classifier: A compromise between the nearest mean and nearest neighbor classifier. IEEE Trans. PAMI, 27(9):1417–1429, 2005.
  • [36] C. Veenman and D. Tax. A weighted nearest mean classifier for sparse subspaces. In CVPR, 2005.
  • [37] Z. Wang, K. Crammer, and S. Vucetic. Multi-class Pegasos on a budget. In ICML, 2010.
  • [38] A. R. Webb. Statistical Pattern Recognition. Wiley, New York, NY, USA, 2002.
  • [39] T. Yeh and T. Darrell. Dynamic visual category learning. In CVPR, 2008.
  • [40] B. Zhou, A. Khosla, A. Lapedriza, A. Torralba, and A. Oliva. Places2: A large-scale database for scene understanding. arXiv preprint, 2015.