Deep Active Learning via Open Set Recognition

07/04/2020 ∙ by Jaya Krishna Mandivarapu, et al. ∙ Georgia State University

In many applications, data is easy to acquire but expensive and time-consuming to label; prominent examples include medical imaging and NLP. This disparity has only grown in recent years as our ability to collect data improves. Under these constraints, it makes sense to select only the most informative instances from the unlabeled pool and request an oracle (e.g., a human expert) to provide labels for those samples. The goal of active learning is to infer the informativeness of unlabeled samples so as to minimize the number of requests to the oracle. Here, we formulate active learning as an open-set recognition problem. In this latter paradigm, only some of the inputs belong to known classes; the classifier must identify the rest as unknown. More specifically, we leverage variational neural networks (VNNs), which produce high-confidence (i.e., low-entropy) predictions only for inputs that closely resemble the training data. We use the inverse of this confidence measure to select the samples that the oracle should label. Intuitively, unlabeled samples that the VNN is uncertain about are more informative for future training. We carried out an extensive evaluation of our novel, probabilistic formulation of active learning, achieving state-of-the-art results on CIFAR-10 and CIFAR-100. In addition, unlike current active learning methods, our algorithm can learn tasks with a non-i.i.d. distribution, without the need for task labels. As our experiments show, when the unlabeled pool consists of a mixture of samples from multiple tasks, our approach can automatically distinguish between samples from seen vs. unseen tasks.




1 Introduction

Supervised deep learning has achieved remarkable results across a variety of domains by leveraging large, labeled datasets LeCun et al. (2015). However, our ability to collect data far outstrips our ability to label it, and this difference only continues to grow. This problem is especially stark in applications, such as medical imaging, where the ground truth must be provided by a highly trained specialist. Even in cases where labeled data is sufficient, there may be reasons to limit the amount of data used to train a model, e.g., time, financial constraints, or to minimize the model’s carbon footprint.

Fortunately, the relationship between a model’s performance and the amount of training data is not linear—there often exists a small subset of highly informative samples that can provide most of the information needed to learn to solve a task. In this case, we can achieve nearly the same performance by labeling (and training on) only those informative samples, rather than the entire dataset. The challenge, of course, is that the true usefulness of a sample can only be established a posteriori, after we have used it to train our model.

The growing field of active learning (AL) is concerned with automatically predicting which samples from an unlabeled dataset are most worth labeling. (As noted in Sinha et al. (2019), active learning can also refer to approaches that generate or synthesize novel samples. In this paper, however, we are only concerned with sampling-based active learning.) In the standard AL framework, a selector identifies an initial set of promising samples; these are then labeled by an oracle (e.g., a human expert) and used to train a task network Gal et al. (2017). The selector then progressively requests labels for additional batches of samples, either up to a percentage threshold (e.g., 40% of the total data) or until a performance target is met. In short, an active learning system seeks to construct the smallest possible training set that will produce the highest possible performance on the underlying task(s).

In this paper, we formulate active learning as an open-set recognition (OSR) problem, a generalization of the standard classification paradigm. In OSR only some of the inputs are from one of the known classes; the classifier must label the remaining inputs as out-of-distribution (OOD) or unknown. Intuitively, our hypothesis is that the samples most worth labeling are those that are most different from the currently labeled pool. Training on these samples will allow the network to learn features that are underrepresented in the existing training data. In short, our AL selection mechanism consists of picking unlabeled samples that are OOD relative to the labeled pool.

Figure 1 illustrates our proposed approach. In more detail, our classifier is a variational neural network (VNN) Mundt et al. (2019b), which produces high-confidence (i.e., low-entropy) outputs only for inputs that are highly similar to the training set. We use the inverse of this confidence measure to select which unlabeled samples to query next. In other words, our selector requests labels for the samples that the classifier is least confident about because this implies that the existing training set does not contain items that are similar to them. As we detail in Sec. 4, our OSR-based approach achieved state-of-the-art results in a number of datasets and AL variations, far surpassing existing methods.

The rest of this paper is organized as follows. In Sec. 2, we provide a brief overview of current active learning and open-set recognition methods. In Sec. 3, we present our proposed approach, then detail our experiments in Sec. 4. Finally, we discuss avenues for future work in Sec. 5.

Figure 1: Framework overview. Our proposed system has two models, which we call M1 and M2 from now on: (a) an encoder followed by a linear classifier (M1), and (b) an encoder with an optional decoder architecture (M2). The selected model is trained on the initial labeled pool of K samples. Once this initial training stage finishes, our selector is given the unlabeled pool and the trained model. The sampling module returns the top-k most informative samples, which are sent to the oracle for annotation according to the available budget. After annotation, those top-k samples are removed from the unlabeled pool and added to the labeled pool.

2 Related Work

Recent approaches to the problem of active learning can be categorized as query-acquiring or query-synthesizing Sinha et al. (2019). The distinction lies in whether the unlabeled OOD samples are immediately accessible (pool-based), or are instead synthesized using a generative model Mahapatra et al. (2018); Mayer and Timofte (2018); Zhu and Bento (2017). Assuming access to a pool of unlabeled OOD data, a strategy must be devised that selects only the most useful or informative samples from that distribution. It has been routinely demonstrated that training samples do not contain equal amounts of useful information Settles (2010). In other words, some training distributions result in better task performance than others. Thus, the aim of an active learning system is to minimize the amount of training data required to achieve the highest possible performance on an underlying task, e.g., image classification. This is a form of learning efficiency, which we wish to maximize. As in Gal et al. (2017), we therefore wish to learn the acquisition function that chooses the data points for which a label should be requested. The learned acquisition function is, in essence, an intelligent sampling function, the aim of which is to outperform random i.i.d. sampling of the unlabeled distribution, thereby maximizing the learning efficiency of the system. Various sampling strategies have been proposed, which can typically be grouped into three broad categories Sinha et al. (2019): uncertainty-based techniques, representation-based models Sener and Savarese (2017), and hybrid approaches Nguyen and Smeulders (2004).

Uncertainty calibration

Open Set Recognition (OSR), on the other hand, refers to the ability of a system to distinguish between data it has already seen (the training distribution) and data to which it has not yet been exposed (OOD data). Though OSR has been scrutinized for decades, recent progress has come about via careful design of the heuristics used to quantify the similarity between historical and current data distributions Mundt et al. (2019a, b); Higgins et al. (2017). Since such measures have been shown to substantially improve OSR performance, and since OSR is an inherent necessity for active learning, it stands to reason that AL systems would benefit from the integration of such techniques. This is intuitive, since it would be unhelpful for a system to request labels for data points that are nearly identical to those it has already seen. Redundancy, or excess similarity, in the training distribution can therefore be said to decrease the learning efficiency of the system. One of the most promising approaches to OSR incorporates ideas from Extreme Value Theory (EVT) in order to quantify the epistemic uncertainty of the model Mundt et al. (2019a).

Though the two fields are seemingly complementary, very little work has been done to merge active learning and OSR. In this work, we explicitly merge them by adopting the heuristics used in Mundt et al. (2019a) to quantify a model’s predictive uncertainty w.r.t. newly acquired unlabeled data. This allows us to infer the degree to which the data is likely to improve the performance of a classifier if it were integrated into the labeled training distribution. In the process, we demonstrate how EVT-inspired heuristics can assist in improving the learning efficiency of deep learning systems.

3 Methodology

As noted above, our active learning approach iteratively selects samples from an unlabeled pool based on the confidence level of its OSR classifier. Below, we first formalize the active learning paradigm we are tackling, then detail our proposed system. In particular, we provide an overview of VNNs and explain how we use their outputs to select new data points to label.

3.1 Formal problem definition

Let us now describe our active learning protocol and introduce some notation. Each active learning problem is denoted as $P = (C, D_{train}, D_{test})$. It is dedicated to the classification of the classes in a set $C$ and comes with two sets of examples: the first, $D_{train}$, is used to infer a prediction model, and the second, $D_{test}$, is used to evaluate the inferred model, where $D_{train} \cap D_{test} = \emptyset$.

Let $D = \{(x_i, y_i)\}_{i=1}^{N}$ be a dataset consisting of $N$ i.i.d. data points, where each sample $x_i \in \mathbb{R}^d$ is a $d$-dimensional feature vector and $y_i \in C$ represents the target label. The samples in $D$ are partitioned into two disjoint subsets: a labeled set $D_L$ and an unlabeled set $D_U$. We denote the state of a subset at a given iteration $t$ of our algorithm as $D_L^t$ ($D_U^t$, resp.), for $t \in \{0, 1, \dots\}$. For simplicity, we assume that $D_L^0 = \emptyset$.

At $t = 0$, we randomly select $K$ data samples from $D_U$ and request the oracle to provide the labels for these points. We then remove the selected data samples from $D_U$ and add them to $D_L$ along with their labels. Finally, we train the classifier's parameters on the labeled pool $D_L$. In each subsequent iteration, we use our OSR criterion (see Sec. 3.2) to select $b$ additional data samples from $D_U$. We query the labels of these new samples, add them to $D_L$, and train our classifier on all the labeled data. We continue this process until the size of $D_L$ reaches a predefined limit (40% of the training set in our experiments).

Importantly, other formulations of AL assume access to task boundaries and an i.i.d. distribution. This has clear limitations when the i.i.d. assumption is not satisfied or when the task boundaries are not available. In our experimental setup, we assume that the unlabeled pool can contain training data from multiple tasks. In addition, we assume no task IDs. Our OSR selection criterion allows our system to learn multiple tasks without specifying the current task.

3.2 Active learning system

Our AL system (Fig. 1) has two main components: a variational neural network Mundt et al. (2019b), which serves as our classifier, and an entropy-based selection mechanism. We discuss each component below.

3.2.1 Variational Neural Networks (VNNs)

Variational neural networks (VNNs) Mundt et al. (2019b) are a supervised variant of $\beta$-variational autoencoders ($\beta$-VAE) Higgins et al. (2017). The latter is itself a variant of VAEs Doersch (2016) but with a regularized cost function. That is, the cost function for a $\beta$-VAE consists of two terms: the reconstruction error, as with a regular VAE, and an entanglement penalty on the latent vector. This penalty forces the dimensions of the latent space to be as uncorrelated as possible, making them easier to interpret.

A VNN combines the encoder-decoder architecture of a $\beta$-VAE with a probabilistic linear classifier (see Fig. 1 for a visual representation). As such, its loss function includes a classification error, i.e., a supervised signal, in addition to the reconstruction and entanglement terms:

$$\mathcal{L}(\theta, \phi, \xi) = \mathbb{E}_{q_\theta(z \mid x)}\left[\log p_\phi(x \mid z) + \log p_\xi(y \mid z)\right] - \beta\, \mathrm{KL}\left(q_\theta(z \mid x) \,\|\, p(z)\right) \quad (1)$$

As detailed in Mundt et al. (2019b), $\theta$, $\phi$, and $\xi$ are the parameters of the encoder, decoder, and classifier, resp., while $p_\phi(x \mid z)$ and $p_\xi(y \mid z)$ are the reconstruction and classification terms. The last term is the entanglement penalty, which is given by the Kullback-Leibler divergence between the latent vector distribution and an isotropic Gaussian distribution.

In this work, we evaluated both the full framework discussed above (dubbed M2 in our experiments), which uses the loss function in Eq. 1, and a simplified version (M1) without the reconstruction error:

$$\mathcal{L}(\theta, \xi) = \mathbb{E}_{q_\theta(z \mid x)}\left[\log p_\xi(y \mid z)\right] - \beta\, \mathrm{KL}\left(q_\theta(z \mid x) \,\|\, p(z)\right) \quad (2)$$

Following the variational formulation of Mundt et al. (2019b), the models M1 and M2 have natural means to capture epistemic uncertainty. As our experiments show, both versions outperform the state of the art, but M2 achieves better results overall.
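To make the difference between the two objectives concrete, the sketch below computes a single-sample estimate of the quantity minimized during training for one input. It assumes a diagonal-Gaussian encoder output (`mu`, `log_var`), raw classifier `logits`, and, for M2 only, a decoder log-likelihood supplied as `recon_log_lik`. All function and parameter names are illustrative assumptions, not the authors' code, and `beta` simply weights the KL term as in a β-VAE.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), summed over latent dims."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
    return [l - log_norm for l in logits]


def vnn_loss(mu, log_var, logits, label, beta=1.0, recon_log_lik=None):
    """Negated single-sample objective (lower is better).

    recon_log_lik is log p(x|z) from a decoder; pass None for the
    decoder-free variant (M1), or a float for the full variant (M2).
    """
    class_log_lik = log_softmax(logits)[label]
    total = -class_log_lik + beta * kl_to_standard_normal(mu, log_var)
    if recon_log_lik is not None:
        total -= recon_log_lik
    return total
```

With a latent code that already matches the prior (`mu = 0`, `log_var = 0`), the KL term vanishes and the M1 loss reduces to the ordinary cross-entropy of the classifier.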

3.2.2 Sample Selection

Motivated by the class-disentangling ability of the loss in Eq. 1, we aim to select the most informative samples from the unlabeled pool. However, instead of using information about extreme distance values in the penultimate-layer activations to modify a softmax prediction’s confidence, we propose to apply EVT on the basis of the class-conditional posterior. In this sense, an unlabeled sample is regarded as containing useful information if its distance to the classes’ latent means is extreme with respect to what has been observed for the majority of correctly predicted data instances. That is, the sample falls into a region of low density under the aggregate posterior and is more likely to carry information that is unknown to the neural network with its current parameters.

We therefore define two sampling algorithms:

1. Uncertainty sampling: This is the conventional sampling method, in which the samples picked from the unlabeled pool are those about which the underlying model is most uncertain. Model uncertainty can be measured in several ways. One approach, shown in Algorithm 2, is to collect the model's predictions for every input in the unlabeled pool. We compute the utility of each unlabeled instance and collect the $b$ most informative samples, where the utility is the model uncertainty, which captures most of the epistemic uncertainty. The selected samples are sent to the oracle to obtain their labels. Note that the closer the predicted probability is to zero, the more uncertain the model is about that sample.

2. Weibull distribution sampling: Our second sampling technique, shown in Algorithm 3, is based on the heavy-tailed Weibull distribution. Consider any stage of the active learning life cycle, at which $D_L$ is the data used to train the model. We first obtain the mean latent vector of each class $c$ over all correctly predicted seen data instances $m = 1, \dots, M_c$. These per-class latent means of the correctly classified training samples in $D_L$ form the basis of a statistical meta-recognition model:

$$\bar{z}_c = \frac{1}{M_c} \sum_{m=1}^{M_c} \mathbb{E}_{q_\theta(z \mid x^{(m)})}[z] \quad (3)$$

The respective set of latent distances of the correctly classified points to their class mean is

$$\Delta_c = \left\{ f_d\left(\bar{z}_c, \mathbb{E}_{q_\theta(z \mid x^{(m)})}[z]\right) \right\}_{m=1}^{M_c} \quad (4)$$

where $f_d$ signifies the choice of distance metric. We then fit a per-class heavy-tailed Weibull distribution $\rho_c = (\tau_c, \kappa_c, \lambda_c)$ on $\Delta_c$ for a given tail size $\eta$. Because the distances are based on the individual class-conditional approximate posteriors, this bounds the regions of high density in the latent space. The tightness of the bound is characterized by $\eta$, which can be seen as a prior belief about the quantity of outliers inherently present in the data. The choice of $f_d$ determines the dimensionality of the obtained distance distributions. In our experiments, we find that the cosine distance, and thus a univariate Weibull distance distribution per class, is sufficient.

Using the cumulative distribution function of this Weibull model, we can estimate the outlier probability of any given data sample:

$$\omega_\rho(z) = \min_c \left( 1 - \exp\left( -\left( \frac{\left| f_d(\bar{z}_c, z) - \tau_c \right|}{\lambda_c} \right)^{\kappa_c} \right) \right) \quad (5)$$

If the outlier probability is larger than our threshold probability $\Omega_t$, the instance is considered an outlier, as it is very far from all known classes. Note that the closer the probability is to one, the more likely it is that the model has never seen a similar sample before. We set the threshold in the range 0.5 to 0.8, which helps to eliminate complete outliers while picking the samples that are most useful for the model.
3.3 Noisy Oracle and Non-i.i.d Setup:

When applying active learning to real-world applications, human experts traditionally function as oracles that provide labels for the requested samples. When a user or model requests labels for the selected data samples from the oracle, the quality of the labels depends on the expertise of the oracle. However, humans make mistakes, and these mistakes lead to noisy labels.

We consider both types of oracle: an ideal oracle, which provides labels for the requested samples with no error, and a noisy oracle, which provides labels for the requested images with some percentage of error or noise. This error may occur because the oracle lacks expertise, or because the oracle confuses similar classes of images, as some classes are ambiguous. To reproduce this situation, we applied label noise between related classes.
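One way such a noisy oracle could be simulated is sketched below: with probability `noise_rate`, the true label is swapped for a different class from the same group of related classes. The function name, the `superclass_of` mapping, and the toy class names are hypothetical; the source does not specify how the noise was implemented beyond applying it to related classes.

```python
import random


def noisy_oracle(true_label, superclass_of, noise_rate, rng):
    """Return the true label, or (with probability noise_rate) a sibling class.

    superclass_of maps every class name to its group of related classes.
    A class without siblings is always labeled correctly.
    """
    if rng.random() >= noise_rate:
        return true_label
    siblings = [
        cls for cls, group in superclass_of.items()
        if group == superclass_of[true_label] and cls != true_label
    ]
    return rng.choice(siblings) if siblings else true_label


# Toy mapping for illustration only.
groups = {"maple": "trees", "oak": "trees", "pine": "trees", "bee": "insects"}
rng = random.Random(0)
labels = [noisy_oracle("oak", groups, 0.3, rng) for _ in range(10)]
```

Setting `noise_rate` to 0 recovers the ideal oracle, so the same code path can serve both experimental conditions.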

Algorithm 1: Active Learning
Input: dataset partitioned into an unlabeled pool and a labeled pool
Require: active learning model, optimizer, sampling strategy
Require: initialized budget b, model parameters, steps, epochs
for idx, curr_stage in enumerate(Steps) do
      for e = 1 to epochs do
            Sample a mini-batch from the labeled pool
            if the selected task network is M1 then
                  calculate the loss L using Equation 2
            else
                  calculate the loss L using Equation 1
            end if
            Update the model by descending the gradients of L
      end for
      if the selected sampling technique is S1 then
            New_Labeled_Pool = Sampling Strategy(model, unlabeled pool, budget b)   (Algorithm 2)
      else
            New_Labeled_Pool = Sampling Strategy(model, labeled pool, unlabeled pool, budget b)   (Algorithm 3)
      end if
end for
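The outer loop of Algorithm 1 can be sketched in a framework-agnostic way as follows. This is a minimal sketch under stated assumptions: `train`, `select`, and `oracle` are caller-supplied callables standing in for model training, the sampling strategy, and the annotator, and the function name and signature are illustrative rather than taken from the authors' code.

```python
import random


def run_active_learning(pool, oracle, train, select, initial_k, budget, limit, seed=0):
    """Pool-based active learning loop.

    pool:   iterable of hashable sample ids (initially all unlabeled)
    oracle: sample id -> label
    train:  dict {sample id: label} -> model
    select: (model, labeled, unlabeled, n) -> list of n sample ids to query
    Stops once `limit` samples are labeled or nothing is left to query.
    """
    rng = random.Random(seed)
    unlabeled = list(pool)
    labeled = {}
    history = []
    queried = rng.sample(unlabeled, initial_k)  # stage 0: random acquisition
    while True:
        for sample in queried:
            labeled[sample] = oracle(sample)
        chosen = set(queried)
        unlabeled = [s for s in unlabeled if s not in chosen]
        model = train(labeled)  # retrain on everything labeled so far
        history.append(len(labeled))
        if len(labeled) >= limit or not unlabeled:
            return model, labeled, history
        queried = select(model, labeled, unlabeled, min(budget, limit - len(labeled)))
        if not queried:  # selector found nothing worth labeling
            return model, labeled, history
```

Passing the labeled pool to `select` keeps the same loop usable for both sampling strategies, since the Weibull-based selector needs the labeled data while uncertainty sampling can ignore it.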
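Algorithm 2: Uncertainty Sampling
Input: unlabeled pool, trained task model (TN), budget b
Output: updated labeled pool and unlabeled pool
Select the b samples in the unlabeled pool about which the model is most uncertain
Request labels for the selected samples from the oracle
Add the labeled data samples to the labeled pool
return New_Labeled_Pool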
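A minimal sketch of the selection step in Algorithm 2 follows. It assumes that the utility is the entropy of the classifier's predictive distribution; the extracted text does not preserve the exact criterion, so a least-confidence score (lowest maximum probability) would be an equally plausible reading. `predict_proba` is a hypothetical callable returning class probabilities for one sample.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def uncertainty_sampling(predict_proba, unlabeled, budget):
    """Return the `budget` unlabeled samples with the highest predictive entropy."""
    ranked = sorted(unlabeled, key=lambda s: entropy(predict_proba(s)), reverse=True)
    return ranked[:budget]


# Toy predictions for four unlabeled samples and three classes.
toy_probs = {
    "a": [0.98, 0.01, 0.01],
    "b": [0.40, 0.30, 0.30],
    "c": [0.70, 0.20, 0.10],
    "d": [0.34, 0.33, 0.33],
}
picked = uncertainty_sampling(toy_probs.get, list(toy_probs), budget=2)
```

In the toy example, the near-uniform predictions for `d` and `b` are ranked ahead of the confident prediction for `a`.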
Algorithm 3: Weibull Sampling
Input: labeled pool, unlabeled pool, trained task model, budget b
Output: updated labeled pool and unlabeled pool
Step 1: Calculate the classifier probabilities and samples from the approximate posterior for each data sample in the labeled pool
Step 2: For each class, collect the latent vectors of each correctly classified training example
Step 3: for each class do
      Get the per-class latent mean
      Fit the per-class Weibull model on the distances to the latent mean
end for
For a novel data sample from the unlabeled pool, sample a latent vector from the approximate posterior
Compute its distances to the per-class latent means
for each class do
      Evaluate the Weibull CDF at the distance to that class's latent mean
end for
Select the samples whose outlier probability exceeds the threshold
Add the labeled data samples to the labeled pool
return New_Labeled_Pool
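The steps of Algorithm 3 can be sketched in plain Python as below. Several simplifications are assumptions of this sketch and not the authors' method: the Weibull fit is a two-parameter least-squares fit on the Weibull probability plot of the tail distances (standing in for a dedicated EVT fitting routine, and without a location parameter), latent vectors are passed in directly rather than sampled from an encoder, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return max(0.0, 1.0 - dot / norm)  # clamp tiny negative rounding errors


def fit_weibull(distances, tail_size):
    """Least-squares (shape, scale) fit on the largest `tail_size` positive distances.

    Uses ln(-ln(1 - F)) = shape * ln(d) - shape * ln(scale).
    Needs at least two distinct positive distances in the tail.
    """
    tail = sorted(d for d in distances if d > 0.0)[-tail_size:]
    n = len(tail)
    xs = [math.log(d) for d in tail]
    ys = [math.log(-math.log(1.0 - (i + 0.5) / n)) for i in range(n)]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    shape = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    scale = math.exp(mean_x - mean_y / shape)
    return shape, scale


def weibull_cdf(d, shape, scale):
    return 1.0 - math.exp(-((d / scale) ** shape))


def build_class_models(latents, labels, predictions, tail_size):
    """Per-class latent mean and Weibull fit, using correctly classified samples only."""
    by_class = defaultdict(list)
    for z, y, y_hat in zip(latents, labels, predictions):
        if y == y_hat:
            by_class[y].append(z)
    models = {}
    for cls, zs in by_class.items():
        mean = [sum(col) / len(zs) for col in zip(*zs)]
        dists = [cosine_distance(mean, z) for z in zs]
        models[cls] = (mean, fit_weibull(dists, tail_size))
    return models


def outlier_probability(z, models):
    """Smallest Weibull CDF value over classes: high only if z is far from every class."""
    return min(
        weibull_cdf(cosine_distance(mean, z), shape, scale)
        for mean, (shape, scale) in models.values()
    )


def weibull_sampling(candidates, models, budget, threshold):
    """Indices of up to `budget` candidates with outlier probability >= threshold."""
    scored = [(outlier_probability(z, models), i) for i, z in enumerate(candidates)]
    picked = sorted((s for s in scored if s[0] >= threshold), reverse=True)
    return [i for _, i in picked[:budget]]
```

Taking the minimum over classes means a sample is only flagged when it is extreme with respect to every known class, which mirrors the idea that informative samples lie in regions of low density under the aggregate posterior.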

4 Experimental Results

In all our experiments, we start with a random acquisition of K samples from the unlabeled pool, where K is the initial budget and is set to 10% of the training dataset. The acquired samples are sent to the oracle for labeling, and the annotated data is added to the labeled pool, which serves as training data for our model. The unlabeled pool consists of the remaining samples, for which labels are unknown. After the model has been trained, we sample an additional b samples from the unlabeled pool based on the utility estimates that the sampling function provides for each sample, where b is the budget and is set to 5% of the training dataset at each stage. Once the oracle has labeled the requested samples, they are added to the labeled pool, and the model is trained on the new labeled pool. The annotation and training process continues until 40% of the training set has been annotated. We consider both the case where the unlabeled pool consists of samples from the same distribution as the training dataset and the case where it also contains samples from an unknown dataset. We presume that the oracle is perfect unless stated otherwise.

We evaluated our models on the standard image classification benchmarks CIFAR10 and CIFAR100, each with 60k images of size 32 by 32. We measured the performance of both of our models as the average accuracy over 5 runs. At each stage, we trained our model with 10%, 15%, 20%, 25%, 30%, 35%, and 40% of the training set annotated, as labels provided by the oracle became available.
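The labeling schedule implied by these percentages can be worked out directly. The short sketch below assumes the standard 50,000-image CIFAR training split (consistent with 10% corresponding to 5,000 samples); the function name is illustrative.

```python
def labeling_schedule(train_size, initial_pct, budget_pct, limit_pct):
    """Labeled-pool size after each stage, using integer percentages."""
    size = train_size * initial_pct // 100
    step = train_size * budget_pct // 100
    limit = train_size * limit_pct // 100
    if step <= 0:
        raise ValueError("budget must add at least one sample per stage")
    sizes = [size]
    while size + step <= limit:
        size += step
        sizes.append(size)
    return sizes


schedule = labeling_schedule(50_000, initial_pct=10, budget_pct=5, limit_pct=40)
print(schedule)  # [5000, 7500, 10000, 12500, 15000, 17500, 20000]
```

The schedule has one initial acquisition followed by six budgeted stages, which is why the results are reported after 6 stages of 2,500 samples each.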

4.1 Implementation Details and Baselines

Baselines: We compare our method against various competitive methods, including Variational Adversarial Active Learning (VAAL) Sinha et al. (2019), Core-Set Sener and Savarese (2017), Monte-Carlo Dropout Gal and Ghahramani (2016), and Ensembles using Variation Ratios (Ensembles w. VarR) Freeman (1965); Beluch et al. (2018). We also report the performance of deep Bayesian AL (DBAL) Gal et al. (2017), following their proposed approach and sampling with their max-entropy scheme to measure uncertainty. Finally, we show the results achieved with uniform random sampling, in which samples are picked at random from the unlabeled pool. Random sampling still serves as a competitive baseline in the field of active learning.

Implementation Details: We used VGG16 Simonyan and Zisserman (2014) as the encoder for both our probabilistic model and our joint model. The optional decoder architecture is based on 14-layer wide residual networks Higgins et al. (2017); Zagoruyko and Komodakis (2016). In the variational cases, the latent dimensionality is 60. The encoder is followed by a classifier that consists of a single linear layer. We optimize all models using a mini-batch size of 128 with either SGD or ADAM Kingma and Ba (2014), a learning rate of 0.001, and weight decay. For the EVT-based outlier rejection, we fit Weibull models with a tail size set to 5% of the training examples per class, using the cosine distance as the distance metric. Training continues for 200 epochs on all datasets. The initial labeled pool size in all experiments is 10% of the training set, which is equivalent to 5000 samples for both CIFAR10 and CIFAR100 Krizhevsky et al. (2009). The budget size is set to 5% of the training set, which is equivalent to 2500 samples for both CIFAR10 and CIFAR100. For clarity of nomenclature, we define our model variants as follows:

- Model M1, as shown in Eq. 2, with SGD as the optimizer.
- Model M1, as shown in Eq. 2, with ADAM as the optimizer.
- Model M2, as shown in Eq. 1, with SGD as the optimizer.
- Model M2, as shown in Eq. 1, with ADAM as the optimizer.

4.2 Performance on image classification benchmarks

Fig. 2(a) (left) shows the performance of our models compared to existing methods. On CIFAR10, our four model variants (M1 and M2, each trained with SGD and with ADAM) achieve mean accuracies of [84.4%, 89.24%, 89.97%, 91.4%] using 40% of the data, annotated over 6 stages with a budget of 2500 per stage after the initial budget. By comparison, the accuracy obtained using the entire dataset is 92.63%, denoted as Top-1 accuracy in Fig. 2(a) (left). As shown there, the methods that perform closest to our models are VAAL, with an accuracy of 80.71%, Core-set, with 80.37%, and Ensembles w. VarR, with 79.465%. The mean accuracy of both of our models consistently outperforms all other methods, including random sampling, DBAL, and MC-Dropout. To test the effect of the choice of optimizer, we ran all of our models with both SGD and ADAM, and found that Model 1 with ADAM significantly outperforms Model 1 with SGD.

To evaluate the scalability of our approach, we also evaluate it on CIFAR100, which has a larger number of classes. The maximum achievable mean accuracy on CIFAR100 is 63.14% using 100% of the data, denoted as Top-1 accuracy in Fig. 2(b) (right). Our four model variants achieve mean accuracies of [54.47%, 60.68%, 61.25%, 61.93%] over 5 runs using 40% of the data, annotated over 6 stages with a budget of 2500 per stage. As shown in Fig. 2(b) (right), the competing methods that come closest to our models are VAAL, with an accuracy of 54.47%, followed by Core-set and Ensembles w. VarR, for which an accuracy of 46.78% is reported. Moreover, as Fig. 2(b) (right) shows, some of the proposed variants match the top performance of VAAL using only 20% of the annotated training data, and others using 30%. The proposed models consistently outperform the existing baselines.

Figure 2: Performance on classification tasks using a) CIFAR10, b) CIFAR100 compared to VAAL Sinha et al. (2019), Core-set Sener and Savarese (2017), Ensembles w. VarR Beluch et al. (2018), MC-Dropout Gal and Ghahramani (2016), DBAL Gal et al. (2017), and Random Sampling. M1 indicates our model (2) and M2 indicates our model (1). All the Legend names are in descending order of final accuracies. Best visible in color. Data and code required to reproduce are provided in supplementary material.

4.3 Effect of biased initial pool

We investigated the performance of our models when the initial labeled pool is biased. A good AL system is expected to discover data samples of unknown classes at an early stage. Intuitively, initial bias can affect model training because the initially labeled samples are not representative of the underlying data distribution: they do not adequately cover most regions of the latent space. We create this bias by intentionally removing all data samples of a number of randomly chosen classes, so that the initial labeled pool (the labeled pool at stage 1) contains no samples belonging to those classes. We ran this experiment on CIFAR100 with 10 and 20 classes removed, to see how this affects the performance of the model, and compare it to the case where samples are randomly selected from all classes. As shown in Fig. 6, our method is superior to VAAL, Core-set, and random sampling in selecting informative samples from the classes that were underrepresented in the initial labeled set. Our four model variants achieve accuracies of [53.35%, 60.54%, 61.36%, 61.55%] with 20 classes removed and [54.72%, 60.79%, 61.53%, 61.57%] with 10 classes removed. The closest methods to our approach, VAAL and Core-set, achieve accuracies of [46.91%, 46.55%] with 20 classes removed and [47.10%, 47.63%] with 10 classes removed, while random sampling achieves 45.33% with 10 classes removed and 45.87% with 20 classes removed.

4.4 Effect of budget size on performance

We repeated the experiments described in Sec. 4.1 to test the effect of different budget sizes on our models, compared to the most competitive baselines on CIFAR100. We tested budget sizes of b = 5% and b = 10%. As shown in Fig. 3, our models outperform VAAL, Core-set, Ensembles, and random sampling for both budget sizes. VAAL comes second, followed by Core-set and Ensembles. Our four model variants achieve accuracies of [61.52%, 61.57%, 61.07%, 61.82%] for b = 10 and [54.32%, 60.68%, 61.29%, 61.9%] for b = 20, while the closest competing approaches reach around 46% accuracy.

Figure 3: Robustness of our approach on the CIFAR100 classification task to (a) budget size (left) and (b) a biased initial labeled pool (right), compared to VAAL Sinha et al. (2019), Core-set Sener and Savarese (2017), Ensembles w. VarR Beluch et al. (2018), MC-Dropout Gal and Ghahramani (2016), DBAL Gal et al. (2017), and Random Sampling. M1 indicates our model (2) and M2 indicates our model (1). Best visible in color. Data and code required to reproduce are provided in our supplementary material.

4.5 Noisy Oracle and Noisy UnLabeled Pool

In this analysis, we investigate the performance of our models in the presence of noisy labels caused by an inaccurate oracle rather than an ideal one. Similar to the VAAL Sinha et al. (2019) setup, we assume that erroneous labels are due to ambiguity between some classes and are not adversarial attacks. In CIFAR100, each image has a fine class label and a coarse label (the super-class to which it belongs). As shown in Fig. 4(a), our models consistently outperform existing methods such as VAAL and Core-set, whose sampling is independent of the task learner. In the case of VAAL, a separate VAE and discriminator are used as part of the sampling strategy.

We also consider an extreme case of active learning in which the i.i.d. assumption is relaxed. We intentionally added 20% more data, equivalent to 10,000 images from other datasets, to our existing unlabeled pool. Our network must therefore not only distinguish informative from non-informative samples, but also distinguish samples from the current distribution from out-of-distribution samples. Whenever a model selects a sample from another dataset and sends it to the oracle, the human expert discards it. This reduces the effective budget, and the discarded samples are placed back in the unlabeled pool, which means that at any given stage the total number of images from other datasets is 10,000. As can be seen toward the end of the curves in Fig. 4(b), the increase in accuracy is small, as the composition of the unlabeled pool has a greater impact on the sampling methodology. We specifically used our sampling strategy 2 to handle this scenario.
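The effect of out-of-distribution selections on the budget can be sketched as follows. This is an illustration under assumptions: `is_in_distribution` stands in for the human expert's judgment, and the function and sample names are hypothetical.

```python
def query_mixed_pool(selected, is_in_distribution, oracle):
    """Send selected samples to the oracle when the pool is mixed.

    Returns (newly_labeled, discarded). Every selected sample consumes
    budget, but only in-distribution samples come back with a label;
    discarded samples are meant to be returned to the unlabeled pool.
    """
    newly_labeled, discarded = {}, []
    for sample in selected:
        if is_in_distribution(sample):
            newly_labeled[sample] = oracle(sample)
        else:
            discarded.append(sample)
    return newly_labeled, discarded


# Toy query: two in-distribution samples and one from another dataset.
labeled, returned = query_mixed_pool(
    ["img_1", "other_7", "img_2"],
    is_in_distribution=lambda s: s.startswith("img_"),
    oracle=lambda s: 0,
)
effective_budget = len(labeled)  # 2 of the 3 queries yielded a label
```

Because discarded samples go back into the pool, a selector that cannot tell the two sources apart keeps paying for the same mistakes, which is why the outlier threshold matters in this setting.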

Figure 4: Robustness of our approach on classification tasks to (a) a noisy oracle on CIFAR100 (left) and (b) a mixed unlabeled pool on CIFAR10 (right). M1 indicates our model (2) and M2 indicates our model (1). All the legend names are in descending order of final accuracies (left). Best visible in color. Data and code required to reproduce are provided in our supplementary code.

4.6 Sampling Time Analysis

The sampling method in an active learning system plays a major role in time-efficient training. We compared our sampling time against the other baseline frameworks. The analysis was done on the CIFAR10 dataset using a single NVIDIA 1080 Ti, and the overall time required by each method is shown in the table. Our methods have sampling times close to those of VAAL and DBAL. Ours is slightly higher than VAAL's because we need to pass the latent vector through the linear classifier, whereas in the case of VAAL the discriminator directly outputs the probability itself. However, our training time is better than VAAL's: VAAL contains a VGG16 classifier similar to ours, a VAE, and a discriminator, and the VAE and discriminator are trained in a min-max game. Optimizing all of them together takes far more time than optimizing a single model, as in our case. MC-Dropout measures uncertainty using multiple forward passes with 10 dropout masks, which leads to its increased sampling time.

Sampling Time Analysis
Method | Sampling Time
VAAL Sinha et al. (2019) | 10.69
Our Sampling Method 1 | 10.89
DBAL Gal et al. (2017) | 11.05
Our Sampling Method 2 | 20.41
Ensembles w. VarR Beluch et al. (2018) | 20.48
Core-set Sener and Savarese (2017) | 75.33
MC-Dropout Gal and Ghahramani (2016) | 83.65
Figure 5: Performance on classification tasks using CIFAR10 and CIFAR100 compared to VAAL Sinha et al. (2019), Core-set Sener and Savarese (2017), Ensembles w. VarR Beluch et al. (2018), MC-Dropout Gal and Ghahramani (2016), DBAL Gal et al. (2017), and Random Sampling. M1 indicates our model with encoder and classifier, and M2 indicates our model with encoder-decoder and classifier. Best viewed in color. Data and code required to reproduce the results are provided in our code repository.
Figure 6: Performance on classification tasks using CIFAR10 and CIFAR100 compared to VAAL Sinha et al. (2019), Core-set Sener and Savarese (2017), Ensembles w. VarR Beluch et al. (2018), MC-Dropout Gal and Ghahramani (2016), DBAL Gal et al. (2017), and Random Sampling. M1 indicates our model with encoder and classifier, and M2 indicates our model with encoder-decoder and classifier. Best viewed in color. Data and code required to reproduce the results are provided in our code repository.

5 Conclusions and Final work

We present a novel deep learning approach for active learning using open set recognition techniques. Extensive experiments conducted on standard datasets verify the effectiveness of the proposed approach.




  • [1] W. H. Beluch, T. Genewein, A. Nurnberger, and J. M. Kohler (2018) The power of ensembles for active learning in image classification. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9368–9377. Cited by: Figure 2, Figure 3, Figure 5, Figure 6, §4.1, §4.6.
  • [2] C. Doersch (2016-06) Tutorial on Variational Autoencoders. arXiv e-prints, pp. arXiv:1606.05908. External Links: 1606.05908 Cited by: §3.2.1.
  • [3] L. C. Freeman (1965) Elementary applied statistics: for students in behavioral science. John Wiley & Sons. Cited by: §4.1.
  • [4] Y. Gal and Z. Ghahramani (2016) Dropout as a bayesian approximation: representing model uncertainty in deep learning. In International Conference on Machine Learning, pp. 1050–1059. Cited by: Figure 2, Figure 3, Figure 5, Figure 6, §4.1, §4.6, References.
  • [5] Y. Gal, R. Islam, and Z. Ghahramani (2017) Deep bayesian active learning with image data. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1183–1192. Cited by: §1, §2, Figure 2, Figure 3, Figure 5, Figure 6, §4.1, §4.6.
  • [6] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner (2017) Beta-vae: learning basic visual concepts with a constrained variational framework. ICLR 2 (5), pp. 6. Cited by: §2, §3.2.1, §4.1.
  • [7] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.1.
  • [8] A. Krizhevsky, G. Hinton, et al. (2009) Learning multiple layers of features from tiny images. Cited by: §4.1.
  • [9] Y. LeCun, Y. Bengio, and G. Hinton (2015-05-01) Deep learning. Nature 521 (7553), pp. 436–444. External Links: ISSN 1476-4687, Document, Link Cited by: §1.
  • [10] D. Mahapatra, B. Bozorgtabar, J. Thiran, and M. Reyes (2018) Efficient active learning for image classification and segmentation using a sample selection and conditional generative adversarial network. CoRR abs/1806.05473. External Links: Link, 1806.05473 Cited by: §2.
  • [11] C. Mayer and R. Timofte (2018) Adversarial sampling for active learning. CoRR abs/1808.06671. External Links: Link, 1808.06671 Cited by: §2.
  • [12] M. Mundt, S. Majumder, I. Pliushch, and V. Ramesh (2019) Unified probabilistic deep continual learning through generative replay and open set recognition. CoRR abs/1905.12019. External Links: Link, 1905.12019 Cited by: §2, §2.
  • [13] M. Mundt, I. Pliushch, S. Majumder, and V. Ramesh (2019) Open set recognition through deep neural network uncertainty: does out-of-distribution detection require generative classifiers? 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 753–757. Cited by: §1, §2, §3.2.1, §3.2.1, §3.2.1, §3.2.
  • [14] H. T. Nguyen and A. W. M. Smeulders (2004) Active learning using pre-clustering. In ICML, External Links: Link Cited by: §2.
  • [15] O. Sener and S. Savarese (2017) Active learning for convolutional neural networks: a core-set approach. arXiv preprint arXiv:1708.00489. Cited by: §2, Figure 2, Figure 3, Figure 5, Figure 6, §4.1, §4.6.
  • [16] B. Settles (2010-07) Active learning literature survey. Cited by: §2.
  • [17] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §4.1.
  • [18] S. Sinha, S. Ebrahimi, and T. Darrell (2019) Variational adversarial active learning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5972–5981. Cited by: §2, Figure 2, Figure 3, Figure 5, Figure 6, §4.1, §4.5, §4.6, References, footnote 1.
  • [19] S. Zagoruyko and N. Komodakis (2016) Wide residual networks. arXiv preprint arXiv:1605.07146. Cited by: §4.1.
  • [20] J. Zhu and J. Bento (2017) Generative adversarial active learning. CoRR abs/1702.07956. External Links: Link, 1702.07956 Cited by: §2.