1 Introduction
Deep neural networks (DNNs) have proven successful on a variety of machine learning tasks. However, they are largely fueled by large amounts of data, and can be trained with objective functions that require labeled or unlabeled data. In practice, training DNNs therefore raises several data-related challenges.
Firstly, most real-world datasets have a natural class imbalance. For instance, in the medical imaging domain, cancerous images are rare in comparison to non-cancerous images. Secondly, obtaining labeled data is notoriously time-consuming and expensive. This issue is highly pronounced in the biomedical domain, where the annotators are experts such as doctors and radiologists who must be well compensated. Thirdly, in many scenarios, even procuring unlabeled data is challenging. For example, only a few samples of medical data may exist for a new disease, and acquiring them involves several privacy constraints. Hence, it is critical to use the unlabeled data even when only a few labeled data points are available.
The above data-related issues are well known, and the community has devised several techniques to tackle them separately. For mitigating labeling costs, active learning (AL) [2, 13, 11, 12, 23, 24] is an established paradigm that samples uncertain or diverse data points from an unlabeled set; the goal is to acquire the subset that entails the largest improvement in model performance. Another technique, called semi-supervised learning (SSL) [16, 20, 15, 3, 26, 28], leverages the unlabeled data when only a small amount of labeled data is available. Lastly, several subset selection techniques that tackle class imbalance [12, 14, 13] have also been proposed. Evidently, these techniques revolve around the idea of obtaining the best possible model at a minimum cost. However, each of these techniques individually suffers from limitations that the others do not. For example, existing AL and SSL techniques are known to suffer from class imbalance, thereby learning biased models. Also, existing AL and subset selection methods do not leverage the remaining unlabeled data, and simply discard it.
To bridge these gaps in existing methods, we propose Basil, a unified framework that actively samples data points per class to create a balanced labeled set, followed by SSL to make the most of the remaining unlabeled data.
1.1 Related work
Active Learning (AL). Uncertainty-based methods aim to select the most uncertain data points for labeling. The most common technique is Entropy [24], which selects the data points with maximum predictive entropy. The main drawback of uncertainty-based methods is that they lack diversity within the acquired subset. To mitigate this, a number of approaches incorporate diversity. A recent approach called Badge [2] uses the last linear layer gradients to represent data points and runs k-means++ [1] to obtain centers that have a high gradient magnitude. Since the centers are representative and have high gradient magnitude, this ensures uncertainty and diversity at the same time. However, for batch active learning, this diversity and uncertainty are limited to within a batch rather than across all batches. Another method, BatchBALD [11], requires a large number of Monte Carlo dropout samples to obtain significant mutual information, which limits its application to medical domains where data is scarce. Recently, [12] proposed the use of submodular information measures for active learning in realistic scenarios, while [13] used them to find rare objects in an autonomous-driving object detection problem. However, these focus on acquiring data points only from the rare classes or slices. Our proposed method maximizes per-class mutual information, thereby selecting data points for each class to obtain a balanced labeled set, which is critical for training unbiased models.
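To make the entropy baseline concrete, here is a minimal sketch of entropy-based acquisition; the function and variable names, and the toy predictions, are ours, not from the paper:

```python
import math

def entropy_acquire(probs, budget):
    """Select the `budget` unlabeled points whose predictive
    distributions have the highest Shannon entropy."""
    def entropy(p):
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    scores = [(entropy(p), i) for i, p in enumerate(probs)]
    scores.sort(reverse=True)  # most uncertain first
    return [i for _, i in scores[:budget]]

# Toy softmax predictions for 4 unlabeled points (rows sum to 1).
probs = [
    [0.98, 0.01, 0.01],  # confident -> low entropy
    [0.34, 0.33, 0.33],  # nearly uniform -> highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.45, 0.05],
]
picked = entropy_acquire(probs, budget=2)
```

As the paper notes, such a score ranks points in isolation, so the acquired batch can be highly redundant.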
Semi-supervised Learning (SSL). The goal of SSL methods is to leverage unlabeled data alongside the labeled data to obtain a better representation of the dataset than supervised learning alone [21]. The most basic SSL method, pseudo-labeling [16], uses model predictions as target labels for the unlabeled data and applies a standard supervised loss function to them as a regularizer. Some SSL methods, such as the Π-Model [15, 22] and Mean Teacher [26], use consistency regularization based on data augmentation and dropout techniques. Mean Teacher obtains a more stable target output by using an exponential moving average of parameters across previous epochs. Virtual Adversarial Training (VAT) [20] uses an effective regularization technique that applies small perturbations chosen to affect the predictions on the unlabeled samples the most. ICT [28] encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. More recent techniques like FixMatch [25], MixMatch [3], and UDA [29] use data augmentations like flips, rotations, and crops to predict pseudo-labels. All the above methods depend on the model trained using a small labeled set. Hence, they are susceptible to using a biased model if the labeled set is randomly sampled from an unlabeled set with class imbalance. In this paper, we study the effect of selecting a balanced seed set using Basil for a wide array of SSL techniques.
1.2 Our contributions
We summarize our contributions as follows: 1) We emphasize the need to jointly address multiple real-world data-related problems such as class imbalance, expensive labeling costs, and leveraging the unlabeled data; in particular, we show that these problems co-exist in the medical domain. 2) We propose Basil, a novel algorithm that tackles these problems in an end-to-end manner. Concretely, we acquire a balanced subset of the unlabeled data by maximizing per-class instantiations of submodular mutual information functions in an active learning loop, followed by semi-supervised learning (see Fig. 1). 3) Basil can leverage any SSL method and yields improved performance over the vanilla SSL approach. 4) We evaluate the effectiveness of Basil on two diverse modalities of medical data, namely histopathology (PathMNIST [9]) and abdominal CT (OrganMNIST [10]). 5) We conduct rigorous experiments with 6 AL strategies and 8 SSL techniques, and show that balanced labeled set selection using Basil outperforms existing AL methods and obtains larger gains in performance for various SSL techniques (see Tab. 1 and Tab. 2).
2 Preliminaries
Submodular Functions: We let $\mathcal{U}$ denote the ground set of data points and $f: 2^{\mathcal{U}} \rightarrow \mathbb{R}$ a set function. The function $f$ is submodular [4] if it satisfies the diminishing marginal returns property, namely $f(\mathcal{X} \cup \{j\}) - f(\mathcal{X}) \geq f(\mathcal{Y} \cup \{j\}) - f(\mathcal{Y})$ for all $\mathcal{X} \subseteq \mathcal{Y} \subseteq \mathcal{U}$ and $j \notin \mathcal{Y}$. Facility location, graph cut, and log determinant are some examples [8].
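As a concrete illustration, the following sketch evaluates the facility location function $f(\mathcal{A}) = \sum_{i} \max_{j \in \mathcal{A}} s_{ij}$ on a toy similarity kernel and checks diminishing returns numerically; the helper names and the toy kernel are ours:

```python
def facility_location(A, S):
    """f(A) = sum_i max_{j in A} S[i][j]; f(empty set) = 0."""
    if not A:
        return 0.0
    return sum(max(S[i][j] for j in A) for i in range(len(S)))

def gain(f, X, j, S):
    """Marginal gain of adding element j to set X."""
    return f(X | {j}, S) - f(X, S)

# Symmetric toy similarity kernel over a ground set of 4 points.
S = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
X, Y, j = {0}, {0, 2}, 3  # X is a subset of Y, and j is outside Y
gain_X = gain(facility_location, X, j, S)
gain_Y = gain(facility_location, Y, j, S)
# Diminishing returns: adding j to the smaller set helps at least as much.
```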
Submodular Mutual Information (SMI): Given sets of items $\mathcal{A}, \mathcal{Q} \subseteq \mathcal{U}$, the submodular mutual information (SMI) [5, 7] is defined as $I_f(\mathcal{A}; \mathcal{Q}) = f(\mathcal{A}) + f(\mathcal{Q}) - f(\mathcal{A} \cup \mathcal{Q})$. Intuitively, this measures the similarity between $\mathcal{Q}$ and $\mathcal{A}$, and we refer to $\mathcal{Q}$ as the query set.
Kothawade et al. [14] extend SMI to handle the case when the query set can come from an auxiliary set $\mathcal{V}$ apart from the ground set $\mathcal{U}$. In the context of imbalanced medical image classification, $\mathcal{U}$ is the source set of images and the query set $\mathcal{Q}$ is the target set containing the rare-class images. To find an optimal subset given a query set $\mathcal{Q}$, we can define $g_{\mathcal{Q}}(\mathcal{A}) = I_f(\mathcal{A}; \mathcal{Q})$, $\mathcal{A} \subseteq \mathcal{U}$, and maximize it.
2.1 Examples of SMI functions
For balanced subset selection via Basil, we use the recently introduced SMI functions from [7, 5] and their extensions introduced in [14] as acquisition functions. Note that we use only the subset of functions presented in [14] that are the most scalable for per-class selection of data points. For any two data points $i$ and $j$, let $s_{ij}$ denote the similarity between them.
Graph Cut MI (GCMI): The SMI instantiation of graph cut (GCMI) is defined as: $I_f(\mathcal{A}; \mathcal{Q}) = 2\lambda \sum_{i \in \mathcal{A}} \sum_{j \in \mathcal{Q}} s_{ij}$. Since maximizing GCMI maximizes the joint pairwise sum with the query set, it leads to a summary similar to the query set $\mathcal{Q}$. In fact, specific instantiations of GCMI have been used for query-focused summarization of videos [27] and documents [18, 17].
Facility Location MI (FLMI): We consider two variants of FLMI. In the first variant, which is defined over the ground set $\mathcal{U}$ (FLVMI), the SMI instantiation is: $I_f(\mathcal{A}; \mathcal{Q}) = \sum_{i \in \mathcal{U}} \min(\max_{j \in \mathcal{A}} s_{ij}, \max_{j \in \mathcal{Q}} s_{ij})$. The first term in the min(·) models diversity, and the second term models query relevance.
For the second variant, which is defined over $\mathcal{U} \cup \mathcal{Q}$ (FLQMI), the SMI instantiation is: $I_f(\mathcal{A}; \mathcal{Q}) = \sum_{i \in \mathcal{Q}} \max_{j \in \mathcal{A}} s_{ij} + \sum_{i \in \mathcal{A}} \max_{j \in \mathcal{Q}} s_{ij}$. FLQMI is very intuitive for query relevance as well: it measures the representation of the data points that are the most relevant to the query set and vice versa, and can be thought of as a bidirectional representation score.
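The three SMI instantiations above can be computed directly from a similarity kernel. A minimal sketch (the naming and the toy kernel are ours; GCMI is shown with λ = 1):

```python
def gcmi(A, Q, S, lam=1.0):
    """Graph Cut MI: 2 * lam * sum of pairwise similarities between A and Q."""
    return 2.0 * lam * sum(S[i][j] for i in A for j in Q)

def flvmi(A, Q, S):
    """FLVMI: sum over the ground set of min(coverage by A, coverage by Q)."""
    return sum(min(max(S[i][j] for j in A), max(S[i][j] for j in Q))
               for i in range(len(S)))

def flqmi(A, Q, S):
    """FLQMI: bidirectional representation score between A and Q."""
    return (sum(max(S[i][j] for j in A) for i in Q) +
            sum(max(S[i][j] for j in Q) for i in A))

# Toy kernel: points 0, 1 are similar; points 2, 3 are similar.
S = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
A, Q = {0}, {2, 3}
```

With this dissimilar A and Q, all three scores are low; swapping A for {2} would raise them, which is what greedy maximization exploits.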
3 Basil: Our Active Semi-supervised Learning Framework
In this section, we present Basil, a unified framework that jointly tackles the data problems of class imbalance, high labeling costs, and leveraging unlabeled data. We do so by gradually acquiring balanced subsets in an active learning loop, followed by training the final model using the balanced labeled set and the remaining unlabeled set (see Fig. 1).
The main idea in Basil is to maximize per-class instantiations of submodular mutual information (SMI) functions to obtain a balanced labeled set. Concretely, for each class $c$, we formulate an SMI function using a query set $\mathcal{Q}_c$ that is the subset of the current labeled set $\mathcal{L}$ containing data points from class $c$. The SMI functions are instantiated using a similarity kernel $S$, where $S_{ij}$ is the pairwise similarity between data points $i$ and $j$, represented by gradients computed using the model $\mathcal{M}$. Specifically, we define $S_{ij} = \langle \nabla_{\theta} L(x_i), \nabla_{\theta} L(x_j) \rangle$, where $L(x_i)$ is the labeled loss on the $i$-th data point. Note that we use hypothesized labels, i.e., the label with maximum class probability, for computing the gradients of the unlabeled data points. Next, we optimize the SMI function for each class $c$ using a greedy strategy [19], with the constraint that the budget $B$ is divided equally across all $C$ classes:

$\mathcal{A}_c = \operatorname{argmax}_{\mathcal{A} \subseteq \mathcal{U},\ |\mathcal{A}| \leq B/C}\ I_f(\mathcal{A}; \mathcal{Q}_c)$   (1)
Hence, Basil gradually builds a balanced labeled set by accumulating smaller balanced sets in every AL round. Finally, we can use any SSL technique to train the model with the balanced labeled set $\mathcal{L}$ and the remaining unlabeled set $\mathcal{U} \setminus \mathcal{L}$. We summarize our method in Algo. 1, illustrate the architecture in Fig. 1, and discuss its scalability in Appendix 0.C.
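A minimal sketch of one Basil selection round, using naive greedy maximization for clarity (the paper uses the lazier stochastic greedy of [19]) and a graph-cut-style SMI objective; all names and the toy kernel are ours:

```python
def greedy_smi(candidates, query, S, budget, smi):
    """Naive greedy maximization of an SMI function for one class."""
    selected, pool = set(), set(candidates)
    for _ in range(budget):
        best = max(pool, key=lambda x: smi(selected | {x}, query, S)
                                       - smi(selected, query, S))
        selected.add(best)
        pool.remove(best)
    return selected

def basil_round(unlabeled, queries_per_class, S, budget):
    """Split the budget equally across classes and select per class."""
    per_class = budget // len(queries_per_class)
    gc = lambda A, Q, K: sum(K[i][j] for i in A for j in Q)  # graph-cut-style
    picked = set()
    for q in queries_per_class:
        picked |= greedy_smi(unlabeled - picked, q, S, per_class, smi=gc)
    return picked

# Toy kernel: labeled queries are points 0 and 1; unlabeled points 2, 3
# resemble point 0, while 4, 5 resemble point 1.
n = 6
S = [[0.1] * n for _ in range(n)]
for i in range(n):
    S[i][i] = 1.0
for i, j, v in [(2, 0, 0.9), (3, 0, 0.8), (4, 1, 0.9), (5, 1, 0.8)]:
    S[i][j] = S[j][i] = v
picked = basil_round({2, 3, 4, 5}, [{0}, {1}], S, budget=2)
```

With one point per class, the round picks the strongest match for each query, yielding a class-balanced batch.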
4 Experiments
In this section, we evaluate the effectiveness of Basil on two modalities of medical data, viz., histopathology (see Sec. 4.1) using the PathMNIST dataset [9, 30] and abdominal CT (see Sec. 4.2) using the OrganMNIST dataset [10, 30]. For evaluation, we compare the test accuracy of the model obtained after training via a semi-supervised learning algorithm on a combination of the labeled set (selected using active learning) and the unlabeled set. We also compare the imbalance ratio (IR) of all AL methods in Fig. 2. The IR of a set $\mathcal{A}$ is computed as the ratio of the average number of data points per frequent class to the average number per rare class, where $\mathcal{T}$ contains the class indices of the rare classes and $\mathcal{F}$ contains the class indices of the remaining frequent classes. Note that IR($\mathcal{A}$) = 1 when $\mathcal{A}$ is perfectly balanced.
Our results show that selecting a balanced labeled set using Basil (see Fig. 2) outperforms existing AL baselines (see the Supervised rows in Tab. 1 and Tab. 2). Importantly, we observe that, independent of the SSL algorithm used, a balanced labeled set selected using Basil helps leverage the remaining imbalanced unlabeled set better.
Baseline AL methods and SSL techniques. We compare the performance of Basil against an uncertainty-based AL method (Entropy), a diversity-based AL method (Badge), and random sampling (Random). We discuss the details of all baselines in Sec. 1.1. We evaluate the subsets selected by Basil and the AL baselines with a wide array of SSL algorithms, namely Pseudo-Label (PL), Mean Teacher (MT), Π-Model, MixMatch (MM), ICT, VAT, and VAT + Entropy Minimization (EM), on the two medical imaging datasets described in Sec. 4.1 and Sec. 4.2.
Experimental setup: We use the same training procedure and hyperparameters for all AL methods to ensure a fair comparison. For the first AL round, we randomly sample data points for labeling from the unlabeled set. We do so in order to obtain meaningful model parameters for the AL acquisition functions in the next round of AL. For all experiments, we train a Wide ResNet (WRN) [6] model using an Adam optimizer with an initial learning rate of 3e-4. For each AL round, the weights are reinitialized using Xavier initialization and the model is trained for 100K iterations. After obtaining the labeled set using AL, we reinitialize the WRN model and train it using SSL for 500K iterations. We run each experiment on a V100 GPU and report error bars (std. deviation). We discuss dataset splits and hyperparameters below and provide more details in Appendix 0.B.
4.1 Analysis on Histopathology data
PathMNIST Dataset: PathMNIST [9] is a dataset based on a prior study for predicting survival from colorectal cancer histology slides. It includes a training set of 100,000 non-overlapping image patches from hematoxylin and eosin stained histological images, and a test set of 7,180 image patches from a different clinical center. The patches are categorized into 9 types of tissues, resulting in a multi-class classification task. For our experiments, we use a preprocessed version of PathMNIST [30], where the original images are resized to 3 × 28 × 28. We consider a subset of 26K data points to create the initial imbalanced unlabeled set. The unlabeled set is imbalanced by randomly choosing 4 rare classes and randomly selecting 250 data points for each of them; for the remaining 5 classes, we randomly select 5000 data points each.
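The imbalancing protocol described above can be sketched as follows (a hypothetical helper with our names; class counts are scaled down for the toy example):

```python
import random

def make_imbalanced(labels, rare_classes, n_rare, n_freq, seed=0):
    """Subsample indices so each rare class keeps n_rare points and each
    remaining (frequent) class keeps n_freq points."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    chosen = []
    for y, idxs in by_class.items():
        k = n_rare if y in rare_classes else n_freq
        chosen.extend(rng.sample(idxs, k))
    return chosen

# Toy: 3 classes with 20 points each; class 2 is made rare.
labels = [i % 3 for i in range(60)]
subset = make_imbalanced(labels, rare_classes={2}, n_rare=2, n_freq=10)
```

For PathMNIST this would be called with n_rare=250 and n_freq=5000 over the 9 classes.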
Table 1: Active SSL test accuracy (%) on PathMNIST.

| SSL \ AL | Random | Badge | Entropy | FLQMI | GCMI | FLVMI |
|---|---|---|---|---|---|---|
| Supervised | 69.37±1.33 | 76.56±0.4 | 70.54±1.76 | 78.51±0.58 | 78.76±1.84 | 77.68±0.58 |
| PL [16] | 69.38±2.19 | 80.11±1.68 | 66.16±1.8 | 78.66±2.23 | 80.33±1.6 | 81.24±0.91 |
| ICT [28] | 75.45±0.28 | 80.25±0.17 | 74.91±0.78 | 82.29±0.96 | 82.58±0.13 | 82.08±0.66 |
| Π-Model [15] | 64.59±0.19 | 78.63±1.03 | 67.29±0.73 | 80.62±1.72 | 80.59±1.19 | 80.77±0.71 |
| MT [26] | 70.98±1.76 | 79.62±0.85 | 74.19±1.56 | 82.14±0.91 | 82.43±1.44 | 83.36±0.52 |
| MM [3] | 67.13±1.78 | 67.77±1.43 | 67.01±0.44 | 73.78±1.22 | 76.08±0.09 | 73.42±1.54 |
| VAT [20] | 80.27±1.95 | 83.05±1.72 | 77.49±1.8 | 85.72±1.83 | 83.7±0.37 | 84.38±0.81 |
| VAT+EM [20] | 81.48±1.1 | 84.11±0.06 | 82.6±0.75 | 85.46±1.54 | 84.1±1.94 | 85.22±1.51 |
Results: We present results for Active SSL on PathMNIST in Tab. 1. We observe that the SMI-based AL acquisition functions outperform existing AL methods in the supervised setting and across all semi-supervised learning methods. This is because per-class selection using SMI functions in Basil results in a more balanced labeled set (see Fig. 2). This reinforces the need for a framework like Basil for training models in a supervised or semi-supervised manner in class imbalance scenarios. Interestingly, we observe that the best choice of SMI function depends on the modality of medical data and the SSL method. For histopathology, PL, Π-Model, and MT perform best with an acquisition function like FLVMI, which balances query relevance and diversity. For ICT and MM, GCMI, which models query relevance only, shows the best results. For VAT and VAT+EM, FLQMI, which models query relevance and representation, shows the best results.
4.2 Analysis on Abdominal CT data
OrganMNIST Dataset: OrganMNIST [10] is a dataset based on 3D computed tomography (CT) images from the Liver Tumor Segmentation Benchmark (LiTS). For our experiments, we use a preprocessed version of OrganMNIST [30], where images are cropped using bounding-box annotations of 11 body organs and resized to 1 × 28 × 28 to perform multi-class classification of the 11 body organs. We consider a subset of 21.6K data points to create the initial imbalanced unlabeled set. The unlabeled set is imbalanced by randomly choosing 4 rare classes and randomly selecting 150 data points for each of them; for the remaining 7 classes, we randomly select 3000 data points each.
Results: We present results for Active SSL on OrganMNIST in Tab. 2. Similar to our results on PathMNIST, we observe that the SMI-based AL acquisition functions outperform existing AL methods across all supervised and SSL methods. The facility location based SMI functions dominate for the abdominal CT modality. In particular, FLQMI, which models query relevance and representation, outperforms the other baselines and SMI functions for all SSL methods except VAT and VAT+EM, for which FLVMI performs slightly better.
Table 2: Active SSL test accuracy (%) on OrganMNIST.

| SSL \ AL | Random | Badge | Entropy | FLQMI | GCMI | FLVMI |
|---|---|---|---|---|---|---|
| Supervised | 60.63±0.73 | 64.32±1.29 | 61.03±1.36 | 65.58±0.28 | 62.15±1.63 | 64.41±0.6 |
| PL [16] | 62.83±1.79 | 64.43±1.47 | 64.27±0.16 | 67.68±1.43 | 64.48±1.3 | 65.56±0.35 |
| ICT [28] | 61.1±0.57 | 60.29±1.52 | 64.01±1.27 | 65.32±0.86 | 60.33±1.37 | 62.22±1.24 |
| Π-Model [15] | 64.61±1.32 | 65.61±1.73 | 62.67±1.83 | 65.94±1.03 | 61.85±1.62 | 65.43±0.94 |
| MT [26] | 64.64±0.63 | 66.49±1.18 | 65.53±0.34 | 66.89±0.01 | 60.81±0.63 | 64.81±0.54 |
| MM [3] | 53.62±1.28 | 50.34±1.4 | 51.35±1.51 | 58.57±0.51 | 55.86±0.73 | 56.08±1.07 |
| VAT [20] | 71.82±1.98 | 70.48±0.53 | 72.17±0.97 | 75.25±0.73 | 72.51±1.62 | 76.17±0.1 |
| VAT+EM [20] | 72.67±0.28 | 73.52±0.22 | 71.95±0.43 | 75.57±1.12 | 73.17±1.73 | 76.58±0.69 |
5 Conclusion
We demonstrate the effectiveness of a unifying algorithm like Basil for selecting a balanced labeled set in scenarios with class-imbalanced data. Through rigorous experiments on diverse modalities of medical data, we show that Basil selects a more balanced labeled set than other AL acquisition functions, resulting in relatively unbiased models and better performance for both supervised and semi-supervised learning.
References
 [1] (2007) K-means++: the advantages of careful seeding. In SODA '07: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, Philadelphia, PA, USA, pp. 1027–1035. ISBN 9780898716245. Cited by: §1.1.
 [2] (2020) Deep batch active learning by diverse, uncertain gradient lower bounds. In ICLR. Cited by: §1.1, §1.
 [3] (2019) MixMatch: a holistic approach to semi-supervised learning. arXiv preprint arXiv:1905.02249. Cited by: §1.1, §1, Table 1, Table 2.
 [4] (2005) Submodular functions and optimization. Elsevier. Cited by: §2.
 [5] (2020) The online submodular cover problem. In ACM-SIAM Symposium on Discrete Algorithms. Cited by: §2.1, §2.
 [6] (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §4.
 [7] (2020) Submodular combinatorial information measures with applications in machine learning. arXiv preprint arXiv:2006.15412. Cited by: §2.1, §2.
 [8] (2015) Submodular optimization and machine learning: theoretical results, unifying and scalable algorithms, and applications. Ph.D. Thesis. Cited by: §2.
 [9] (2019) Predicting survival from colorectal cancer histology slides using deep learning: a retrospective multicenter study. PLoS Medicine 16 (1), pp. e1002730. Cited by: §1.2, §4.1, §4.
 [10] (2018) Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172 (5), pp. 1122–1131. Cited by: §1.2, §4.2, §4.
 [11] (2019) BatchBALD: efficient and diverse batch acquisition for deep Bayesian active learning. arXiv preprint arXiv:1906.08158. Cited by: §1.1, §1.
 [12] (2021) SIMILAR: submodular information measures based active learning in realistic scenarios. Advances in Neural Information Processing Systems 34. Cited by: §1.1, §1.
 [13] (2021) TALISMAN: targeted active learning for object detection with rare classes and slices using submodular mutual information. arXiv preprint arXiv:2112.00166. Cited by: §1.1, §1.
 [14] (2021) PRISM: a rich class of parameterized submodular information measures for guided subset selection. arXiv preprint arXiv:2103.00128. Cited by: §1, §2.1, §2.
 [15] (2016) Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242. Cited by: §1.1, §1, Table 1, Table 2.
 [16] (2013) Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, Vol. 3. Cited by: §1.1, §1, Table 1, Table 2.
 [17] (2012) Multi-document summarization via submodularity. Applied Intelligence 37 (3), pp. 420–430. Cited by: §2.1.
 [18] (2012) Submodularity in natural language processing: algorithms and applications. Ph.D. Thesis. Cited by: §2.1.
 [19] (2015) Lazier than lazy greedy. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 29. Cited by: 1st item, Appendix 0.C, §3.
 [20] (2018) Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (8), pp. 1979–1993. Cited by: §1.1, §1, Table 1, Table 2.
 [21] (2018) Realistic evaluation of deep semi-supervised learning algorithms. arXiv preprint arXiv:1804.09170. Cited by: §1.1.
 [22] (2016) Regularization with stochastic transformations and perturbations for deep semi-supervised learning. Advances in Neural Information Processing Systems 29, pp. 1163–1171. Cited by: §1.1.
 [23] (2018) Active learning for convolutional neural networks: a core-set approach. In International Conference on Learning Representations. Cited by: §1.
 [24] (2009) Active learning literature survey. Technical report, University of Wisconsin-Madison, Department of Computer Sciences. Cited by: §1.1, §1.
 [25] (2020) FixMatch: simplifying semi-supervised learning with consistency and confidence. arXiv preprint arXiv:2001.07685. Cited by: §1.1.
 [26] (2017) Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. arXiv preprint arXiv:1703.01780. Cited by: §1.1, §1, Table 1, Table 2.
 [27] (2017) Query-adaptive video summarization via quality-aware relevance estimation. In Proceedings of the 25th ACM International Conference on Multimedia, pp. 582–590. Cited by: §2.1.
 [28] (2019) Interpolation consistency training for semi-supervised learning. arXiv preprint arXiv:1903.03825. Cited by: §1.1, §1, Table 1, Table 2.
 [29] (2019) Unsupervised data augmentation for consistency training. arXiv preprint arXiv:1904.12848. Cited by: §1.1.
 [30] (2021) MedMNIST v2: a large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2008. Cited by: §4.1, §4.2, §4.
Supplementary Material
Appendix 0.A Summary of Notations
| Topic | Notation | Explanation |
|---|---|---|
| Basil (Sec. 3) | $\mathcal{U}$ | Unlabeled set of instances |
| | $\mathcal{A}$ | A subset of $\mathcal{U}$ |
| | $s_{ij}$ | Similarity between any two data points $i$ and $j$ |
| | $f$ | A submodular function |
| | $\mathcal{L}$ | Labeled set of data points |
| | $\mathcal{Q}$ | Query set |
| | $\mathcal{Q}_c$ | Query set containing data points from class $c$ for per-class SMI selection, $\mathcal{Q}_c \subseteq \mathcal{L}$ |
| | $\mathcal{M}$ | Deep model |
| | $B$ | Active learning selection budget |
| | $L$ | Labeled loss function used to train model $\mathcal{M}$ and compute gradients |
| | $L_U$ | Unlabeled loss function used for semi-supervised learning |
| | $S$ | Pairwise similarity matrix computed using gradients |
| | $\nabla_{\theta} L(\mathcal{X})$ | Gradients of some subset $\mathcal{X}$ |
Appendix 0.B Details of Datasets and Experimental setting
0.B.1 Datasets
In this section, we describe the details of the PathMNIST (histopathology) and OrganMNIST (abdominal CT) datasets.
0.B.1.1 PathMNIST
A dataset based on a prior study for predicting survival from colorectal cancer histology slides, which provides 100,000 non-overlapping image patches from hematoxylin and eosin stained histological images and a test set of 7,180 image patches from a different clinical center. 9 types of tissues are involved, resulting in a multi-class classification task. We resize the source images from 3 × 224 × 224 to 3 × 28 × 28.
Out of the 100K training images, we take 5K images each from classes 0, 1, 4, 6, 8 and 250 images each from classes 2, 3, 5, 7, forming an unlabeled training set of 26K images. For the validation set, we take 10 samples from each class, forming a validation set of size 90. For the test set, we use the default one, which consists of 7,180 samples. For PathMNIST, the active learning selection budget ($B$) is 900.
0.B.1.2 OrganMNIST
A dataset based on 3D computed tomography (CT) images from the Liver Tumor Segmentation Benchmark (LiTS). Hounsfield units (HU) of the 3D images are transformed into grey scale with an abdominal window; we then crop 2D images from the center slices of the 3D bounding boxes in axial view (planes). The images are resized to 1 × 28 × 28 to perform multi-class classification of 11 body organs.
There are a total of 34,581 training samples, out of which we pick 3,000 samples each from classes 4, 5, 6, 7, 8, 9, 10 and 150 samples each from classes 0, 1, 2, 3, forming an unlabeled training set of 21.6K images. For the validation set, we select 10 points from each class, forming a validation set of size 110. The test set consists of around 17K images in total. For OrganMNIST, the active learning selection budget ($B$) is 990.
0.B.2 Experimental Setting
Given an unlabeled dataset and an active learning selection budget $B$, we select points in each round of active learning. In the first round, we select points randomly from the unlabeled dataset and label them. In subsequent rounds, we train the Wide ResNet model using an Adam optimizer with an initial learning rate of 3e-4 (the same across the different active learning methods for a fair comparison) for 100K iterations on the labeled set formed so far, extract the gradients of the remaining unlabeled dataset from the model, and use those gradients to perform per-class selection with the SMI functions to obtain a balanced set.
After forming a labeled set of size $B$, we perform semi-supervised learning using the labeled set and the unlabeled dataset. We evaluate our selected balanced sets with several SSL algorithms: Pseudo-Label, ICT, Π-Model, Mean Teacher, MixMatch, VAT, and VAT+EM. For each of these methods, the loss consists of two components: a supervised loss (on the labeled dataset) and an SSL loss (on the unlabeled dataset). The supervised loss is common to all methods, whereas the SSL loss depends on the algorithm used. We use the same parameters for each SSL method across the different active learning selections for a fair comparison.
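As an illustration of the two-component loss, here is a pseudo-label-style sketch with a supervised term and a thresholded SSL term; this is a simplified sketch with our names, not the exact per-method losses, which follow the respective papers:

```python
import math

def cross_entropy(p, y):
    """Cross-entropy of a probability vector p against label y."""
    return -math.log(p[y])

def ssl_batch_loss(labeled, unlabeled_probs, w_u=1.0, threshold=0.95):
    """Supervised cross-entropy on labeled (probs, label) pairs plus a
    pseudo-label term on unlabeled predictions above the threshold."""
    sup = sum(cross_entropy(p, y) for p, y in labeled) / max(len(labeled), 1)
    unsup_terms = []
    for p in unlabeled_probs:
        conf = max(p)
        if conf >= threshold:          # only confident predictions contribute
            unsup_terms.append(cross_entropy(p, p.index(conf)))
    unsup = sum(unsup_terms) / max(len(unsup_terms), 1)
    return sup + w_u * unsup

labeled = [([0.9, 0.1], 0)]
unlabeled_probs = [[0.96, 0.04], [0.6, 0.4]]  # only the first passes 0.95
loss = ssl_batch_loss(labeled, unlabeled_probs)
```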
0.B.2.1 Parameters used for each SSL method
- Supervised: "lr":
- PL: "threshold": 0.95, "lr": , "consistency-coefficient": 1
- ICT: "ema_factor": 0.999, "lr": , "consistency-coefficient": 100, "alpha": 0.1
- Π-Model: "lr": , "consistency-coefficient": 20.0
- MT: "ema_factor": 0.95, "lr": , "consistency-coefficient": 8
- MM: "lr": , "consistency-coefficient": 100, "alpha": 0.75, "T": 0.5, "K": 2
- VAT: "xi": , "lr": , "consistency-coefficient": 0.3, "eps": 6
- VAT+EM: "xi": , "lr": , "consistency-coefficient": 0.3, "eps": 6, "em": 0.06
Details on the computation of the Imbalance Ratio: The IR of a set $\mathcal{A}$ is computed as the ratio of the average number of data points per frequent class to the average number per rare class, where $\mathcal{T}$ contains the class indices of the rare classes and $\mathcal{F}$ contains the class indices of the remaining frequent classes. For example, for a dataset with 5 total classes, assume it has 2 rare classes; then $|\mathcal{T}| = 2$. The remaining classes are frequent, and $|\mathcal{F}| = 3$. Further, $\mathcal{A}_{\mathcal{T}}$ denotes the set of data points in $\mathcal{A}$ that belong to the rare classes, and similarly $\mathcal{A}_{\mathcal{F}}$ denotes the set of data points that belong to the frequent classes. Note that IR($\mathcal{A}$) = 1 when $\mathcal{A}$ is perfectly balanced.
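A sketch of the IR computation, assuming IR is the ratio of the mean per-frequent-class count to the mean per-rare-class count; this normalization is our reading of the description (it yields 1 for a balanced set), not taken verbatim from the paper:

```python
def imbalance_ratio(labels, rare_classes, frequent_classes):
    """IR sketched as (mean count per frequent class) /
    (mean count per rare class); equals 1 for a balanced set.
    NOTE: the exact normalization is our assumption."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    mean_freq = sum(counts.get(c, 0) for c in frequent_classes) / len(frequent_classes)
    mean_rare = sum(counts.get(c, 0) for c in rare_classes) / len(rare_classes)
    return mean_freq / mean_rare

balanced = [0, 1, 2, 3, 4] * 4              # 4 points in each of 5 classes
skewed = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 4]  # classes 3 and 4 are rare
ir_balanced = imbalance_ratio(balanced, {3, 4}, {0, 1, 2})
ir_skewed = imbalance_ratio(skewed, {3, 4}, {0, 1, 2})
```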
Appendix 0.C Scalability of Basil
Below, we provide a detailed analysis of the complexity of creating and optimizing the different SMI functions. Denote by $|\mathcal{X}|$ the size of a set $\mathcal{X}$, and let $n = |\mathcal{U}|$ (the ground set size, which is the size of the unlabeled set in this case).

Facility Location: We start with FLVMI. The complexity of creating the $n \times n$ kernel matrix is $O(n^2)$. The complexity of optimizing it is $O(n)$ using memoization and the stochastic greedy algorithm [19] (ignoring log factors), and $O(nB)$ with the naive greedy algorithm. The overall complexity is $O(n^2)$. For FLQMI, the cost of creating the kernel matrix is $O(n|\mathcal{Q}|)$, and the cost of optimization is also $O(n)$ (with naive greedy, it is $O(nB)$).

Graph Cut: For GCMI, we require an $n \times |\mathcal{Q}|$ kernel matrix, and the complexity of the stochastic greedy algorithm is also $O(n)$.

We end with a few comments. First, most of the complexity analysis above is for the stochastic greedy algorithm [19]. If we use the naive or lazy greedy algorithm, the worst-case complexity is a factor $B$ larger. Secondly, we ignore log factors in the complexity of stochastic greedy, since its complexity is actually $O(n \log(1/\epsilon))$, which achieves a $(1 - 1/e - \epsilon)$ approximation.
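The stochastic greedy algorithm of [19] referenced above can be sketched as follows: each step evaluates marginal gains only on a random sample of size roughly (n/B)·log(1/ε) rather than on the whole ground set. The names and the toy modular objective are ours:

```python
import math
import random

def stochastic_greedy(ground, f, budget, eps=0.1, seed=0):
    """Stochastic ("lazier than lazy") greedy maximization: each step draws
    ceil((n / budget) * ln(1 / eps)) random candidates and adds the one
    with the largest marginal gain."""
    rng = random.Random(seed)
    n = len(ground)
    sample_size = max(1, math.ceil((n / budget) * math.log(1.0 / eps)))
    selected, pool = set(), set(ground)
    for _ in range(budget):
        cand = rng.sample(sorted(pool), min(sample_size, len(pool)))
        best = max(cand, key=lambda x: f(selected | {x}) - f(selected))
        selected.add(best)
        pool.remove(best)
    return selected

# Toy modular objective: f(A) = sum of item weights (trivially submodular).
weights = {i: w for i, w in enumerate([5, 1, 4, 2, 3, 9, 0, 7])}
f = lambda A: sum(weights[i] for i in A)
picked = stochastic_greedy(list(weights), f, budget=3)
```

For a modular objective, greedy simply collects the heaviest items, which makes the behavior easy to verify on the toy instance.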