
Federated Active Learning (F-AL): an Efficient Annotation Strategy for Federated Learning

by   Jin-Hyun Ahn, et al.

Federated learning (FL) has been intensively investigated in terms of communication efficiency, privacy, and fairness. However, efficient annotation, which is a pain point in real-world FL applications, has received less attention. In this project, we propose to apply active learning (AL) and sampling strategies in the FL framework to reduce the annotation workload. We expect AL and FL to complement each other and improve each other's performance. In our proposed federated active learning (F-AL) method, the clients collaboratively perform AL, in a distributed optimization manner, to obtain the instances that are considered informative to FL. We compare the test accuracies of the global FL models obtained with the conventional random sampling strategy, client-level separate AL (S-AL), and the proposed F-AL. We empirically demonstrate that F-AL outperforms the baseline methods on image classification tasks.



1 Introduction

Federated learning (FL) (McMahan et al., 2017) enables collaborative training on datasets residing on distributed clients with the help of a parameter server. Numerous previous works, including (Smith et al., 2017; Li et al., 2020b; Sattler et al., 2019; Li et al., 2019), have validated the effectiveness of FL through numerical results and convergence analysis on independent and identically distributed (IID) and non-IID datasets. While the recent FL literature primarily addresses communication efficiency, fairness, robustness, privacy, and personalization, almost all previous works have assumed that the clients' training datasets are fully labeled and ready for training.

However, as in other machine learning (ML) applications, the annotation step should not be overlooked in practical implementations of FL, since the labeling cost is generally high and may even dominate the cost of FL itself. Considering this problem, we study annotation strategies in the FL framework, where the clients participating in FL must label their datasets before FL is executed. For the annotation, we apply active learning (AL) (Settles, 2009) at each client participating in FL. Because labeling all the instances is rarely practical or cost-effective, AL aims to maximize the model's performance with the fewest samples by selectively sampling and labeling the most informative instances.

To validate the proposed method, we establish an FL framework with an annotation step, in which the following annotation strategies are compared: 1) conventional FL with random sampling, 2) client-level separate active learning (S-AL), and 3) the proposed federated active learning (F-AL). In F-AL, the clients collaboratively execute AL, in a distributed optimization manner, to select the instances that are considered informative to FL. For S-AL and F-AL, state-of-the-art AL algorithms are incorporated.

AL generally outperforms random sampling in centralized learning. However, to the best of the authors' knowledge, no prior work has considered AL in the FL framework or investigated the effect of AL on the performance of FL. This work demonstrates that AL can substantially reduce the labeling cost in FL environments. Furthermore, we show that the proposed F-AL considerably improves the performance of AL in the FL environment. We summarize our contributions below:

  • We establish a general FL framework that incorporates the annotation step, and we evaluate three types of methods: conventional FL with random sampling, S-AL, and F-AL. With S-AL, the clients independently apply AL to their own datasets, whereas F-AL encourages the clients to collaborate on AL.

  • Through experiments with various AL algorithms and datasets, we empirically demonstrate that AL is effective in the FL environment. The numerical results indicate that the AL methods outperform random sampling in terms of the test accuracy of the global FL model.

  • We demonstrate that F-AL outperforms the other methods and thereby magnifies the benefit of AL in the FL environment.

2 Related Work & Background

2.1 Federated Learning

FL can be categorized into cross-device FL and cross-silo FL (Kairouz et al., 2019). In both types of FL, data are generated and stored locally, whereas in datacenter distributed learning the data are centrally managed and distributed to clients. Cross-device FL assumes that the clients are an enormous number of mobile or IoT devices connected by Wi-Fi or other slow connections; therefore, uplink communication is the main performance bottleneck. Furthermore, it generally encounters fresh training samples that have never been seen before, since most clients participate only once in the entire FL process.

On the other hand, cross-silo FL typically involves a relatively small number of clients, which are generally different organizations or geo-distributed data centers, such as hospitals or banks. Therefore, it assumes that all clients are available throughout the FL process and that the clients' datasets are used repeatedly for training from round to round. The performance degradation caused by the communication bottleneck is not as severe as in cross-device FL; instead, the performance depends heavily on the size and quality of the training datasets (Fenza et al., 2021).

FL can be executed in various ways in terms of how knowledge is aggregated among the clients. The most classic FL algorithms are federated stochastic gradient descent (FedSGD) and federated averaging (FedAvg) (McMahan et al., 2017), which are based on averaging the clients' parameters. Beyond these vanilla algorithms, FedProx (Li et al., 2020b) and FedDF (Lin et al., 2020) tackle systems and statistical heterogeneity, FedMA (Wang et al., 2020a) and FetchSGD (Rothchild et al., 2020) alleviate the communication bottleneck, and TERM (Li et al., 2020a) and Ditto (Li et al., 2021) address fairness and robustness in personalized FL.

2.2 Active Learning

AL selects informative instances to be labeled ahead of the other instances and aims to maximize the model's performance with the fewest samples. AL has been shown to considerably reduce the number of samples to be labeled and to alleviate the heavy cost of annotation (Settles, 2009; Ren et al., 2021). In fact, it has been proved that an effective AL strategy can theoretically achieve an exponential improvement in labeling efficiency (Balcan et al., 2009). When AL is applied to deep learning (DL), the savings in annotation are even more attractive, since DL requires a large number of labeled instances whose labeling cost is high, especially in professional fields that require expert knowledge (Bengio et al., 2007; Krizhevsky et al., 2012).

The sampling strategies of AL can be categorized into uncertainty-based sampling; representation-based sampling; other strategies that leverage characteristics of deep learning, such as learning loss (LL) (Yoo and Kweon, 2019), Monte-Carlo dropout (MC-dropout) (Gal and Ghahramani, 2016), and adversarial active learning (Sinha et al., 2019; Kim et al., 2021); and hybrid sampling that uses these strategies jointly. Uncertainty-based sampling (Lewis and Gale, 1994; Beluch et al., 2018) queries the instances about which the model trained on the current training samples is most uncertain. Representation-based sampling (Geifman and El-Yaniv, 2017; Sener and Savarese, 2017) measures the representativeness of unlabeled samples and encourages the sampling strategy to select instances from different regions of the distribution. Since a sampling strategy concerned only with uncertainty may skew the model, because the sampled instances tend to be similar and concentrated in a particular region of the distribution, balancing uncertainty and representativeness is one of the main issues for the performance of AL strategies (Sener and Savarese, 2017).

Furthermore, most recent work on AL focuses on strategies for DL that leverage aspects of the ML model for uncertainty estimation, such as the estimated training loss (Yoo and Kweon, 2019), the length of the gradient (Freytag et al., 2014), and MC dropout (Gal and Ghahramani, 2016). Adversarial active learning (Sinha et al., 2019; Wang et al., 2020b; Zhang et al., 2020; Kim et al., 2021) trains an auxiliary network with a generative adversarial network (GAN) structure, which learns a low-dimensional latent space and discriminates between labeled and unlabeled samples in order to select the unlabeled instances that are most different from the labeled ones. More recently, Cho et al. (2021) proposed Maximum Classifier Discrepancy for Active Learning (MCDAL), the first work to leverage classifier discrepancy for sampling in active learning.

3 Problem Definition

This section presents the FL framework in which an annotation step precedes the execution of FL. We first introduce the FL environment, comprising a parameter server and clients. We then describe and formulate the annotation step in more detail, and finally present the AL that the clients execute in the annotation step.

3.1 Federated Learning Environment

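We consider cross-silo FL comprising a parameter server and $K$ clients. The clients store their own local datasets $\mathcal{U}_k$, which are unlabeled. Before FL starts, each $k$-th client selects $N_k$ instances from $\mathcal{U}_k$ and labels them to obtain $\mathcal{L}_k = \{(x_{k,i}, y_{k,i})\}_{i=1}^{N_k}$, where $x_{k,i}$ is a selected instance from $\mathcal{U}_k$ and $y_{k,i}$ is the label of $x_{k,i}$. We denote the sampling function as $\mathcal{S}(\cdot)$ and the selected instances, $\mathcal{X}_k$, as

$$\mathcal{X}_k = \{x_{k,i}\}_{i=1}^{N_k} = \mathcal{S}(\mathcal{U}_k), \tag{1}$$

for $k = 1, \dots, K$.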

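Let $w$ denote the global model to be optimized in FL. The local loss at the $k$-th client is $F_k(w) = \frac{1}{N_k} \sum_{(x, y) \in \mathcal{L}_k} \ell(w; x, y)$, where $\mathcal{L}_k$ is the labeled dataset at the $k$-th client and $\ell(\cdot)$ is the loss function determined by the network model. Accordingly, the global loss is defined as $F(w) = \sum_{k=1}^{K} \frac{N_k}{N} F_k(w)$ with $N = \sum_{k=1}^{K} N_k$. The goal of FL is to find the parameter minimizing the global loss, namely $w^{*} = \arg\min_{w} F(w)$.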

FedSGD (McMahan et al., 2017) is applied for the FL updates, where $w$ is obtained through iterative stochastic gradient descent (SGD), which allows the gradients to be computed in parallel at the clients. The parameter vector $w_t$ at the $t$-th iteration is updated according to $w_{t+1} = w_t - \eta_t \sum_{k=1}^{K} \frac{N_k}{N} g_k$, where $\eta_t$ is the learning rate at the $t$-th iteration and $g_k$ is the stochastic gradient of $F_k$ computed at the $k$-th client as $g_k = \nabla F_k(w_t)$. The update is equivalently given by

$$w_{t+1} = \sum_{k=1}^{K} \frac{N_k}{N} w_{t+1}^{k}, \tag{2}$$

$$w_{t+1}^{k} = w_t - \eta_t g_k. \tag{3}$$

In this work, we use FedAvg (McMahan et al., 2017), in which each client repeats the local update (3) multiple times before the server averages the parameters as in (2). The overall FL framework is summarized in Algorithm 1.
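As a quick numerical check of the equivalence between the FedSGD update and (2)-(3), the sketch below (NumPy; the client gradients and sizes are hypothetical random values) takes one local step per client and averages the local models with weights $N_k/N$, which gives the same parameters as applying the weighted average gradient at the server. With several local steps, as in FedAvg, the two updates generally differ.

```python
import numpy as np

def fedsgd_update(w, grads, sizes, lr):
    """Server applies the size-weighted average of the client gradients."""
    p = np.asarray(sizes, dtype=float) / np.sum(sizes)
    return w - lr * sum(pk * g for pk, g in zip(p, grads))

def fedavg_one_step(w, grads, sizes, lr):
    """Each client takes one local step as in (3); the server averages as in (2)."""
    p = np.asarray(sizes, dtype=float) / np.sum(sizes)
    local_models = [w - lr * g for g in grads]
    return sum(pk * wk for pk, wk in zip(p, local_models))

rng = np.random.default_rng(0)
w = rng.normal(size=5)
grads = [rng.normal(size=5) for _ in range(3)]  # hypothetical client gradients g_k
sizes = [100, 250, 50]                          # hypothetical N_k
print(np.allclose(fedsgd_update(w, grads, sizes, 0.1),
                  fedavg_one_step(w, grads, sizes, 0.1)))  # True
```

The equality holds because the weights $N_k/N$ sum to one, so averaging $w_t - \eta_t g_k$ is the same as subtracting $\eta_t$ times the averaged gradient.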

Figure 1: Annotation strategies for federated learning.
  Input: unlabeled datasets, $\{\mathcal{U}_k\}_{k=1}^{K}$
  Input: initialized model, $w_0$
  Input: learning rate, $\{\eta_t\}$
     Annotation step:
        for $k = 1$ to $K$ do
           annotate $\mathcal{X}_k = \mathcal{S}(\mathcal{U}_k)$ to obtain $\mathcal{L}_k$
        end for
     FL step:
        for each round $t = 0, 1, \dots$ do
           Client $k$ executes:
              do multiple iterations of (3)
           Server executes:
              average model parameters as in (2)
        end for
Algorithm 1 Federated Learning with annotation step
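A minimal, runnable sketch of Algorithm 1, assuming a toy linear-regression model with squared loss in place of the paper's network, random sampling as $\mathcal{S}$, and a hypothetical noiseless labeling oracle. The function names (`random_sampling`, `local_train`, `federated_learning_with_annotation`) are illustrative, not taken from the paper.

```python
import numpy as np

def random_sampling(pool, n_select, rng):
    """S_rand: indices of n_select instances drawn uniformly without replacement."""
    return rng.choice(len(pool), size=n_select, replace=False)

def local_train(w, x, y, lr, local_steps):
    """Client update: repeat the local gradient step (3) on the mean squared error."""
    for _ in range(local_steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_learning_with_annotation(pools, oracle, n_label, rounds, lr, local_steps, rng):
    # Annotation step: each client selects n_label instances and queries their labels.
    labeled = []
    for pool in pools:
        x = pool[random_sampling(pool, n_label, rng)]
        labeled.append((x, oracle(x)))
    # FL step: FedAvg, averaging local models with weights N_k / N as in (2).
    sizes = np.array([len(y) for _, y in labeled], dtype=float)
    weights = sizes / sizes.sum()
    w = np.zeros(pools[0].shape[1])
    for _ in range(rounds):
        local_models = [local_train(w, x, y, lr, local_steps) for x, y in labeled]
        w = sum(p * wk for p, wk in zip(weights, local_models))
    return w

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
oracle = lambda x: x @ w_true                          # hypothetical noiseless oracle
pools = [rng.normal(size=(200, 3)) for _ in range(4)]  # unlabeled pools U_k
w_hat = federated_learning_with_annotation(pools, oracle, n_label=20, rounds=50,
                                           lr=0.1, local_steps=5, rng=rng)
print(np.round(w_hat, 3))
```

Swapping `random_sampling` for an AL-based selector changes only the annotation step; the FL step is untouched, which is the separation Algorithm 1 expresses.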

3.2 Active Learning

In the proposed FL framework, we introduce the sampling function $\mathcal{S}(\cdot)$, which finds the instances to be labeled from the unlabeled dataset prior to FL. For example, with random sampling, the instances acquired from the unlabeled dataset are $\mathcal{X}_k = \mathcal{S}_{\mathrm{rand}}(\mathcal{U}_k)$, where $\mathcal{S}_{\mathrm{rand}}$ randomly chooses $N_k$ instances from $\mathcal{U}_k$. In terms of the sampling function, the goal of AL is to find the best sampling function, which selects the instances that are most informative to the performance of the main task.

Most AL algorithms search for the instances with the highest scores in the unlabeled data pool (McCallum and Nigam, 1998) as

$$\mathcal{X} = \arg\max_{\mathcal{X} \subset \mathcal{U},\ |\mathcal{X}| = b} \sum_{x \in \mathcal{X}} s(x), \tag{4}$$

where $b$ is the sampling budget and $s(x)$ is the score function of $x$. The score function of an effective AL algorithm should accurately reflect the potential informativeness of the instances in the unlabeled dataset; hence, AL algorithms can be characterized by how they design $s(x)$. Typical scores include uncertainty, representativeness (such as diversity and density), training loss, and dissimilarity to the labeled dataset.
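For an additive score, the subset that maximizes the selection rule above is simply the $b$ instances with the largest individual scores. The sketch below assumes predictive entropy as $s(x)$, one common uncertainty score; the probability table is made-up input rather than the output of a trained model.

```python
import numpy as np

def entropy_score(probs):
    """Uncertainty score s(x): entropy of each row of predicted class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_top_b(scores, b):
    """Indices of the b highest scores (maximizes the sum of scores over size-b subsets)."""
    return np.argsort(-scores)[:b]

probs = np.array([[0.98, 0.01, 0.01],   # confident prediction
                  [0.34, 0.33, 0.33],   # nearly uniform: most uncertain
                  [0.60, 0.30, 0.10],
                  [0.50, 0.50, 0.00]])
print(select_top_b(entropy_score(probs), b=2))  # [1 2]
```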

Since informativeness depends on the current labeled dataset, the score function is also conditioned on the current state of the labeled dataset. Hence, the score is generally calculated with the model trained on the current labeled dataset, namely

$$s(x) = s(x; \theta_{\mathcal{L}}), \tag{5}$$

where $\theta_{\mathcal{L}}$ is the auxiliary model trained with the labeled dataset $\mathcal{L}$, starting from the randomly initialized model $\theta_0$. Furthermore, AL uses multiple sampling rounds and gradually samples from the unlabeled dataset: when $B$ instances are to be labeled over $R$ rounds, $b = B/R$ instances are sampled at each round. We summarize the AL algorithm, $\mathcal{S}_{\mathrm{AL}}$, in Algorithm 2.

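  Input: unlabeled dataset, $\mathcal{U}$
  Input: initially labeled dataset, $\mathcal{L}^{(0)}$
  Input: number of AL rounds, $R$
  Input: initialized models for $s$, $\theta_0$
  Input: annotation budget, $B$
     for $r = 1$ to $R$ do
        train $\theta_{\mathcal{L}^{(r-1)}}$, starting from $\theta_0$
        sample $\mathcal{X}^{(r)}$ as in (4) with $b = B/R$ and $s(x) = s(x; \theta_{\mathcal{L}^{(r-1)}})$, $\mathcal{L}^{(r)} = \mathcal{L}^{(r-1)} \cup \{(x, y) : x \in \mathcal{X}^{(r)}\}$
     end for
     return $\mathcal{L}^{(R)}$ with size of $|\mathcal{L}^{(0)}| + B$
Algorithm 2 Active Learning, $\mathcal{S}_{\mathrm{AL}}$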
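The following sketch mirrors the loop of Algorithm 2 under strong simplifying assumptions: a hypothetical nearest-centroid classifier stands in for the auxiliary model $\theta_{\mathcal{L}}$ and is refit from scratch each round, entropy is the score, and a synthetic oracle supplies labels. It only illustrates the retrain, score, and query cycle with $b = B/R$ queries per round.

```python
import numpy as np

def softmax_neg_dist(centroids, x):
    """Class probabilities from negative squared distances to the class centroids."""
    logits = -((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def entropy(probs):
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def train_auxiliary(x, y, n_classes):
    """Toy theta_L: one centroid per class. Assumes every class has a labeled instance."""
    return np.stack([x[y == c].mean(axis=0) for c in range(n_classes)])

def active_learning(pool, oracle, init_idx, rounds, budget, n_classes):
    """Sketch of Algorithm 2: R rounds with b = budget // rounds queries each."""
    labeled = list(init_idx)
    b = budget // rounds
    for _ in range(rounds):
        theta = train_auxiliary(pool[labeled], oracle(pool[labeled]), n_classes)
        unlabeled = np.setdiff1d(np.arange(len(pool)), labeled)
        scores = entropy(softmax_neg_dist(theta, pool[unlabeled]))
        labeled += unlabeled[np.argsort(-scores)[:b]].tolist()  # query these labels
    return labeled

rng = np.random.default_rng(0)
pool = rng.normal(size=(200, 2))
oracle = lambda x: (x[:, 0] > 0).astype(int)   # hypothetical labeling oracle
y_all = oracle(pool)
init = [int(np.flatnonzero(y_all == c)[0]) for c in range(2)]
labeled = active_learning(pool, oracle, init, rounds=3, budget=30, n_classes=2)
print(len(labeled), len(set(labeled)))  # 32 32
```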
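  Input: unlabeled datasets, $\{\mathcal{U}_k\}_{k=1}^{K}$
  Input: initially labeled datasets, $\{\mathcal{L}_k^{(0)}\}_{k=1}^{K}$
  Input: number of AL rounds, $R$
  Input: initialized models for $s$, $\theta_0$
  Input: annotation budgets, $\{B_k\}_{k=1}^{K}$
     for $r = 1$ to $R$ do
        FL step:
           train $\theta_{\mathrm{FL}}^{(r-1)}$ by FL on $\{\mathcal{L}_k^{(r-1)}\}_{k=1}^{K}$, starting from $\theta_0$
        Sampling step:
           for $k = 1$ to $K$ do
              sample $\mathcal{X}_k^{(r)} \subset \mathcal{U}_k$ with $b_k = B_k/R$ and $s(x) = s(x; \theta_{\mathrm{FL}}^{(r-1)})$, $\mathcal{L}_k^{(r)} = \mathcal{L}_k^{(r-1)} \cup \{(x, y) : x \in \mathcal{X}_k^{(r)}\}$
           end for
     end for
     return $\mathcal{L}_k^{(R)}$ with size of $|\mathcal{L}_k^{(0)}| + B_k$, $k = 1, \dots, K$
Algorithm 3 Federated Active Learning, $\mathcal{S}_{\mathrm{F\text{-}AL}}$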
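Algorithm 3 changes only which model scores the candidates. The sketch below reuses the toy assumptions of a nearest-centroid classifier with an entropy score. Its hypothetical FL step aggregates per-class sums and counts from the clients into shared centroids, so raw instances never leave a client; this exact aggregation is a convenience of the toy model and stands in for FedAvg training of a network. Each client then ranks only its own pool with the shared model and queries $b_k = B_k/R$ instances.

```python
import numpy as np

def softmax_neg_dist(centroids, x):
    logits = -((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def entropy(probs):
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def federated_centroids(xs, ys, n_classes):
    """Hypothetical FL step: the server sums per-class statistics sent by clients.
    Assumes every class has at least one labeled instance across all clients."""
    sums = np.zeros((n_classes, xs[0].shape[1]))
    counts = np.zeros(n_classes)
    for x, y in zip(xs, ys):
        for c in range(n_classes):
            sums[c] += x[y == c].sum(axis=0)
            counts[c] += np.sum(y == c)
    return sums / counts[:, None]

def federated_active_learning(pools, oracle, init_idx, rounds, budgets, n_classes):
    labeled = [list(idx) for idx in init_idx]
    for _ in range(rounds):
        # FL step: one scoring model shared by all clients (theta_FL).
        xs = [pool[idx] for pool, idx in zip(pools, labeled)]
        theta_fl = federated_centroids(xs, [oracle(x) for x in xs], n_classes)
        # Sampling step: client k ranks only its own unlabeled pool.
        for k, pool in enumerate(pools):
            unlabeled = np.setdiff1d(np.arange(len(pool)), labeled[k])
            scores = entropy(softmax_neg_dist(theta_fl, pool[unlabeled]))
            top = np.argsort(-scores)[: budgets[k] // rounds]
            labeled[k] += unlabeled[top].tolist()
    return labeled

rng = np.random.default_rng(1)
pools = [rng.normal(size=(100, 2)) for _ in range(3)]
oracle = lambda x: (x[:, 0] > 0).astype(int)   # hypothetical labeling oracle
init = [[int(np.flatnonzero(oracle(p) == c)[0]) for c in range(2)] for p in pools]
labeled = federated_active_learning(pools, oracle, init, rounds=3,
                                    budgets=[30, 30, 30], n_classes=2)
print([len(l) for l in labeled])  # [32, 32, 32]
```

Replacing `federated_centroids` with centroids fit on each client's own labeled set would turn this sketch into S-AL.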

4 Federated Active Learning (F-AL)

This section introduces the AL methods in the FL framework: S-AL and F-AL. In the benchmark scheme of conventional FL with random sampling, we set $\mathcal{S} = \mathcal{S}_{\mathrm{rand}}$ in Algorithm 1, as introduced in Section 3.2.

4.1 Separate Active Learning (S-AL)

In S-AL, the clients separately perform AL before FL is executed. With S-AL, the $k$-th client applies $\mathcal{S}_{\mathrm{AL}}$ of Algorithm 2 to its unlabeled dataset $\mathcal{U}_k$ at the annotation step of the FL framework. S-AL directly plugs AL into the FL framework with the annotation step and might seem straightforward. However, no prior work has established AL in FL or investigated the effect of AL on FL.

4.2 Federated Active Learning (F-AL)

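In S-AL, the clients independently perform AL and obtain instances that are informative to their local labeled datasets, i.e., instances scored by $s(x; \theta_{\mathcal{L}_k})$. At the $r$-th round, the $k$-th client selects $\mathcal{X}_k^{(r)}$ with the highest scores, $s(x; \theta_{\mathcal{L}_k^{(r-1)}})$, where $\mathcal{L}_k^{(r-1)}$ denotes $\mathcal{L}^{(r-1)}$ in Algorithm 2 from the perspective of the $k$-th client.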

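However, since the clients execute FL after the annotation step, the main objective should be to obtain instances that are informative to the aggregate labeled dataset $\mathcal{L} = \bigcup_{k=1}^{K} \mathcal{L}_k$, as in (5). Therefore, the score function in F-AL is conditioned on $\mathcal{L}$ and defined as $s(x; \theta_{\mathcal{L}})$. However, $\theta_{\mathcal{L}}$ cannot be built, because the datasets $\mathcal{L}_k$, $k = 1, \dots, K$, must not be pooled at one site under the FL constraint. Thus, we replace $\theta_{\mathcal{L}}$ with the model trained by FL, $\theta_{\mathrm{FL}}$, which gives

$$s(x) = s(x; \theta_{\mathrm{FL}}), \qquad \theta_{\mathrm{FL}} = \arg\min_{\theta} \sum_{k=1}^{K} \frac{N_k}{N} F_k(\theta),$$

where $\theta_{\mathrm{FL}}$ is obtained by FL over $\{\mathcal{L}_k\}_{k=1}^{K}$ starting from $\theta_0$.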
Accordingly, with F-AL, clients carry out FL to obtain the score function that represents the informativeness of the aggregate labeled dataset.

Uncertainty-based sampling (Lewis and Gale, 1994; Beluch et al., 2018) illustrates clearly why $\theta_{\mathrm{FL}}$ should be used to compute the score function in order to improve FL performance. If the AL algorithm applies uncertainty-based sampling or a sampling strategy related to uncertainty, referred to as task-aware AL in (Kim et al., 2021), it uses an uncertainty score measured by the main task model trained with the current labeled dataset. Therefore, the auxiliary model is the main task model itself, namely $\theta = w$ in Algorithm 2, or, when there are several auxiliary models, the set of auxiliary models includes the main task model. Because the main task model is trained by FL, the auxiliary model should also be obtained through FL.

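After obtaining $\theta_{\mathrm{FL}}$ in F-AL, the instances could ideally be sampled as

$$\mathcal{X}^{(r)} = \arg\max_{\mathcal{X} \subset \mathcal{U},\ |\mathcal{X}| = b} \sum_{x \in \mathcal{X}} s(x; \theta_{\mathrm{FL}}), \tag{10}$$

where $\mathcal{U} = \bigcup_{k=1}^{K} \mathcal{U}_k$ and $b = \sum_{k=1}^{K} b_k$. Under the annotation workload condition that the $k$-th client annotates $b_k$ instances at each round, we have $\mathcal{X}^{(r)} = \bigcup_{k=1}^{K} \mathcal{X}_k^{(r)}$, where

$$\mathcal{X}_k^{(r)} = \arg\max_{\mathcal{X}_k \subset \mathcal{U}_k,\ |\mathcal{X}_k| = b_k} \sum_{x \in \mathcal{X}_k} s(x; \theta_{\mathrm{FL}}). \tag{11}$$

Therefore, each $k$-th client samples $\mathcal{X}_k^{(r)}$ as in (11) and follows the remaining steps of Algorithm 2. In fact, the sampling in (10) could be executed at the server by exchanging the scores and indices of instances. However, we do not pursue this option because it might break the fairness of the annotation workload among clients.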
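The difference between server-side sampling as in (10) and per-client sampling as in (11) shows up on a tiny example. The sketch assumes the scores $s(x;\theta_{\mathrm{FL}})$ have already been computed on each client's pool; the values are hypothetical. Ranking over the union can assign the whole budget to one client, while the per-client rule fixes each client's workload at $b_k$.

```python
import numpy as np

def ideal_collaborative_sampling(client_scores, total_budget):
    """(10)-style: top total_budget scores over the union of all client pools.
    Requires every client's scores at one site (e.g., the server)."""
    flat = [(s, k, i) for k, scores in enumerate(client_scores)
            for i, s in enumerate(scores)]
    flat.sort(key=lambda t: -t[0])
    picked = [[] for _ in client_scores]
    for _, k, i in flat[:total_budget]:
        picked[k].append(i)
    return picked

def local_sampling(client_scores, budgets):
    """(11)-style: client k independently takes its own budgets[k] top scores."""
    return [np.argsort(-np.asarray(s))[:b].tolist()
            for s, b in zip(client_scores, budgets)]

scores = [np.array([0.9, 0.8, 0.7]), np.array([0.2, 0.1, 0.3])]  # hypothetical
print(ideal_collaborative_sampling(scores, 2))  # [[0, 1], []]
print(local_sampling(scores, [1, 1]))           # [[0], [2]]
```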

5 Experiments

This section provides the implementation details and the numerical results, with related discussion. We compare the performance of FL using random sampling, S-AL, and the proposed F-AL on image classification tasks. The annotation strategies are applied at the annotation step of Algorithm 1, and the test accuracy of the obtained model is used as the performance metric. We evaluate the annotation strategies on the classical public datasets Fashion-MNIST (Xiao et al., 2017), CIFAR-10 (Krizhevsky et al., 2009), and CIFAR-100 (Krizhevsky et al., 2009). Fashion-MNIST is a more challenging alternative to MNIST; it consists of a training set of 60,000 images of 10 types of clothing and a test set of 10,000 images. CIFAR-10 and CIFAR-100 each contain 50,000 training images and 10,000 test images; CIFAR-10 has 10 classes, while CIFAR-100 has 100 classes.

5.1 Active learning algorithms

First, we evaluate the annotation strategies when the AL algorithm is the recently proposed Maximum Classifier Discrepancy for Active Learning (MCDAL) (Cho et al., 2021), one of the state-of-the-art AL algorithms. MCDAL trains two auxiliary classifiers to maximize the discrepancy between their predictions and then uses these prediction discrepancies in place of the classic uncertainty. Cho et al. (2021) report that this approach outperforms state-of-the-art AL algorithms on several image classification benchmarks, including CIFAR-10 and CIFAR-100.
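A hedged sketch of a discrepancy-based score in the spirit of MCDAL: given logits from two auxiliary classifier heads, each unlabeled instance is scored by the L1 distance between the heads' softmax outputs, and the largest scores are queried. The L1 form follows the usual maximum-classifier-discrepancy formulation and is an assumption here; the exact score in Cho et al. (2021) may differ, and the logits are made up.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def discrepancy_score(logits_a, logits_b):
    """L1 distance between the class probabilities of two auxiliary heads."""
    return np.abs(softmax(logits_a) - softmax(logits_b)).sum(axis=1)

logits_a = np.array([[4.0, 0.0], [1.0, 0.0], [0.0, 3.0]])  # head A (hypothetical)
logits_b = np.array([[4.0, 0.0], [0.0, 1.0], [0.0, 2.0]])  # head B (hypothetical)
scores = discrepancy_score(logits_a, logits_b)
print(int(np.argmax(scores)))  # 1: the heads disagree most on instance 1
```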

For further discussion, we evaluate the annotation strategies with various kinds of AL algorithms to check whether the performance comparison is consistent. The first category consists of uncertainty-related AL algorithms: the classic uncertainty-based sampling with maximum entropy (Lewis and Gale, 1994), MC-dropout with maximum entropy (Gal and Ghahramani, 2016), Learning Loss (LL) (Yoo and Kweon, 2019), and MCDAL (Cho et al., 2021). The other AL algorithms are the core-set approach (Sener and Savarese, 2017) and variational adversarial active learning (VAAL) (Sinha et al., 2019). The core-set approach is the most widely used representation-based AL algorithm in the literature, and VAAL represents the recent adversarial AL algorithms (Wang et al., 2020b; Zhang et al., 2020; Kim et al., 2021). All of these algorithms use the main task model as an auxiliary model for AL. In LL, MCDAL, and VAAL, additional auxiliary models are trained for AL; therefore, in F-AL these auxiliary models can also be trained by FL, in addition to the main task model. For LL, however, we train the auxiliary model locally, because no improvement was observed with FL of the auxiliary model in our experiments.
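MC-dropout with maximum entropy keeps dropout active at inference, averages the class probabilities over several stochastic forward passes, and scores each instance by the entropy of that average. The sketch assumes a toy linear softmax model with dropout applied to its inputs; the weights and inputs are hypothetical.

```python
import numpy as np

def mc_dropout_entropy(x, W, p_drop, n_passes, rng):
    """Entropy of the MC-dropout predictive mean for a toy linear softmax model."""
    mean_probs = np.zeros((x.shape[0], W.shape[1]))
    for _ in range(n_passes):
        keep = rng.random(x.shape) >= p_drop   # drop each input unit with prob p_drop
        h = x * keep / (1.0 - p_drop)          # inverted-dropout rescaling
        logits = h @ W
        logits -= logits.max(axis=1, keepdims=True)
        e = np.exp(logits)
        mean_probs += e / e.sum(axis=1, keepdims=True)
    mean_probs /= n_passes
    p = np.clip(mean_probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

rng = np.random.default_rng(0)
W = np.array([[3.0, -3.0],    # feature 0 separates the two classes
              [0.0, 0.0]])    # feature 1 carries no class information
x = np.array([[2.0, 0.0],     # confident whenever feature 0 survives dropout
              [0.0, 1.0]])    # always predicted as 50/50
s = mc_dropout_entropy(x, W, p_drop=0.5, n_passes=100, rng=rng)
print(np.argsort(-s)[:1])     # index of the instance to query first
```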

5.2 Implementation details

In the experiments, we assume that clients respectively have disjoint images, where of the dataset is initially labeled. In our active learning setup, of the dataset is added to the labeled dataset at the sampling step of each round. We repeat these AL rounds until the entire dataset is labeled. Hence, we set , , and measure the test accuracy of the FL model at each -th round of AL, .

We use ResNet-18 (He et al., 2016) as the base architecture of the main task model for all tasks. In the FL implementation, the main task models are optimized by SGD with a learning rate of and a learning rate decay of per global iteration. The number of local epochs is , and the global iterations end when the training loss at the clients decreases below the thresholds , , and for Fashion-MNIST, CIFAR-10, and CIFAR-100, respectively. For the independent learning in S-AL, we use SGD with a learning rate and a step decay of at every epoch. Independent learning follows the same stopping criteria as FL. We use random horizontal flips for data augmentation. The reported results are averaged over three runs.

Figure 2: Test accuracies of the global model trained by FL per AL round on Fashion-MNIST.

Figure 3: Test accuracies of the global model trained by FL per AL round on CIFAR-10.

Figure 4: Test accuracies of the global model trained by FL per AL round on CIFAR-100.

5.3 Performance comparison

Figs. 2-4 illustrate the performance of the annotation strategies for FL: random sampling (conventional FL), S-AL (benchmark), and F-AL (ours). The AL algorithm is MCDAL, and the datasets are Fashion-MNIST, CIFAR-10, and CIFAR-100. "Full budget" in the figures denotes the performance of FL when all the clients' datasets are fully labeled. On Fashion-MNIST, F-AL and S-AL considerably outperform random sampling, and the proposed F-AL performs best among the strategies. In particular, the average improvement over random sampling is and for S-AL and F-AL, respectively, at the 2nd and 3rd rounds, before the accuracies converge to the performance of the full budget. On CIFAR-10, as on Fashion-MNIST, F-AL and S-AL outperform random sampling, and the proposed F-AL performs best. The average improvement over random sampling is and for S-AL and F-AL, respectively, before the 10th round. At the midpoint of the AL rounds, the improvement is and for S-AL and F-AL, respectively, when the accuracy of random sampling is .

On CIFAR-100, S-AL does not outperform random sampling, whereas F-AL consistently does. The average improvement of F-AL is before the th round, and the improvement at the th round is while the test accuracy of random sampling is . Figs. 2-4 demonstrate that the proposed F-AL outperforms the baseline methods on Fashion-MNIST, CIFAR-10, and CIFAR-100. While F-AL consistently outperforms random sampling on all the datasets, S-AL performs comparably to random sampling on CIFAR-100.

5.4 Extended results for various AL algorithms

To demonstrate that the proposed F-AL outperforms the baseline methods for general AL algorithms, we extend the experiments beyond MCDAL in Figs. 5-7. We first consider uncertainty-related AL algorithms: uncertainty-based sampling, MC-dropout with maximum entropy, LL, and MCDAL. Figs. 5-7 illustrate that F-AL outperforms S-AL and random sampling for the considered AL algorithms; the only exception is LL on CIFAR-100, as observed in Fig. 7.

Figs. 5-7 also allow us to compare the AL algorithms when they are applied in the FL environment. In Fig. 5, uncertainty-based sampling, MC-dropout, and MCDAL perform comparably and better than LL for both S-AL and F-AL. In Fig. 6, uncertainty-based sampling and MC-dropout perform best, and LL performs poorly compared with the other algorithms, similar to the result on Fashion-MNIST. Fig. 7 illustrates that uncertainty-based sampling and MC-dropout outperform random sampling, while MCDAL and LL perform comparably to random sampling. As observed in Fig. 4, F-MCDAL outperforms random sampling, but F-AL does not considerably improve LL. In Table 1, we consider the other categories of AL algorithms, namely representation-based AL and adversarial AL. We evaluate random sampling, S-AL, and F-AL on CIFAR-10 when the AL algorithm is the core-set approach or VAAL. Table 1 shows that both the core-set approach and VAAL perform poorly, even worse than random sampling, while F-VAAL performs comparably to random sampling. F-AL does not yield an improvement when the core-set approach is used as the AL algorithm.

Figure 5: Test accuracies of the global model trained by FL per AL round on Fashion-MNIST (uncertainty-related AL algorithms).

Figure 6: Test accuracies of the global model trained by FL per AL round on CIFAR-10 (uncertainty-related AL algorithms).

Figure 7: Test accuracies of the global model trained by FL per AL round on CIFAR-100 (uncertainty-related AL algorithms).

Table 1: Test accuracies of the global model trained by FL per AL round on CIFAR-10 versus the amount of labeled data (core-set approach and VAAL).

5.5 Discussion

From Figs. 5-7 and Table 1, we first observe that uncertainty-based sampling and MC-dropout, which directly use the uncertainty, perform best across most AL rounds and gain the most from F-AL. In the previous literature (Yoo and Kweon, 2019; Cho et al., 2021), LL and MCDAL were shown to outperform the classic uncertainty-based sampling and MC-dropout, contrary to the results of our experiments. In fact, in addition to the main task model, LL learns a loss prediction module and MCDAL learns auxiliary classifiers for the discrepancy, using the unlabeled dataset. Compared with the large-scale dataset stored at a single site in the literature (Yoo and Kweon, 2019; Cho et al., 2021), each client in the distributed setting holds far fewer unlabeled instances, e.g., clients respectively have of the total dataset in our experiments. This insufficiency of unlabeled data in the FL environment degrades LL and MCDAL to below the classic uncertainty-based AL algorithms.

With VAAL, the sampling step is implemented only by the variational autoencoder (VAE) and the discriminator, which are trained with both the labeled and the unlabeled datasets. Therefore, its performance is degraded by the insufficient unlabeled dataset, compared with the performance evaluated on large-scale datasets (Sinha et al., 2019). Notably, F-VAAL outperforms VAAL, as observed in Table 1, which we attribute to the improvement of the VAE and discriminator under F-AL. The core-set approach generally performs well on large-scale datasets (Sener and Savarese, 2017), since this representativeness-based AL algorithm alleviates the tendency of uncertainty-based sampling to select similar instances near the decision boundary. However, this problem is less pronounced in the distributed setting, and the core-set approach performs poorly in our FL environment. With the core-set approach, only the feature extractor of the main task model is used for sampling, so the core-set approach benefits little from the main task model developed by F-AL. Furthermore, the core-set approach requires the datasets of all clients to be stored at one site for the collaborative sampling in (10). That is, the instances sampled collaboratively with the core-set approach should be

$$\mathcal{X}^{(r)} = \arg\min_{\mathcal{X} \subset \mathcal{U},\ |\mathcal{X}| = b}\ \max_{u \in \mathcal{U}}\ \min_{x \in \mathcal{X} \cup \mathcal{L}} \big\| \phi(u) - \phi(x) \big\|, \tag{12}$$

where $\phi(\cdot)$ is the feature extractor of the main task model, $\mathcal{U} = \bigcup_{k=1}^{K} \mathcal{U}_k$, and $\mathcal{L} = \bigcup_{k=1}^{K} \mathcal{L}_k$. However, (12) cannot be computed locally as in (11), since it requires the pooled datasets $\mathcal{U}$ and $\mathcal{L}$ at one site, contrary to the FL constraint. Hence, F-AL cannot improve the performance of the core-set approach.
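The core-set approach is usually implemented with the greedy 2-approximation to its k-center objective (Sener and Savarese, 2017). The sketch below assumes the features $\phi(x)$ are given as a NumPy array. Each selection step needs the distance from every candidate to its nearest chosen point, so running it over the union of all client pools would need all clients' features at one site, which is the obstacle described above.

```python
import numpy as np

def k_center_greedy(features, labeled_idx, b):
    """Greedy k-center: repeatedly pick the point farthest from its nearest
    already-chosen point (labeled or previously selected)."""
    diffs = features[:, None, :] - features[labeled_idx][None, :, :]
    d = np.linalg.norm(diffs, axis=2).min(axis=1)  # distance to nearest labeled point
    selected = []
    for _ in range(b):
        i = int(np.argmax(d))
        selected.append(i)
        d = np.minimum(d, np.linalg.norm(features - features[i], axis=1))
    return selected

features = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])  # hypothetical phi(x)
print(k_center_greedy(features, [0], 2))  # [4, 2]
```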

Figure 8: Test accuracies of the models trained by independent learning per AL round on CIFAR-10.

Performance of independent learning. Figs. 2-7 and Table 1 demonstrate that AL is effective in the FL environment and that the proposed F-AL outperforms conventional random sampling and S-AL. To further investigate the effect of F-AL from the perspective of the local datasets, each client trains the main task model alone on its local dataset after obtaining the labeled dataset via the AL strategies, which we refer to as independent learning (IL). Fig. 8 illustrates the average test accuracy of the models trained at the clients on CIFAR-10 when the AL algorithms are the uncertainty-related ones, which gain relatively more from F-AL than the core-set approach and VAAL do. We observe that F-AL considerably decreases the performance of IL. In contrast, S-AL clearly outperforms random sampling, since S-AL samples instances that are informative to the current local dataset. With F-AL, the clients collaborate to sample instances that are informative to the aggregate dataset, not to the local dataset. This acts as a strong constraint on each client's sampling from the perspective of its local dataset: a client using F-AL does not sample instances that are uninformative to the aggregate dataset, even if they are informative to its own dataset. As illustrated in Fig. 9, the aggregate dataset sampled by F-AL performs well for FL, even though the sampled instances can be biased with respect to the distribution of the local datasets.

Figure 9: Distributions of the sampled instances

6 Conclusion

In this paper, we incorporated active learning (AL) and sampling strategies into the FL framework to reduce the annotation workload. In our proposed federated active learning (F-AL) method, the clients collaboratively perform AL to obtain the instances that can maximally improve the global FL model. We empirically demonstrated that F-AL outperforms the conventional random sampling strategy and client-level separate AL (S-AL) for various AL algorithms on image classification benchmarks: Fashion-MNIST, CIFAR-10, and CIFAR-100.


  • M. Balcan, A. Beygelzimer, and J. Langford (2009) Agnostic active learning. Journal of Computer and System Sciences 75 (1), pp. 78–89. Cited by: §2.2.
  • W. H. Beluch, T. Genewein, A. Nürnberger, and J. M. Köhler (2018) The power of ensembles for active learning in image classification. In

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    pp. 9368–9377. Cited by: §2.2, §4.2.
  • Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle (2007) Greedy layer-wise training of deep networks. In Advances in neural information processing systems, pp. 153–160. Cited by: §2.2.
  • J. W. Cho, D. Kim, Y. Jung, and I. S. Kweon (2021) Mcdal: maximum classifier discrepancy for active learning. arXiv preprint arXiv:2107.11049. Cited by: §2.2, §5.1, §5.1, §5.5.
  • G. Fenza, M. Gallo, V. Loia, F. Orciuoli, and E. Herrera-Viedma (2021) Data set quality in machine learning: consistency measure based on group decision making. Applied Soft Computing 106, pp. 107366. Cited by: §2.1.
  • A. Freytag, E. Rodner, and J. Denzler (2014) Selecting influential examples: active learning with expected model output changes. In European conference on computer vision, pp. 562–577. Cited by: §2.2.
  • Y. Gal and Z. Ghahramani (2016) Dropout as a bayesian approximation: representing model uncertainty in deep learning. In international conference on machine learning, pp. 1050–1059. Cited by: §2.2, §2.2, §5.1.
  • Y. Geifman and R. El-Yaniv (2017) Deep active learning over the long tail. arXiv preprint arXiv:1711.00941. Cited by: §2.2.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §5.2.
  • P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al. (2019) Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977. Cited by: §2.1.
  • K. Kim, D. Park, K. I. Kim, and S. Y. Chun (2021) Task-aware variational adversarial active learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8166–8175. Cited by: §2.2, §2.2, §4.2, §5.1.
  • A. Krizhevsky, G. Hinton, et al. (2009) Learning multiple layers of features from tiny images. Cited by: §5.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, pp. 1097–1105. Cited by: §2.2.
  • D. D. Lewis and W. A. Gale (1994) A sequential algorithm for training text classifiers. In SIGIR’94, pp. 3–12. Cited by: §2.2, §4.2, §5.1.
  • T. Li, A. Beirami, M. Sanjabi, and V. Smith (2020a) Tilted empirical risk minimization. arXiv preprint arXiv:2007.01162. Cited by: §2.1.
  • T. Li, S. Hu, A. Beirami, and V. Smith (2021) Ditto: fair and robust federated learning through personalization. In International Conference on Machine Learning, pp. 6357–6368. Cited by: §2.1.
  • T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith (2020b) Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems 2, pp. 429–450. Cited by: §1, §2.1.
  • X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang (2019) On the convergence of FedAvg on non-IID data. arXiv preprint arXiv:1907.02189. Cited by: §1.
  • T. Lin, L. Kong, S. U. Stich, and M. Jaggi (2020) Ensemble distillation for robust model fusion in federated learning. arXiv preprint arXiv:2006.07242. Cited by: §2.1.
  • A. K. McCallum and K. Nigam (1998) Employing EM and pool-based active learning for text classification. In Proc. International Conference on Machine Learning (ICML), pp. 359–367. Cited by: §3.2.
  • B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas (2017) Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282. Cited by: §1, §2.1, §3.1.
  • P. Ren, Y. Xiao, X. Chang, P. Huang, Z. Li, B. B. Gupta, X. Chen, and X. Wang (2021) A survey of deep active learning. ACM Computing Surveys (CSUR) 54 (9), pp. 1–40. Cited by: §2.2.
  • D. Rothchild, A. Panda, E. Ullah, N. Ivkin, I. Stoica, V. Braverman, J. Gonzalez, and R. Arora (2020) FetchSGD: communication-efficient federated learning with sketching. In International Conference on Machine Learning, pp. 8253–8265. Cited by: §2.1.
  • F. Sattler, S. Wiedemann, K. Müller, and W. Samek (2019) Robust and communication-efficient federated learning from non-IID data. IEEE Transactions on Neural Networks and Learning Systems 31 (9), pp. 3400–3413. Cited by: §1.
  • O. Sener and S. Savarese (2017) Active learning for convolutional neural networks: a core-set approach. arXiv preprint arXiv:1708.00489. Cited by: §2.2, §5.1, §5.5.
  • B. Settles (2009) Active learning literature survey. Cited by: §1, §2.2.
  • S. Sinha, S. Ebrahimi, and T. Darrell (2019) Variational adversarial active learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5972–5981. Cited by: §2.2, §2.2, §5.5.
  • V. Smith, C. Chiang, M. Sanjabi, and A. Talwalkar (2017) Federated multi-task learning. arXiv preprint arXiv:1705.10467. Cited by: §1.
  • H. Wang, M. Yurochkin, Y. Sun, D. Papailiopoulos, and Y. Khazaeni (2020a) Federated learning with matched averaging. arXiv preprint arXiv:2002.06440. Cited by: §2.1.
  • S. Wang, Y. Li, K. Ma, R. Ma, H. Guan, and Y. Zheng (2020b) Dual adversarial network for deep active learning. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIV 16, pp. 680–696. Cited by: §2.2, §5.1.
  • H. Xiao, K. Rasul, and R. Vollgraf (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747. Cited by: §5.
  • D. Yoo and I. S. Kweon (2019) Learning loss for active learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 93–102. Cited by: §2.2, §2.2, §5.1, §5.5.
  • B. Zhang, L. Li, S. Yang, S. Wang, Z. Zha, and Q. Huang (2020) State-relabeling adversarial active learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8756–8765. Cited by: §2.2, §5.1.