Towards Federated Long-Tailed Learning

Data privacy and class imbalance are the norm rather than the exception in many machine learning tasks. Recent attempts have been launched to, on the one hand, address the problem of learning from pervasive private data, and on the other hand, learn from long-tailed data. However, both issues often arise together in practical applications, while an effective method that simultaneously alleviates them is still under development. In this paper, we focus on learning with long-tailed (LT) data distributions under the context of the popular privacy-preserving federated learning (FL) framework. We characterize three scenarios with different local or global long-tailed data distributions in the FL framework, and highlight the corresponding challenges. The preliminary results under different scenarios reveal that substantial future work is highly necessary to better resolve the characterized federated long-tailed learning tasks.



1 Introduction

Federated learning (FL) has garnered increasing attention from both academia and industry, as it provides an approach for multiple clients to collaboratively train a machine learning model without exposing their private data [17, 4]. This privacy-preserving feature has made FL prevalent in a broad range of applications such as healthcare, finance, and recommendation systems [2, 31]. Data stemming from different sources often exhibit a high level of heterogeneity, e.g., non-IID distributions and/or imbalanced dataset sizes, which impedes FL performance [13, 26]. Although several methods have been proposed to circumvent this issue by tackling the drift and inconsistency between the server and clients [25, 12], the impact of long-tailed data distributions, an extreme case of data heterogeneity that widely exists in real-world data (e.g., healthcare and user behavior data [11, 19]), has yet to be understood.

Figure 1: A comparison between the balanced, imbalanced, and long-tailed data distributions over a dataset with 7 classes. (A) is the balanced data distribution. (B) is the imbalanced data distribution. (C) is the long-tailed data distribution.

Unlike data heterogeneity in the general sense, a long-tailed distribution has a severely skewed distribution curve. To better illustrate this phenomenon, we provide a pictorial example in Figure 1. We differentiate the term long-tailed distribution from the broader category of imbalanced distribution in this figure, as well as in the rest of this paper, to emphasize its unique role. From Figure 1, we can conclude that in the presence of long-tailed data, training an unbiased classification model is generally challenging, since most of the training data is concentrated in a few classes (i.e., the head classes) while the other classes (i.e., the tail classes) have very few samples. It has been shown in [11] that conventional deep learning models suffer a significant performance degradation on real-world data with a long-tailed distribution. In response, several schemes have been proposed to address such extreme class imbalance. These methods are commonly known as long-tailed learning, established via the particular means of re-balancing [34], re-weighting [14], and transfer learning techniques [32]. Recently, a decoupled representation and classification learning scheme [11] has been investigated to effectively complement the conventional approaches (e.g., class-balanced sampling [27] and distribution-aware loss [14]).

However, these existing solutions are primarily dedicated to centralized learning (CL) and cannot be directly extended to FL settings. Specifically, due to the distributed nature of the local data, it is much more difficult to train an unbiased model in the presence of long-tailed data in FL systems. Additionally, the limited local dataset sizes of the clients, as well as the inherent data heterogeneity in FL, also constrain the applicability of approaches developed for CL scenarios [33].

We refer to the FL task with long-tailed data as federated long-tailed learning. Note that a long-tailed data distribution may exist at both the local and global levels, leading to different challenges during the training procedure. In particular, a long-tailed data distribution presents an obvious head-and-tail characteristic over different classes (see Figure 1 (C)). In FL systems, different clients could have different long-tailed properties, and the overall (global) data distribution could be balanced or imbalanced in different networks. The distribution of real-world datasets is closely related to user habits and geo-locations, as in the image recognition datasets of natural species (e.g., iNaturalist [24]) and landmarks (e.g., Google Landmarks [29]). Such datasets exhibit a strongly geographically dominated long-tailed distribution, and more importantly, images from different clients (in different locations) present different distributional statistics. Training models that generalize well over different local long-tailed data distributions is more challenging than the single-distribution case.

Motivated by the aforementioned issues and the intrinsic properties of federated long-tailed learning, this paper gives a comprehensive analysis of the effects of long-tailed data at both the local and global levels of FL, as well as the consequent challenges. In addition, numerical results in different settings are provided to demonstrate the influence of long-tailed data distributions. Based on these, several future trends and open research opportunities are also discussed.

2 Problem Formulation of Federated Long-Tailed Learning

In this section, we systematically characterize the Federated Long-Tailed (F-LT) learning problem, where the main difference among settings lies in the distributions of the local data at each FL client and of the aggregated global data. The challenges under each setting are also discussed in detail.

2.1 Local and global data distribution

Consider an FL system with $K$ clients and a $C$-class visual recognition dataset $\mathcal{D} = \bigcup_{k=1}^{K} \mathcal{D}_k$ for classification problems, where $\mathcal{D}_k$ represents the local dataset of client $k$. Let $n_k$ denote the size of the local dataset of client $k$ (i.e., $n_k = |\mathcal{D}_k|$), and let $n_{k,c}$ denote the number of data samples of class $c$ in $\mathcal{D}_k$, i.e., $n_k = \sum_{c=1}^{C} n_{k,c}$.

For a given client $k$, we define the local data distribution as
$$\mathbf{p}_k = \left[ p_{k,1}, p_{k,2}, \ldots, p_{k,C} \right], \qquad p_{k,c} = \frac{n_{k,c}}{n_k},$$
where $p_{k,c}$ denotes the ratio of the $c$-th class over the corresponding local dataset size of client $k$.

Note that in a typical FL system, the global server does not hold any data. To better capture the overall data distribution at the system level, we define the global data distribution as the distribution of the aggregated dataset from all clients in the system, denoted by
$$\mathbf{p} = \left[ p_1, p_2, \ldots, p_C \right], \qquad p_c = \frac{1}{n} \sum_{k=1}^{K} n_{k,c},$$
where $n = \sum_{k=1}^{K} n_k$ is the total number of samples in the FL system.

Based on these two length-$C$ vectors $\mathbf{p}_k$ and $\mathbf{p}$, we can illustrate and analyze the distributional statistics of the long-tailed data from both the local and global perspectives. Specifically, the imbalance factor (IF) [35, 11] can be used to measure the degree of a long-tailed data distribution. Given the local data distribution, the local imbalance factor for client $k$ is calculated by
$$\mathrm{IF}_k = \frac{\max_{c} n_{k,c}}{\min_{c} n_{k,c}}.$$
Similarly, the global imbalance factor is defined over the aggregated class counts as
$$\mathrm{IF} = \frac{\max_{c} \sum_{k=1}^{K} n_{k,c}}{\min_{c} \sum_{k=1}^{K} n_{k,c}}.$$
Figure 2: An example of the data distributions for the three summarized types in a 20-client FL system. The first row shows the global data distribution of the corresponding type, with a colorbar on the right indicating the number of data samples in each class. Each sub-colorbox represents the number of data samples of each class across all clients.
| Global data distribution | Local data distributions | Objective of learning tasks | Datasets |
| --- | --- | --- | --- |
| Long-tailed | Identical long-tailed distributions | Learn a good global model | Long-tailed datasets (e.g., CIFAR-10-LT) |
| Long-tailed | Long-tailed/imbalanced/balanced distributions | Learn multiple good local models | Long-tailed datasets (e.g., CIFAR-10-LT) |
| Non long-tailed | Diversified long-tailed distributions | Learn multiple good local models | Balanced datasets (e.g., CIFAR-10) |

Table 1: A taxonomy of long-tailed data distributions in FL. The objectives and potential datasets for the corresponding cases of federated long-tailed learning are also provided.

2.2 Local and global long-tailed data distribution

Note that either $\mathrm{IF}_k$ or $\mathrm{IF}$ can be large in real-world datasets, which indicates that a long-tailed data distribution may exist on either the local side or the global side. For example, the local medical image datasets of hospitals in a big city might follow long-tailed local distributions, while the aggregated city-level global dataset might be long-tailed or non long-tailed. Therefore, considering the relations and differences between the local and global data distributions, we categorize federated long-tailed learning tasks into the following three types:

  • Type 1: Both the local and global data distribution follow the same long-tailed distribution. In a homogeneous network, local data from all the clients follow the same distribution. In such a case, if the local data distribution has the long-tail characteristic, then the global data distribution would also be an identical long-tailed distribution.

  • Type 2: Global data distribution is long-tailed, while local data distributions are diverse, and not necessarily long-tailed. Local data of different clients in a heterogeneous network would be typically non-IID, where the pattern of the local data distribution would be rarely identical. Given a global long-tailed data distribution, the local data distributions of different clients could be long-tailed, imbalanced or balanced.

  • Type 3: All or a subset of local clients have long-tailed data distributions, but the global data follows a non long-tailed distribution (e.g., a balanced distribution over all classes). In the case that the global data distribution is non long-tailed, the patterns of the local long-tailed data distributions of different clients would be diverse (i.e., different clients are supposed to have different head and tail classes).

Incorporating the data heterogeneity (i.e., the non-IID and imbalanced dataset size), the overall three cases represent all possible scenarios of long-tailed data in a typical FL system. As illustrated in Figure 2, we provide an example of the summarized three types for better visualization of the local and global distributions in federated long-tailed learning.

2.3 Objective of learning tasks and potential approaches

With the existence of long-tailed data distributions in FL systems, different cases bring different challenges to the distributed learning process. We discuss the three characterized types one by one.

In the first type of long-tailed data distribution, local and global data distributions share the same statistical characteristics. A single well-trained global model has the potential to be well generalized over the local data from different clients in FL systems. As the long-tailed distributions of all clients are the same, one classifier trained for long-tailed data could be applicable for all clients. Nevertheless, potential issues may arise due to the limited local dataset sizes.

In the remaining two types, a single distribution cannot cover all possible distributions of the clients in the FL system. Conventional long-tailed learning approaches designed for a single long-tailed distribution may fail to tackle such diversity. We should therefore consider different learning objectives for different combinations of local and global data distributions. Specifically, different local clients could have vastly diverse distributions (e.g., long-tailed and non long-tailed), and the global and local data distributions would differ. Thus, it is necessary to train multiple models to address such discrepancies among data distributions.

Recall that, in the context of personalized federated learning (PFL) [22], personalized models are trained for each client, as one global model cannot generalize well to diverse local clients. It is therefore natural to regard PFL as a key ingredient to tackle the diverse data distribution issues in these two scenarios. For example, a popular PFL solution is to decouple the local model into base layers and personalization layers [3]. Recent works in centralized long-tailed learning demonstrate that decoupling representation learning from classifier learning, with a re-adjustment of the classifier, can effectively improve performance [11, 35]. Such similar decoupling of model parameters intuitively makes PFL approaches a natural complement to federated long-tailed learning.
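To make the decoupling concrete, the following sketch shows a FedPer-style [3] aggregation step under our own simplifying assumptions: each client model is a dict of named parameter arrays, and names starting with a hypothetical `head.` prefix mark the personalization layers that are kept local and never averaged:

```python
import numpy as np

def fedper_aggregate(client_models, sizes, personal_prefix="head."):
    """Average only the shared base layers; keep personalization layers local.

    client_models: list of dicts {param_name: np.ndarray}
    sizes: local dataset sizes n_k, used as FedAvg-style weights
    """
    weights = np.asarray(sizes, dtype=float)
    weights /= weights.sum()
    base_avg = {}
    for name in client_models[0]:
        if name.startswith(personal_prefix):
            continue  # personalization layers are never aggregated
        base_avg[name] = sum(w * m[name] for w, m in zip(weights, client_models))
    # each client keeps its own head, but adopts the shared base layers
    return [{**m, **base_avg} for m in client_models]
```

After each round, every client thus shares one set of base-layer weights while retaining a private classifier head, mirroring the representation/classifier decoupling discussed above.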

More generally, the key idea of PFL is to find a good trade-off between the globally shared knowledge and the local task-specific knowledge during personalized local training. Such a learning procedure could be applied to learn unbiased long-tailed classifiers on top of a well-generalizable representation. Moreover, multi-task learning (MTL) [21], clustering [8], and transfer learning approaches [7] also have the potential to be applied to cross-device long-tailed learning in FL, which will be discussed later in detail (see Sec. 4).

| Method | Non-LT, IID | Non-LT, α=1 | Non-LT, α=0.5 | IF=10, IID | IF=10, α=1 | IF=10, α=0.5 | IF=50, IID | IF=50, α=1 | IF=50, α=0.5 | IF=100, IID | IF=100, α=1 | IF=100, α=0.5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FedAvg | 0.9369 | 0.9316 | 0.9249 | 0.8806 | 0.8761 | 0.8669 | 0.797 | 0.7863 | 0.7689 | 0.7393 | 0.7525 | 0.7205 |
| FedProx | 0.9382 | 0.9327 | 0.9275 | 0.8801 | 0.8785 | 0.8656 | 0.7943 | 0.7783 | 0.7719 | 0.7366 | 0.7499 | 0.7155 |
| CReFF | 0.945 | 0.9383 | 0.931 | 0.8914 | 0.8791 | 0.8736 | 0.8059 | 0.7953 | 0.78 | 0.7427 | 0.7311 | 0.7118 |
| FedPer | 0.9356 | 0.9296 | 0.9259 | 0.8803 | 0.873 | 0.8696 | 0.7633 | 0.7503 | 0.7478 | 0.7376 | 0.7358 | 0.7145 |

Table 2: Test accuracies of various FL methods on CIFAR-10-LT with different federated data partitions (i.e., Type 2). Columns marked IID use IID partitions; columns marked α use Dirichlet non-IID partitions with concentration parameter α. Results on balanced CIFAR-10 (Non-LT) are also provided for reference and comparison.
| Method | IF = 10 | IF = 50 | IF = 100 |
| --- | --- | --- | --- |
| FedAvg | 0.8896 | 0.859 | 0.8422 |
| FedProx | 0.8929 | 0.8586 | 0.8444 |
| CReFF | 0.8984 | 0.8646 | 0.8485 |
| FedPer | 0.8951 | 0.8602 | 0.8438 |

Table 3: Test accuracies on CIFAR-10 with different local long-tailed distributions (i.e., Type 3).

3 Benchmarking the Federated Long-Tailed Learning

To the best of our knowledge, long-tailed learning in the context of FL has rarely been explored. In this section, we give a summary of the datasets and the corresponding federated partition approaches. Recent works on long-tailed learning in both centralized and federated scenarios are then discussed. Finally, we give a brief comparison of two typical long-tailed data settings.

3.1 Datasets and partition methods

Datasets In the centralized paradigm for visual recognition tasks, there are mainly two types of dataset benchmarks for long-tailed study. The first type is the long-tailed version of image datasets constructed by synthetic operations, such as exponential sampling (CIFAR-10/100-LT [5]) and Pareto sampling (ImageNet-LT [16], Places-LT [16]). They are shaped/sampled from existing balanced datasets, and the degree of the long tail can be controlled with an arbitrary imbalance factor IF. The second type is real-world large-scale datasets with highly imbalanced label distributions, like iNaturalist [24] and Google Landmarks [29]. More long-tailed datasets are used in specific tasks, such as LVIS for object detection [9] and VOC-MLT for multi-label classification [30].
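The exponential profile used to build CIFAR-10/100-LT can be sketched as follows. This is our own minimal rendering of the common recipe following [5]; exact rounding conventions vary across codebases:

```python
import numpy as np

def longtail_class_counts(n_max, num_classes, imb_factor):
    """Per-class sample counts n_c = n_max * IF^(-c / (C - 1)).

    Class 0 keeps n_max samples; the last class keeps roughly n_max / IF,
    so the resulting imbalance factor is approximately imb_factor.
    """
    c = np.arange(num_classes)
    counts = n_max * imb_factor ** (-c / (num_classes - 1))
    return np.floor(counts).astype(int)
```

For CIFAR-10-LT with n_max = 5000 and IF = 100, this yields 5000 samples for the head class and 50 for the tail class, discarding the rest of the originally balanced data.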



Partition methods for long-tailed FL To create different federated (distributed) datasets according to the different patterns of local and global data distributions, different datasets and sampling methods are required. Data distributions of Type 1 can be realized by IID sampling on long-tailed datasets. Similarly, Type 2 can be achieved by a Dirichlet-distribution-based generation method [10] on the long-tailed datasets. Specifically, the degree of the long tail and the identicalness of local data distributions can be controlled by the global imbalance factor IF and the concentration parameter $\alpha$, respectively. Type 3 can be realized via different long-tailed sampling (different head and tail patterns) on balanced datasets.
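A Dirichlet-based partition in the style of [10] can be sketched as below. This is our own minimal version; practical implementations often add a minimum-size constraint per client, which we omit here:

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, rng=None):
    """Split sample indices across clients with class proportions ~ Dir(alpha).

    Smaller alpha -> more heterogeneous (non-IID) local distributions;
    large alpha approaches an IID split.
    """
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # draw this class's split over clients from a Dirichlet distribution
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_indices[k].extend(part.tolist())
    return client_indices
```

Applying this to labels drawn from a long-tailed dataset (e.g., one built with an exponential profile) produces a Type 2 setting, since the global distribution stays long-tailed while the local ones diverge with decreasing α.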

3.2 Approaches

Centralized long-tailed learning In the centralized scenario, long-tailed learning seeks to address the class imbalance in training data. The most direct way is to re-balance the samples of different classes during model training, such as ROS and RUS [34], simple calibration [27], and dynamic curriculum learning [28]. The balancing ideology can also be implemented by re-weighting and re-margining the loss function, as in Focal Loss [14] and LDAM Loss [5]. These class re-balancing methods improve the tail performance at the expense of head performance.
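As one concrete instance of loss re-weighting, a NumPy sketch of the focal loss of [14] is shown below. Note this variant takes already-softmaxed probabilities for clarity, whereas the original operates on logits:

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0):
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t), averaged over samples.

    probs: N x C predicted class probabilities; targets: N integer labels.
    The (1 - p_t)^gamma factor down-weights well-classified (often head-class)
    examples so that gradients focus on hard, often tail-class, samples.
    """
    p_t = np.asarray(probs)[np.arange(len(targets)), targets]
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))
```

Setting gamma = 0 recovers the standard cross-entropy loss; larger gamma suppresses the contribution of confident predictions more aggressively.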

To address the limitation of information shortage, some studies focus on improving tail performance by introducing additional information via transfer learning, meta learning, or network architecture improvements. In transfer learning, methods such as FTL [32] and LEAP [15] transfer knowledge from head classes to boost the performance on tail classes. In [20], meta-learning is empirically shown to be capable of adaptively learning an explicit weighting function directly from data, which guarantees robust deep learning in the face of training data bias. Recently, some studies design and improve network architectures specific to long-tailed data. For example, different types of classifiers have been proposed to address long-tailed problems, such as the norm classifier [11] and the causal classifier [23].
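The norm-classifier idea of [11] can be illustrated with the τ-normalization sketch below (our own minimal rendering; in practice τ is tuned on a validation set):

```python
import numpy as np

def tau_normalize(classifier_weights, tau=1.0):
    """Rescale each class's classifier weight vector by 1 / ||w_c||^tau.

    Head classes tend to learn larger-norm classifier weights; shrinking
    them re-balances decision boundaries without retraining the features.
    tau = 0 leaves the classifier unchanged.
    """
    W = np.asarray(classifier_weights, dtype=float)  # shape: C x D
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / (norms ** tau)
```

With tau = 1, every class's weight vector ends up with unit norm, so predictions depend only on the direction of each class weight rather than its magnitude.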

Federated long-tailed learning To date, the only related work on federated long-tailed learning [19] utilizes classifier re-training to re-adjust decision boundaries, where the discussion is limited to a global long-tailed distribution with local heterogeneity. Methods for the other types of local and global data distributions remain to be explored.

Nevertheless, in the presence of long-tailed data, the discrepancies among the local and global data distributions of different clients in an FL system could possibly be addressed by techniques from federated optimization, such as dynamic regularization [1], diverse client scheduling [6], and adaptive aggregation. In addition, as discussed in Sec. 2.3, PFL could be applied in federated long-tailed learning to find a balance between representation learning and classifier learning. We give a detailed discussion of such explorations to boost the performance of federated long-tailed learning in Sec. 4.

Based on the above discussion of the data distributions, datasets, and learning objectives, we summarize them in Table 1. Note that the case where both the local and global data distributions are non-long-tailed is not listed in this table, as it is not within the scope of this paper.

3.3 Performance comparison

To better illustrate the impacts of long-tailed data distributions, we provide numerical results with different types of long-tailed data distributions in Tables 2 and 3. For all the experiments, we consider an FL system with $K$ clients, and the non-IID data partition is implemented via the Dirichlet distribution. Apart from the baseline FedAvg [17], the other three FL algorithms are FedProx [13], CReFF [19], and FedPer [3], which are representative approaches to tackle data heterogeneity, long-tailed data, and personalization in FL, respectively.

Note that the main purpose of this subsection is to analyze the performance of different FL methods under diverse data settings, so as to provide possible insights for the design of federated long-tailed learning algorithms.

We choose two typical long-tailed data distributions in the federated setting to evaluate the performance. In Table 2, we give the results for both the IID and non-IID data settings built upon the global long-tailed dataset CIFAR-10-LT with imbalance factors 10, 50, and 100. For the non-IID data partition, we use the Dirichlet-distribution-based sampling method with different concentration parameters $\alpha$ to control the degree of data heterogeneity. To better demonstrate the impacts of the long-tailed data distribution, we also include a group of experimental results on the (balanced) CIFAR-10 for reference. In Table 3, results on CIFAR-10 are provided, where we sample different long-tailed local data distributions (i.e., different head-tail patterns) with the same imbalance factor IF. See Figure 2(C) for an overview.

For the results in Tables 2 and 3, the best test accuracies of all algorithms decrease from left to right, as the degree of long-tail and heterogeneity increases. Interestingly, the federated optimization method FedProx outperforms FedAvg in the non-long-tailed setting, while it tends to underperform with globally long-tailed data in some settings. As a method specifically designed for long-tailed data, CReFF achieves the best results among all four algorithms in most settings, but has lower accuracy than FedProx under more heterogeneous data distributions. With regard to PFL methods, our preliminary results illustrate that the personalization method performs well in most long-tailed data settings, especially in the settings of Table 3 (i.e., diverse local long-tailed distributions in Type 3).

These numerical results indicate that PFL methods have the potential to enhance performance even without any specialized long-tailed learning techniques. More importantly, the preliminary results also demonstrate the feasibility of re-purposing federated optimization and PFL methods together with centralized long-tailed learning approaches in federated scenarios.

4 Future Trends and Research Opportunities

Based on the above experimental results and discussions of the federated long-tailed learning, we envision the following directions and opportunities towards the robust and communication-efficient federated long-tailed learning algorithms, architectures and analysis.

  • Incorporate PFL ideas for better federated long-tailed learning. As a promising technique, PFL could boost the training performance of federated long-tailed learning combined with centralized long-tailed learning methods. How to balance the globally shared knowledge with the locally personalized knowledge could be incorporated into the design of representation learning and classification architectures for federated long-tailed learning. Moreover, it would be promising to explore combining the model-based and data-based PFL approaches [22] with long-tailed learning.

  • Hierarchical FL architectures. In the presence of diverse data distributions, we may consider grouping clients with similar long-tailed distributional statistics into clusters to jointly learn cluster-level personalized models or conduct cluster-level MTL [18]. However, the design of a privacy-preserving clustering method remains to be investigated.

  • Re-purposing of existing federated optimization methods. A local long-tailed data distribution can be regarded as an extremely imbalanced case of data heterogeneity. Hence, how to re-purpose federated optimization algorithms in the presence of long-tailed data could be further explored. Developing a heterogeneity-agnostic federated optimization framework is another open question. Moreover, MTL-based long-tailed learning could also be a potential approach to address heterogeneous long-tailed distributions in FL.

  • Design better data partition/sampling schemes or more representative datasets. Apart from the several real-world long-tailed datasets, most current works use long-tailed versions of popular image datasets. Although this method can use the pre-determined imbalance factor IF to control the imbalance, it also discards a large number of samples when following the widely used exponential and Pareto sampling methods. Therefore, the degradation of performance could also be partially attributed to the small dataset size, especially in scenarios with a larger imbalance factor in federated settings. How to mitigate such negative impacts should be further investigated. Meanwhile, future research could also leverage real-world scenarios, such as medical images or autonomous driving, to provide more representative and convincing federated long-tailed learning datasets.

5 Concluding Remarks

In this paper, we introduce the federated long-tailed learning task, a general setting motivated by real-world applications but rarely studied in previous research. We characterize three types of F-LT learning settings with diverse local and global long-tailed data distributions. The benchmark results with multiple federated learning methods suggest that substantial future work is needed for better F-LT learning. In addition, we highlight potential techniques and possible research trajectories towards federated long-tailed learning with real-world data.


  • [1] D. A. E. Acar, Y. Zhao, R. M. Navarro, M. Mattina, P. N. Whatmough, and V. Saligrama (2021) Federated learning based on dynamic regularization. arXiv preprint arXiv:2111.04263. Cited by: §3.2.
  • [2] M. Andreux, J. O. d. Terrail, C. Beguier, and E. W. Tramel (2020) Siloed federated learning for multi-centric histopathology datasets. In Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning, pp. 129–139. Cited by: §1.
  • [3] M. G. Arivazhagan, V. Aggarwal, A. K. Singh, and S. Choudhary (2019) Federated learning with personalization layers. arXiv preprint arXiv:1912.00818. Cited by: §2.3, §3.3.
  • [4] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konečnỳ, S. Mazzocchi, B. McMahan, et al. (2019) Towards federated learning at scale: system design. Proceedings of Machine Learning and Systems 1, pp. 374–388. Cited by: §1.
  • [5] K. Cao, C. Wei, A. Gaidon, N. Arechiga, and T. Ma (2019) Learning imbalanced datasets with label-distribution-aware margin loss. In Advances in Neural Information Processing Systems, Cited by: §3.1, §3.2.
  • [6] Y. J. Cho, J. Wang, and G. Joshi (2022) Client selection in federated learning: convergence analysis and power-of-choice selection strategies. In Artificial intelligence and statistics, Cited by: §3.2.
  • [7] D. Gao, Y. Liu, A. Huang, C. Ju, H. Yu, and Q. Yang (2019) Privacy-preserving heterogeneous federated transfer learning. In 2019 IEEE International Conference on Big Data (Big Data), pp. 2552–2559. Cited by: §2.3.
  • [8] A. Ghosh, J. Chung, D. Yin, and K. Ramchandran (2020) An efficient framework for clustered federated learning. Advances in Neural Information Processing Systems 33, pp. 19586–19597. Cited by: §2.3.
  • [9] A. Gupta, P. Dollar, and R. Girshick (2019) LVIS: a dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 5356–5364. Cited by: §3.1.
  • [10] T. H. Hsu, H. Qi, and M. Brown (2019) Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335. Cited by: §3.1.
  • [11] B. Kang, S. Xie, M. Rohrbach, Z. Yan, A. Gordo, J. Feng, and Y. Kalantidis (2019) Decoupling representation and classifier for long-tailed recognition. In International Conference on Learning Representations, Cited by: §1, §1, §2.1, §2.3, §3.2.
  • [12] S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh (2020) Scaffold: stochastic controlled averaging for federated learning. In International Conference on Machine Learning, pp. 5132–5143. Cited by: §1.
  • [13] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith (2020) Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems 2, pp. 429–450. Cited by: §1, §3.3.
  • [14] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár (2017) Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pp. 2980–2988. Cited by: §1, §3.2.
  • [15] J. Liu, Y. Sun, C. Han, Z. Dou, and W. Li (2020) Deep representation learning on long-tailed data: a learnable embedding augmentation perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2970–2979. Cited by: §3.2.
  • [16] Z. Liu, Z. Miao, X. Zhan, J. Wang, B. Gong, and S. X. Yu (2019) Large-scale long-tailed recognition in an open world. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §3.1.
  • [17] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas (2017) Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pp. 1273–1282. Cited by: §1, §3.3.
  • [18] F. Sattler, K. Müller, and W. Samek (2020) Clustered federated learning: model-agnostic distributed multitask optimization under privacy constraints. IEEE transactions on neural networks and learning systems 32 (8), pp. 3710–3722. Cited by: 2nd item.
  • [19] X. Shang, Y. Lu, G. Huang, and H. Wang (2022) Federated learning on heterogeneous and long-tailed data via classifier re-training with federated features. arXiv preprint arXiv:2204.13399. Cited by: §1, §3.2, §3.3.
  • [20] J. Shu, Q. Xie, L. Yi, Q. Zhao, S. Zhou, Z. Xu, and D. Meng (2019) Meta-weight-net: learning an explicit mapping for sample weighting. Advances in neural information processing systems 32. Cited by: §3.2.
  • [21] V. Smith, C. Chiang, M. Sanjabi, and A. S. Talwalkar (2017) Federated multi-task learning. Advances in neural information processing systems 30. Cited by: §2.3.
  • [22] A. Z. Tan, H. Yu, L. Cui, and Q. Yang (2022) Towards personalized federated learning. IEEE Transactions on Neural Networks and Learning Systems. Cited by: §2.3, 1st item.
  • [23] K. Tang, J. Huang, and H. Zhang (2020) Long-tailed classification by keeping the good and removing the bad momentum causal effect. Advances in Neural Information Processing Systems 33, pp. 1513–1524. Cited by: §3.2.
  • [24] G. Van Horn, O. Mac Aodha, Y. Song, Y. Cui, C. Sun, A. Shepard, H. Adam, P. Perona, and S. Belongie (2018) The inaturalist species classification and detection dataset. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8769–8778. Cited by: §1, §3.1.
  • [25] J. Wang, Q. Liu, H. Liang, G. Joshi, and H. V. Poor (2020) Tackling the objective inconsistency problem in heterogeneous federated optimization. Advances in neural information processing systems 33, pp. 7611–7623. Cited by: §1.
  • [26] L. Wang, S. Xu, X. Wang, and Q. Zhu (2021) Addressing class imbalance in federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 10165–10173. Cited by: §1.
  • [27] T. Wang, Y. Li, B. Kang, J. Li, J. Liew, S. Tang, S. Hoi, and J. Feng (2020) The devil is in classification: a simple framework for long-tail instance segmentation. In European conference on computer vision, pp. 728–744. Cited by: §1, §3.2.
  • [28] Y. Wang, W. Gan, J. Yang, W. Wu, and J. Yan (2019) Dynamic curriculum learning for imbalanced data classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5017–5026. Cited by: §3.2.
  • [29] T. Weyand, A. Araujo, B. Cao, and J. Sim (2020) Google landmarks dataset v2-a large-scale benchmark for instance-level recognition and retrieval. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2575–2584. Cited by: §1, §3.1.
  • [30] T. Wu, Q. Huang, Z. Liu, Y. Wang, and D. Lin (2020) Distribution-balanced loss for multi-label classification in long-tailed datasets. In European Conference on Computer Vision, pp. 162–178. Cited by: §3.1.
  • [31] L. Yang, B. Tan, V. W. Zheng, K. Chen, and Q. Yang (2020) Federated recommendation systems. In Federated Learning, pp. 225–239. Cited by: §1.
  • [32] X. Yin, X. Yu, K. Sohn, X. Liu, and M. Chandraker (2019) Feature transfer learning for face recognition with under-represented data. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 5704–5713. Cited by: §1, §3.2.
  • [33] T. Yoon, S. Shin, S. J. Hwang, and E. Yang (2020) FedMix: approximation of mixup under mean augmented federated learning. In International Conference on Learning Representations, Cited by: §1.
  • [34] Y. Zhang, B. Kang, B. Hooi, S. Yan, and J. Feng (2021) Deep long-tailed learning: a survey. arXiv preprint arXiv:2110.04596. Cited by: §1, §3.2.
  • [35] B. Zhou, Q. Cui, X. Wei, and Z. Chen (2020) Bbn: bilateral-branch network with cumulative learning for long-tailed visual recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9719–9728. Cited by: §2.1, §2.3.