1 Introduction
Federated learning (FL) has garnered increasing attention from both academia and industry, as it provides an approach for multiple clients to collaboratively train a machine learning model without exposing their private data [17, 4]. This privacy-preserving feature has made FL prevalent in a broad range of applications such as healthcare, finance, and recommendation systems [2, 31]. Data stemming from different sources often exhibit a high level of heterogeneity, e.g., non-IID distributions and/or imbalanced sizes, which impedes FL performance [13, 26]. Although several methods have been proposed to circumvent this issue by tackling the drift and inconsistency between the server and clients [25, 12], the impact of long-tailed data distributions, an extreme case of data heterogeneity that widely exists in real-world data (e.g., healthcare and user-behavior data [11, 19]), has yet to be well understood.
Unlike data heterogeneity in the general sense, a long-tailed distribution has a severely skewed distribution curve. To better illustrate this phenomenon, we provide a pictorial example in Figure 1. We differentiate the term long-tailed distribution from the broader category of imbalanced distributions in this figure, as well as in the rest of this paper, to emphasize its unique role. From Figure 1, we can conclude that in the presence of long-tailed data, training an unbiased classification model is generally challenging, since most of the training data is concentrated in a few classes (i.e., the head classes) while the other classes (i.e., the tail classes) have very few samples. It has been shown in [11] that conventional deep learning models suffer significant performance degradation on real-world data with a long-tailed distribution. In response, several schemes have been proposed to address such extreme class imbalance. These methods are commonly known as long-tailed learning, established via re-balancing [34], re-weighting [14], and transfer learning techniques [32]. Recently, a decoupled representation and classifier learning scheme [11] has been investigated to effectively complement the conventional approaches (e.g., class-balanced sampling [27] and distribution-aware losses [14]). However, these existing solutions are primarily dedicated to centralized learning (CL) and cannot be directly extended to FL settings. Specifically, due to the distributed nature of the local data, it is much more difficult to train an unbiased model in the presence of long-tailed data in FL systems. Additionally, the limited dataset sizes of the local clients, as well as the inherent data heterogeneity in FL, constrain the applicability of approaches developed for CL [33].
We refer to the FL task with long-tailed data as federated long-tailed learning. Note that long-tailed data distributions may exist at both the local and global levels, leading to different challenges during the training procedure. In particular, a long-tailed data distribution exhibits a clear head-and-tail pattern over the classes (see Figure 1(C)). In FL systems, different clients can have different long-tailed properties, and the overall (global) data distribution can be balanced or imbalanced in different networks. The distribution of real-world datasets is closely related to user habits and geolocations, as in image recognition datasets of natural species (e.g., iNaturalist [24]) and landmarks (e.g., Google Landmarks [29]). Such datasets have a strongly geography-dominated long-tailed distribution, and, more importantly, images from different clients (in different locations) present different distributional statistics. Training models that generalize well across different local long-tailed data distributions is therefore more challenging than the single-distribution case.
Motivated by the aforementioned issues and the intrinsic properties of federated long-tailed learning, the present paper gives a comprehensive analysis of the effects of long-tailed data at both the local and global levels of FL, as well as the consequent challenges. In addition, numerical results in different settings are provided to demonstrate the influence of long-tailed data distributions. Based on this, several future trends and open research opportunities are also discussed.
2 Problem Formulation of Federated Long-Tailed Learning
In this section, we systematically characterize the Federated Long-Tailed (FLT) learning problem, with the main differences lying in the distributions of the local data at each FL client and the aggregated global data distribution. The challenges under each setting are also discussed in detail.
2.1 Local and global data distribution
Consider an FL system with $K$ clients and a $C$-class visual recognition dataset $\mathcal{D} = \bigcup_{k=1}^{K} \mathcal{D}_k$ for classification problems, where $\mathcal{D}_k$ represents the local dataset of client $k$. Let $n_k$ denote the size of the local dataset of client $k$ (i.e., $n_k = |\mathcal{D}_k|$), and let $n_{k,c}$ denote the number of data samples of class $c$ in $\mathcal{D}_k$, i.e., $n_k = \sum_{c=1}^{C} n_{k,c}$.

For a given client $k$, we define the local data distribution as

$$\mathbf{p}_k = \left[ p_{k,1}, \dots, p_{k,C} \right], \quad p_{k,c} = \frac{n_{k,c}}{n_k}, \qquad (1)$$

where $p_{k,c}$ denotes the ratio of the $c$-th class over the local dataset size of client $k$.

Note that in a typical FL system, the global server does not hold any data. To better capture the overall data distribution at the system level, we define the global data distribution as the distribution of the aggregated dataset from all clients in the system, denoted by

$$\mathbf{p} = \left[ p_1, \dots, p_C \right], \quad p_c = \frac{\sum_{k=1}^{K} n_{k,c}}{n}, \qquad (2)$$

where $n = \sum_{k=1}^{K} n_k$ is the total number of samples in the FL system.

Based on these two length-$C$ vectors $\mathbf{p}_k$ and $\mathbf{p}$, we can illustrate and analyze the distributional statistics of the long-tailed data from both the local and global perspectives. Specifically, the imbalance factor (IF) [35, 11] can be used to measure the degree of long-tailed data distribution. Given the local data distribution vector, the local imbalance factor of client $k$ is calculated as

$$\mathrm{IF}_k = \frac{\max_c n_{k,c}}{\min_c n_{k,c}}. \qquad (3)$$

Similarly, the global imbalance factor is

$$\mathrm{IF} = \frac{\max_c \sum_{k=1}^{K} n_{k,c}}{\min_c \sum_{k=1}^{K} n_{k,c}}. \qquad (4)$$
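As a concrete illustration of (1)-(4), the distribution vectors and imbalance factors can be computed directly from per-client class counts. The following NumPy sketch uses hypothetical toy counts of our own choosing:

```python
import numpy as np

def local_distribution(counts):
    """Per-class ratios p_{k,c} = n_{k,c} / n_k for one client, as in Eq. (1)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def imbalance_factor(counts):
    """IF = largest class count over smallest class count, as in Eqs. (3)-(4)."""
    counts = np.asarray(counts, dtype=float)
    return float(counts.max() / counts.min())

# Hypothetical system: K = 2 clients, C = 3 classes.
client_counts = np.array([[90, 9, 1],     # client 1: long-tailed (IF_1 = 90)
                          [10, 31, 59]])  # client 2: a different head/tail pattern
p_1 = local_distribution(client_counts[0])    # [0.9, 0.09, 0.01]
global_counts = client_counts.sum(axis=0)     # [100, 40, 60]
p = local_distribution(global_counts)         # global distribution, Eq. (2)
global_if = imbalance_factor(global_counts)   # 100 / 40 = 2.5
```

Note how two heavily skewed clients with different head classes can aggregate into a much milder global imbalance (IF = 2.5 in this toy example), foreshadowing the distinction between local and global long-tailedness drawn next.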
Table 1: Summary of the characterized federated long-tailed learning settings.

| Global data distribution | Local data distributions | Objective of learning tasks | Datasets |
| Long-tailed | Identical long-tailed distributions | Learn a good global model | Long-tailed datasets (e.g., CIFAR-10-LT) |
| Long-tailed | Long-tailed/imbalanced/balanced distributions | Learn multiple good local models | Long-tailed datasets (e.g., CIFAR-10-LT) |
| Non-long-tailed | Diversified long-tailed distributions | Learn multiple good local models | Balanced datasets (e.g., CIFAR-10) |
2.2 Local and global long-tailed data distribution
Note that either the local imbalance factor $\mathrm{IF}_k$ or the global imbalance factor $\mathrm{IF}$ can be large in real-world datasets, which indicates that long-tailed data distributions may exist on either the local side or the global side. For example, the local medical image datasets of hospitals in a big city might follow long-tailed local distributions, while the aggregated city-level global dataset might be long-tailed or non-long-tailed. Therefore, considering the relations and differences between the local and global data distributions, we categorize federated long-tailed learning tasks into the following three types:

Type 1: Both the local and global data distributions follow the same long-tailed distribution. In a homogeneous network, the local data of all clients follow the same distribution. In such a case, if the local data distribution has the long-tail characteristic, then the global data distribution is also an identical long-tailed distribution.

Type 2: The global data distribution is long-tailed, while the local data distributions are diverse and not necessarily long-tailed. The local data of different clients in a heterogeneous network are typically non-IID, so the local data distributions are rarely identical. Given a long-tailed global data distribution, the local data distributions of different clients can be long-tailed, imbalanced, or balanced.

Type 3: All or a subset of local clients have long-tailed data distributions, but the global data follows a non-long-tailed distribution (e.g., a balanced distribution over all classes). In this case, the patterns of the local long-tailed data distributions of different clients are diverse (i.e., different clients hold different head and tail classes).
Incorporating data heterogeneity (i.e., non-IID data and imbalanced dataset sizes), these three cases cover all possible scenarios of long-tailed data in a typical FL system. As illustrated in Figure 2, we provide an example of each of the three types for better visualization of the local and global distributions in federated long-tailed learning.
2.3 Objective of learning tasks and potential approaches
With the existence of long-tailed data distributions in FL systems, different cases bring different challenges to the distributed learning process. We discuss the three characterized types one by one.
In the first type, the local and global data distributions share the same statistical characteristics. A single well-trained global model thus has the potential to generalize well over the local data of different clients in the FL system. As the long-tailed distributions of all clients are the same, one classifier trained for long-tailed data could be applicable to all clients. Nevertheless, potential issues may still arise due to the limited local dataset sizes.
In the remaining two types, a single distribution cannot cover all possible client distributions in the FL system. Conventional long-tailed learning approaches designed for a single long-tailed distribution may fail to tackle such diversity. We should therefore consider different learning objectives for different combinations of local and global data distributions. Specifically, different local clients can have vastly diverse distributions (e.g., long-tailed and non-long-tailed), and the global and local data distributions can differ. Thus, it is necessary to train multiple models to address such distributional discrepancies.
Recall that in personalized federated learning (PFL) [22], personalized models are trained for each client, since one global model cannot generalize well to diverse local clients. It is therefore natural to regard PFL as a key ingredient for tackling such diverse data distribution issues in these two scenarios. For example, a popular PFL solution is to decouple the local model into base layers and personalization layers [3]. Recent works in centralized long-tailed learning demonstrate that decoupling representation learning and classifier learning, with a re-adjustment of the classifier, can effectively improve performance [11, 35]. Such similar decoupling of model parameters intuitively allows PFL approaches to complement federated long-tailed learning.
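As a sketch of this decoupling idea (the layer names and the aggregation routine below are ours, not the exact FedPer implementation), the server could average only the shared representation parameters while each client keeps its own classifier head:

```python
import numpy as np

def aggregate_base_only(client_models, base_keys, client_sizes):
    """FedPer-style aggregation sketch: average only the shared base layers.

    client_models: list of dicts {param_name: np.ndarray}, one per client
    base_keys: names of the representation ("base") parameters shared by all clients
    client_sizes: local dataset sizes n_k, used as FedAvg-style weights
    Returns updated per-client models; personalization layers stay untouched.
    """
    w = np.asarray(client_sizes, dtype=float)
    w /= w.sum()
    shared = {k: sum(wi * m[k] for wi, m in zip(w, client_models)) for k in base_keys}
    return [{**m, **shared} for m in client_models]

# Toy example: the "base" weight is shared, the "head" stays personal.
models = [{"base": np.array([0.0]), "head": np.array([1.0])},
          {"base": np.array([4.0]), "head": np.array([2.0])}]
updated = aggregate_base_only(models, base_keys=["base"], client_sizes=[1, 3])
# Base becomes 0.25*0 + 0.75*4 = 3.0 on both clients; heads remain 1.0 and 2.0.
```

The design point is that each client's classifier head, which is the component most sensitive to its local head/tail pattern, never leaves the device, while the representation still benefits from all clients' data.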
More generally, the key idea of PFL is to find a good trade-off between the globally shared knowledge and the local task-specific knowledge during personalized local training. Such a learning procedure could be applied to learn unbiased long-tailed classifiers on top of a well-generalizing representation. Moreover, multi-task learning (MTL) [21], clustering [8], and transfer learning approaches [7] also have the potential to be applied to cross-device long-tailed learning in FL, which we discuss in detail later (see Sec. 4).
| Local Setting | IF = 10 | IF = 50 | IF = 100 |
| FedAvg | 0.8896 | 0.859 | 0.8422 |
| FedProx | 0.8929 | 0.8586 | 0.8444 |
| CReFF | 0.8984 | 0.8646 | 0.8485 |
| FedPer | 0.8951 | 0.8602 | 0.8438 |
3 Benchmarking Federated Long-Tailed Learning
To the best of our knowledge, long-tailed learning in the context of FL has rarely been explored. In this section, we summarize the relevant datasets and the corresponding federated partition approaches. Recent works on long-tailed learning in both centralized and federated scenarios are then discussed. Finally, we give a brief comparison of two typical long-tailed data settings.
3.1 Datasets and partition methods
Datasets. In the centralized paradigm for visual recognition tasks, there are mainly two types of dataset benchmarks for long-tailed study. The first type is the long-tailed version of standard image datasets modified with a synthetic operation, such as exponential sampling (CIFAR-10/100-LT [5]) and Pareto sampling (ImageNet-LT [16], Places-LT [16]). They are sampled from existing balanced datasets, and the degree of the long tail can be controlled with an arbitrary imbalance factor IF. The second type is real-world large-scale datasets with highly imbalanced label distributions, such as iNaturalist [24] and Google Landmarks [29]. More long-tailed datasets are used in specific tasks, such as LVIS [9] for object detection, and VOC-MLT [30] and COCO-MLT [30] for multi-label classification.

Partition methods for long-tailed FL. To create different federated (distributed) datasets according to the different patterns of local and global data distribution, different datasets and sampling methods are required. The data distributions in Type 1 can be realized by IID sampling on long-tailed datasets. Similarly, Type 2 can be achieved by a Dirichlet-distribution-based generation method [10] on long-tailed datasets. Specifically, the degree of the long tail and the similarity of the local data distributions can be controlled by the global imbalance factor IF and the concentration parameter $\alpha$, respectively. Finally, Type 3 can be realized via different long-tailed sampling (with different head and tail patterns) on balanced datasets.
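A minimal sketch of these two steps, assuming the standard exponential long-tail profile $n_c = n_{\max} \cdot \mathrm{IF}^{-c/(C-1)}$ and a symmetric Dirichlet split (function and parameter names are ours):

```python
import numpy as np

def longtail_class_counts(n_max, num_classes, imb_factor):
    """Exponential long-tail profile: n_c = n_max * IF^(-c/(C-1))."""
    c = np.arange(num_classes)
    return np.floor(n_max * imb_factor ** (-c / (num_classes - 1))).astype(int)

def dirichlet_partition(class_counts, num_clients, alpha, rng):
    """Split each class's samples across clients with Dirichlet(alpha) proportions.

    Smaller alpha -> more heterogeneous (non-IID) local distributions;
    large alpha approaches an IID split. Returns per-client class counts
    of shape (num_clients, num_classes).
    """
    parts = np.zeros((num_clients, len(class_counts)), dtype=int)
    for c, n_c in enumerate(class_counts):
        props = rng.dirichlet(alpha * np.ones(num_clients))
        parts[:, c] = rng.multinomial(n_c, props)
    return parts

rng = np.random.default_rng(0)
counts = longtail_class_counts(n_max=5000, num_classes=10, imb_factor=100)
# Head class keeps 5000 samples, tail class only 50: global IF = 100.
parts = dirichlet_partition(counts, num_clients=5, alpha=0.5, rng=rng)
assert parts.sum(axis=0).tolist() == counts.tolist()  # partition preserves totals
```

Type 1 corresponds to a large $\alpha$ (near-IID split of the long-tailed counts), Type 2 to a small $\alpha$, and Type 3 would instead apply `longtail_class_counts` independently per client, with shuffled class orders, on a balanced dataset.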
3.2 Approaches
Centralized long-tailed learning. In the centralized scenario, long-tailed learning seeks to address the class imbalance in the training data. The most direct way is to re-balance the samples of different classes during model training, as in ROS and RUS [34], simple calibration [27], and dynamic curriculum learning [28]. The balancing ideology can also be implemented by re-weighting and re-margining the loss function, as in Focal Loss [14] and LDAM Loss [5]. These class re-balancing methods improve tail performance at the expense of head performance. To address the limitation of information shortage, some studies improve tail performance by introducing additional information, via transfer learning, meta learning, and network architecture improvements. In transfer learning, methods such as FTL [32] and LEAP [15] transfer knowledge from head classes to boost performance on tail classes. In [20], meta-learning is empirically shown to be capable of adaptively learning an explicit weighting function directly from data, which guarantees robust deep learning in the presence of training-data bias. Recently, some studies design and improve network architectures specific to long-tailed data; for example, different types of classifiers have been proposed to address long-tailed problems, such as the norm classifier [11] and the causal classifier [23].
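As a concrete instance of loss re-weighting, the focal loss [14] down-weights well-classified (typically head-class) examples by a factor $(1 - p_t)^\gamma$; a minimal NumPy sketch of its basic form, with illustrative inputs of our own:

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0):
    """Focal loss: mean over samples of -(1 - p_t)^gamma * log(p_t).

    probs: (N, C) predicted class probabilities; labels: (N,) true class indices.
    gamma = 0 recovers the standard cross-entropy loss.
    """
    p_t = probs[np.arange(len(labels)), labels]
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

probs = np.array([[0.9, 0.1],    # confident (head-class-like) prediction
                  [0.4, 0.6]])   # uncertain (tail-class-like) prediction
labels = np.array([0, 1])
# With gamma > 0 the confident example contributes almost nothing,
# so the loss concentrates on the harder, often tail-class, example.
assert focal_loss(probs, labels, gamma=0.0) > focal_loss(probs, labels, gamma=2.0)
```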
Federated long-tailed learning. To date, the only related work on federated long-tailed learning [19] utilizes classifier re-training to re-adjust decision boundaries, and its discussion is limited to a global long-tailed distribution with local heterogeneity. Methods for the other types of local and global data distributions remain to be explored.
Nevertheless, in the presence of long-tailed data, the discrepancies among the local and global data distributions of different clients in an FL system could possibly be addressed by techniques from federated optimization, such as dynamic regularization [1], diverse client scheduling [6], and adaptive aggregation. In addition, as discussed in Sec. 2.3, PFL could be applied in federated long-tailed learning to find a balance between representation learning and classifier learning. We give a detailed discussion of such explorations to boost the performance of federated long-tailed learning in Sec. 4.
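To make the repurposing idea concrete, FedProx [13] augments each client's objective with a proximal term $(\mu/2)\lVert w - w_{\text{global}}\rVert^2$, which limits local drift under heterogeneous (and, plausibly, long-tailed) local data. A one-step sketch, where the local gradient, $\mu$, and the learning rate are illustrative:

```python
import numpy as np

def fedprox_local_step(w, w_global, grad_local, mu, lr):
    """One SGD step on f_k(w) + (mu/2) * ||w - w_global||^2 (FedProx local update).

    grad_local: gradient of the client's own empirical loss f_k at w.
    mu > 0 pulls the local iterate back toward the current global model.
    """
    return w - lr * (grad_local + mu * (w - w_global))

# With a zero local gradient, the proximal term alone contracts w toward w_global.
w_new = fedprox_local_step(np.array([1.0]), w_global=np.array([0.0]),
                           grad_local=np.array([0.0]), mu=1.0, lr=0.1)
# w_new is about [0.9]: a 10% pull toward the global model per step.
```

Whether such drift control helps or hurts under strongly long-tailed local data is exactly the kind of question the benchmarks below probe.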
Based on the above discussion of data distributions, datasets, and learning objectives, we summarize them in Table 1. Note that the case where both the local and global data distributions are non-long-tailed is not listed in this table, as it is outside the scope of this paper.
3.3 Performance comparison
To better illustrate the impacts of long-tailed data distributions, we provide numerical results with different types of long-tailed data distribution in Tables 2 and 3. All experiments consider an FL system with multiple clients, and the non-IID data partition is implemented via the Dirichlet distribution. Apart from the baseline FedAvg [17], the other three FL algorithms are FedProx [13], CReFF [19], and FedPer [3], which are representative approaches to tackle data heterogeneity, long-tailed data, and personalization in FL, respectively.
Note that the main purpose of this subsection is to analyze the performance of different FL methods under diverse data settings, so as to provide possible insights for the design of federated long-tailed learning algorithms.
We choose two typical long-tailed data distributions in the federated setting to evaluate performance. In Table 2, we give the results under both IID and non-IID data settings built upon the globally long-tailed dataset CIFAR-10-LT with imbalance factors 10, 50, and 100. For the non-IID data partition, we use the Dirichlet-distribution-based sampling method with different concentration parameters $\alpha$ to control the degree of data heterogeneity. To better demonstrate the impacts of long-tailed data distributions, we also include a group of experimental results on the (balanced) CIFAR-10 for reference. In Table 3, results on CIFAR-10 are provided, where we sample different long-tailed local data distributions (i.e., different head-tail patterns) with the same imbalance factor IF; see Figure 2(C) for an overview.
For the results in Tables 2 and 3, the best test accuracies of all algorithms follow a descending pattern from left to right, as the degrees of long-tailedness and heterogeneity increase. Interestingly, the federated optimization method FedProx outperforms FedAvg in the non-long-tailed setting, while it tends to underperform with globally long-tailed data in some settings. As a dedicated method for long-tailed data, CReFF achieves the best results among the four algorithms in most settings, but it yields lower accuracy than FedProx under more heterogeneous data distributions. With regard to the PFL methods, our preliminary results illustrate that the personalization method excels in most of the long-tailed data settings, especially in the settings of Table 3 (i.e., diverse local long-tailed distributions, as in Type 3).
These numerical results indicate that PFL methods have the potential to enhance performance even without any specialized long-tailed learning techniques. More importantly, the preliminary results also demonstrate the feasibility of repurposing federated optimization and PFL methods together with centralized long-tailed learning approaches in federated scenarios.
4 Future Trends and Research Opportunities
Based on the above experimental results and discussions of federated long-tailed learning, we envision the following directions and opportunities towards robust and communication-efficient federated long-tailed learning algorithms, architectures, and analyses.

Incorporate PFL ideas for better federated long-tailed learning. As a promising technique, PFL could boost the training performance of federated long-tailed learning in combination with centralized long-tailed learning methods. Balancing the globally shared knowledge with the locally personalized knowledge could be incorporated into the design of representation learning and classification architectures in federated long-tailed learning. Moreover, it would be promising to explore the combination of model-based and data-based PFL approaches [22] with long-tailed learning.

Hierarchical FL architectures. In the presence of diverse data distributions, we may consider grouping clients with similar long-tailed distributional statistics into clusters to jointly learn cluster-level personalized models or conduct cluster-level MTL [18]. However, the design of a privacy-preserving clustering method remains to be investigated.

Repurpose existing federated optimization methods. A local long-tailed data distribution can be regarded as an extremely imbalanced case of data heterogeneity. Hence, how to repurpose federated optimization algorithms in the presence of long-tailed data could be further explored. Developing a heterogeneity-agnostic federated optimization framework is another open question. Moreover, MTL-based long-tailed learning could also be a potential approach to address heterogeneous long-tailed distributions in FL.

Design better data partition/sampling schemes or more representative datasets. Beyond the few real-world long-tailed datasets, most current works use long-tailed versions of popular image datasets. Although this method can use a predetermined imbalance factor IF to control the imbalance, it also discards a large number of samples under the widely used exponential and Pareto sampling methods. Therefore, part of the observed performance degradation could be attributed to the smaller dataset size, especially in scenarios with a larger imbalance factor in federated settings. How to mitigate such negative impacts should be further investigated. Meanwhile, future research could also leverage real-world scenarios, such as medical images or autonomous driving, to provide more representative and convincing federated long-tailed learning datasets.
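The amount of discarded data is easy to quantify: under the exponential profile $n_c = n_{\max} \cdot \mathrm{IF}^{-c/(C-1)}$, the fraction of a balanced dataset that survives sub-sampling is $\frac{1}{C}\sum_c \mathrm{IF}^{-c/(C-1)}$. A quick back-of-the-envelope check (our own calculation):

```python
import numpy as np

def retained_fraction(num_classes, imb_factor):
    """Fraction of a balanced dataset kept after exponential long-tail sampling."""
    c = np.arange(num_classes)
    return float(np.mean(imb_factor ** (-c / (num_classes - 1))))

# For a CIFAR-10-style setup (C = 10): a larger IF discards far more data,
# so part of any observed degradation may simply reflect the smaller dataset.
assert retained_fraction(10, 100) < retained_fraction(10, 10) < 1.0
```

For $C = 10$, roughly 41% of the balanced data survives at IF = 10 but only about 25% at IF = 100, which supports the concern above that imbalance and dataset shrinkage are confounded in such benchmarks.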
5 Concluding Remarks
In this paper, we introduce the federated long-tailed learning task, a general setting motivated by real-world applications but rarely studied in previous research. We characterize three types of FLT learning settings with diverse local and global long-tailed data distributions. The benchmark results with multiple federated learning architectures suggest that substantial future work is needed for better FLT. In addition, we highlight potential techniques and possible research trajectories towards federated long-tailed learning with real-world data.
References
 [1] (2021) Federated learning based on dynamic regularization. arXiv preprint arXiv:2111.04263. Cited by: §3.2.
 [2] (2020) Siloed federated learning for multi-centric histopathology datasets. In Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning, pp. 129–139. Cited by: §1.
 [3] (2019) Federated learning with personalization layers. arXiv preprint arXiv:1912.00818. Cited by: §2.3, §3.3.
 [4] (2019) Towards federated learning at scale: system design. Proceedings of Machine Learning and Systems 1, pp. 374–388. Cited by: §1.
 [5] (2019) Learning imbalanced datasets with label-distribution-aware margin loss. In Advances in Neural Information Processing Systems, Cited by: §3.1, §3.2.
 [6] (2022) Client selection in federated learning: convergence analysis and power-of-choice selection strategies. In Artificial intelligence and statistics, Cited by: §3.2.
 [7] (2019) Privacy-preserving heterogeneous federated transfer learning. In 2019 IEEE International Conference on Big Data (Big Data), pp. 2552–2559. Cited by: §2.3.
 [8] (2020) An efficient framework for clustered federated learning. Advances in Neural Information Processing Systems 33, pp. 19586–19597. Cited by: §2.3.

 [9] (2019) LVIS: a dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 5356–5364. Cited by: §3.1.
 [10] (2019) Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335. Cited by: §3.1.
 [11] (2019) Decoupling representation and classifier for long-tailed recognition. In International Conference on Learning Representations, Cited by: §1, §2.1, §2.3, §3.2.
 [12] (2020) Scaffold: stochastic controlled averaging for federated learning. In International Conference on Machine Learning, pp. 5132–5143. Cited by: §1.
 [13] (2020) Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems 2, pp. 429–450. Cited by: §1, §3.3.
 [14] (2017) Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pp. 2980–2988. Cited by: §1, §3.2.
 [15] (2020) Deep representation learning on long-tailed data: a learnable embedding augmentation perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2970–2979. Cited by: §3.2.
 [16] (2019) Large-scale long-tailed recognition in an open world. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §3.1.
 [17] (2017) Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pp. 1273–1282. Cited by: §1, §3.3.

 [18] (2020) Clustered federated learning: model-agnostic distributed multi-task optimization under privacy constraints. IEEE Transactions on Neural Networks and Learning Systems 32 (8), pp. 3710–3722. Cited by: 2nd item.
 [19] (2022) Federated learning on heterogeneous and long-tailed data via classifier re-training with federated features. arXiv preprint arXiv:2204.13399. Cited by: §1, §3.2, §3.3.
 [20] (2019) Meta-weight-net: learning an explicit mapping for sample weighting. Advances in Neural Information Processing Systems 32. Cited by: §3.2.
 [21] (2017) Federated multi-task learning. Advances in Neural Information Processing Systems 30. Cited by: §2.3.
 [22] (2022) Towards personalized federated learning. IEEE Transactions on Neural Networks and Learning Systems. Cited by: §2.3, 1st item.
 [23] (2020) Long-tailed classification by keeping the good and removing the bad momentum causal effect. Advances in Neural Information Processing Systems 33, pp. 1513–1524. Cited by: §3.2.
 [24] (2018) The inaturalist species classification and detection dataset. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8769–8778. Cited by: §1, §3.1.
 [25] (2020) Tackling the objective inconsistency problem in heterogeneous federated optimization. Advances in neural information processing systems 33, pp. 7611–7623. Cited by: §1.
 [26] (2021) Addressing class imbalance in federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 10165–10173. Cited by: §1.
 [27] (2020) The devil is in classification: a simple framework for long-tail instance segmentation. In European Conference on Computer Vision, pp. 728–744. Cited by: §1, §3.2.
 [28] (2019) Dynamic curriculum learning for imbalanced data classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5017–5026. Cited by: §3.2.
 [29] (2020) Google Landmarks Dataset v2: a large-scale benchmark for instance-level recognition and retrieval. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2575–2584. Cited by: §1, §3.1.
 [30] (2020) Distribution-balanced loss for multi-label classification in long-tailed datasets. In European Conference on Computer Vision, pp. 162–178. Cited by: §3.1.
 [31] (2020) Federated recommendation systems. In Federated Learning, pp. 225–239. Cited by: §1.

 [32] (2019) Feature transfer learning for face recognition with under-represented data. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 5704–5713. Cited by: §1, §3.2.
 [33] (2020) FedMix: approximation of mixup under mean augmented federated learning. In International Conference on Learning Representations, Cited by: §1.
 [34] (2021) Deep long-tailed learning: a survey. arXiv preprint arXiv:2110.04596. Cited by: §1, §3.2.
 [35] (2020) BBN: bilateral-branch network with cumulative learning for long-tailed visual recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9719–9728. Cited by: §2.1, §2.3.