HarmoFL: Harmonizing Local and Global Drifts in Federated Learning on Heterogeneous Medical Images

Multiple medical institutions collaboratively training a model using federated learning (FL) has become a promising solution for maximizing the potential of data-driven models, yet the non-independent and identically distributed (non-iid) data in medical images is still an outstanding challenge in real-world practice. The feature heterogeneity caused by diverse scanners or protocols introduces a drift in the learning process, in both local (client) and global (server) optimizations, which harms the convergence as well as model performance. Many previous works have attempted to address the non-iid issue by tackling the drift locally or globally, but how to jointly solve the two essentially coupled drifts is still unclear. In this work, we concentrate on handling both local and global drifts and introduce a new harmonizing framework called HarmoFL. First, we propose to mitigate the local update drift by normalizing amplitudes of images transformed into the frequency domain to mimic a unified imaging setting, in order to generate a harmonized feature space across local clients. Second, based on harmonized features, we design a client weight perturbation guiding each local model to reach a flat optimum, where a neighborhood area of the local optimal solution has a uniformly low loss. Without any extra communication cost, the perturbation assists the global model to optimize towards a converged optimal solution by aggregating several local flat optima. We have theoretically analyzed the proposed method and empirically conducted extensive experiments on three medical image classification and segmentation tasks, showing that HarmoFL outperforms a set of recent state-of-the-art methods with promising convergence behavior.

1 Introduction

Figure 1: Loss landscape visualization of two clients without harmonization (left), with only local harmonization (middle), and our complete method HarmoFL (right). The vertical axis shows the loss (denoting the solution for the global objective as $w^*$ and for each local objective as $w_k^*$), and the horizontal plane represents a parameter space centered at the global model weight $w$. (See Introduction for a detailed explanation.)

Multi-site collaborative training of deep networks is increasingly important for maximizing the potential of data-driven models in medical image analysis Shilo et al. (2020); Peiffer-Smadja et al. (2020); Dhruva et al. (2020); however, data sharing is still restricted by legal and ethical issues regarding the protection of patient data. Federated learning has recently emerged as a promising decentralized, privacy-preserving solution, in which different institutions can jointly train a model without actual data sharing, i.e., training client models locally and aggregating them globally McMahan et al. (2017); Kaissis et al. (2020).

Despite the recent promising progress achieved by FL in medical image analysis Dou et al. (2021); Rieke et al. (2020); Sheller et al. (2020); Roth et al. (2020); Ju et al. (2020), non-independent and identically distributed (non-iid) data remain an outstanding challenge in real-world practice Kairouz et al. (2019); Hsieh et al. (2020); Xu et al. (2021). The non-iid issue typically arises because diverse device vendors or data acquisition protocols cause heterogeneity in the feature distributions Aubreville et al. (2020); Liu et al. (2020). For example, the appearance of histology images varies due to different staining conditions, and MRI data from different hospitals suffer feature distribution shifts associated with various scanners or imaging protocols.

Previous literature has demonstrated, both empirically and theoretically, that such data heterogeneity across clients introduces drift in both the local (client) and global (server) optimizations, making convergence slow and unstable Zhao et al. (2018); Li et al. (2019); Karimireddy et al. (2020). Specifically, in the local update, each client model is optimized towards its own local optimum (i.e., fitting its individual feature distribution) instead of solving the global objective, which raises a drift across client updates. Meanwhile, in the global update that aggregates these diverged local models, the server model is further distracted by the set of mismatched local optima, which subsequently leads to a global drift in the server model. Fig. 1 intuitively illustrates such local and global drifts via loss landscape visualization Li et al. (2018), for an example of two non-iid clients. The vertical axis shows the loss at each client (denoting the solution for the global objective as $w^*$ and for each local objective as $w_k^*$), and the horizontal plane represents a parameter space centered at the specific parameters $w$ of the global model. With the same objective function and parameter initialization for each client, we can see that the local solutions of the two clients in the first column are significantly different, which indicates the drift across local client updates. Globally, a currently good solution may achieve a relatively low loss for both clients. However, since each client has its own differently shaped loss landscape, optimizing the current solution towards the global optimum is difficult, and the aggregation of the two diverged local solutions further distracts the current good solution.

To address this non-iid problem in FL, existing works can mainly be divided into two groups, which correspond to tackling the drift locally or globally. The local side emphasizes how to better normalize the diverse data. For example, Sheller et al. (2018) conducted a pioneering study and proposed using data pre-processing to reduce data heterogeneity. Recently, FedBN Li et al. (2021b) kept batch normalization layers locally to normalize the local data distribution. On the global server side, adjusting the aggregation weights is a typical strategy; for instance, Yeganeh et al. (2020) used the inverse distance to adaptively re-weight aggregation. Recently, the framework FedAdam Reddi et al. (2021) was proposed, introducing adaptive optimization to stabilize the server update on heterogeneous data. However, these methods tackle heterogeneity only partially, via either the client or the server update. The local and global drifts are essentially coupled, yet how to jointly solve them as a whole remains unclear.

In this paper, we consider both the client and server updates and propose a new harmonizing strategy to effectively solve the data heterogeneity problem for federated medical image analysis. First, to mitigate the drift in the local update, we propose a more effective normalization strategy via amplitude normalization, which unifies the amplitude components of images decomposed in the frequency domain. Then, for the global drift, although weighted parameter aggregation is widely adopted, the coordination can fail for heterogeneous clients with large parameter differences. We therefore aim to promote client models that are easy to aggregate, rather than designing a new re-weighting strategy. Based on the harmonized feature space, we design a weight-perturbation strategy to reduce the server update drift. The perturbation is generated locally from the gradients and applied to the client model to constrain a neighborhood area of the locally converged model to have a uniformly low loss. With the perturbation, each client finds a shared flat optimal solution that can be directly aggregated with others, assisting the global model to optimize towards a converged optimal solution. As can be observed in the last two columns of Fig. 1, the proposed amplitude normalization for local harmonization well mitigates the distance between global and local solutions, and with the weight perturbation for global harmonization, our approach achieves flat client solutions that can be directly aggregated into an optimal global model.

Our main contributions are highlighted as follows:

  • We propose to effectively mitigate the local update drift by normalizing the frequency-space amplitude component of different images into a unified space, which harmonizes non-iid features across clients.

  • Based on the harmonized features, we further design a novel weight-perturbation based strategy to rectify global server update shift without extra communication cost.

  • To the best of our knowledge, we are the first to simultaneously address both local and global update drifts for federated learning on heterogeneous medical images. We have also theoretically analyzed the proposed HarmoFL framework from the perspective of gradient similarity, showing that the drift caused by data heterogeneity is bounded.

  • We conduct extensive experiments on three medical image tasks, including breast cancer histology image classification, histology nuclei segmentation, and prostate MRI segmentation. Our HarmoFL significantly outperforms a set of latest state-of-the-art FL methods.

2 Related Work

Many methods have been proposed to improve federated learning on heterogeneous data; they can be mainly divided into two groups: improvements on local client training and improvements on global server aggregation.

Local client training:

The literature on improving client training to tackle the local drift includes: using data pre-processing to reduce data heterogeneity Sheller et al. (2018); introducing a domain loss for heterogeneous electroencephalography classification Gao et al. (2019); using simulated CT volumes to mitigate scanner differences in cardiac segmentation Li et al. (2020a); adding a proximal term to penalize client updates towards smaller weight differences between the client and server models Li et al. (2020b); and learning an affine transformation against shift under the assumption that data follow an affine distribution Reisizadeh et al. (2020). Recently, both SiloBN Andreux et al. (2020) and FedBN Li et al. (2021b) proposed keeping batch normalization locally to help clients obtain similar feature distributions, and MOON Li et al. (2021a) uses contrastive learning on latent feature representations at the client side to enhance the agreement between local and global models.

Global server aggregation:

Besides designs on the client side, many methods have been proposed to reduce the global drift with a focus on the server side. Zhao et al. (2018) create a small subset of data that is globally shared across clients, and many other methods improve the aggregation strategy: e.g., Yeganeh et al. (2020) calculate the inverse distance to re-weight aggregation, and FedNova Wang et al. (2020) proposes using normalized stochastic gradients for global model aggregation rather than the cumulative raw local gradient changes. Very recently, FedAdam Reddi et al. (2021) introduced adaptive optimization into federated learning to stabilize convergence on heterogeneous data.

However, the above-mentioned methods address the drift only partially, from either the local client or the global server perspective. Instead, our approach aims at jointly mitigating the two coupled drifts, both locally and globally.

3 Methodology

To address the non-iid issue from both the local and global aspects, we propose an effective new federated learning framework, HarmoFL. We start with the formulation of federated heterogeneous medical image analysis, then describe the amplitude normalization and the weight perturbation towards reducing the local drift and global drift, respectively. Finally, we give a theoretical analysis of HarmoFL.

3.1 Preliminaries

Denote by $\mathcal{X} \times \mathcal{Y}$ the joint image and label space over $N$ clients. A data sample is an image-label pair $(x, y)$ with $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, and the data sampled from a specific $k$-th client follow the data distribution $D_k$. In this work, we focus on the non-iid feature shift. Given the joint probability $P(x, y)$ of image feature $x$ and label $y$, the marginal $P_k(x)$ varies across clients even if $P(y \mid x)$ is the same, or $P_k(x \mid y)$ varies across clients while $P(y)$ is unchanged.

Our proposed HarmoFL aims to improve federated learning on both local client training and global aggregation. The federated optimization objective is formulated as follows:

$\min_{w} F(w) := \sum_{k=1}^{N} p_k F_k(w), \quad F_k(w) = \mathbb{E}_{(x,y) \sim \widehat{D}_k}\big[\ell_k(w + \delta;\, x, y)\big] \qquad (1)$

where $F_k$ is the local objective function and $\ell_k$ is the loss function defined by the learned model $w$ and a sampled pair $(x, y)$. For the $k$-th client, $p_k$ is the corresponding weight such that $p_k \ge 0$ and $\sum_{k=1}^{N} p_k = 1$. The term $\delta$ is a weight perturbation, and $\widehat{D}_k$ is the harmonized feature distribution obtained by our amplitude normalization. Specifically, we propose a new amplitude normalization operation harmonizing the various client feature distributions to mitigate the client update drift. The normalizer manipulates the amplitudes of data in the frequency space without violating the local preservation of the original data. Based on the harmonized features, we can generate a weight perturbation for each client without any extra communication cost. The perturbation forces each client to reach a uniformly low error in a neighborhood area of the local optimum, thus reducing the drift at the server.

Figure 2: Amplitude normalization that harmonizes local client features. Phase components are strictly kept locally and only the average amplitude from each client is shared.

3.2 Amplitude normalization for local training

Structural semantics are an important factor supporting medical image analysis, while low-level statistics (e.g., color, contrast) usually help to differentiate the structures. By decomposing the two parts, we are able to harmonize low-level features across hospitals while preserving critical structures. Using the fast Fourier transform Nussbaumer (1981), we can transform images into frequency-space signals and decompose out the amplitude spectrum, which captures the low-level features. Considering patient-level privacy sensitivity and the prohibition on sharing original images, we propose using only the averaged amplitude term during communication to conduct the normalization.

More specifically, for an input image $x_k$ from the $k$-th client, we transform each channel of the image into frequency-space signals $z_k = \mathcal{F}(x_k)$. Then we can split the real part $R(z_k)$ and the imaginary part $I(z_k)$ from the frequency signals $z_k$. The amplitude and phase components can be expressed as:

$\mathcal{A}_k = \big[R^2(z_k) + I^2(z_k)\big]^{1/2}, \qquad \mathcal{P}_k = \arctan\big[I(z_k) / R(z_k)\big] \qquad (2)$

Next, we normalize the amplitude component of each image batch-wise with a moving-average term. For the $m$-th batch of sampled images, we first decompose the amplitude $\mathcal{A}_k^{i,m}$ and phase $\mathcal{P}_k^{i,m}$ of each single image $i$ in the batch. We calculate the in-batch average amplitude and update the running average amplitude with a decay factor $\nu$:

$\bar{\mathcal{A}}_k^{(m)} = (1 - \nu)\, \frac{1}{B} \sum_{i=1}^{B} \mathcal{A}_k^{i,m} + \nu\, \bar{\mathcal{A}}_k^{(m-1)} \qquad (3)$

where $\bar{\mathcal{A}}_k^{(m-1)}$ is the average amplitude calculated up to the previous batch; this term is set to zero for the first batch. The term $\bar{\mathcal{A}}_k$ harmonizes low-level distributions inside a client and keeps tracking the amplitudes of all the client's images. With the updated average term, each original image of the $m$-th batch is normalized using the average amplitude and its original phase component as below:

$\hat{x}_k^{i,m} = \mathcal{F}^{-1}\big(\bar{\mathcal{A}}_k^{(m)}, \mathcal{P}_k^{i,m}\big) \qquad (4)$

where $\mathcal{F}^{-1}$ is the inverse Fourier transform. After client training, the amplitude normalization layer, which only contains the average amplitude information, is sent to the central server to generate a global amplitude. This global amplitude is a compromise among the various low-level visual features across clients and helps reduce the client update drift in the next federated round. In practice, we find that communicating the amplitude only at the first round and fixing the global amplitude thereafter harmonizes the non-iid features well while also saving communication cost.
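To make the procedure concrete, below is a minimal PyTorch sketch of Eqs. (2)-(4); the function name amplitude_normalize, the default decay value, and the tensor layout are our illustrative assumptions, not the authors' released implementation.

```python
import torch

def amplitude_normalize(batch, avg_amp, nu=0.1):  # nu: illustrative decay value
    """Harmonize a batch of images (B, C, H, W): replace every image's
    frequency-domain amplitude with a running average amplitude (Eqs. 2-4)."""
    freq = torch.fft.fft2(batch)                     # per-channel 2-D FFT
    amp, phase = torch.abs(freq), torch.angle(freq)  # amplitude / phase split, Eq. (2)
    batch_avg = amp.mean(dim=0)                      # in-batch average amplitude
    # moving average across batches, Eq. (3); avg_amp starts at zero
    avg_amp = (1 - nu) * batch_avg + nu * avg_amp
    # recompose each image from the shared amplitude and its own phase, Eq. (4)
    harmonized = torch.fft.ifft2(avg_amp * torch.exp(1j * phase)).real
    return harmonized, avg_amp
```

Within a client, avg_amp is carried across batches; after local training, only this average amplitude (never any image or phase) would be communicated, as described above.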

3.3 Weight perturbation for global aggregation

The above amplitude normalization allows each client to optimize within a harmonized feature space, mitigating the local drift. Based on the harmonized feature space, we further aim to promote client models that are easy to aggregate, thus rectifying the global drift. As the global aggregation is typically a weighted average, a client at a flat optimum can coordinate with other clients better than one at a sharp local minimum Keskar et al. (2016), where the error increases sharply even under minor parameter changes. Motivated by adversarial training, we propose a local optimization objective for each client as below:

$\min_{w_k} F_k(w_k), \quad F_k(w_k) = \mathbb{E}_{(x,y) \sim \widehat{D}_k}\Big[\max_{\|\hat{x} - x\|_2 \le \epsilon} \ell_k(w_k;\, \hat{x}, y)\Big] \qquad (5)$

where $\hat{x}$ is an adversarial image within an $\ell_2$-norm bounded $\epsilon$-ball centered at the original image $x$; the adversarial image is generated with the same label as $x$ but a different feature shift. However, this generating process carries an extra communication burden, since we would have to transfer feature distribution information across clients. Based on the harmonized features, we instead carefully design a weight perturbation that effectively solves Eq. (5). With a slight abuse of notation, instead of generating adversarial images bounded by the term $\epsilon$, we propose a new term $\delta$ as a perturbation that is directly applied to the model parameters. The perturbation is self-generated using gradients from the harmonized features and incurs no extra communication cost. Formally, for the $m$-th batch, we first calculate the gradients of the client model on the amplitude-normalized features $\hat{x}_k^{m}$ obtained from Eq. (4). Then we use the Euclidean norm to normalize the gradients and obtain the perturbation term for the current iteration of gradient descent:

$\delta^{(m)} = \alpha \, \frac{\nabla F_k\big(w_k^{(m)}\big)}{\big\|\nabla F_k\big(w_k^{(m)}\big)\big\|_2} \qquad (6)$

where $\alpha$ is a hyper-parameter controlling the degree of perturbation. The flat area around the local client optimum expands with a larger $\alpha$; the choice of $\alpha$ is studied in the experiment section. After obtaining the perturbation term $\delta^{(m)}$, we minimize the loss of the parameter-perturbed model as below:

$w_k^{(m+1)} = w_k^{(m)} - \eta_l \nabla F_k\big(w_k^{(m)} + \delta^{(m)}\big) \qquad (7)$

where $w_k^{(m)}$ is the model from the previous batch and $\eta_l$ is the local client learning rate. With these iterative updates, each local client model is gradually driven towards an optimal solution that holds a uniformly low loss in its neighborhood (i.e., a flat optimum), which promotes aggregation and avoids being trapped in a sharp local minimum.
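This two-pass update closely resembles sharpness-aware minimization: one gradient pass builds the perturbation, a second pass descends at the perturbed weights. A minimal PyTorch sketch of one client step under that reading follows; perturbed_step and its arguments are illustrative names we introduce, not the paper's code.

```python
import torch

def perturbed_step(model, loss_fn, x_harmonized, y, alpha, lr):
    """One HarmoFL-style client step: perturb weights along the normalized
    gradient (Eq. 6), then descend on the perturbed loss (Eq. 7)."""
    params = list(model.parameters())
    # first pass: gradients on the amplitude-normalized batch
    loss = loss_fn(model(x_harmonized), y)
    grads = torch.autograd.grad(loss, params)
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    # apply perturbation delta = alpha * g / ||g||_2
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(alpha * g / (norm + 1e-12))
    # second pass: gradients at the perturbed weights w + delta
    loss_p = loss_fn(model(x_harmonized), y)
    grads_p = torch.autograd.grad(loss_p, params)
    with torch.no_grad():
        for p, g, gp in zip(params, grads, grads_p):
            p.sub_(alpha * g / (norm + 1e-12))  # remove delta
            p.sub_(lr * gp)                      # w <- w - eta_l * grad at (w + delta)
    return loss_p.item()
```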

Input: communication rounds $T$, number of clients $N$, mini-batch steps $M$, client learning rate $\eta_l$, global learning rate $\eta_g$, hyper-parameter $\alpha$
Output: the final global model $w^{(T)}$

1: Initialize server model $w^{(0)}$
2: for $t = 0, \dots, T-1$ do
3:     for $k = 1, \dots, N$ in parallel do
4:         $w_k^{(0)} \leftarrow w^{(t)}$ ▷ send the global model to client $k$
5:         for $m = 0, \dots, M-1$ do ▷ client training
6:             sample a batch of data pairs $(x, y)$ of client $k$
7:             decompose the amplitude and phase of the batch ▷ Eq. (2)
8:             update the average amplitude $\bar{\mathcal{A}}_k^{(m)}$ ▷ Eq. (3)
9:             harmonize the batch: $\hat{x} \leftarrow \mathcal{F}^{-1}(\bar{\mathcal{A}}_k^{(m)}, \mathcal{P})$ ▷ Eq. (4)
10:            compute the perturbation $\delta^{(m)}$ from the normalized gradients ▷ Eq. (6)
11:            evaluate the gradients at the perturbed model $w_k^{(m)} + \delta^{(m)}$
12:            update $w_k^{(m+1)} \leftarrow w_k^{(m)} - \eta_l \nabla F_k(w_k^{(m)} + \delta^{(m)})$ ▷ Eq. (7)
13:        end for
14:        return $w_k^{(M)}$ ▷ send client model to server
15:    end for
16:    $\Delta w^{(t)} \leftarrow \sum_{k=1}^{N} p_k \big(w_k^{(M)} - w^{(t)}\big)$
17:    $w^{(t+1)} \leftarrow w^{(t)} + \eta_g \Delta w^{(t)}$
18: end for
19: return $w^{(T)}$
Algorithm 1 Harmonizing Local and Global Drifts in FL
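Reading Algorithm 1 end-to-end, one communication round could be sketched as below, reusing the amplitude_normalize and perturbed_step helpers sketched earlier; the client object and its attributes (loader, avg_amp, loss_fn, num_samples) are hypothetical scaffolding, not the authors' API.

```python
import copy

def federated_round(server_model, clients, alpha, lr_local, lr_global=1.0):
    """One round of Algorithm 1: local harmonized training, then weighted aggregation."""
    client_states, weights = [], []
    for client in clients:  # line 3: in practice, run in parallel
        model = copy.deepcopy(server_model)  # line 4: send global model to client
        for x, y in client.loader:           # lines 5-13: M mini-batch steps
            x_hat, client.avg_amp = amplitude_normalize(x, client.avg_amp)
            perturbed_step(model, client.loss_fn, x_hat, y, alpha, lr_local)
        client_states.append(model.state_dict())  # line 14: send model to server
        weights.append(client.num_samples)
    total = sum(weights)
    weights = [n / total for n in weights]  # p_k, summing to 1
    # lines 16-17: w <- w + eta_g * sum_k p_k (w_k - w)
    new_state = server_model.state_dict()
    for name in new_state:
        delta = sum(p * (s[name].float() - new_state[name].float())
                    for p, s in zip(weights, client_states))
        new_state[name] = new_state[name].float() + lr_global * delta
    server_model.load_state_dict(new_state)
    return server_model
```

With lr_global set to 1.0, lines 16-17 reduce to the familiar FedAvg weighted average of client models.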

3.4 Theoretical analysis for HarmoFL

With the help of amplitude normalization and weight perturbation, we constrain the client updates to be less dissimilar and achieve a model that keeps a relatively low error over a range of parameter changes. To theoretically analyze HarmoFL, we transform our approach into a simplified setting in which we interpret both the amplitude normalization and the weight perturbation from the gradient perspective. The amplitude normalization reduces the gradient dissimilarity between clients by harmonizing the non-iid features. Meanwhile, the weight perturbation forces clients towards flat optima, where the loss variation is constrained, making the gradients change mildly. Both components therefore bound the gradient differences across clients; assumptions of this type have been widely explored in different forms Yin et al. (2018); Li et al. (2020b); Vaswani et al. (2019); Karimireddy et al. (2019). Based on the standard federated optimization objective, we formulate a new form below:

$\min_w \Big[ f(w) := \frac{1}{N} \sum_{k=1}^{N} F_k(w) \Big], \quad \text{s.t. } \big\|\nabla F_k(w) - \nabla f(w)\big\| \le \kappa \qquad (8)$

where $F_k(w) = \mathbb{E}_{(x,y) \sim \widehat{D}_k}\big[\ell_k(w + \delta;\, x, y)\big]$, $\kappa$ is a non-negative constant bounding the gradient difference, and $(x, y)$ is an image-label pair from the $k$-th client's distribution. To quantify the overall drift between the client and server models at the $t$-th communication round, we define the overall drift term as below:

$\Gamma_t = \frac{1}{MN} \sum_{k=1}^{N} \sum_{m=1}^{M} \mathbb{E}\Big[\big\|w_k^{(m)} - w^{(t)}\big\|^2\Big] \qquad (9)$

where $M$ is the number of mini-batch steps. We theoretically show that, with the bounded gradient difference, our HarmoFL strategy is guaranteed to have an upper bound on the overall drift caused by non-iid data.

Theorem 3.1

With the drift term $\Gamma_t$ defined in Eq. (9), assume the gradient dissimilarity and variance are bounded and the functions $F_k$ are $\beta$-smooth. Denoting the effective step-size $\tilde{\eta} = M \eta_g \eta_l$, we have the following upper bound on the overall drift of our HarmoFL:

$\Gamma_t \le c\, \tilde{\eta}^2 \big(\sigma^2 + G^2 + B^2\, \mathbb{E}\|\nabla f(w^{(t)})\|^2\big),$

where the constant $c$ takes different values depending on whether the functions $F_k$ are convex or non-convex. This theorem gives the upper bound under both the convex and non-convex assumptions on $F_k$.

Please find the notation table in Appendix A. All assumptions and proofs are formally given in Appendix B. The proof sketch is to apply our extra gradient-difference constraint to the subproblem of one-round optimization, use the bounded dissimilarity, and finally unroll the drift bound from a recursion over the one-round progress.

4 Experiments

In this section, we extensively evaluate our method to demonstrate that harmonizing local and global drifts benefits clients with heterogeneous features. Our harmonizing strategy, HarmoFL, achieves higher performance as well as more stable convergence compared with other methods on feature non-iid datasets. This is shown on breast cancer histology image classification, histology nuclei segmentation, and prostate MRI segmentation. All reported results are averaged over three runs with different random seeds, with standard deviations given. For more results, please refer to Appendix C.

4.1 Dataset and experimental settings

Breast cancer histology image classification.

We use the public tumor dataset Camelyon17, which contains 450,000 histology images with different stains from 5 different hospitals Bandi et al. (2018). As shown in Fig. 3, we take each hospital as a single client; images from different clients have heterogeneous appearances but share the same label distribution (i.e., normal and tumor tissues). We use a DenseNet121 Huang et al. (2017) and train the model for 100 epochs at the client side with different communication frequencies. We use cross-entropy loss and the SGD optimizer with a learning rate of 0.001.

Histology nuclei segmentation.

For cell nuclei segmentation, we gather three public datasets: MoNuSAC2020 Verma et al. (2021), MoNuSAC2018 Kumar et al. (2019), and TNBC Naylor et al. (2018). We divide the data from MoNuSAC2020 into 4 clients according to the hospitals they come from, forming 6 clients in total. We use U-Net Ronneberger et al. (2015) and train the model for 500 communication rounds with 1 local update epoch per communication round. We use the segmentation Dice coefficient (Dice) loss and the SGD optimizer.
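For reference, a generic soft Dice loss of the kind used for these segmentation tasks can be sketched as follows; this is a standard formulation we provide for illustration, as the paper does not specify its exact implementation.

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for binary segmentation.
    pred: predicted probabilities in [0, 1], shape (B, 1, H, W).
    target: binary ground-truth masks of the same shape."""
    pred = pred.flatten(1)
    target = target.flatten(1)
    intersection = (pred * target).sum(dim=1)
    dice = (2 * intersection + eps) / (pred.sum(dim=1) + target.sum(dim=1) + eps)
    return 1 - dice.mean()  # minimize (1 - Dice) to maximize overlap
```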

Prostate MRI segmentation.

For prostate segmentation, we use a multi-site prostate segmentation dataset Liu et al. (2020) which contains 6 different data sources from 3 public datasets Nicholas et al. (2015); Lemaître et al. (2015); Litjens et al. (2014). We regard each data source as a client and train a U-Net using the Adam optimizer with betas of 0.9 and 0.99.

We report the performance of global models, i.e., the final results of our overall framework. The model is selected using a separate validation set and evaluated on the testing set. To avoid distracting focus from the feature non-iid problem due to data imbalance, we truncate the sample size of each client to the size of the smaller clients in the histology image classification and prostate MRI segmentation tasks, but we keep the client data imbalance in the nuclei segmentation task to demonstrate performance under data quantity differences. If not specified, our default setting for the local update epoch is 1. We use the same momentum and weight decay for all optimizers in the three tasks, and we set the amplitude decay factor $\nu$ and the perturbation degree $\alpha$ by grid search. For more dataset and implementation details, please refer to Appendix C.

Figure 3: Examples of breast histology images of normal and tumor tissues from five clients, showing large heterogeneity.
Method Histology Nuclei Segmentation (Dice %) Prostate MRI Segmentation (Dice %)
A B C D E F Avg. A B C D E F Avg.
FedAvg 73.44 73.06 72.52 68.91 67.33 49.69 67.49 90.04 94.31 92.60 92.21 90.14 89.36 91.44
(PMLR2017) (0.02) (0.24) (0.85) (0.34) (0.86) (0.34) (9.06) (1.27) (0.28) (0.66) (0.71) (0.27) (1.76) (1.91)
FedProx 73.49 73.11 72.45 69.01 67.33 49.56 67.49 90.65 94.60 92.64 92.19 89.36 87.07 91.08
(MLSys2020) (0.07) (0.19) (0.94) (0.34) (0.86) (0.34) (9.12) (1.95) (0.30) (1.03) (0.15) (0.97) (1.53) (2.66)
FedNova 73.40 73.01 71.50 69.23 67.46 50.68 67.55 90.73 94.26 92.73 91.91 90.01 89.94 91.60
(NeurIPS2020) (0.05) (0.38) (1.35) (0.34) (1.07) (0.34) (8.57) (0.41) (0.08) (1.29) (0.61) (0.87) (1.54) (1.70)
FedAdam 73.53 72.91 71.74 69.26 66.69 49.72 67.31 90.02 94.84 93.30 91.70 90.17 87.77 91.30
(ICLR2021) (0.08) (0.24) (1.33) (0.50) (1.18) (0.11) (8.98) (0.29) (0.11) (0.79) (0.16) (1.46) (1.35) (2.53)
FedBN 72.50 72.51 74.25 64.84 68.39 69.11 70.27 92.68 94.83 93.77 92.32 93.20 89.68 92.75
(ICLR2021) (0.81) (0.13) (0.28) (0.93) (1.13) (0.94) (3.47) (0.52) (0.47) (0.41) (0.19) (0.45) (0.60) (1.74)
MOON 72.85 71.92 69.23 69.00 65.08 48.26 66.06 91.79 93.63 93.01 92.61 91.22 91.14 92.23
(CVPR2021) (0.46) (0.37) (2.29) (0.71) (0.73) (0.66) (9.13) (1.64) (0.21) (0.75) (0.53) (0.61) (0.88) (1.01)
HarmoFL 74.98 75.21 76.63 76.59 73.94 69.20 74.42 94.06 95.26 95.28 93.51 94.05 93.53 94.28
(Ours) (0.36) (0.57) (0.20) (0.77) (0.13) (1.23) (2.76) (0.47) (0.38) (0.33) (0.79) (0.50) (1.02) (0.80)
Table 1: Results for histology nuclei segmentation and prostate MRI segmentation. The results of the Dice coefficient are reported. Each column represents one client and the Avg. is abbreviated for the average Dice.
Method Breast Cancer Histology Image Classification
(Accuracy %)
A B C D E Avg.
FedAvg 91.10 83.12 82.06 87.49 74.78 83.71
(PMLR2017) (0.46) (1.58) (8.52) (2.49) (3.19) (6.16)
FedProx 91.03 82.88 82.78 87.07 74.93 83.74
(MLSys2020) (0.50) (1.63) (8.56) (1.76) (3.05) (5.99)
FedNova 90.99 82.97 82.40 86.93 74.86 83.61
(NeurIPS2020) (0.54) (1.76) (9.21) (1.58) (3.12) (6.00)
FedAdam 87.45 80.38 76.89 89.27 77.86 82.37
(ICLR2021) (0.77) (2.03) (14.03) (1.28) (2.68) (5.65)
FedBN 89.35 90.25 94.16 94.04 68.87 87.33
(ICLR2021) (8.50) (1.66) (1.00) (2.32) (22.14) (10.55)
MOON 88.92 83.52 84.71 90.02 67.79 82.99
(CVPR2021) (1.54) (0.31) (5.14) (1.56) (2.06) (8.93)
HarmoFL 96.17 93.60 95.54 95.58 96.50 95.48
(Ours) (0.56) (0.67) (0.32) (0.27) (0.46) (1.13)
Table 2: Results for breast cancer histology images classification of different methods. Each column represents one client and the Avg. is abbreviated for the average accuracy.

4.2 Comparison with the state-of-the-arts

We compare our approach with recent state-of-the-art (SOTA) FL methods addressing the non-iid problem. For local drifts, FedBN Li et al. (2021b) focuses on the non-iid feature shift with medical image applications, and both FedProx Li et al. (2020b) and the recent method MOON Li et al. (2021a) tackle the non-iid problem by constraining the dissimilarity between local and global models to reduce global aggregation shifts. FedAdam Reddi et al. (2021) and FedNova Wang et al. (2020) are general methods proposed to tackle global drifts. For breast cancer histology image classification, shown in Table 2, we report the testing accuracy on the five different clients and the average results. FedProx achieves only minor improvements over FedAvg, showing that reducing the global aggregation drift alone may not achieve promising results when local clients are severely shifted. Another recent representation-dissimilarity-constraining method, MOON, boosts the accuracy on clients B, C, and D but suffers a large drop on client E. The reason may be that images in client E appear different from those of the other clients, as shown in Fig. 3, so the representations of client E fail to be constrained towards the other clients. With the harmonized strategy reducing both local and global drifts, our method consistently outperforms the others, reaching an accuracy of 95.48% on average, which is 8% higher than the previous SOTA (FedBN) for the feature non-iid problem. Besides, our method helps all heterogeneous clients benefit from federated learning: the testing accuracies across clients have a small standard deviation, whereas other methods show much larger variance across clients.

For the segmentation tasks, the experimental Dice results are shown in Table 1, both per client and on average. On histology nuclei segmentation, HarmoFL significantly outperforms the SOTA method FedBN and improves the mean Dice by at least 4% over all other methods. On the prostate segmentation task, since MRI images show a less severe non-iid feature shift than histology images, the performance gaps between clients are not as large as in nuclei segmentation. However, our method still consistently achieves the highest Dice of 94.28% and has a smaller standard deviation across clients. Besides, we visualize the segmentation results for a qualitative comparison, as shown in Fig. 4. Compared with the ground truth in the first column, due to the heterogeneous features, other federated learning methods either over- or under-segment regions in both prostate MRI and histology images. As can be observed in the second and fourth rows, the feature heterogeneity also makes other methods fail to obtain an accurate boundary. With the proposed harmonizing strategy, our approach shows more accurate and smoother boundaries.

Figure 4: Qualitative comparison on segmentation results with our method and other state-of-the-art methods. Top two rows for the task of prostate MRI segmentation and the bottom two rows for the task of histology nuclei segmentation.
Figure 5: (a) Convergence in terms of testing accuracy over communication rounds. (b) Comparison of FedAvg, FedBN, and our HarmoFL with different numbers of local training epochs. (c) Performance of HarmoFL with different perturbation radii.

4.3 Ablation study

We further conduct ablation studies based on breast cancer histology image classification to investigate key properties of our HarmoFL, including convergence analysis, the influence of different local update epochs, and the effects of the weight perturbation degree.

Convergence analysis. To demonstrate the effectiveness of our method in reducing local and server update drifts, we plot the testing accuracy curve, averaged over the five clients, against the communication rounds. As shown in Fig. 5(a), the curve of HarmoFL increases smoothly as communication rounds increase, while the state-of-the-art method FedBN Li et al. (2021b) shows unstable convergence as well as lower accuracy. From the curve of FedBN, we can see there is almost no improvement over the first 10 communication rounds; the potential reason is that the batch normalization layers in each local client model are not yet fully trained to normalize the feature distributions.

Influence of local update epochs. Aggregating with different frequencies may affect the learning behavior, since less frequent communication further enhances the drift caused by the non-iid features, finally yielding a worse global model. We study the effectiveness of HarmoFL with different numbers of local update epochs; the results are shown in Fig. 5(b). When the local epoch is 1, each client communicates with the others frequently and all methods reach relatively high testing accuracy. As more local epochs are added, the local client update drift increases and the differences between local and global models become larger. Both FedAvg and FedBN suffer from the large drift and show severe performance drops. In contrast, because our method lets each client train the model using weight perturbation on harmonized features, HarmoFL significantly outperforms the other approaches and is robust to the larger drift brought by more local update epochs.

Effects of weight perturbation degree. We further analyze how the hyper-parameter $\alpha$, which controls the weight perturbation degree, affects the performance of our method. Intuitively, the value of $\alpha$ indicates the radius of the flat optimum area. A small radius may hinder clients from finding a shared flat area during aggregation, while a large radius makes it difficult to optimize towards such a flat optimum. As shown in Fig. 5(c), we plot the average testing accuracy with the standard error across clients when searching over a range of $\alpha$ values. Our method reaches the highest accuracy at an intermediate value of $\alpha$, and the performance decreases as $\alpha$ is reduced. Besides, our method achieves more than 90% accuracy even without careful tuning of the perturbation degree.

5 Conclusion

This work proposes a novel harmonizing strategy, HarmoFL, which uses amplitude normalization and weight perturbation to tackle the drifts that exist at both the local client and the global server. Our solution offers inspiration for simultaneously solving the essentially coupled local and global drifts in FL, instead of regarding each drift as a separate issue. We conduct extensive experiments on heterogeneous medical images, including one classification task and two segmentation tasks, and consistently demonstrate the effectiveness of our approach. We further provide theoretical analysis supporting the empirical results, showing that the overall non-iid drift caused by data heterogeneity is bounded in our proposed HarmoFL. Overall, our work helps promote a wider impact of FL in real-world medical applications.

References

  • M. Andreux, J. O. du Terrail, C. Beguier, and E. W. Tramel (2020) Siloed federated learning for multi-centric histopathology datasets. In Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning, pp. 129–139. Cited by: §2.
  • M. Aubreville, C. A. Bertram, T. A. Donovan, C. Marzahl, A. Maier, and R. Klopfleisch (2020) A completely annotated whole slide image dataset of canine breast cancer to aid human breast cancer research. Scientific data 7 (1), pp. 1–10. Cited by: §1.
  • P. Bandi, O. Geessink, Q. Manson, M. Van Dijk, M. Balkenhol, M. Hermsen, B. E. Bejnordi, B. Lee, K. Paeng, A. Zhong, et al. (2018) From detection of individual metastases to classification of lymph node status at the patient level: the camelyon17 challenge. IEEE Transactions on Medical Imaging. Cited by: §C.1, §4.1.
  • S. S. Dhruva, J. S. Ross, J. G. Akar, B. Caldwell, K. Childers, W. Chow, L. Ciaccio, P. Coplan, J. Dong, H. J. Dykhoff, et al. (2020) Aggregating multiple real-world data sources using a patient-centered health-data-sharing platform. NPJ digital medicine 3 (1), pp. 1–9. Cited by: §1.
  • Q. Dou, T. Y. So, M. Jiang, Q. Liu, V. Vardhanabhuti, G. Kaissis, Z. Li, W. Si, H. H. Lee, K. Yu, et al. (2021) Federated deep learning for detecting covid-19 lung abnormalities in ct: a privacy-preserving multinational validation study. NPJ digital medicine 4 (1), pp. 1–11. Cited by: §1.
  • D. Gao, C. Ju, X. Wei, Y. Liu, T. Chen, and Q. Yang (2019) Hhhfl: hierarchical heterogeneous horizontal federated learning for electroencephalography. arXiv preprint arXiv:1909.05784. Cited by: §2.
  • J. Geiping, H. Bauermeister, H. Dröge, and M. Moeller (2020) Inverting gradients–how easy is it to break privacy in federated learning?. arXiv preprint arXiv:2003.14053. Cited by: §C.3.
  • K. Hsieh, A. Phanishayee, O. Mutlu, and P. Gibbons (2020) The non-iid data quagmire of decentralized machine learning. In International Conference on Machine Learning, pp. 4387–4398. Cited by: §1.
  • G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708. Cited by: §C.1, §4.1.
  • C. Ju, D. Gao, R. Mane, B. Tan, Y. Liu, and C. Guan (2020) Federated transfer learning for eeg signal classification. In 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 3040–3045. Cited by: §1.
  • P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al. (2019) Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977. Cited by: §1.
  • G. A. Kaissis, M. R. Makowski, D. Rückert, and R. F. Braren (2020) Secure, privacy-preserving and federated machine learning in medical imaging. Nature Machine Intelligence 2 (6), pp. 305–311. Cited by: §1.
  • S. P. Karimireddy, S. Kale, M. Mohri, S. J. Reddi, S. U. Stich, and A. T. Suresh (2019) SCAFFOLD: stochastic controlled averaging for on-device federated learning. Cited by: §3.4.
  • S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh (2020) SCAFFOLD: stochastic controlled averaging for federated learning. In Proceedings of the 37th International Conference on Machine Learning, H. D. III and A. Singh (Eds.), Proceedings of Machine Learning Research, Vol. 119, pp. 5132–5143. External Links: Link Cited by: Appendix B, §1.
  • N. S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, and P. T. P. Tang (2016) On large-batch training for deep learning: generalization gap and sharp minima. arXiv preprint arXiv:1609.04836. Cited by: §3.3.
  • N. Kumar, R. Verma, D. Anand, Y. Zhou, O. F. Onder, E. Tsougenis, H. Chen, P. Heng, J. Li, Z. Hu, et al. (2019) A multi-organ nucleus segmentation challenge. IEEE transactions on medical imaging 39 (5), pp. 1380–1391. Cited by: §C.1, §4.1.
  • G. Lemaître, R. Martí, J. Freixenet, J. C. Vilanova, P. M. Walker, and F. Meriaudeau (2015) Computer-aided detection and diagnosis for prostate cancer based on mono and multi-parametric mri: a review. Computers in biology and medicine 60, pp. 8–31. Cited by: §C.1, §4.1.
  • D. Li, A. Kar, N. Ravikumar, A. F. Frangi, and S. Fidler (2020a) Federated simulation for medical imaging. In Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 159–168. Cited by: §2.
  • H. Li, Z. Xu, G. Taylor, C. Studer, and T. Goldstein (2018) Visualizing the loss landscape of neural nets. In Neural Information Processing Systems, Cited by: §C.3, §1.
  • Q. Li, B. He, and D. Song (2021a) Model-contrastive federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10713–10722. Cited by: §2, §4.2.
  • T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith (2020b) Federated optimization in heterogeneous networks. In Conference on Machine Learning and Systems, Cited by: Appendix B, §2, §3.4, §4.2.
  • X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang (2019) On the convergence of fedavg on non-iid data. In International Conference on Learning Representations, Cited by: §1.
  • X. Li, M. Jiang, X. Zhang, M. Kamp, and Q. Dou (2021b) FedBN: federated learning on non-IID features via local batch normalization. In International Conference on Learning Representations, External Links: Link Cited by: §1, §2, §4.2, §4.3.
  • G. Litjens, R. Toth, W. van de Ven, C. Hoeks, S. Kerkstra, B. van Ginneken, G. Vincent, G. Guillard, N. Birbeck, J. Zhang, et al. (2014) Evaluation of prostate segmentation algorithms for mri: the promise12 challenge. Medical image analysis 18 (2), pp. 359–373. Cited by: §C.1, §4.1.
  • Q. Liu, Q. Dou, L. Yu, and P. A. Heng (2020) Ms-net: multi-site network for improving prostate segmentation with heterogeneous mri data. IEEE Transactions on Medical Imaging. Cited by: §C.1, §1, §4.1.
  • B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas (2017) Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282. Cited by: §1.
  • P. Naylor, M. Laé, F. Reyal, and T. Walter (2018) Segmentation of nuclei in histopathology images by deep regression of the distance map. IEEE transactions on medical imaging 38 (2), pp. 448–459. Cited by: §C.1, §4.1.
  • B. Nicholas, M. Anant, H. Henkjan, F. John, K. Justin, et al. (2015) Nci-proc. ieee-isbi conf. 2013 challenge: automated segmentation of prostate structures. Note: The Cancer Imaging Archive Cited by: §C.1, §4.1.
  • H. J. Nussbaumer (1981) The fast fourier transform. In Fast Fourier Transform and Convolution Algorithms, pp. 80–111. Cited by: §3.2.
  • N. Peiffer-Smadja, R. Maatoug, F. Lescure, E. D’ortenzio, J. Pineau, and J. King (2020) Machine learning for covid-19 needs global collaboration and data-sharing. Nature Machine Intelligence 2 (6), pp. 293–294. Cited by: §1.
  • S. J. Reddi, Z. Charles, M. Zaheer, Z. Garrett, K. Rush, J. Konečný, S. Kumar, and H. B. McMahan (2021) Adaptive federated optimization. In International Conference on Learning Representations, External Links: Link Cited by: Appendix B, §1, §2, §4.2.
  • A. Reisizadeh, F. Farnia, R. Pedarsani, and A. Jadbabaie (2020) Robust federated learning: the case of affine distribution shifts. arXiv preprint arXiv:2006.08907. Cited by: §2.
  • N. Rieke, J. Hancox, W. Li, F. Milletari, H. R. Roth, S. Albarqouni, S. Bakas, M. N. Galtier, B. A. Landman, K. Maier-Hein, et al. (2020) The future of digital health with federated learning. NPJ digital medicine 3 (1), pp. 1–7. Cited by: §1.
  • O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §C.1, §4.1.
  • H. R. Roth, K. Chang, P. Singh, N. Neumark, W. Li, V. Gupta, S. Gupta, L. Qu, A. Ihsani, B. C. Bizzo, et al. (2020) Federated learning for breast density classification: a real-world implementation. In Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning, pp. 181–191. Cited by: §1.
  • M. J. Sheller, B. Edwards, G. A. Reina, J. Martin, S. Pati, A. Kotrotsou, M. Milchenko, W. Xu, D. Marcus, R. R. Colen, et al. (2020) Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Scientific reports 10 (1), pp. 1–12. Cited by: §1.
  • M. J. Sheller, G. A. Reina, B. Edwards, J. Martin, and S. Bakas (2018) Multi-institutional deep learning modeling without sharing patient data: a feasibility study on brain tumor segmentation. In International MICCAI Brainlesion Workshop, pp. 92–104. Cited by: §1, §2.
  • S. Shilo, H. Rossman, and E. Segal (2020) Axes of a revolution: challenges and promises of big data in healthcare. Nature medicine 26 (1), pp. 29–38. Cited by: §1.
  • Q. Tong, G. Liang, and J. Bi (2020) Effective federated adaptive gradient methods with non-iid decentralized data. arXiv preprint arXiv:2009.06557. Cited by: Appendix B.
  • S. Vaswani, F. Bach, and M. Schmidt (2019) Fast and faster convergence of sgd for over-parameterized models and an accelerated perceptron. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1195–1204. Cited by: §3.4.
  • R. Verma, N. Kumar, A. Patil, N. C. Kurian, S. Rane, S. Graham, Q. D. Vu, M. Zwager, S. E. A. Raza, N. Rajpoot, et al. (2021) MoNuSAC2020: a multi-organ nuclei segmentation and classification challenge. IEEE Transactions on Medical Imaging. Cited by: §C.1, §4.1.
  • J. Wang, Q. Liu, H. Liang, G. Joshi, and H. V. Poor (2020) Tackling the objective inconsistency problem in heterogeneous federated optimization. Advances in Neural Information Processing Systems 33. Cited by: §2, §4.2.
  • J. Xu, B. S. Glicksberg, C. Su, P. Walker, J. Bian, and F. Wang (2021) Federated learning for healthcare informatics. Journal of Healthcare Informatics Research 5 (1), pp. 1–19. Cited by: §1.
  • Y. Yeganeh, A. Farshad, N. Navab, and S. Albarqouni (2020) Inverse distance aggregation for federated learning with non-iid data. In Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning, pp. 150–159. Cited by: §1, §2.
  • D. Yin, A. Pananjady, M. Lam, D. Papailiopoulos, K. Ramchandran, and P. Bartlett (2018) Gradient diversity: a key ingredient for scalable distributed learning. In International Conference on Artificial Intelligence and Statistics, pp. 1998–2007. Cited by: §3.4.
  • Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra (2018) Federated learning with non-iid data. arXiv preprint arXiv:1806.00582. Cited by: §1, §2.

Appendix A Notation Table

Notation Description
$F(w)$: global objective function.
$F_k(w)$: local objective function.
$\mathcal{X} \times \mathcal{Y}$: joint image and label space.
$(x, y)$: a pair of data sample.
$T$: communication rounds.
$N$: number of clients.
$M$: mini-batch steps.
$B$: size of sampled images in one batch.
$\ell_k$: loss function defined by the learned model and sampled pair.
$p_k$: weight of the $k$-th client in the federated optimization objective.
$D_k$: data distribution of the $k$-th client.
$\mathrm{AN}(\cdot)$: the amplitude normalization operation.
$\widehat{D}_k$: distribution harmonized by amplitude normalization.
$\mathcal{A}_k^{i,m}$: amplitude of a single image in a batch.
$\mathcal{P}_k^{i,m}$: phase of a single image in a batch.
$\bar{\mathcal{A}}_k^{(m)}$: average amplitude calculated from the current batch.
$\delta$: perturbation applied to model parameters.
$\alpha$: hyper-parameter to control the degree of perturbation.
$\eta_l$: client model learning rate.
$\eta_g$: global model learning rate.
$\tilde{\eta}$: effective step-size.
$w^{(t)}$: global server model at the $t$-th communication round.
$w_k^{(m)}$: the $k$-th client model after $m$ mini-batch steps.
$\nabla F_k(w + \delta)$: gradient of the weights with perturbation.
$\kappa$: a non-negative constant bounding the gradient difference.
$\Gamma_t$: overall non-iid drift term.
Table 3: Notations used in the paper.

Appendix B Theoretical analysis of bounded drift

We give the theoretical analysis and proofs of HarmoFL in this section. First, we state the convexity, smoothness, and bounded-gradient assumptions about the local functions $F_k$ and the global function $f$, which are typically used in the optimization literature Li et al. (2020b); Reddi et al. (2021); Karimireddy et al. (2020); Tong et al. (2020).

b.1 Assumptions

Assumption B.1

($\beta$-smoothness) The functions $\{F_k\}$ are $\beta$-smooth and satisfy

$\|\nabla F_k(w) - \nabla F_k(v)\| \le \beta \|w - v\|, \quad \forall\, w, v.$

Assumption B.2

(bounded gradient dissimilarity) There exist constants $G \ge 0$ and $B \ge 1$ such that

$\frac{1}{N} \sum_{k=1}^{N} \|\nabla F_k(w)\|^2 \le G^2 + B^2 \|\nabla f(w)\|^2, \quad \forall\, w.$

If $\{F_k\}$ are convex, we can relax the assumption to

$\frac{1}{N} \sum_{k=1}^{N} \|\nabla F_k(w)\|^2 \le G^2 + 2\beta B^2 \big(f(w) - f^*\big), \quad \forall\, w.$

Assumption B.3

(bounded variance) $g_k(w) := \nabla \ell_k(w; x, y)$ is an unbiased stochastic gradient of $F_k$ with bounded variance:

$\mathbb{E}\big[\|g_k(w) - \nabla F_k(w)\|^2\big] \le \sigma^2, \quad \forall\, w.$

Lemma B.4

(relaxed triangle inequality) Let $v_1, \dots, v_n$ be $n$ vectors in $\mathbb{R}^d$. Then the following are true:

(1) $\|v_i + v_j\|^2 \le (1 + a)\|v_i\|^2 + (1 + \tfrac{1}{a})\|v_j\|^2$ for any $a > 0$, and
(2) $\big\|\sum_{i=1}^{n} v_i\big\|^2 \le n \sum_{i=1}^{n} \|v_i\|^2$.

Proof: The proof of the first statement for any $a > 0$ follows from the identity

$\|v_i + v_j\|^2 = (1 + a)\|v_i\|^2 + (1 + \tfrac{1}{a})\|v_j\|^2 - \big\|\sqrt{a}\, v_i - \tfrac{1}{\sqrt{a}}\, v_j\big\|^2.$

For the second inequality, we use the convexity of $x \mapsto \|x\|^2$ and Jensen's inequality:

$\Big\|\frac{1}{n} \sum_{i=1}^{n} v_i\Big\|^2 \le \frac{1}{n} \sum_{i=1}^{n} \|v_i\|^2.$

Lemma B.5

(separating mean and variance) Let $\Xi_1, \dots, \Xi_n$ be $n$ random variables in $\mathbb{R}^d$ which are not necessarily independent. Suppose that their means are $\mathbb{E}[\Xi_i] = \xi_i$ and their variances are bounded as $\mathbb{E}\|\Xi_i - \xi_i\|^2 \le \sigma^2$. Then the following holds:

$\mathbb{E}\Big\|\sum_{i=1}^{n} \Xi_i\Big\|^2 \le \Big\|\sum_{i=1}^{n} \xi_i\Big\|^2 + n^2 \sigma^2.$

Proof: For any random variable $X$, $\mathbb{E}[X^2] = (\mathbb{E}[X])^2 + \mathbb{E}[(X - \mathbb{E}[X])^2]$, implying

$\mathbb{E}\Big\|\sum_i \Xi_i\Big\|^2 = \Big\|\sum_i \xi_i\Big\|^2 + \mathbb{E}\Big\|\sum_i (\Xi_i - \xi_i)\Big\|^2.$

Expanding the last term using Lemma B.4, we have

$\mathbb{E}\Big\|\sum_i (\Xi_i - \xi_i)\Big\|^2 \le n \sum_i \mathbb{E}\|\Xi_i - \xi_i\|^2 \le n^2 \sigma^2.$

b.2 Theorem of bounded drift and proof

We first restate Theorem 3.1 with some additional details and then give the proof. Recall that the effective step-size is $\tilde{\eta} = M \eta_g \eta_l$.

Theorem B.6

Suppose that the functions $F_k$ satisfy Assumptions B.1, B.2 and B.3. Denote the effective step-size $\tilde{\eta} = M \eta_g \eta_l$; then the updates of HarmoFL have a bounded drift:

  • Convex: $\Gamma_t \le c_1\, \tilde{\eta}^2 \big(\sigma^2 + G^2 + B^2\, \mathbb{E}\|\nabla f(w^{(t)})\|^2\big)$;

  • Non-convex: $\Gamma_t \le c_2\, \tilde{\eta}^2 \big(\sigma^2 + G^2 + B^2\, \mathbb{E}\|\nabla f(w^{(t)})\|^2\big)$,

where $c_1$ and $c_2$ are constants obtained by unrolling the recursion in the proof below.

Proof:

First, consider $M = 1$: the theorem then trivially holds, since $w_k^{(0)} = w^{(t)}$ for all $k$ and hence $\Gamma_t = 0$. We therefore assume $M > 1$ and bound the per-step drift $\mathbb{E}\|w_k^{(m)} - w^{(t)}\|^2$ of each client.

For the first inequality, we separate the mean and variance (Lemma B.5); the second inequality is obtained by using the relaxed triangle inequality (Lemma B.4) with a suitable choice of $a$. The next equality follows from the definition of $\tilde{\eta}$, and the remaining inequalities follow from the assumptions on the gradients. This yields a recursion over the mini-batch steps, which we unroll and average over $k$ and $m$: if the $\{F_k\}$ are non-convex, we obtain the non-convex bound of Theorem B.6, and if the $\{F_k\}$ are convex, the relaxed dissimilarity assumption yields the convex bound.

Appendix C Complete experiment details and results

In this section, we present the details of the experimental settings and more results. For all experiments, we implement the framework with the PyTorch library (Python 3.6.10) and train and test our models on a TITAN RTX GPU. The randomness of all experiments is controlled by three random seeds, i.e., 0, 1, and 2.

c.1 Experimental details

Breast cancer histology image classification. For the breast cancer histology image classification experiment, we use all data from the public Camelyon17 dataset Bandi et al. (2018). This dataset comprises 450,000 patches of breast cancer metastases in lymph node sections from 5 hospitals. The task is to predict whether a given region of tissue contains any tumor tissue. All the data are pre-processed into the same shape. For all clients, we use 20% of the client data for testing; of the remaining data, we use 80% for training and 20% for validation. For training, we use DenseNet121 Huang et al. (2017) and train it with a learning rate of 0.001 for 100 epochs with a batch size of 128. We use cross-entropy loss and the SGD optimizer with momentum and weight decay.
Histology nuclei segmentation. For histology nuclei segmentation, we use three public datasets, including MoNuSAC2020 Verma et al. (2021), MoNuSAC2018 Kumar et al. (2019) and TNBC Naylor et al. (2018), and we further divide MoNuSAC2020 into 4 clients. The division follows the official multi-organ split, where each organ group contains several specific hospitals and has no overlap with other groups. All images are reshaped to the same size. For all clients, we follow the original train-test split and split out 20% of the training data for validation. For training, we use U-Net Ronneberger et al. (2015) and train the model for 500 epochs using the segmentation Dice loss and the SGD optimizer with momentum and weight decay.
Prostate MRI segmentation. For prostate MRI segmentation, we use a multi-site prostate segmentation dataset Liu et al. (2020) containing 6 different data sources Nicholas et al. (2015); Lemaître et al. (2015); Litjens et al. (2014). For all clients, we use 20% of the client data for testing; of the remaining data, we use 80% for training and 20% for validation. We use cross-entropy and Dice loss together to train the U-Net, with the Adam optimizer using betas of 0.9 and 0.99 and weight decay. The number of communication rounds is the same as in the nuclei segmentation task. Following Liu et al. (2020), we randomly rotate and horizontally or vertically flip the images.

c.2 Ablation studies

In this section, we present ablation experiments regarding our two proposed components on all three datasets (see results in Tables 4, 5, and 6). We start with the baseline method FedAvg, then add the amplitude normalization (AmpNorm), and on top of this we further utilize the weight perturbation (WeightPert), thus completing our proposed whole framework HarmoFL. From all three tables it can be observed that the amplitude normalization (the second row) exceeds the baseline with a consistent performance gain on all clients for all three datasets, demonstrating the benefits of harmonizing local drifts. On top of this, adding the weight perturbation (the third row) to mitigate the global drifts further boosts the performance by a clear margin.

Method Breast Cancer Histology Image Classification (Accuracy %)
A B C D E Avg
Baseline (FedAvg) 91.6 84.6 72.6 88.3 78.4 83.1
+ AmpNorm (local) 94.3 90.6 90.4 95.1 93.0 92.6
+ AmpNorm + WeightPert (global) 96.8 94.2 95.9 95.8 96.9 95.9
Table 4: Ablation studies for the classification task.
Method Nuclei Segmentation (Dice %)
A B C D E F Avg
Baseline (FedAvg) 73.4 72.9 72.5 69.0 67.6 50.0 67.6
+ AmpNorm (local) 74.3 74.8 75.2 74.9 72.3 64.6 72.7
+ AmpNorm + WeightPert (global) 75.4 75.8 76.8 77.0 74.1 68.1 74.5
Table 5: Ablation studies for the nuclei segmentation task.
Method Prostate MRI Segmentation (Dice %)
A B C D E F Avg
Baseline (FedAvg) 89.4 94.1 92.3 91.4 90.3 91.4 91.5
+ AmpNorm (local) 91.4 94.8 94.2 92.6 93.4 93.0 93.2
+ AmpNorm + WeightPert (global) 94.3 95.3 95.5 94.2 94.4 94.2 94.7
Table 6: Ablation studies for the prostate segmentation task.

c.3 Additional results

Loss landscape visualization on breast cancer histology image classification.

We use the loss landscape visualization Li et al. (2018) to empirically show the non-iid drift in terms of the loss landscape of each client $k$. As shown in Fig. 6, we draw the loss of FedAvg and our HarmoFL; each column represents a client. The 2-D coordinates denote a parameter space centered at the final global model, and the loss value varies with the parameter change. From the first row it can be observed that the model trained with FedAvg is more sensitive to parameter changes, and the minimal solution for each client does not match the global model (i.e., the center of the figure). This observation is consistent with the Introduction: with the same objective function and parameter initialization for each client, the five local optimal solutions (local minima) have significantly different parameters, which is caused by the drift across client updates. Besides, the loss landscape shape of each client differs from the others, making the global optimization fail to find a converged solution matching the various local optimal solutions. In contrast, in the second row for our proposed HarmoFL, we observe that the loss variation is smaller than for FedAvg and robust to parameter changes, showing flat solutions both locally and globally. We further draw the loss values at a larger scale in the third row, i.e., we change the parameters over a wider range. From the third row it can be observed that HarmoFL holds flat optima around the central area, even with an 8-times-wider parameter change range. Besides, the global solution matches every client's local optimal solution well, demonstrating the effectiveness of reducing drifts locally and globally.
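The visualization procedure of Li et al. (2018) evaluates the loss on a 2-D grid around the trained weights along two random directions. A minimal sketch follows, using per-tensor rather than per-filter normalization for brevity; loss_landscape and its arguments are our illustrative names, not the paper's implementation.

```python
import torch

def loss_landscape(model, loss_fn, data, steps=21, span=1.0):
    """Evaluate the loss on a 2-D grid around the current weights along two
    random, norm-rescaled directions (after Li et al., 2018)."""
    params = list(model.parameters())
    base = [p.detach().clone() for p in params]
    # two random directions, rescaled per parameter tensor to the weight norm
    dirs = []
    for _ in range(2):
        d = [torch.randn_like(p) for p in params]
        dirs.append([di * w.norm() / (di.norm() + 1e-10) for di, w in zip(d, base)])
    xs = torch.linspace(-span, span, steps)
    grid = torch.zeros(steps, steps)
    x_batch, y_batch = data
    with torch.no_grad():
        for i, a in enumerate(xs):
            for j, b in enumerate(xs):
                for p, w, d0, d1 in zip(params, base, dirs[0], dirs[1]):
                    p.copy_(w + a * d0 + b * d1)  # move to grid point (a, b)
                grid[i, j] = loss_fn(model(x_batch), y_batch)
        for p, w in zip(params, base):  # restore the original weights
            p.copy_(w)
    return grid  # contour-plot this grid per client to compare landscapes
```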

Figure 6: Loss landscape visualization of 5 clients in the breast cancer histology image classification task with different parameter-changing scales. Our method is more robust to model parameter changes, achieving a centered and flat area for both the global solution and the 5 local solutions, even with 8-times-larger changing scales.

Gradient inversion attacks with the shared amplitudes across clients. To further explore the potential privacy issue of inverting-gradient attacks on the average amplitudes shared across clients, we implement the inverting gradient method Geiping et al. (2020) and show the results in Fig. 7. We first extract the amplitudes from the original images and then conduct the inverting gradient attack on the amplitudes, which are what would be shared during federated communication. The reconstruction results of the attack are shown in the last column. They imply that it is difficult (if possible at all) to perfectly reconstruct the original data solely from frequency information.

Figure 7: Examples of inverting gradient attack using the amplitude extracted from original images.

Visualization of amplitude normalization results. In this section, we demonstrate the visual effects of our amplitude normalization. The qualitative results for the two segmentation datasets are shown in Fig. 8 (Fig. 2 shows the classification dataset). Given that amplitudes mainly relate to low-level features, the AmpNorm operation does not affect the high-level semantic structures. In addition, although the normalization might introduce tiny changes at some ambiguous boundaries, a network with a large receptive field has the ability to combat such local changes by using contextual information.

Figure 8: Qualitative visualization results of using amplitude normalization.

Visualization of segmentation results. More qualitative segmentation comparisons on both the prostate MRI segmentation and histology nuclei segmentation tasks are shown in Fig. 9 and Fig. 10. In Fig. 9, as can be observed from the first row, alternative methods produce several small separate parts when segmenting big cells, while HarmoFL gives more complete segmentation results. As for small cells, HarmoFL has the capability to split each small region out, while others may connect small nuclei together or cover only a very small explicit region. Specifically, in the fourth row, other methods fail to segment some nuclei that show little difference from the background, but our method demonstrates more accurate results. For the prostate MRI segmentation results shown in Fig. 10, the images from different hospitals show a shift in feature distributions, which makes it difficult for other methods to obtain an accurate boundary. Especially in the third row, the compared methods fail to figure out the structure, while ours delineates the accurate boundary.

Figure 9: Qualitative comparison on histology nuclei segmentation results with our method and other state-of-the-art methods.
Figure 10: Qualitative comparison on prostate MRI segmentation results with our method and other state-of-the-art methods.