
Feather-Light Fourier Domain Adaptation in Magnetic Resonance Imaging

by   Ivan Zakazov, et al.

Generalizability of deep learning models may be severely affected by the difference in the distributions of the train (source domain) and the test (target domain) sets, e.g., when the sets are produced by different hardware. As a consequence of this domain shift, a certain model might perform well on data from one clinic, and then fail when deployed in another. We propose a very light and transparent approach to perform test-time domain adaptation. The idea is to substitute the target low-frequency Fourier space components that are deemed to reflect the style of an image. To maximize the performance, we implement the "optimal style donor" selection technique, and use a number of source data points for altering a single target scan appearance (Multi-Source Transferring). We study the effect of severity of domain shift on the performance of the method, and show that our training-free approach reaches the state-of-the-art level of complicated deep domain adaptation models. The code for our experiments is released.



1 Introduction

Magnetic Resonance Imaging (MRI) has become an irreplaceable tool in healthcare thanks to its capacity to produce high-resolution scans without ionizing radiation. The widespread use of the modality has helped to accumulate large volumes of miscellaneous imaging data, which have been fueling development of machine- and deep-learning methods, aiming to mimic diagnostic decisions. However, the real-life deployment of these methods is often hindered by an issue known as Domain Shift, which originates from a possible difference in train (source) and test (target) distributions. This difference might occur whenever source and target datasets are acquired with different machines or research protocols, and entails a need for proper Domain Adaptation (DA) [4].

To this end, modern DA in medical imaging includes a plethora of shallow and deep models [4]. Existing shallow DA methods are somewhat rudimentary, requiring human-engineered features [21, 22], and generally lagging in performance, while the deep ones are often heavy, slow, and barely interpretable (albeit accurate) [8, 15, 24].

One trait shared by all of the aforementioned methods is that they operate in the image space. It is only recently that the community has started to realize the potential of operating in k-space (also referred to as Fourier space or spectrum of an image) for tackling domain shift in MRI data with [11, 10] being the only two studies we were able to find. This is surprising given that MRI is a modality that yields k-space data representation by design.

In k-space, specific spectral components are responsible for different properties of an image, e.g., the high-frequency components accentuate the edges and enhance details, while the low-frequency ones control contrast and the large-scale content. Besides, it is known that the semantic information is mostly stored in the phase component of the spectrum. This suggests an efficient strategy for tackling DA via "mixing" the source and the target spectra. The purpose of this "mixing" (Fourier Domain Adaptation or FDA) is to transfer the style while preserving the patient-specific content, thereby compensating for the domain shift.
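The roles of the spectral components described above can be illustrated with a minimal NumPy sketch. The function names and the random test image are assumptions made for illustration only: an image is split into amplitude and phase, rebuilt from them, and separated into a low-frequency part and the remaining detail.

```python
import numpy as np

def amplitude_phase(image):
    """Return the centred amplitude and phase spectra of a 2D image."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.abs(spectrum), np.angle(spectrum)

def from_amplitude_phase(amplitude, phase):
    """Rebuild an image from centred amplitude and phase spectra."""
    spectrum = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))

def low_pass(image, radius):
    """Keep only the frequencies within `radius` pixels of the spectrum centre."""
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    spectrum = np.fft.fftshift(np.fft.fft2(image)) * mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))

# Hypothetical stand-in for an MRI slice.
rng = np.random.default_rng(0)
image = rng.random((64, 64))

amplitude, phase = amplitude_phase(image)
restored = from_amplitude_phase(amplitude, phase)  # lossless round trip
smooth = low_pass(image, radius=4)                 # contrast and large-scale content
detail = image - smooth                            # edges and fine details
```

The low-frequency part keeps the mean intensity of the image, which is one reason why swapping this region of the spectrum mainly changes the overall appearance rather than the fine structure.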

The idea is borrowed from the natural image domain [26] and adapted to MRI volumes, entailing a new “optimal style donor” selection module and a multi-source transferring routine (we use multiple source images when transferring the style to a given target image to further improve the performance).

We call our method feather-light because, unlike the modern go-to approaches, it does not involve any training, as we simply transfer the k-space components characterizing the style of the source domain to a target scan at test time. Despite its simplicity, the method performs on par with complicated deep DA models, such as those based on Generative Adversarial Networks (GANs) [23]. Notably, the proposed method is also interpretable, as it directly shows which style-carrying source frequencies alleviate the domain shift.

2 Related Work

A large part of deep Domain Adaptation methods can be split into feature-level [2, 13] and image-level approaches. Among the image-level ones, the majority exploit the idea of GANs [10, 7, 23] to eradicate the difference in distribution between images from various domains. GANs, however, are difficult to train, lack explainability, and might produce undesirable artefacts, which pose an even greater problem in the medical imaging context.

Fourier Domain Adaptation (FDA) [26] provides a feasible alternative to GANs, as image-to-image translation, performed by swapping the low-frequency spectral components (amplitudes only), is simple, predictable, and yet yields SOTA-level results on natural images. This method has been adapted to medical imaging, with the earliest application being the mitigation of domain shift in synthetic ultrasound images [17]. In [10], the authors applied an FDA-based augmentation technique to cardiac MRI segmentation, with the novelty being the swapping of both amplitudes and phases, which apparently is not very stable and may lead to changes in the image semantics [25]. [11] applies FDA to federated learning in order to generate images exhibiting the distribution characteristics of other clients, while [1] uses FDA as a proof-of-concept tool for obtaining "poisoned" images, challenging for neural networks.

In [12], the authors solve the automatic polyp detection task by combining feature-level adaptation with FDA, while further improving the FDA component by sampling "matching" source and target image pairs. The closer a target image is to the source one in terms of the cosine similarity of their deep ResNet-50 features, the greater the "match" probability. We note that adding another deep model to the pipeline makes it more complicated, while we strive for simplicity.

One idea overlooked in the FDA area is what we propose to call multi-source transfer, i.e., performing a number of k-space component swaps between a single target image and multiple source images, followed by averaging the downstream task predictions over the resulting versions of the changed target.

3 Method

We base our approach on the Fourier Domain Adaptation technique [26] and summarize it in Fig. 1. This method consists of swapping the low-frequency amplitudes of an image spectrum with those of another image, the style of which should be borrowed. As amplitudes of the low-frequency spectrum components are mostly related to the low-level image characteristics, defining the style, this procedure is expected to align the source and the target distributions, thus compensating for the domain shift between them.

Figure 1: Fourier Domain Adaptation (FDA) for the Brain Segmentation task.

While in [26], the source data is transferred to the target style, and the deep neural network is then trained on the dataset, we note that in the clinical setting it would mean re-training the model for each new domain (e.g., a new clinic), which might complicate certification and clinical deployment. Moreover, the data (e.g., data from a hospital we need to adjust the model for) may appear to be scarce, which limits capabilities of the self-supervised training on target, another component of the original method.

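Instead of transferring the source data to the target style, we focus on adaptation with a single source model used for various targets, which are transferred into the source style at test time. Concretely, the low-frequency amplitudes of the target spectrum are replaced with the corresponding amplitudes of a source spectrum, and the adapted image is obtained by the inverse Fourier transform.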
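One way to write this test-time swap is given below. The notation is an assumption: it is adapted from the FDA formulation of [26] with the roles of source and target exchanged, and the symbols are not defined in the surrounding text.

```latex
x^{t \to s} = \mathcal{F}^{-1}\!\left(\left[\, M_\beta \circ \mathcal{F}^{A}(x^{s}) + (1 - M_\beta) \circ \mathcal{F}^{A}(x^{t}),\; \mathcal{F}^{P}(x^{t}) \,\right]\right)
```

Here $x^{t}$ and $x^{s}$ denote a target and a source slice, $\mathcal{F}^{A}$ and $\mathcal{F}^{P}$ the amplitude and phase of the Fourier transform, $M_\beta$ a binary mask that selects the low-frequency window of size $\beta$, and $\circ$ element-wise multiplication.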

The phase component of the target spectrum remains intact, while the source style is injected through the low-frequency amplitudes that we "cut out" with a mask of size β (Fig. 1).
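The swap can be sketched in a few lines of NumPy. This is a minimal illustration rather than the released code: the function names, the convention that the mask radius equals β times the smaller image side, and the random test slices are all assumptions.

```python
import numpy as np

def circular_mask(shape, beta):
    """Centred disc; the radius beta * min(height, width) is an assumed convention."""
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    radius = beta * min(h, w)
    return (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2

def fda_target_to_source(target, source, beta=0.03):
    """Give `target` the low-frequency amplitudes of `source`, keeping the target phase."""
    f_t = np.fft.fftshift(np.fft.fft2(target))
    f_s = np.fft.fftshift(np.fft.fft2(source))
    amp_t, phase_t = np.abs(f_t), np.angle(f_t)
    amp_s = np.abs(f_s)
    mask = circular_mask(target.shape, beta)
    # Source amplitudes inside the mask, target amplitudes elsewhere.
    amp_mixed = np.where(mask, amp_s, amp_t)
    mixed = amp_mixed * np.exp(1j * phase_t)
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))

# Hypothetical stand-ins for a target slice and a source slice.
rng = np.random.default_rng(0)
target = rng.random((64, 64))
source = rng.random((64, 64)) + 0.5

adapted = fda_target_to_source(target, source, beta=0.03)
```

With a very small window only the lowest frequencies are exchanged, so the adapted slice inherits little more than the global intensity level of the source, while larger β values transfer progressively more of its appearance.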

As there is no additional training required, this setting is much more light-weight than the original one, but it is also more challenging in terms of reaching the optimal performance. To this end, we improve the method in the following ways:

  • We propose to carry out a multitude of swaps (Multi-Source Transfer): the final result for a target slice is calculated by averaging the downstream predictions over its restyled versions, each obtained with the FDA procedure performed on the target slice and one of the selected source slices. The number of source slices used for the style transfer was set after preliminary experiments

  • We design an approach for picking the optimal source slices
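A minimal sketch of the Multi-Source Transfer step is given below. The `stylize` and `predict` callables are hypothetical placeholders: in the actual pipeline they would be the FDA swap and the source-trained segmentation network, whereas here they are toy functions so that the example runs on its own.

```python
import numpy as np

def multi_source_transfer(target_slice, source_slices, stylize, predict):
    """Average the predictions for the target slice restyled with each source slice."""
    predictions = [predict(stylize(target_slice, s)) for s in source_slices]
    return np.mean(predictions, axis=0)

def toy_stylize(target, source):
    """Toy stand-in for the FDA swap: match the mean intensity of the source."""
    return target - target.mean() + source.mean()

def toy_predict(image):
    """Toy stand-in for the segmentation model: a per-pixel probability map."""
    return 1.0 / (1.0 + np.exp(-(image - 0.5)))

rng = np.random.default_rng(0)
target = rng.random((8, 8))
sources = [rng.random((8, 8)) for _ in range(3)]  # number of sources is illustrative

probability_map = multi_source_transfer(target, sources, toy_stylize, toy_predict)
```

Averaging is done on the predictions rather than on the restyled images, so each restyled version of the target is passed through the model separately.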

The intuition behind the latter feature is that, as swapping the spectral components inevitably leads to artefacts, we should minimize this detrimental effect by choosing source and target slices which are as close as possible in terms of their semantics. To do so, we assess the "closeness" with the Spectral Residual Similarity (SR-SIM) semantic similarity measure [27] (Fig. 2).

Figure 2: Multi-Source Transfer (MST) + SR-SIM source choice in the 2.5D fashion.

We consider several approaches to the search for the optimal source "style donors" for the slices belonging to a target scan. Firstly, we may average the similarity scores between the corresponding slices of two scans, thus obtaining a scan-to-scan similarity score. We then select the most similar source scans for Domain Adaptation (3D similarity).

We note that choosing the "style donors" on the scan level might introduce an unnecessary constraint. Alternatively, source "style donors" for various slices of the target scan may come from different source scans. In this case, for each target slice we look for the closest slices among the corresponding slices of all source scans (2D). A natural extension of this approach is broadening this set to include the neighbouring slice positions as well, which we refer to as 2.5D (Fig. 2).
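The three donor-selection schemes can be sketched as follows. This is an illustration under stated assumptions: Pearson correlation is used as a simple stand-in for SR-SIM (which is not implemented here), all scans are assumed to be registered and to have the same number of slices, and the function names, the neighbourhood half-width `k`, and the random data are hypothetical.

```python
import numpy as np

def similarity(a, b):
    """Stand-in similarity (Pearson correlation); the paper uses SR-SIM instead."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def select_scans_3d(target_scan, source_scans, n):
    """Indices of the n source scans with the highest slice-averaged similarity."""
    scores = [np.mean([similarity(t, s) for t, s in zip(target_scan, scan)])
              for scan in source_scans]
    return [int(j) for j in np.argsort(scores)[::-1][:n]]

def select_slices(target_scan, source_scans, i, n, k=0):
    """(scan, slice) indices of the n source slices closest to target slice i.

    k = 0 searches only position i of every source scan (2D);
    k > 0 also searches positions i-k..i+k (2.5D).
    """
    depth = len(target_scan)
    candidates = [(j, p) for j in range(len(source_scans))
                  for p in range(max(0, i - k), min(depth, i + k + 1))]
    candidates.sort(
        key=lambda jp: similarity(target_scan[i], source_scans[jp[0]][jp[1]]),
        reverse=True,
    )
    return candidates[:n]

rng = np.random.default_rng(0)
source_scans = [rng.random((5, 16, 16)) for _ in range(4)]
# The target is a slightly perturbed copy of source scan 2.
target_scan = source_scans[2] + 0.01 * rng.random((5, 16, 16))

best_scans = select_scans_3d(target_scan, source_scans, n=2)
donors_2d = select_slices(target_scan, source_scans, i=3, n=2, k=0)
donors_25d = select_slices(target_scan, source_scans, i=3, n=2, k=1)
```

In the 3D scheme every slice of the target scan shares the same donor scans, whereas the 2D and 2.5D schemes may return a different set of donors for each target slice.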

4 Experiments

4.1 Technical details

We conduct all the experiments on a public brain MR dataset called CC359 [20], which consists of 359 scans and various masks, among which are the brain segmentation masks. Each scan is produced by one of 6 MRI machines (Siemens, Philips, GE; 1.5T or 3T each), and thus falls into one of 6 domains of approximately equal size (59 or 60 scans). We perform affine registration of all the scans to the MNI152 template using the FSL software [5, 6], and subsequently normalize the voxel intensities. We use the Surface Dice Score [14], as it appears to be a more reliable indicator of brain segmentation quality than the standard Dice Score [19].

We solve the brain segmentation task with a 2D U-Net with residual blocks, which we train using the SGD optimizer with Nesterov momentum, a weighted combination of BCE and Dice losses, and a learning rate that is reduced once at a fixed epoch. We train the networks on crops grouped in batches; the crops are sampled randomly at each iteration.
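As an illustration of the loss described above, the following NumPy sketch evaluates a weighted sum of binary cross-entropy and a soft Dice loss on a probability map. The weights `w_bce` and `w_dice` are hypothetical placeholders, since the coefficients actually used are not reproduced here, and the soft Dice formulation is one common variant rather than necessarily the one used in the experiments.

```python
import numpy as np

def bce_loss(prob, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a binary mask."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob)))

def soft_dice_loss(prob, target, eps=1e-7):
    """One minus the soft Dice coefficient."""
    intersection = np.sum(prob * target)
    return float(1.0 - (2.0 * intersection + eps) / (np.sum(prob) + np.sum(target) + eps))

def combined_loss(prob, target, w_bce=0.5, w_dice=0.5):
    """Weighted sum of the two losses; the default weights are placeholders."""
    return w_bce * bce_loss(prob, target) + w_dice * soft_dice_loss(prob, target)

# Toy binary mask with a perfect and a fully wrong prediction.
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
loss_perfect = combined_loss(mask, mask)
loss_wrong = combined_loss(1.0 - mask, mask)
```

The cross-entropy term scores every pixel independently, while the Dice term depends on the overlap of the whole predicted region, which is why the two are often combined for segmentation.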

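| Target \ Source | sm15 | sm3 | ge15 | ge3 | ph15 | ph3 |
|---|---|---|---|---|---|---|
| sm15 | 0.90 ± 0.03 | 0.57 ± 0.18 | 0.83 ± 0.07 | 0.54 ± 0.18 | 0.78 ± 0.09 | 0.84 ± 0.03 |
| sm3 | 0.81 ± 0.04 | 0.90 ± 0.02 | 0.78 ± 0.03 | 0.63 ± 0.07 | 0.80 ± 0.05 | 0.78 ± 0.03 |
| ge15 | 0.61 ± 0.17 | 0.11 ± 0.06 | 0.90 ± 0.03 | 0.40 ± 0.16 | 0.51 ± 0.18 | 0.67 ± 0.15 |
| ge3 | 0.84 ± 0.03 | 0.44 ± 0.14 | 0.78 ± 0.07 | 0.91 ± 0.03 | 0.76 ± 0.10 | 0.78 ± 0.03 |
| ph15 | 0.83 ± 0.06 | 0.45 ± 0.10 | 0.87 ± 0.03 | 0.42 ± 0.17 | 0.91 ± 0.03 | 0.79 ± 0.03 |
| ph3 | 0.74 ± 0.12 | 0.40 ± 0.12 | 0.62 ± 0.12 | 0.39 ± 0.12 | 0.56 ± 0.12 | 0.88 ± 0.04 |

Table 1: Naive transferring (no Domain Adaptation applied). Rows are target domains, columns are source domains.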

We follow the methodology of the original Fourier Domain Adaptation (FDA) paper [26] with respect to the k-space swapping technique, with the notable difference of using the circular crop instead of the rectangular one, which is to take into account the radial symmetry of the spectrum components amplitudes.
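The difference between the two window shapes can be seen in a short NumPy sketch. Both parametrizations are assumptions made for illustration: the rectangle is taken to have half-sides of β times the image height and width, which is how the window of [26] is understood here, and the disc radius is taken to be β times the smaller image side.

```python
import numpy as np

def rectangular_mask(shape, beta):
    """Centred rectangle with half-sides beta * height and beta * width (assumed)."""
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    return (np.abs(yy - h // 2) <= beta * h) & (np.abs(xx - w // 2) <= beta * w)

def circular_mask(shape, beta):
    """Centred disc with radius beta * min(height, width) (assumed)."""
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    return (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= (beta * min(h, w)) ** 2

shape = (128, 128)
rect = rectangular_mask(shape, beta=0.1)
disc = circular_mask(shape, beta=0.1)

# A diagonal frequency near the corner of the rectangle.
corner_kept_by_rect = bool(rect[64 + 12, 64 + 12])
corner_kept_by_disc = bool(disc[64 + 12, 64 + 12])
```

The rectangle reaches further along the diagonals than along the axes, whereas the disc cuts the spectrum at the same radial frequency in every direction, which matches the radial symmetry argument above.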

4.2 Naive model transfer

Firstly, we consider the simple case of no Domain Adaptation applied. To this end, we train 6 base models on the corresponding domains, designating all but a few source scans for training (the held-out scans are used to ensure that the loss reaches a plateau during training). We then transfer these source-trained models to the unseen domains, thus considering 30 source-target pairs. We calculate the performance of each transferred model on the target test set. In addition, we use cross-validation to assess the model performance on the domain it was trained on.

As may be seen from Table 1, the magnitude of the domain shift, i.e., the performance gap between the transferred model and the one initially trained on a given domain, varies significantly between the source-target pairs. As conducting subsequent experiments on all source-target pairs is computationally prohibitive, we concentrate on 3 clusters, representing severe, medium, and subtle domain shift. We sort the source-target pairs by the magnitude of the metric decline, and pick 2 pairs per cluster from the top, middle, and bottom of this sorted list.
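The pair-selection procedure can be sketched with the mean scores of Table 1. Two assumptions are made here: the table is read with target domains as rows and source domains as columns, and the metric decline of a pair is taken as the in-domain score of the source model minus its score on the target. The clusters produced this way are an illustration and are not claimed to coincide with the pairs used in the experiments.

```python
domains = ["sm15", "sm3", "ge15", "ge3", "ph15", "ph3"]

# Mean scores from Table 1: rows are target domains, columns are source domains.
scores = [
    [0.90, 0.57, 0.83, 0.54, 0.78, 0.84],
    [0.81, 0.90, 0.78, 0.63, 0.80, 0.78],
    [0.61, 0.11, 0.90, 0.40, 0.51, 0.67],
    [0.84, 0.44, 0.78, 0.91, 0.76, 0.78],
    [0.83, 0.45, 0.87, 0.42, 0.91, 0.79],
    [0.74, 0.40, 0.62, 0.39, 0.56, 0.88],
]

pairs = []
for t, target in enumerate(domains):
    for s, source in enumerate(domains):
        if s != t:
            # Assumed definition: in-domain score of the source model minus transferred score.
            decline = round(scores[s][s] - scores[t][s], 2)
            pairs.append((source, target, decline))

# Largest decline first; take 2 pairs from the top, the middle, and the bottom.
ranked = sorted(pairs, key=lambda p: p[2], reverse=True)
middle = len(ranked) // 2
clusters = {
    "severe": ranked[:2],
    "medium": ranked[middle - 1:middle + 1],
    "subtle": ranked[-2:],
}
```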

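| Shift severity | No DA | StyleSegor | Cycle | Style | Fast | 3D (β: aver. optimal) | 2.5D (β: aver. optimal) | 2D (β: aver. optimal) | 3D (β: opt. per pair) | 2.5D (β: opt. per pair) | 2D (β: opt. per pair) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| severe #1 | 0.11 | 0.11 | 0.50 | 0.46 | 0.15 | 0.57 | 0.57 | 0.57 | 0.57 | 0.58 | 0.59 |
| severe #2 | 0.39 | 0.46 | 0.64 | 0.58 | 0.12 | 0.50 | 0.48 | 0.47 | 0.51 | 0.48 | 0.48 |
| medium #1 | 0.67 | 0.66 | 0.70 | 0.64 | 0.15 | 0.72 | 0.73 | 0.74 | 0.77 | 0.77 | 0.77 |
| medium #2 | 0.74 | 0.69 | 0.69 | 0.61 | 0.11 | 0.72 | 0.69 | 0.68 | 0.75 | 0.74 | 0.73 |
| subtle #1 | 0.84 | 0.85 | 0.60 | 0.41 | 0.17 | 0.81 | 0.81 | 0.82 | 0.83 | 0.83 | 0.84 |
| subtle #2 | 0.87 | 0.82 | 0.46 | 0.55 | 0.11 | 0.85 | 0.83 | 0.84 | 0.87 | 0.86 | 0.86 |
| average | 0.60 | 0.60 | 0.60 | 0.54 | 0.14 | 0.70 | 0.68 | 0.69 | 0.72 | 0.71 | 0.71 |

Table 2: Comparison of the performance of various proposed methods with the baselines. The columns No DA through Fast are the baselines; the remaining columns are the proposed method with β chosen as averaged optimal or optimal per pair. Style is for StyleGAN [9], Cycle is for CycleGAN [23], Fast is for artistic stylization network [3].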

4.3 Choosing the optimal β

One of the most important FDA design choices is the size of the swapping window β. Specifically for this purpose, we designate a separate set of target scans per source-target pair, on which a grid search over various β values is performed. We consider two strategies for devising the optimal β, which correspond to actual clinical set-ups:

  • Optimal per Pair. Picking the β which proved to be optimal for each source-target pair. This set-up is motivated by the scenario in which at least some target domain data (e.g., data coming from a new clinical center) is available and labelled, and thus may be used for setting the optimal β specific to this source-target pair

  • Averaged Optimal. Picking the β based on the grid-search results averaged over all pairs, which corresponds to a broader scenario of setting a single "standard" β for all the pairs
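The two strategies differ only in how the grid-search results are aggregated, as the following sketch shows. The β grid and the validation scores below are made-up numbers used purely for illustration and are not results from the experiments.

```python
import numpy as np

# Hypothetical grid of window sizes.
betas = np.array([0.01, 0.02, 0.03, 0.05, 0.1])

# Hypothetical validation scores: one row per source-target pair, one column per beta.
val_scores = np.array([
    [0.40, 0.52, 0.57, 0.55, 0.30],
    [0.45, 0.47, 0.46, 0.44, 0.35],
    [0.70, 0.74, 0.73, 0.69, 0.50],
])

# Optimal per Pair: the best beta is chosen separately for every pair.
beta_per_pair = betas[val_scores.argmax(axis=1)]

# Averaged Optimal: one beta maximizing the score averaged over all pairs.
beta_averaged = betas[val_scores.mean(axis=0).argmax()]
```

In this toy example the per-pair choice differs from the averaged one for two of the three pairs, which mirrors the trade-off between tuning on labelled target data and using a single standard value.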

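| Shift severity | (a) SR-SIM + MST (β: averaged optimal) | (b) MST (β: averaged optimal) | (c) simple swap (β: averaged optimal) | (a) SR-SIM + MST (β: optimal per pair) | (b) MST (β: optimal per pair) | (c) simple swap (β: optimal per pair) |
|---|---|---|---|---|---|---|
| severe #1 | 0.57 | 0.41 | 0.40 | 0.59 | 0.57 | 0.53 |
| severe #2 | 0.47 | 0.42 | 0.41 | 0.48 | 0.41 | 0.41 |
| medium #1 | 0.74 | 0.78 | 0.76 | 0.77 | 0.78 | 0.76 |
| medium #2 | 0.68 | 0.69 | 0.65 | 0.73 | 0.69 | 0.65 |
| subtle #1 | 0.82 | 0.85 | 0.82 | 0.84 | 0.85 | 0.82 |
| subtle #2 | 0.84 | 0.85 | 0.84 | 0.86 | 0.85 | 0.84 |
| average | 0.69 | 0.67 | 0.65 | 0.71 | 0.69 | 0.67 |

Table 3: The ablation study.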

4.4 Results and Discussion

As discussed in Section 3, we consider three approaches to picking the "best" source slices, which we denote 2D, 2.5D, and 3D. Furthermore, in line with Section 4.3, we derive β from the target validation set either by averaging over all pairs or by selecting it per pair. The corresponding results are presented in Table 2 in comparison with the SOTA-level baselines of CycleGAN [23], StyleGAN [10] and StyleSegor [13]. We also consider another light-weight baseline [3].

We perform the ablation study (Table 3), comparing (a) SR-SIM chosen sources + Multi-source transfer (MST) (b) Multi-source transfer (MST) (c) A ”simple” swap.

2D vs. 2.5D vs. 3D. Interestingly, no significant difference could be observed between the various SR-SIM-based source choice approaches. Subsequently, we concentrate on the 3D approach, as it appears to be both more intuitive and marginally better than the others.

β: Averaged Optimal vs. β: Optimal per Pair. Picking β on the target validation set in a pair-wise fashion gives only a minor advantage over deriving it from the averaged target validation curve. Besides, the latter set-up does not require adjusting β for a particular pair on a labelled subset of target data, and is thus more relevant to clinical practice. Therefore, from now on we concentrate on the analysis of the Averaged Optimal results.

Our method (3D; β: Averaged Optimal) vs. Baselines. While our method is outperformed by CycleGAN on the severe #2 pair and by StyleSegor on the subtle #1 pair, it is the only one demonstrating good performance across the whole range of domain shift magnitudes, since GANs fail to preserve even the quality of the "naive" transfer (no DA) in case of low domain shift, and StyleSegor barely improves the score in case of strong domain shift. Fast artistic image stylization [3], another light-weight method we consider as a baseline, does not demonstrate sufficiently good performance.

Figure 3: Visual comparison of various approaches. In this particular case, we set β = 0.03.

Ablation Studies. As can be seen in Table 3, both introducing Multi-Source Transfer and combining it with the SR-SIM-based source choice improve the score on average, with the positive effect of the "smart" source choice being substantial for the instances of severe domain shift.

In Fig. 3, we visually compare our approach with the baselines, considering the cases of severe (top) and subtle (bottom) domain shift. For illustration purposes, we apply the method to the middle-positioned slices. Notably, in case of severe domain shift GANs alter the appearance much more significantly, which might explain the decreasing score.

5 Conclusion

We present a novel Fourier-based Domain Adaptation method, which requires neither any training, nor incorporating any additional deep components into the pipeline. We consider various domain shift severity scenarios, and show that our method performs consistently across all of them, outperforming SOTA-level GANs in case of subtle domain shift. We note that the simplicity achieved ensures better explainability, and envision easier certification, as we avoid modifying the deep model in any way, but rather adapt an incoming image in a strictly defined fashion.

A limitation of this study is the blunt selection of the k-space low-frequency window, which could be improved by engaging intelligent search for the style-bearing spectrum components, such as presented in [18] for the supervised case, or by penalizing for the errors in the high-frequency part of the spectrum [16]. Another fundamental assumption we make is the separability of content and style, which is known to be true only partially [2]. Optimization of the k-space swapping pattern along with taking into account the intrinsic content-style coupling will be the subject of future work.


Ivan Zakazov was supported by RSF grant 20-71-10134. Philips is the owner of the IP rights on the work described in this publication.

We warmly thank Prof. Kamnitsas for fruitful discussions in 2021 during early stages of this work.


  • [1] Y. Feng, B. Ma, J. Zhang, S. Zhao, Y. Xia, and D. Tao (2022-06) FIBA: frequency-injection based backdoor attack in medical image analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 20876–20885. Cited by: §2.
  • [2] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. S. Lempitsky (2015) Domain-adversarial training of neural networks. CoRR abs/1505.07818. Cited by: §2, §5.
  • [3] G. Ghiasi, H. Lee, M. Kudlur, V. Dumoulin, and J. Shlens (2017) Exploring the structure of a real-time, arbitrary neural artistic stylization network. In British Machine Vision Conference 2017, BMVC 2017, London, UK, September 4-7, 2017, Cited by: §4.4, §4.4, Table 2.
  • [4] H. Guan and M. Liu (2022) Domain adaptation for medical image analysis: a survey. IEEE Transactions on Biomedical Engineering 69 (3), pp. 1173–1185. Cited by: §1, §1.
  • [5] M. Jenkinson, P. R. Bannister, M. Brady, and S. M. Smith (2002) Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage 17, pp. 825–841. Cited by: §4.1.
  • [6] M. Jenkinson and S. Smith (2001-07) Global optimisation method for robust affine registration of brain images. Medical image analysis 5, pp. 143–56. Cited by: §4.1.
  • [7] N. Joshi and P. Burlina (2021) AI fairness via domain adaptation. External Links: 2104.01109, Document Cited by: §2.
  • [8] K. Kamnitsas, C. F. Baumgartner, C. Ledig, V. F. J. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, A. V. Nori, A. Criminisi, D. Rueckert, and B. Glocker (2017) Unsupervised domain adaptation in brain lesion segmentation with adversarial networks. In IPMI, Cited by: §1.
  • [9] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila (2020) Analyzing and improving the image quality of stylegan. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 8107–8116. Cited by: Table 2.
  • [10] F. Kong and S. C. Shadden (2020) A generalizable deep-learning approach for cardiac magnetic resonance image segmentation using image augmentation and attention u-net. In M&Ms and EMIDEC/STACOM@MICCAI, Cited by: §1, §2, §2, §4.4.
  • [11] Q. Liu, C. Chen, J. Qin, Q. Dou, and P. Heng (2021) FedDG: federated domain generalization on medical image segmentation via episodic learning in continuous frequency space. The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Cited by: §1, §2.
  • [12] X. Liu, X. Guo, Y. Liu, and Y. Yuan (2021) Consolidated domain adaptive detection and localization framework for cross-device colonoscopic images. Medical Image Analysis 71, pp. 102052. External Links: ISSN 1361-8415 Cited by: §2.
  • [13] C. Ma, Z. Ji, and M. Gao (2019) Neural style transfer improves 3d cardiovascular mr image segmentation on inconsistent data. External Links: 1909.09716 Cited by: §2, §4.4.
  • [14] S. Nikolov, S. Blackwell, R. Mendes, J. De Fauw, C. Meyer, C. Hughes, H. Askham, B. Romera-Paredes, A. Karthikesalingam, C. Chu, et al. (2018) Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy. arXiv preprint arXiv:1809.04430. Cited by: §4.1.
  • [15] C. S. Perone, P. Ballester, R. C. Barros, and J. Cohen-Adad (2019) Unsupervised domain adaptation for medical imaging segmentation with self-ensembling. NeuroImage 194, pp. 1–11. Cited by: §1.
  • [16] V. Pronina, F. Kokkinos, D. V. Dylov, and S. Lefkimmiatis (2020) Microscopy Image Restoration with Deep Wiener-Kolmogorov Filters. ECCV. External Links: Link Cited by: §5.
  • [17] M. Sharifzadeh, A. K. Z. Tehrani, H. Benali, and H. Rivaz (2021) Ultrasound domain adaptation using frequency domain analysis. External Links: 2109.09969, Document Cited by: §2.
  • [18] V. Shipitsin, I. Bespalov, and D. V. Dylov (2021) GAFL: Global Adaptive Filtering Layer for Computer Vision. arXiv: 2010.01177. Cited by: §5.
  • [19] B. Shirokikh, I. Zakazov, A. Chernyavskiy, I. Fedulova, and M. Belyaev (2020) First u-net layers contain more domain specific information than the last ones. In Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning, pp. 117–126. Cited by: §4.1.
  • [20] R. Souza, O. Lucena, J. Garrafa, D. Gobbi, M. Saluzzi, S. Appenzeller, L. Rittner, R. Frayne, and R. Lotufo (2018) An open, multi-vendor, multi-field-strength brain mr dataset and analysis of publicly available skull stripping methods agreement. NeuroImage 170, pp. 482–494. Cited by: §4.1.
  • [21] J. Wang, L. Zhang, Q. Wang, L. Chen, J. Shi, X. Chen, Z. Li, and D. Shen (2020) Multi-class asd classification based on functional connectivity and functional correlation tensor via multi-source domain adaptation and multi-view sparse representation. IEEE Transactions on Medical Imaging 39 (10), pp. 3137–3147. Cited by: §1.
  • [22] M. Wang, D. Zhang, J. Huang, P. Yap, D. Shen, and M. Liu (2020) Identifying autism spectrum disorder with multi-site fmri via low-rank domain adaptation. IEEE Transactions on Medical Imaging 39 (3), pp. 644–655. Cited by: §1.
  • [23] P. Welander, S. Karlsson, and A. Eklund (2018) Generative adversarial networks for image-to-image translation on multi-contrast MR images - A comparison of cyclegan and UNIT. CoRR abs/1806.07777. Cited by: §1, §2, §4.4, Table 2.
  • [24] T. Wollmann, C. S. Eijkman, and K. Rohr (2018) Adversarial domain adaptation to improve automatic breast cancer grading in lymph nodes. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Vol. , pp. 582–585. Cited by: §1.
  • [25] Y. Yang, D. Lao, G. Sundaramoorthi, and S. Soatto (2020-06) Phase consistent ecological domain adaptation. pp. 9008–9017. External Links: Document Cited by: §2.
  • [26] Y. Yang and S. Soatto (2020) FDA: fourier domain adaptation for semantic segmentation. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4084–4094. Cited by: §1, §2, §3, §3, §4.1.
  • [27] L. Zhang and H. Li (2012) SR-sim: a fast and high performance iqa index based on spectral residual. In 2012 19th IEEE International Conference on Image Processing, Vol. , pp. 1473–1476. Cited by: §3.