1 Introduction
The ability to harness diverse, feature-rich datasets for algorithm training allows the scientific community to create machine learning (ML) models capable of solving challenging data-driven tasks. These include the creation of robust autonomous vehicles (Rao and Frtunikj, 2018), early-stage cancer discovery (Cruz and Wishart, 2006) or disease survival prediction (Rau et al., 2018). A subclass of these ML problems profits particularly from the ability to execute deep learning workflows over complex-valued datasets, such as magnetic resonance imaging (MRI) (Virtue et al., 2017) or time-series data (Fan and Xiong, 2013; Kociuba and Rowe, 2016). Complex-valued deep learning has seen increased traction in the past years, owing in part to improved support by ML frameworks and the broader availability of graphics processing unit (GPU) hardware able to tackle the increased computational requirements (Bassey et al., 2021).

However, since complex numbers are often used to represent signals derived from sensitive biological or medical records (Cole et al., 2020; Küstner et al., 2020; Peker, 2016), privacy constraints can render such datasets hard to obtain. The resulting data scarcity impairs effective model training, prompting the adoption of regulation-compliant and privacy-preserving methods for data access. Distributed computation methods such as federated learning (FL) (Konečný et al., 2016) can partially address this requirement by only requiring participants to share the results of local computation rather than exchange data over the network. However, FL on its own has repeatedly been shown to be insufficient for privacy protection (Geiping et al., 2020; Yin et al., 2021). Thus, bridging the gap between data protection and utilisation for algorithmic training requires methods able to offer objective privacy guarantees. Differential privacy (DP) (Dwork et al., 2014) has established itself as the cornerstone of such techniques and has been deployed in contexts like the US Census (Abowd, 2018) and distributed learning on mobile devices (Cormode et al., 2018). DP's purview has been expanded to encompass deep learning through the introduction of DP stochastic gradient descent (DP-SGD) (Abadi et al., 2016), allowing for the training of deep neural networks on private data. So far, however, the application of DP to complex-valued ML tasks remains drastically underexplored. Our work attempts to address this challenge through the following contributions:
We extend DP to the complex domain through a collection of techniques we refer to as ζ-DP. We use this term instead of complex-valued DP for brevity and to avoid confusion with the abbreviation cDP, which is already used for concentrated DP (Dwork and Rothblum, 2016). The letter ζ alludes to the complex-valued Riemann ζ function and is intended to convey the notion of continuation to the complex domain.

We define and discuss the complex Gaussian mechanism (GM) in Section 4.1 and show that its properties generalise corresponding results on real-valued functions. This allows us to interpret the complex GM through the lens of previous work on (ε, δ)-DP and Rényi-DP (RDP).

To enable the design and privacy-preserving training of complex-valued deep learning models, we introduce ζ-DP-SGD in Section 4.2.

Finally, in Section 5 we experimentally evaluate our techniques on several real-life neural network training tasks, i.e. speech classification, abnormality detection in electrocardiograms and magnetic resonance imaging (MRI) reconstruction. Moreover, we establish baselines for future work by providing benchmark results on a complex-valued variant of the MNIST dataset and on complex neural network activation functions, both with and without ζ-DP-SGD.
2 Related work
Prior work has addressed several challenges in non-private complex-valued deep learning, including the introduction of appropriate activation functions, and has presented applications in domains such as MRI reconstruction (Küstner et al., 2020) or time series analysis (Fink et al., 2014). For a detailed overview of methodology and applications, we refer to Hirose (2012); Bassey et al. (2021). Until recently, deep learning frameworks did not fully support complex arithmetic and automatic differentiation. Hence, previous works (Trabelsi et al., 2017; Nazarov and Burnaev, 2020) express ℂ as ℝ² and use two real-valued channels rather than complex floating-point numbers. This approach can lead to a spurious increase in function sensitivity and, by extension, to the addition of excessive noise in the private setting, adversely impacting utility. Our work specifically addresses this shortcoming through the use of Wirtinger (CR) calculus (Wirtinger, 1927; Kreutz-Delgado, 2009). A limited number of studies have utilised DP techniques in conjunction with complex-valued data (Fan and Xiong, 2013; Fioretto et al., 2019); however, to our knowledge, none has formalised a notion of complex-valued DP or investigated neural network applications. The definition of DP and the Gaussian mechanism are essential to our formalism, and details on their real-valued definitions can be found in Dwork et al. (2014). As stated above, DP-SGD was introduced by Abadi et al. (2016). Rényi-DP (RDP) was introduced by Mironov (2017) as a relaxation of DP with favourable properties under composition, rendering it particularly useful for DP-SGD privacy accounting.
3 Background
We begin by introducing key terminology required in the rest of our work. We assume that a trusted analyst in possession of sensitive data wishes to publish the results of some analysis performed on this data while offering the individuals to whom the data belongs a DP guarantee. We will refer to the set of all sensitive records as the sensitive database D, whereby we assume that one individual's data is only present in the database once. Let 𝒟, the metric space of all sensitive databases, be equipped with the Hamming metric d_H, and let D ∈ 𝒟. D's adjacent database D′ can be constructed from D by adding or removing exactly one database row (that is, one individual's data), such that d_H(D, D′) = 1. The analyst executes a query (function) f, for example a mean calculation, over the database. We first define the sensitivity of f:
Definition 1 (Sensitivity of f).
Let D, D′ and 𝒟 be defined as above. f maps the elements of 𝒟 to elements of a metric space 𝒳 equipped with a metric d_𝒳. The (global) sensitivity Δf of f is then defined as:

Δf = max_{D, D′ ∈ 𝒟 : d_H(D, D′) = 1} d_𝒳(f(D), f(D′))   (1)

The maximum is taken over all adjacent database pairs in 𝒟. When 𝒳 is the Euclidean space and d_𝒳 is the L2 metric, Δf is referred to as the L2 sensitivity. We will only use the L2 sensitivity in this work.
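For intuition, consider a sum query over records whose individual L2 norms are bounded by a constant B: adding or removing one record changes the output by at most B, so the L2 sensitivity is B. A minimal sketch (the bound B and the helper names are our own illustration):

```python
import numpy as np

def sum_query(db):
    """Sum query over a database of per-record feature vectors."""
    return db.sum(axis=0)

B = 1.0                                   # assumed per-record L2 norm bound
db = np.random.randn(100, 5)
# Project records onto the L2 ball of radius B.
db /= np.maximum(np.linalg.norm(db, axis=1, keepdims=True) / B, 1.0)
neighbour = db[:-1]                       # remove one individual's record
# Distance between outputs on adjacent databases never exceeds B.
print(np.linalg.norm(sum_query(db) - sum_query(neighbour)))
```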
In private data analysis and ML, we are often concerned with differentiable functions; for Lipschitz-continuous query functions, the equivalence of the Lipschitz constant and the sensitivity (Raskhodnikova and Smith, 2016) can be exploited:
Definition 2 (Lipschitz constant of f).
Let D, D′ and f be defined as above. Then f is said to be K-Lipschitz continuous if and only if a non-negative real number K exists for which the following holds:

d_𝒳(f(D), f(D′)) ≤ K · d_H(D, D′)   (2)

Evidently, Δf ≤ K by Equation (1) and the definition of adjacency. Moreover, let ∇ be the differential operator; then K = sup ‖∇f‖, where ‖·‖ is the operator norm (O'Searcoid, 2006). Therefore, for a scalar-valued query function, Δf ≤ sup ‖∇f‖₂.
A DP mechanism adds noise to the results of f, calibrated to its sensitivity. Here, we provide the definition of the (real-valued) Gaussian mechanism (GM):
Definition 3 (Gaussian mechanism).
The Gaussian mechanism M operates on the results of a query function f with L2 sensitivity Δf over a sensitive database D by outputting f(D) + ξ, where ξ ∼ N(0, σ²I_n). Here, I_n denotes the identity matrix with n diagonal elements and σ² is the variance of the Gaussian noise, calibrated to Δf. The application of the GM with properly calibrated noise satisfies (ε, δ)-DP:
Definition 4 ((ε, δ)-DP).
The randomised mechanism M preserves (ε, δ)-DP if, for all pairs of adjacent inputs D and D′ and all subsets S of M's range:

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ   (3)
A number of relaxations have been proposed to characterise the properties of the GM, of which Rényi-DP is arguably the most widely employed in DP deep learning frameworks, owing to its favourable properties under composition.
Definition 5 (Rényi DP).
M preserves (α, ρ)-Rényi-DP (RDP) if, for all pairs of adjacent inputs D and D′:

D_α(M(D) ‖ M(D′)) ≤ ρ,   (4)

where D_α denotes the Rényi divergence of order α.
4 ζ-DP
In this section we introduce ζ-DP, an extension of DP to complex-valued query functions and mechanisms. ζ-DP generalises real-valued DP and allows the reuse of prior theoretical results and software implementations.
4.1 The complex Gaussian mechanism
We begin by introducing a variant of the GM suitable for query functions with codomain ℂⁿ.
Definition 6 (Complex Gaussian mechanism).
The complex Gaussian mechanism M_ζ on f : 𝒟 → ℂⁿ outputs f(D) + ξ, where ξ ∼ CN(0, σ²I_n) and CN denotes circularly symmetric complex-valued Gaussian noise with variance σ².
Of note, a random variable ξ ∼ CN(0, σ²) can be constructed by independently drawing two random variables a and b from a real-valued normal distribution N(0, σ²) and outputting a + ib, where i is the imaginary unit.
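As a minimal illustration of Definition 6, the following NumPy sketch (function name ours) privatises a complex-valued query output following the construction above:

```python
import numpy as np

def complex_gaussian_mechanism(query_output, sigma, rng=None):
    """Add circularly symmetric complex Gaussian noise to a query result.

    The real and imaginary parts are drawn independently from N(0, sigma^2),
    following the construction described above.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = (rng.normal(0.0, sigma, size=query_output.shape)
             + 1j * rng.normal(0.0, sigma, size=query_output.shape))
    return query_output + noise

# Example: privatise a complex-valued mean query result.
print(complex_gaussian_mechanism(np.array([1 + 2j, 0.5 - 1j]), sigma=0.8))
```

We now state our two main theoretical results: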
Theorem 1.
Let f : 𝒟 → ℂⁿ be a query function with L2 sensitivity Δf. Then, M_ζ preserves (ε, δ)-DP if and only if the following holds:

Φ( Δf/(2σ) − εσ/Δf ) − e^ε · Φ( −Δf/(2σ) − εσ/Δf ) ≤ δ,   (5)

where Φ denotes the cumulative distribution function of the standard (real-valued) normal distribution.
Proof.
The claim represents a generalisation of the Analytic Gaussian Mechanism (Balle and Wang, 2018) to ℂⁿ. It suffices to show that the magnitude of the privacy-loss random variable ℒ is bounded by ε with probability at least 1 − δ. As shown in Dwork and Rothblum (2016) and Balle and Wang (2018), given some fixed output y, ℒ is given by:

ℒ(y) = ln( p_{M_ζ(D)}(y) / p_{M_ζ(D′)}(y) ),   (6)

where ln is the natural logarithm, and ℒ is distributed as:

ℒ ∼ N( Δf²/(2σ²), Δf²/σ² ).   (7)

As ℒ depends on y only through Re( v^H (y − f(D)) ), where v = f(D) − f(D′) and (·)^H denotes the Hermitian transpose, ℒ has a real-valued mean and hence follows a real-valued normal distribution, even when y ∈ ℂⁿ. From here, the proof proceeds identically to the proof of Theorem 8 of Balle and Wang (2018). ∎
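Since Equation (5) has the same form as in the real-valued case, the noise scale σ can be calibrated numerically exactly as for the real-valued analytic GM; a sketch using bisection (assuming SciPy; helper names are ours):

```python
import numpy as np
from scipy.stats import norm

def gm_delta(sigma, sensitivity, epsilon):
    """Left-hand side of Equation (5): the delta achieved at this sigma."""
    a = sensitivity / (2 * sigma)
    b = epsilon * sigma / sensitivity
    return norm.cdf(a - b) - np.exp(epsilon) * norm.cdf(-a - b)

def calibrate_sigma(sensitivity, epsilon, delta, lo=1e-4, hi=1e4):
    """Smallest sigma satisfying Equation (5), found by bisection
    (gm_delta decreases monotonically in sigma)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if gm_delta(mid, sensitivity, epsilon) > delta:
            lo = mid  # too little noise
        else:
            hi = mid
    return hi

print(calibrate_sigma(sensitivity=1.0, epsilon=1.0, delta=1e-5))
```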
Theorem 2.
Let f and M_ζ be defined as above. Then, M_ζ preserves (α, ρ)-RDP if:

σ² ≥ α · Δf² / (2ρ).   (8)
We will rely on the following fact about the Rényi divergence of order α between arbitrary distributions:
Corollary 1 (Definition 2 in Van Erven and Harremos (2014)).
Let P and Q be two arbitrary distributions defined on a measurable space with densities p and q. Then, for α > 1:

D_α(P ‖ Q) = (1/(α − 1)) · ln ∫ p(x)^α q(x)^{1−α} dx.   (9)

In particular, for two normal distributions with means μ₁ and μ₂ and common variance σ²:

D_α(P ‖ Q) = α · ⟨μ₁ − μ₂, μ₁ − μ₂⟩ / (2σ²),   (10)

where ⟨·, ·⟩ denotes the inner product.
We can now prove Theorem 2:
Proof.
By Definition 6 and the additive property of the Gaussian distribution, the density functions of M_ζ on D and D′ follow a circularly symmetric complex-valued Gaussian distribution with means f(D) and f(D′) and common covariance matrix σ²I_n. By substituting into Equation (10):

D_α(M_ζ(D) ‖ M_ζ(D′)) = α · ‖f(D) − f(D′)‖₂² / (2σ²) ≤ α · Δf² / (2σ²).   (11)

Hence, to preserve (α, ρ)-RDP, it suffices to choose σ such that

α · Δf² / (2σ²) ≤ ρ.   (12)
∎
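Because the RDP bound matches that of the real-valued GM, existing RDP accountants can be reused without modification; a minimal sketch of composition and conversion to (ε, δ)-DP (Mironov, 2017), without subsampling amplification and with our own helper names:

```python
import math

def rdp_of_complex_gm(sigma, sensitivity, alpha):
    """Order-alpha RDP bound of the complex GM, Equation (8)/(11)."""
    return alpha * sensitivity ** 2 / (2 * sigma ** 2)

def rdp_to_dp(rho, alpha, delta):
    """Standard conversion from (alpha, rho)-RDP to (eps, delta)-DP."""
    return rho + math.log(1 / delta) / (alpha - 1)

# RDP composes additively over T sequential releases.
T, sigma, alpha, delta = 10, 5.0, 8, 1e-5
eps = rdp_to_dp(T * rdp_of_complex_gm(sigma, 1.0, alpha), alpha, delta)
print(f"after {T} releases: ({eps:.2f}, {delta})-DP at alpha={alpha}")
```

In practice, one would minimise the resulting ε over a grid of orders α.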
These findings allow results which apply to real-valued functions to be transferred seamlessly to the complex domain. In particular, they yield the following insights:

The complex GM inherits all properties of the (ε, δ)-DP and RDP interpretations of the real-valued GM, such as composition and subsampling amplification.

The complex GM, like the real-valued GM, is fully characterised by the L2 sensitivity Δf and the noise magnitude σ.

The GM naturally fits ζ-DP due to the convenient properties of the circularly symmetric complex-valued Gaussian distribution. As a counterexample, a complex-valued Laplace random variable is naturally non-circular in the complex (and multivariate) case, even when constructed from independent components (Kotz et al., 2001). Moreover, the utilisation of the L1 metric on the output space of f is disadvantageous, as even for scalar (complex) outputs, the L1 sensitivity can be higher than the L2 sensitivity. Lastly, the utilisation of elliptical noise is inherently unable to satisfy DP in any dimension (Reimherr and Awan, 2019). We thus leave the introduction of alternative strategies for obtaining DP in the complex-valued setting to future investigation.
We conclude this section by introducing a modification of the DP stochastic gradient descent (DP-SGD) algorithm, which will be employed in our experimental evaluation.
4.2 ζ-DP-SGD
The DP-SGD algorithm (Abadi et al., 2016) represents an application of the GM to the training of deep neural networks. Using the terminology above, each training step of the neural network (whose loss function, in this setting, represents the query) leads to the release of a privatised gradient. Evidently, the noise magnitude of the GM must be calibrated to the sensitivity of the loss function. However, most neural network loss functions have a Lipschitz constant which is too high to preserve DP while maintaining acceptable utility (and, generally, the Lipschitz constant of neural networks is NP-hard to compute (Scaman and Virmaux, 2018)). Thus, DP-SGD (Abadi et al., 2016) artificially induces a bounded sensitivity condition by clipping the L2 norm of the gradient to a predefined value. A real-valued loss function is required for minimisation, as the complex plane, contrary to the real number line, does not admit a natural ordering. Our implementation of the algorithm makes use of Wirtinger (CR) calculus (Kreutz-Delgado, 2009) for gradient computations, similar to previous works on complex-valued deep learning (Virtue et al., 2017; Boeddeker et al., 2017). This technique, discussed in detail in Appendix A.1, provides several benefits: it relaxes the requirement for component functions to be holomorphic (that is, differentiable in the complex sense), only requiring them to be individually differentiable with respect to their real and imaginary components (differentiable in the real sense). For holomorphic functions, Wirtinger derivatives nevertheless recover the correct derivative definition and can thus also be used to compute the global sensitivity via the Lipschitz constant. More importantly, for real-valued, non-holomorphic loss functions, they lead to a correct gradient magnitude calculation, whereas expressing complex-valued functions as vector-valued functions in ℝ², a technique often employed in complex-valued neural network training (Trabelsi et al., 2017), can incur an undesirable multiplicative sensitivity increase which would diminish the utility of ζ-DP-SGD. We exemplify this phenomenon and the noise savings CR calculus can enable in Appendix A.1. ζ-DP-SGD is presented in Algorithm 1 and relies on a modification of the gradient clipping step: we clip the conjugate gradient, which represents the direction of steepest ascent for a real-valued loss function L:

∇̄L(w) ≔ ∂L/∂w̄,   (13)

where w̄ is the conjugate weight vector.
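A minimal sketch of a single ζ-DP-SGD update on complex parameters follows (PyTorch; names are ours, per-example gradient computation is assumed to be handled by a library such as deepee, and the per-component noise convention is an illustrative assumption):

```python
import torch

def zeta_dpsgd_step(params, per_example_grads, clip_norm, sigma, lr):
    """Clip each example's conjugate gradient, average, add circularly
    symmetric complex Gaussian noise, and take a descent step.

    per_example_grads: one list of complex gradient tensors
    (dL/d conj(w), matching `params`) per example in the lot.
    """
    n = len(per_example_grads)
    summed = [torch.zeros_like(p) for p in params]
    for grads in per_example_grads:
        # Global L2 norm of this example's conjugate gradient.
        norm = torch.sqrt(sum(g.abs().pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)
        for acc, g in zip(summed, grads):
            acc += g * scale
    with torch.no_grad():
        for p, acc in zip(params, summed):
            noise = sigma * clip_norm * torch.complex(
                torch.randn_like(p.real), torch.randn_like(p.real))
            p -= lr * (acc + noise) / n
```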
5 Experimental evaluation
Throughout this section, we present results from the experimental evaluation of ζ-DP-SGD. Details on dataset preparation and training can be found in Appendix A.2.
5.1 Benchmarking ζ-DP-SGD on PhaseMNIST
The MNIST dataset (LeCun et al., 2010) is widely used as a benchmark dataset in the real-valued DP-SGD literature. As a means of comparison, we thus begin our experimental evaluation with results on an adapted, complex-valued version of MNIST, which we term PhaseMNIST. In brief, for each example x of the original MNIST dataset with label y, we obtain the imaginary component by selecting an image x′ whose label y′ is deterministically derived from y, resulting in the input image arrangement x + ix′. Only the label of the real-valued image is used. The results are summarised in Table 1, where we also provide baselines for real-valued MNIST training on the same architecture (with real-valued weights).
Setting                Accuracy

PhaseMNIST ζ-DP-SGD
MNIST DP-SGD
PhaseMNIST non-DP
MNIST non-DP
The complex-weighted neural networks reached high accuracy at a low privacy budget consumption. We assume this to be due to the increased amount of information provided by the alignment with a second image, as well as the higher entropic capacity of the network due to the complex-valued weights. A similar phenomenon was observed by Scardapane et al. (2018).
5.2 Privacy-preserving electrocardiogram abnormality detection on wearable devices
The advent of wearable devices incorporating electrocardiography (ECG) sensors has provided consumers with the ability to detect signs of an abnormal heart rhythm. In this section, we demonstrate the utilisation of a small neural network architecture, suitable for deployment e.g. to a mobile device connected to such a biosensor, trained on ECG data from the China Physiological Signal Challenge (CPSC) 2018 dataset (Liu et al., 2018). We selected the task of automated Left Bundle Branch Block (LBBB) detection, formulated as a binary classification task against a normal (sinus) rhythm. This task is clinically relevant, as the sudden appearance of LBBB can herald acute coronary syndrome, which requires urgent attention to avert myocardial infarction. As ECG data constitutes personal health information, its protection is mandated both legally and ethically. We utilised ζ-DP-SGD to train a complex-valued neural network on Fourier-transformed ECG acquisitions. We adopt this strategy as it can benefit from two key properties of the Fourier transform: ECG data can contain high-frequency noise which is irrelevant for diagnosis and can be reduced using Fourier filtering; concurrently, this technique compresses the signal, which can drastically reduce the amount of data transferred. Table 2 shows classification results and Figure 1 shows exemplary source data.

Setting      ROC-AUC

Non-DP
ζ-DP-SGD
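A sketch of the Fourier-based preprocessing described above (the cut-off and signal length are illustrative; the values used in our experiments are listed in Appendix A.2.2):

```python
import numpy as np

def preprocess_ecg(signal, keep_bins=512):
    """Fourier-transform one ECG channel and keep only the lowest-frequency
    bins: this low-pass filters diagnostically irrelevant high-frequency
    noise and compresses the signal."""
    spectrum = np.fft.rfft(signal)
    return spectrum[:keep_bins]           # complex-valued network input

ecg = np.random.randn(8192)               # stand-in for one padded channel
x = preprocess_ecg(ecg)
print(x.shape, x.dtype)                    # (512,) complex128
```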
5.3 Differentially private speech command classification for voice assistant applications
In recent years, voice assistants have gained popularity in consumer applications such as home speakers, and rely heavily on ML. Recordings collected from users for training speech processing algorithms can be used in impersonation attacks, resulting in successful identity theft (Sweet, 2016), or in acoustic attacks, which trigger unintended behaviour in voice assistants (Yuan et al., 2018; Carlini et al., 2016). Protecting privacy in this setting is therefore paramount to increase trust and applicability, as well as to safeguard both users and systems from adversarial interference. Convolutional neural networks (CNNs) have been demonstrated to yield state-of-the-art performance on spectrogram-transformed audio data (Palanisamy et al., 2020). However, this and other works (Zhou et al., 2021) typically discard the imaginary components. We here experimentally demonstrate the differentially private training of a 2-dimensional CNN directly on the complex spectrogram data. We utilised a subset of the SpeechCommands dataset (Warden, 2018), specifically samples from the categories Yes, No, Up, Down, Left, Right, On, Off, Stop and Go. We transformed each waveform signal to a complex-valued 2D spectrogram and used ζ-DP-SGD to train a complex-valued CNN. These results are summarised in Table 3 and Figure 2.

Setting      ROC-AUC

Non-DP
ζ-DP-SGD
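A sketch of the spectrogram construction (assuming TensorFlow's tf.signal.stft, which returns the complex-valued STFT; the frame parameters are illustrative):

```python
import tensorflow as tf

def waveform_to_complex_spectrogram(waveform, frame_length=255, frame_step=128):
    """Complex-valued 2D spectrogram; magnitude and phase are both
    retained and fed directly to the complex-valued CNN."""
    return tf.signal.stft(tf.cast(waveform, tf.float32),
                          frame_length=frame_length, frame_step=frame_step)

spec = waveform_to_complex_spectrogram(tf.random.normal([16000]))
print(spec.shape, spec.dtype)  # e.g. (124, 129), complex64
```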
5.4 Benchmarking complex-valued activation functions for ζ-DP-SGD
A number of specialised activation functions, designed for utilisation with complex-valued neural networks, have been proposed in the literature. To guide practitioner choice in our newly proposed setting of ζ-DP-SGD training, we here provide activation function benchmarks on the SpeechCommands dataset used in the previous section. Table 4 summarises these results. We consistently found the inverted Gaussian (iGaussian) activation function to perform best in the ζ-DP-SGD setting. This may be in part due to its bounded magnitude, thereby recapitulating the effect Papernot et al. (2020) discuss for real-valued networks, i.e. that bounded activation functions lead to improved performance in DP-SGD. We leave the further investigation of this finding to future work.
Activation function                      Reference                          ROC-AUC

Separable Sigmoid                        Nitta (1997)
zReLU                                    Guberman (2016)
Trainable Cardioid (per-feature bias)    Virtue et al. (2017)
SigLog                                   Georgiou and Koutsougeras (1992)
Trainable ModReLU (per-feature bias)     Arjovsky et al. (2016)
Cardioid                                 Virtue et al. (2017)
Trainable Cardioid (single bias)         Virtue et al. (2017)
ModReLU                                  Arjovsky et al. (2016)
cReLU                                    Trabelsi et al. (2017)
iGaussian                                Virtue et al. (2017)
5.5 MRI reconstruction
MRI is an important medical imaging modality and has been studied extensively in the context of deep learning (Akçakaya et al., 2019; Hammernik et al., 2018; Küstner et al., 2020; Muckley et al., 2020). MRI data is acquired in the so-called k-space. Sampling only a subset of k-space data allows for a considerable speed-up in acquisition time, benefiting patient comfort and costs; however, it typically leads to image artifacts which reduce the diagnostic quality of the resulting MR images. Although neural networks have the ability to produce high-quality reconstructions, their usage for this task has been shown to sometimes lead to the appearance of spurious image content from the fully sampled reference images the models were originally trained on (Hammernik et al., 2021; Muckley et al., 2020; Shimron et al., 2021). DP could counteract such hallucination, as it is designed to limit the effect of individual training examples on model training. However, this positive effect of DP may be counterbalanced by an unacceptable decrease in the diagnostic suitability of the reconstructed images. In this section, we investigate the ramifications of DP on the quality of MRI reconstructions. For this purpose, we trained a complex-valued U-Net model architecture on the task of reconstructing single-coil knee MRI images from the fastMRI dataset (Zbontar et al., 2018) using pseudo-random k-space undersampling. We observed nearly equivalent performance in the non-DP and the ζ-DP-SGD settings, whereby the non-DP model enjoyed a performance advantage in all metrics. Moreover, to assess the diagnostic suitability of the reconstructed images, we asked a diagnostic radiologist, who was blinded to whether or not ζ-DP-SGD was used, to compare the resulting scans. No differences in diagnostic suitability were observed by the expert in any of the reconstructed images. We thus conclude that, at least with respect to image quality, DP can indeed match the non-private training of MRI reconstruction models, even at a strict ε value; we intend to investigate its effect on preventing the hallucination of training data into reconstructed images in future work. Results from these experiments are summarised in Table 5 and Figure 3.
Setting      NMSE    PSNR    SSIM

Non-DP
ζ-DP-SGD

Table 5: Results on the MRI reconstruction task. NMSE: normalised mean squared error; PSNR: peak signal-to-noise ratio in dB; SSIM: structural similarity index metric.
6 Conclusion
Our work presents ζ-DP, an extension of DP to the complex domain, and introduces key building blocks of ζ-DP training, namely the complex Gaussian mechanism and ζ-DP-SGD. Our experiments on real-world tasks demonstrate that the training of DP complex-valued neural networks is possible with high utility under tight privacy guarantees. This may, in part, be attributable to the increased learning capacity of complex-valued models resulting from incorporating two degrees of freedom (real and imaginary) per trainable model parameter. On the flip side, both complex-valued deep learning and DP incur a considerable computational performance penalty. Despite steadily improving complex number support, current deep learning frameworks have not yet implemented a full palette of complex-valued layers and activation functions. Moreover, the software framework utilised to computationally realise ζ-DP-SGD in our work relies on multithreading, which suffers from considerable overhead compared to implementations utilising vectorised instructions and hardware acceleration. We discuss the topic of software implementation and provide computational performance benchmarks in Appendices A.3 and A.4. Our conclusions highlight a requirement for mature software frameworks able to offer feature and performance parity with their real-valued counterparts. We focus on the Gaussian mechanism and the (ε, δ)-DP/RDP interpretations in this work, believing them to be the most relevant for deep learning applications. The formalisation of other DP mechanisms and interpretations (such as f-DP (Dong et al., 2019)) is a promising future research direction. Such future work could benefit from improved privacy accounting (interestingly, also relying on complex numbers, e.g. Zhu et al. (2021)) to diminish, as much as possible, the utility gap to non-private training. In conclusion, we contend that ζ-DP and other privacy-enhancing technologies can increase the amount of data available for scientific study, and we are optimistic that our work represents a worthwhile contribution to their implementation in a broad variety of tasks.

Ethics statement
Our work follows all applicable ethical research standards and laws. All experiments were conducted on publicly available datasets. No new data concerning human or animal subjects was generated during our investigation.
Reproducibility Statement
We adhere to ICLR's reproducibility standards and include all necessary information to reproduce our experimental and theoretical results either in the main manuscript or in the Appendix. Theoretical results and proofs can be found in the main manuscript, Section 4, and additional information can be found in Appendix A.1. Details of dataset preparation and analysis can be found in Appendix A.2. Specifically, it contains details about the used datasets, their number of samples, all training, validation and test splits, as well as preprocessing steps. Furthermore, we describe model architectures, employed optimisers, learning rates, and the number of epochs for which models were trained. Lastly, for all DP trainings we provide the noise multipliers, clipping norms and sampling rates, as well as the δ values at which the ε values were calculated. Software implementation details and computational resources used can be found in Appendices A.3 and A.4.

References
Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318.
The US Census Bureau adopts differential privacy. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2867–2867.
Scan-specific robust artificial-neural-networks for k-space interpolation (RAKI) reconstruction: Database-free deep learning for fast imaging. Magnetic Resonance in Medicine 81 (1), pp. 439–453.
Unitary evolution recurrent neural networks. In International Conference on Machine Learning, pp. 1120–1128.
Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In International Conference on Machine Learning, pp. 394–403.
A survey of complex-valued neural networks. arXiv preprint arXiv:2101.12249.
On the computation of complex-valued gradients with application to statistically optimum beamforming. arXiv preprint arXiv:1701.00392.
A complex gradient operator and its application in adaptive array theory. In IEE Proceedings H (Microwaves, Optics and Antennas), Vol. 130, pp. 11–16.
Hidden voice commands. In 25th USENIX Security Symposium, pp. 513–530.
Going beyond the image space: undersampled MRI reconstruction directly in the k-space using a complex-valued residual neural network. In 2021 ISMRM & SMRT Annual Meeting & Exhibition, pp. 1757.
Analysis of deep complex-valued convolutional neural networks for MRI reconstruction. arXiv preprint arXiv:2004.01738.
Privacy at scale: Local differential privacy in practice. In Proceedings of the 2018 International Conference on Management of Data, pp. 1655–1658.
Applications of machine learning in cancer prediction and prognosis. Cancer Informatics 2, pp. 59–77.
Gaussian differential privacy. arXiv preprint arXiv:1905.02383.
The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9 (3–4), pp. 211–407.
Concentrated differential privacy. arXiv preprint arXiv:1603.01887.
Adaptively sharing real-time aggregate with differential privacy. IEEE Transactions on Knowledge and Data Engineering (TKDE) 26 (9), pp. 2094–2106.
Predicting component reliability and level of degradation with complex-valued neural networks. Reliability Engineering & System Safety 121, pp. 198–206.
Differential privacy for power grid obfuscation. IEEE Transactions on Smart Grid 11 (2), pp. 1356–1366.
Inverting gradients - How easy is it to break privacy in federated learning? arXiv preprint arXiv:2003.14053.
Complex domain backpropagation. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 39 (5), pp. 330–334.
On complex valued convolutional neural networks. arXiv preprint arXiv:1602.09046.
Learning a variational network for reconstruction of accelerated MRI data. Magnetic Resonance in Medicine 79 (6), pp. 3055–3071.
Systematic evaluation of iterative deep neural networks for fast parallel MRI reconstruction with sensitivity-weighted coil combination. Magnetic Resonance in Medicine 86 (4), pp. 1859–1872.
Complex-valued neural networks. Vol. 400, Springer Science & Business Media.
Complex-valued time-series correlation increases sensitivity in fMRI analysis. Magnetic Resonance Imaging 34 (6), pp. 765–770.
Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492.
The Laplace Distribution and Generalizations. Birkhäuser Boston.
The complex gradient operator and the CR-calculus. arXiv preprint arXiv:0906.4835.
CINENet: deep learning-based 3D cardiac CINE MRI reconstruction with multi-coil complex-valued 4D spatio-temporal convolutions. Scientific Reports 10 (1), pp. 1–13.
MNIST handwritten digit database.
An open access database for evaluating the algorithms of electrocardiogram rhythm and morphology abnormality detection. Journal of Medical Imaging and Health Informatics 8 (7), pp. 1368–1373.
Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF).
State-of-the-art machine learning MRI reconstruction in 2020: Results of the second fastMRI challenge. arXiv preprint arXiv:2012.06318.
Bayesian sparsification of deep complex-valued networks. In International Conference on Machine Learning, Vol. 119, pp. 7230–7242.
An extension of the backpropagation algorithm to complex numbers. Neural Networks 10 (8), pp. 1391–1415.
Metric Spaces. Springer Undergraduate Mathematics Series, Springer London.
Rethinking CNN models for audio classification. arXiv preprint arXiv:2007.11154.
Tempered sigmoid activations for deep learning with differential privacy. arXiv preprint arXiv:2007.14191.
An efficient sleep scoring system based on EEG signal using complex-valued machine learning algorithms. Neurocomputing 207, pp. 165–177.
Deep learning for self-driving cars: Chances and challenges. In Proceedings of the 1st International Workshop on Software Engineering for AI in Autonomous Systems, pp. 35–38.
Lipschitz extensions for node-private graph statistics and the generalized exponential mechanism. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp. 495–504.
Mortality prediction in patients with isolated moderate and severe traumatic brain injury using machine learning models. PLoS ONE 13 (11), e0207192.
Elliptical perturbations for differential privacy. arXiv preprint arXiv:1905.09420.
Lipschitz regularity of deep neural networks: analysis and efficient estimation. arXiv preprint arXiv:1805.10965.
Complex-valued neural networks with nonparametric activation functions. IEEE Transactions on Emerging Topics in Computational Intelligence 4 (2), pp. 140–150.
Subtle inverse crimes: Naively training machine learning algorithms could lead to overly-optimistic results. arXiv preprint arXiv:2109.08237.
The hidden scam: Why consumers should no longer be forced to shoulder the burden of liability for mobile cramming. J. Bus. & Tech. L. 11, pp. 69.
Deep complex networks. arXiv preprint arXiv:1705.09792.
Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory 60 (7), pp. 3797–3820.
Better than real: Complex-valued neural nets for MRI fingerprinting. In 2017 IEEE International Conference on Image Processing (ICIP), pp. 3953–3957.
Speech Commands: A dataset for limited-vocabulary speech recognition. arXiv preprint arXiv:1804.03209.
Zur formalen Theorie der Funktionen von mehr komplexen Veränderlichen. Mathematische Annalen 97 (1), pp. 357–375.
See through gradients: Image batch recovery via GradInversion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16337–16346.
CommanderSong: A systematic approach for practical adversarial voice recognition. In 27th USENIX Security Symposium, pp. 49–64.
fastMRI: An open dataset and benchmarks for accelerated MRI. arXiv preprint arXiv:1811.08839.
Cough recognition based on Mel-spectrogram and convolutional neural network. Frontiers in Robotics and AI 8.
Optimal accounting of differential privacy via characteristic function. arXiv preprint arXiv:2106.08567.
Medical imaging deep learning with differential privacy. Scientific Reports 11 (1), pp. 1–8.
Appendix A
A.1 Wirtinger (CR) calculus
In this section, we present key results from Wirtinger (CR) calculus which are used in our work. For a detailed treatment, we refer to Kreutz-Delgado (2009).
Consider a function f : ℂ → ℂ. As for real-valued functions, the derivative of f at a point z₀ can be defined as:

f′(z₀) = lim_{z → z₀} (f(z) − f(z₀)) / (z − z₀)   (14)
If this limit is defined for the (infinitely many) sequences approaching z₀, f is called complex differentiable (equivalently, differentiable in the complex sense). If, in addition, f′ exists everywhere in a neighbourhood of z₀, f is called holomorphic. It is also possible to write z = x + iy and to then express f as two real-valued functions u and v of the variables x and y:

f(z) = u(x, y) + i · v(x, y)   (15)

The derivative of f can then be written in terms of the Jacobian of (u, v) with respect to (x, y):

J_f = [ ∂u/∂x  ∂u/∂y
        ∂v/∂x  ∂v/∂y ]   (16)

If this derivative exists at z₀, f is called differentiable in the real sense. This interpretation represents f as a function ℝ² → ℝ² or, more generally, for vector-valued functions, as ℝ²ⁿ → ℝ²ᵐ. The Cauchy-Riemann equations state that, for f to be holomorphic, it must satisfy:

∂u/∂x = ∂v/∂y  and  ∂u/∂y = −∂v/∂x   (17)
As discussed above, the complex plane does not admit a natural ordering. Hence, the minimisation of a complex-valued function is not defined. Therefore, for complex-valued deep learning, we only consider real-valued (loss) functions L. By Equation (15), v(x, y) = 0 for such functions. Thus, by the Cauchy-Riemann equations, such a real-valued function is only holomorphic if:

∂u/∂x = ∂u/∂y = 0   (18)
This means that any holomorphic real-valued function must be constant, which invalidates its usefulness for optimisation. The Wirtinger (CR) derivatives provide an alternative interpretation of the Cauchy-Riemann equations which allows us to consider holomorphicity and differentiability in the real sense separately. Thus, they recover the usefulness of interpreting ℂ as ℝ² while preventing multiplicative penalties on the gradient norm as a consequence of following this interpretation too closely. We will motivate this somewhat informal notion with an example below. The Wirtinger (CR) derivatives of f (the term derivative represents an abuse of terminology, as they are formal operators and not derivatives with respect to actual variables; however, the interpretation as derivatives is intuitive, and we will thus retain it) are defined as:

∂f/∂z ≔ (1/2)(∂f/∂x − i · ∂f/∂y),   ∂f/∂z̄ ≔ (1/2)(∂f/∂x + i · ∂f/∂y)   (19)
An immediate consequence of this definition is that the Cauchy-Riemann equations can be expressed as:

∂f/∂z̄ = 0   (20)

Therefore, if a function is holomorphic, ∂f/∂z corresponds to the derivative in the complex sense (that is, f′), while, if f is (only) differentiable in the real sense, both ∂f/∂z and ∂f/∂z̄ remain valid (and, for real-valued f, are conjugates of each other). As stated above, it can be shown that the steepest ascent of a real-valued loss function L is aligned with ∂L/∂z̄. In this sense, ∂/∂z̄ fulfils the role of the ∇ operator for real, scalar-valued loss functions. Evidently, compared to the actual gradient of L in the real sense, the following relationship holds:

∂L/∂x + i · ∂L/∂y = 2 · ∂L/∂z̄   (21)
However, redefining ∇L ≔ ∂L/∂z̄ is desirable (and correct, as shown by Brandwood (1983)). We will motivate this requirement with an example: Let f : ℂ → ℝ be the function:

f(z) = z · z̄ = |z|²   (22)

The Wirtinger derivative of f with respect to z̄ is ∂f/∂z̄ = z, whose norm is |z|. The same output can be realised by interpreting f as a function g of a real-valued vector (x, y)ᵀ:

g(x, y) = x² + y²   (23)

The gradient of g is ∇g = (2x, 2y)ᵀ, whose norm is 2√(x² + y²) = 2|z|. This undesirable multiplicative penalty, which would translate to a superfluous multiplicative increase in the noise scale of the GM required to preserve DP, is a consequence of ignoring the connection between the real and imaginary parts inherent to complex numbers, but not to components of vectors. In fact, z · z̄ is neither equivalent to z² (as would be the case if z were real, where z̄ = z), nor is it equivalent to the real inner product ⟨(x, y), (x, y)⟩ treated componentwise, as complex multiplication lacks the bilinearity inherent to a real inner product space. Both complications are avoided by the redefinition of the Wirtinger derivative with respect to z̄ as the gradient used for optimisation, which prompts its utilisation in our work. As a note to practitioners, certain deep learning frameworks silently rescale the Wirtinger gradient by a factor of two to avoid user confusion by a lower effective learning rate. To ascertain a correct implementation, we therefore recommend examining this behaviour by testing the gradient norm of known functions.
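The factor-of-two discrepancy is straightforward to verify numerically; a minimal sketch for f(z) = z · z̄:

```python
import numpy as np

z = 3.0 + 4.0j                        # |z| = 5

# Wirtinger conjugate derivative of f(z) = z * conj(z) is z itself.
wirtinger_grad = z
print(abs(wirtinger_grad))            # 5.0

# Interpreting f as g(x, y) = x**2 + y**2 on R^2 instead:
real_grad = np.array([2 * z.real, 2 * z.imag])
print(np.linalg.norm(real_grad))      # 10.0, twice the Wirtinger norm
```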
A.2 Dataset preparation and model training
A.2.1 PhaseMNIST
Dataset construction
As described in the main manuscript, PhaseMNIST is intended as a benchmark dataset for complex-valued computer vision tasks and contains images of handwritten digits from 0 to 9 (the version of the dataset used in this study will be made publicly available upon acceptance). The training set consists of 60,000 images and the testing set of 10,000 images. For each example of the original MNIST dataset, from which PhaseMNIST is constructed, we performed the following procedure: Let y be the label corresponding to the real-valued image. We then constructed the imaginary component by (deterministically) sampling uniformly with replacement from the set of images whose label y′ stands in a fixed relationship to y. We used the label of the real-valued image as the label of the overall training example.
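A sketch of this construction (the concrete pairing rule shown, y′ = (y + 1) mod 10, is an illustrative assumption; seeding the generator makes the sampling deterministic):

```python
import numpy as np

def build_phase_mnist(images, labels, seed=0):
    """Pair each real-valued image with an imaginary partner image whose
    label is derived from the original label (illustrative rule)."""
    rng = np.random.default_rng(seed)
    by_label = {c: np.flatnonzero(labels == c) for c in range(10)}
    out = np.empty(images.shape, dtype=np.complex64)
    for i, (img, y) in enumerate(zip(images, labels)):
        partner = rng.choice(by_label[(y + 1) % 10])   # with replacement
        out[i] = img + 1j * images[partner]
    return out, labels       # the label of the real-valued image is kept

# images: float32 array of shape (N, 28, 28); labels: int array (N,)
```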
Model training
We used a complex-valued model consisting of three fully connected hidden layers and an output layer of 10 units. The Cardioid activation function was used between layers and the Softmax activation function after the output layer. The model was trained with the stochastic gradient descent optimiser at the same learning rate both for ζ-DP-SGD and for non-private training. The non-private model converged after 3 epochs, whereas the ζ-DP-SGD model required 10 epochs to achieve the same accuracy. The noise multiplier and the L2 clipping norm were held fixed, and the ε value was calculated at a fixed δ. A fixed sampling rate was used for ζ-DP-SGD, and a fixed batch size for non-DP training.
A.2.2 ECG dataset
Dataset preparation
We utilised the China Physiological Signal Challenge 2018 (Liu et al., 2018) dataset for this task. We used the normal and left bundle branch block classes and a single ECG channel. The ECGs were loaded from the provided Matlab format using the SciPy library and trimmed or padded to a uniform length. The NumPy fast Fourier transform implementation was used, whereby the signal was pre-trimmed to a fixed length. The final dataset was split into training and testing examples.

Model training
We implemented a complex-valued fully connected neural network architecture consisting of input/hidden layers and a single output unit. The cReLU activation function was used both in the non-DP and the ζ-DP-SGD setting. The output layer implemented the magnitude operation followed by a logistic sigmoid activation function. Models were trained using the SGD optimiser, with L2 regularisation for non-DP training and a lower learning rate for ζ-DP-SGD training. A fixed batch size was used for non-private training; for ζ-DP-SGD, we used a fixed sampling rate, noise multiplier and L2 clipping norm. ε was calculated at a δ of 1/N, where N is the number of training samples. Both models were trained for the same number of epochs.
A.2.3 Speech command classification dataset
Dataset preparation
We used a subset of the SpeechCommands dataset (Warden, 2018) as described above, consisting of an equal number of samples from each of the categories Yes, No, Up, Down, Left, Right, On, Off, Stop and Go. Of these, the majority of examples were used as the training set and the remainder as the testing set. The waveform data was decoded using the TensorFlow library and, where necessary, padded to a uniform number of samples. The TensorFlow implementation of the short-time Fourier transform was used with a fixed frame length and frame step.
Model training
For this task, we employed a complex-valued 2D CNN using small square filters without zero-padding and unit stride. The convolutional layers each had a fixed number of output filters, whereby a MaxPooling layer was used between the second and the third layer and an adaptive MaxPooling layer after the final convolutional layer. The convolutional block was followed by a fully connected layer and an output layer of 10 units; both employed the iGaussian activation function. The non-DP model was trained with the stochastic gradient descent optimiser at a fixed batch size and learning rate, whereas the ζ-DP-SGD network was trained using a fixed sampling rate with the same learning rate and optimiser, a fixed noise multiplier and an L2 clipping norm. We calculated ε at a fixed δ value.

A.2.4 fastMRI knee dataset
Dataset preparation
We utilised the single-coil knee MRI dataset of the fastMRI challenge proposed by Zbontar et al. (2018). We used the reference implementation (https://github.com/facebookresearch/fastMRI/tree/main/fastmri_examples/unet) and employed the default settings, using a fixed acceleration rate and a fixed fraction of densely sampled k-space centre lines in the mask. Masks are sampled pseudo-randomly during training. The dataset offers separate training and validation images.
Model training
We changed the U-Net network to use complex-valued weights and to accept complex-valued inputs instead of the magnitude image employed in the original example. We replaced the original ReLU activation functions with cReLU. In the DP setting, we used a fixed noise multiplier, L2 clipping norm and sampling rate, and calculated ε at a fixed δ. The learning rate was set using the RMSProp optimiser and a stepwise learning rate scheduler. We trained both in the non-private and the ζ-DP-SGD setting for 30 epochs and disabled the collection of running statistics in the BatchNormalisation layers to render them compatible with DP.

A.3 Software libraries and computational resources used
Implementations of the DP-SGD algorithm and, by extension, ζ-DP-SGD require access to per-example gradients. We utilised the deepee software library (Ziller et al., 2021) to implement ζ-DP-SGD, as it is compatible with arbitrary neural network architectures, including those containing complex-valued weights. We report results using uniform sampling without replacement and using the RDP accounting option provided by deepee. For complex-valued neural network components, the PyTorch Complex library (Chatterjee et al., 2021) with PyTorch 1.9 was used. TensorFlow 2.4 was used for loading data and for the short-time Fourier transforms discussed above, but no neural network components were used from this library. Experiments were carried out in Python 3.8.5 on a single workstation computer running Ubuntu Linux 20.04, equipped with a single NVidia Quadro RTX 8000 GPU, 12 CPU cores and 64 GB of RAM.
A.4 Computational considerations
We conclude by presenting a systematic evaluation of the computational considerations incurred by the utilisation of complex-valued neural networks and by the implementation of ζ-DP-SGD using the above-mentioned libraries. Two main sources of computational overhead arise between real-valued and complex-valued neural networks. Complex numbers are internally represented as a pair of floating-point numbers; this affects inputs and neural network weights. Moreover, even though a complex-valued architecture may contain the same number of parameters as its real-valued counterpart, an increased number of computational operations is required in ℂ. For instance, a multiplication requires a single operation in ℝ. However, in ℂ it can require up to four real multiplications and two additions, depending on whether vector hardware is used and whether complex floating-point instructions are implemented in the respective framework (e.g., cuDNN).
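The operation count is visible in the textbook expansion of a single complex multiplication, sketched below; actual frameworks may use fused or Karatsuba-style variants:

```python
def complex_mul(a_re, a_im, b_re, b_im):
    """(a_re + i*a_im) * (b_re + i*b_im): four real multiplications and
    two real additions/subtractions per output element."""
    return (a_re * b_re - a_im * b_im,   # real part
            a_re * b_im + a_im * b_re)   # imaginary part

print(complex_mul(1.0, 2.0, 3.0, 4.0))   # (-5.0, 10.0) == (1+2j)*(3+4j)
```

Table 6 shows results for individual matrix multiplication operations and convolutions with real- and complex-valued inputs and weight matrices.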
       Linear (real)   Linear (complex)   Conv. (real)   Conv. (complex)

CPU
GPU
ζ-DP-SGD carries additional overhead, as it requires per-sample gradients. In the utilised deepee framework, this is realised through dispatching one computation thread per example in the mini-batch (more precisely, lot) to perform a forward and backward pass, which incurs substantial overhead compared to pure vectorisation. These results are shown in Table 7. Of note, for the non-private models, the computation time includes the forward pass, backward pass, loss gradient calculation (mean squared error against a vector of matching dimensions) and weight update (stochastic gradient descent). For the ζ-DP-SGD model, the following additional steps occur between the loss gradient calculation and the weight update: gradient clipping, averaging of per-sample gradients, and noise application. Moreover, the deepee framework requires an additional step between the weight update and the subsequent batch.
       Non-DP (real)   Non-DP (complex)   ζ-DP-SGD (real)   ζ-DP-SGD (complex)

CPU
GPU