
Federated Quantum Natural Gradient Descent for Quantum Federated Learning

Quantum federated learning (QFL) is built on a distributed learning architecture spanning several local quantum devices, and a more efficient training algorithm for QFL is expected to minimize the communication overhead among the quantum participants. In this work, we put forth an efficient learning algorithm, namely federated quantum natural gradient descent (FQNGD), applied in a QFL framework that consists of variational quantum circuit (VQC)-based quantum neural networks (QNN). The FQNGD algorithm requires far fewer training iterations for the QFL model to converge, and it can significantly reduce the total communication cost among local quantum devices. Compared with other federated learning algorithms, our experiments on a handwritten digit classification dataset corroborate the effectiveness of the FQNGD algorithm for QFL in terms of a faster convergence rate on the training dataset and higher accuracy on the test set.



1 Introduction

Successful deep learning (DL) applications, such as automatic speech recognition (ASR) [17], natural language processing (NLP) [14], and computer vision [31], rely heavily on hardware breakthroughs in the graphics processing unit (GPU). However, the advancement of large DL models, such as GPT [3] and BERT [10], is largely attributed to powerful computing capabilities that are available only to big companies owning numerous costly, industrial-level GPUs. With recent years witnessing the rapid development of noisy intermediate-scale quantum (NISQ) devices [23, 12, 13], quantum computing hardware is expected to further speed up classical DL algorithms through novel quantum machine learning (QML) models, such as quantum neural networks (QNN), on NISQ devices [4, 16, 11, 15]. Unfortunately, two obstacles prevent NISQ devices from being applied to QNN in practice. For one, classical DL models cannot be directly deployed on a NISQ device without conversion to a quantum tensor format [26, 5]. For another, NISQ devices provide only a few physical qubits, so insufficient qubits can be spared for quantum error correction [2, 12, 13], and more significantly, the representation power of QML is quite limited due to the small number of currently available qubits [25].

To deal with the first challenge, the variational quantum circuit (VQC), a variational algorithm, was proposed to enable QNN to be simulated on NISQ devices, and it has been reported to attain advantages, in some cases exponential ones, over DL counterparts on tasks such as ASR [33, 24], NLP [34], and reinforcement learning [6]. As for the second challenge, distributed quantum machine learning systems, which consist of different quantum machines with limited quantum capabilities, can be set up to increase the quantum computing power. One particular distributed model architecture is called quantum federated learning (QFL), which aims at a decentralized computing architecture derived from classical federated learning (FL).

The FL framework builds on advances in hardware that make tiny DL systems practically powerful. For example, a speech recognition system on the cloud can transmit a global acoustic model to a user’s cell phone, and the phone then sends the update information back to the cloud without the user’s private data being collected on the centralized computing server. This methodology helps to build a privacy-preserving DL system and inspires us to leverage quantum computing to expand machine learning capabilities. As shown in Figure 1, our proposed QFL and FL differ in the models utilized in the systems: QFL is composed of VQC models rather than the classical DL counterparts used for FL. More specifically, the QFL system consists of a global VQC model placed on the cloud and local VQC models deployed on user devices. The training process of QFL involves three key steps: (1) the global VQC parameter $\bar{\theta}$ is transmitted to the $K$ local participants’ devices; (2) each local VQC is adaptively trained on the participant’s data and then uploads its model gradients to the cloud; (3) the uploaded gradients are aggregated into a centralized gradient that updates the global model parameter $\bar{\theta}$.

Figure 1: An illustration of quantum federated learning. The global VQC parameter $\bar{\theta}$ is first transmitted to the local VQCs $\theta^{(k)}$. Then, the updated gradients based on the participants’ local data are sent back to the centralized server, where they are aggregated to update the parameters of the global VQC.

Besides, an inherent obstacle of QFL is the communication overhead among different VQC models. To reduce the communication cost, a more efficient training algorithm is expected to accelerate the convergence rate so that fewer global model updates are needed. Therefore, in this work, we put forth a federated quantum learning algorithm for training QFL, namely federated quantum natural gradient descent (FQNGD). FQNGD is derived from quantum natural gradient descent (QNGD), which has been shown to train a single VQC more efficiently than stochastic gradient descent (SGD) methods.

1.1 Main Results

The main contributions of this work are summarized as follows:

  1. We design the FQNGD algorithm for the QFL system by extending the QNGD method for training a single VQC. The QNGD algorithm is developed by approximating the Fubini-Study metric tensor for a specific VQC model structure.

  2. We compare our FQNGD algorithm with the conventional SGD algorithms, and highlight the performance advantages of FQNGD over other SGD methods in theory.

  3. We conduct experiments on handwritten digit classification to corroborate our theoretical results. The experimental results demonstrate the performance advantages of FQNGD for QFL.

2 Related Work

Konečný et al. [19] first proposed FL strategies to improve the communication efficiency of a distributed computing system, and McMahan et al. [22] set up FL systems motivated by the use of big data and large-scale cloud-based DL [28]. Chen and Yoo [7] demonstrate a QFL architecture built on the classical FL paradigm, where the central node holds a global VQC and receives the trained VQC parameters from participants’ local quantum devices. The QNGD algorithm was proposed as an efficient training method for a single VQC [29]. In particular, Stokes et al. [29] first showed that the Fubini-Study metric tensor can be used for QNGD.

This work proposes the FQNGD algorithm by extending QNGD to the FL setting and highlights the learning efficiency of FQNGD in a QFL architecture. Moreover, in contrast to [7], where the VQC parameters of the local devices are uploaded, in this work the gradients of the local VQCs are uploaded to the global model, so that the updates can be collected without access to the local VQC parameters.

3 Results

3.1 Preliminaries

In this section, we first show the necessary preliminaries that compose the algorithmic foundations for FQNGD. More specifically, we first introduce the detailed VQC framework and then explicitly explain the steps of QNGD.

3.1.1 Variational Quantum Circuit

An illustration of VQC is shown in Figure 2, where the VQC model consists of three components: (a) tensor product encoding (TPE); (b) parametric quantum circuit (PQC); (c) measurement. The TPE initializes the input quantum states $|x_1\rangle$, $|x_2\rangle$, …, $|x_U\rangle$ from the classical inputs $x_1$, $x_2$, …, $x_U$, and the PQC operator transforms the quantum states $|x_1\rangle$, $|x_2\rangle$, …, $|x_U\rangle$ into the output quantum states $|z_1\rangle$, $|z_2\rangle$, …, $|z_U\rangle$. The measurement outputs the expected observations $\langle z_1\rangle$, $\langle z_2\rangle$, …, $\langle z_U\rangle$.

Figure 2: The VQC is composed of three components: (a) TPE; (b) PQC; (c) measurement. The TPE utilizes a series of $R_Y(\frac{\pi}{2} x_i)$ gates to transform classical inputs into quantum states. The PQC consists of CNOT gates and single-qubit rotation gates $R_X$, $R_Y$, $R_Z$ with trainable parameters $\alpha$, $\beta$, and $\gamma$. The CNOT gates are non-parametric and impose quantum entanglement among qubits, while $R_X$, $R_Y$, and $R_Z$ are parametric gates that are adjusted during the training stage. To build a deep model, the PQC block in the green dashed square is repeatedly copied. The measurement converts the quantum states into the corresponding expectation values $\langle z_i\rangle$. The outputs $\langle z_1\rangle$, $\langle z_2\rangle$, …, $\langle z_U\rangle$ are connected to a loss function, and gradient descent algorithms can be used to update the VQC model parameters. Besides, both the CNOT gates and $R_X$, $R_Y$, and $R_Z$ correspond to unitary matrices, as shown below the VQC framework.

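In more detail, the TPE model was first proposed in [30] and aims at converting a classical vector $\mathbf{x}$ into a quantum state $|\mathbf{x}\rangle$ by building up a one-to-one mapping as:

$$|\mathbf{x}\rangle = \left(\bigotimes_{i=1}^{U} R_Y\!\left(\tfrac{\pi}{2} x_i\right)\right) |0\rangle^{\otimes U} = \begin{bmatrix} \cos(\frac{\pi}{2} x_1) \\ \sin(\frac{\pi}{2} x_1) \end{bmatrix} \otimes \begin{bmatrix} \cos(\frac{\pi}{2} x_2) \\ \sin(\frac{\pi}{2} x_2) \end{bmatrix} \otimes \cdots \otimes \begin{bmatrix} \cos(\frac{\pi}{2} x_U) \\ \sin(\frac{\pi}{2} x_U) \end{bmatrix}, \tag{1}$$

where $R_Y(\frac{\pi}{2} x_i)$ refers to a single-qubit quantum gate that rotates about the $Y$-axis, and each $x_i$ is constrained to the domain $[0, 1]$, which makes the conversion between $\mathbf{x}$ and $|\mathbf{x}\rangle$ invertible.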
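As a minimal sketch of the tensor product encoding, the NumPy snippet below builds the amplitude vector of the encoded state, assuming each feature $x_i \in [0, 1]$ is mapped to the single-qubit state $[\cos(\frac{\pi}{2} x_i), \sin(\frac{\pi}{2} x_i)]^T$. The function names `tensor_product_encode` and `decode_single_feature` are hypothetical and chosen only for illustration.

```python
import numpy as np


def tensor_product_encode(x):
    """Return the state vector of |x> for classical features x in [0, 1]."""
    x = np.asarray(x, dtype=float)
    if np.any(x < 0.0) or np.any(x > 1.0):
        raise ValueError("each feature must lie in [0, 1]")
    state = np.array([1.0])
    for xi in x:
        # Assumed single-qubit encoding: [cos(pi/2 * x_i), sin(pi/2 * x_i)].
        qubit = np.array([np.cos(0.5 * np.pi * xi), np.sin(0.5 * np.pi * xi)])
        state = np.kron(state, qubit)
    return state


def decode_single_feature(qubit):
    """Invert the single-qubit encoding: recover x_i from its two amplitudes."""
    return 2.0 / np.pi * np.arctan2(qubit[1], qubit[0])


state = tensor_product_encode([0.0, 1.0, 0.5])  # 3 features -> 2**3 amplitudes
```

Restricting the features to $[0, 1]$ keeps both amplitudes non-negative, which is what allows `decode_single_feature` to recover the input unambiguously.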

Moreover, the PQC is equipped with the CNOT gates for quantum entanglement and trainable quantum gates $R_X(\alpha_i)$, $R_Y(\beta_i)$, and $R_Z(\gamma_i)$, where the qubit angles $\alpha_i$, $\beta_i$, and $\gamma_i$ are adjustable during the training process. The PQC block in the green dashed square is repeatedly copied to set up a deep model, and the number of copied PQC blocks is called the depth of the VQC. The measurement operation outputs the classical expected observations $\langle z_1\rangle$, $\langle z_2\rangle$, …, $\langle z_U\rangle$ from the quantum output states. The expected outputs are used to calculate the loss value, and gradient descent methods [27] can then be used to update the VQC model parameters by applying the back-propagation algorithm [32].
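The three VQC components can be sketched as a small state-vector simulation. The snippet below is an illustrative two-qubit toy, not the circuit used in the experiments: the gate ordering inside a block (CNOT first, then $R_X$, $R_Y$, $R_Z$ on each qubit), the Pauli-$Z$ measurement, and all function names are assumptions made for the example.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
# CNOT with qubit 0 as control and qubit 1 as target (basis order |q0 q1>).
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)


def rotation(pauli, angle):
    """Single-qubit rotation exp(-i * angle / 2 * pauli)."""
    return np.cos(angle / 2) * I2 - 1j * np.sin(angle / 2) * pauli


def encode(x):
    """Assumed TPE: feature x_i in [0, 1] -> [cos(pi/2 x_i), sin(pi/2 x_i)]."""
    qubits = [np.array([np.cos(0.5 * np.pi * xi), np.sin(0.5 * np.pi * xi)])
              for xi in x]
    return np.kron(qubits[0], qubits[1]).astype(complex)


def pqc_block(state, alpha, beta, gamma):
    """One PQC block: CNOT entangler, then R_X, R_Y, R_Z on each qubit."""
    state = CNOT @ state
    gates = [rotation(Z, gamma[i]) @ rotation(Y, beta[i]) @ rotation(X, alpha[i])
             for i in range(2)]
    return np.kron(gates[0], gates[1]) @ state


def measure_z(state):
    """Expectation values <Z> on each qubit."""
    observables = [np.kron(Z, I2), np.kron(I2, Z)]
    return np.array([np.vdot(state, obs @ state).real for obs in observables])


def vqc_forward(x, params):
    """params has shape (depth, 3, 2): (alpha, beta, gamma) angles per block."""
    state = encode(x)
    for alpha, beta, gamma in params:
        state = pqc_block(state, alpha, beta, gamma)
    return measure_z(state)
```

The leading dimension of `params` plays the role of the VQC depth: each extra entry applies one more copy of the PQC block.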

3.1.2 Quantum Natural Gradient Descent

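As shown in Eq. (2), at step $t$, the standard gradient descent with learning rate $\eta$ minimizes a loss function $\mathcal{L}(\theta)$ with respect to the parameters $\theta$ in a Euclidean space.

$$\theta_{t+1} = \theta_t - \eta \nabla \mathcal{L}(\theta_t) \tag{2}$$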

The standard gradient descent algorithm conducts each optimization step in a Euclidean geometry on the parameter space. However, since the form of parameterization is not unique, different parameterizations are likely to distort distances within the optimization landscape. A better alternative is to perform the gradient descent in the distribution space, namely natural gradient descent [1], which is dimension-free and invariant with respect to the parameterization. In doing so, each optimization step chooses the optimum step size for the update of the parameter $\theta_t$, regardless of the choice of parameterization. Mathematically, the standard gradient descent is modified as follows:

$$\theta_{t+1} = \theta_t - \eta F^{-1} \nabla \mathcal{L}(\theta_t) \tag{3}$$

where $F$ denotes the Fisher information matrix that acts as a metric tensor, transforming the steepest gradient descent in the Euclidean parameter space to the steepest descent in the distribution space.
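To make the invariance claim concrete, the following toy sketch compares one plain and one natural gradient step for a Bernoulli distribution under two parameterizations: the mean $p$ and the logit $\phi$ with $p = 1/(1 + e^{-\phi})$. The loss $(p - 0.8)^2$, the starting point, and the function names are arbitrary choices for illustration; the Fisher information of a Bernoulli variable is $1/(p(1-p))$ with respect to $p$ and $p(1-p)$ with respect to $\phi$.

```python
import numpy as np


def sigmoid(phi):
    return 1.0 / (1.0 + np.exp(-phi))


def loss_grad_wrt_p(p, target=0.8):
    """Gradient of the toy loss (p - target)**2 with respect to p."""
    return 2.0 * (p - target)


def plain_step_mean(p, lr):
    return p - lr * loss_grad_wrt_p(p)


def plain_step_logit(phi, lr):
    p = sigmoid(phi)
    grad_phi = loss_grad_wrt_p(p) * p * (1.0 - p)  # chain rule: dp/dphi = p(1-p)
    return phi - lr * grad_phi


def natural_step_mean(p, lr):
    fisher = 1.0 / (p * (1.0 - p))  # Fisher information w.r.t. p
    return p - lr * loss_grad_wrt_p(p) / fisher


def natural_step_logit(phi, lr):
    p = sigmoid(phi)
    fisher = p * (1.0 - p)  # Fisher information w.r.t. the logit
    grad_phi = loss_grad_wrt_p(p) * p * (1.0 - p)
    return phi - lr * grad_phi / fisher


p0, lr = 0.3, 1e-3
phi0 = np.log(p0 / (1.0 - p0))

# Compare the updated distribution (expressed through p) across parameterizations.
plain_gap = abs(plain_step_mean(p0, lr) - sigmoid(plain_step_logit(phi0, lr)))
natural_gap = abs(natural_step_mean(p0, lr) - sigmoid(natural_step_logit(phi0, lr)))
```

For a small learning rate, `natural_gap` is orders of magnitude smaller than `plain_gap`: the natural gradient step moves the distribution in the same way in both parameterizations up to second-order terms, while the plain step does not.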

Since the standard Euclidean geometry is sub-optimal for the optimization of quantum variational algorithms, a quantum analog has the following form:

$$\theta_{t+1} = \theta_t - \eta g^{+}(\theta_t) \nabla \mathcal{L}(\theta_t) \tag{4}$$

where $g^{+}(\theta_t)$ refers to the pseudo-inverse of the metric tensor $g(\theta_t)$, which is associated with the specific architecture of the quantum circuit. The metric $g(\theta_t)$ can be calculated as the Fubini-Study metric tensor, and it reduces to the Fisher information matrix in the classical limit [21].
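A minimal sketch of the QNGD update, assuming a single-qubit circuit $R_Y(\theta)|0\rangle$ with the loss $\langle Z\rangle = \cos\theta$: for this state the Fubini-Study metric is the constant $1/4$ (the variance of the generator $Y/2$). The helper `qngd_step` and the toy loss are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def qngd_step(theta, grad, metric, lr):
    """One update theta - lr * pinv(metric) @ grad."""
    return theta - lr * np.linalg.pinv(metric) @ grad


def loss(theta):
    # <Z> measured on R_Y(theta)|0> equals cos(theta).
    return np.cos(theta[0])


def grad(theta):
    return np.array([-np.sin(theta[0])])


# Fubini-Study metric of R_Y(theta)|0>: Var(Y / 2) = 1/4 for every theta.
METRIC = np.array([[0.25]])

theta_gd = np.array([0.3])
theta_qngd = np.array([0.3])
for _ in range(20):
    theta_gd = theta_gd - 0.1 * grad(theta_gd)                          # plain GD
    theta_qngd = qngd_step(theta_qngd, grad(theta_qngd), METRIC, 0.1)   # QNGD
```

In this one-parameter toy the metric is constant, so QNGD only rescales the learning rate; the geometric benefit appears in multi-parameter circuits, where the metric is a non-trivial matrix that varies with $\theta$. The pseudo-inverse also keeps the step well defined when the metric is singular.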

3.2 Theoretical Results

3.2.1 Quantum Natural Gradient Descent for the VQC

Before employing the FQNGD for a quantum federated learning system, we concentrate on the use of QNGD for a single VQC. For simplicity, we leverage a block-diagonal approximation to the Fubini-Study metric tensor for composing QNGD for the training of the VQC on NISQ hardware.

Given an initial quantum state $|\psi_0\rangle$ and a PQC with $L$ layers, for $l \in [L]$ we separately denote $W_l$ and $V_l(\theta_l)$ as the unitary matrices associated with the non-parameterized quantum gates and the parameterized ones.

Figure 3: An illustration of unitary matrices associated with the non-parametric and parametric gates. For all $l \in [L]$, the matrices $W_l$ correspond to the non-parametric gates, the matrices $V_l(\theta_l)$ are associated with the parametric ones, and $|\psi_0\rangle$ refers to the initial quantum state that is derived from the operation of TPE.

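Consider a variational quantum circuit as:

$$U(\theta)|\psi_0\rangle = V_L(\theta_L) W_L \cdots V_l(\theta_l) W_l \cdots V_1(\theta_1) W_1 |\psi_0\rangle \tag{5}$$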

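Furthermore, any unitary quantum parametric gate can be rewritten as $V_l(\theta_l) = \exp(i \theta_l H_l)$, where $H_l$ refers to the Hermitian generator of the gate $V_l$. The approximation to the Fubini-Study metric tensor admits that for each parametric layer $l$ in the variational quantum circuit, the $(i, j)$-th entry of the block-diagonal submatrix $g_l$ of the Fubini-Study metric tensor is calculated by

$$g_l^{(i,j)} = \langle\psi_l| H_l(i) H_l(j) |\psi_l\rangle - \langle\psi_l| H_l(i) |\psi_l\rangle \langle\psi_l| H_l(j) |\psi_l\rangle, \tag{6}$$

where $H_l(i)$ is the generator associated with the $i$-th parameter of layer $l$, and

$$|\psi_l\rangle = W_l V_{l-1}(\theta_{l-1}) W_{l-1} \cdots V_1(\theta_1) W_1 |\psi_0\rangle \tag{7}$$

denotes the quantum state before the application of the parameterized layer $l$. Figure 4 illustrates a simplified version of VQC, where $W_1$, $W_2$ are related to non-parametric gates, and $V_1(\theta_0, \theta_1)$ and $V_2(\theta_2, \theta_3)$ correspond to the parametric gates with adjustable parameters $(\theta_0, \theta_1)$ and $(\theta_2, \theta_3)$, respectively. Since there are two layers, each of which owns two free parameters, the block-diagonal approximation is composed of two $2 \times 2$ matrices, $g_1$ and $g_2$. More specifically, writing $\langle A \rangle_l = \langle\psi_l| A |\psi_l\rangle$, $g_1$ and $g_2$ can be separately expressed as:

$$g_1 = \begin{bmatrix} \langle H_1(1)^2\rangle_1 - \langle H_1(1)\rangle_1^2 & \langle H_1(1) H_1(2)\rangle_1 - \langle H_1(1)\rangle_1 \langle H_1(2)\rangle_1 \\ \langle H_1(2) H_1(1)\rangle_1 - \langle H_1(2)\rangle_1 \langle H_1(1)\rangle_1 & \langle H_1(2)^2\rangle_1 - \langle H_1(2)\rangle_1^2 \end{bmatrix} \tag{8}$$

$$g_2 = \begin{bmatrix} \langle H_2(1)^2\rangle_2 - \langle H_2(1)\rangle_2^2 & \langle H_2(1) H_2(2)\rangle_2 - \langle H_2(1)\rangle_2 \langle H_2(2)\rangle_2 \\ \langle H_2(2) H_2(1)\rangle_2 - \langle H_2(2)\rangle_2 \langle H_2(1)\rangle_2 & \langle H_2(2)^2\rangle_2 - \langle H_2(2)\rangle_2^2 \end{bmatrix} \tag{9}$$

The elements of $g_1$ and $g_2$ compose $g$ as:

$$g = \begin{bmatrix} g_1 & 0 \\ 0 & g_2 \end{bmatrix} \tag{10}$$

Then, we employ Eq. (4) to update the VQC parameter $\theta$.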

Figure 4: A demonstration of the VQC approximation method based on the Fubini-Study metric tensor: (a) a block-diagonal approximation to VQC based on the Fubini-Study metric tensor; (b) a measurement of $g_1$ for $(\theta_0, \theta_1)$, and a measurement of $g_2$ for $(\theta_2, \theta_3)$.
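The block-diagonal approximation can be sketched numerically as follows. Each block is the covariance matrix of the layer's generators, evaluated in the state just before that layer. The two-qubit circuit below is a hypothetical stand-in rather than the circuit of Figure 4: the fixed layers ($R_Y(\pi/4)$ on both qubits, then a CNOT) and the parametric gates ($R_Z$, $R_Z$, then $R_Y$, $R_X$) are assumptions made for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)


def rotation(pauli, angle):
    """Single-qubit rotation exp(-i * angle / 2 * pauli); its generator is pauli / 2."""
    return np.cos(angle / 2) * I2 - 1j * np.sin(angle / 2) * pauli


def covariance_block(state, generators):
    """Metric block with entries <H_i H_j> - <H_i><H_j> evaluated in `state`."""
    means = [np.vdot(state, h @ state).real for h in generators]
    size = len(generators)
    block = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            joint = np.vdot(state, generators[i] @ generators[j] @ state).real
            block[i, j] = joint - means[i] * means[j]
    return block


def block_diagonal_metric(theta):
    """Block-diagonal Fubini-Study metric of the hypothetical two-layer circuit."""
    psi0 = np.zeros(4, dtype=complex)
    psi0[0] = 1.0
    # W_1 (assumed): fixed R_Y(pi/4) on both qubits.
    w1 = np.kron(rotation(Y, np.pi / 4), rotation(Y, np.pi / 4))
    psi1 = w1 @ psi0  # state before parametric layer 1
    g1 = covariance_block(psi1, [np.kron(Z, I2) / 2, np.kron(I2, Z) / 2])
    # V_1 (assumed): R_Z(theta_0) on qubit 0, R_Z(theta_1) on qubit 1; W_2: CNOT.
    v1 = np.kron(rotation(Z, theta[0]), rotation(Z, theta[1]))
    psi2 = CNOT @ v1 @ psi1  # state before parametric layer 2
    # V_2 (assumed): R_Y(theta_2) on qubit 0, R_X(theta_3) on qubit 1.
    g2 = covariance_block(psi2, [np.kron(Y, I2) / 2, np.kron(I2, X) / 2])
    metric = np.zeros((4, 4))
    metric[:2, :2] = g1
    metric[2:, 2:] = g2
    return metric


metric = block_diagonal_metric(np.array([0.1, 0.2, 0.3, 0.4]))
```

The second block depends only on $\theta_0$ and $\theta_1$, since it is evaluated in the state before the second parametric layer; the resulting matrix can be passed directly to a pseudo-inverse-based update.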

3.2.2 Federated Quantum Natural Gradient Descent

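A QFL system can be simply built by setting up VQC models in an FL manner, where the QNGD algorithm is applied for each VQC and the uploaded gradients of all VQCs are aggregated to update the model parameters of the global VQC. Mathematically, the FQNGD can be summarized as:

$$\bar{\theta}_{t+1} = \bar{\theta}_t - \eta \sum_{k=1}^{K} \frac{S_k}{S} \, g_k^{+}(\theta_t^{(k)}) \nabla \mathcal{L}(\theta_t^{(k)}), \tag{11}$$

where $\bar{\theta}_t$ and $\theta_t^{(k)}$ separately correspond to the model parameters of the global VQC and the $k$-th VQC model at epoch $t$, $g_k^{+}$ is the pseudo-inverse of the metric tensor of the $k$-th VQC, $S_k$ represents the amount of training data stored in participant $k$, and the sum of the $K$ participants’ data is equivalent to $S$.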

Compared with the SGD counterparts used for QFL, the FQNGD algorithm admits adaptive learning rates for the gradients such that the convergence rate could be accelerated according to the VQC model status.

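1. The coordinator executes:
2. Initialize the global parameter $\bar{\theta}_0$ and broadcast it to all participants $\theta_0^{(k)}$.
3. Assign each participant $k$ with $S_k$ training data, where $\sum_{k=1}^{K} S_k = S$.
4. For each global model update at epoch $t = 1, 2, \ldots, T$ do
5.      For each participant $k$ in parallel do
6.          Attain $g_k^{+}(\theta_t^{(k)}) \nabla \mathcal{L}(\theta_t^{(k)})$ by applying the QNGD for the $k$-th VQC.
7.          Send the local gradient $g_k^{+}(\theta_t^{(k)}) \nabla \mathcal{L}(\theta_t^{(k)})$ to the coordinator.
8.      End for
9.      The coordinator aggregates the received gradients $g_k^{+}(\theta_t^{(k)}) \nabla \mathcal{L}(\theta_t^{(k)})$.
10.    The coordinator updates the global model by taking Eq. (11).
11.     Broadcast the updated global model parameter $\bar{\theta}_{t+1}$ to all participants.
12. End for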

Algorithm 1 The FQNGD Algorithm for QFL
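The coordinator loop of Algorithm 1 can be sketched as below, assuming the global update is a data-size-weighted average of the participants' natural gradients. To keep the example self-contained, each participant's VQC is replaced by a hypothetical quadratic loss $\frac{1}{2}(\theta - c_k)^T A (\theta - c_k)$ whose curvature $A$ stands in for the metric tensor; the centers, data sizes, and function names are made up for illustration.

```python
import numpy as np


def local_natural_gradient(grad, metric):
    """Participant side: precondition the local gradient with pinv(metric)."""
    return np.linalg.pinv(metric) @ grad


def fqngd_global_update(theta_global, natural_grads, data_sizes, lr):
    """Coordinator side: weight each natural gradient by S_k / S and take a step."""
    weights = np.asarray(data_sizes, dtype=float) / np.sum(data_sizes)
    aggregated = sum(w * g for w, g in zip(weights, natural_grads))
    return theta_global - lr * aggregated


# Hypothetical participants: quadratic losses centered at c_k, with S_k samples each.
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
data_sizes = [100, 200, 100]
A = np.array([[2.0, 0.0], [0.0, 0.5]])  # stand-in for each participant's metric

theta = np.zeros(2)  # global parameter broadcast at the start
for _ in range(30):  # global epochs
    # Each participant computes its natural gradient at the broadcast parameter.
    natural_grads = [local_natural_gradient(A @ (theta - c), A) for c in centers]
    theta = fqngd_global_update(theta, natural_grads, data_sizes, lr=0.5)
```

In this toy setting the global parameter converges to the data-weighted mean of the centers, regardless of how badly conditioned `A` is, which illustrates why preconditioning by the metric can reduce the number of communication rounds.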

3.2.3 Empirical Results

To demonstrate the FQNGD algorithm for QFL, we perform the binary and ternary classification tasks on the standard MNIST dataset [9], specifically digits for the binary task and for the ternary one. There are a total of training data and test data for the binary classification, and training data and test data are assigned for the ternary classification. As for the setup of QFL in our experiments, the QFL system consists of identical local VQC participants, each of which owns the same amount of training data. The test data are stored in the global part and are used to evaluate the classification performance.

Figure 5: Simulation results of binary and ternary classification on the training set of the MNIST database. (a) the learning curves of various optimization methods for the binary classification; (b) the learning curves of various optimization methods for the ternary classification.

The experiments compare our proposed FQNGD algorithm with three other optimizers: the naive SGD optimizer, the Adagrad optimizer [20], and the Adam optimizer [18]. The Adagrad optimizer is a gradient descent optimizer with a past-gradient-dependent learning rate in each dimension. The Adam optimizer refers to the gradient descent method with an adaptive learning rate as well as adaptive first and second moments.

Methods     SGD     Adagrad   Adam    FQNGD
Acc. (%)    98.48   98.81     98.87   99.32
Table 1: The simulation results of a binary classification in terms of accuracy.

Methods     SGD     Adagrad   Adam    FQNGD
Acc. (%)    97.86   98.63     98.71   99.12
Table 2: The simulation results of a ternary classification in terms of accuracy.

As shown in Figure 5, our simulation results suggest that the proposed FQNGD method achieves the fastest convergence rate among the compared optimization approaches. This means that the FQNGD method can lower the communication cost while maintaining the baseline performance of both binary and ternary classification on the MNIST dataset. Moreover, we evaluate the QFL performance in terms of classification accuracy, where the FQNGD method attains the highest accuracy values, as shown in Tables 1 and 2. In particular, the FQNGD is designed for the VQC model and attains better empirical results than the Adam and Adagrad methods, which use adaptive learning rates over epochs.

4 Conclusion and Discussion

This work focuses on the design of the FQNGD algorithm for a QFL system in which multiple local VQC models are applied. FQNGD is derived from QNGD for training a single VQC, which relies on the block-diagonal approximation of the Fubini-Study metric tensor for the VQC architecture. Compared with the SGD, Adagrad, and Adam optimizers, our experiments on MNIST classification tasks demonstrate that the FQNGD method attains better empirical results and exhibits a faster convergence rate, which suggests that it can lower the communication cost while maintaining the baseline empirical results.

Although this work focuses on the optimization methods for the QFL system, the decentralized deployment of high-performance QFL systems for large-scale datasets is left for future investigation. In particular, it is essential to consider how to defend against malicious attacks from adversaries and to boost the robustness and integrity of the information shared among local participants. Besides, other quantum neural networks, such as quantum convolutional neural networks (QCNN) [8], are worth further study as building blocks of a QFL system.

5 Method

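In our study, the Fubini-Study metric tensor is defined in the bra-ket notation of quantum mechanics for pure states. To explicitly equate this notation to the homogeneous coordinates, let

$$|\psi\rangle = \sum_{k=0}^{n} Z_k |e_k\rangle = [Z_0 : Z_1 : \cdots : Z_n], \tag{12}$$

where $\{|e_k\rangle\}$ is a set of orthonormal basis vectors for the Hilbert space, the $Z_k$ are complex numbers, and $Z_\alpha = [Z_0 : Z_1 : \cdots : Z_n]$ is the standard notation for a point in the projective space $\mathbb{CP}^n$ in homogeneous coordinates. Then, given two points $|\psi\rangle = Z_\alpha$ and $|\phi\rangle = W_\alpha$ in the space, the distance between $|\psi\rangle$ and $|\phi\rangle$ is

$$\gamma(\psi, \phi) = \arccos \sqrt{\frac{\langle\psi|\phi\rangle \langle\phi|\psi\rangle}{\langle\psi|\psi\rangle \langle\phi|\phi\rangle}}. \tag{13}$$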

The Fubini-Study metric is the natural metric for the geometrization of quantum mechanics, and much of the peculiar behavior of quantum mechanics including quantum entanglement can be attributed to the peculiarities of the Fubini-Study metric.
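As a small numerical illustration, the snippet below evaluates the Fubini-Study distance between two state vectors, assuming the standard form $\arccos\sqrt{\langle\psi|\phi\rangle\langle\phi|\psi\rangle / (\langle\psi|\psi\rangle\langle\phi|\phi\rangle)}$. The function name `fubini_study_distance` is chosen for illustration.

```python
import numpy as np


def fubini_study_distance(psi, phi):
    """Fubini-Study distance between two (not necessarily normalized) state vectors."""
    psi = np.asarray(psi, dtype=complex)
    phi = np.asarray(phi, dtype=complex)
    overlap = abs(np.vdot(psi, phi)) ** 2
    norms = np.vdot(psi, psi).real * np.vdot(phi, phi).real
    # Clip to guard against rounding slightly outside [0, 1] before arccos.
    ratio = np.clip(overlap / norms, 0.0, 1.0)
    return float(np.arccos(np.sqrt(ratio)))


zero = np.array([1.0, 0.0])
one = np.array([0.0, 1.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2.0)

d_orthogonal = fubini_study_distance(zero, one)
d_plus = fubini_study_distance(zero, plus)
d_rescaled = fubini_study_distance(plus, 3.0 * np.exp(0.7j) * plus)
```

Because the distance is defined on projective space, rescaling a vector or multiplying it by a global phase leaves the distance unchanged, which is why `d_rescaled` is zero up to rounding.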


  • [1] S. Amari (1998) Natural Gradient Works Efficiently in Learning. Neural Computation 10 (2), pp. 251–276.
  • [2] P. Ball (2021) Real-Time Error Correction for Quantum Computing. Physics 14, pp. 184.
  • [3] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. (2020) Language Models Are Few-Shot Learners. Advances in Neural Information Processing Systems 33, pp. 1877–1901.
  • [4] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, et al. (2021) Variational Quantum Algorithms. Nature Reviews Physics 3 (9), pp. 625–644.
  • [5] S. Y. Chen, C. Huang, C. Hsing, and Y. Kao (2021) An End-to-End Trainable Hybrid Classical-Quantum Classifier. Machine Learning: Science and Technology 2 (4), pp. 045021.
  • [6] S. Y. Chen, C. H. Yang, J. Qi, P. Chen, X. Ma, and H. Goan (2020) Variational Quantum Circuits for Deep Reinforcement Learning. IEEE Access 8, pp. 141007–141024.
  • [7] S. Y. Chen and S. Yoo (2021) Federated Quantum Machine Learning. Entropy 23 (4), pp. 460.
  • [8] I. Cong, S. Choi, and M. D. Lukin (2019) Quantum Convolutional Neural Networks. Nature Physics 15 (12), pp. 1273–1278.
  • [9] L. Deng (2012) The MNIST Database of Handwritten Digit Images for Machine Learning Research. IEEE Signal Processing Magazine 29 (6), pp. 141–142.
  • [10] J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
  • [11] Y. Du, M. Hsieh, T. Liu, S. You, and D. Tao (2021) Learnability of Quantum Neural Networks. PRX Quantum 2 (4), pp. 040337.
  • [12] L. Egan, D. M. Debroy, C. Noel, A. Risinger, D. Zhu, D. Biswas, M. Newman, M. Li, K. R. Brown, M. Cetina, et al. (2021) Fault-Tolerant Control of An Error-Corrected Qubit. Nature 598 (7880), pp. 281–286.
  • [13] Q. Guo, Y. Zhao, M. Grassl, X. Nie, G. Xiang, T. Xin, Z. Yin, and B. Zeng (2021) Testing A Quantum Error-Correcting Code on Various Platforms. Science Bulletin 66 (1), pp. 29–35.
  • [14] J. Hirschberg and C. D. Manning (2015) Advances in Natural Language Processing. Science 349 (6245), pp. 261–266.
  • [15] H. Huang, M. Broughton, J. Cotler, S. Chen, J. Li, M. Mohseni, H. Neven, R. Babbush, R. Kueng, J. Preskill, et al. (2022) Quantum Advantage in Learning from Experiments. Science 376 (6598), pp. 1182–1186.
  • [16] H. Huang, M. Broughton, M. Mohseni, R. Babbush, S. Boixo, H. Neven, and J. R. McClean (2021) Power of Data in Quantum Machine Learning. Nature Communications 12 (1), pp. 1–9.
  • [17] X. Huang, J. Baker, and R. Reddy (2014) A Historical Perspective of Speech Recognition. Communications of the ACM 57 (1), pp. 94–103.
  • [18] D. P. Kingma and J. Ba (2015) Adam: A Method for Stochastic Optimization. In Proc. International Conference on Learning Representations.
  • [19] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon (2016) Federated Learning: Strategies for Improving Communication Efficiency. arXiv preprint arXiv:1610.05492.
  • [20] A. Lydia and S. Francis (2019) Adagrad—An Optimizer for Stochastic Gradient Descent. Int. J. Inf. Comput. Sci 6 (5), pp. 566–568.
  • [21] S. McArdle, T. Jones, S. Endo, Y. Li, S. C. Benjamin, and X. Yuan (2019) Variational Ansatz-Based Quantum Simulation of Imaginary Time Evolution. NPJ Quantum Information 5 (1), pp. 1–6.
  • [22] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas (2017) Communication-Efficient Learning of Deep Networks from Decentralized Data. In Artificial Intelligence and Statistics, pp. 1273–1282.
  • [23] J. Preskill (2018) Quantum Computing in the NISQ Era and Beyond. Quantum 2, pp. 79.
  • [24] J. Qi and J. Tejedor (2021) Classical-to-Quantum Transfer Learning for Spoken Command Recognition Based on Quantum Neural Networks. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing.
  • [25] J. Qi, C. H. Yang, P. Chen, and M. Hsieh (2022) Theoretical Error Performance Analysis for Variational Quantum Circuit Based Functional Regression. arXiv preprint arXiv:2206.04804.
  • [26] J. Qi, C. H. Yang, and P. Chen (2022) QTN-VQC: An End-to-End Learning Framework for Quantum Neural Networks. In NeurIPS 2021 Workshop on Quantum Tensor Networks in Machine Learning.
  • [27] S. Ruder (2016) An Overview of Gradient Descent Optimization Algorithms. arXiv preprint arXiv:1609.04747.
  • [28] R. Shokri and V. Shmatikov (2015) Privacy-Preserving Deep Learning. In Proc. ACM SIGSAC Conference on Computer and Communications Security, pp. 1310–1321.
  • [29] J. Stokes, J. Izaac, N. Killoran, and G. Carleo (2020) Quantum Natural Gradient. Quantum 4, pp. 269.
  • [30] E. Stoudenmire and D. J. Schwab (2016) Supervised Learning with Tensor Networks. Advances in Neural Information Processing Systems 29.
  • [31] A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis (2018) Deep Learning for Computer Vision: A Brief Review. Computational Intelligence and Neuroscience 2018.
  • [32] P. J. Werbos (1990) Backpropagation Through Time: What It Does and How to Do It. Proceedings of the IEEE 78 (10), pp. 1550–1560.
  • [33] C. H. Yang, J. Qi, S. Y. Chen, P. Chen, S. M. Siniscalchi, X. Ma, and C. Lee (2021) Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6523–6527.
  • [34] C. H. Yang, J. Qi, S. Y. Chen, Y. Tsao, and P. Chen (2022) When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing.