Quantum Split Neural Network Learning using Cross-Channel Pooling

In recent years, quantum technologies have attracted attention in various fields such as quantum machine learning, quantum communication, and quantum computing. Among them, quantum federated learning (QFL), in which quantum neural networks (QNNs) are integrated into federated learning (FL), has recently received increasing attention. In contrast to existing QFL methods, we propose quantum split learning (QSL), which is an extension of split learning. In classical computing, split learning has shown advantages in convergence speed, communication cost, and even privacy. To fully utilize QSL, we propose cross-channel pooling, which leverages the unique nature of quantum state tomography performed by QNNs. In numerical results, we corroborate that QSL not only achieves 1.64% higher top-1 accuracy than QFL but also preserves privacy in the MNIST classification task.


I Introduction

With quantum supremacy and recent advances in quantum computing, quantum algorithms have attracted attention as a solution to various computation-intensive problems, where vast amounts of information can be processed faster [du2015spl, xiong2022spl]. Indeed, Shor's algorithm [Shor97] reduces the computational cost of factorization, and Grover's search [Grover96] reduces the search cost, for which quantum supremacy has been demonstrated on a real quantum device. Furthermore, IBM Quantum has published its 2022 development roadmap, according to which quantum computers with 10–100k qubits will be developed by 2026 [roadmap2022], and more quantum algorithms are expected to prove their supremacy as quantum computers advance. Therefore, quantum computing is a strong candidate for next-generation computing, and this leads to research innovation in quantum machine learning (QML).

Recent studies have started re-implementing existing ML applications using QNNs, e.g., image classification [baek2022scalable], reinforcement learning [yun2022quantum, icdcs2022yun], and federated learning (FL) [yun2022slimmable, KWAK2022]. In particular, quantum federated learning (QFL) is widely used in QML because distributed learning can be implemented with quantum communications [xu2022privacy, li2021quantum]. Motivated by these trends and by the success of split learning (SL) in privacy preservation [DSN21], we extend classical SL to a quantum SL (QSL) framework. A naïve QSL framework can be obtained by directly re-implementing classical SL in a quantum version. However, such a naïve approach cannot deal with the quantum nature. According to [baek2022scalable, baek2022sqcnn3d, baek2022fv], the features obtained from QCNNs with naïve training are similar to one another when an input is given to the QCNNs. To cope with this problem, reverse fidelity training, which utilizes the quantum nature, has been proposed. Likewise, when QSL is designed, the quantum nature helps QML and gives insights.

We aim to extend SL to QSL by replacing NNs with QNNs [DSN21]. In addition, we propose cross-channel pooling (C2Pool), which utilizes the expected value of the probability amplitude, a natural output of a QNN. We empirically show that leveraging C2Pool achieves higher accuracy, the lowest communication cost, and privacy preservation. The contributions of this paper are threefold: (1) we propose the first quantum split learning framework; (2) the proposed framework takes the characteristics of the quantum nature into account; and (3) the numerical experiments show that our framework outperforms naïve QSL and QFL.

II Related Work

II-A Basic Quantum Gates and Quantum Operations

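We introduce the basic quantum gates. In a single-qubit system, the quantum state is defined by two basis states and their probability amplitudes in Hilbert space as $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, where $|\alpha|^2 + |\beta|^2 = 1$. A $q$-qubit quantum state is written as $|\psi\rangle = \sum_{i=1}^{2^q} \alpha_i |i\rangle$, where $\alpha_i$ and $|i\rangle$ stand for the probability amplitude and the $i$-th standard basis state in Hilbert space, respectively. By the definition of Hilbert space (i.e., $\langle\psi|\psi\rangle = 1$), the probability amplitudes satisfy $\sum_{i=1}^{2^q} |\alpha_i|^2 = 1$. The rotation and controlled gates linearly transform the probability amplitudes, and are denoted as

$$R_x(\delta) = e^{-i\frac{\delta}{2}X}, \quad R_y(\delta) = e^{-i\frac{\delta}{2}Y}, \quad R_z(\delta) = e^{-i\frac{\delta}{2}Z}, \quad CX = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes X, \tag{1}$$

where $X$, $Y$, and $Z$ denote the Pauli-$X$, Pauli-$Y$, and Pauli-$Z$ gates, respectively. By leveraging these basic gates, we can represent a universal unitary gate $U$, which is a key element of a QNN.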
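To make the gate definitions concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard textbook forms of the Pauli rotations and a controlled-X gate whose control is the first qubit; the helper names `rotation` and `is_unitary` are introduced here for illustration only.

```python
import numpy as np

# Pauli matrices and the 2x2 identity.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def rotation(pauli, delta):
    """R_P(delta) = exp(-i*delta/2*P) = cos(delta/2) I - i sin(delta/2) P."""
    return np.cos(delta / 2) * I2 - 1j * np.sin(delta / 2) * pauli


# Controlled-X (assumed convention: the first qubit is the control).
P0 = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
P1 = np.array([[0, 0], [0, 1]], dtype=complex)  # |1><1|
CX = np.kron(P0, I2) + np.kron(P1, X)


def is_unitary(U):
    """Check U^dagger U = I, i.e., the gate preserves the state norm."""
    return np.allclose(U.conj().T @ U, np.eye(U.shape[0]))


# Applying R_y(pi/2) to |0> yields an equal superposition of |0> and |1>.
ket0 = np.array([1, 0], dtype=complex)
state = rotation(Y, np.pi / 2) @ ket0
probs = np.abs(state) ** 2  # squared magnitudes of the probability amplitudes
```

Because every gate is unitary, the squared amplitudes in `probs` always sum to one, which is the normalization property stated above.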

II-B Quantum Neural Network

We introduce the operation of quantum neural networks. As shown in Fig. 1, QNN is tripartite: the state encoder, quantum neural layer, and measurement [qoc2022wang].

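State Encoder. Since uploading classical data to quantum circuits is challenging, encoding methods for QNNs such as basis encoding, amplitude encoding, and angle encoding have been discussed. Among these methods, we introduce an advanced angle encoding technique, i.e., data-reuploading. Given the classical data $\mathbf{x}$, the data-reuploading process is expressed as follows,

$$|\psi\rangle = U(\boldsymbol{\theta}_L)\,U(\mathbf{x}_L) \cdots U(\boldsymbol{\theta}_1)\,U(\mathbf{x}_1)\,|\psi_0\rangle, \tag{2}$$

where $|\psi_0\rangle$ denotes the initial quantum state, i.e., $|\psi_0\rangle = |0\rangle^{\otimes q}$. The input data are split into $\{\mathbf{x}_l\}_{l=1}^{L}$ with an interval of $q$, where $\mathbf{x}_l$ and $\boldsymbol{\theta}_l$ denote the input vector composed of the $((l-1)q+1)$-th to $lq$-th elements of $\mathbf{x}$ and the trainable parameters, respectively.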
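The data-reuploading idea can be sketched as follows. This is a simplified illustration under stated assumptions rather than the encoder used in the paper: the input is split into chunks of $q$ elements, each element is encoded with an $R_y$ rotation on its own qubit, and a trainable $R_z$ rotation follows. Entangling gates are omitted so that each qubit can be simulated independently, and the function name `reupload` is hypothetical.

```python
import numpy as np


def ry(angle):
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def rz(angle):
    return np.array([[np.exp(-1j * angle / 2), 0],
                     [0, np.exp(1j * angle / 2)]], dtype=complex)


def reupload(x, theta, q):
    """Encode x on q qubits by alternating data and trainable rotations.

    x is split into chunks of q elements; chunk l is uploaded in layer l.
    theta must have one trainable angle per (layer, qubit) pair.
    """
    chunks = np.asarray(x, dtype=float).reshape(-1, q)  # one row per layer
    theta = np.asarray(theta, dtype=float)
    if theta.shape != chunks.shape:
        raise ValueError("theta must have shape (len(x) // q, q)")
    # No entangling gates in this sketch, so each qubit evolves on its own.
    qubits = [np.array([1.0, 0.0], dtype=complex) for _ in range(q)]
    for chunk, layer_theta in zip(chunks, theta):
        for j in range(q):
            qubits[j] = rz(layer_theta[j]) @ ry(chunk[j]) @ qubits[j]
    # The joint state is the tensor product of the single-qubit states.
    state = qubits[0]
    for single in qubits[1:]:
        state = np.kron(state, single)
    return state


x = np.linspace(0.0, 1.0, 8)   # 8 input values uploaded in 2 layers on 4 qubits
theta = np.zeros((2, 4))       # trainable angles (all zero here)
state = reupload(x, theta, q=4)
```

With this scheme, an input of length $Lq$ needs only $q$ qubits, at the price of a circuit that is $L$ layers deep.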

Quantum Neural Layer. Quantum neural layers process the encoded quantum state $|\psi\rangle$. They can be utilized as a quantum convolutional neural network (QCNN) or a quantum fully connected network (QFCN). The rotation gates and controlled gates in (1) are used to compose the quantum neural layers. On a real quantum device, the design of a quantum neural layer is important [quantumnas, qoc2022wang], because entangling qubits is challenging. However, quantum circuits can be simulated with classical computers. Thus, the quantum neural layer can be considered as a unitary operation, i.e., $|\psi\rangle \mapsto U(\boldsymbol{\theta})|\psi\rangle$.

Measurement. To obtain a classical output, the quantum state must be projected by a measurement device. We introduce three measurement methods, i.e., projection-valued measure (PVM), positive operator-valued measure (POVM), and the trainable measurement method [simeone2022introduction, yun2022slimmable, yun2022quantum, yun2022projection].

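  • PVM: The PVM produces a probability measure by projecting the quantum state onto the projectors $P_i = |i\rangle\langle i|$, and the output can be expressed as follows,

    $$p_i = \langle\psi|P_i|\psi\rangle = |\alpha_i|^2, \tag{3}$$

    where $\alpha_i$ is the probability amplitude of the $i$-th basis state $|i\rangle$. PVM is known as a complete measurement, because it projects the quantum state onto all possible basis states.

  • POVM: Usually, the quantum state is projected onto Pauli-$Z$ measurement devices for every qubit [Qiskit], where Pauli-$Z$ is $Z = \mathrm{diag}(1, -1)$. For a $q$-qubit state, the $k$-th projection matrix is defined as $M_k = I^{\otimes (k-1)} \otimes Z \otimes I^{\otimes (q-k)}$. In POVM, the outcome of the projection $M_k$ is treated as a random variable denoted as $O_k$, where $O_k \in \{-1, +1\}$. The expectation value for the $k$-th qubit is $\langle O_k \rangle = \langle\psi|M_k|\psi\rangle$. POVM is known as an incomplete measurement, because it projects the quantum state onto only part of the projection matrices.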

  • Trainable Measurement: Trainable measurement is suggested by [lloyd2020quantum, schuld2022quantum]. To realize trainable measurement, the concept of a pole has been proposed and applied to multi-agent reinforcement learning [yun2022quantum] and federated learning [yun2022slimmable].
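The difference between the first two measurement schemes in the list above can be illustrated with a small state-vector sketch. It assumes ideal, noiseless measurement and a big-endian qubit ordering (the first qubit is the most significant bit of the basis index); the function names are hypothetical and do not come from the paper. A $q$-qubit state yields $2^q$ basis-state probabilities under the complete measurement, but only $q$ per-qubit Pauli-Z expectation values under the incomplete one.

```python
import numpy as np


def pvm_probabilities(state):
    """Probability of each of the 2^q basis states: |amplitude|^2."""
    return np.abs(np.asarray(state)) ** 2


def pauli_z_expectations(state, q):
    """Per-qubit Pauli-Z expectation values, each in [-1, 1]."""
    state = np.asarray(state)
    if state.size != 2 ** q:
        raise ValueError("state must have 2**q amplitudes")
    probs = (np.abs(state) ** 2).reshape([2] * q)  # one axis per qubit
    expectations = []
    for k in range(q):
        other_axes = tuple(a for a in range(q) if a != k)
        marginal = probs.sum(axis=other_axes)       # [P(bit k = 0), P(bit k = 1)]
        expectations.append(marginal[0] - marginal[1])
    return np.array(expectations)


# Two-qubit example: the basis state |01>.
ket01 = np.array([0, 1, 0, 0], dtype=complex)
pvm_out = pvm_probabilities(ket01)           # 4 values (complete measurement)
z_out = pauli_z_expectations(ket01, q=2)     # 2 values (one per qubit)
```

The gap between $2^q$ and $q$ outputs is why a probability-based readout can support multi-class prediction with few qubits.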

Fig. 1: The general quantum neural network architecture.

II-C Optimizing Methods for QNN

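We introduce a QNN training method, i.e., a classical-quantum hybrid method. Since QNN optimization follows the backward quantum propagation of phase errors (Baqprop) principle, classical computing is required to obtain the gradient of the objective function, which is known as the classical-quantum hybrid method. It consists of two processes: (1) gradient calculation via classical computing and (2) the parameter-shift rule [qoc2022wang]. Suppose that the loss function is denoted as $\mathcal{L}(\boldsymbol{\theta})$, where $\boldsymbol{\theta}$ denotes the QNN parameters. The loss gradient with respect to the $i$-th QNN parameter is written as $\frac{\partial \mathcal{L}}{\partial \theta_i} = \frac{\partial \mathcal{L}}{\partial \langle O \rangle} \cdot \frac{\partial \langle O \rangle}{\partial \theta_i}$, where $\langle O \rangle$ denotes the measured output. The classical computer calculates $\frac{\partial \mathcal{L}}{\partial \langle O \rangle}$, and the term $\frac{\partial \langle O \rangle}{\partial \theta_i}$ is calculated by quantum computers with forward propagation as follows,

$$\frac{\partial \langle O \rangle}{\partial \theta_i} = \frac{1}{2}\left(\langle O \rangle_{\boldsymbol{\theta} + \frac{\pi}{2}\mathbf{e}_i} - \langle O \rangle_{\boldsymbol{\theta} - \frac{\pi}{2}\mathbf{e}_i}\right), \tag{4}$$

where $\mathbf{e}_i$ is a one-hot vector whose components are all zero except the $i$-th one, i.e., $\mathbf{e}_i = [0, \dots, 0, 1, 0, \dots, 0]^\top$.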
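The hybrid gradient computation can be checked numerically on a toy circuit. The sketch below assumes a single qubit with two rotation parameters, a Pauli-Z readout, and a squared-error loss; these choices and all names are illustrative assumptions, not the paper's setup. For rotation gates of this type, the parameter-shift rule with a shift of $\pi/2$ gives the exact derivative using only forward evaluations of the circuit.

```python
import numpy as np


def rx(angle):
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def ry(angle):
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def expectation_z(theta):
    """Forward pass: <Z> after R_y(theta[1]) R_x(theta[0]) |0>."""
    state = ry(theta[1]) @ rx(theta[0]) @ np.array([1.0, 0.0], dtype=complex)
    probs = np.abs(state) ** 2
    return probs[0] - probs[1]


def parameter_shift_grad(f, theta):
    """d f / d theta_i = 0.5 * (f(theta + pi/2 e_i) - f(theta - pi/2 e_i))."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e_i = np.zeros_like(theta)
        e_i[i] = 1.0  # one-hot vector selecting the i-th parameter
        grad[i] = 0.5 * (f(theta + np.pi / 2 * e_i) - f(theta - np.pi / 2 * e_i))
    return grad


theta = np.array([0.3, 0.7])
target = 0.5                                   # hypothetical regression target
output = expectation_z(theta)
dloss_doutput = 2.0 * (output - target)        # classical part: d(loss)/d(output)
loss_grad = dloss_doutput * parameter_shift_grad(expectation_z, theta)  # chain rule
```

For this circuit the output equals $\cos\theta_0 \cos\theta_1$, so the shifted evaluations can be compared against the analytic derivative.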

II-D Recent Studies on Quantum Distributed Learning

We introduce recent studies on quantum distributed learning, including quantum federated learning (QFL). QFL is an architecture inspired by classical FL and enhanced by utilizing QML. The QFL model has many similarities with classical FL because it inherits many characteristics of the classical version. Multiple devices and a server are involved in QFL, and they exchange model parameters for aggregation repeatedly. In addition, QFL with quantum blind computing has been proposed to enhance privacy preservation by utilizing differential privacy with quantum computers. However, due to the qubit constraint, the application of QFL is severely limited. In [chen2021federated, huang2022quantum], binary classification, which is a relatively simple task, is used as the simulation environment, because leveraging spatial information requires many qubits. To cope with this issue, hybrid QFL has been proposed [chen2021federated], where a pretrained feature extractor extracts low-dimensional features from images. In contrast, our work focuses on overcoming these issues of QFL, as elaborated in the next section.

III Quantum Split Learning

Fig. 2: The illustration of privacy-preserving quantum split learning framework.

III-A Setup

In this paper, $N$ local devices and one server participate in the proposed quantum split learning. As shown in Fig. 2, each local device has its local data and quantum neural networks, namely QCNNs and one cross-channel pooling QNN. The local data and local QCNNs cannot be shared with anyone, whereas features and labels can be shared. The server processes the features and finally obtains the corresponding predictions.

III-B Local-side Operations

Quantum Image Processing. The classical data of the $n$-th local device are denoted as $\mathbf{X}_n$, and the QCNN filters scan each image with stride $s$, where the filter size is $k \times k$. We denote each scanned patch as $\mathbf{x}$. The patch is uploaded to the QCNN via the data-reuploading method in (2). We adopt a scalable quantum convolutional neural network (sQCNN [baek2022scalable]) as the QCNN. With the QCNN and a patch $\mathbf{x}$, the feature $\mathbf{f} \in \mathbb{R}^{C}$ is obtained, where $C$ denotes the number of output channels.

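Cross-Channel Pooling QNN. Similarly, the probability amplitude is obtained by the QNN, and its output is denoted as $\mathbf{p} = [p_1, \dots, p_C]^\top$, where $\sum_{c=1}^{C} p_c = 1$. We propose a cross-channel pooling method that takes the inner product of the QCNN feature $\mathbf{f} = [f_1, \dots, f_C]^\top$ over its $C$ output channels and $\mathbf{p}$. Therefore, the output is expressed as follows,

$$z = \mathbf{f}^\top \mathbf{p} = \sum_{c=1}^{C} f_c\, p_c. \tag{5}$$

After scanning all patches, the output features are obtained. Then, the local device transmits its feature-label set to the server. Note that the feature-label set is classical data, even though it is processed with QNNs.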
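The pooling step itself reduces to a weighted sum over channels. The following sketch assumes a feature map of shape (height, width, channels) and a probability vector that is either shared across patches or given per patch; the shapes, the random test data, and the function names are assumptions for illustration. Channel-averaging pooling is included for comparison, since it coincides with cross-channel pooling when the probabilities are uniform.

```python
import numpy as np


def cross_channel_pool(features, probs):
    """Inner product of channel features and channel probabilities.

    features: array of shape (H, W, C).
    probs: array of shape (C,) or (H, W, C), non-negative, summing to 1
           over the last axis (e.g., measured basis-state probabilities).
    Returns a single-channel feature map of shape (H, W).
    """
    features = np.asarray(features, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if features.shape[-1] != probs.shape[-1]:
        raise ValueError("features and probs must share the channel dimension")
    return np.sum(features * probs, axis=-1)


def channel_average_pool(features):
    """Baseline: plain mean over the channel axis."""
    return np.asarray(features, dtype=float).mean(axis=-1)


rng = np.random.default_rng(0)
features = rng.normal(size=(7, 7, 16))     # hypothetical 16-channel feature map
logits = rng.normal(size=16)
probs = np.exp(logits) / np.exp(logits).sum()  # stand-in for QNN output probabilities
pooled = cross_channel_pool(features, probs)   # shape (7, 7): one channel remains
```

Unlike averaging, the weights here depend on the QNN output, so the contribution of each channel is learned rather than fixed.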

III-C Server-side Operations

Prediction. On the server side, the features are processed with QCNNs and a QNN. The operations of the QCNNs are the same as on the local side, whereas the QNN performs multi-class classification. For this, we leverage the projection-valued measure, which is one of the quantum state projection methods [yun2022projection]. Then, we obtain the prediction values when the quantum state is projected onto the basis states, as in (3). Note that our method is not based on the softmax function, i.e., it does not require softmax tuning parameters [jerbi2021variational].

QNN Training. We elaborate on how to train the QNNs. The overall process is presented in Algorithm 1. First, the $n$-th local device generates cross-channel pooled features from its data (lines 4–5). The local device transmits the features and labels to the server (line 6). After the server receives all features and labels, it makes predictions by feed-forwarding the features. For simplicity, we denote all trainable parameters as $\boldsymbol{\Theta}$. We use the objective function for PVM suggested by [yun2022projection], which combines a binary cross entropy term $\mathcal{L}_{\mathrm{BCE}}$ and a regularizer term $\mathcal{L}_{\mathrm{reg}}$ over the minibatch $\mathcal{B}$. The terms $\mathcal{L}_{\mathrm{BCE}}$ and $\mathcal{L}_{\mathrm{reg}}$ are expressed in terms of $p_c$, which denotes the prediction value on class $c$ obtained by (3); their explicit forms follow [yun2022projection]. Finally, in lines 8–12, the loss gradients of the local QNNs and the global QNN are calculated and the parameters are updated by (4).


IV Performance Evaluation

IV-A Setup

The benchmark schemes are designed as follows.

  • Advantage of QSL: We compare the proposed framework to QFL with federated averaging (named ‘QFedAvg’), and standalone training. We adopt top-1 accuracy for the metric. For QFedAvg, we adopt 10 local iterations per communication round.

  • Effectiveness of cross-channel pooling: We conduct an ablation study of cross-channel pooling. As baselines, we design QSL without any pooling and QSL with channel-averaging pooling. For this, we measure top-1 accuracy and communication cost.

  • Scalability: Since scalability in the number of local devices is important in distributed machine learning, we conduct experiments with various numbers of local devices.

We have conducted 10-class classification tasks with MNIST, FashionMNIST, and CIFAR10, where all images are resized with bicubic interpolation. Note that all data are independent and identically distributed. Each local device has 512 data samples, which is also the batch size. In addition, the local and server models consist of trainable parameters implemented with U3 and CU3 gates [u3gate, cu3gate]. We use the Adagrad optimizer. In addition, we have conducted all experiments on a classical computer with the TorchQuantum Python library [quantumnas].

TABLE I: Top-1 accuracy of various quantum distributed learning frameworks and datasets.
Fig. 3: Learning curve with various datasets.

IV-B Experimental Results

Performance. Table I shows the top-1 accuracy. On the MNIST dataset, quantum split learning achieves 1.64% and 6.83% higher accuracy than QFedAvg and standalone training, respectively. On FashionMNIST, QSL achieves 2.37% and 6.76% higher accuracy than QFedAvg and standalone training, respectively. Lastly, a similar tendency is observed on CIFAR10. Thus, our algorithm outperforms the other algorithms.

Effectiveness of C2Pool. Table II shows the results of the C2Pool ablation. Regarding top-1 accuracy, QSL with average pooling and QSL without pooling are inferior to QSL with C2Pool. In addition, QSL with C2Pool shows the smallest gap between training and test performance. In the communication aspect, we calculate the communication cost per feature as follows,


Average pooling and C2Pool reduce 16 channels to a single channel, whereas QSL without pooling transmits the original feature, which consists of 16 channels, to the server. In summary, the proposed C2Pool improves performance while reducing both the train-test gap and the communication cost.
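The effect of pooling on the transmitted payload can be illustrated with simple arithmetic. The sketch below assumes that the cost of one feature is the number of transmitted values multiplied by the size of each value; the spatial size of 14 x 14 and the 4-byte (float32) representation are hypothetical choices and are not taken from the paper, whose exact cost formula is not reproduced here. Only the channel counts (16 versus 1) come from the text.

```python
def communication_cost_bytes(height, width, channels, bytes_per_value=4):
    """Bytes needed to transmit one feature map (assumed cost model)."""
    return height * width * channels * bytes_per_value


H, W = 14, 14  # hypothetical spatial size of the feature map

cost_without_pooling = communication_cost_bytes(H, W, channels=16)
cost_with_pooling = communication_cost_bytes(H, W, channels=1)  # avg-pooling or C2Pool
reduction_factor = cost_without_pooling // cost_with_pooling
```

Under this cost model, the reduction factor equals the number of pooled channels regardless of the spatial size or the numeric precision.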

Scalability. Fig. 4 shows the results as the number of local devices increases. A single local device corresponds to standalone training. The top-1 accuracy reaches 68.81% at its highest. Therefore, it can be confirmed that higher accuracy can be obtained as the number of local devices increases.

Privacy-Preserving Aspect. Fig. 5 presents the feature representations corresponding to different quantum split learning frameworks. It is apparent that all features are harder to recognize than the original image, because the original image becomes distorted after passing through convolution. Especially with C2Pool, the outline of the digits is not clear, whereas convolution without pooling and convolution with average pooling clearly show the outline. That is, the proposed C2Pool is advantageous in terms of privacy preservation.

Metric Train Acc. Test Acc. Commun. Cost [Byte]
w/o pooling
w/ avg-pooling
w/ C2Pool
TABLE II: Ablation study on cross-channel pooling in the quantum split learning framework given MNIST dataset.
Fig. 4: Impact of the number of devices given MNIST dataset.
Fig. 5: The various output features that local client transmits to the server given MNIST image corresponding to different quantum split learning architectures.

V Conclusions and Future Work

In this paper, we propose a quantum split learning framework. To utilize the full potential of quantum split learning, we propose a probability amplitude-based cross-channel pooling method. We corroborate that the proposed framework achieves reasonable performance, scalability in the number of local devices, low communication cost, and privacy preservation in 10-class classification tasks. Since this work is the first quantum split learning framework, extending it to more detailed split learning settings is one research direction. Our future work is to analyze the differential privacy of the proposed cross-channel pooling method.