QTN-VQC: An End-to-End Learning framework for Quantum Neural Networks

10/06/2021 ∙ Jun Qi et al. ∙ Georgia Institute of Technology ∙ IBM

The advent of noisy intermediate-scale quantum (NISQ) computers raises a crucial challenge to design quantum neural networks for fully quantum learning tasks. To bridge the gap, this work proposes an end-to-end learning framework named QTN-VQC, by introducing a trainable quantum tensor network (QTN) for quantum embedding on a variational quantum circuit (VQC). The architecture of QTN is composed of a parametric tensor-train network for feature extraction and a tensor product encoding for quantum embedding. We highlight the QTN for quantum embedding in terms of two perspectives: (1) we theoretically characterize QTN by analyzing its representation power of input features; (2) QTN enables an end-to-end parametric model pipeline, namely QTN-VQC, from the generation of quantum embedding to the output measurement. Our experiments on the MNIST dataset demonstrate the advantages of QTN for quantum embedding over other quantum embedding approaches.


1 Introduction

State-of-the-art machine learning (ML), particularly based on deep neural networks (DNNs), has enabled a wide spectrum of successful applications, ranging from the everyday deployment of speech recognition (Deng et al., 2013) and computer vision (Sermanet et al., 2014) through to the frontier of scientific research in synthetic biology (Jumper et al., 2021). Despite rapid theoretical and empirical progress in DNN-based regression and classification (Goodfellow et al., 2016), DNN training algorithms are computationally expensive for many new scientific applications, such as new drug discovery (Smalley, 2017), which demand resources beyond the limits of classical hardware (Freedman, 2019). Fortunately, the imminent advent of quantum computing devices opens up new possibilities for exploiting quantum machine learning (QML) (Biamonte et al., 2017; Schuld et al., 2015; Schuld and Petruccione, 2018; Schuld and Killoran, 2019; Saggio et al., 2021; Dunjko, 2021) to improve the computational efficiency of ML algorithms in these new scientific domains.

Although the exploitation of quantum computing devices to carry out QML is still in its initial exploratory stages, rapid development in quantum hardware has motivated advances in quantum neural networks (QNNs) that run on noisy intermediate-scale quantum (NISQ) devices (Preskill, 2018; Huggins et al., 2019; Huang et al., 2021; Kandala et al., 2017). On a NISQ device, not enough qubits can be spared for quantum error correction, and the imperfect qubits have to be used directly at the physical layer. Even so, a practical compromise is to employ hybrid quantum-classical models that rely on the optimization of variational quantum circuits (VQCs) (Benedetti et al., 2019; Mitarai et al., 2018). The resilience of VQC-based models to certain types of quantum noise and their high flexibility with respect to coherence time and gate requirements (McClean et al., 2018) admit many practical implementations of QNNs on NISQ devices (Chen et al., 2020a; Yang et al., 2021; Du et al., 2020, 2021; Skolik et al., 2021; Dunjko et al., 2016; Jerbi et al., 2021; Ostaszewski et al., 2021). One notable limitation of the current QNN training pipeline is that the quantum embedding is not fully realizable on a quantum computer, which may impede the learning of the QNN. Hence, this work proposes QTN-VQC to enable an end-to-end trainable QNN, from data embedding to quantum measurement, that is readily realizable on quantum devices, where QTN stands for quantum tensor network (Orús, 2019; Huckle et al., 2013; Biamonte et al., 2017; Murg et al., 2010) and is used to generate the quantum embedding.

As shown in Figure 1, our QNN builds a unitary linear operator that consists of three main components: (1) quantum embedding generation; (2) a variational quantum circuit; (3) measurement. Quantum embedding generation, also known as quantum encoding, applies a fixed unitary linear operator that transforms a classical vector x into a quantum state in a Hilbert space. This step is an important aspect of quantum algorithm design that directly impacts the overall computational cost of the VQC, and it exploits quantum superposition. The VQC itself comprises two types of quantum gates: (1) controlled-NOT (CNOT) gates; (2) learnable parametric quantum gates. The CNOT gates ensure the property of quantum entanglement by mutually connecting the qubits, and the parametric quantum gates are adjusted to best fit the quantum input states. The model parameters of the VQC are optimized with variants of gradient descent during training. These parametric quantum gates play a role analogous to the weights of a DNN, and such quantum circuits have been shown to be resilient to quantum noise (Farhi et al., 2014; Kandala et al., 2017; McClean et al., 2016). Finally, the measurement projects the quantum output states onto a classical output.
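To make the entangling role of the CNOT gates concrete, the following minimal numpy sketch (ours, not from the paper; the Hadamard gate is used here only as a convenient way to create superposition on the control wire) prepares a two-qubit Bell state:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard on one qubit
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

state = np.kron(np.array([1.0, 0.0]), np.array([1.0, 0.0]))  # |00>
state = np.kron(H, I2) @ state   # superposition on the control qubit
state = CNOT @ state             # entangle the two qubits

# Resulting Bell state (|00> + |11>)/sqrt(2): it cannot be factored into a
# tensor product of single-qubit states, i.e. the qubits are entangled.
print(np.round(state, 3))
```

The final state assigns equal amplitude to |00⟩ and |11⟩ and zero elsewhere, which no product of single-qubit states can reproduce.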

Figure 1: An illustration of QNN based on VQC.

This work focuses on quantum embedding generation because it directly affects practical machine learning applications in terms of computational cost and the representation capability of the classical input features. In particular, we design a novel quantum tensor network (QTN) for quantum embedding generation. More specifically, the QTN consists of a tensor-train network (TTN) for dimension reduction and a tensor product encoding framework for outputting quantum embeddings. Dimension reduction is a necessary step before quantum encoding because only a small number of qubits is supported on currently available NISQ computers. A typical approach to dimension reduction relies on a classical fully-connected layer, also known as a dense layer, to convert a high-dimensional input vector y into a low-dimensional one x. However, since a dense layer cannot be physically mapped onto a quantum computer, considerable overhead is incurred by frequent communication between classical and quantum devices during end-to-end training.

Figure 2: Different paradigms for quantum embedding. (a) a dense layer is used to generate low-dimensional vector x from a high-dimensional one y; (b) a TTN is used for dimension reduction.

As shown in Figure 2(b), one of our contributions is to leverage a tensor-train network (TTN) to replace the dense layer in Figure 2(a). The benefits of applying a TTN arise from two aspects: (1) a TTN can maintain the representation power of the dense layer, which we justify in our theorems; (2) a TTN is a tensor network and can be flexibly placed on quantum computers, which enables the entire end-to-end training process to be conducted on a quantum computer. Moreover, in this work a tensor product encoding (TPE) is carefully designed to generate the quantum embedding, building the relationship between a classical vector x and the corresponding quantum state. We further investigate the representation power of QTN-VQC in terms of model size and the non-linear activation function used in the TTN. We refer to the combination of TTN and TPE as a QTN and use QTN-VQC as a genuinely end-to-end learning framework for QNNs.

2 Related Work

Prior work (Schuld and Petruccione, 2018; Biamonte et al., 2017; Dunjko and Briegel, 2018) demonstrates that VQCs show great promise in surpassing the performance of classical ML. Prominent examples of VQC-based models include the quantum approximate optimization algorithm (QAOA) (Farhi et al., 2014) and quantum circuit learning (QCL) (Mitarai et al., 2018). Various VQC architectures and geometries have been demonstrated on tasks ranging from image classification (Henderson et al., 2020; Chen et al., 2020b; Kerenidis et al., 2020) to reinforcement learning (Chen et al., 2020a).

As for quantum embedding, basis encoding associates classical input data in the form of binary strings with the computational basis states of a quantum system (Leymann and Barzen, 2020). Similarly, amplitude encoding encodes data into the amplitudes of a quantum state (Soklakov and Schack, 2006). Unfortunately, the computational cost of both basis encoding and amplitude encoding becomes exponentially expensive as the number of qubits increases (Schuld and Killoran, 2019). A newer technique, angle embedding, makes use of quantum gates to generate quantum states (Fu et al., 2011), but it cannot deal with high-dimensional feature inputs. Therefore, this work exploits a TTN for dimension reduction followed by a TPE for generating the quantum embedding.

In particular, this work employs the TTN for dimensionality reduction. The TT decomposition underlying the TTN was first proposed in (Oseledets, 2011), and TT-based layers have been flexibly extended to convolutional neural networks (CNNs) (Garipov et al., 2016) and recurrent neural networks (RNNs) (Tjandra et al., 2017). Empirical studies of TTN on machine learning tasks show that it is capable of matching DNN baseline results (Qi et al., 2020a; Yu et al., 2017; Yang et al., 2017; Jin et al., 2020). However, to the best of our knowledge, no existing work has applied TTN to QML. Moreover, since tensor network-based machine learning models like TTN are closely related to quantum machine learning in terms of model structure (Liu et al., 2018; Gao et al., 2017), the QTN-VQC model can be directly regarded as a classical simulation of the corresponding quantum machine learning model. In addition to a classical dense layer, more complicated architectures such as AlexNet (Lloyd et al., 2020) could be used for dimension reduction, and we also compare the performance of TTN- and AlexNet-based models.

3 Notations

We denote by $\mathbb{R}^{D}$ the $D$-dimensional real coordinate space, and $\mathbb{R}^{m_1 \times m_2 \times \cdots \times m_K}$ refers to a space of $K$-order tensors. A calligraphic symbol such as $\mathcal{X}$ represents a $K$-order multi-dimensional tensor, while bold lowercase and uppercase symbols (e.g., $\mathbf{x}$ and $\mathbf{W}$) represent a vector and a matrix, respectively.

For the notation of quantum computing, the ket symbol $|x\rangle$ denotes a quantum state associated with a vector in a Hilbert space. In particular, $|0\rangle = (1, 0)^{\top}$ and $|1\rangle = (0, 1)^{\top}$.

The quantum gate $RY(\theta)$ denotes a Pauli-Y rotation gate with the unitary operator defined in Eq. (1), which rotates a qubit about the Y-axis of the Bloch sphere by a given angle $\theta$:

$$RY(\theta) = e^{-i\frac{\theta}{2}\sigma_y} = \begin{pmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}. \qquad (1)$$

Moreover, the operator $\otimes$ denotes the tensor product. Given a set of vectors, their tensor product is a single vector whose dimension is the product of the individual dimensions, and it provides a compact representation of the constituent vectors. Similarly, the tensor product of quantum states yields a joint quantum state. Furthermore, for a scalar $x$, the quantum state prepared by an $RY$ rotation can be written as:

$$RY(2x)\,|0\rangle = \cos(x)\,|0\rangle + \sin(x)\,|1\rangle. \qquad (2)$$
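As a small illustration of the tensor-product notation (the dimensions and vectors below are chosen by us purely for illustration):

```python
import numpy as np
from functools import reduce

# The tensor (Kronecker) product of K vectors of dimensions d_1, ..., d_K is
# a single vector of dimension d_1 * ... * d_K. For qubits (d_i = 2), a
# product of K single-qubit states lives in a 2^K-dimensional Hilbert space.
vecs = [np.array([1.0, 0.0]),                 # |0>
        np.array([0.0, 1.0]),                 # |1>
        np.array([1.0, 1.0]) / np.sqrt(2)]    # (|0> + |1>)/sqrt(2)

product = reduce(np.kron, vecs)               # 2 * 2 * 2 = 8 entries
print(product.shape)
```

Note that the product state is still normalized, since each factor is.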

4 QTN-VQC: Our Proposed End-to-End Learning Framework

This section introduces our proposed end-to-end learning framework, QTN-VQC. As shown in Figure 3, the QTN model includes two components, (a) the TTN and (b) the TPE, which are separately introduced in Section 4.1 and Section 4.2. Moreover, Figure 4 illustrates the framework of the VQC, and Section 4.3 is devoted to its details.

Figure 3: A demonstration of quantum tensor network for quantum embedding.

4.1 Tensor Train Network for Dimension Reduction

We leverage the TTN (Novikov et al., 2015) for the dimension reduction of input features. The TTN relies on the TT decomposition (Oseledets, 2011) and has been commonly employed in machine learning tasks such as speech processing (Qi et al., 2020b) and computer vision (Yang et al., 2017). The TT decomposition assumes that, given a set of TT-ranks $\{r_0, r_1, \dots, r_K\}$, a $K$-order tensor $\mathcal{X}$ is factorized into the multiplication of 3-order core tensors. In more detail, given a set of indices $\{i_1, i_2, \dots, i_K\}$, the element $\mathcal{X}(i_1, \dots, i_K)$ is decomposed as:

$$\mathcal{X}(i_1, i_2, \dots, i_K) = \mathcal{G}_1[i_1]\,\mathcal{G}_2[i_2] \cdots \mathcal{G}_K[i_K], \qquad (3)$$

where $\mathcal{G}_k[i_k] \in \mathbb{R}^{r_{k-1} \times r_k}$. Since $r_0 = r_K = 1$, the product of slices is a scalar value.
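The TT format can be sketched in a few lines of numpy; the shapes and ranks below are illustrative, not the paper's:

```python
import numpy as np

# Each core G_k has shape (r_{k-1}, n_k, r_k) with boundary ranks r_0 = r_K = 1;
# an element X(i_1, ..., i_K) is the product of the matrix slices G_k[:, i_k, :].
rng = np.random.default_rng(0)
shape, ranks = (3, 4, 5), (1, 2, 2, 1)
cores = [rng.standard_normal((ranks[k], shape[k], ranks[k + 1]))
         for k in range(3)]

def tt_element(cores, idx):
    """Contract the core slices for a single entry; the result is a 1x1 matrix."""
    out = np.eye(1)
    for core, i in zip(cores, idx):
        out = out @ core[:, i, :]
    return out.item()

# Reconstruct the full tensor and compare entries against the slice products.
full = np.einsum('aib,bjc,ckd->aijkd', *cores).reshape(shape)
print(full.shape)
```

Every entry of `full` agrees with the corresponding product of core slices, which is exactly the elementwise TT decomposition.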

TTN employs the TT decomposition in a dense layer, as demonstrated in Figure 3(a). In more detail, for an input tensor $\mathcal{X}$ and an output tensor $\mathcal{Y}$, we have

$$\mathcal{Y}(j_1, j_2, \dots, j_K) = \sum_{i_1, \dots, i_K} \mathcal{G}_1[i_1, j_1]\,\mathcal{G}_2[i_2, j_2] \cdots \mathcal{G}_K[i_K, j_K]\; \mathcal{X}(i_1, i_2, \dots, i_K), \qquad (4)$$

where each $\mathcal{G}_k[i_k, j_k] \in \mathbb{R}^{r_{k-1} \times r_k}$, so that the product of slices results in a scalar because of the boundary ranks $r_0 = r_K = 1$; $\mathcal{G}_k[i_k, j_k]$ is closely associated with the core slices defined in Eq. (3) once each index pair is set. The multi-dimensional weight tensor is thereby decomposed into the multiplication of 4-order core tensors $\mathcal{G}_k \in \mathbb{R}^{r_{k-1} \times m_k \times n_k \times r_k}$. A non-linear activation function, e.g., Sigmoid, Tanh, or ReLU, is imposed upon the output tensor $\mathcal{Y}$. Compared with a dense layer with $\prod_{k=1}^{K} m_k n_k$ parameters, a TTN owns as few as $\sum_{k=1}^{K} r_{k-1} m_k n_k r_k$ trainable parameters.
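The parameter savings can be checked numerically. The factorization shapes below are our assumption (chosen so that 784 input features map to 8 output features); with these assumed shapes, the dense and TT parameter counts plus the 144 VQC parameters line up with the Dense-VQC and QTN-VQC totals reported in Table 1:

```python
import numpy as np

# Hypothetical mode sizes: a dense layer mapping 784 = 4 * 7 * 28 features to
# 8 = 2 * 2 * 2 features stores prod(m) * prod(n) weights, while the TT layer
# stores one core of shape (r_{k-1}, m_k, n_k, r_k) per mode.
m = (4, 7, 28)        # input factorization
n = (2, 2, 2)         # output factorization (8 qubit wires)
ranks = (1, 2, 2, 1)  # small TT-ranks, as in the paper's setting

dense_params = int(np.prod(m) * np.prod(n))
tt_params = sum(ranks[k] * m[k] * n[k] * ranks[k + 1] for k in range(len(m)))
print(dense_params, tt_params)
```

With these shapes the dense layer needs 6272 weights while the TT layer needs 184, a roughly 34x reduction.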

When a TTN is utilized for dimension reduction, the high-dimensional input vector is first reshaped into a tensor, represented in TT format, and passed through the TTN. The output of the TTN is then converted back into a tensor and reshaped into a lower-dimensional vector, where the input and output dimensions are the products of the respective tensor mode sizes. Moreover, the computational complexities of a TTN and the corresponding dense layer are of the same order, as discussed in (Yang et al., 2017).

Eq. (4) suggests that the TTN is a multi-dimensional extension of a dense layer, where the trainable weight matrix of the dense layer is replaced by learnable core tensors. Additionally, many empirical studies demonstrate that a TTN is capable of maintaining the baseline results of the dense layer (Qi et al., 2020b; Yang et al., 2017; Novikov et al., 2015; Qi et al., 2020a). More significantly, since a TTN can be flexibly mapped onto a quantum circuit, the quantumness inherent in the TTN brings advantages over other architectures such as the dense layer. In other words, although the TTN is treated classically here, an equivalent quantum circuit could be substituted for it when more qubits become available (Du et al., 2020), which implies that QTN-VQC stands for a genuinely end-to-end QNN learning architecture on a quantum computer.

Furthermore, gradient explosion and vanishing are serious issues in TTN training. To avoid these problems, we only consider low-order core tensors and small TT-ranks to configure a simple TTN in our experimental simulations. Our theoretical analysis of QTN-VQC, based on Theorem 3 in Section 5, suggests that the representation power does not depend on the TT-ranks or the tensor order, so small TT-ranks and a low tensor order are preferred. In particular, a lower order significantly reduces the computational cost and speeds up convergence.

4.2 Tensor Product Encoding

In this subsection, we first introduce Theorem 1, and then we derive our TPE associated with the circuits in Figure 3 (b).

Theorem 1.

Given a classical vector $\mathbf{x} = (x_1, \dots, x_U)^{\top}$, a TPE as shown in Figure 3(b) results in a quantum state with the following complete vector representation:

$$|\mathbf{x}\rangle = \bigotimes_{i=1}^{U} \big(\cos(x_i)\,|0\rangle + \sin(x_i)\,|1\rangle\big). \qquad (5)$$

Proof.

Since each element $x_i$ of the vector $\mathbf{x}$ prepares the single-qubit state

$$RY(2x_i)\,|0\rangle = \cos(x_i)\,|0\rangle + \sin(x_i)\,|1\rangle, \qquad (6)$$

passing the vector $\mathbf{x}$ through the quantum tensor network yields

$$|\mathbf{x}\rangle = \bigotimes_{i=1}^{U} RY(2x_i)\,|0\rangle. \qquad (7)$$

The preceding equation, in turn, implies Eq. (5). ∎

Theorem 1 builds a connection between the vector x and the corresponding quantum state, and the resulting state is taken as the quantum embedding that serves as the input to the VQC. Since the TPE is a reversible unitary linear operator, no information loss is incurred during the quantum encoding stage. Furthermore, if the input is multiplied by a constant, we obtain the following term:

(8)

which corresponds to Figure 3 (b).
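The TPE can be sketched as a statevector simulation, assuming the standard RY-based angle-encoding convention in which each feature x_i prepares the single-qubit state cos(x_i)|0⟩ + sin(x_i)|1⟩ (this sketch is ours, not the paper's implementation):

```python
import numpy as np
from functools import reduce

def ry(theta):
    """Pauli-Y rotation matrix exp(-i * theta * sigma_y / 2)."""
    return np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                     [np.sin(theta / 2),  np.cos(theta / 2)]])

def tpe(x):
    """Tensor product encoding: apply RY(2 x_i) to |0> on each wire, then kron."""
    zero = np.array([1.0, 0.0])
    return reduce(np.kron, [ry(2 * xi) @ zero for xi in x])

x = np.array([0.3, -1.2, 0.7])
state = tpe(x)

# The same state written directly as the product of (cos x_i, sin x_i) pairs.
direct = reduce(np.kron, [np.array([np.cos(xi), np.sin(xi)]) for xi in x])
print(np.allclose(state, direct))
```

The two constructions agree, and the embedding is automatically normalized since each RY rotation is unitary.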

4.3 The Framework of Variational Quantum Circuit

The framework of the VQC is shown in Figure 4(a), where 4 qubit wires are taken into account, and the CNOT gates mutually entangle the wires so that the qubits lie in a joint entangled state. Pauli-X, Y, and Z rotation gates with learnable parameters constitute the trainable part. Similarly to the $RY$ operator, the $RX$ and $RZ$ operators, defined in Figure 4(b), are associated with rotations about the X-axis and Z-axis by their given angles. The quantum circuit in the dashed square can be repeated to compose a deeper architecture. The outputs of the VQC are connected to the measurement, which projects the quantum states onto a given quantum basis, producing classical scalars.

Figure 4: A framework of variational quantum circuit.
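The layer structure above can be mimicked with a small numpy statevector simulation; the gate ordering and angles below are illustrative rather than the paper's exact circuit:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rot(pauli, theta):
    """Single-qubit rotation exp(-i * theta * P / 2) for a Pauli matrix P."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * pauli

def on_wire(gate, wire, n):
    """Lift a single-qubit gate to an n-qubit operator via tensor products."""
    return reduce(np.kron, [gate if k == wire else I2 for k in range(n)])

def cnot(control, target, n):
    """CNOT = P0 on the control (identity elsewhere) + P1 on the control with X on the target."""
    p0 = np.diag([1.0, 0.0]).astype(complex)
    p1 = np.diag([0.0, 1.0]).astype(complex)
    term0 = reduce(np.kron, [p0 if k == control else I2 for k in range(n)])
    term1 = reduce(np.kron, [p1 if k == control else (X if k == target else I2)
                             for k in range(n)])
    return term0 + term1

n = 4
rng = np.random.default_rng(7)
theta = rng.uniform(0, 2 * np.pi, size=(n, 3))  # learnable RX, RY, RZ angles

state = np.zeros(2 ** n, dtype=complex)
state[0] = 1.0                                   # |0000>
for w in range(n):                               # parametric rotations per wire
    for pauli, th in zip((X, Y, Z), theta[w]):
        state = on_wire(rot(pauli, th), w, n) @ state
for c in range(n - 1):                           # entangling CNOT chain
    state = cnot(c, c + 1, n) @ state

# Pauli-Z expectation values as the classical outputs of the layer.
expvals = [float(np.real(state.conj() @ on_wire(Z, w, n) @ state))
           for w in range(n)]
print(np.round(expvals, 3))
```

Because every gate is unitary, the state stays normalized and each Z expectation lies in [-1, 1].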

As for the end-to-end training paradigm of QTN-VQC, the learnable parameters come from the VQC and TTN models, and they are updated by applying the back-propagation algorithm with the Adam optimizer. The number of trainable VQC parameters grows with the number of qubits and the circuit depth, and adding the TTN parameters yields the total parameter count of QTN-VQC. In contrast, the Dense-VQC model possesses many more model parameters than QTN-VQC (cf. Table 1).
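As a quick sanity check on the parameter bookkeeping (the assumption of three rotation angles per qubit per layer is ours, but it reproduces the PCA-VQC parameter count in Table 1, whose only trainable parameters are those of the VQC):

```python
# 8 qubits, 6 VQC layers, and -- assuming one RX, RY, and RZ angle per qubit
# per layer -- three trainable angles per qubit per layer.
n_qubits, n_layers, angles_per_qubit = 8, 6, 3
vqc_params = angles_per_qubit * n_qubits * n_layers
print(vqc_params)
```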

5 Characterizing Representation Power of QTN-VQC

This section analyzes the representation power of QTN-VQC. As shown in Figure 5, given the qubits and a target quantum state, since the TPE is a linear operator defined as a deterministic mapping from the input x to a unitary transformation, the representation power of QTN-VQC is determined by how well the TTN can approximate the target classical vector. To understand the expressiveness of the TTN, we first discuss the expressive capability of Dense-VQC (where a dense layer performs the dimension reduction) and then generalize the result to QTN-VQC. Based on the universal approximation theorem (Cybenko, 1989; Barron, 1994) for feed-forward neural networks, we derive the following theorem:

Theorem 2.

Given a target vector, there exists a feed-forward neural network with a dense layer connecting to the qubits such that

(9)

where the activation function is imposed upon the dense layer and the constant is associated with the target vector.

Figure 5: An illustration of analyzing the representation power of QTN-VQC.

Since TTN is a compact TT representation of a dense layer, by modifying Theorem 2 for TTN, we can also derive the upper bound on the approximation error as follows:

Theorem 3.

Given a target vector, there exists a TTN with a TT layer connecting to the qubits such that

(10)

where the Sigmoid activation function is imposed upon the TTN model, the multi-dimensional order denotes the number of tensor modes, and the constant is associated with the target vector.

Comparing the two upper bounds, we observe that the TTN attains an upper bound on the approximation error identical to that of the dense layer, which implies that the TTN can at least maintain the representation power of a dense layer. Besides, the number of qubits is a key factor determining the upper bound on the approximation error, and a larger number of qubits is expected to further improve the representation power of QTN-VQC. However, the number of qubits is small and fixed on a NISQ device, and the computational cost of classical simulation grows exponentially with the number of qubits, so a small number of qubits has to be considered in practice.

6 Experiments and Results

6.1 Experimental setups

We assess our QTN-VQC based end-to-end learning system on the standard MNIST dataset for digit classification, using its separate training and test partitions. The full MNIST dataset is challenging for quantum machine learning algorithms, and many works consider only 2-digit classification on MNIST (Wang et al., 2021; Chen et al., 2020b). The images are reshaped into high-dimensional input vectors. Dense-VQC and PCA-VQC are taken as experimental baselines for comparison with our QTN-VQC model: Dense-VQC uses a dense layer for dimension reduction, and PCA-VQC uses principal component analysis (PCA) to extract low-dimensional features before training the VQC parameters.

As for the experiments with QTN-VQC, the image data are reshaped into 3-order tensors, and small TT-ranks are set to reduce the computational cost of the TTN. The image data are represented in TT format according to Eq. (3) before going through the TTN model. Since 8 qubits are used for the quantum encoding, the output of the TTN is configured as a tensor whose mode sizes multiply out to 8-dimensional output vectors. Besides, the model parameters of QTN-VQC are randomly initialized from a Gaussian distribution, and the back-propagation algorithm is applied to train the models. The Sigmoid function is utilized for the hidden layers of the TTN.

To be consistent with QTN-VQC, the weight matrix of the dense layer in Dense-VQC is configured with the matching input and output dimensions. Although Dense-VQC is a hybrid classical-quantum model, its training can also be set up as an end-to-end pipeline in which the dense-layer weights are updated during training; the Sigmoid function is used for the dense layer. On the other hand, PCA is employed to reduce the feature dimension before the resulting low-dimensional features are encoded into quantum states; consequently, only the VQC parameters of PCA-VQC are updated during training. A standard AlexNet (Iandola et al., 2016) is also employed to constitute an AlexNet-VQC for performance comparison.
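The PCA preprocessing step of the PCA-VQC baseline can be sketched as follows; the data here are random stand-ins, and projecting to 8 components (matching the 8 qubits used for encoding) is our assumption:

```python
import numpy as np

# Project high-dimensional inputs onto the top principal components before
# quantum encoding; only the VQC parameters would then be trained.
rng = np.random.default_rng(0)
data = rng.standard_normal((100, 784))   # stand-in for flattened images

centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
features = centered @ vt[:8].T           # top-8 principal components
print(features.shape)
```

Unlike the TTN or dense-layer front ends, this projection is fixed after fitting, so no embedding parameters receive gradients during training.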

Moreover, 6 VQC layers are stacked to form a deep model, and the outputs of the VQC are connected to the output classes with a non-trainable matrix. The back-propagation algorithm based on the Adam optimizer is employed for model training. The cross-entropy (CE) loss is utilized as the objective function during training and is also taken as a metric to evaluate model performance. We leverage Pennylane (Bergholm et al., 2018) and PyTorch (Paszke et al., 2019) to simulate the models. In particular, we separately simulate model performance with noiseless quantum circuits and with noisy quantum circuits corrupted by quantum noise from IBM quantum machines.

6.2 Experimental Results of Noiseless Quantum Circuit

Table 1 shows the final results of the models on the test dataset. QTN-VQC owns far fewer model parameters than Dense-VQC (328 vs. 6416) while attaining higher classification accuracy (91.43% vs. 88.54%) and a lower loss value (0.3090 vs. 0.4132). PCA-VQC, whose only trainable parameters are those of the VQC, attains the worst performance on all metrics, which implies that a trainable quantum embedding is significant for boosting performance. Although our empirical results cannot reach the state-of-the-art classification performance of classical ML algorithms, they demonstrate the advantages of QTN-VQC over the PCA-VQC and Dense-VQC counterparts. With the development of more powerful quantum devices supporting more qubits, the representation power of QTN-VQC can be improved and better experimental results could be attained. Moreover, AlexNet-VQC achieves better results than QTN-VQC (92.81% vs. 91.43%), but it involves many more model parameters.

Models        Params   CE       Acc (%)
PCA-VQC       144      0.5877   82.48 ± 1.02
Dense-VQC     6416     0.4132   88.54 ± 0.73
AlexNet-VQC   –        0.2562   92.81 ± 0.47
QTN-VQC       328      0.3090   91.43 ± 0.51
Table 1: Empirical results on the MNIST test dataset under the noiseless quantum circuit setting.

6.3 Experimental Results of Noisy Quantum Circuit

To empirically validate the effectiveness of our proposed algorithm, we proceed to simulate practical experiments with noisy quantum circuits. More specifically, we follow an established noisy-circuit experiment on a NISQ device suggested by (Chen et al., 2020a). One major advantage of this setup is that it lets us observe the robustness and preserve the quantum advantages of a deployed VQC under physical settings close to quantum processing unit (QPU) experiments, without excessive queuing time. In detail, we first use an IBM Q 20-qubit machine to collect channel noise from a real scenario for a deployed VQC and upload the machine noise into our Pennylane-Qiskit simulator (the IBM-noise accuracy in Table 2). We also provide a depolarizing noisy-circuit simulation (the depolarizing accuracy in Table 2) based on a depolarizing channel following (Nielsen and Chuang, 2002). As shown in Table 2, quantum noise degrades the performance of all models, but our proposed QTN-VQC consistently outperforms PCA-VQC and Dense-VQC under noisy quantum circuits. In particular, QTN-VQC even outperforms the AlexNet-VQC counterpart in the noisy circuit conditions.

Models        Params   Acc (IBM noise, %)   Acc (depolarizing, %)
PCA-VQC       144      81.23 ± 1.34         81.98 ± 1.17
Dense-VQC     6416     84.55 ± 1.22         86.09 ± 1.04
AlexNet-VQC   –        87.46 ± 1.34         87.86 ± 1.08
QTN-VQC       328      88.12 ± 1.09         89.32 ± 1.07
Table 2: Empirical results on the MNIST test dataset under the noisy quantum circuit setting.

6.4 Further Discussions

The above experimental results show the advantages of QTN-VQC over Dense-VQC and PCA-VQC in the scenarios with noiseless and noisy quantum circuits. Next, we will further discuss the representation power of QTN-VQC based on two factors: (1) the activation function used in TTN; (2) the number of qubits.

6.4.1 The activation function used in TTN

Table 3 compares the results of QTN-VQC with different activation functions. Our simulation on noiseless quantum circuits shows that non-linear activation functions bring more performance gain than a linear one, and the Sigmoid function attains better performance than the Tanh and ReLU counterparts in our experiments. These results are also consistent with the universal approximation analysis of QTN-VQC in Theorem 3.

Models               CE       Acc (%)
QTN-VQC (Linear)     0.4958   –
QTN-VQC (Tanh)       0.4792   –
QTN-VQC (ReLU)       0.3764   –
QTN-VQC (Sigmoid)    0.3090   91.43 ± 0.54
Table 3: Comparing performance of QTN-VQC with different activation functions.

6.4.2 The number of qubits

Finally, we investigate the effect of the number of qubits on the performance of QTN-VQC by increasing the qubit count from 8 to 12 and 16. Accordingly, the output tensor format of the TTN is reconfigured for each qubit count, and the model size grows from 328 to 464 and 600 parameters, respectively. Our experiments show that the baseline performance of QTN-VQC can be further improved by increasing the number of qubits, which implies that more qubits are likely to yield higher accuracy.

Models                Params   CE       Acc (%)
QTN-VQC (8 qubits)    328      0.3090   91.43 ± 0.51
QTN-VQC (12 qubits)   464      0.2679   92.36 ± 0.62
QTN-VQC (16 qubits)   600      0.2355   92.98 ± 0.52
Table 4: Comparing performance of QTN-VQC with more qubits.

7 Conclusions

This work proposes a genuinely end-to-end learning framework for quantum neural networks, QTN-VQC. The QTN consists of a TTN for dimension reduction and a TPE framework for generating the quantum embedding. The TTN model is a compact representation of a dense layer that can classically simulate quantum machine learning algorithms. Our theorem on the representation power of QTN-VQC shows that the number of qubits is inversely related to the approximation error and that the non-linear activation plays an important role. Our experiments compare the proposed QTN-VQC with AlexNet-VQC, Dense-VQC, and PCA-VQC. The simulated results demonstrate that QTN-VQC obtains better performance than Dense-VQC and PCA-VQC with both noiseless and noisy quantum circuits, while achieving marginally worse performance than the much larger AlexNet-VQC in the noiseless setting. Besides, our results corroborate our theorem on the representation power of QTN-VQC.

References

Appendix A Appendix

This appendix includes the proofs of Theorem 2 and Theorem 3.

a.1 Proof for Theorem 2

Proof.

Theorem 2 is derived by modifying the universal approximation theory (Cybenko, 1989; Barron, 1994). The universal approximation theory is given in Lemma 1, which states that a feed-forward neural network with sufficiently many neurons can approximate any continuous function to arbitrarily small error.

Lemma 1.

Given a continuous target function , we can employ a 2-layer neural network with a non-linear activation , such that

(11)

where denotes the number of neurons, and is a constant associated with . In particular, for , satisfies the following condition as:

(12)

where .

To associate Lemma 1 with our Theorem 2, the target function is replaced with the target vector , then there is a neural network with a dense layer connected to qubits such that

(13)

where is related to the target vector . ∎

a.2 Proof for Theorem 3

Proof.

Assume that , and the TT decomposition of target vector is , then we obtain

(14)

On the other hand, we denote and as the vectorization of the tensors and , respectively. We also define , as the TTN parameters, and also define as the matricization of . Moreover, refers to a non-linear activation function.

Since this mapping corresponds to a dense layer, we can obtain

(15)

In sum, we can further obtain

(16)

where . ∎

Appendix B Appendix

This section includes additional experimental simulations. First, we assess the settings of TT-ranks, and then we compare the convergence rates of QTN-VQC and Dense-VQC in the experiments.

b.1 Experiments on TT-ranks for QTN-VQC

Table 5 corresponds to the experiments of QTN-VQC with 8 qubits and the Sigmoid function. The empirical results suggest that larger TT-ranks do not yield better results than smaller ones. A possible reason is that the TT-ranks correspond to a manifold, and there may exist an optimal manifold with smaller TT-ranks that yields the best performance.

TT-ranks       Params   CE       Acc (%)
{1, 2, 2, 1}   328      0.3090   91.43 ± 0.51
{1, 4, 4, 1}   768      0.3082   91.46 ± 0.53
{1, 6, 6, 1}   1464     0.3079   91.47 ± 0.52
Table 5: Comparing performance of different TT-ranks for QTN-VQC.

b.2 A comparison of convergence rates

We first analyze the computational complexity of the TTN in QTN-VQC. In more detail, given the TT-ranks, a multi-dimensional tensor is factorized into several low-order core tensors, and the computational complexity of the feed-forward process scales with the products of the TT-ranks and the tensor mode sizes. In contrast, the computational overhead of a dense layer scales with the product of the input and output dimensions. Hence, smaller TT-ranks reduce the computational cost of QTN-VQC, which explains why smaller TT-ranks are configured in our experiments.

Figure 6: A comparison of convergence rates for different models.

Empirically, we compare the convergence rates of different models on the test data. In our experimental settings with the Tanh activation function, the QTN-VQC model consistently attains a faster convergence rate than the Dense-VQC and PCA-VQC counterparts, as shown in Figure 6. Moreover, Table 6 compares the absolute running time of QTN-VQC with Dense-VQC and AlexNet-VQC. Since our experiments are conducted on the same GPUs and CPUs, the training times of all models are directly comparable. Our evaluation shows that QTN-VQC is marginally slower than Dense-VQC but much faster than AlexNet-VQC.

Models              Dense-VQC   AlexNet-VQC   QTN-VQC
Time/epoch (mins)   58          75            61
Table 6: Comparing the training time per epoch of different models.

Appendix C Experiments of Labeled Faces in the Wild (LFW)

c.1 Experimental setups

The LFW dataset addresses the task of unconstrained face recognition and is composed of face images. We randomly split the dataset into training and test partitions, and the qubit count and input tensor shape are configured for the larger images. The other settings are kept the same as the configurations for the MNIST task.

c.2 Experimental results

Table 7 presents the simulation results under the noiseless quantum circuit condition, while Table 8 shows the empirical results in the setting of noisy quantum circuits. QTN-VQC outperforms the Dense-VQC counterpart (92.15% vs. 91.27% accuracy in the noiseless setting) while owning far fewer model parameters. The experimental results on the LFW dataset also highlight the advantages of QTN-VQC in terms of fewer model parameters and better empirical performance.

Models        Params   CE       Acc (%)
Dense-VQC     –        0.3011   91.27 ± 0.25
AlexNet-VQC   –        0.2875   93.21 ± 0.36
QTN-VQC       2816     0.2910   92.15 ± 0.43
Table 7: Simulation results on the LFW test dataset under a noiseless circuit condition.
Models        Params   Acc (IBM noise, %)   Acc (depolarizing, %)
Dense-VQC     –        88.65 ± 1.22         87.23 ± 1.04
AlexNet-VQC   –        89.76 ± 1.34         88.66 ± 1.08
QTN-VQC       2816     89.93 ± 1.09         89.64 ± 1.07
Table 8: Empirical results on the LFW test dataset under the noisy quantum circuit setting.