1 Introduction
State-of-the-art machine learning (ML), particularly based on deep neural networks (DNNs), has enabled a wide spectrum of successful applications, ranging from the everyday deployment of speech recognition (Deng et al., 2013) and computer vision (Sermanet et al., 2014) to the frontier of scientific research in synthetic biology (Jumper et al., 2021). Despite rapid theoretical and empirical progress in DNN-based regression and classification (Goodfellow et al., 2016), DNN training algorithms are computationally expensive for many new scientific applications, such as drug discovery (Smalley, 2017), which requires computational resources beyond the limits of classical hardware (Freedman, 2019). Fortunately, the imminent advent of quantum computing devices opens up new possibilities for exploiting quantum machine learning (QML) (Biamonte et al., 2017; Schuld et al., 2015; Schuld and Petruccione, 2018; Schuld and Killoran, 2019; Saggio et al., 2021; Dunjko, 2021) to improve the computational efficiency of ML algorithms in these new scientific domains. Although the exploitation of quantum computing devices to carry out QML is still in its initial exploratory stage, the rapid development of quantum hardware has motivated advances in quantum neural networks (QNNs) that run on noisy intermediate-scale quantum (NISQ) devices (Preskill, 2018; Huggins et al., 2019; Huang et al., 2021; Kandala et al., 2017). On a NISQ device, not enough qubits can be spared for quantum error correction, so the imperfect qubits have to be used directly at the physical layer. Even so, a pragmatic QNN approach has been proposed that employs hybrid quantum-classical models relying on the optimization of variational quantum circuits (VQCs) (Benedetti et al., 2019; Mitarai et al., 2018). The resilience of VQC-based models to certain types of quantum noise and their high flexibility concerning coherence time and gate requirements (McClean et al., 2018) admit many practical implementations of QNNs on NISQ devices (Chen et al., 2020a; Yang et al., 2021; Du et al., 2020, 2021; Skolik et al., 2021; Dunjko et al., 2016; Jerbi et al., 2021; Ostaszewski et al., 2021). One notable limitation of the current QNN training pipeline is that the quantum embedding is not fully realizable on a quantum computer, which may impede the learning of the QNN. Hence, this work proposes QTN-VQC to enable an end-to-end trainable QNN, from data embedding to quantum measurement, that is easily realizable on quantum devices, where QTN stands for quantum tensor network (Orús, 2019; Huckle et al., 2013; Biamonte et al., 2017; Murg et al., 2010) and is used to generate the quantum embedding.

As shown in Figure 1, our QNN builds a unitary linear operator that consists of three main components: (1) quantum embedding generation; (2) a variational quantum circuit; (3) measurement. Quantum embedding generation, also known as quantum encoding, applies a fixed unitary linear operator
transforming classical vectors
x to quantum states in a Hilbert space. This step is an important aspect of designing quantum algorithms: it directly impacts the overall computational cost of the VQC and exhibits the characteristic of quantum superposition. The VQC comprises two types of quantum gates: (1) Controlled-NOT (CNOT) gates; (2) learnable parametric quantum gates. The CNOT gates ensure the property of quantum entanglement by mutually connecting the qubits, and the parametric quantum gates can be adjusted to best fit the quantum input states. The model parameters of the VQC are optimized with variants of gradient descent during the training process. The parametric quantum gates of the VQC are analogous to the weights of a DNN, and such quantum circuits have been shown to be resilient to quantum noise (Farhi et al., 2014; Kandala et al., 2017; McClean et al., 2016). Finally, the measurement projects the quantum output states onto a classical output.

This work focuses on quantum embedding generation because it is closely related to practical machine learning applications in terms of computational cost and the representation capability of classical input features. In particular, we design a novel quantum tensor network (QTN) for quantum embedding generation. More specifically, the QTN consists of a tensor-train network (TTN) for dimension reduction and a tensor product encoding framework for outputting quantum embeddings. Dimension reduction is a necessary procedure before quantum encoding because only a small number of qubits can be supported on currently available NISQ computers. A typical approach to dimension reduction relies on a classical fully connected layer, also known as a dense layer, to convert high-dimensional input vectors
y into low-dimensional ones x. However, since a dense layer cannot be physically mapped onto a quantum computer, considerable overhead is incurred by frequent communication between classical and quantum devices during the end-to-end training pipeline.

As shown in Figure 2 (b), one of our contributions is to leverage a tensor-train network (TTN) to replace the dense layer in Figure 2 (a). The benefits of applying TTN arise from two aspects: (1) TTN can maintain the representation power of the dense layer, which is justified by our theorems; (2) TTN is a tensor network and can be flexibly placed on quantum computers, which enables an end-to-end training process fully conducted on a quantum computer. Moreover, in this work, a tensor product encoding (TPE) is carefully designed for generating the quantum embedding, which builds the relationship between a classical vector x and the corresponding quantum state
. Besides, we further investigate the representation power of QTN-VQC in terms of model size and the nonlinear activation function used in TTN. We denote by QTN the combination of TTN and TPE, and utilize QTN-VQC as a genuine end-to-end learning framework for QNNs.
2 Related Work
The works (Schuld and Petruccione, 2018; Biamonte et al., 2017; Dunjko and Briegel, 2018) demonstrate that VQC shows great promise in surpassing the performance of classical ML. Prominent examples of VQC-based models include the quantum approximate optimization algorithm (QAOA) (Farhi et al., 2014) and quantum circuit learning (QCL) (Mitarai et al., 2018). Various architectures and geometries of VQC have been demonstrated in tasks ranging from image classification (Henderson et al., 2020; Chen et al., 2020b; Kerenidis et al., 2020) to deep reinforcement learning (Chen et al., 2020a).

As for quantum embedding, basis encoding is the process of associating classical input data in the form of binary strings with the computational basis states of a quantum system (Leymann and Barzen, 2020). Similarly, amplitude encoding is a technique that encodes data into the amplitudes of a quantum state (Soklakov and Schack, 2006). Unfortunately, the computational cost of both basis and amplitude encoding becomes exponentially expensive with an increasing number of qubits (Schuld and Killoran, 2019). A newer technique, angle embedding, makes use of quantum gates to generate quantum states (Fu et al., 2011), but it cannot deal with high-dimensional feature inputs. Therefore, this work exploits a TTN for dimension reduction followed by a TPE for generating the quantum embedding.
In particular, this work employs the TTN for dimensionality reduction. The TTN builds on the TT decomposition, first proposed in (Oseledets, 2011), and it can be flexibly extended to the convolutional neural network (CNN) (Garipov et al., 2016) and the recurrent neural network (RNN) (Tjandra et al., 2017). Empirical studies of TTN on machine learning tasks show that TTN is capable of maintaining the DNN baseline results (Qi et al., 2020a; Yu et al., 2017; Yang et al., 2017; Jin et al., 2020). However, to the best of our knowledge, no existing works have applied TTN to QML. Besides, since tensor network-based machine learning models like TTN are closely related to quantum machine learning in terms of their model structures (Liu et al., 2018; Gao et al., 2017), the QTN-VQC model can be directly regarded as a classical simulation of the corresponding quantum machine learning. In addition to a classical dense layer, more complicated architectures like AlexNet (Lloyd et al., 2020) can be used for dimension reduction, and we also compare the performance between TTN- and AlexNet-based models.

3 Notations
We first denote the real coordinate space of a given dimension and the corresponding space of multi-way tensors of a given order; dedicated symbols represent a multidimensional tensor, a vector, and a matrix, respectively.
For quantum computing notation, the symbol $|x\rangle$ denotes a quantum state associated with a vector in a Hilbert space. In particular, $|0\rangle = [1, 0]^{\top}$ and $|1\rangle = [0, 1]^{\top}$.
The quantum gate $R_Y(\theta)$ denotes a Pauli rotation gate with the unitary operator defined in Eq. (1), which rotates a qubit on the Bloch sphere along the Y-axis by a given angle $\theta$:

$$R_Y(\theta) = \begin{bmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{bmatrix}. \quad (1)$$
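As a quick numerical check of the rotation gate above, the following minimal sketch (our illustration, not part of the paper's codebase) builds $R_Y(\theta)$, applies it to $|0\rangle$, and verifies unitarity:

```python
import math

def ry(theta):
    """Pauli-Y rotation gate R_Y(theta) as a 2x2 matrix (list of rows)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s],
            [s,  c]]

def matvec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

# Applying R_Y(theta) to |0> = [1, 0] yields [cos(theta/2), sin(theta/2)].
theta = math.pi / 3
state = matvec(ry(theta), [1.0, 0.0])
assert abs(state[0] - math.cos(theta / 2)) < 1e-12
assert abs(state[1] - math.sin(theta / 2)) < 1e-12

# R_Y is unitary: its columns are orthonormal, so states stay normalized.
m = ry(theta)
assert abs(m[0][0] ** 2 + m[1][0] ** 2 - 1.0) < 1e-12
```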
Moreover, the operator $\otimes$ denotes the tensor product. Given a collection of vectors, their tensor product is the Kronecker product, whose dimension is the product of the constituent dimensions and which provides a compact representation of the collection. Similarly, $\otimes$ between kets denotes the tensor product of quantum states. Furthermore, for a scalar $x$, the quantum state $|x\rangle$ can be written as:

(2)
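The tensor-product notation can be made concrete with a small sketch (our illustration; the vectors are hypothetical): the Kronecker product of $U$ two-dimensional unit vectors is a $2^U$-dimensional unit vector.

```python
import math

def kron(a, b):
    """Tensor (Kronecker) product of two vectors."""
    return [x * y for x in a for y in b]

# Three 2-dimensional unit vectors, e.g. single-qubit states.
v1 = [math.cos(0.3), math.sin(0.3)]
v2 = [math.cos(0.7), math.sin(0.7)]
v3 = [math.cos(1.1), math.sin(1.1)]

out = kron(kron(v1, v2), v3)
# The dimension is the product of the constituent dimensions: 2 * 2 * 2.
assert len(out) == 2 ** 3
# The tensor product of unit vectors is again a unit vector.
assert abs(sum(x * x for x in out) - 1.0) < 1e-12
```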
4 QTNVQC: Our Proposed EndtoEnd Learning Framework
This section introduces our proposed end-to-end learning framework, namely QTN-VQC. As shown in Figure 3, the QTN model includes two components, (a) TTN and (b) TPE, which are introduced in Section 4.1 and Section 4.2, respectively. Moreover, Figure 4 illustrates the framework of VQC, and Section 4.3 discusses the details of VQC.
4.1 Tensor Train Network for Dimension Reduction
We leverage the TTN (Novikov et al., 2015) for the dimension reduction of input features. The TTN relies on the TT decomposition (Oseledets, 2011) and has been commonly employed in machine learning tasks such as speech processing (Qi et al., 2020b) and computer vision (Yang et al., 2017). The TT decomposition assumes that, given a set of TT-ranks, a high-order tensor is factorized into the multiplication of 3-order core tensors. In more detail, given a set of indices, the tensor is decomposed as:
(3) 
where the boundary TT-ranks are fixed to 1; hence the chain of matrix products collapses to a scalar value.
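The entry-wise TT definition can be sketched numerically. The configuration below is illustrative (not the paper's): a 3-order tensor in TT format with 3-order cores of shape $(r_{k-1}, n_k, r_k)$ and boundary ranks fixed to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative TT format: a 3-order tensor of shape (4, 5, 6) represented by
# cores G_k of shape (r_{k-1}, n_k, r_k), with boundary TT-ranks r_0 = r_3 = 1.
ranks = [1, 2, 3, 1]
shape = [4, 5, 6]
cores = [rng.standard_normal((ranks[k], shape[k], ranks[k + 1]))
         for k in range(3)]

def tt_entry(cores, idx):
    """Eq. (3): one tensor entry is a chain of matrix products over core slices."""
    m = np.eye(1)
    for core, i in zip(cores, idx):
        m = m @ core[:, i, :]
    return m[0, 0]  # r_0 = r_K = 1, so the chain collapses to a scalar

# Reconstructing the full tensor agrees with the entry-wise TT definition.
full = np.einsum('aib,bjc,ckd->aijkd', *cores)[0, ..., 0]
assert full.shape == (4, 5, 6)
assert abs(full[1, 2, 3] - tt_entry(cores, (1, 2, 3))) < 1e-10

# TT storage: the sum of core sizes, far below the 4*5*6 entries of the tensor.
tt_params = sum(c.size for c in cores)
assert tt_params < 4 * 5 * 6
```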
TTN applies the TT decomposition to a dense layer, as explicitly demonstrated in Figure 3 (a). In more detail, for an input tensor and an output tensor, we obtain
(4) 
where the product of core matrices results in a scalar because of the boundary ranks; each output element is closely associated with the TT format defined in Eq. (3) once the indices are set. The multidimensional weight tensor is decomposed into the multiplication of 3-order core tensors. A nonlinear activation function, e.g., Sigmoid, Tanh, or ReLU, is imposed upon the output tensor. Compared with a dense layer, a TTN owns far fewer trainable parameters.

When a TTN is utilized for dimension reduction, the high-dimensional input vector is first reshaped into a tensor and represented in the TT format that goes through the TTN. The output of the TTN is then converted back to a tensor, which is further reshaped into a lower-dimensional vector. Moreover, the computational complexities of a TTN and the related dense layer are of the same order, as discussed in (Yang et al., 2017).
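The parameter saving can be illustrated with a short sketch. The factorization shapes and TT-ranks below are our assumptions for illustration, not necessarily the paper's exact configuration: a 784-dimensional input factorized as 4 x 7 x 28 and an 8-dimensional output factorized as 2 x 2 x 2.

```python
# Parameter counting: dense layer vs. TTN layer (illustrative shapes).
in_shape = [4, 7, 28]    # reshaping a 784-dim input vector into a 3-order tensor
out_shape = [2, 2, 2]    # reshaped back into an 8-dim output vector
ranks = [1, 2, 2, 1]     # small TT-ranks, with boundary ranks fixed to 1

# A dense layer needs prod(in_shape) * prod(out_shape) weights.
dense_params = 784 * 8

# Each TTN core has shape (r_{k-1}, in_k * out_k, r_k).
ttn_params = sum(ranks[k] * in_shape[k] * out_shape[k] * ranks[k + 1]
                 for k in range(3))

print(dense_params, ttn_params)  # 6272 184
assert ttn_params < dense_params
```

Even with these tiny ranks, the TTN replaces thousands of dense weights with a few hundred core entries while keeping the same input/output dimensions.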
Eq. (4) suggests that TTN is a multidimensional extension of a dense layer, where the trainable weight matrix of the dense layer is replaced by the learnable core tensors. Additionally, many empirical studies demonstrate that a TTN is capable of maintaining the baseline results of the dense layer (Qi et al., 2020b; Yang et al., 2017; Novikov et al., 2015; Qi et al., 2020a). More significantly, since a TTN can be flexibly mapped onto a quantum circuit, its tensor-network structure brings great advantages over other architectures like the dense layer. In other words, although the TTN is treated classically here, it is possible to substitute equivalent quantum circuits for the TTN when more qubits become available (Du et al., 2020), which implies that QTN-VQC stands for a genuine end-to-end QNN learning architecture on a quantum computer.
Furthermore, gradient exploding and vanishing are serious issues in TTN training. To avoid these problems, we only consider 3-order core tensors and small TT-ranks to configure a simple TTN in our experimental simulations. Our theoretical analysis of QTN-VQC, based on Theorem 3 in Section 5, suggests that the representation power is not related to the TT-ranks or the tensor order; thus, small TT-ranks and a low tensor order are preferred. In particular, a lower tensor order can significantly reduce the computational cost and speed up the convergence rate.
4.2 Tensor Product Encoding
In this subsection, we first introduce Theorem 1, and then we derive our TPE associated with the circuits in Figure 3 (b).
Theorem 1.
Given a classical vector x, the TPE shown in Figure 3 (b) results in a quantum state with the following complete vector representation:
(5) 
Proof.
Since each element of the vector x can be encoded as a single-qubit state, the quantum state can be written as:
(6) 
When the vector x goes through the quantum tensor network, we obtain:
(7) 
The preceding equation, in turn, implies Eq. (5). ∎
Theorem 1 builds a connection between the vector x and the corresponding quantum state, and the resulting state is taken as the quantum embedding that serves as the input to the VQC. Since the TPE is a reversible unitary linear operator, no information is lost during the stage of quantum encoding. Furthermore, if the input is multiplied by a constant, we obtain the following:
(8) 
which corresponds to Figure 3 (b).
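A minimal numerical sketch of the TPE idea, under the assumption that each component x_i is mapped to cos(x_i)|0> + sin(x_i)|1> by an R_Y-style rotation (the exact angle scaling in the paper follows Eq. (5); the input values here are hypothetical):

```python
import math
from functools import reduce

def kron(a, b):
    """Tensor (Kronecker) product of two vectors."""
    return [p * q for p in a for q in b]

def tpe(x):
    """Tensor product encoding: map a U-dim classical vector to a 2^U-dim state."""
    single = [[math.cos(xi), math.sin(xi)] for xi in x]  # one qubit per component
    return reduce(kron, single)

x = [0.2, 0.5, 0.9, 1.3]
state = tpe(x)
# U classical components become a 2^U-dimensional quantum embedding.
assert len(state) == 2 ** len(x)
# The per-qubit encoding is unitary, so the embedding is a unit-norm state:
# no information is lost in the encoding stage.
assert abs(sum(a * a for a in state) - 1.0) < 1e-12
```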
4.3 The Framework of Variational Quantum Circuit
The framework of the VQC is shown in Figure 4 (a), where 4 qubit wires are taken into account, and the CNOT gates aim at mutually entangling the qubit channels such that they lie in the same entangled state. The Pauli rotation gates $R_X(\alpha_i)$, $R_Y(\beta_i)$, and $R_Z(\gamma_i)$ with learnable parameters $\alpha_i$, $\beta_i$, and $\gamma_i$ constitute the learnable part. Similar to $R_Y$, the unitary operators $R_X$ and $R_Z$, defined in Figure 4 (b), are associated with rotations along the X-axis and Z-axis by given angles, respectively. Besides, the quantum circuit in the dashed square can be repeatedly copied to compose a deeper architecture. The outputs of the VQC are connected to the measurement, which projects the quantum states onto a certain measurement basis, yielding classical scalar outputs.
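To make the circuit structure concrete, the toy sketch below (our illustration, reduced to 2 qubits rather than the paper's 4-qubit circuit) simulates one VQC layer as a CNOT for entanglement followed by learnable Y-rotations, with a Pauli-Z measurement producing a classical scalar:

```python
import numpy as np

def ry(theta):
    """Single-qubit Y-rotation gate."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

# CNOT entangles the two qubit wires.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def vqc_layer(alpha, beta):
    """One layer: entangling CNOT, then learnable rotations on each wire."""
    return np.kron(ry(alpha), ry(beta)) @ CNOT

U = vqc_layer(0.4, 1.1)
# The layer is unitary, so it maps quantum states to quantum states.
assert np.allclose(U.T @ U, np.eye(4))

# Measurement: the Pauli-Z expectation on qubit 0 yields a classical scalar.
Z0 = np.kron(np.diag([1.0, -1.0]), np.eye(2))
state = U @ np.array([1.0, 0.0, 0.0, 0.0])  # start from |00>
z_expect = state @ Z0 @ state
assert -1.0 <= z_expect <= 1.0
```

Stacking several such layers (copying the dashed-square block) simply multiplies the corresponding unitaries.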
As for the end-to-end training paradigm of QTN-VQC, the learnable parameters come from the VQC and TTN models, and they are updated by applying the backpropagation algorithm with the Adam optimizer. Given the number of qubits and the circuit depth, the VQC owns three trainable parameters per qubit per layer, and the total parameter count of QTN-VQC adds the TTN parameters on top of these. In contrast, the Dense-VQC model possesses many more model parameters than QTN-VQC.
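In simulation, backpropagation with Adam updates all parameters; on real hardware, the gradient of each rotation angle can alternatively be estimated with the standard parameter-shift rule. A minimal sketch for a single R_Y rotation measured in Pauli-Z (a deliberate simplification of the full circuit):

```python
import math

# For a single R_Y rotation measured in Pauli-Z:
# f(theta) = <0| R_Y(theta)^T Z R_Y(theta) |0> = cos(theta).
def f(theta):
    return math.cos(theta)

def parameter_shift_grad(theta):
    """Exact gradient from two circuit evaluations shifted by +/- pi/2."""
    return 0.5 * (f(theta + math.pi / 2) - f(theta - math.pi / 2))

# The parameter-shift estimate matches the analytic gradient -sin(theta).
theta = 0.8
assert abs(parameter_shift_grad(theta) - (-math.sin(theta))) < 1e-12
```

Because the shift rule only needs forward circuit evaluations, it composes naturally with classical optimizers such as Adam.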
5 Characterizing Representation Power of QTNVQC
This section focuses on analyzing the representation power of QTN-VQC. As shown in Figure 5, given a number of qubits and a target quantum state, since the TPE is a linear operator that defines a deterministic mapping from the input x to a unitary transformation, the representation power of QTN-VQC is determined by how well the TTN can approximate the target classical vector. To understand the expressiveness of TTN, we first discuss the expressive capability of Dense-VQC (where a dense layer is used for dimension reduction) and then generalize it to QTN-VQC. Based on the universal approximation theorem (Cybenko, 1989; Barron, 1994) for a feed-forward neural network, we derive the following theorem:
Theorem 2.
Given a target vector, there exists a feed-forward neural network with a dense layer connecting to the qubits such that
(9) 
where the activation function is imposed upon the dense layer, and the constant is associated with the target vector.
Since TTN is a compact TT representation of a dense layer, by adapting Theorem 2 to TTN, we can also derive an upper bound on the approximation error as follows:
Theorem 3.
Given a target vector, there exists a TTN with a TT layer connecting to the qubits such that
(10) 
where the Sigmoid activation function is imposed upon the TTN model, the tensor order enters the bound, and the constant is associated with the target vector.
Comparing the two upper bounds, we observe that TTN attains an upper bound on the approximation error identical to that of the dense layer, which implies that TTN can at least maintain the representation power of a dense layer. Besides, the number of qubits is a key factor determining the upper bound on the approximation error: a larger number of qubits is expected to further improve the representation power of QTN-VQC, but the qubit count is a small fixed number on a NISQ device. Moreover, the computational cost of classical simulation may grow exponentially with an increasing number of qubits, so a small number of qubits has to be considered in practice.
6 Experiments and Results
6.1 Experimental setups
We assess our QTN-VQC based end-to-end learning system on the standard MNIST dataset for digit classification, where 60,000 and 10,000 images are assigned for training and testing, respectively. The full MNIST dataset is challenging for quantum machine learning algorithms, and many works only consider 2-digit classification on MNIST (Wang et al., 2021; Chen et al., 2020b). Moreover, the images are separately reshaped into 784-dimensional input vectors. Dense-VQC and PCA-VQC are taken as our experimental baselines against the QTN-VQC model. Dense-VQC denotes that a dense layer is used for dimension reduction, and PCA-VQC refers to using principal component analysis (PCA) to extract low-dimensional features before training the VQC parameters.
As for the experiments of QTN-VQC, the image data are reshaped into 3-order tensors. We set small TT-ranks to reduce the computational cost of TTN. The image data are represented in the TT format according to Eq. (3) before going through the TTN model. Since 8 qubits are used for the quantum encoding, the output of TTN is configured as a tensor format that results in 8-dimensional output vectors. Besides, the model parameters of QTN-VQC are randomly initialized based on a Gaussian distribution, and the backpropagation algorithm is applied to train the models. The Sigmoid function is utilized for the hidden layers of TTN.
To be consistent with QTN-VQC, the weight of the dense layer for Dense-VQC is configured with the shape 784 × 8. Although Dense-VQC is a hybrid classical-quantum model, its training process can also be set up as an end-to-end pipeline in which the weights of the dense layer are updated during the training stage. The Sigmoid function is used for the dense layer. On the other hand, PCA is employed to reduce the feature dimension to 8, and the resulting low-dimensional features are further encoded into quantum states. Consequently, PCA-VQC only admits the VQC parameters to be updated during the training stage. A standard AlexNet (Iandola et al., 2016) is also employed to constitute an AlexNet-VQC baseline for performance comparison.
Moreover, 6 VQC layers are stacked to form a deep model, and the outputs of the VQC model are connected to the output classes with a non-trainable matrix. The backpropagation algorithm based on the Adam optimizer is employed for model training. The cross-entropy (CE) loss is utilized as the objective function during the training stage, and it is also taken as a metric to evaluate model performance. We leverage the tools of Pennylane (Bergholm et al., 2018) and PyTorch (Paszke et al., 2019) to simulate the model performance. In particular, we separately simulate the model performance with noiseless quantum circuits and with noisy quantum circuits corrupted by quantum noise from IBM quantum machines.

6.2 Experimental Results of Noiseless Quantum Circuits
Table 1 shows the final results of the models on the test dataset. QTN-VQC owns much fewer model parameters than Dense-VQC (328 vs. 6416) and attains even higher classification accuracy (91.43% vs. 88.54%) and a lower loss value (0.3090 vs. 0.4132). However, PCA-VQC, with 144 trainable VQC parameters, attains the worst performance on all metrics, which implies that a trainable quantum embedding is significant for boosting experimental performance. Although our empirical results cannot reach the state-of-the-art classification performance of classical ML algorithms, they demonstrate the advantages of QTN-VQC over the PCA-VQC and Dense-VQC counterparts. With the development of more powerful quantum devices supporting more qubits, the representation power of QTN-VQC can be improved and better experimental results can be attained. Moreover, AlexNet-VQC achieves better results than QTN-VQC (92.81% vs. 91.43%), but it involves far more model parameters than QTN-VQC.
Models  Params  CE  Acc (%)

PCA-VQC  144  0.5877  82.48 ± 1.02
Dense-VQC  6416  0.4132  88.54 ± 0.73
AlexNet-VQC  –  0.2562  92.81 ± 0.47
QTN-VQC  328  0.3090  91.43 ± 0.51
6.3 Experimental Results of Noisy Quantum Circuit
To empirically validate the effectiveness of our proposed QTN-VQC, we proceed with the simulation of practical experiments with noisy quantum circuits. More specifically, we follow an established noisy-circuit experiment on a NISQ device suggested by (Chen et al., 2020a). One major advantage of this setup is that it allows us to observe the robustness and preserve the quantum advantages of a deployed VQC under physical settings close to quantum processing unit (QPU) experiments, without an excessive queuing time. As for the detailed setup, we first use an IBM Q 20-qubit machine to collect channel noise in a real scenario for a deployed VQC and upload the machine noise into our Pennylane-Qiskit simulator (the first Acc column in Table 2). We also provide a depolarizing noisy-circuit simulation (the second Acc column in Table 2) based on a depolarizing channel (Nielsen and Chuang, 2002) with a fixed noise level. As shown in Table 2, quantum noise degrades the performance of all models, but our proposed QTN-VQC consistently outperforms PCA-VQC and Dense-VQC under noisy quantum circuits. In particular, QTN-VQC even outperforms the AlexNet-VQC counterpart in noisy circuit conditions.
Models  Params  Acc (machine noise, %)  Acc (depolarizing, %)

PCA-VQC  144  81.23 ± 1.34  81.98 ± 1.17
Dense-VQC  6416  84.55 ± 1.22  86.09 ± 1.04
AlexNet-VQC  –  87.46 ± 1.34  87.86 ± 1.08
QTN-VQC  328  88.12 ± 1.09  89.32 ± 1.07
6.4 Further Discussions
The above experimental results show the advantages of QTN-VQC over Dense-VQC and PCA-VQC in scenarios with noiseless and noisy quantum circuits. Next, we further discuss the representation power of QTN-VQC with respect to two factors: (1) the activation function used in TTN; (2) the number of qubits.
6.4.1 The activation function used in TTN
Table 3 compares the results of QTN-VQC with different activation functions. Our simulation on noiseless quantum circuits shows that nonlinear activation functions bring more performance gain than a linear one, and the Sigmoid function attains better performance than the Tanh and ReLU counterparts in our experiments. These results also corroborate the universal approximation theory for QTN-VQC in Theorem 3.
Models  CE  Acc (%)
QTN-VQC (Linear)  0.4958  
QTN-VQC (Tanh)  0.4792  
QTN-VQC (ReLU)  0.3764  
QTN-VQC (Sigmoid)  0.3090  91.43 ± 0.54
6.4.2 The number of qubits
Finally, we investigate the effect of the number of qubits on the performance of QTN-VQC by increasing the qubits from 8 to 12 and 16. Accordingly, the output of TTN is configured as a tensor format matching the number of qubits, and the model size increases from 328 to 464 and 600 parameters, respectively. Our experiments show that the baseline performance of QTN-VQC can be further improved by increasing the number of qubits, which implies that more qubits lead to higher accuracy.
Models  Params  CE  Acc (%)

QTN-VQC (8 qubits)  328  0.3090  91.43 ± 0.51
QTN-VQC (12 qubits)  464  0.2679  92.36 ± 0.62
QTN-VQC (16 qubits)  600  0.2355  92.98 ± 0.52
7 Conclusions
This work proposes QTN-VQC, a genuine end-to-end learning framework for quantum neural networks. The QTN consists of a TTN for dimension reduction and a TPE framework for generating the quantum embedding. The TTN model is a compact representation of a dense layer that allows quantum machine learning algorithms to be simulated classically. Our theorem on the representation power of QTN-VQC shows that the number of qubits is inversely related to the approximation error and that the nonlinear activation plays an important role. Our experiments compare QTN-VQC with AlexNet-VQC, Dense-VQC, and PCA-VQC. The simulated results demonstrate that QTN-VQC obtains better experimental performance than Dense-VQC and PCA-VQC with both noiseless and noisy quantum circuits, and it achieves only marginally worse performance than AlexNet-VQC in the noiseless setting. Besides, our results justify our theorem on the representation power of QTN-VQC.
References

Deng et al. (2013) Li Deng, Jinyu Li, Jui-Ting Huang, Kaisheng Yao, Dong Yu, Frank Seide, Michael Seltzer, Geoff Zweig, Xiaodong He, Jason Williams, et al. Recent Advances in Deep Learning for Speech Research at Microsoft. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 8604–8608, 2013.
 Sermanet et al. (2014) Pierre Sermanet, David Eigen, Xiang Zhang, Michaël Mathieu, Rob Fergus, and Yann LeCun. OverFeat: Integrated Recognition, Localization and Detection Using Convolutional Networks. In International Conference on Learning Representations, 2014.
 Lloyd et al. (2020) Seth Lloyd, Maria Schuld, Aroosa Ijaz, Josh Izaac, and Nathan Killoran. Quantum Embeddings for Machine Learning. arXiv preprint arXiv:2001.03622, 2020.
 Iandola et al. (2016) Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size. arXiv preprint arXiv:1602.07360, 2016.
 Liu et al. (2018) Jin-Guo Liu and Lei Wang. Differentiable Learning of Quantum Circuit Born Machines. Physical Review A, 98(6):062324, 2018.
 Gao et al. (2017) Xun Gao, Zhengyu Zhang, and Luming Duan. An Efficient Quantum Algorithm for Generative Machine Learning. arXiv preprint arXiv:1711.02038, 2017.
 Wang et al. (2021) Hanrui Wang, Yongshan Ding, Jiaqi Gu, Yujun Lin, David Z Pan, Frederic T Chong, and Song Han. QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits. arXiv preprint arXiv:2107.10845, 2021.
 Jumper et al. (2021) John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. Highly Accurate Protein Structure Prediction with AlphaFold. Nature, 596(7873):583–589, 2021.
 Goodfellow et al. (2016) Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio. Deep Learning, volume 1. MIT Press, 2016.
 Smalley (2017) Eric Smalley. AIPowered Drug Discovery Captures Pharma Interest. Nature biotechnology, 35(7):604–606, 2017.
 Freedman (2019) David H Freedman. Hunting for New Drugs with AI. Nature, 576(7787):S49–S53, 2019.
 Biamonte et al. (2017) Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum Machine Learning. Nature, 549(7671):195–202, 2017.
 Schuld et al. (2015) Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. An introduction to quantum machine learning. Contemporary Physics, 56(2):172–185, 2015.
 Schuld and Petruccione (2018) Maria Schuld and Francesco Petruccione. Supervised Learning with Quantum Computers, volume 17. Springer, 2018.
 Schuld and Killoran (2019) Maria Schuld and Nathan Killoran. Quantum Machine Learning in Feature Hilbert Spaces. Physical review letters, 122(4):040504, 2019.
 Saggio et al. (2021) Valeria Saggio, Beate E Asenbeck, Arne Hamann, Teodor Strömberg, Peter Schiansky, Vedran Dunjko, Nicolai Friis, Nicholas C Harris, Michael Hochberg, Dirk Englund, et al. Quantum speedups in reinforcement learning. In Quantum Nanophotonic Materials, Devices, and Systems 2021, volume 11806, page 118060N. International Society for Optics and Photonics, 2021.
 Dunjko (2021) Vedran Dunjko. Inside quantum black boxes. Nature Physics, pages 1–2, 2021.
 Preskill (2018) John Preskill. Quantum Computing in the NISQ era and beyond. Quantum, 2:79, August 2018. ISSN 2521-327X.
 Huggins et al. (2019) William Huggins, Piyush Patil, Bradley Mitchell, K Birgitta Whaley, and E Miles Stoudenmire. Towards quantum machine learning with tensor networks. Quantum Science and technology, 4(2):024001, 2019.
 Huang et al. (2021) Hsin-Yuan Huang, Michael Broughton, Masoud Mohseni, Ryan Babbush, Sergio Boixo, Hartmut Neven, and Jarrod R McClean. Power of data in quantum machine learning. Nature Communications, 12(1):1–9, 2021.
 Kandala et al. (2017) Abhinav Kandala, Antonio Mezzacapo, Kristan Temme, Maika Takita, Markus Brink, Jerry M Chow, and Jay M Gambetta. Hardware-Efficient Variational Quantum Eigensolver for Small Molecules and Quantum Magnets. Nature, 549(7671):242–246, 2017.
 Benedetti et al. (2019) Marcello Benedetti, Erika Lloyd, Stefan Sack, and Mattia Fiorentini. Parameterized Quantum Circuits as Machine Learning Models. Quantum Science and Technology, 4(4):043001, 2019.
 Mitarai et al. (2018) Kosuke Mitarai, Makoto Negoro, Masahiro Kitagawa, and Keisuke Fujii. Quantum Circuit Learning. Physical Review A, 98(3):032309, 2018.
 McClean et al. (2018) Jarrod R McClean, Sergio Boixo, Vadim N Smelyanskiy, Ryan Babbush, and Hartmut Neven. Barren Plateaus in Quantum Neural Network Training Landscapes. Nature Communications, 9(1):1–6, 2018.
 Chen et al. (2020a) Samuel Yen-Chi Chen, Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, and Hsi-Sheng Goan. Variational Quantum Circuits for Deep Reinforcement Learning. IEEE Access, 8:141007–141024, 2020a.

Yang et al. (2021) Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Pin-Yu Chen, Sabato Marco Siniscalchi, Xiaoli Ma, and Chin-Hui Lee. Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 6523–6527, 2021.
 Du et al. (2020) Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, and Dacheng Tao. Expressive power of parametrized quantum circuits. Physical Review Research, 2(3):033125, 2020.

Du et al. (2021) Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Dacheng Tao, and Nana Liu. Quantum noise protects quantum classifiers against adversaries. Physical Review Research, 3(2):023153, 2021.
 Skolik et al. (2021) Andrea Skolik, Sofiene Jerbi, and Vedran Dunjko. Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning. arXiv preprint arXiv:2103.15084, 2021.
 Dunjko et al. (2016) Vedran Dunjko, Jacob M Taylor, and Hans J Briegel. Quantum-enhanced machine learning. Physical Review Letters, 117(13):130501, 2016.
 Jerbi et al. (2021) Sofiene Jerbi, Casper Gyurik, Simon Marshall, Hans J Briegel, and Vedran Dunjko. Variational quantum policies for reinforcement learning. arXiv preprint arXiv:2103.05577, 2021.
 Ostaszewski et al. (2021) Mateusz Ostaszewski, Lea M Trenkwalder, Wojciech Masarczyk, Eleanor Scerri, and Vedran Dunjko. Reinforcement learning for optimization of variational quantum circuit architectures. arXiv preprint arXiv:2103.16089, 2021.
 Orús (2019) Román Orús. Tensor networks for complex quantum systems. Nature Reviews Physics, 1(9):538–550, 2019.
 Huckle et al. (2013) Thomas Huckle, Konrad Waldherr, and Thomas Schulte-Herbrüggen. Computations in quantum tensor networks. Linear Algebra and its Applications, 438(2):750–781, 2013.
 Murg et al. (2010) Valentin Murg, Frank Verstraete, Örs Legeza, and Reinhard M Noack. Simulating strongly correlated quantum systems with tree tensor networks. Physical Review B, 82(20):205105, 2010.
 Farhi et al. (2014) Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A Quantum Approximate Optimization Algorithm. arXiv preprint arXiv:1411.4028, 2014.
 McClean et al. (2016) Jarrod R McClean, Jonathan Romero, Ryan Babbush, and Alán AspuruGuzik. The Theory of Variational Hybrid QuantumClassical Algorithms. New Journal of Physics, 18(2):023023, 2016.

Dunjko and Briegel (2018) Vedran Dunjko and Hans J. Briegel. Machine learning & artificial intelligence in the quantum domain: a review of recent progress. Reports on Progress in Physics, 81(7):074001, 2018.
Henderson et al. (2020) Maxwell Henderson, Samriddhi Shakya, Shashindra Pradhan, and Tristan Cook. Quanvolutional neural networks: powering image recognition with quantum circuits. Quantum Machine Intelligence, 2(1):1–9, 2020.
Chen et al. (2020b) Samuel Yen-Chi Chen, Chih-Min Huang, Chia-Wei Hsing, and Ying-Jer Kao. Hybrid Quantum-Classical Classifier Based on Tensor Network and Variational Quantum Circuit. arXiv preprint arXiv:2011.14651, 2020b.
Kerenidis et al. (2020) Iordanis Kerenidis, Jonas Landman, and Anupam Prakash. Quantum Algorithms for Deep Convolutional Neural Networks. In Proc. International Conference on Learning Representations, 2020.
Leymann and Barzen (2020) Frank Leymann and Johanna Barzen. The Bitter Truth About Gate-Based Quantum Algorithms in the NISQ Era. Quantum Science and Technology, 5(4):044007, 2020.
Soklakov and Schack (2006) Andrei N. Soklakov and Rüdiger Schack. Efficient State Preparation for a Register of Quantum Bits. Physical Review A, 73(1):012307, 2006.

Fu et al. (2011) Yangguang Fu, Mingyue Ding, and Chengping Zhou. Phase Angle-Encoded and Quantum-Behaved Particle Swarm Optimization Applied to Three-Dimensional Route Planning for UAV. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 42(2):511–526, 2011.
Oseledets (2011) Ivan V. Oseledets. Tensor-Train Decomposition. SIAM Journal on Scientific Computing, 33(5):2295–2317, 2011.
Garipov et al. (2016) Timur Garipov, Dmitry Podoprikhin, Alexander Novikov, and Dmitry Vetrov. Ultimate tensorization: compressing convolutional and FC layers alike. arXiv preprint arXiv:1611.03214, 2016.
Tjandra et al. (2017) Andros Tjandra, Sakriani Sakti, and Satoshi Nakamura. Compressing Recurrent Neural Network with Tensor-Train. In Proc. International Joint Conference on Neural Networks, pages 4451–4458, 2017.
Qi et al. (2020a) Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, and Chin-Hui Lee. Exploring deep hybrid tensor-to-vector network architectures for regression based speech enhancement. arXiv preprint arXiv:2007.13024, 2020a.
Yu et al. (2017) Rose Yu, Stephan Zheng, Anima Anandkumar, and Yisong Yue. Long-term forecasting using Tensor-Train RNNs. arXiv preprint, 2017.
Yang et al. (2017) Yinchong Yang, Denis Krompass, and Volker Tresp. Tensor-Train Recurrent Neural Networks for Video Classification. In International Conference on Machine Learning, pages 3891–3900, 2017.
Jin et al. (2020) Xuanyu Jin, Jiajia Tang, Xianghao Kong, Yong Peng, Jianting Cao, Qibin Zhao, and Wanzeng Kong. CTNN: A Convolutional Tensor-Train Neural Network for Multi-Task Brainprint Recognition. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29:103–112, 2020.
Novikov et al. (2015) Alexander Novikov, Dmitry Podoprikhin, Anton Osokin, and Dmitry Vetrov. Tensorizing Neural Networks. In Advances in Neural Information Processing Systems, 2015.
Qi et al. (2020b) Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, and Chin-Hui Lee. Tensor-to-Vector Regression for Multi-Channel Speech Enhancement Based on Tensor-Train Network. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 7504–7508, 2020b.
Cybenko (1989) George Cybenko. Approximation by Superpositions of a Sigmoidal Function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.

Barron (1994) Andrew R. Barron. Approximation and Estimation Bounds for Artificial Neural Networks. Machine Learning, 14(1):115–133, 1994.
Kingma and Ba (2014) Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980, 2014.
Bergholm et al. (2018) Ville Bergholm, Josh Izaac, Maria Schuld, Christian Gogolin, M. Sohaib Alam, Shahnawaz Ahmed, Juan Miguel Arrazola, Carsten Blank, Alain Delgado, Soran Jahangiri, et al. PennyLane: Automatic Differentiation of Hybrid Quantum-Classical Computations. arXiv preprint arXiv:1811.04968, 2018.
Paszke et al. (2019) Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems, 32:8026–8037, 2019.
Nielsen and Chuang (2002) Michael A. Nielsen and Isaac Chuang. Quantum Computation and Quantum Information, 2002.
Appendix A
A.1 Proof for Theorem 2
Proof.
Theorem 2 is derived from a modification of the universal approximation theory proposed by (Cybenko, 1989; Barron, 1994). The universal approximation theory is summarized in Lemma 1, which suggests that a feed-forward neural network with $M$ neurons can approximate any continuous function with an arbitrarily small error $\epsilon$.
Lemma 1.
Given a continuous target function $f$, we can employ a 2-layer neural network $\hat{f}_{M}$ with a nonlinear activation $\sigma$, such that
(11)   $\left\| f - \hat{f}_{M} \right\|_{2} \le \frac{2 C_{f}}{\sqrt{M}},$
where $M$ denotes the number of neurons, and $C_{f}$ is a constant associated with the target function $f$. In particular, for the sigmoid activation $\sigma$, $C_{f}$ satisfies the following condition:
(12)   $C_{f} = \int_{\mathbb{R}^{D}} \left\| \boldsymbol{\omega} \right\|_{1} \left| F_{f}(\boldsymbol{\omega}) \right| \, d\boldsymbol{\omega} < \infty,$
where $F_{f}(\boldsymbol{\omega})$ denotes the Fourier transform of $f$.
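The behavior described by Lemma 1 can be checked numerically. The following sketch is illustrative only (it is not taken from the paper): it fixes randomly drawn hidden weights of a 2-layer sigmoid network, fits only the output layer by least squares, and shows the approximation error to a continuous 1-D target shrinking as the number of neurons $M$ grows. All variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_two_layer(x, y, num_neurons):
    """Fix random hidden weights of a 2-layer sigmoid network and fit
    only the output layer by least squares (a random-feature sketch)."""
    w = rng.normal(scale=4.0, size=num_neurons)    # hidden-layer weights
    b = rng.uniform(-4.0, 4.0, size=num_neurons)   # hidden-layer biases
    hidden = sigmoid(np.outer(x, w) + b)           # (len(x), num_neurons)
    out, *_ = np.linalg.lstsq(hidden, y, rcond=None)
    return hidden @ out

x = np.linspace(0.0, 1.0, 200)
target = np.sin(2.0 * np.pi * x)                   # continuous target f

errors = {M: float(np.linalg.norm(target - fit_two_layer(x, target, M)))
          for M in (4, 32, 256)}
# The L2 approximation error shrinks as the neuron count M grows.
```

Only the output layer is trained here, which is a weaker scheme than the fully trained network in Lemma 1; the error still decays with $M$, which is the qualitative behavior the bound predicts.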
a.2 Proof for Theorem 3
Proof.
Assume that $\boldsymbol{w} \in \mathbb{R}^{I}$ with $I = \prod_{k=1}^{D} I_{k}$ is reshaped into a tensor $\mathcal{W} \in \mathbb{R}^{I_{1} \times I_{2} \times \cdots \times I_{D}}$, and the TT decomposition of the target vector is $\mathcal{W}(i_{1}, i_{2}, ..., i_{D}) = \mathcal{G}_{1}[i_{1}] \mathcal{G}_{2}[i_{2}] \cdots \mathcal{G}_{D}[i_{D}]$; then we obtain
(14)   $\langle \mathcal{W}, \mathcal{X} \rangle = \sum_{i_{1}=1}^{I_{1}} \cdots \sum_{i_{D}=1}^{I_{D}} \mathcal{G}_{1}[i_{1}] \mathcal{G}_{2}[i_{2}] \cdots \mathcal{G}_{D}[i_{D}] \, \mathcal{X}(i_{1}, i_{2}, ..., i_{D}).$
On the other hand, we denote $\boldsymbol{x}$ and $\boldsymbol{w}$ as the vectorization of the tensors $\mathcal{X}$ and $\mathcal{W}$, respectively. We also define the TT cores $\{ \mathcal{G}_{k} \}_{k=1}^{D}$ as the TTN parameters, and define $\boldsymbol{W}$ as the matricization of $\mathcal{W}$. Moreover, $\sigma$ refers to a nonlinear activation function.
Since $\boldsymbol{W}$ corresponds to a dense layer, we can obtain that
(15)   $\boldsymbol{y} = \sigma\left( \boldsymbol{W} \boldsymbol{x} \right), \quad \text{i.e.,} \quad y_{j} = \sigma\left( \langle \boldsymbol{w}_{j}, \boldsymbol{x} \rangle \right)$ for each row $\boldsymbol{w}_{j}$ of $\boldsymbol{W}$.
In sum, we can further obtain
(16)   $y_{j} = \sigma\Big( \sum_{i_{1}=1}^{I_{1}} \cdots \sum_{i_{D}=1}^{I_{D}} \mathcal{G}_{1}^{(j)}[i_{1}] \cdots \mathcal{G}_{D}^{(j)}[i_{D}] \, \mathcal{X}(i_{1}, ..., i_{D}) \Big),$
where $\{ \mathcal{G}_{k}^{(j)} \}_{k=1}^{D}$ are the TT cores of the $j$-th row $\boldsymbol{w}_{j}$, so the TTN can exactly represent the dense layer. ∎
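As a numerical sanity check on this argument, the following sketch (with hypothetical mode sizes and TT-ranks of our choosing, not the paper's configuration) verifies that the inner product computed directly from the TT cores coincides with the dense inner product obtained after contracting the cores into the full weight tensor.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mode sizes I_1 x I_2 x I_3 and TT-ranks (R_0, ..., R_3);
# illustrative choices only.
shape = (4, 3, 5)
ranks = (1, 2, 2, 1)

# TT cores G_k of shape (R_{k-1}, I_k, R_k).
cores = [rng.normal(size=(ranks[k], shape[k], ranks[k + 1]))
         for k in range(len(shape))]

def tt_to_full(cores):
    """Contract the TT cores back into the full weight tensor W."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([full.ndim - 1], [0]))
    return np.squeeze(full, axis=(0, full.ndim - 1))

X = rng.normal(size=shape)           # input tensor
W = tt_to_full(cores)                # dense weight tensor

dense_val = float(np.vdot(W, X))     # <W, X> via the materialized dense tensor

# The same scalar computed as a product of core slices, i.e. the TTN
# feed-forward path, which never materializes W.
tt_val = float(np.einsum('aib,bjc,ckd,ijk->ad',
                         cores[0], cores[1], cores[2], X)[0, 0])
```

The two values agree up to floating-point error, which mirrors the step from Eq. (14) to Eq. (16): the dense map and its TT-factorized form compute the same quantity.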
Appendix B
This section includes additional experimental simulations. First, we assess the settings of the TT-ranks, and then we compare the convergence rates of QTN-VQC and Dense-VQC in the experiments.
B.1 Experiments on TT-ranks for QTN-VQC
Table 5 corresponds to the experiments of QTN-VQC with a fixed number of qubits and the Sigmoid activation function. The empirical results suggest that larger TT-ranks do not yield better results than smaller ones. The main reason is that the TT-ranks correspond to a manifold, and there may exist an optimal manifold with smaller TT-ranks that attains the best performance.
TT-ranks      | Params | CE     | Acc (%)
{1, 2, 2, 1}  | 328    | 0.3090 | 91.43 ± 0.51
{1, 4, 4, 1}  | 768    | 0.3082 | 91.46 ± 0.53
{1, 6, 6, 1}  | 1464   | 0.3079 | 91.47 ± 0.52
B.2 A comparison of convergence rates
Next, we analyze the computational complexity of the TTN used in QTN-VQC. In more detail, given the TT-ranks $\{R_{k}\}_{k=0}^{D}$, a multi-dimensional tensor is factorized into $D$ 3rd-order tensor cores, and the computational complexity of the feed-forward process grows quadratically with the TT-ranks. In contrast, the computational overhead of a dense layer scales with the product of the input and output dimensions. This means that smaller TT-ranks reduce the computational cost of QTN-VQC, which explains why smaller TT-ranks are configured in our experiments.
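The parameter saving can be made concrete with a small counting sketch. The factorization of a 784-dimensional input into a 4 × 7 × 4 × 7 tensor below is a hypothetical choice of ours; the paper's exact shapes and output dimensions are not reproduced here.

```python
# Hypothetical factorization of a 784-dimensional input into a
# 4 x 7 x 4 x 7 tensor; illustrative only, not the paper's setting.
shape = (4, 7, 4, 7)

def tt_params(shape, ranks):
    """Trainable parameters in TT cores G_k of shape (R_{k-1}, I_k, R_k)."""
    return sum(ranks[k] * shape[k] * ranks[k + 1] for k in range(len(shape)))

# A dense map needs one weight per input entry for each output unit,
# i.e. prod(I_k) weights per output unit.
dense_per_output = 1
for dim in shape:
    dense_per_output *= dim          # 784 for this shape

small = tt_params(shape, (1, 2, 2, 2, 1))   # low TT-ranks
large = tt_params(shape, (1, 6, 6, 6, 1))   # higher TT-ranks
# Each core holds R_{k-1} * I_k * R_k weights, so the parameter count
# (and the feed-forward cost) grows quadratically with the TT-ranks.
```

With these hypothetical shapes, the low-rank TTN uses 66 parameters and the higher-rank one 462, both well below the 784 weights a dense map would need per output unit, which matches the qualitative trend in Table 5.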
Empirically, we compare the convergence rates of different models on the test data. In our experimental settings with the Tanh activation function, the QTN-VQC model consistently attains a faster convergence rate than the Dense-VQC and PCA-VQC counterparts. Moreover, Table 6 compares the absolute running time of QTN-VQC with Dense-VQC and AlexNet-VQC. Since all experiments are conducted on the same GPUs and CPUs, the training times of the models are directly comparable. Our evaluation shows that QTN-VQC is marginally slower than Dense-VQC, but much faster than AlexNet-VQC.
Models            | Dense-VQC | AlexNet-VQC | QTN-VQC
Time/epoch (mins) | 58        | 75          | 61
Appendix C Experiments on Labeled Faces in the Wild (LFW)
C.1 Experimental setups
The LFW dataset is designed for the task of unconstrained face recognition and is composed of face images. We randomly split the dataset into training data and test data. A fixed number of qubits is used for the VQC, and the shape of the input tensor is set accordingly. The other settings are kept the same as the configurations for the MNIST task.
C.2 Experimental results
Table 6 presents the simulation results under the noiseless quantum circuit condition, while Table 7 demonstrates the empirical results in the setting of noisy quantum circuits. QTN-VQC outperforms the Dense-VQC counterpart (92.15% vs. 91.27% accuracy in the noiseless setting), and it has far fewer model parameters. The experimental results on the LFW dataset also highlight the advantages of QTN-VQC in terms of fewer model parameters and better empirical performance.
Models       | Params | CE     | Acc (%)
Dense-VQC    |        | 0.3011 | 91.27 ± 0.25
AlexNet-VQC  |        | 0.2875 | 93.21 ± 0.36
QTN-VQC      | 2816   | 0.2910 | 92.15 ± 0.43
Models       | Params | Acc (%)      | Acc (%)
Dense-VQC    |        | 88.65 ± 1.22 | 87.23 ± 1.04
AlexNet-VQC  |        | 89.76 ± 1.34 | 88.66 ± 1.08
QTN-VQC      | 2816   | 89.93 ± 1.09 | 89.64 ± 1.07