Machine Learning (ML) and Deep Neural Networks (DNNs) have gained tremendous interest in the wake of technological developments and the availability of an abundance of data. The High Energy Physics (HEP) community has long been engaged with processing vast amounts of data generated by collider experiments like the Large Hadron Collider (LHC). Recent advancements in Neural Network (NN) technology have allowed physics-driven analytical methods to evolve into data-driven statistical approaches, transforming the capability and accuracy of data analysis. The emergence of quantum computers has introduced another step of evolution in this formidable avenue.
Various quantum algorithms aim to tackle challenging tasks in optimisation problems and improve the interpretability of classical NNs applied to high-energy physics data. Such algorithms include but are not limited to simulations of collision events [hepsimu, helicityamp, Carena:2021ltu, Williams:2021lvr, li2021partonic, Bravo-Prieto:2021ehz], reconstruction of the charged tracks [das2020track, qtrkreco, magano2021quantum, tysz2021hybrid], and event classification analyses [Terashi:2020wfi, Chen:2020zkj, Wu:2020cye, Guan:2020bdl, Blance:2020ktp, 2021andrew, Chen:2021ouz, Belis:2021zqi, Araz:2021un]. Despite the exceptional interest in quantum computation methods, there are still many open questions regarding the advantages of the quantum age [doi:10.1098/rspa.2017.0551].
There is particular interest in using the quantum paradigm in ML and optimisation tasks, as the number of qubits needed is often already available in noisy intermediate-scale quantum (NISQ) devices, and error mitigation might be less of a concern. Quantum NNs have various advantages over classical NNs [Schuld2020, Blance:2020nhl, Eisert2021, Roche2021, Abel:2021fpn, Ngairangbam:2021yma], such as faster convergence and significantly better performance with the same network structure. Quantum algorithms achieve this by representing the correlations between input features within the quantum entanglement paradigm, providing a much richer data representation.
Tensor Networks (TNs) are widely used to simulate strongly entangled quantum systems [Orus:2018dya, 10.5555/2011832.2011833], and they can represent both quantum states and circuits [Shi_2006, Vidal_2008, Verstraete_2008]. Due to this property, TNs form a natural bridge between classical and quantum computation methods. In the context of high-energy physics data analysis, Tensor Networks have recently been used for data originating from collider experiments. TN properties can be used to classify $b$-jets produced at the LHCb experiment, where tabular data is analysed using a Tree Tensor Network (TTN) architecture [trenti2020quantuminspired]. Moreover, ref. [Araz:2021un] shows that Matrix Product States (MPS) can achieve state-of-the-art convolutional neural network (CNN) accuracies in top tagging via calorimeter images. By employing the entanglement entropy information between the MPS' tensor blocks, the same accuracy can be achieved with only 54% of the pixels in a given calorimeter image.
Given the ability to represent both NNs and quantum many-body systems, it is only natural to use TN-inspired quantum circuit architectures to transfer our classical knowledge onto quantum hardware [Grant:2018vv, Huggins_2019, 10.3389/fphy.2020.586374, lazzarin2021multiclass, bhatia2019matrix, PhysRevResearch.2.033125, HUANG202189]. Besides high accuracy rates, TN-based quantum circuits can lead to results that are more robust against noise on near-term quantum computers. Whilst this opens up an entirely new circuit design, near-term quantum devices are still limited to a small number of qubits, restricting the usage of multimodal datasets. Hybrid classical-quantum TNs can be utilised to eliminate this issue by designing end-to-end training sequences with classical data-mapping and quantum classification layers [liu2020quantumclassical, chen2020hybrid]. This allows much larger datasets to be embedded in the optimisation process and, due to the duality of TNs, as more qubits become available, classical nodes can be transformed into quantum nodes to harness the full potential of the quantum hardware.
This study investigates the usage of TN-based quantum variational circuits for discriminating top jets from the QCD background in calorimeter images. We have investigated three different architectures, namely Matrix Product States (MPS), Tree Tensor Networks (TTN) and the Multi-scale Entanglement Renormalisation Ansatz (MERA), as quantum circuits and compared the results with their classical counterparts for one-dimensional data embedding. Our results have shown that TNs require exponentially more trainable parameters with increasing qubit number to achieve the same performance as their quantum counterparts, leading to computationally expensive network architectures for machine learning applications. We observed that classical TNs require exponentially large bond dimensions to capture the same entanglement structure as the QTNs, leaving stochastic gradient descent methods inefficient for optimising the tensor nodes. This has been improved by employing a more extensive Hilbert space mapping of the input features, indicating that classical TNs require more information to represent the same data as accurately as QTNs. We present detailed numerical results to study the quantum mechanical differences between TNs and QTNs. In the following, we also investigate possible avenues for hybrid classical-quantum TN architectures, allowing a more extensive phase space to be used in the network.
This study is organised as follows: in Sec. 2 we introduce TNs and QTNs as methods to perform machine learning tasks. The results are discussed in Sec. 3, where we first introduce the dataset and preprocessing in Sec. 3.1, and then the numerical analyses are presented in Secs. 3.2.1 and 3.2.2.
2 Bridge between classical & quantum Machine Learning
Tensors are multidimensional objects that describe multilinear relations between algebraic entities defined on a particular vector space. This study employs “tensor diagram notation” to formalise the relations between tensors, where a scalar is shown as a bare node (drawn as a blue circle throughout this study) and each additional rank is shown as an external line attached to a given node [penrose1971applications, Orus:2013kga, Bridgeman:2016dhh].
TNs are defined as a series of Einstein summations indicating the connections between tensor nodes, forming a graph of tensors. Unlike traditional graph networks, however, where each connection indicates the “coupling strength” between nodes, connections between tensor nodes indicate the correlation between a node and the rest of the network and limit the range of entanglement between nodes. These connections between tensor nodes are called bond (or auxiliary) dimensions, shown as $\chi$. Due to the computational cost of contracting a TN, one can only form specific TN architectures that can be contracted efficiently.
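As a minimal illustration of such a contraction, consider three chained tensor nodes contracted over their bond indices via an Einstein summation; a numpy sketch with purely hypothetical shapes and values:

```python
import numpy as np

# Three MPS-like nodes chained through bond indices of size chi,
# each carrying a physical index of size d (all values hypothetical).
d, chi = 2, 4
rng = np.random.default_rng(0)
A = rng.normal(size=(d, chi))        # edge node: physical x bond
B = rng.normal(size=(chi, d, chi))   # bulk node: bond x physical x bond
C = rng.normal(size=(chi, d))        # edge node: bond x physical

# Contracting (summing over) the bond indices leaves a rank-3 tensor
# over the three physical indices -- the amplitude tensor it encodes.
psi = np.einsum("ia,ajb,bk->ijk", A, B, C)
assert psi.shape == (d, d, d)
```

The bond indices `a` and `b` appear twice and are summed over, while the physical indices `i`, `j`, `k` remain open, mirroring the diagrammatic rule that connected lines are contracted and free lines are external.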
As mentioned earlier, TNs are mainly designed to study many-body quantum systems [Verstraete:2004cf], where one-dimensional lattice systems have been extensively studied and many efficient architectures and contraction algorithms have been developed. A wave function for a one-dimensional lattice with $N$ states can be written as

$|\Psi\rangle = \sum_{s_1, s_2, \ldots, s_N} T_{s_1 s_2 \cdots s_N}\, |s_1\rangle \otimes |s_2\rangle \otimes \cdots \otimes |s_N\rangle , \qquad (1)$
where $|s_i\rangle$ are spin states spanning a $2^N$-dimensional Hilbert space, $i \in [1, N]$ identifies the location of the state within the lattice, and $T$ is a rank-$N$ amplitude tensor indicating the bond structure between the states. The top of Fig. 1 shows $|\Psi\rangle$ in tensor diagram notation, where each Hilbert space dimension is shown by a green line (for simplicity, only seven of them are shown on the image). Each green line represents a two-dimensional vector for a lattice of spin states whose outer products form $|\Psi\rangle$. The computational complexity of simulating such an object for $N$ spin states is $\mathcal{O}(2^N)$, which grows exponentially with each additional state. However, it is possible to decompose this monstrous beast into smaller tensors that can efficiently represent the original tensor whilst reducing its computational cost. Various tensor decomposition methods can be employed to achieve this, such as singular value decomposition [10.1093/qmath/11.1.50, Eckart:1936va].
It is, perhaps, most intuitive to decompose a one-dimensional lattice into an MPS [Fannes:1992vq, Klumper:1992vi, doi:10.1137/090752286, Bridgeman:2016dhh, 10.5555/2011832.2011833, Orus:2013kga, Verstraete_2006, Hastings:2007iok, Chen:2010gda, Schollw_ck_2011], where each state is entangled with the rest of the lattice through the adjacent states (for a detailed analysis of MPS in the context of HEP and machine learning, see ref. [Araz:2021un]). This allows a highly accurate approximation of $|\Psi\rangle$ with only $\mathcal{O}(N d \chi^2)$ parameters. Although this limits the entanglement range, MPS is a potent tool for simulating locally entangled states. The right branch of the classical portion of Fig. 1 shows the representation of MPS in tensor diagram notation, where each blue node represents a rank-3 tensor except for the ones on the edges, which form only rank-2 tensors. By utilising periodic boundary conditions, one can write a circular network where all nodes are rank-3 tensors. As mentioned before, green lines represent the Hilbert space dimensions, where each blue tensor is connected to a state, $|s_i\rangle$, on the lattice. Red lines between tensors represent the auxiliary dimensions, where the size of $\chi$ determines the precision of the network by resizing the influence of each state on the rest of the lattice. Although a large bond dimension allows a more precise state representation, it also increases the computational cost of this representation. Hence, different architectures have been proposed to simulate more richly entangled lattices.
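Such a decomposition can be sketched by repeatedly reshaping the amplitude tensor and applying a singular value decomposition, peeling off one site at a time; a minimal, untruncated numpy example (site count and seed arbitrary; all singular values are kept, so the reconstruction is exact):

```python
import numpy as np

# Decompose a 6-site spin-1/2 state vector into MPS cores via repeated SVD.
n, d = 6, 2
rng = np.random.default_rng(1)
psi = rng.normal(size=d ** n)
psi /= np.linalg.norm(psi)

cores, rest, chi_left = [], psi, 1
for _ in range(n - 1):
    # Group (left bond x physical) as rows, remaining sites as columns.
    u, s, vh = np.linalg.svd(rest.reshape(chi_left * d, -1), full_matrices=False)
    chi_right = s.size
    cores.append(u.reshape(chi_left, d, chi_right))  # one rank-3 MPS core
    rest = np.diag(s) @ vh                           # carry the remainder right
    chi_left = chi_right
cores.append(rest.reshape(chi_left, d, 1))           # final edge core

# Contracting the cores over their bond indices reproduces the amplitudes.
recon = cores[0]
for core in cores[1:]:
    recon = np.tensordot(recon, core, axes=([-1], [0]))
recon = recon.reshape(-1)
assert np.allclose(recon, psi)
```

In practice the bond dimension is truncated by discarding small singular values, which is where the $\mathcal{O}(N d \chi^2)$ parameter count and the limited entanglement range of MPS come from.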
Hierarchical TNs can be employed to capture a relatively more complex correlation structure. TTN is one of the widely used architectures in this avenue [Shi_2006] (for a detailed analysis of TTN in the context of HEP and machine learning, see ref. [trenti2020quantuminspired]). TTNs are constructed with feature-condensing nodes where two or more vectors are condensed into a single vector. This allows neighbouring states to be mapped into a higher-dimensional representation at each step. In machine learning (ML) terminology, each node can be seen as a local pooling layer. The middle branch of the classical portion of Fig. 1 shows TTN in tensor diagram notation. Each green line shows the physical dimensions, where neighbouring states are collected into one node and then mapped into a higher-dimensional vector shown as red lines, the bond dimensions. Such a hierarchical structure allows each state to have a more extensive entanglement range, allowing it to be entangled with farther states on a higher-dimensional manifold. MPS, in this picture, can be classified as a maximally anti-symmetrised TTN.
Further down the rabbit hole, Multi-scale Entanglement Renormalisation Ansatz (MERA) can be used to embed arbitrarily large entanglement structures into the network [Vidal_2008]. The left branch of Fig. 1 shows MERA in tensor diagram notation. Although the figure shows a mixture of rank-4 transformation and rank-3 condenser nodes, MERA does not necessarily need condenser nodes; here, to simplify its application for classification, we mixed MERA with TTN. A different MERA architecture for classification can be found in ref. [reyes2020multiscale] where after several transformation layers with rank-4 tensors, authors reduced the dimensionality via an MPS layer. Each rank-4 tensor plays the role of transformation. In quantum field theory, such nodes have been employed to embed specific known symmetries into the network [Evenbly:2011tw]. Although such a network embodies much higher entanglement between states than MPS and TTN, the computational cost is much higher due to the loops that it forms within the architecture.
Tensor Networks exhibit similarities with NNs [cohen2016convolutional, qentindl]: NN layers can be represented as TN layers, leading to more efficient and compressed network structures with similar or better outcomes [garipov2016ultimate]. TN architectures and quantum-inspired training algorithms can also be used independently as optimisation ansatz. In particular, specialised MPS training techniques can be used for classification tasks to achieve results similar to state-of-the-art NN results [stoudenmire2017supervised, novikov2017exponential, selvan2020tensor, efthymiou2019tensornetwork, xu2021tensortrain]. TTN has also been used for quantum-assisted classification problems [PhysRevA.104.042408, liu2018machine, ttn2019], and MERA has been studied as a symmetry-embedding layer for a classifier [reyes2020multiscale, kong2021quantum]. Beyond optimisation problems, TNs can improve our understanding of training procedures and network structure due to their widely studied theoretical foundation [martyn2020entanglement].
For a given ML application, TNs require the feature space of the data to be mapped onto a spin-state lattice. An $N$-dimensional feature space, $\mathbf{x}$, can be written as outer products of $d$-dimensional vectors via a mapping function defined as $\phi^{s_i}(x_i)$, where $d = 2$ for a spin state [stoudenmire2017supervised, Araz:2021un]. Hence the feature space takes the following form;

$\Phi^{s_1 s_2 \cdots s_N}(\mathbf{x}) = \phi^{s_1}(x_1) \otimes \phi^{s_2}(x_2) \otimes \cdots \otimes \phi^{s_N}(x_N) . \qquad (2)$

Here each state is written as a superposition of spin states, $|s_i\rangle$, with $\phi^{s_i}$ being feature-dependent coefficients, forming a $d^N$-dimensional Hilbert space. The correlation between the states, in this form, is controlled by the amplitude tensor, $T$, given in eqn. (1). The amplitude tensor represents the architecture of the TN and, depending on the decomposition sequence, it can be written as any TN architecture; for example, see Fig. 1.
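The $d = 2$ spin map and the resulting product-state construction can be sketched as follows (feature values are arbitrary):

```python
import numpy as np

def spin_map(x):
    """Map a feature x in [0, 1] to a two-dimensional spin state,
    phi(x) = (cos(pi x / 2), sin(pi x / 2))."""
    return np.array([np.cos(np.pi * x / 2.0), np.sin(np.pi * x / 2.0)])

# Each feature becomes a normalised local state ...
features = np.array([0.0, 0.25, 1.0])
states = np.array([spin_map(x) for x in features])
assert np.allclose(np.linalg.norm(states, axis=1), 1.0)

# ... and the full input is their outer (Kronecker) product:
# a rank-N product state living in a 2^N-dimensional Hilbert space.
product_state = states[0]
for s in states[1:]:
    product_state = np.kron(product_state, s)
assert product_state.shape == (2 ** len(features),)
```

Since each local state is normalised, the product state is also normalised, and any correlations between features must then be supplied by the amplitude tensor that the TN represents.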
In a classification task, TNs produce the output probability

$p^{\ell}(x_i) = \left| W^{\ell} \cdot \Phi(x_i) \right|^2 , \qquad (3)$

where $W^{\ell} \cdot \Phi(x_i)$ stands for the contraction of the network with the given $i$th data point and $\ell$ represents the output label. Traditional ML applications require each layer to be embedded into an activation function such as ReLU or sigmoid to capture the nonlinear properties of a given data set. However, TNs can capture the nonlinearity without any activation function, ensuring an entirely linear network structure. TNs consist of trainable parameters, where each parameter, $\theta$, can be optimised with respect to a given loss function,

$\mathcal{L}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p(x_i; \theta) + (1 - y_i) \log\left(1 - p(x_i; \theta)\right) \right] . \qquad (4)$

Here $y_i$ represents the truth label of the given $i$th data point. How well a network can represent the statistical properties of the data is central to any ML application.
Entanglement entropy [Eisert:2008ur, PhysRevResearch.2.033125], the so-called von Neumann entropy, can be employed as a metric to evaluate the expressiveness of TN states. For a TN bipartitioned into subsystems $A$ and $B$, the entanglement entropy is defined as

$S_A = -\mathrm{Tr}\left[ \rho_A \log \rho_A \right] ,$

where $\rho_A = \mathrm{Tr}_B\, \rho$ is the reduced density matrix of the subsystem $A$. In a quantum many-body application, a non-degenerate pure ground state is expected to have vanishing entanglement entropy. For a quantum system that satisfies the area (volume) law, the entanglement entropy is bounded by the area (volume) of the bipartition boundary; for a volume law, this implies that the entanglement entropy grows in proportion to the number of states $N$. The maximum entanglement entropy of an MPS, on the other hand, is limited by its bond dimension, $\chi$, hence limiting its maximum entanglement entropy to be proportional to $\log \chi$. Thus, the number of parameters required in an MPS to represent a highly entangled quantum system is exponentially large.
Similarly, MERA has been designed to have the intrinsic property of the volume law, where its maximum entanglement entropy is bounded by $\partial A \log \chi$. Here, $\partial A$ denotes the surface area of the TN graph [Eisert:2008ur]. Whilst both realisations provide an efficient approximation of a highly entangled system, each has a particular limitation that can significantly increase the computational complexity of the network for an ML application.
Although the entanglement entropy represents the potential of a given ansatz in terms of representing the underlying data, the trainability of a given model is also an important aspect. In order to use gradient-based methods efficiently, the optimisation landscape is required not to be flat. One of the biggest challenges of QML applications is the barren plateau, where the gradient of the loss function is exponentially suppressed; hence it becomes highly challenging to train the model using gradient-based methods [McClean:2018um]. However, the Fisher information matrix can be used to quantify the trainability of a given ansatz, since it captures the information gained by a given parametrisation ansatz [martens2020new, berezniuk2020scaledependent, Abbas:2021wp]. For a given probability distribution in eqn. (3), the mean Fisher information is calculated as the variance of the partial derivative, with respect to the model parameters $\theta$, of the log-likelihood,

$F_{ij}(\theta) = \mathbb{E}_{x}\left[ \frac{\partial \log p(x; \theta)}{\partial \theta_i}\, \frac{\partial \log p(x; \theta)}{\partial \theta_j} \right] ,$

where $x$ is the training sample. $F(\theta)$, for $k$ parameters, forms a $k \times k$ Riemannian metric capturing the sensitivity of the ansatz with respect to changes in the parameters. In a classical network, the eigenvalue distribution of the normalised $F(\theta)$ is mostly degenerate around zero with rare large values [pmlr-v89-karakida19a]. Such a distribution shows that the ansatz is not sensitive to changes in most of the parameters.
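The mean Fisher information can be estimated numerically as the covariance of the score averaged over the training sample; a toy sketch using finite differences on a hypothetical one-qubit model (not the circuits studied here):

```python
import numpy as np

def fisher_matrix(log_likelihood, theta, xs, eps=1e-5):
    """Empirical Fisher information: average over the samples xs of the
    outer product of the score (finite-difference gradient of the
    log-likelihood with respect to the parameters theta)."""
    k = len(theta)
    outer_products = []
    for x in xs:
        g = np.zeros(k)
        for i in range(k):
            tp, tm = theta.copy(), theta.copy()
            tp[i] += eps
            tm[i] -= eps
            g[i] = (log_likelihood(tp, x) - log_likelihood(tm, x)) / (2 * eps)
        outer_products.append(np.outer(g, g))
    return np.mean(outer_products, axis=0)

# Toy "Born machine": p(y=1 | x, theta) = sin^2((theta0 * x + theta1) / 2).
def log_p(theta, x):
    return np.log(np.sin((theta[0] * x + theta[1]) / 2.0) ** 2 + 1e-12)

rng = np.random.default_rng(2)
F = fisher_matrix(log_p, np.array([0.7, 0.3]), rng.uniform(0, 1, size=50))
evals = np.linalg.eigvalsh(F)
assert F.shape == (2, 2) and np.all(evals > -1e-8)  # PSD by construction
```

Inspecting the eigenvalue spectrum of such a matrix is exactly the diagnostic used later in the study: spectra clustered near zero signal parameters the model is insensitive to.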
Based on the Fisher information, ref. [Abbas:2021wp] has introduced a measure of the normalised effective dimension,

$\bar{d}_{\gamma, n} = \frac{2}{d}\, \frac{\log\left( \frac{1}{V_\Theta} \int_\Theta \sqrt{\det\left( \mathbb{1}_d + \frac{\gamma n}{2\pi \log n} \hat{F}(\theta) \right)}\, d\theta \right)}{\log\left( \frac{\gamma n}{2\pi \log n} \right)} ,$

where $\hat{F}(\theta)$ is the normalised Fisher information matrix and $V_\Theta$ is the volume of the parameter space $\Theta$. In the $n \to \infty$ limit, $\bar{d}_{\gamma,n}$ of any given ansatz converges to one, but the convergence of an ansatz is slowed down by small and uneven eigenvalues of the normalised Fisher matrix.
A quantum algorithm is designed through a network of quantum gates to compute specific tasks. The availability of Quantum Machine Learning (QML) methods triggers the question of whether there is a viable quantum gate architecture to harness a quantum advantage for ML applications. TNs can be represented as quantum circuits and hence pose a viable option for a variational quantum circuit (VQC) [Huggins_2019]. Due to the theory built to understand the training and the network structure of TNs, they pose a powerful variational circuit option for ML applications [Grant:2018vv].
An MPS-inspired quantum variational circuit (Q-MPS) can be written by applying a set of unitary transformations to an initial adjacent two-qubit system. Each following two-qubit transformation block takes the last output qubit from the previous block and entangles it with the following qubit. For a six-qubit input, a Q-MPS construction is shown on the right branch of the bottom panel of Fig. 1. Each blue transformation block represents two unitary transformations, as shown in Fig. 2, followed by a CNOT gate to entangle the two states, where a generic unitary transformation and the CNOT gate are expressed as

$U(\theta, \phi, \lambda) = \begin{pmatrix} \cos(\theta/2) & -e^{i\lambda} \sin(\theta/2) \\ e^{i\phi} \sin(\theta/2) & e^{i(\phi + \lambda)} \cos(\theta/2) \end{pmatrix} , \qquad \mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} . \qquad (5)$

Here $\theta$, $\phi$ and $\lambda$ are trainable variables. This definition forms the minimal Q-MPS construction. The block structure can also be enhanced via multiple CNOT gates, or auxiliary qubits can be introduced between circuit blocks to increase the bond dimension. Additionally, blocks allowing more qubits have also been proposed in other studies (e.g. see ref. [Huggins_2019]). To get the classification output, one needs to measure the expectation value of the last entangled qubit,
$\langle \hat{\mathcal{B}} \rangle = \langle \Psi_0 |\, \mathcal{U}^{\dagger}(\Theta)\, \hat{\mathcal{B}}\, \mathcal{U}(\Theta)\, | \Psi_0 \rangle , \qquad (6)$

where $\mathcal{U}(\Theta)$ is the given quantum circuit (QC) constructed from unitaries, $U$, with a set of free parameters, $\Theta$, and $\hat{\mathcal{B}}$ is a single-qubit operator which we will choose as the third Pauli matrix, $\sigma_z$. As before, for a classification task, one can compute the probability of measuring the collapse of the system onto a certain state,

$p_{\ell} = \frac{1 + (-1)^{\ell} \langle \hat{\mathcal{B}} \rangle}{2} , \qquad \ell \in \{0, 1\} , \qquad (7)$

and the trainable parameters of the system, $\Theta$, can be optimised by minimising eqn. (4).
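As a minimal illustration, a Q-MPS restricted to $R_y$-only blocks (i.e. $\phi = \lambda = 0$ in eqn. (5)) can be simulated with a pure-numpy state vector; this is a sketch only, not the PennyLane/Qiskit implementation used later in this study, and all input values are arbitrary:

```python
import numpy as np

def ry(theta):
    """Single-qubit y-rotation, i.e. U(theta, 0, 0)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def apply(gate, state, qubits, n):
    """Apply a 1- or 2-qubit gate to the listed qubit(s) of an n-qubit state."""
    k = len(qubits)
    psi = np.moveaxis(state.reshape([2] * n), qubits, list(range(k)))
    psi = np.tensordot(gate.reshape([2] * (2 * k)), psi,
                       axes=(list(range(k, 2 * k)), list(range(k))))
    return np.moveaxis(psi, list(range(k)), qubits).reshape(-1)

def qmps_prob(x, theta, n=4):
    """Probability of measuring |0> on the last qubit of a minimal Q-MPS:
    RY(pi * x_q) data encoding, then a ladder of blocks, each being an
    RY on both wires followed by a CNOT onto the next qubit."""
    state = np.zeros(2 ** n)
    state[0] = 1.0
    for q in range(n):
        state = apply(ry(np.pi * x[q]), state, [q], n)
    params = iter(theta)
    for q in range(n - 1):
        state = apply(ry(next(params)), state, [q], n)
        state = apply(ry(next(params)), state, [q + 1], n)
        state = apply(CNOT, state, [q, q + 1], n)
    probs = np.abs(state) ** 2
    return probs.reshape(-1, 2)[:, 0].sum()

# Four qubits -> three blocks -> six trainable angles (values arbitrary).
p0 = qmps_prob(np.array([0.1, 0.4, 0.7, 0.2]), np.linspace(0.1, 0.6, 6))
assert 0.0 <= p0 <= 1.0
```

Note that the parameter count matches the text: a four-qubit Q-MPS of this form carries three blocks of two $R_y$ rotations each, i.e. six trainable parameters.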
Similarly, a TTN-inspired quantum variational circuit (Q-TTN) can be constructed by applying a unitary block to each pair of adjacent qubits. One can then discard one qubit of each pair and apply another unitary block to each remaining pair of adjacent qubits. By repeating this process, one connects the qubits in a hierarchical structure and performs a measurement on the top-level qubit. The middle branch of the bottom panel of Fig. 1 shows a six-qubit representation of Q-TTN.
The MERA-inspired quantum variational circuit (Q-MERA) is closely related to Q-TTN, but the network is enhanced by a set of additional unitaries ahead of each Q-TTN layer. These additional layers allow the circuit to capture more extensive correlations between qubits, playing a role similar to that of the corresponding layers in classical MERA.
Each of the QC representations given in Fig. 1 shows an initial state of $|\Psi_0\rangle$. However, the initial state of quantum hardware is $|0\rangle^{\otimes n}$; hence a data-encoding process has to take place. This can be achieved in two ways: the data can be encoded in each individual qubit amplitude (qubit encoding) or by encoding the data in the entangled-state amplitudes (amplitude encoding) [MLQC]. In this study, we will use qubit encoding by rotating each qubit about the y-axis with respect to the input values, where the rotation is given by $R_y(\theta) = U(\theta, 0, 0)$, with $U$ as defined in eqn. (5).
In the following section, we will demonstrate the usage of these architectures in the context of top tagging against QCD background. We have proposed two types of network ansatz where first, we will employ purely TN-inspired VQC, and then we will introduce a hybrid ansatz where the data is processed by a classical TN before passing it to the VQC for classification.
3 Top tagging through Quantum Tensor Networks
The nature of electroweak symmetry breaking (EWSB) has yet to unfold. Due to the sizeable production cross-section, billions of top quarks have already been produced at the LHC. With improved capabilities at the high-luminosity and high-energy LHC in the coming years, top physics will move into an era of even higher differential precision. The large mass of the top quark gives it the unique property of a large coupling to the Higgs boson. Accessing the top quark's properties and coupling strengths can bring us closer to understanding the nature of EWSB. However, the measurement of the top quark's properties has been obscured mainly by high levels of background originating from QCD radiation, limiting the analyses to the relatively cleaner leptonic final states of top production. This significantly reduces the accessible production cross-section and hence the sensitivity to its properties. In turn, this has led the HEP community to investigate the internal structure of jets, collimated sprays of radiation [Marzani:2019hun], where various analytical reconstruction techniques have been developed to understand the substructure of jets originating from different particles. This has enabled a sophisticated, precise theoretical understanding of a plethora of highly complex data which cannot be found in any other field of science. With the dawn of the deep learning era, these attempts have shifted towards data-driven analyses.
At the LHC, particularly in the ATLAS and CMS experiments, the hadronic decay channels of the top quark have been widely investigated. The so-called jet objects, encoding the information of hadronic activity, deposit their energy in the electromagnetic and hadronic calorimeters of these experiments. These calorimeters can be interpreted as a pixellated cylindrical camera recording the transverse momentum of the radiation shower.
As a demonstration of the capabilities of these networks, we choose to study the discrimination of top quarks against the QCD background using images reconstructed from the energy deposits of jet constituents in the electromagnetic and hadronic calorimeters of the ATLAS experiment. Calorimeter images provide a natural correlation between pixels, making entangled states highly advantageous for classification. Additionally, CNN architectures built to classify jet images have been shown to be highly successful in discriminating gluons from quarks or tops from QCD backgrounds. In this study, we use these images by mapping them onto a relevant form for the given Tensor Network (quantum or classical) to process and classify. The code implementation of this study can be found at https://gitlab.com/jackaraz/tnqcircuits.
3.1 Dataset & Preprocessing
In this study, the usage of QTN classifiers has been demonstrated via the dataset provided in refs. [Kasieczka:2019dbj, kasieczka_gregor_2019_2603256], often used for benchmarking classification algorithms. It contains collider events for hadronic tops and QCD jets at $\sqrt{s} = 14$ TeV, with event generation and parton showering performed in Pythia 8 [Sjostrand:2014zea]. The detector simulation has been implemented by means of the Delphes 3 package [deFavereau:2013fsa] using its default ATLAS configuration card. In order to capture the collimated boosted top topology, jets are reconstructed via the anti-kT algorithm [Cacciari:2008gp] with $R = 0.8$, as embedded in the FastJet package [Cacciari:2011ma]. These jets are also required to have $p_T \in [550, 650]$ GeV and to be bounded by $|\eta| < 2$. Top jets are tagged via parton matching, using the jet radius as the boundary for the angular separation between the jet and the parton-level top quark. The dataset includes 1.2 million training events and 400,000 events each for separate test and validation samples.
The jet images are prepared and standardised by following the prescription given in refs. [Araz:2021wp, Araz:2021un], where the constituents of the leading jet, defined above, have been centred on a weighted centroid in the $\eta-\phi$ plane. All the modified constituents of the leading jets have been mapped into a pixelated frame in the $\eta-\phi$ plane spanning a fixed range. Finally, the most energetic image quadrant has been moved to the top right by horizontally and vertically flipping the image. All training samples have been standardised by fitting the pixel values within a fixed range over 200,000 mixed signal and background samples. Fig. 3 shows the standardised average of 5000 events for the top signal (left panel) and QCD background (middle panel). Additionally, the right panel presents a single top signal event. The image has been prepared by cropping twelve pixels from each side and downsampling it by averaging four pixels into one, reducing the image size without losing any vital information.
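The crop-and-average downsampling step can be sketched as follows; the image size and crop width below are illustrative only, not the exact dimensions used in the study:

```python
import numpy as np

def crop_and_pool(img, crop, pool=2):
    """Crop `crop` pixels from each side, then downsample by averaging
    each pool x pool patch into a single pixel."""
    img = img[crop:-crop, crop:-crop]
    h, w = img.shape
    return img.reshape(h // pool, pool, w // pool, pool).mean(axis=(1, 3))

rng = np.random.default_rng(3)
image = rng.uniform(0, 1, size=(40, 40))   # hypothetical calorimeter image
small = crop_and_pool(image, crop=12)      # 40 -> 16 -> 8 pixels per side
assert small.shape == (8, 8)
```

Averaging (rather than max-pooling) preserves the total transverse momentum in each patch up to a constant factor, which is why it discards little of the information relevant for tagging.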
In order to feed the data into the classical and quantum TNs, the data has to be processed a step further. Since each initial state in a quantum circuit is $|0\rangle$, one needs to prepare this initial state to represent the given data point. Whilst there are various ways of mapping the data onto a quantum circuit (see ref. [MLQC]), we chose to map it by rotating each state around the y-axis by the corresponding pixel value, using $R_y(\pi x)$ with $R_y$ defined via eqn. (5). Similarly, as shown in ref. [Araz:2021un], classical TNs require a $d$-dimensional mapping of the data to simulate the Hilbert space. Such a mapping can be performed by using

$\phi^{s_j}(x_j) = \sqrt{\binom{d-1}{s_j - 1}} \left( \cos \frac{\pi x_j}{2} \right)^{d - s_j} \left( \sin \frac{\pi x_j}{2} \right)^{s_j - 1} , \qquad s_j \in [1, d] , \qquad (8)$

which, for $d = 2$, reduces to $\phi(x_j) = \left( \cos \frac{\pi x_j}{2},\ \sin \frac{\pi x_j}{2} \right)$.
3.2 Network architecture & training
In this section, we will demonstrate a comprehensive numerical analysis and comparison between QTNs and TNs with different sized networks and architectures along with possible hybrid implementations. Our framework relies on TensorFlow (version 2.7.0) [tensorflow2015-whitepaper, DBLP:journals/corr/AbadiBCCDDDGIIK16] where quantum circuits are simulated using PennyLane (version 0.20.0) [bergholm2020pennylane] with Qiskit (version 0.32.1) backend [Qiskit] (PennyLane–Qiskit plugin version 0.20.0).
3.2.1 Classical vs. Quantum Tensor Networks
To compare the usage of the different architectures presented in the previous section, we prepared two sets of ansatz, one for four qubits and another for six qubits. Such a small feature space is owed to the limitations of currently accessible quantum hardware, here IBM's quantum hardware. Hence, to employ an extended feature space, we also provide a hybrid classical-quantum TN where the classical portion of the network is responsible for mapping the image onto a lower-dimensional manifold, which can then be deposited into the quantum hardware.
To limit the feature space for four and six qubits, the calorimeter image presented in Sec. 3.1 has been cropped from each side by fourteen pixels and then downsampled. We used the central four pixels for the four-qubit networks. For the six-qubit networks, we added the adjacent two top pixels of the image. As prescribed in Sec. 3.1, the modified transverse momentum in each of these pixels is then fitted within the standardisation range. Finally, the image has been reshaped to form a vector.
Figures 4, 5 and 6 show the four-qubit construction of classical TNs (on the left) and QTNs (on the right) for MPS, TTN and MERA, respectively. Each TN is required to have bond dimensions $\chi$ (shown via red lines), and the physical dimensions (shown via green lines) are set to $d$ via eqn. (8). Note that both the bond and physical dimensions of the TNs can be set to much larger values, but we limit them in order to compare with the QTN performance. The purple line in each TN shows the prediction output.
Each QTN has been constructed via a set of unitary blocks which include trainable parameters, where we limit each unitary block to $R_y(\theta) = U(\theta, 0, 0)$ by fixing $\phi$ and $\lambda$ to zero. As seen in figures 4 and 5, both the MPS and TTN networks possess six unitary transformations, giving each six trainable parameters. Q-MERA, shown in Fig. 6, on the other hand, possesses eight trainable parameters.
Similarly, figures 7 and 8 show the six-qubit configuration of TNs (on the left) and QTNs (on the right) for TTN and MERA, respectively. Due to its monotonic architecture, an image for MPS has not been included, but the six-qubit version can be visualised by adding two more qubits to Fig. 4 with the same pattern. Both Q-MPS and Q-TTN have nine trainable parameters in this configuration, whereas Q-MERA possesses seventeen trainable parameters due to its more complex structure.
For the training of TNs, we employed the Adam [Kingma2014AdamAM] optimisation algorithm. Although it has been noted in a previous study that training TNs with normalised gradients leads to much more stable tensor evolution [Araz:2021un], we employ standard gradient descent for the TN training. For the four-qubit sample, we used a batch size of 100 events; however, for the six-qubit configuration, this batch size led to an unstable training sequence and was hence reduced to 50 events per batch.
For QTNs, we employed Quantum Natural Gradient Descent (QNGD) [rebentrost2018quantum, qng2020], which improves the convergence speed in the training of variational quantum circuits [Blance:2020nhl]. Instead of directly updating the trainable parameters via their gradients for a given loss function, this algorithm solves a linear equation, $g \cdot v = \nabla_\Theta \mathcal{L}$, for $v$, where $g$ is the metric tensor of the given circuit and $\nabla_\Theta \mathcal{L}$ is the gradient tensor. The trainable parameters are then updated via $\Theta_{t+1} = \Theta_t - \eta\, v$, where $\Theta$ are the trainable parameters and $\eta$ is the learning rate. The training has been limited to 100 events per batch, and both the classical and quantum ansatz have been trained with the full training sample.
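The QNGD update amounts to solving a small linear system per step; a minimal sketch with a toy diagonal metric (all values arbitrary, and the regulariser is an assumption added for numerical stability):

```python
import numpy as np

def qng_step(theta, grad, metric, eta=0.1, reg=1e-6):
    """One quantum-natural-gradient update: solve g . v = grad(L) for v,
    then step theta -> theta - eta * v. `reg` regularises the solve."""
    v = np.linalg.solve(metric + reg * np.eye(len(theta)), grad)
    return theta - eta * v

theta = np.array([0.5, -0.2, 0.1])
grad = np.array([0.3, 0.1, -0.2])
metric = np.diag([1.0, 0.5, 0.25])   # toy metric tensor of the circuit
new = qng_step(theta, grad, metric)

# With an (almost) identity metric the step reduces to plain gradient descent.
plain = qng_step(theta, grad, np.eye(3))
assert np.allclose(plain, theta - 0.1 * grad, atol=1e-5)
```

Rescaling the gradient by the inverse metric makes the step size insensitive to how the circuit parametrisation stretches the loss landscape, which is the source of the faster convergence noted above.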
Each ansatz has been trained with the cross-entropy loss function of eqn. (4), where both TNs and QTNs are assumed to be Born machines; hence the probability $p$ is given in equations (3) and (7), respectively. During the training of each ansatz, we observed that the loss evolution of TNs converges to a minimum much more slowly than that of QTNs; hence TNs are trained for a larger number of epochs than QTNs. We required the training to be terminated if there was no improvement in the validation loss for a fixed number of iterations; as a result, the QTNs trained for fewer epochs, depending on the ansatz, whereas all TN ansatz ran for the entire allocation of epochs. We find that QTN training could be terminated early for both the four- and six-qubit configurations. In addition to the early-termination condition, we also required a periodic learning-rate decay, depending on the improvement of the validation loss value.
Fig. 9 shows receiver operating characteristic (ROC) curves for the four-qubit test results, where we only used 10,000 events from the test set due to hardware limitations. For the comparison with QTNs, the bond dimension of each TN ansatz has been chosen to be five, and the Hilbert space dimensions are set to two. Although each wire carries a two-dimensional vector, we observed that lower bond dimensions led to significantly worse results than the QTNs' for any given TN ansatz. The top panel in Fig. 9 shows the results executed on quantum hardware, and the bottom panel shows the quantum simulation execution with various noise models based on five different hardware configurations within IBM Quantum (the quantum simulations are based on Qiskit's AER package; the simulated noise models are based on the ibmq_belem, ibmq_bogota, ibmq_lima, ibmq_manila, and ibmq_quito processors). The results from each noise model have been visualised with shaded lines of the same colour for each ansatz. Each QTN has been executed 5000 times (so-called shots), where the final result is taken to be the mean of the 5000 executions. In both panels, TTN, MERA and MPS have been represented with red, green and blue lines, where QTNs are shown with solid and TNs with dashed lines. For the given four-qubit configuration, TTN, MPS and MERA have 90, 130 and 250 trainable parameters, whilst their quantum counterparts have 6, 6 and 8 trainable parameters, respectively. We have also investigated more generic gate configurations with non-zero $\phi$ and $\lambda$; however, we did not observe a significant improvement in the results. We observe that TNs and QTNs lead to similar results. In addition to the ROC curve, each ansatz has been presented with its area under the curve (AUC) value. For both types of ansatz, we observe minimal change between corresponding realisations. This shows that, for a small network configuration, a bond dimension of five is sufficient to represent an entanglement structure similar to that of the QTNs.
Similarly, Fig. 10 shows the ROC curves for the six-qubit configuration with the same colour scheme. Again, the top panel shows the results executed on quantum hardware, and the bottom panel shows the quantum-simulation results with five different noise models. For both quantum hardware and simulation, we observed that the QTN performance improved by around 17%. Although all QTN architectures lead to similar results for a relatively small feature space, we observed significant changes between different QTN realisations after adding two more qubits. While for the four-qubit configuration the maximum difference between the AUC values of the QTNs was 2.4%, this increased to 3.2% for the six-qubit configuration. Whilst a low bond dimension leads to comparable results between the TN and QTN architectures for the four-qubit configuration, we observed that once the network size is increased, TNs with such a low bond dimension were not able to reproduce the same performance as the QTNs. For this reason, Fig. 10 shows TN ansätze with larger bond and Hilbert space dimensions, with which we managed to match the QTN performance for the TTN, MPS and MERA architectures.
As a comparison, Table missing shows the AUC values and the number of trainable parameters for each TN ansatz with different bond and Hilbert space dimensions. We observed that, especially for MPS and MERA, increasing only the bond dimension between nodes does not necessarily improve the performance; on the contrary, it slows down the optimisation process by exponentially increasing the complexity of the loss landscape. Increasing the Hilbert space dimensions along with the bond dimensions has instead been shown to perform better with relatively fewer trainable parameters. Although we did not observe any overtraining for the presented values, both choices lead to considerable growth in the computational cost of the TN contraction, resulting in an inefficient network for ML applications.
In order to study the increase in the complexity of the optimisation landscape, we employed the Fisher information matrix presented in Sec. missing. The lower three panels of Fig. 11 show the eigenvalue distribution of the normalised Fisher matrix for TTN (left), MPS (middle) and MERA (right). The mean Fisher matrix has been calculated as the average over a thousand executions with randomly chosen inputs and parameters. The input values have been varied uniformly within a fixed range, while the trainable parameters were allowed to run over their full range for each ansatz. Each group shows a blue scatter plot for the QTNs and red for different configurations of the TNs. For each distribution, the y-axis has been broken to accommodate the jumps in the eigenvalue distribution without localising the other points. We observe that the eigenvalue distributions for the QTNs are relatively evenly spread in each case. Q-MPS shows the most uneven distribution among the QTN configurations, with more eigenvalues around zero. The eigenvalue distributions of the TNs, on the other hand, display the typical behaviour of a classical network, where the majority of the points are clustered very close to zero, followed by substantially large values.
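The averaging procedure described above can be sketched as a Monte-Carlo estimate. This is an illustrative implementation: `grad_log_prob`, the sampling ranges, and the draw count are placeholders standing in for the model's log-likelihood gradient and the unspecified ranges in the text.

```python
import numpy as np

def mean_fisher(grad_log_prob, n_params, n_samples=1000, seed=0):
    """Monte-Carlo estimate of the averaged Fisher information matrix
    F = E[(d log p/d theta)(d log p/d theta)^T] over randomly drawn inputs
    and parameters, normalised so that its trace equals the number of
    trainable parameters.  `grad_log_prob(x, theta)` is a hypothetical
    callback returning the log-likelihood gradient for one draw."""
    rng = np.random.default_rng(seed)
    F = np.zeros((n_params, n_params))
    for _ in range(n_samples):
        x = rng.uniform(0.0, np.pi, size=n_params)          # input draw
        theta = rng.uniform(0.0, 2 * np.pi, size=n_params)  # parameter draw
        g = grad_log_prob(x, theta)
        F += np.outer(g, g)
    F /= n_samples
    return n_params * F / np.trace(F)  # trace-normalised Fisher

# the spectra plotted in Fig. 11 would then be
# eigvals = np.linalg.eigvalsh(mean_fisher(grad_log_prob, n_params))
```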
The effect of the eigenvalue distribution is reflected in the normalised effective dimension distribution shown in the upper panel of Fig. 11, following the same colour scheme, where the configurations are distinguished by different line styles. Recall that the normalised effective dimension for each network realisation converges to 1 if a large enough sample is introduced during the training. We observe that Q-TTN obtains the largest effective dimension, whereas Q-MERA shows a prolonged convergence rate. The effect of the eigenvalue distribution can be observed directly in the effective dimension: due to the larger cluster around zero, Q-MPS shows the lowest effective dimension values.
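The link between the Fisher spectrum and the effective dimension can be made concrete with a small sketch. We assume here the standard Fisher-based definition of the effective dimension with kappa = n / (2 pi log n), and the simplifying approximation that the spectrum is constant over parameter space, so the volume integral reduces to a single evaluation; the actual computation in the text may differ.

```python
import numpy as np

def normalised_effective_dimension(fisher_eigvals, n):
    """Normalised effective dimension from the eigenvalues of the
    trace-normalised Fisher matrix at sample size n, under the
    constant-spectrum approximation: sqrt(det(1 + kappa F)) becomes
    exp(0.5 * sum log(1 + kappa * lambda_i))."""
    lam = np.asarray(fisher_eigvals, dtype=float)
    kappa = n / (2 * np.pi * np.log(n))
    d_eff = np.sum(np.log1p(kappa * lam)) / np.log(kappa)
    return d_eff / lam.size
```

A flat spectrum saturates the normalised effective dimension near 1, while a spectrum clustered at zero (as observed for Q-MPS and the classical TNs) yields a much smaller value.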
Whilst a larger TN bond dimension is essential for a more accurate representation of the data, we observe that the increase in the bond dimension directly affects the efficiency of gradient-based methods in training the system. The solid and dashed lines in Fig. 11 show the two TN configurations. We observe a significant decrease in the effective dimension's convergence rate for each architecture. Similarly, increasing the bond dimension results in a more uneven distribution of the eigenvalues of the Fisher matrix. Comparing these results with Table missing, increasing only the bond dimensions does not provide a sufficient increase in the performance of the network, while also sacrificing the trainability of the ansatz. Increasing the Hilbert space dimension along with the bond dimensions, on the other hand, has been shown to achieve significantly better performance. Table missing shows that by increasing only the Hilbert space dimensions, one can gain around an 8% increase in the AUC values. However, this severely reduced the effective dimension's convergence rate, leading to a flat optimisation landscape.
Despite the significant improvement in performance, hardware limitations in near-term quantum devices make it impossible to input a larger feature space into a quantum circuit. However, as shown before, since TNs can effectively represent a quantum many-body system, instead of manipulating the phase space to fit a small quantum circuit, TNs can be used as a data-processing layer whose output is then fed into a quantum circuit for classification. In the following section, such hybrid architectures will be investigated.
3.2.2 Hybrid Quantum – Classical Tensor Networks
This section introduces two main types of hybrid classical-quantum TN architectures. Each architecture is designed to have one classical layer, which processes a large image and reduces it to a four-qubit output. A quantum circuit then classifies the processed output from the TN. Since the calorimeter images used in this study impose two-dimensional correlations between pixels, we adjust each architecture to capture these correlations effectively. We have constrained the architectures to two main groups, namely one with a TTN and another with an MPS classical layer. Due to the computational complexity, we did not introduce a classical MERA layer to process the image within a hybrid ansatz (an application of MERA on a two-dimensional lattice can be found in ref. [PhysRevLett.100.240603]).
As mentioned before, TTNs can capture correlations beyond one-dimensional lattices. Concretely, with a more complex TTN tensor structure, it is possible to capture two-dimensional correlations between the pixels of an image [ttn2019]. To achieve this, we designed a TTN with two types of condenser tensors. The tensors connected directly to the image pixels are designed to pool four neighbouring pixels, capturing the correlations along both axes. The collection of such tensors can be interpreted as a trainable pooling layer, where each node takes four pixels and maps them to a vector on a higher-dimensional manifold; hence each of these nodes forms a rank-5 tensor. Subsequently, these nodes are connected strategically through rank-3 tensors until the desired dimensionality is reached. For the complete classical TTN, the dimensionality is reduced hierarchically until we reach the output dimension. For the hybrid realisation, it is reduced to a concatenated four-dimensional vector forming the four-qubit input. Fig. 12 shows the representation of these two architectures, where the network on the top shows the purely classical TN and the bottom panel shows the hybrid version of the architecture. This specific architecture aims to group the most related sets of tensors so as to pool the local correlation information before investigating a more global picture. As before, red and green lines represent auxiliary and physical dimensions. The grid represents the image shown in Fig. 3, where the black dots in the centre of each pixel represent the mapped pixel vectors defined in Eqs. (2) and (8). Purple lines on the bottom panel show the collection of the output values from the classical layer, where each node returns a one-dimensional vector which is then concatenated into a four-dimensional vector to be processed in the quantum circuit. As before, the embedding symbol indicates where the data enters the quantum circuit.
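The rank-5 pooling node described above amounts to a five-index contraction against the four pixel feature vectors of a 2x2 block. A minimal numpy sketch, with illustrative dimensions (physical dimension `p` and auxiliary bond `chi` are placeholders, not the values used in the paper):

```python
import numpy as np

def pool_block(T, v00, v01, v10, v11):
    """One 'condenser' node of the 2D TTN layer: a rank-5 tensor T of shape
    (p, p, p, p, chi) contracts the feature vectors of four neighbouring
    pixels into a single chi-dimensional vector on the auxiliary bond."""
    return np.einsum('i,j,k,l,ijklo->o', v00, v01, v10, v11, T)

def pooling_layer(T, grid):
    """Apply the same pooling tensor to every 2x2 block of a grid of pixel
    feature vectors, halving each spatial dimension (a trainable pooling
    layer in the sense described above)."""
    H, W, _ = grid.shape
    out = np.empty((H // 2, W // 2, T.shape[-1]))
    for r in range(0, H, 2):
        for c in range(0, W, 2):
            out[r // 2, c // 2] = pool_block(
                T, grid[r, c], grid[r, c + 1],
                grid[r + 1, c], grid[r + 1, c + 1])
    return out
```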
Due to its nature, the MPS structure is purely limited to one-dimensional lattices. However, as shown in ref. [Araz:2021un], the image can be reshaped so that locally correlated pixels remain close to each other. This is achieved by an s-shaped reshaping procedure along one of the image axes. Hence, for the classical MPS, we strictly follow the previously proposed procedure, where the pixels are reordered in the same s-shaped manner. For the hybrid architecture, on the other hand, the MPS chain has been divided into four blocks of nine nodes, each of which outputs a one-dimensional vector; these are then concatenated and input into a quantum circuit. Fig. 13 shows the representation of these architectures following the same colour scheme as before; the top panel represents the classical MPS, and the bottom panel shows the hybrid architecture.
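The s-shaped reordering is a simple row-reversal scheme; a sketch (illustrative, independent of which axis the paper snakes along):

```python
import numpy as np

def snake_order(image):
    """S-shaped ('boustrophedon') flattening: every other row is reversed
    before concatenation, so that pixels adjacent in the image stay close
    to each other along the resulting one-dimensional MPS chain."""
    rows = [row if i % 2 == 0 else row[::-1]
            for i, row in enumerate(np.asarray(image))]
    return np.concatenate(rows)
```

A naive row-major flattening would instead place the last pixel of one row next to the first pixel of the next, breaking local correlations at every row boundary.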
Such a hybrid architecture poses the question of how to train the network. Although SGD has proven to be a highly effective training method, it has been repeatedly shown that the QNGD method can achieve much faster convergence for a quantum circuit. Hence, we used a mixed optimisation algorithm, in which the classical portion is trained with the Adam algorithm and the quantum circuit is optimised via QNGD, each with its own initial learning rate. Both learning rates are decayed by the same factor simultaneously: if the validation loss did not improve for 25 epochs, the learning rate was reduced by a factor of 0.5. All ansätze are trained on the complete training sample with a batch size of 100 events.
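The quantum half of this mixed scheme can be summarised by the natural-gradient update rule. The sketch below shows only the parameter update, with the Fubini-Study metric passed in as a matrix; how that metric is estimated on hardware (e.g. via block-diagonal approximations) is beyond this illustration.

```python
import numpy as np

def natural_gradient_step(theta, grad, metric, lr):
    """One quantum-natural-gradient update: the loss gradient is
    preconditioned with the pseudo-inverse of the Fubini-Study metric
    tensor g, i.e. theta <- theta - lr * pinv(g) @ grad.  When g is the
    identity this reduces to ordinary gradient descent."""
    theta = np.asarray(theta, dtype=float)
    grad = np.asarray(grad, dtype=float)
    return theta - lr * np.linalg.pinv(metric) @ grad
```

In the mixed optimiser, the classical TN parameters follow an Adam update while the circuit parameters follow this metric-aware step, with both learning rates decayed in lockstep as described above.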
Fig. 14 presents the results of various realisations of the hybrid TN architectures compared to their purely classical counterparts, where the top and bottom panels show results generated with two different bond and Hilbert space dimension settings. In both panels, the dashed red and blue curves represent the purely classical TTN and MPS realisations, and the solid curves represent hybrid ansätze with different QTNs. The architectures with a TTN layer are shown in red, orange and cyan, representing the networks with Q-TTN, Q-MPS and Q-MERA; the blue, purple and green curves, on the other hand, show the MPS layer with Q-MPS, Q-TTN and Q-MERA, respectively. Due to the effective two-dimensional representation in the TTN, we observe a 7% increase in performance compared to MPS for the smaller configuration. This performance increase mainly originates in the high-efficiency regime, whereas MPS performs better in the low-efficiency regime. MPS and TTN possess 1730 and 1645 parameters in this configuration, respectively. However, due to the growth in network complexity, this changes for the larger configuration, where both networks consist of considerably more parameters. As before, the TN's optimisation capability degrades with the complexity of the network, where gradient-based methods are not sufficient to train the network efficiently. Hence, although the TTN architecture can interpret 2D objects more efficiently than MPS, it performs worse by 3% due to the complexity of the network.
We implemented a hybrid test for each classical TN layer using all of the four-qubit QTNs mentioned in the previous sections. As shown by the solid lines, each hybrid realisation performs better than its classical version. However, this improvement is confined to the high-efficiency regime for each configuration. This might be due to the sensitivity of the networks to subtle information in the image: the QCD background is mainly concentrated in a few pixels, and hence the information can quickly vanish in a complex network structure if its impact is not significant enough. Whilst for the smaller configuration we do not observe a large difference in performance between the hybrid models, the difference grows for the larger configuration. Since we used the same number of qubits with the same QTNs, we conclude that this is due to the increase in the mapping capability of the classical layer. For each configuration, the TTN layer has been observed to have a larger impact on the correct classification of the data; however, for the larger configuration, the hybrid ansätze with the MPS layer achieve results much closer to those with the TTN layer.
Tensor Networks are algebraic tools to represent high-rank tensors effectively. They have been widely studied to represent complex quantum many-body systems and capture their entanglement properties. Multimodal data can be expressed as a quantum system, and TNs can effectively represent correlations within the data structure. Moreover, due to the ability to describe quantum states, TNs are the ideal machinery to study quantum machine learning, where the expertise on “classical” methods built for TNs can directly be applied to quantum hardware.
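The sense in which TNs "represent high-rank tensors effectively" can be made concrete with the standard sequential-SVD factorisation of a tensor into an MPS. This is a generic textbook construction, not the paper's pipeline; `max_bond` plays the role of the auxiliary (bond) dimension discussed throughout.

```python
import numpy as np

def mps_decompose(tensor, max_bond=None):
    """Factorise a rank-n tensor into a matrix product state by sequential
    SVDs, optionally truncating each auxiliary bond to `max_bond` singular
    values.  The parameter count then grows linearly with n instead of
    exponentially, which is the efficiency gain described above."""
    dims = tensor.shape
    cores, rest, bond = [], tensor.reshape(1, -1), 1
    for d in dims[:-1]:
        U, S, Vt = np.linalg.svd(rest.reshape(bond * d, -1),
                                 full_matrices=False)
        chi = len(S) if max_bond is None else min(len(S), max_bond)
        cores.append(U[:, :chi].reshape(bond, d, chi))   # rank-3 MPS core
        rest, bond = S[:chi, None] * Vt[:chi], chi       # carry the remainder
    cores.append(rest.reshape(bond, dims[-1], 1))
    return cores

def mps_contract(cores):
    """Recontract the MPS back into the full tensor (for verification)."""
    out = cores[0]
    for c in cores[1:]:
        out = np.tensordot(out, c, axes=([-1], [0]))
    return out.reshape(out.shape[1:-1])
```

Without truncation the decomposition is exact; truncating the bonds trades accuracy for the compactness that makes TNs practical as ML layers.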
In this study, we explored the possibility of classifying HEP data with TN-inspired quantum circuits. We mainly focused on three widely studied TN architectures, TTN, MPS and MERA, and compared their performance to the corresponding quantum circuits. We have shown that, although classical TNs are very successful in representing complex data structures, they require large auxiliary and Hilbert space dimensions to capture the natural entanglement capabilities of a quantum system. Hence, to achieve the same performance as QTNs, TNs must be executed with a much higher computational cost and more trainable parameters. The Fisher information matrix provides a Riemannian metric to measure the flatness of the optimisation landscape, and the effective dimension based on it relates to the number of samples required to represent the statistical model well. Whilst increasing the network's dimensionality helped improve the performance, using the Fisher information we observed that this resulted in exponentially suppressed gradients, rendering gradient-based methods unable to train classical TNs efficiently. Additionally, using the effective dimension, we showed that TNs require exponentially more data to achieve a sufficient representation of the data as the auxiliary and Hilbert space dimensions increase. Thus, we find that QTNs can perform significantly better than TNs with a fraction of the trainable parameters.
Despite the undeniable success of QTNs, they are still constrained to a low number of qubits due to the hardware limitations of near-term quantum devices, which prevents them from learning the entire dataset. To surpass this limitation, we additionally proposed a hybrid, end-to-end trained architecture in which a larger phase space of data is processed with classical TNs and then passed to a QTN for classification. Due to the nature of TNs, this allows a flexible architecture: more classical nodes can be transformed into circuit inputs once more qubits are available. Finally, we compared the purely classical TNs with the hybrid architectures and showed that the hybrid networks perform much better than strictly classical TNs.
In this study, we have limited our QTN architecture to a modest size, where each circuit block acts on two input states. Although such a simple architecture already showed the quantum advantage over classical networks, these blocks can be extended to transform more qubits. Additionally, the entanglement between blocks can be enhanced by introducing auxiliary qubits to increase the correlations between circuit blocks, which can significantly improve the expressivity of the quantum network. Finally, the hybrid architectures presented in this study were highly simplistic; since specific geometric properties can be embedded into a TN architecture, much more complex structures can be employed using the known symmetries within the data.
We acknowledge the use of IBM Quantum services for this work. The views expressed are those of the authors and do not reflect the official policy or position of IBM or the IBM Quantum team. In this paper, we used ibmq_quito and ibm_perth, which are IBM Quantum Falcon processors. We thank Vishal S. Ngairangbam and Josh Izaac for very helpful discussions.