The resemblance between the methods used in studying quantum many-body physics and in machine learning has drawn considerable attention. In particular, tensor networks (TNs) and deep learning architectures bear striking similarities, to the extent that TNs can be used for machine learning. Previous results used one-dimensional TNs for image recognition, showing limited scalability and requiring a high bond dimension. In this work, we train two-dimensional hierarchical TNs to solve image recognition problems, using a training algorithm derived from the multiscale entanglement renormalization ansatz (MERA). This approach overcomes scalability issues and suggests novel mathematical connections among quantum many-body physics, quantum information theory, and machine learning. By keeping the TN unitary during training, TN states can be defined that optimally encode each class of images into a quantum many-body state. We study the quantum features of the TN states, including quantum entanglement and fidelity, and suggest that these quantities could be novel properties characterizing the image classes, as well as the machine learning tasks. Our work could further be applied to identifying possible quantum properties of certain artificial intelligence methods.
To verify the representation power of the TTN, we use the CIFAR-10 dataset, which consists of 10 classes, with 50,000 RGB images in the training set and 10,000 images in the testing set. Each RGB image is originally 32x32 pixels. We transformed the images to gray-scale to reduce the complexity of the training, a reasonable trade-off between discarding information and lowering the training cost.
Figs. 2(a) and (b) show the relation between the representation power (learnability, or model complexity) and the bond dimensions of the TTN. The TTN gives a mapping that optimally projects the vectorized images from the exponentially large input space to the low-dimensional label space. Thus, from the perspective of tensor algebra, the limit of the representation power depends on the input bond dimension. On the other hand, the TTN can be considered an approximation of such an exponentially large mapping, written as a contraction of small tensors. The dummy indexes that are contracted inside the TTN are called virtual bonds; their dimensions determine how closely the TTN can approach this limit.
The sequence of convolutional and pooling layers in the feature extraction part of a deep learning network is known to arrive at higher and higher levels of abstraction that help separate the classes in a discriminative learner. This is often visualized by embedding the representation in two dimensions with t-SNE and coloring the instances according to their classes. If the classes separate clearly in this embedding, the subsequent classifier can perform the classification at high accuracy with ease. We plotted this embedding for each layer of the TTN in Fig. 4. We observe the same pattern as in deep learning, with a clear separation at the highest level of abstraction.
Training accuracy (%): 96, 97, 96, 94, 96, 94, 97, 94, 93, 94, 95
Testing accuracy (%): 97, 97, 95, 93, 95, 95, 96, 94, 93, 93, 92
Furthermore, to test the generalization power of TTNs, we used the MNIST dataset, which is widely used in handwritten-digit recognition. The training set consists of 60,000 gray-scale images of 28x28 pixels, with 10,000 testing examples. For simplicity of encoding, we rescaled them to 16x16 images so that the TTN can be built with four layers.
With the increase of the bond dimensions (both input and virtual), we find an apparent rise of the training accuracy, as shown in Fig. 3. At the same time, we observed a decline of the testing accuracy. Increasing the bond dimension sharply increases the number of parameters and, as a result, gives rise to overfitting and lowers the generalization performance, mirroring the theoretical principles of statistical learning. Therefore, one must take care to find the optimal bond dimension; we can think of it as a hyperparameter controlling model complexity. Balancing efficiency against overfitting, we use the minimal bond dimensions (Table 1) that reach a high training accuracy. Our results indicate that only small bond dimensions are needed.
With the trained TTN, normalized TTN states can be defined, one for each of the P label classes. In this work, we use the following strategy to obtain the TTN states for P-class classification. Taking the p-th image class as an example, we first relabel the training samples as "yes" or "no". Then we train the TTN as a binary classifier. Finally, the TTN state of the p-th class is obtained by contracting the trained TTN with the "yes" label vector; we keep it normalized throughout the algorithm.
The fidelity between two normalized states |ψ⟩ and |φ⟩ is defined as F = |⟨ψ|φ⟩|. It measures the distance between the two quantum states in the Hilbert space. Fig. 5 shows the fidelity between each pair of TTN states trained from the MNIST dataset. One can see that the fidelity remains quite small in most cases, meaning the TTN states are almost orthonormal. Although the total dimension of the space of vectorized images is exponentially large, most of the relevant information gathers in a small corner spanned by these nearly orthonormal states.
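As a minimal numerical illustration, the fidelity is simply the absolute value of the inner product of two normalized state vectors. The sketch below uses random vectors as hypothetical stand-ins for the trained TTN states:

```python
import numpy as np

def fidelity(psi, phi):
    """Fidelity F = |<psi|phi>| between two normalized state vectors."""
    return abs(np.vdot(psi, phi))

# Random normalized vectors as stand-ins for two TTN class states.
rng = np.random.default_rng(0)
psi = rng.normal(size=16); psi /= np.linalg.norm(psi)
phi = rng.normal(size=16); phi /= np.linalg.norm(phi)

f = fidelity(psi, phi)
assert 0.0 <= f <= 1.0 + 1e-12          # fidelity is bounded by 1
assert abs(fidelity(psi, psi) - 1.0) < 1e-12  # a state has unit fidelity with itself
```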
In addition, the largest values of the fidelity appear between particular pairs of classes. We speculate that this is closely related to how the data instances are fed into and processed by the TTN. In our case, two image classes with similar shapes result in a larger fidelity, because the TTN essentially provides a real-space renormalization flow: the input vectors are arranged and renormalized layer by layer according to their spatial locations in the image, with each tensor renormalizing four nearest-neighboring vectors into one. Fidelity could potentially be used to build a network whose nodes are image classes and whose connection weights are the fidelities between the corresponding TTN states. This might provide a mathematical model of how different classes of images are associated with each other. We leave these questions for future investigations.
Another important concept in quantum mechanics is (bipartite) entanglement, a quantum version of correlations. It is one of the key features that distinguishes quantum states from classical ones. Entanglement is usually characterized by a normalized, positive-definite vector of Schmidt coefficients Λ, called the entanglement spectrum; its strength is measured by the entanglement entropy S = -Σ_a Λ_a² ln Λ_a². Fig. 5 shows the entanglement entropy of the TTN states trained on the MNIST dataset. We compute two kinds of entanglement entropy, obtained by cutting the images in the middle along the x and y directions as shown in Fig. 1; the results are marked up-down and left-right in Fig. 5. The first denotes the entanglement between the upper and lower parts of the images; the latter denotes the entanglement between the left and right parts. With the TTN, the entanglement spectrum is simply given by the singular values of the matrix obtained by reshaping the top tensor. This is because all the tensors in the TTN are orthogonal (isometric). Note that the top tensor has four indexes, each representing the effective space renormalized from one quarter of the vectorized image. Thus, the chosen bipartition determines how the four indexes are grouped into two larger indexes before the SVD is computed.
Two implications can be drawn from the entanglement entropy. Firstly, it is known from tensor network theory that the entanglement entropy reveals the virtual bond dimensions needed to reach a certain precision; in other words, the entanglement entropy characterizes the computational complexity of the classification with a TTN. Secondly, for a physical state with two subsystems, the entanglement entropy measures the amount of information about one subsystem that can be gained by measuring the other. Here, an important analogy holds between knowing a part of the image and measuring the corresponding subsystem of the quantum state. Thus, we suggest that in our image recognition task, the entanglement entropy characterizes how much information about one part of the image can be gained from knowing the rest: if we only know part of an image and want to predict the remainder from the trained TTN state, the entanglement entropy measures how accurately this can be done. Moreover, we find that the TTN states actually possess small entanglement, meaning that the TTN can efficiently capture and classify the images with relatively small virtual bond dimensions. Our results suggest that for the images of "0" and "4" it is, respectively, easiest and hardest to predict the missing part given the rest.
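The entropy computation described above (reshape the four-index top tensor according to the chosen cut, take the SVD, normalize the squared singular values) can be sketched as follows. The index ordering (upper-left, upper-right, lower-left, lower-right) is an assumed convention for illustration:

```python
import numpy as np

def entanglement_entropy(top, cut):
    """Bipartite entanglement entropy of a 4-index top tensor.
    Assumed index order: (upper-left, upper-right, lower-left, lower-right).
    cut='updown'    groups (ul, ur) | (ll, lr);
    cut='leftright' groups (ul, ll) | (ur, lr)."""
    if cut == "updown":
        m = top.reshape(top.shape[0] * top.shape[1], -1)
    elif cut == "leftright":
        t = np.transpose(top, (0, 2, 1, 3))  # bring the two left indexes together
        m = t.reshape(t.shape[0] * t.shape[1], -1)
    else:
        raise ValueError(cut)
    lam = np.linalg.svd(m, compute_uv=False)
    p = lam**2 / np.sum(lam**2)     # normalized entanglement spectrum
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log(p)))

# A product (unentangled) top tensor has zero entropy under both cuts.
u = np.outer([1.0, 0.0], [1.0, 0.0])
prod = np.einsum('ab,cd->abcd', u, u)
assert entanglement_entropy(prod, "updown") < 1e-10

# A maximally entangled up-down bipartition gives entropy ln(4).
bell = np.eye(4).reshape(2, 2, 2, 2) / 2.0
assert abs(entanglement_entropy(bell, "updown") - np.log(4)) < 1e-10
```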
We continued the forays into using tensor networks for machine learning, focusing on hierarchical, two-dimensional tree tensor networks that we found to be a natural fit for image recognition problems. This provides a scalable approach with high precision. We conclude with the following observations:
The limitation of representation power (learnability) of a TTN strongly depends on the input bond dimensions, and the virtual bond dimensions determine how well the TTN reaches this limitation.
A hierarchical tensor network exhibits the same increasing levels of abstraction as a deep convolutional neural network or a deep belief network.
Our scheme naturally connects classical images to quantum states, permitting the use of quantum properties (fidelity and entanglement) to characterize the classical data and the computational tasks.
Moreover, our work contributes towards the implementation of machine learning by quantum simulation and computation. Firstly, since we propose to encode image classes into TTN states, the proposed scheme could be realized, e.g., with quantum state tomography techniques. Secondly, arbitrary unitary gates can in principle be realized by so-called digital quantum simulators. Thanks to the unitary (isometric) conditions on the local tensors, this offers another possible route to realizing our proposal by quantum simulation.
Our approach to classifying image data begins by mapping each pixel value x to a d-component vector v(x). This feature map was introduced by Stoudenmire and Schwab and is defined as

v_j(x) = sqrt[ C(d-1, j-1) ] (cos(πx/2))^(d-j) (sin(πx/2))^(j-1),  (1)

where j runs from 1 to d and C(·,·) denotes the binomial coefficient. By using a larger d, the TTN has the potential to approximate a richer class of functions. With this nonlinear feature map, a gray-scale image is projected from scalar space to a d-dimensional vector space, where the image is represented as a direct product state of local d-dimensional vectors; the coefficients of the local vector at each site are given by the feature map [Eq. (1)] applied to the corresponding pixel.
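A NumPy sketch of this pixel feature map. The binomial-coefficient form is the standard Stoudenmire-Schwab map, which the text references; it yields a normalized d-component vector for any pixel value in [0, 1]:

```python
import numpy as np
from math import comb

def feature_map(x, d=2):
    """Map a pixel value x in [0, 1] to a normalized d-component vector,
    v_j(x) = sqrt(C(d-1, j)) * cos(pi*x/2)^(d-1-j) * sin(pi*x/2)^j,
    for j = 0, ..., d-1 (0-based indexing of Eq. (1))."""
    c, s = np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)
    return np.array([np.sqrt(comb(d - 1, j)) * c**(d - 1 - j) * s**j
                     for j in range(d)])

# The map is normalized: |v|^2 = (cos^2 + sin^2)^(d-1) = 1.
assert abs(np.linalg.norm(feature_map(0.3, d=4)) - 1.0) < 1e-12
# A black pixel (x = 0) maps to the first basis vector for d = 2.
assert np.allclose(feature_map(0.0, d=2), [1.0, 0.0])
```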
The classifier Ψ can be written as a hierarchical TN of multiple layers (see Fig. 1 for an example), whose coefficients are given by contracting the tensors of all layers, where each layer contains a certain number of tensors. The output for classifying the i-th sample is a low-dimensional vector obtained by contracting the vectorized image (denoted Φ_i for the i-th sample) with the TTN, which reads as

L̃_i = Ψ Φ_i,  (3)

where L̃_i acts as the predicted label corresponding to the i-th sample. Based on this, we derive a highly efficient training algorithm inspired by MERA. We choose the cost function to be minimized as the square error,

f = Σ_i |Ψ Φ_i - L_i|²,

with L_i the true label vector of the i-th sample.
To proceed, let us rewrite the cost function in the following form:

f = Σ_i ( Φ_i† Ψ† Ψ Φ_i - 2 Re(L_i† Ψ Φ_i) + |L_i|² ).

The third term comes from the normalization of L_i, and we assume the second term is always real.
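The expansion of the squared error into these three terms can be verified numerically. The sketch below uses a hypothetical 16-dimensional vectorized image and a two-component label as stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
Psi = rng.normal(size=(2, 16))   # hypothetical classifier map (label x input)
Phi = rng.normal(size=16)        # hypothetical vectorized image
L = np.array([1.0, 0.0])         # normalized label vector, so |L|^2 = 1

# Left-hand side: the square error |Psi Phi - L|^2.
lhs = np.linalg.norm(Psi @ Phi - L)**2

# Right-hand side: the three-term expansion from the text.
out = Psi @ Phi
rhs = out @ out - 2 * np.real(L @ out) + L @ L
assert abs(lhs - rhs) < 1e-10
```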
The dominant computational cost comes from the first term. We borrow the idea of the MERA approach to reduce this cost. Mathematically speaking, the central idea is to impose that Ψ is orthogonal, i.e., Ψ†Ψ = I. Then Ψ is optimized with this condition satisfied in the subspace relevant to the classification: we do not require Ψ†Ψ to be the identity on the whole input space, but only on the subspace spanned by the training samples.
In MERA, a stronger constraint is used. In the TTN, each tensor has one upward and four downward indexes, which gives a non-square orthogonal matrix when the downward indexes are grouped into a single large one. Such tensors are called isometries and satisfy T†T = I after contracting all downward indexes with the conjugate tensor. When all the tensors are isometries, the TTN gives a unitary transformation that compresses the exponentially large input space to the small output space.
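The isometry condition can be checked numerically. Below, a random isometric tensor with one upward and four downward indexes is built via a QR decomposition; the bond dimensions are hypothetical values for illustration:

```python
import numpy as np

# Hypothetical bond dimensions: four downward bonds of dimension b,
# one upward bond of dimension chi (chi <= b**4 for an isometry).
b, chi = 3, 5
rng = np.random.default_rng(2)

# QR on the grouped downward index yields orthonormal columns.
q, _ = np.linalg.qr(rng.normal(size=(b**4, chi)))
T = q.reshape(b, b, b, b, chi)

# Isometry condition: contracting the four downward indexes with the
# conjugate tensor gives the identity on the upward index.
ident = np.einsum('abcdi,abcdj->ij', T, T.conj())
assert np.allclose(ident, np.eye(chi))
```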
In this way, the first term becomes a constant, and we only need to deal with the second term. Up to an additive constant, the cost function becomes

f = - Σ_i Re(L_i† Ψ Φ_i).  (6)

Each term in f is simply the contraction of a tensor network, which can be computed efficiently.
The tensors in the TTN are updated alternately to minimize Eq. (6). To update a given tensor T, we fix all other tensors and define its environment tensor E, which is calculated by contracting everything in Eq. (6) after removing T (Fig. 1). The cost function then becomes f = -Tr(T E). Under the constraint that T is an isometry, the optimal solution is T = V U†, where U and V are calculated from the singular value decomposition E = U Λ V†. At this point, we have f = -Σ_a Λ_a.
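This SVD-based update can be sketched in matrix form. The environment is flattened to a matrix; the shapes and sign conventions below are assumptions for illustration:

```python
import numpy as np

def optimal_isometry(env):
    """Given an environment matrix E (chi x D), return the isometry T (D x chi)
    maximizing Re Tr(E @ T) subject to T^dagger T = I, via the SVD E = U S V^dagger."""
    u, s, vh = np.linalg.svd(env, full_matrices=False)
    return vh.conj().T @ u.conj().T, s

rng = np.random.default_rng(3)
E = rng.normal(size=(4, 20))       # hypothetical flattened environment
T, s = optimal_isometry(E)

assert np.allclose(T.conj().T @ T, np.eye(4))       # T is an isometry
# At the optimum, Tr(E T) equals the sum of the singular values,
# so the cost -Tr(E T) equals minus that sum, as stated in the text.
assert abs(np.trace(E @ T) - s.sum()) < 1e-10
```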
The update of each tensor thus reduces to calculating its environment tensor and taking a singular value decomposition. In the alternating process of updating all the tensors, some tricks are used to accelerate the computation: intermediate contraction results are cached to avoid repeated calculations, taking advantage of the tree structure. Another important detail is to normalize the vector obtained each time four vectors are contracted with a tensor.
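The basic renormalization step with the normalization mentioned above can be sketched as follows (tensor shapes are hypothetical):

```python
import numpy as np

def coarse_grain(T, v1, v2, v3, v4):
    """One renormalization step: a tensor with four downward indexes and one
    upward index contracts four nearest-neighbor vectors into one vector,
    which is then normalized as the text prescribes."""
    out = np.einsum('abcdi,a,b,c,d->i', T, v1, v2, v3, v4)
    return out / np.linalg.norm(out)

rng = np.random.default_rng(0)
T = rng.normal(size=(2, 2, 2, 2, 3))          # hypothetical TTN tensor
vs = [rng.normal(size=2) for _ in range(4)]   # four neighboring input vectors
w = coarse_grain(T, *vs)
assert abs(np.linalg.norm(w) - 1.0) < 1e-12   # output vector is normalized
```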
Both the time and space complexity of the algorithm are determined by the dimension of the input vectors, the dimension of the virtual bonds, the dimension of the input bonds, and the number of training inputs.
The strategy for building a multi-class classifier is the one-against-all scheme from machine learning. For each class, we train one TTN so that it recognizes whether an image belongs to that class or not; the output of Eq. (3) is a two-dimensional vector, and we fix the label vector for a "yes" answer. For P image classes, we accordingly have P trained TTNs. To recognize a given sample, we introduce a P-dimensional vector whose p-th element is the inner product between the "yes" label vector and the output of the p-th TTN contracted with the vectorized image. The position of its maximal element gives the class to which the image belongs.
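The one-against-all decision rule can be sketched as follows. The convention that index 0 of each binary output is the "yes" component is hypothetical:

```python
import numpy as np

def predict(ttn_outputs):
    """ttn_outputs: (P, 2) array where row p is the two-component output of
    the p-th binary TTN classifier. Take the 'yes' component (assumed to be
    index 0) of each row and return the class with the largest one."""
    yes_scores = np.asarray(ttn_outputs)[:, 0]
    return int(np.argmax(yes_scores))

# Three hypothetical binary classifier outputs; class 1 has the
# strongest 'yes' response.
scores = np.array([[0.2, 0.8],
                   [0.9, 0.1],
                   [0.4, 0.6]])
assert predict(scores) == 1
```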
SJR is grateful to Ivan Glasser and Nicola Pancotti for stimulating discussions. DL was supported by the China Scholarship Council (201609345008), the National Natural Science Key Foundation of China (61433015), and the National Natural Science Foundation of China (61771340). SJR, PW, and ML acknowledge support from the Spanish Ministry of Economy and Competitiveness (Severo Ochoa Programme for Centres of Excellence in R&D SEV-2015-0522), Fundació Privada Cellex, and the Generalitat de Catalunya CERCA Programme. SJR and ML were further supported by ERC AdG OSYRIS (ERC-2013-AdG Grant No. 339106), the Spanish MINECO grants FOQUS (FIS2013-46768-P) and FISICATEAMO (FIS2016-79508-P), Catalan AGAUR SGR 874, EU FETPRO QUIC, EQuaM (FP7/2007-2013 Grant No. 323714), and the Fundació Catalunya - La Pedrera Ignacio Cirac Program Chair. PW acknowledges financial support from the ERC (Consolidator Grant QITBOX), QIBEQI (FIS2016-80773-P), and a hardware donation by Nvidia Corporation. GS and CP were supported by the MOST of China (Grant No. 2013CB933401), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB07010100), and the NSFC (Grant No. 14474279). CP thanks ICFO (Spain) for its hospitality during her visit and is grateful for financial support from UCAS and ICFO.