In parallel, there has been remarkable progress in quantum computing towards solving classically intractable problems with computationally cheaper techniques. A major leap forward came when Shor 10.1137/S0097539795293172; 10.1109/SFCS.1994.365700 proposed his famous algorithm for factoring integers into primes in polynomial time, which exposed the vulnerabilities of security protocols such as RSA. Subsequent research has aimed at developing polynomial-time alternatives to classical algorithms using the core ideas of quantum superposition and entanglement. We briefly describe these ideas when reviewing the basic principles of quantum computing.
Quantum computing naturally lends its ideas to the domain of machine learning, and consequently there has been active research on using principles of quantum computing to improve the representation power and computational efficiency of classical ML approaches. Quantum extensions to classical ML problems have gained prominence in recent times, such as clustering lloyd2013quantum; NIPS2019_8667; otterbach2017unsupervised; Rebentrost_2014, gradient descent for linear systems Kerenidis_2020; Lloyd_2014; Amin_2018; wilson2018quantum, recommendation systems kerenidis2016quantum, the EM algorithm for Gaussian Mixture Models kerenidis2019quantum, variational generators for adversarial learning romero2019variational, etc.
The perceptron represents a single neuron and forms the basic unit of deep learning architectures. The idea of a quantum perceptron was first proposed by 10.1016/0020-0255(94)00095-S in 1995 and has since been formalized in multiple works GUPTA2001355; NIPS2003_2363; journals/qip/SchuldSP14; Wan_2017; cao2017quantum; Daskin_2018; Farhi2018ClassificationWQ; shao2018quantum; Beer2020quantum. Recently, wiebe2014quantum showed that quantum computing can provide a more comprehensive framework for deep learning than classical computing and can help optimize the underlying objective function.
In this work, we summarise the different ideas presented in the domain of quantum deep learning, which include quantum analogues to classical deep learning networks and quantum-inspired classical deep learning algorithms. We present the different schemes proposed to model quantum neural networks (QNNs) and their corresponding variants like quantum convolutional networks (QCNNs).
This work is structured as follows: we first review the basics of classical deep learning and quantum computing in Sections 2 and 3, for the benefit of an uninitiated reader. In Section 4, we provide a detailed overview of Quantum Neural Networks as formulated in several works, by examining their individual components analogous to a classical NN. We also briefly summarize several variants of QNNs and their proposed practical implementations. In Section 5, we review works that develop quantum analogues to classical convolutional and recurrent neural networks (CNNs and RNNs). In Section 6, we mention several classical deep learning algorithms which have been inspired by quantum methods, including applications to natural language processing.
2 Basic Principles of Classical Deep Learning
Neural networks represent a subset of machine learning methods, which try to mimic the structure of the human brain in order to learn. Neurons are the fundamental computational units in neural networks. Each neuron performs a sequence of computations on the inputs it receives to give an output. Most commonly, this computation is a linear combination of the inputs followed by a non-linear operation, i.e. the output is $y = \sigma\left(\sum_{i=1}^{n} w_i x_i\right)$, where $x_1, \ldots, x_n$ are the inputs to the neuron. The weights $w_1, \ldots, w_n$ are the parameters of the neuron, and $\sigma(\cdot)$ is a non-linear activation function, for example the sigmoid $\sigma(z) = 1/(1 + e^{-z})$ or the ReLU $\sigma(z) = \max(0, z)$.
Neural network architectures are constructed by stacking neurons. In fully-connected feedforward neural networks, the output of each neuron in the previous layer is fed to each neuron in the next layer. The simplest neural network is the fully-connected network with one hidden layer (Figure 1). Let the input $x$ be $d$-dimensional and the output $y$ be $k$-dimensional. Then, a NN with a single hidden layer of $h$ units performs the following computation:
$$y = W_2\, \sigma(W_1 x)$$
$W_1$ and $W_2$ are weight matrices of dimensions $h \times d$ and $k \times h$ respectively. The non-linear function $\sigma$ is applied element-wise to its vector input. This can be generalized to a NN with $L$ hidden layers as:
$$y = W_{L+1}\, \sigma\big(W_L\, \sigma(\cdots \sigma(W_1 x) \cdots)\big)$$
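The forward pass above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from any cited work: the helper name `mlp_forward`, the choice of ReLU for $\sigma$, and the layer sizes are assumptions, and biases are omitted to match the equations.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU, one common choice for the non-linearity sigma."""
    return np.maximum(0.0, z)

def mlp_forward(x, weights, sigma=relu):
    """Compute y = W_{L+1} sigma(W_L ... sigma(W_1 x)) with biases omitted.

    weights: list [W_1, ..., W_{L+1}]; W_1 has shape (h_1, d), W_{L+1} has shape (k, h_L).
    """
    a = x
    for W in weights[:-1]:
        a = sigma(W @ a)          # hidden layer: linear map followed by the non-linearity
    return weights[-1] @ a        # output layer is linear

rng = np.random.default_rng(0)
d, h, k = 4, 8, 3                 # input, hidden and output sizes (illustrative)
W1 = rng.normal(size=(h, d))
W2 = rng.normal(size=(k, h))
x = rng.normal(size=d)
y = mlp_forward(x, [W1, W2])      # single hidden layer: y = W2 sigma(W1 x)
```

Passing a longer list of weight matrices gives the $L$-hidden-layer version without any change to the function.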
The universal approximation theorem citeulike:3561150; journals/nn/LeshnoLPS93 states that a neural network with a single hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy, given enough hidden units and a suitable (e.g. non-polynomial) activation function. However, deeper networks (with a greater number of hidden layers) have been found to learn more efficiently and generalize better than shallow networks NIPS2014_5422; AAAI1714849. Increased availability of data, greater complexity of tasks and the development of hardware resources such as GPUs have led to the use of deeper and deeper neural networks, hence the term ‘deep learning’.
Like most machine learning algorithms, tasks in deep learning are posed as Empirical Risk Minimization (ERM) problems. Fundamentally, the parameters $\theta$ are learned through gradient-based optimization methods that minimize a loss function $\mathcal{L}(\theta)$. The loss function is computed over the training data, and depends on the task at hand. Common loss functions include the 0/1 and cross-entropy losses for classification tasks. Backpropagation Rumelhart:1986we; lecun1989cnn uses the chain rule to offer a computationally efficient way of obtaining gradients in neural networks. Learning is known to be highly sensitive to the optimization algorithm kingma2014method; lequoc2011opti as well as to the initialization of the parameters glorot2010.
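To make the role of the chain rule concrete, here is a minimal NumPy sketch of backpropagation and plain gradient descent for a one-hidden-layer network. The squared loss, the tanh activation, the step size and the helper name `loss_and_grads` are illustrative assumptions; one backpropagated entry is checked against a central finite difference.

```python
import numpy as np

def loss_and_grads(W1, W2, x, y):
    """Squared loss 0.5 * ||W2 tanh(W1 x) - y||^2 and its gradients via the chain rule."""
    a = np.tanh(W1 @ x)                  # forward pass: hidden activations
    r = W2 @ a - y                       # residual of the network output
    loss = 0.5 * float(r @ r)
    # Backward pass: propagate dL/d(output) = r back through each layer.
    dW2 = np.outer(r, a)
    dz = (W2.T @ r) * (1.0 - a ** 2)     # tanh'(z) = 1 - tanh(z)^2
    dW1 = np.outer(dz, x)
    return loss, dW1, dW2

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(5, 3))
W2 = rng.normal(scale=0.5, size=(2, 5))
x, y = rng.normal(size=3), rng.normal(size=2)

# Sanity-check one backpropagated entry against a central finite difference.
loss0, dW1, _ = loss_and_grads(W1, W2, x, y)
eps = 1e-6
E = np.zeros_like(W1)
E[0, 0] = eps
numeric = (loss_and_grads(W1 + E, W2, x, y)[0] - loss_and_grads(W1 - E, W2, x, y)[0]) / (2 * eps)

# Plain gradient descent with step size eta.
eta = 0.1
for _ in range(200):
    _, dW1_t, dW2_t = loss_and_grads(W1, W2, x, y)
    W1 = W1 - eta * dW1_t
    W2 = W2 - eta * dW2_t
final_loss = loss_and_grads(W1, W2, x, y)[0]
```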
Complex Neural Architectures
The past decades of deep learning research have led to several breakthroughs such as convolutional neural networks fukushima:neocognitronbc; Lecun2000; KriSut12Imagenet (designed for learning hierarchical and translation-invariant features in images), recurrent neural networks 10.5555/104279.104293; hochreiter1997long (for sequential data such as time series and natural language), ResNets he2015deep (designed to combat the vanishing gradient problem in deep learning) and Transformers vaswani2017attention (the current state-of-the-art method in natural language processing).
Convolutional neural networks (CNNs) have revolutionized the field of computer vision since lecun1989cnn demonstrated how to use backpropagation to efficiently learn feature maps. They form the basis of most state-of-the-art methods in modern computer vision, and are widely deployed in applications including image processing, facial recognition and self-driving cars.
Classical CNNs are designed to learn hierarchical, translation-invariant features in structured image data through the use of convolutional and pooling layers. Convolutional layers consist of multiple convolutional filters, each of which computes an output feature map by applying the same learned weights to successive local subsections of the input. Pooling layers perform subsampling to reduce the dimensionality of the feature maps obtained from convolutional layers, most commonly by taking the maximum or mean of several nearby input values. A non-linear activation is usually applied to the output of the pooling layer.
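The convolution and pooling operations described above can be sketched directly in NumPy. This is a minimal single-channel illustration, not an efficient implementation; the toy image, the difference filter, and the helper names `conv2d` and `max_pool` are assumptions made for the example.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the operation deep learning libraries call convolution."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same kernel weights are applied to every local patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    H2, W2 = fmap.shape[0] // size, fmap.shape[1] // size
    windows = fmap[:H2 * size, :W2 * size].reshape(H2, size, W2, size)
    return windows.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)     # toy 6x6 "image"
kernel = np.array([[-1.0, 1.0]])                     # horizontal-difference filter (illustrative)
fmap = conv2d(image, kernel)                         # feature map of shape (6, 5)
out = np.maximum(0.0, max_pool(fmap))                # pooling, then a ReLU non-linearity
```

Because every row of the toy image increases by 1 from left to right, the difference filter responds with 1 everywhere, and pooling reduces the $6 \times 5$ map to $3 \times 2$.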
A typical CNN architecture for image classification consists of several successive blocks of convolutional, pooling and non-linear layers, followed by a fully connected layer (Figure 2). Convolutional filters learn different input patterns at different levels of abstraction, depending upon the depth of the layer. For image inputs, the initial layers of the CNN learn to recognize simple features such as edges. The features learnt by successive layers become increasingly complex and domain-specific, through a combination of features learnt in previous layers. CNNs are a powerful technique, and several papers have adapted their ideas to the quantum setting; we discuss these in Section 5.
Feedforward neural networks are constrained as they perform predefined computations on fixed-size inputs. Recurrent Neural Networks (RNNs) are designed to handle sequences of inputs, operating on one input at a time while retaining information about preceding inputs through a hidden state. For a sequential input $x = (x_1, x_2, \ldots, x_T)$, the simplest RNN performs the following computation:
$$h_t = f(h_{t-1}, x_t), \qquad y_t = g(h_t), \qquad t = 1, \ldots, T$$
$h_t$ and $y_t$ refer to the hidden state and output of the RNN at step $t$ of the sequence, $h_0$ is the initial hidden state, and $f$ and $g$ are functions to be learnt. RNNs can also be used to learn representations of sequence inputs for different downstream tasks, with the final hidden state $h_T$ as the embedding of the input $x$. Figure 3 shows the temporal unfolding of a simple RNN.
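A minimal NumPy sketch of the recurrence above, assuming the common "vanilla" choices $f(h, x) = \tanh(W_h h + W_x x)$ and $g(h) = W_y h$; these specific forms, the weight names and the sizes are illustrative assumptions, since the text leaves $f$ and $g$ generic.

```python
import numpy as np

def rnn_forward(xs, h0, W_h, W_x, W_y):
    """Run h_t = tanh(W_h h_{t-1} + W_x x_t) and y_t = W_y h_t over a whole sequence."""
    h = h0
    outputs = []
    for x_t in xs:                          # one input at a time
        h = np.tanh(W_h @ h + W_x @ x_t)    # f: update the hidden state
        outputs.append(W_y @ h)             # g: read out from the hidden state
    return np.array(outputs), h             # the final h can serve as a sequence embedding

rng = np.random.default_rng(2)
d_in, d_h, d_out, T = 3, 4, 2, 5
W_h = rng.normal(scale=0.5, size=(d_h, d_h))
W_x = rng.normal(scale=0.5, size=(d_h, d_in))
W_y = rng.normal(scale=0.5, size=(d_out, d_h))
xs = rng.normal(size=(T, d_in))
ys, h_T = rnn_forward(xs, np.zeros(d_h), W_h, W_x, W_y)
```

The same weights are reused at every step, which is what lets the network process sequences of any length.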
RNNs are trained using Backpropagation-through-time (BPTT) 58337, a temporal extension of the backpropagation algorithm. RNNs are versatile enough to be used for a wide variety of applications: sequential-input single-output (e.g. text classification), single-input sequential-output (e.g. image captioning) and sequential-input sequential-output (e.g. part-of-speech tagging, machine translation) tasks. Several innovations have improved the performance of the vanilla RNN described above, such as LSTM hochreiter1997long and GRU chung2014, bidirectional RNNs 650093, the attention mechanism bahdanau2014neural, the encoder-decoder architecture cho2014 and more.
3 Principles of Quantum Computing
The qubit is the basic unit of information in quantum computing. The power of quantum computing over classical computing derives from the phenomena of superposition and entanglement exhibited by qubits. Unlike a classical bit, which has a value of either 0 or 1, superposition allows a qubit to exist in a combination of the two states. In general, a qubit is represented as:
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$$
$|0\rangle$ and $|1\rangle$ represent the two computational basis states, and $\alpha$ and $\beta$ are the complex amplitudes corresponding to each, satisfying $|\alpha|^2 + |\beta|^2 = 1$. Observing a qubit causes a collapse into one of the basis states. The probability of each state being observed equals the squared magnitude of its amplitude, i.e. the probabilities of observing $|0\rangle$ and $|1\rangle$ are $|\alpha|^2$ and $|\beta|^2$ respectively. A qubit is physically realizable as a simple quantum system; for example, the two basis states may correspond to the horizontal and vertical polarization of a photon. Superposition allows quantum computing systems to potentially achieve exponential speedups over their classical counterparts on certain problems, since an operation on a superposition acts on all of its basis states at once.
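The measurement rule can be illustrated with a short NumPy simulation: the outcome probabilities are the squared magnitudes of the amplitudes, and repeating the measurement on many fresh copies recovers them as frequencies. The amplitudes below are arbitrary example values, not taken from the text.

```python
import numpy as np

# Illustrative amplitudes (not from the text): |psi> = alpha|0> + beta|1>.
alpha, beta = 1 / np.sqrt(3), 1j * np.sqrt(2 / 3)
state = np.array([alpha, beta])

# Born rule: P(0) = |alpha|^2 and P(1) = |beta|^2; they must sum to 1.
probs = np.abs(state) ** 2
assert np.isclose(probs.sum(), 1.0)

# Each observation collapses the qubit to 0 or 1; repeat on many fresh copies.
rng = np.random.default_rng(3)
shots = rng.choice([0, 1], size=10_000, p=probs)
freq_1 = shots.mean()   # empirical estimate of |beta|^2 = 2/3
```

Note that the complex phase of $\beta$ does not affect these probabilities; it matters only once gates make amplitudes interfere.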
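Entanglement refers to the phenomenon by which qubits exhibit correlations with one another. In general, a set of entangled qubits exists as a superposition of basis states. Observing one or more qubits among them causes a collapse of their states, and alters the original superposition to account for the observed values of the qubits. For example, consider a 2-qubit system in the general initial state:
$$|\psi\rangle = \alpha_{00}|00\rangle + \alpha_{01}|01\rangle + \alpha_{10}|10\rangle + \alpha_{11}|11\rangle, \qquad \sum_{i,j} |\alpha_{ij}|^2 = 1$$
Suppose a measurement of the first qubit yields a value of 0 (which occurs with probability $|\alpha_{00}|^2 + |\alpha_{01}|^2$). Then, $|\psi\rangle$ collapses into:
$$|\psi'\rangle = \frac{\alpha_{00}|00\rangle + \alpha_{01}|01\rangle}{\sqrt{|\alpha_{00}|^2 + |\alpha_{01}|^2}}$$
Note that the relative probabilities of the remaining possible states are conserved, after renormalizing to account for the state collapse of the observed qubit. When $|\psi\rangle$ is entangled (i.e. it cannot be written as a product of two single-qubit states), this measurement also changes the outcome probabilities of the second qubit.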
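The collapse rule can be checked numerically. The sketch below measures the first qubit of a 2-qubit state vector and renormalizes the surviving amplitudes; the specific entangled state and the helper names `measure_first_qubit` and `prob_second_is_one` are illustrative assumptions, with amplitudes ordered $|00\rangle, |01\rangle, |10\rangle, |11\rangle$.

```python
import numpy as np

def measure_first_qubit(state, outcome):
    """Return P(first qubit = outcome) and the renormalized post-measurement state.

    Amplitudes are ordered |00>, |01>, |10>, |11> (first qubit = left label).
    """
    amps = np.asarray(state).reshape(2, 2)      # amps[q1, q2]
    kept = np.zeros_like(amps)
    kept[outcome] = amps[outcome]               # drop branches that contradict the outcome
    prob = float(np.sum(np.abs(kept) ** 2))
    return prob, kept.reshape(4) / np.sqrt(prob)

def prob_second_is_one(state):
    """P(second qubit = 1) = |a_01|^2 + |a_11|^2."""
    return float(np.abs(state[1]) ** 2 + np.abs(state[3]) ** 2)

# Illustrative entangled state (not from the text): 0.6|00> + 0.48|10> + 0.64|11>.
psi = np.array([0.6, 0.0, 0.48, 0.64])
p1, post = measure_first_qubit(psi, 1)
```

Observing 1 on the first qubit (probability 0.64) leaves $0.6|10\rangle + 0.8|11\rangle$: the ratio between the two surviving probabilities is unchanged, yet the probability that the second qubit reads 1 jumps from 0.4096 to 0.64, which is the correlation entanglement refers to.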
In classical computing, two of the fundamental logic gates (AND and OR) perform irreversible computations, i.e. the original inputs cannot be recovered from the output. Quantum gates (which operate on qubits) are constrained to be reversible, and operate on the input state to yield an output of the same dimension. In general, quantum gates are represented by unitary matrices, which are square matrices whose inverse is their conjugate transpose, i.e. $U^{-1} = U^\dagger$.
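An $n$-qubit system exists as a superposition of $2^n$ basis states. Its state can be described by a $2^n$-dimensional vector containing the coefficients corresponding to each basis state. For example, the 2-qubit state above may be described by the 4-dimensional vector of its amplitudes, using the basis vectors $|00\rangle = (1,0,0,0)^T$, $|01\rangle = (0,1,0,0)^T$, $|10\rangle = (0,0,1,0)^T$ and $|11\rangle = (0,0,0,1)^T$.
Thus, an $n$-qubit quantum gate is a $2^n \times 2^n$ unitary matrix that acts on the state vector. Two common quantum gates are the Hadamard and CNOT gates. The Hadamard gate acts on a single qubit and maps the basis states $|0\rangle$ and $|1\rangle$ to $\frac{|0\rangle + |1\rangle}{\sqrt{2}}$ and $\frac{|0\rangle - |1\rangle}{\sqrt{2}}$ respectively. The CNOT gate acts on 2 qubits and maps $|a, b\rangle$ to $|a, a \oplus b\rangle$. In other words, the first bit is copied, and the second bit is flipped if the first bit is 1. The unitary matrices corresponding to the Hadamard and CNOT gates are:
$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad \mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$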
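As a quick numerical illustration, the following NumPy sketch builds the Hadamard and CNOT matrices, checks that they are unitary, and applies $H$ on the first qubit followed by CNOT to $|00\rangle$, producing the entangled Bell state $(|00\rangle + |11\rangle)/\sqrt{2}$. The helper `is_unitary` is introduced only for this example.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])
I2 = np.eye(2)

def is_unitary(U):
    """U is unitary iff U^dagger U = I."""
    return np.allclose(U.conj().T @ U, np.eye(U.shape[0]))

ket00 = np.array([1.0, 0.0, 0.0, 0.0])          # basis order |00>, |01>, |10>, |11>
# Hadamard on the first qubit (H kron I), then CNOT with the first qubit as control.
bell = CNOT @ np.kron(H, I2) @ ket00
# Reversibility: applying the conjugate transposes in reverse order recovers the input.
recovered = np.kron(H, I2).conj().T @ CNOT.conj().T @ bell
```

Multi-qubit gates acting on a subset of qubits are lifted to the full register with Kronecker products, as `np.kron(H, I2)` does here.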
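The Pauli matrices $\sigma_x$, $\sigma_y$ and $\sigma_z$ are a set of three $2 \times 2$ complex matrices which, together with the identity matrix $I$, form a basis for the real vector space of $2 \times 2$ Hermitian matrices:
$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$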
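Because $\mathrm{Tr}(PQ) = 2\delta_{PQ}$ for $P, Q \in \{I, \sigma_x, \sigma_y, \sigma_z\}$, the real coefficients of any $2 \times 2$ Hermitian matrix in this basis can be read off with a trace. A minimal NumPy sketch, where the example matrix and the helper name `pauli_coefficients` are illustrative:

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
BASIS = {"I": I, "X": X, "Y": Y, "Z": Z}

def pauli_coefficients(Hm):
    """Real coefficients c_P with Hm = sum_P c_P P, using Tr(P Q) = 2 * delta_PQ."""
    return {name: float(np.real(np.trace(P @ Hm))) / 2 for name, P in BASIS.items()}

Hm = np.array([[2.0, 1 - 1j], [1 + 1j, -1.0]])   # an arbitrary Hermitian matrix
coeffs = pauli_coefficients(Hm)                  # {"I": 0.5, "X": 1.0, "Y": 1.0, "Z": 1.5}
rebuilt = sum(c * BASIS[name] for name, c in coeffs.items())
```

The same idea extends to $n$ qubits with tensor products of Pauli matrices, which is the decomposition used to parameterize unitaries in Section 4.2.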
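For an $N$-dimensional Hilbert space, the density operator $\rho$ represents a mixed state and is defined as:
$$\rho = \sum_{i} p_i\, |\psi_i\rangle\langle\psi_i|$$
where the $|\psi_i\rangle$ are pure states (in the setting considered here, the computational basis states of the Hilbert space), the coefficients $p_i$ are non-negative probabilities that add up to 1, and $|\psi_i\rangle\langle\psi_i|$ is an outer product written in bra-ket notation. The expected value of a measurement described by an observable $M$ can be obtained from the density operator using the following formula:
$$\langle M \rangle = \mathrm{Tr}(\rho M)$$
where $\mathrm{Tr}$ denotes the trace of the matrix.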
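A small NumPy sketch of these two formulas: build $\rho$ as a probability-weighted sum of outer products and evaluate $\langle \sigma_z \rangle = \mathrm{Tr}(\rho\, \sigma_z)$. The particular mixture ($|0\rangle$ with probability 0.75, $|+\rangle$ with probability 0.25) is an arbitrary example.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)

# Mixture: |0> with probability 0.75 and |+> with probability 0.25 (illustrative).
probs = [0.75, 0.25]
states = [ket0, ket_plus]
rho = sum(p * np.outer(s, s.conj()) for p, s in zip(probs, states))

Z = np.array([[1.0, 0.0], [0.0, -1.0]])
expval_Z = np.real(np.trace(rho @ Z))   # <Z> = Tr(rho Z) = 0.75 * 1 + 0.25 * 0
```

A valid density matrix has unit trace and non-negative eigenvalues, which is easy to verify numerically.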
4 Quantum Neural Network
Multiple research works GUPTA2001355; NIPS2003_2363; journals/qip/SchuldSP14; Wan_2017; cao2017quantum; Daskin_2018; Farhi2018ClassificationWQ; shao2018quantum; Beer2020quantum have proposed formulations for a quantum neural network (QNN) as a quantum analogue to a perceptron. NIPS2003_2363 were among the earliest to propose a QNN, which was modelled using a quantum circuit gate whose weights were learned using quantum search and piecewise weight learning. Several of these papers share the high-level idea of formulating the QNN through reversible unitary transforms on the data and then learning them through an approach analogous to the backpropagation algorithm. In this section, we present an overview of a QNN for a regression/classification problem in the quantum setting by breaking it down into its components.
4.1 Representing the input
Inherently, classical neural network computations are irreversible, implying a unidirectional computation of the output given the input. When mathematically posed, a classical NN computes the output from the input: $x \mapsto y$. In contrast, quantum mechanics inherently depends on reversible transforms, and a quantum counterpart for transforming the inputs to outputs of a NN can be posed by adding an ancillary bit to the input to obtain the output: $(x, 0) \mapsto (x', y)$. Muthukrishnan99classicaland show that such an operation can always be represented through a permutation matrix. To make the input representation unitary, we represent the input component $x$ of the vector through a quantum state $|\psi\rangle$. An ancillary dummy qubit $|0\rangle$ can be added to $|\psi\rangle$ corresponding to the output $y$. The reversible transformation is thus rendered unitary in the quantum setting as $U|\psi, 0\rangle = |\psi', y\rangle$, where $|\psi'\rangle$ represents the transformed input qubits. For multi-class classification problems, when the output labels cannot be captured by a single qubit, one can allocate $\lceil \log_2 C \rceil$ output qubits to represent the label, where $C$ is the number of label classes.
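QNNs can take as input purely quantum data or a transformation of classical data into quantum states. When representing quantum data, $|\psi\rangle$ can be a superposition of the computational basis in the $2^n$-dimensional Hilbert space $\mathcal{H}^{\otimes n} = \mathcal{H} \otimes \cdots \otimes \mathcal{H}$, where $\mathcal{H}$ represents the 2-dimensional Hilbert space with basis $\{|0\rangle, |1\rangle\}$ and the basis for $\mathcal{H}^{\otimes n}$ is $\{|0, \ldots, 0\rangle, \ldots, |1, \ldots, 1\rangle\}$. Thus $|\psi\rangle$ can be denoted as $|\psi\rangle = \sum_{i=0}^{2^n - 1} \alpha_i |i\rangle$, where $\alpha_i$ represents the complex amplitude assigned to the computational basis state $|i\rangle$, with $\sum_i |\alpha_i|^2 = 1$.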
While exploiting truly quantum data is the eventual goal of developing QNN models, the majority of related works shift their focus to the immediate benefits derived from QNNs over classical data. Several popular strategies have been used to transform classical data $x \in \mathbb{R}^n$ into a quantum state represented through qubits. An easy strategy, popularly used by several QNN proposals Farhi2018ClassificationWQ, is to binarize each individual component $x_i$ of the input through a threshold, and then represent each binarized dimension as a corresponding qubit, resulting in $x$ being represented as a computational basis state in the Hilbert space $\mathcal{H}^{\otimes n}$. This approach leads to a high loss of the information contained in the data. To counter this, journals/corr/abs-1812-03089 suggest capturing a more fine-grained representation of $x$ as a superposition of computational basis states in $\mathcal{H}^{\otimes n}$. For example, let $|e_i\rangle$ denote the computational basis state with qubit 1 in the $i$-th position (and 0 elsewhere), for each dimension $i$. Then $x$ can be represented as the quantum state $|\psi\rangle = \sum_{i=1}^{n} \alpha_i |e_i\rangle$, where $\alpha_i = x_i / \|x\|_2$.
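The two classical-to-quantum encodings can be compared on a toy vector with the NumPy sketch below. The threshold of 0, the big-endian qubit ordering, the normalization by the Euclidean norm, and the helper names `basis_encoding` and `one_hot_amplitude_encoding` are illustrative assumptions rather than the exact constructions of the cited works.

```python
import numpy as np

def basis_encoding(x, threshold=0.0):
    """Binarize each component and return the matching computational basis state.

    Qubit i (leftmost = most significant bit) is |1> iff x[i] > threshold.
    """
    bits = (np.asarray(x) > threshold).astype(int)
    state = np.zeros(2 ** len(bits))
    state[int("".join(map(str, bits)), 2)] = 1.0
    return state

def one_hot_amplitude_encoding(x):
    """Superposition sum_i (x_i / ||x||) |e_i>, where |e_i> has a single 1 at qubit i."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    state = np.zeros(2 ** n)
    for i, xi in enumerate(x):
        state[1 << (n - 1 - i)] = xi     # index of |e_i>: only qubit i (from the left) is 1
    return state / np.linalg.norm(x)

x = np.array([0.3, -1.2, 2.0])
b = basis_encoding(x)                    # bits 1, 0, 1 -> the single basis state |101>
a = one_hot_amplitude_encoding(x)        # keeps the relative magnitudes and signs of x
```

The basis encoding maps every input with the same sign pattern to the same state, whereas the amplitude version preserves the ratios between components, which is the information-loss trade-off described above.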
In parallel work, some strategies have been proposed in the continuous-variable architecture journals/corr/abs-1806-06871, which encodes the input into quantum states through continuous degrees of freedom such as the amplitudes of electromagnetic fields. This approach avoids the information loss due to the discretization of continuous inputs, but at the cost of greater complexity of practical realization.
4.2 Modeling the Quantum Network
The quantum network has been most popularly modelled through learnable variational quantum circuits Torrontegui2018. A permutation matrix can be used to transform $|\psi, 0\rangle$ and is therefore the simplest technique for the QNN model. Mathematically, a square matrix $P$ is a permutation matrix if $PP^T = I$ and all entries of $P$ are either 0 or 1. However, the permutation matrices of size $N \times N$ form a discrete set of size $N!$, which restricts the richness of the representations they can capture. This transformation can be modelled more richly using unitary matrices, which are characterized by learnable free parameters. Any unitary matrix $U$ can be expressed as $U = e^{iH}$, where $H$ is a Hermitian matrix. Since every Hermitian matrix can be written as a linear combination of tensor products of the Pauli matrices ($\sigma_x, \sigma_y, \sigma_z$) and the identity matrix ($I$), the unitary matrix over $K$ bits can be written as
$$U = \exp\Big(i \sum_{j_1, \ldots, j_K \in \{0,1,2,3\}} \alpha_{j_1 \ldots j_K}\, \sigma_{j_1} \otimes \cdots \otimes \sigma_{j_K}\Big)$$
where $\sigma_0, \sigma_1, \sigma_2, \sigma_3$ denote $I, \sigma_x, \sigma_y, \sigma_z$ respectively, and $\alpha_{j_1 \ldots j_K} \in \mathbb{R}$ is a trainable free parameter. For notational brevity, we will denote a $K$-bit unitary as $U_K(\theta)$, where $\theta$ is the set of all free parameters $\{\alpha_{j_1 \ldots j_K}\}$. For our input representation we need an $(n+1)$-bit unitary matrix to transform $|\psi, 0\rangle$ to the output $|\psi', y\rangle$. Thus the simplest variant of a quantum neural network, analogous to a single perceptron in the classical setting, uses a single unitary matrix of dimension $2^{n+1} \times 2^{n+1}$ and can be denoted by
$$|\psi', y\rangle = U_{n+1}(\theta)\, |\psi, 0\rangle$$
To capture detailed patterns in the input, a quantum neural network may be a cascade of several variational circuits, similar to a classical deep neural network. A sequential cascade of $L$ unitary matrices may be denoted as follows (we skip writing the subscript $n+1$ for notational brevity, so subscripts now index layers):
$$U(\theta) = U_L(\theta_L)\, U_{L-1}(\theta_{L-1}) \cdots U_1(\theta_1)$$
where $U_l(\theta_l)$ denotes the unitary matrix corresponding to the $l$-th layer and $\theta = \{\theta_1, \ldots, \theta_L\}$ is the set of all parameters.
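The parameterization of a $K$-qubit unitary as the exponential of a real combination of Pauli strings, and the cascade of such layers, can be sketched in NumPy as follows. Computing $e^{iH}$ through an eigendecomposition of the Hermitian $H$, the random parameters, and the helper names `pauli_string` and `unitary_from_params` are illustrative choices, not the method of a particular paper.

```python
import itertools
import numpy as np

PAULIS = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]      # sigma_0..sigma_3 = I, X, Y, Z

def pauli_string(indices):
    """Tensor product sigma_{j_1} kron ... kron sigma_{j_K}."""
    out = np.array([[1.0 + 0j]])
    for j in indices:
        out = np.kron(out, PAULIS[j])
    return out

def unitary_from_params(alpha, K):
    """U = exp(i * sum_j alpha_j P_j), summing over all 4**K Pauli strings P_j on K qubits."""
    strings = itertools.product(range(4), repeat=K)
    Hm = sum(a * pauli_string(s) for a, s in zip(alpha, strings))   # Hermitian for real alpha
    evals, evecs = np.linalg.eigh(Hm)
    return evecs @ np.diag(np.exp(1j * evals)) @ evecs.conj().T     # exp(iH) via eigenbasis

rng = np.random.default_rng(4)
K, L = 2, 3                                   # qubits and layers (illustrative)
layers = [unitary_from_params(rng.normal(size=4 ** K), K) for _ in range(L)]

# Cascade U(theta) = U_L ... U_1: each later layer multiplies on the left.
U = np.eye(2 ** K, dtype=complex)
for U_l in layers:
    U = U_l @ U
psi0 = np.zeros(2 ** K, dtype=complex)
psi0[0] = 1.0
psi_out = U @ psi0                            # state after the whole cascade
```

Any real parameter vector yields a unitary, so gradient-based training can move freely in parameter space, unlike the discrete set of permutation matrices. The number of parameters grows as $4^K$, which is why practical circuits restrict each layer to a few qubits.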
Some recent works Beer2020quantum have further increased the modeling complexity of $U$ through a more direct inspiration from classical NNs: having multiple hidden units in every layer of the model. We introduce additional notation for the mixed state density corresponding to the input state as $\rho^{in} = \sum_i p_i |\psi_i\rangle\langle\psi_i|$, where the $|\psi_i\rangle$ denote the computational basis states of the Hilbert space. In Beer2020quantum, the first layer initializes a state $|0 \cdots 0\rangle$ of $m_1$ qubits (the hidden state dimension) in addition to the input state $\rho^{in}$. $U_1$ is applied to $\rho^{in} \otimes |0 \cdots 0\rangle\langle 0 \cdots 0|$, where the freshly initialized qubits hold the outputs of the layer (in the final layer, the ancillary output qubit). Here $U_1$ can be denoted as a sequential product of multiple unitary matrices, $U_1 = U_{1,m_1} \cdots U_{1,1}$, corresponding to the $m_1$ perceptrons in layer 1. This transformation is denoted as $U_1 \big(\rho^{in} \otimes |0 \cdots 0\rangle\langle 0 \cdots 0|\big) U_1^\dagger$. From the resulting state, the density operator corresponding to the hidden state qubits (and the output ancillary qubit) is extracted using a partial trace over the input qubits, and fed to the next layer, where the transforms are applied in the same way. Having $m_l$ perceptrons in layer $l$ gives the QNN a greater degree of freedom to capture patterns in the data.
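The layer-to-layer step (append fresh $|0\rangle$ qubits, apply a unitary, trace out the previous layer) can be sketched with density matrices in NumPy. For simplicity this sketch uses a single random unitary per layer instead of a product of perceptron unitaries, so it illustrates the data flow only; `partial_trace`, `random_unitary` and `qnn_layer` are hypothetical helper names.

```python
import numpy as np

def partial_trace(rho, dims, keep):
    """Trace out every subsystem not in `keep`; `dims` lists subsystem sizes in kron order."""
    n = len(dims)
    t = rho.reshape(list(dims) + list(dims))          # axes (i_1..i_n, j_1..j_n)
    for k in sorted(set(range(n)) - set(keep), reverse=True):
        m = t.ndim // 2
        t = np.trace(t, axis1=k, axis2=k + m)         # highest index first keeps axes valid
    d = int(np.prod([dims[k] for k in keep]))
    return t.reshape(d, d)

def random_unitary(d, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q

def qnn_layer(rho_in, n_in, n_out, U):
    """rho_out = Tr_in[ U (rho_in kron |0..0><0..0|) U^dagger ] with n_out fresh qubits."""
    zero = np.zeros((2 ** n_out, 2 ** n_out), dtype=complex)
    zero[0, 0] = 1.0
    joint = U @ np.kron(rho_in, zero) @ U.conj().T
    return partial_trace(joint, [2 ** n_in, 2 ** n_out], keep=[1])

rng = np.random.default_rng(5)
psi = np.array([1.0, 1j]) / np.sqrt(2)                # one-qubit input state
rho_in = np.outer(psi, psi.conj())
rho_hidden = qnn_layer(rho_in, 1, 2, random_unitary(2 ** 3, rng))   # 1 -> 2 hidden qubits
rho_out = qnn_layer(rho_hidden, 2, 1, random_unitary(2 ** 3, rng))  # 2 -> 1 output qubit
```

The partial trace is what lets the network change width between layers while always passing a valid density matrix forward.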
In the continuous-variable architecture, journals/corr/abs-1806-06871 model a QNN as a variational quantum circuit, with Gaussian and non-Gaussian gates used to implement linear and non-linear transformations.
4.3 Observing the Output
journals/qip/SchuldSP14 describe several works Menneer1995; Zak1998 where measuring the output from the network corresponds to the collapse of the superposition of quantum states to a single value, forming a close analogue to the non-linearity imposed in classical NNs through activation functions.
When the data is truly quantum in nature, the output state $|\phi^{out}_x\rangle$ corresponding to the input state $|\psi_x\rangle$ can be a pure computational basis state or a mixed quantum state. Let the mixed state density for the output state obtained from the QNN be denoted by $\rho^{out}_x$, corresponding to the last qubit in the final quantum state obtained after the unitary matrix operations. A popular measure of closeness between the observed and the actual output quantum state is their fidelity, which when averaged over the $N$ training examples can be mathematically represented as:
$$C = \frac{1}{N} \sum_{x=1}^{N} \langle \phi^{out}_x |\, \rho^{out}_x\, | \phi^{out}_x \rangle$$
Beer2020quantum show that this fidelity is a direct generalization of the classical empirical risk. When the actual output state for the input is a mixed quantum state $\tau_x$ rather than a pure computational basis state, the fidelity expression can be modified to the general (Uhlmann) fidelity $F(\rho^{out}_x, \tau_x) = \big(\mathrm{Tr}\sqrt{\sqrt{\tau_x}\, \rho^{out}_x \sqrt{\tau_x}}\big)^2$.
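A minimal NumPy sketch of the averaged-fidelity cost for pure target states, together with the general Uhlmann fidelity for a mixed target. The toy targets and network outputs are made-up values, and the matrix square root is taken through an eigendecomposition, which assumes positive semidefinite inputs.

```python
import numpy as np

def fidelity_cost(targets, rhos):
    """C = (1/N) * sum_x <phi_x| rho_x |phi_x> for pure target states phi_x."""
    return float(np.mean([np.real(np.vdot(phi, rho @ phi)) for phi, rho in zip(targets, rhos)]))

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def uhlmann_fidelity(rho, tau):
    """F(rho, tau) = (Tr sqrt(sqrt(tau) rho sqrt(tau)))^2, usable when the target tau is mixed."""
    s = psd_sqrt(tau)
    return float(np.real(np.trace(psd_sqrt(s @ rho @ s))) ** 2)

targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]                    # |0>, |1>
rhos = [np.array([[0.9, 0.0], [0.0, 0.1]]), np.array([[0.5, 0.5], [0.5, 0.5]])]
cost = fidelity_cost(targets, rhos)          # (0.9 + 0.5) / 2
```

For a pure target $\tau = |\phi\rangle\langle\phi|$ the Uhlmann fidelity reduces to $\langle\phi|\rho|\phi\rangle$, so the two functions agree on the pure-state case. The cost is maximized during training (equivalently, $1 - C$ is minimized).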
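When the input data was originally in a classical form and the output is a classical scalar/vector value, measurement of the output state from the QNN has been the popular approach Farhi2018ClassificationWQ; Wan_2017 to compute the cost function $\mathcal{L}(\theta)$. Farhi2018ClassificationWQ measure a Pauli operator, say $Y_{n+1}$, on the readout bit, and denote this measurement by $Y_{n+1}$. Measuring $Y_{n+1}$ is probabilistic in the different possible outcomes, and hence an average of $Y_{n+1}$ is measured over multiple copies of the input $|\psi, 0\rangle$. Averaging computes the following:
$$\langle Y_{n+1} \rangle = \langle \psi, 0 |\, U^\dagger(\theta)\, Y_{n+1}\, U(\theta)\, | \psi, 0 \rangle$$
The loss can now be defined as a mean squared error or $\ell_1$ loss with respect to this averaged value of $Y_{n+1}$ as:
$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \big(y_i - \hat{y}_i\big)^2 \quad \text{or} \quad \mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \big|y_i - \hat{y}_i\big|$$
where $y_i$ and $\hat{y}_i$ correspond to the original output and the averaged QNN output $\langle Y_{n+1} \rangle$ for input $x_i$.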
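The sketch below illustrates the measurement-based readout: it computes the exact $\langle Y_{n+1} \rangle$ for a fixed unitary, imitates repeated measurements by drawing $\pm 1$ outcomes, and evaluates both losses. A random unitary stands in for a trained QNN and the parity labels are hypothetical; for a Pauli observable the $+1$ outcome occurs with probability $(1 + \langle Y \rangle)/2$, which is what `sampled_expectation` uses.

```python
import numpy as np

Y = np.array([[0, -1j], [1j, 0]])

def readout_expectation(U, psi_in):
    """Exact <psi,0| U^dagger Y_{n+1} U |psi,0> with Y acting on the appended readout qubit."""
    full_in = np.kron(psi_in, np.array([1.0, 0.0]))      # append readout qubit |0>
    Y_last = np.kron(np.eye(len(psi_in)), Y)             # Y on the last qubit only
    out = U @ full_in
    return float(np.real(np.vdot(out, Y_last @ out)))

def sampled_expectation(exact, shots, rng):
    """Average of `shots` +/-1 outcomes; a Pauli observable gives +1 w.p. (1 + <Y>) / 2."""
    p_plus = float(np.clip((1.0 + exact) / 2.0, 0.0, 1.0))
    k = rng.binomial(shots, p_plus)                      # number of +1 outcomes
    return (2 * k - shots) / shots

rng = np.random.default_rng(6)
n = 2
d = 2 ** (n + 1)
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))  # stand-in QNN
inputs = np.eye(2 ** n)                                  # basis states |00>, |01>, |10>, |11>
labels = np.array([1.0, -1.0, -1.0, 1.0])                # hypothetical parity labels
exact = np.array([readout_expectation(U, x) for x in inputs])
y_hat = np.array([sampled_expectation(e, 20_000, rng) for e in exact])
mse = np.mean((labels - y_hat) ** 2)
l1 = np.mean(np.abs(labels - y_hat))
```

The sampled averages fluctuate around the exact values with a standard deviation of at most $1/\sqrt{\text{shots}}$, which is the source of the measurement cost discussed in the next subsection.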
4.4 Learning network parameters
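Similar to classical deep learning, the QNN parameters $\theta_l$, for $l = 1, \ldots, L$, are learnt by using first-order optimization techniques to minimize a loss function over the dataset. The simplest gradient-based update rule is the following:
$$\theta \leftarrow \theta - \eta\, \nabla_\theta \mathcal{L}(\theta)$$
where $\theta$ are the parameters being learnt, $\mathcal{L}(\theta)$ is the loss computed over the data and $\eta$ is the step-size. A second-order accurate estimate of the derivative of a function can be found using the (central) finite difference method as:
$$\frac{\partial \mathcal{L}}{\partial \theta_l} \approx \frac{\mathcal{L}(\theta_l + \epsilon) - \mathcal{L}(\theta_l - \epsilon)}{2\epsilon}$$
where only $\theta_l$ is perturbed and the approximation error is $O(\epsilon^2)$.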
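For this, the loss function for a particular value of the parameter set $\theta_l$ for the unitary matrix of layer $l$ needs to be estimated to within $O(\epsilon^3)$, so that dividing by $2\epsilon$ leaves an $O(\epsilon^2)$ error in the derivative; since the statistical error of an average over $M$ measurements shrinks as $1/\sqrt{M}$, Farhi2018ClassificationWQ show that this requires $O(1/\epsilon^6)$ measurements. This needs to be done for every layer's parameters independently, resulting in $L$ such repetitions for an $L$-layer QNN.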
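The finite-difference training loop can be sketched on a toy one-qubit model. The model, $\exp(i\theta_2 \sigma_y)\exp(i\theta_1 \sigma_x)$ acting on $|0\rangle$ with loss $1 - y\langle\sigma_z\rangle$ and label $y = -1$, is a hypothetical example chosen so that the loss has the closed form $1 + \cos 2\theta_1 \cos 2\theta_2$; expectations are computed exactly here, whereas on hardware each loss evaluation would itself be a noisy average of measurements.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_rotation(theta, P):
    """exp(i theta P) = cos(theta) I + i sin(theta) P, valid because P @ P = I."""
    return np.cos(theta) * I2 + 1j * np.sin(theta) * P

def loss(theta, label=-1.0):
    """Toy one-qubit model: 1 - label * <Z> after exp(i t2 Y) exp(i t1 X) acts on |0>."""
    psi = pauli_rotation(theta[1], Y) @ pauli_rotation(theta[0], X) @ np.array([1.0, 0.0])
    return 1.0 - label * float(np.real(np.vdot(psi, Z @ psi)))

def finite_difference_grad(f, theta, eps=1e-4):
    """Central differences (f(theta + eps e_j) - f(theta - eps e_j)) / (2 eps); error O(eps^2)."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta)
        e[j] = eps
        grad[j] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return grad

theta = np.array([0.1, 0.2])
eta = 0.2                                   # step size
history = [loss(theta)]
for _ in range(100):
    theta = theta - eta * finite_difference_grad(loss, theta)   # theta <- theta - eta * grad
    history.append(loss(theta))
```

Each gradient step costs two loss evaluations per parameter, which is why the measurement budget per evaluation dominates the cost of this approach.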
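Under a special condition on the unitary matrices of the QNN, where they can be represented as $U_l(\theta_l) = \exp(i \theta_l \Sigma_l)$ ($\Sigma_l$ being a tensor product of Pauli operators acting on a few qubits), an explicit gradient descent update rule can be obtained. For the loss $\mathcal{L} = 1 - y\, \langle Y_{n+1} \rangle$ with a label $y \in \{-1, +1\}$ (which equals the $\ell_1$ loss $|y - \langle Y_{n+1} \rangle|$, since $|\langle Y_{n+1} \rangle| \le 1$), the gradient with respect to the layer parameter $\theta_l$ is given by:
$$\frac{\partial \mathcal{L}}{\partial \theta_l} = 2y\, \mathrm{Im}\Big( \langle \psi, 0 |\, U_1^\dagger \cdots U_L^\dagger\, Y_{n+1}\, U_L \cdots U_{l+1}\, \Sigma_l\, U_l \cdots U_1\, | \psi, 0 \rangle \Big)$$
where $\Sigma_l$ is the tensor product of Pauli operators corresponding to layer $l$ defined above and $\mathrm{Im}$ refers to the imaginary part. Farhi2018ClassificationWQ make the interesting observation that $U_1^\dagger \cdots U_L^\dagger\, Y_{n+1}\, U_L \cdots U_{l+1}\, \Sigma_l\, U_l \cdots U_1$ is a unitary operation and can therefore be viewed as a quantum circuit of $2L + 2$ unitaries, each acting on a few qubits, thereby enabling efficient gradient computations.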
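The analytic gradient can be checked against central finite differences on a small simulated circuit. The 2-qubit register (one input qubit plus the readout qubit), the three generators $\Sigma_l$ and the random angles are hypothetical choices made for the check; the code evaluates $2y\,\mathrm{Im}(\cdot)$ exactly as in the formula, inserting $\Sigma_l$ directly after $U_l$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Hypothetical generators Sigma_l for an L = 3 layer QNN on 2 qubits (input, readout).
SIGMAS = [np.kron(X, X), np.kron(Z, Y), np.kron(Y, I2)]
Y_READ = np.kron(I2, Y)                      # Y_{n+1}: Pauli Y on the readout qubit

def layer(theta, sigma):
    """U_l = exp(i theta Sigma_l) = cos(theta) I + i sin(theta) Sigma_l, since Sigma_l^2 = I."""
    return np.cos(theta) * np.eye(4) + 1j * np.sin(theta) * sigma

def loss(thetas, psi0, label):
    """1 - label * <psi0| U^dagger Y_read U |psi0>, with U = U_L ... U_1."""
    state = psi0
    for t, s in zip(thetas, SIGMAS):
        state = layer(t, s) @ state
    return 1.0 - label * float(np.real(np.vdot(state, Y_READ @ state)))

def analytic_grad(thetas, psi0, label):
    """2 * label * Im(<psi0| U^dagger Y_read U_L..U_{l+1} Sigma_l U_l..U_1 |psi0>) for each l."""
    Us = [layer(t, s) for t, s in zip(thetas, SIGMAS)]
    out = psi0
    for U_k in Us:
        out = U_k @ out                      # U |psi0>; np.vdot conjugates it into <psi0| U^dagger
    grads = []
    for l in range(len(Us)):
        branch = psi0
        for k, U_k in enumerate(Us):
            branch = U_k @ branch
            if k == l:
                branch = SIGMAS[l] @ branch  # insert Sigma_l right after U_l
        grads.append(2.0 * label * np.imag(np.vdot(out, Y_READ @ branch)))
    return np.array(grads)

rng = np.random.default_rng(7)
thetas = rng.uniform(-np.pi, np.pi, size=3)
psi0 = np.kron(np.array([0.6, 0.8]), np.array([1.0, 0.0])).astype(complex)   # |psi>|0>
label = 1.0
g = analytic_grad(thetas, psi0, label)
eps = 1e-5
fd = np.array([(loss(thetas + eps * e, psi0, label) - loss(thetas - eps * e, psi0, label)) / (2 * eps)
               for e in np.eye(3)])
```

On a quantum device the imaginary part would itself be estimated from measurements of the $2L + 2$ unitary circuit; the simulation only confirms that the formula matches the numerical derivative.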
4.5 QNN Variants
There have been multiple ideas proposed that are similar to the learnable QNN described above. mitarai2018quantum pose a problem through the lens of learning a quantum circuit, very similar to the QNN, and use gradient-based optimization to learn the parameters. Romero_2017 introduce a quantum auto-encoder for the task of compressing quantum states, which is optimized through classical algorithms. ngoc2020tunable propose an alternative QNN architecture that uses only multi-controlled NOT gates and avoids using measurements to capture the non-linear activation functions of classical NNs. zhao2019qdnn suggest interleaving quantum structured layers with classical non-linear activations to model a variant of the QNN. Multiple ideas mitarai2018quantum; zhao2019qdnn; journals/corr/abs-1812-03089 utilise a hybrid quantum-classical approach, where the computation is split between classical computers and quantum devices so that each part is easy to compute on the device it runs on.
4.6 Practical implementations of QNNs
While modelling a QNN has been a hot topic in the field of quantum deep learning, several of the algorithms cannot be practically implemented due to the limited capabilities of current quantum computing devices. There has been considerable research on practically implementing QNNs behrman2002quantum and on developing hybrid quantum-classical algorithms which can successfully perform computations using a small QRAM.
Early works on practically implementing QNNs used the idea of representing the qubits through polarized optical modes, and the weights through optical beam splitters and phase shifters altaisky2001quantum. In parallel, Behrman2000 proposed implementing the QNN through a quantum dot molecule interacting with phonons of a surrounding lattice and an external field. Such models based on quantum dots have been extensively studied Toth_1996; 831067; Altaisky2014.
Korkmaz_2019 used a central spin model as a practical implementation of a QNN using a system of 2 coupled nodes with independent spin baths. A similar idea was earlier proposed by Deniz2017 using a collisional spin model for representing the QNN thereby enabling them to analyse the Markovian and non-Markovian dynamics of the system.
The majority of recent research on practical implementations of QNNs has centered on implementing quantum circuits on Noisy Intermediate-Scale Quantum (NISQ) devices. Juncheng2015 presented a neuromorphic hardware co-processor called the Darwin Neural Processing Unit (NPU), which is a practical implementation of the Spiking Neural Network (SNN) Tavanaei_2019; NIPS2018_7417, a type of biologically-inspired NN which has been widely studied recently.
potok2017study study the performance of deep learning architectures on 3 different computing platforms: quantum (a D-Wave processor Johnson2011Quantum), high-performance, and neuromorphic, and show the individual benefits of each. Tacchino2019 experimentally use a NISQ quantum processor to test a QNN with a small number of qubits. They propose a hybrid quantum-classical update algorithm for the network parameters, which is also independently suggested by tacchino2019quantum.
5 Complex Quantum Neural Network Architectures
5.1 Quantum CNNs
Cong_2019 propose a quantum CNN through a quantum circuit model, adapting the ideas of convolutional and pooling layers from classical CNNs. The proposed architecture (shown in Figure 5) is similarly layered; however, it differs in that it applies 1D convolutions to the input quantum state (in contrast to 2D/3D convolutions on images). The convolutional layer is modeled as a quasi-local unitary operation on the input state density $\rho_{in}$. This unitary operator is denoted by $U_i$ and is applied to several successive sets of input qubits, up to a predefined depth. The pooling layer is implemented by performing measurements on some of the qubits and applying unitary rotations $V_i$ to the nearby qubits. The rotation operation is determined by the observations on the measured qubits. This combines the functionality of dimensionality reduction (the output of $V_i$ is of lower dimension) with non-linearity (due to the partial measurement of qubits). After the required number of blocks of convolutional and pooling unitaries, the unitary $F$ implements the fully connected layer. A final measurement on the output of $F$ yields the network output.
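The structure of one convolution-pooling block can be simulated with density matrices in NumPy. This is a simplified sketch under stated assumptions, not the exact parameterization of Cong_2019: the convolution applies one shared random 2-qubit unitary to neighbouring pairs in a single brick-wall pass, and the pooling step applies a controlled single-qubit rotation $V$ and then discards the control qubit. Tracing out the control after a controlled rotation gives the same reduced state as measuring it, rotating the neighbour conditioned on the outcome, and discarding the result. All helper names are hypothetical.

```python
import numpy as np

def embed(op2, i, n):
    """Embed a 2-qubit operator acting on adjacent qubits (i, i+1) into an n-qubit register."""
    return np.kron(np.kron(np.eye(2 ** i), op2), np.eye(2 ** (n - i - 2)))

def partial_trace(rho, n, drop):
    """Trace out the qubits listed in `drop` from an n-qubit density matrix."""
    t = rho.reshape([2] * (2 * n))
    for k in sorted(drop, reverse=True):
        m = t.ndim // 2
        t = np.trace(t, axis1=k, axis2=k + m)
    d = 2 ** (n - len(drop))
    return t.reshape(d, d)

def random_unitary(d, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q

def conv_layer(rho, U):
    """Apply the same 2-qubit unitary U to pairs (0,1),(2,3),... and then (1,2),(3,4),..."""
    n = int(np.log2(rho.shape[0]))
    for start in (0, 1):
        for i in range(start, n - 1, 2):
            G = embed(U, i, n)
            rho = G @ rho @ G.conj().T
    return rho

def pool_layer(rho, V):
    """Controlled-V from qubit 2k onto qubit 2k+1, then discard every control qubit 2k."""
    n = int(np.log2(rho.shape[0]))
    CV = np.kron(np.diag([1.0, 0.0]), np.eye(2)) + np.kron(np.diag([0.0, 1.0]), V)
    controls = list(range(0, n - 1, 2))
    for i in controls:
        G = embed(CV, i, n)
        rho = G @ rho @ G.conj().T
    return partial_trace(rho, n, drop=controls)

rng = np.random.default_rng(9)
n = 4
psi = random_unitary(2 ** n, rng)[:, 0]                 # an arbitrary 4-qubit input state
rho = np.outer(psi, psi.conj())
rho = pool_layer(conv_layer(rho, random_unitary(4, rng)), random_unitary(2, rng))  # 4 -> 2 qubits
F = random_unitary(4, rng)                              # "fully connected" unitary on 2 qubits
rho = F @ rho @ F.conj().T
Z_first = np.kron(np.diag([1.0, -1.0]), np.eye(2))
output = float(np.real(np.trace(rho @ Z_first)))        # final measurement: <Z> on qubit 0
```

Sharing one unitary across all neighbouring pairs plays the role of weight sharing in a classical convolution, and each pooling step halves the number of qubits.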
Similar to classical CNNs, the overall architecture of the quantum CNN is user-defined, whereas the parameters of the unitaries are learned. The parameters are optimized by minimizing a loss function, for example by using gradient descent using the finite difference method described in Section 4.4. Cong_2019 demonstrate the effectiveness of the proposed architecture on two classes of problems, quantum phase recognition (QPR) and quantum error correction (QEC).
More recently, Kerenidis2020Quantum identify the relation between convolutions and matrix multiplications, and propose the first quantum algorithm to compute the forward pass of a CNN as a convolutional product. They also provide a quantum back propagation algorithm to learn network parameters through gradient descent. In an application of CNNs, journals/pr/ZhangCWBH19 and journals/corr/abs-1901-10632 propose special convolutional neural networks for extracting features from graphs, to identify graphs that exhibit quantum advantage.
5.2 Hybrid CNNs
henderson2019quanvolutional introduce the quanvolutional layer, a transformation based on random quantum circuits, as an additional component in a classical CNN, thus forming a hybrid model architecture. Quanvolutional layers consist of multiple quantum filters, each of which takes a matrix of 2D values as input and outputs a single scalar value. Similar to convolutional filters, the operations are iteratively applied to subsections of the input. Each quantum filter is characterized by an encoder, a random circuit and a decoder: the encoder converts the raw input data into an initialization state that is fed into the random circuit, and the output of the circuit is fed to the decoder, which yields a scalar value. henderson2019quanvolutional do not present a learning methodology to optimize the random circuits, since the quanvolutional layer has no learnable parameters. However, the experimental results suggest that the quanvolutional layer performed identically to a classical random feature extractor, thus questioning its utility.
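A quanvolutional filter can be sketched as a fixed function that slides over an image like a convolution. In this NumPy simulation the encoder (pixel above 0.5 maps to $|1\rangle$), the random circuit (a random $16 \times 16$ unitary on 4 qubits) and the decoder (the expected number of qubits measured as 1) are illustrative assumptions standing in for the choices made by henderson2019quanvolutional; the helper names are hypothetical.

```python
import numpy as np

def random_unitary(d, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q

def quantum_filter(patch, U, threshold=0.5):
    """Encode a 2x2 patch as a 4-qubit basis state, apply U, decode to a scalar.

    Encoder (assumed): pixel > threshold -> |1>, else |0>.
    Decoder (assumed): expected number of qubits measured as |1>.
    """
    bits = (patch.reshape(-1) > threshold).astype(int)
    n = bits.size
    state = np.zeros(2 ** n, dtype=complex)
    state[int("".join(map(str, bits)), 2)] = 1.0
    probs = np.abs(U @ state) ** 2
    ones = np.array([bin(k).count("1") for k in range(2 ** n)])   # Hamming weight per basis state
    return float(probs @ ones)

def quanvolve(image, U, size=2, stride=2):
    """Slide the fixed (non-trainable) quantum filter over the image like a convolution."""
    H, W = image.shape
    rows = range(0, H - size + 1, stride)
    cols = range(0, W - size + 1, stride)
    return np.array([[quantum_filter(image[i:i + size, j:j + size], U) for j in cols] for i in rows])

rng = np.random.default_rng(8)
filters = [random_unitary(16, rng) for _ in range(3)]     # three random quantum filters
image = rng.uniform(size=(6, 6))
feature_maps = np.stack([quanvolve(image, U) for U in filters])   # shape (3, 3, 3)
```

Because the unitaries are drawn once and never updated, the filters act as a fixed random feature extractor, which is consistent with the comparison to classical random features reported above.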
5.3 Quantum RNNs
There have also been several interesting proposals for quantum variants of recurrent neural networks. hibatallah2020recurrent propose a quantum variant of recurrent neural networks (RNNs) using variational wave-functions to learn the approximate ground state of a quantum Hamiltonian. roth2020iterative propose an iterative retraining approach using RNNs for simulating bulk quantum systems by mapping translations of lattice vectors to the RNN time index. Hopfield Networks hopfield-neural-networks-and-1982 were a popular early form of recurrent NN, for which several works Rebentrost_2018; Tang_2019; Rotondo_2018 have proposed quantum variants.
6 Quantum inspired Classical Deep Learning
Quantum computing methods have been applied to classical deep learning techniques by several researchers. journals/corr/AdachiH15 suggest a quantum sampling-based approach for generative training of Restricted Boltzmann Machines, which is shown to be much faster than Gibbs sampling. C6SC05720A use quantum mechanical (QM) DFT methods to train deep neural networks to build a molecular energy estimating engine. journals/ijon/LiXCJ19 propose to use quantum-based particle swarm optimization to find optimal CNN model architectures. da_Silva_2017 propose a quantum algorithm to evaluate the performance of neural network architectures. Behera2004 use a quantum RNN variant to simulate a brain model, and use it to explain eye tracking movements.
Natural Language Processing
Clark2008ACD; coecke2010mathematical introduce a tensor product composition model (CSC) to incorporate grammatical structure into algorithms that compute meaning. Zeng_2016 show the shortcomings of the CSC model with respect to computational overhead and resolve them using a QRAM-based quantum algorithm for the closest vector problem.
suggest a language modelling approach inspired by quantum probability theory which generalizes Sordoni_2014. Zhang_2018 present an improved variant of the quantum language model which has higher representation capacity and can be easily integrated with neural networks.
Galofaro_2018 tackle the problem of typifying semantic relations between keyword pairs in hate and non-hate speech using quantum geometry and correlation. Li_2018 utilise the Hilbert space quantum representation by assigning a complex-valued relative phase to every word, and use this to learn embeddings for text classification tasks. oriordan2020hybrid recently presented a hybrid workflow toolkit for NLP tasks where the classical corpus is encoded, processed, and decoded using a quantum circuit model.
Quantum computing and deep learning are two of the most popular fields of research today. In this work, we have presented a comprehensive and easy-to-follow survey of the field of quantum deep learning. We have summarized different schemes proposed to model quantum neural networks (QNNs), variants like quantum convolutional networks (QCNNs), and the recent progress in quantum-inspired classical deep learning algorithms. There is tremendous potential for collaborative research at the intersection of the two fields, by applying concepts from one to solve problems in the other. For example, Levine_2019 demonstrate the entanglement capacity of deep networks, and therefore suggest their utility for studying quantum many-body physics.