Supervised learning is the most commonly applied form of machine learning. It works in two stages. During the training stage, the algorithm extracts patterns from a training dataset of sample-label pairs and converts these patterns into a mathematical representation called a model. During the inference stage, this model is used to make predictions about unseen samples. Machine learning algorithms are, in general, data hungry; their performance depends heavily on the size of the datasets used for training. However, training is computationally expensive, and working with large datasets requires huge amounts of computational power. As the number and volume of datasets available for research and commercial purposes continue to grow exponentially, new technologies such as NVIDIA CUDA [nickolls2008scalable] and Google TPUs [jouppi2017datacenter] have emerged to enable faster processing of data. Computational speeds have been increasing rapidly over the last several decades in accordance with the observation, informally known as Moore’s Law [friedman2015moore], that the number of transistors in a dense integrated circuit (IC) doubles about every two years. The semiconductor manufacturing process has shrunk from 10 micrometers in 1970 to about 5 nanometers in 2020. It is believed, however, that we are nearing the limits of the processing power that classical computers can offer [leiserson2020there]. At still smaller scales, quantum mechanical effects come into play [sperling2018quantum], and they impose physical limitations on how small electronic components can get.
Quantum computing, on the other hand, proposes to leverage these quantum mechanical effects to carry out computation. In contrast to classical computers, which operate on bits that can exist in only one of two states at a time, quantum computers exploit the fact that quantum bits (qubits) can exist in any of the infinitely many linear superpositions of these two states. This allows quantum computers to execute multiple paths of computation simultaneously. Quantum computers can efficiently solve certain computational problems that are believed to be intractable for classical computers [nielsen2002quantum]. A large-scale general-purpose quantum computer does not yet exist, but restricted quantum machines capable of solving optimization problems are already being sold commercially [castelvecchi2017ibm][johnston2013d]. Because the field has its roots in theoretical physics, most research articles on the topic are written for physicists; this makes them difficult to access for researchers from other fields. The purpose of this paper is to give a broad overview of the synergies between quantum computing and machine learning. We briefly outline the history of quantum physics, describe the preliminaries of quantum computation, and then review the latest research on applying the principles of quantum computation to supervised machine learning. By eschewing results from physics that have little bearing on quantum computation and by providing additional background that may benefit the unfamiliar reader, we hope to make this introduction accessible to data scientists, machine learning practitioners, and researchers from other fields.
Quantum mechanics arose through a number of discoveries in the early twentieth century. Einstein [einstein17heuristic] explained the photoelectric effect in 1905 by postulating that light, like all electromagnetic radiation, is made up of discrete particles that later came to be called photons. Earlier, in 1865, Maxwell [maxwell1865viii] had demonstrated that electric and magnetic fields travel through space as waves. De Broglie [broglie1924xxxv] proposed in 1923 that particles can exhibit wave-like properties and waves can behave like particles. Building on these ideas, Heisenberg [edwards1979mathematical] developed matrix mechanics, and Schrödinger [schrodinger1926undulatory] developed wave mechanics; the two formulations were later found to be equivalent. These developments laid the foundation of quantum mechanics. The equations of quantum mechanics have since been tested extensively in innumerable experiments, but even after almost a century of debate, physicists strongly disagree over how those equations should be interpreted and mapped to reality [schlosshauer2013snapshot].
Benioff [benioff1980computer] in 1980 and Feynman [feynman1999simulating] in 1982 observed that simulating the evolution of certain quantum systems may be an intractable problem that cannot be solved efficiently by classical computers, and yet these quantum systems solve the problem merely by evolving. This suggested that the evolution of quantum systems could itself be used as a method of computation. In 1985, Deutsch [deutsch1985quantum] designed a universal quantum computer as a quantum counterpart to the Universal Turing Machine. Deutsch and Jozsa [deutsch1992rapid] proposed the Deutsch-Jozsa problem, for which the deterministic quantum algorithm is exponentially faster than any deterministic classical solution. Shor’s algorithm [shor1999polynomial] for factoring large integers was the first quantum algorithm that could solve a problem of practical significance faster than any known classical algorithm. Grover’s algorithm [grover1996fast] showed a quadratic improvement in unordered search problems. These results laid the foundation of quantum computation. Since then, quantum algorithms have been proposed for numerous areas including cryptography, search and optimisation, simulation of quantum systems, solving large systems of linear equations, and machine learning [montanaro2016quantum].
The benefits quantum computing can bring to machine learning go beyond speed-up in execution. Many tasks in machine learning, such as maximum likelihood estimation using hidden variables, principal component analysis, and training of neural networks, require optimization of a non-convex objective function. Optimizing non-convex functions is, in general, an NP-hard problem. Classical optimization methods such as gradient descent can get stuck at local minima or saddle points and may never find the global minimum. Adiabatic quantum computers use quantum annealing to solve non-convex optimization problems: they search for low-energy configurations of an appropriate energy function and exploit tunneling effects to escape local minima [santoro2006optimization]. Methods based on Grover’s search can find the global minimum in a discrete unordered search space. Many machine learning algorithms involve repeated execution of linear algebra routines on large matrices. Under suitable conditions, quantum solutions can offer exponential speed-up for these routines [harrow2009quantum][wiebe2012quantum]. Besides optimizing subroutines used in classical machine learning algorithms, many fully quantum versions of these algorithms have also been developed.
The term quantum machine learning is generally used to denote the analysis of classical data on quantum computers. This is also known as quantum-enhanced machine learning. There are, however, other ways in which the fields of quantum computing and machine learning overlap. Classical machine learning can be applied to data emanating from quantum systems to solve problems in physics. Another stream of research deals with generalizing classical machine learning to work with quantum data, where the input and output are quantum states. Recently, Tang [tang2019quantum] developed a classical algorithm for recommendation systems that was inspired by quantum computing, creating a new category referred to as quantum-inspired algorithms. These are classical algorithms that run on conventional computers but borrow ideas from quantum computing to achieve significant theoretical speed-ups over the best prevailing classical algorithms. This paper limits itself to quantum-enhanced machine learning and presents a selection of quantum approaches for implementing supervised machine learning algorithms on quantum computers. Far from being a comprehensive review of the field, it aims to offer the reader a background on the multitude of approaches proposed over the years, with enough detail to set the stage for a more detailed exploration. For additional information, we recommend the following excellent surveys and reviews: [schuld2015introduction][wittek2014quantum][biamonte2017quantum][adcock2015advances][arunachalam2017guest][kopczyk2018quantum][dunjko2018machine][dunjko2020non][montanaro2016quantum].
2 Background of Quantum Computation
Quantum mechanics is based on four fundamental postulates [dunjko2018machine][nielsen2002quantum]: (1) the pure state of a quantum system is given by a unit vector $|\psi\rangle$ in a complex Hilbert space; (2) the evolution of a pure state in a closed system is governed by a Hamiltonian $H$ as specified by Schrödinger’s equation $i\hbar \frac{d}{dt}|\psi\rangle = H|\psi\rangle$; (3) the quantum state of a composite system is given by the tensor product of the states of the individual systems; (4) projective measurements (observables) are specified by Hermitian operators, and the process of measurement changes the observed system from $|\psi\rangle$ to an eigenstate $|\phi\rangle$ of the observable with probability $|\langle\phi|\psi\rangle|^2$, as given by the Born rule. In this section, we briefly set up the background of quantum computation based on the above postulates.
2.1 Single Qubit
A classical bit can exist in one of two states denoted as 0 and 1. A quantum bit or qubit can exist not only in these two discrete states but in all possible linear superpositions of them. Mathematically, the quantum state of a qubit is represented as a state vector in a two-dimensional Hilbert space. In the Dirac notation, the state vector of a qubit is called a ket and is written as:
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle \tag{1}$$
where $\alpha$ and $\beta$ are complex numbers and $|\alpha|^2 + |\beta|^2 = 1$. The Born rule tells us that if this qubit is measured, we will get $|0\rangle$ with probability $|\alpha|^2$ and $|1\rangle$ with probability $|\beta|^2$. Quantum measurements are non-deterministic, and the act of measurement changes the quantum state irreversibly. Before measurement, the qubit exists in a quantum superposition of the states $|0\rangle$ and $|1\rangle$. The outcome of the measurement, however, is not quantum but classical, i.e. you get either a $0$ or a $1$ but not a superposition of the two. During the measurement, the quantum state collapses to the classical state it is observed in, and all subsequent measurements result in this same outcome with probability 1. (Different interpretations exist regarding the collapse of the quantum state [schlosshauer2013snapshot]. The popular Copenhagen Interpretation suggests that the wave function of a quantum system collapses on observation. The alternative Many Worlds Interpretation suggests that there is no collapse of the wave function; instead, the act of observation results in the observer getting entangled with the observed system.)
The choice of basis vectors $|0\rangle$ and $|1\rangle$ is arbitrary. We can represent the system using a different set of orthogonal basis vectors such as $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ (called the Hadamard or sign basis). Once the computational basis is decided, kets can be represented as column vectors:
$$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \tag{2}$$
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle = \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \tag{3}$$
The two representations for $|\psi\rangle$ given in equations (1) and (3) are equivalent.
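As a minimal numerical sketch of the single-qubit description above, the snippet below stores a qubit as a two-component complex vector, applies the Born rule, and re-expresses the same state in the Hadamard basis. The amplitudes 0.6 and 0.8i are arbitrary illustrative values chosen only because their squared magnitudes sum to 1; NumPy is assumed to be available.

```python
import numpy as np

# Computational basis kets stored as 1-D complex arrays.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Illustrative amplitudes (assumption): |alpha|^2 + |beta|^2 = 0.36 + 0.64 = 1.
alpha, beta = 0.6, 0.8j
psi = alpha * ket0 + beta * ket1

# Born rule: outcome probabilities are the squared magnitudes of the amplitudes.
p0, p1 = np.abs(psi) ** 2

# Hadamard (sign) basis vectors.
ket_plus = (ket0 + ket1) / np.sqrt(2)
ket_minus = (ket0 - ket1) / np.sqrt(2)

# Amplitudes of the same state in the Hadamard basis (np.vdot conjugates its first argument).
amp_plus = np.vdot(ket_plus, psi)
amp_minus = np.vdot(ket_minus, psi)
p_plus, p_minus = abs(amp_plus) ** 2, abs(amp_minus) ** 2

print("computational basis:", p0, p1)
print("Hadamard basis:     ", p_plus, p_minus)
```

The probabilities differ between the two bases even though the state is the same, which illustrates that measurement statistics depend on the basis in which the measurement is made.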
2.2 Multiple Qubits
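The quantum state of a system consisting of two or more unentangled qubits can be represented as the tensor product of the quantum states of the individual qubits. The state of a two-qubit system comprising qubits represented by $|\psi_1\rangle$ and $|\psi_2\rangle$ can be written as $|\psi_1\rangle \otimes |\psi_2\rangle$. In general, the state of $n$ qubits can be represented as:
$$|\psi\rangle = |\psi_1\rangle \otimes |\psi_2\rangle \otimes \cdots \otimes |\psi_n\rangle \tag{4}$$
However, not all multi-qubit states can be represented as a tensor product of individual states. Consider the state below, one of the Bell states:
$$|\Phi^+\rangle = \frac{1}{\sqrt{2}}\left(|00\rangle + |11\rangle\right) \tag{5}$$
Suppose it could be decomposed into the tensor product of two states as below:
$$(\alpha_1|0\rangle + \beta_1|1\rangle) \otimes (\alpha_2|0\rangle + \beta_2|1\rangle) = \alpha_1\alpha_2|00\rangle + \alpha_1\beta_2|01\rangle + \beta_1\alpha_2|10\rangle + \beta_1\beta_2|11\rangle \tag{6}$$
From equations (5) and (6), we know that $\alpha_1\beta_2 = 0$. Therefore, either $\alpha_1 = 0$ or $\beta_2 = 0$. But, from equation (5), both $\alpha_1\alpha_2 = \frac{1}{\sqrt{2}}$ and $\beta_1\beta_2 = \frac{1}{\sqrt{2}}$, so neither $\alpha_1$ nor $\beta_2$ can be zero. This contradiction proves that the Bell state cannot be decomposed into the tensor product of two single-qubit states. In such cases, we say that the two qubits are entangled. Given an entangled pair of qubits, a measurement on one qubit instantaneously affects the state of the other qubit. Entanglement plays a central role in many quantum algorithms, especially in the field of quantum cryptography. There is no counterpart to quantum entanglement in classical physics.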
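The separability argument above can also be checked numerically. The sketch below builds product states with `np.kron` and tests whether a two-qubit state factorizes by reshaping its four amplitudes into a 2x2 matrix and counting non-zero singular values. This rank criterion is a standard linear-algebra test that we substitute here for the algebraic argument in the text; the helper name `schmidt_rank` is ours.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

def schmidt_rank(state, tol=1e-12):
    """Count non-zero singular values of the 2x2 amplitude matrix.

    A two-qubit pure state is a tensor product of single-qubit states
    exactly when this rank equals 1.
    """
    amplitudes = np.asarray(state).reshape(2, 2)
    singular_values = np.linalg.svd(amplitudes, compute_uv=False)
    return int(np.sum(singular_values > tol))

# A product state: |+> tensor |1>.
plus = (ket0 + ket1) / np.sqrt(2)
product_state = np.kron(plus, ket1)

# The Bell state (|00> + |11>) / sqrt(2).
bell_state = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)

print("product state rank:", schmidt_rank(product_state))
print("Bell state rank:   ", schmidt_rank(bell_state))
```

A rank of 1 means the amplitude matrix is an outer product of two single-qubit vectors; a rank of 2 means no such factorization exists, i.e. the qubits are entangled.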
2.3 Quantum Gates
Classical computers manipulate information stored in bits using logic gates such as AND, OR, NOT, NAND, and XOR. Likewise, quantum computers manipulate qubits using quantum gates. Transformations on quantum states are rotations of the state vector in the Hilbert space and are described by unitary operators. Such rotations are linear and reversible. Consequently, all transformations on quantum states must be linear and reversible. Quantum gates transform the system from one state to another, and these transformations can be represented as matrices. The simplest quantum gate is the NOT gate. The NOT gate transforms $\alpha|0\rangle + \beta|1\rangle$ to $\beta|0\rangle + \alpha|1\rangle$ and can be represented as:
$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \tag{7}$$
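The Hadamard gate acts on a single qubit. It is often used to map a qubit from one of its basis states into an equal superposition of both basis states. It transforms $|0\rangle$ to $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|1\rangle$ to $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ and is given by:
$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \tag{8}$$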
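In general, an $n$-qubit Hadamard gate is used to initialize an $n$-qubit system into an equal superposition of all basis states:
$$H^{\otimes n}|0\rangle^{\otimes n} = \frac{1}{\sqrt{2^n}}\sum_{x \in \{0,1\}^n} |x\rangle \tag{9}$$
where $\{0,1\}^n$ denotes all strings of length $n$ consisting of $0$ and $1$.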
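The CNOT (controlled-NOT) gate acts on two qubits, where the first qubit acts as a control signal that decides whether the NOT operation should be performed on the second qubit. If the control qubit is $|1\rangle$, the NOT operation is applied; if it is $|0\rangle$, it is not applied. The CNOT gate leaves the states $|00\rangle$ and $|01\rangle$ unchanged, while it maps $|10\rangle$ to $|11\rangle$ and $|11\rangle$ to $|10\rangle$. It is represented as:
$$\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \tag{10}$$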
The SWAP gate swaps the states of two qubits, transforming $|x, y\rangle$ to $|y, x\rangle$. The CSWAP (controlled-SWAP) gate acts on three qubits and swaps the states of the second and third qubits if the first qubit is $|1\rangle$. The Toffoli gate (CCNOT) acts on three qubits and performs the computation:
$$|x, y, z\rangle \mapsto |x, y, z \oplus (x \wedge y)\rangle \tag{11}$$
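The gates described in this section can be written down directly as NumPy matrices. The sketch below defines them, checks that each is unitary (and therefore reversible), and applies a few of them to basis states. The helper names `basis_state` and `is_unitary` are ours, and qubits are ordered with the first qubit as the most significant bit.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=float)                # NOT gate
H = np.array([[1, 1], [1, -1]], dtype=float) / np.sqrt(2)  # Hadamard gate
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
SWAP = np.eye(4)[[0, 2, 1, 3]]                             # exchanges |01> and |10>

# Toffoli as a permutation matrix: |x, y, z> -> |x, y, z XOR (x AND y)>.
TOFFOLI = np.zeros((8, 8))
for i in range(8):
    x, y, z = (i >> 2) & 1, (i >> 1) & 1, i & 1
    j = (x << 2) | (y << 1) | (z ^ (x & y))
    TOFFOLI[j, i] = 1.0

def basis_state(bits):
    """Computational basis vector for a bit string such as '10'."""
    state = np.zeros(2 ** len(bits))
    state[int(bits, 2)] = 1.0
    return state

def is_unitary(U):
    """True when U^dagger U equals the identity."""
    return np.allclose(U.conj().T @ U, np.eye(U.shape[0]))

# Two-qubit Hadamard applied to |00> gives the equal superposition of all four basis states.
uniform = np.kron(H, H) @ basis_state("00")

print(all(is_unitary(U) for U in (X, H, CNOT, SWAP, TOFFOLI)))
print(CNOT @ basis_state("10"))
print(uniform)
```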
2.4 Quantum Parallelism
While classical computers can execute only one computational path at a time, quantum computers can leverage the ability of quantum states to exist in superpositions to simultaneously execute multiple computational paths. For example, consider the classical function $f: \{0,1\}^2 \to \{0,1\}$. The function takes two bits as input and outputs a single bit. To evaluate $f$ on all two-bit inputs using classical computation, we need to call $f$ four times: $f(00)$, $f(01)$, $f(10)$, and $f(11)$. Quantum superposition allows us to evaluate all four inputs in a single call to $f$.
Since quantum transformations must be reversible and $f$ is not reversible, we define a reversible quantum function:
$$U_f: |x, y\rangle \mapsto |x, y \oplus f(x)\rangle \tag{12}$$
The input $|x\rangle$ is set up in a superposition of states by initializing two qubits to $|0\rangle$ and applying the Hadamard transform:
$$|x\rangle = H^{\otimes 2}|00\rangle = \frac{1}{2}\left(|00\rangle + |01\rangle + |10\rangle + |11\rangle\right) \tag{13}$$
Setting $|y\rangle = |0\rangle$, we apply $U_f$ as follows:
$$U_f|x, 0\rangle = \frac{1}{2}\left(|00, f(00)\rangle + |01, f(01)\rangle + |10, f(10)\rangle + |11, f(11)\rangle\right) \tag{14}$$
Thus, with a single application of $U_f$, we simultaneously evaluate four inputs. Using the Hadamard transform to set up the input in an equal superposition of all basis vectors is a useful starting point for many quantum algorithms.
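The construction above can be simulated for a concrete function. The sketch below picks logical AND as a purely illustrative choice of $f$ (the text does not fix a particular function), builds the reversible map $|x, y\rangle \mapsto |x, y \oplus f(x)\rangle$ as an 8x8 permutation matrix, and applies it once to the uniform superposition of inputs with the output qubit set to $|0\rangle$.

```python
import numpy as np

def f(x1, x2):
    """Illustrative classical function (assumption): logical AND of two bits."""
    return x1 & x2

# U_f as a permutation matrix on three qubits |x1, x2, y>.
U_f = np.zeros((8, 8))
for i in range(8):
    x1, x2, y = (i >> 2) & 1, (i >> 1) & 1, i & 1
    j = (x1 << 2) | (x2 << 1) | (y ^ f(x1, x2))
    U_f[j, i] = 1.0

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

# Start in |000>, put the two input qubits in superposition, leave y = 0.
zero3 = np.zeros(8)
zero3[0] = 1.0
prepared = np.kron(np.kron(H, H), I2) @ zero3

# A single application of U_f writes f(x) next to every input x.
output = U_f @ prepared

nonzero = {format(i, "03b"): amp for i, amp in enumerate(output) if abs(amp) > 1e-12}
print(nonzero)
```

Each surviving basis label reads as the two input bits followed by the value of $f$ on them. A measurement would still return only one of these four terms, so additional algorithmic steps are needed to extract useful information from the superposition.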
2.5 No Cloning Theorem
An important result that has profound implications is the no cloning theorem [wootters1982single], which states that it is not possible to create a copy of an unknown quantum state. Since measurement irreversibly changes the quantum state, given a single copy of the state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, the values of the amplitudes $\alpha$ and $\beta$ cannot be exactly determined. Although quantum parallelism can be leveraged to simultaneously execute multiple computational paths, the no cloning theorem places restrictions on the amount of information one can extract from the final quantum state. (Although perfect cloning is impossible, Bužek and Hillery [buvzek1996quantum] proposed a universal cloning machine that can make imperfect copies of unknown quantum states with high fidelity.)
2.6 Adiabatic Quantum Computation
Numerous models have been proposed for quantum computing, such as the quantum Turing machine, the quantum circuit model, adiabatic quantum computing, measurement-based quantum computing, blind quantum computing, and topological quantum computing [dunjko2018machine]. All these models are computationally equivalent, but they are implemented very differently. An approach that has shown promise in solving optimization problems is adiabatic quantum computing [farhi2000quantum]. (In the context of quantum computing, an adiabatic process is a process which changes the state of a system so gradually that the state can adapt its configuration at each point.) It is of particular interest because building restricted quantum computers to perform quantum annealing (section 3.2) based on adiabatic quantum computing is simpler than building universal quantum computers. In adiabatic quantum computing, the optimization problem to be solved is encoded as a boolean satisfiability problem such that the ground state of its Hamiltonian represents the desired solution. (The Hamiltonian operator represents the total energy of the system.) The quantum system is initially set up in the ground state of a simple Hamiltonian that is easy to construct. The Hamiltonian is then changed gradually from this initial Hamiltonian to the Hamiltonian that encodes the problem. The adiabatic theorem states that if the system is evolved slowly enough, it will remain in the ground state of the instantaneous Hamiltonian throughout the evolution. The final system configuration then represents the solution to the optimization problem.
3 Quantum Machine Learning
Quantum computing methods for machine learning can be divided into two broad classes: (1) methods designed to run on a universal quantum computer that involve preparation, storage, and processing of quantum states and the retrieval of the classical solution from these states; (2) methods designed to run on quantum annealers that solve optimization problems through the physical evolution of quantum systems according to the principles of adiabatic quantum computing. In section 3.1, we describe common subroutines designed for circuit quantum computers that can be applied to machine learning problems. In section 3.2, we present quantum annealing that can solve quadratic unconstrained binary optimization (QUBO) problems. Finally, in section 3.3, we present quantum versions of selected classical machine learning algorithms.
3.1 Important Subroutines of Quantum Algorithms
A straightforward approach to achieving speed-ups over classical machine learning algorithms is to identify their computationally expensive and frequently executed subroutines and develop quantum alternatives for them. In this section, we describe some common subroutines that form a part of many quantum learning algorithms.
3.1.1 Quantum Encoding
Qubits are a scarce and expensive resource. The restricted physical implementations of quantum computers available today have very few qubits. (The recent quantum supremacy experiment conducted by Google used only 54 qubits [arute2019quantum].) An important question, with implications for both performance and feasibility, is therefore how to represent classical data in quantum states. Suppose we have a dataset $D$ of $M$ instances:
$$D = \{x_1, x_2, \ldots, x_M\} \tag{15}$$
where each $x_i$ is a real number. In basis encoding, each instance $x_i$ is encoded as $|b_i\rangle$ where $b_i$ is the binary representation of $x_i$. The dataset can then be represented as an equal superposition of the corresponding computational basis states:
$$|D\rangle = \frac{1}{\sqrt{M}}\sum_{i=1}^{M} |b_i\rangle \tag{16}$$
In amplitude encoding, data is encoded in the amplitudes of quantum states. The above dataset can be encoded as:
$$|D\rangle = \frac{1}{\lVert D \rVert}\sum_{i=1}^{M} x_i |i\rangle, \qquad \lVert D \rVert = \sqrt{\sum_{i=1}^{M} x_i^2} \tag{17}$$
Besides basis and amplitude encoding, many other methods of encoding exist such as Qsample encoding, dynamic encoding, squeezing embedding, displacement embedding, Hamiltonian encoding etc. [schuld2019quantum][lloyd2020quantum].
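To make basis and amplitude encoding concrete, the following sketch computes the state vectors that each encoding would produce for a small dataset. It only simulates the target states classically; it does not address how such states are prepared on hardware. The function names are ours, the basis-encoding helper assumes distinct non-negative integers that fit in `n_bits` bits (real values would first need a fixed-point binary representation), and the amplitude-encoding helper pads with zeros to the next power of two, which is a common convention rather than something the text specifies.

```python
import numpy as np

def basis_encode(integers, n_bits):
    """Equal superposition of the basis states |b_i> for distinct integers b_i < 2**n_bits."""
    unique = sorted(set(integers))
    state = np.zeros(2 ** n_bits)
    for value in unique:
        state[value] = 1.0
    return state / np.sqrt(len(unique))

def amplitude_encode(values):
    """Unit vector whose amplitudes are proportional to the data values."""
    values = np.asarray(values, dtype=float)
    n_qubits = max(1, int(np.ceil(np.log2(len(values)))))
    padded = np.zeros(2 ** n_qubits)
    padded[: len(values)] = values
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("cannot amplitude-encode an all-zero vector")
    return padded / norm

basis_state_vector = basis_encode([1, 2, 5], n_bits=3)
amplitude_state_vector = amplitude_encode([3.0, 4.0, 0.0])

print(basis_state_vector)
print(amplitude_state_vector)
```

Basis encoding uses one basis state per data item, whereas amplitude encoding stores $M$ values in the amplitudes of roughly $\log_2 M$ qubits, which is why it features in many algorithms that claim exponential savings in space.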
3.1.2 Grover’s Algorithm and Amplitude Amplification
Grover’s algorithm [grover1996fast] is a quantum search algorithm that offers a quadratic speed-up over classical algorithms when performing a search over an unstructured search space. Suppose we are given a set $X$ of $N$ elements where $N = 2^n$ and a boolean function $f: X \to \{0,1\}$ such that:
$$f(x) = \begin{cases} 1 & \text{if } x = x^* \\ 0 & \text{otherwise} \end{cases} \tag{18}$$
Any classical algorithm that performs a search for $x^*$ in $X$ takes $O(N)$ time. Grover’s algorithm can perform such a search in $O(\sqrt{N})$ time. The algorithm has three steps.
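In the first step, a quantum state is set up in an equal superposition of basis states using the Hadamard transform. As an example, consider $N = 8$. We set up the state $|\psi\rangle$ using 3 qubits as:
$$|\psi\rangle = H^{\otimes 3}|000\rangle = \frac{1}{2\sqrt{2}}\sum_{x \in \{0,1\}^3} |x\rangle \tag{19}$$
The second step, referred to as phase inversion, flips the sign of the amplitude of each $|x\rangle$ if $f(x) = 1$ and leaves it unchanged if $f(x) = 0$. To do this, we define a unitary quantum oracle $O|x\rangle = (-1)^{f(x)}|x\rangle$. Suppose, in our example, $x^*$ is present at the fourth position, i.e. $x^* = 011$. Applying the gate $O$ on $|\psi\rangle$ gives us:
$$O|\psi\rangle = \frac{1}{2\sqrt{2}}\left(|000\rangle + |001\rangle + |010\rangle - |011\rangle + |100\rangle + |101\rangle + |110\rangle + |111\rangle\right) \tag{20}$$
The third step, referred to as inversion around the mean, involves flipping all amplitudes around their collective mean $\mu$, so that an amplitude $a$ becomes $2\mu - a$. This is performed by the Grover diffusion operator $D$:
$$D = 2|\psi\rangle\langle\psi| - I \tag{21}$$
Applying $D$ to $O|\psi\rangle$ gives us:
$$DO|\psi\rangle = \frac{1}{4\sqrt{2}}\sum_{x \neq 011} |x\rangle + \frac{5}{4\sqrt{2}}|011\rangle \tag{22}$$
Thus, after one iteration, the amplitude of the target element is higher than the amplitudes of the other elements. If we were to measure the system at this point, we would get the target element as the outcome with a probability of $25/32 \approx 78\%$. The second and third steps are repeated approximately $\frac{\pi}{4}\sqrt{N}$ times to maximize this probability. After the second iteration, we get the state below, which finds the target element with a probability of $121/128 \approx 94.5\%$:
$$(DO)^2|\psi\rangle = -\frac{1}{8\sqrt{2}}\sum_{x \neq 011} |x\rangle + \frac{11}{8\sqrt{2}}|011\rangle \tag{23}$$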
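The three steps above can be reproduced with a small state-vector simulation. The sketch below uses $N = 8$ with the target at the fourth position (index 3, i.e. $|011\rangle$), builds the phase-inversion oracle and the diffusion operator as explicit matrices, and records the success probability after each of two iterations. This is a classical simulation for illustration only; it stores all $N$ amplitudes and so offers no speed-up.

```python
import numpy as np

n_qubits = 3
N = 2 ** n_qubits
target = 3  # fourth position, i.e. the basis state |011>

# Step 1: equal superposition of all N basis states.
uniform = np.full(N, 1 / np.sqrt(N))
state = uniform.copy()

# Step 2: phase inversion flips the sign of the target amplitude.
oracle = np.eye(N)
oracle[target, target] = -1.0

# Step 3: inversion around the mean, D = 2|psi><psi| - I.
diffusion = 2 * np.outer(uniform, uniform) - np.eye(N)

probabilities = []
for _ in range(2):
    state = diffusion @ (oracle @ state)
    probabilities.append(state[target] ** 2)

print(probabilities)  # success probability after the first and second iteration
```

The two recorded probabilities equal 25/32 and 121/128, so two iterations already bring the success probability for $N = 8$ close to 1.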
The same algorithm can also find $k$ matching entries instead of a single entry. Several modifications have been proposed that extend this work. Durr and Hoyer [durr1996quantum] propose a quantum algorithm to find the index of the minimum value from a list of $N$ values in $O(\sqrt{N})$ time with a probability of at least $1/2$. These methods generalize Grover’s search and are collectively referred to as amplitude amplification techniques [brassard1997exact].
3.1.3 Calculating Inner Products using Swap Test
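The swap test [buhrman2001quantum] is a simple subroutine used to compute the overlap between two quantum states $|a\rangle$ and $|b\rangle$. Quantum procedures can be easily described using circuit diagrams. The circuit diagram of the swap test is shown in figure 1.
The system is initially prepared in the state $|0, a, b\rangle$. The Hadamard gate applied on the first qubit, an ancilla, transforms the state to $\frac{1}{\sqrt{2}}\left(|0, a, b\rangle + |1, a, b\rangle\right)$. (Input qubits that do not hold any input data but are added to satisfy other conditions, most often reversibility of the transformation, are called auxiliary or ancillary qubits.) The CSWAP further transforms it to $\frac{1}{\sqrt{2}}\left(|0, a, b\rangle + |1, b, a\rangle\right)$. After the application of the second Hadamard gate to the first qubit, the state can be written as $\frac{1}{2}|0\rangle\left(|a, b\rangle + |b, a\rangle\right) + \frac{1}{2}|1\rangle\left(|a, b\rangle - |b, a\rangle\right)$. The probability of measuring the first qubit as $|0\rangle$ is given by $P(0) = \frac{1}{2} + \frac{1}{2}|\langle a|b\rangle|^2$. If $|a\rangle$ and $|b\rangle$ are equal, $|\langle a|b\rangle|^2 = 1$, and the value of $P(0)$ is 1. If $|a\rangle$ and $|b\rangle$ are orthogonal, $|\langle a|b\rangle|^2 = 0$, and the value of $P(0)$ is $\frac{1}{2}$. The degree of overlap given by the inner product of the two states can be estimated with this method to precision $\epsilon$ using $O(1/\epsilon^2)$ copies [dunjko2018machine].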
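The swap test can be checked with a three-qubit state-vector simulation. The sketch below restricts itself to single-qubit states $|a\rangle$ and $|b\rangle$ (an assumption made to keep the matrices at 8x8), applies Hadamard, CSWAP, and Hadamard to the ancilla-first register, and returns the exact probability of observing the ancilla in $|0\rangle$. The function name `swap_test_p0` is ours; on real hardware this probability would be estimated from repeated measurements rather than read off the state vector.

```python
import numpy as np

def swap_test_p0(a, b):
    """Exact probability of measuring the ancilla as 0 for single-qubit states a and b."""
    a = np.asarray(a, dtype=complex)
    b = np.asarray(b, dtype=complex)

    H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    H_on_ancilla = np.kron(H, np.eye(4))

    # CSWAP: when the ancilla (most significant qubit) is 1, swap the two data qubits.
    CSWAP = np.eye(8, dtype=complex)
    CSWAP[[5, 6]] = CSWAP[[6, 5]]  # exchanges |101> and |110>

    ancilla0 = np.array([1, 0], dtype=complex)
    state = np.kron(ancilla0, np.kron(a, b))
    state = H_on_ancilla @ (CSWAP @ (H_on_ancilla @ state))

    # The first four amplitudes correspond to the ancilla being 0.
    return float(np.sum(np.abs(state[:4]) ** 2))

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
theta = np.pi / 3
rotated = np.array([np.cos(theta), np.sin(theta)])

print(swap_test_p0(ket0, ket0))     # identical states
print(swap_test_p0(ket0, ket1))     # orthogonal states
print(swap_test_p0(ket0, rotated))  # partial overlap
```

The squared overlap can be recovered from the measured probability as $|\langle a|b\rangle|^2 = 2P(0) - 1$.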
3.1.4 Solving Systems of Linear Equations (HHL)
Solving a system of linear equations is an important problem ubiquitous throughout science and engineering. A seminal result in quantum computation is the HHL algorithm [harrow2009quantum] that solves the following problem: given a Hermitian matrix $A \in \mathbb{C}^{N \times N}$ and a unit vector $b \in \mathbb{C}^N$, find a solution vector $x$ that satisfies the equation $Ax = b$.
We present here a condensed outline of the algorithm. The solution we are interested in is $x = A^{-1}b$. Let $|u_j\rangle$ be the eigenvectors of $A$ with corresponding eigenvalues $\lambda_j$. The vector $b$ is encoded using amplitude encoding (section 3.1.1) as $|b\rangle = \sum_j \beta_j |u_j\rangle$. Hamiltonian simulation is used to transform the matrix $A$ into a unitary operator $e^{iAt}$, and quantum phase estimation is used to carry out eigendecomposition to get the state $\sum_j \beta_j |u_j\rangle|\lambda_j\rangle$. (The quantum phase estimation algorithm can estimate the phase, or eigenvalue, of an eigenvector of a unitary operator.) An ancilla qubit is added, a rotation conditioned on $|\lambda_j\rangle$ is carried out, and the eigenvalue register is uncomputed to yield a state proportional to $\sum_j \beta_j \lambda_j^{-1} |u_j\rangle = A^{-1}|b\rangle = |x\rangle$.
It is important to note that while a classical algorithm finds all coefficients $x_i$ for $i = 1, \ldots, N$, the HHL algorithm finds the quantum state $|x\rangle$. Obtaining values for all $x_i$ takes $O(N)$ repetitions; this observation nullifies the speed-up the quantum algorithm has over its classical counterparts. Hence, the HHL algorithm is most useful when used as a subroutine carrying out an intermediate step in a larger process where the quantum state $|x\rangle$ is consumed by the next subroutine in the process.
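The linear algebra that HHL carries out in superposition can be followed classically on a small example. The sketch below is not the quantum algorithm: it uses an ordinary eigendecomposition to expand $b$ in the eigenbasis of $A$, divides each coefficient by the matching eigenvalue, and recombines, which is the same inversion the conditioned rotation performs. The 2x2 matrix and vector are arbitrary illustrative values.

```python
import numpy as np

# Illustrative Hermitian (real symmetric) matrix and unit vector (assumptions).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 0.0])

# Eigendecomposition: the columns of `eigenvectors` play the role of |u_j>.
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Coefficients of b in the eigenbasis: beta_j = <u_j|b>.
beta = eigenvectors.conj().T @ b

# Invert each eigenvalue and recombine: x = sum_j (beta_j / lambda_j) |u_j>.
x = eigenvectors @ (beta / eigenvalues)

# HHL outputs a normalised quantum state proportional to x, not the entries of x.
x_state = x / np.linalg.norm(x)

print(x)
print(x_state)
```

The final normalization step reflects the caveat in the text: the quantum output carries the direction of the solution vector as a state, and reading out every component would require many repetitions.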
3.1.5 Quantum Random Access Memory
Most quantum algorithms assume parallel access to amplitude encoded states; this is performed using a quantum random access memory or QRAM [giovannetti2008quantum]. A classical RAM takes a memory address as input and returns the data stored at that address. A QRAM performs a similar operation using qubit registers. The input register contains a superposition of addresses $\sum_j \psi_j |j\rangle_a$ and the output register contains a superposition of the data at those addresses $\sum_j \psi_j |j\rangle_a |D_j\rangle_d$. While a classical RAM queries $N$ addresses in $O(N)$, QRAM performs the operation in $O(\log N)$.
3.2 Quantum Annealing
Quantum annealing is a metaheuristic optimization algorithm that leverages quantum effects to solve quadratic unconstrained binary optimization (QUBO) problems, which deal with optimizing functions of the form:
$$f(x) = \sum_{i} a_i x_i + \sum_{i<j} b_{ij} x_i x_j \tag{24}$$
where $x_i \in \{0,1\}$, $a_i \in \mathbb{R}$, $b_{ij} \in \mathbb{R}$. A wide range of problems can be mapped to QUBO and then solved by quantum annealers, which are special-purpose quantum computers built specifically to perform quantum annealing.
The Ising model is used in physics to represent a large variety of systems. It was originally proposed to model magnetic materials where every molecule has a spin that can align or anti-align with an applied magnetic field [bian2010ising]. The Hamiltonian of the system representing its energy is given by:
$$H = \sum_{i} h_i \sigma_i + \sum_{i<j} J_{ij} \sigma_i \sigma_j \tag{25}$$
where $\sigma_i \in \{-1, +1\}$ is the spin of the $i$-th molecule, $h_i$ is the strength of the magnetic field at the $i$-th molecule, and $J_{ij}$ is the strength of the interaction between neighboring spins $i$ and $j$. From equations (24) and (25), it can be seen that the QUBO problem conveniently maps onto the Ising Hamiltonian with the mapping $\sigma_i = 2x_i - 1$, up to a rescaling of the coefficients and an additive constant.
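The correspondence between a QUBO objective and an Ising energy can be worked out explicitly. The sketch below assumes the convention $f(x) = \sum_i a_i x_i + \sum_{i<j} b_{ij} x_i x_j$ with binary $x_i$, an Ising energy $\sum_i h_i s_i + \sum_{i<j} J_{ij} s_i s_j$ with spins $s_i = 2x_i - 1$, and coefficient names of our own choosing; sign conventions vary between references. Substituting $x_i = (s_i + 1)/2$ gives $J_{ij} = b_{ij}/4$, $h_i = a_i/2 + \frac{1}{4}\sum_{j \ne i} b_{ij}$, and a constant offset, which the code verifies by brute force on a three-variable example.

```python
import itertools
import numpy as np

def qubo_value(x, a, B):
    """f(x) = sum_i a_i x_i + sum_{i<j} B_ij x_i x_j, with B strictly upper triangular."""
    return float(a @ x + x @ B @ x)

def qubo_to_ising(a, B):
    """Return (h, J, offset) with f(x) = h.s + s.J.s + offset for s = 2x - 1."""
    J = B / 4.0
    h = a / 2.0 + (B.sum(axis=0) + B.sum(axis=1)) / 4.0
    offset = a.sum() / 2.0 + B.sum() / 4.0
    return h, J, offset

def ising_energy(s, h, J):
    """Ising energy sum_i h_i s_i + sum_{i<j} J_ij s_i s_j."""
    return float(h @ s + s @ J @ s)

# Illustrative coefficients (assumptions).
a = np.array([1.0, -2.0, 3.0])
B = np.array([[0.0, 2.0, -1.0],
              [0.0, 0.0, 4.0],
              [0.0, 0.0, 0.0]])
h, J, offset = qubo_to_ising(a, B)

# Compare both objectives on every binary assignment.
assignments = [np.array(bits, dtype=float) for bits in itertools.product([0, 1], repeat=3)]
qubo_values = [qubo_value(x, a, B) for x in assignments]
ising_values = [ising_energy(2 * x - 1, h, J) + offset for x in assignments]

best = assignments[int(np.argmin(qubo_values))]
print(best, min(qubo_values))
```

Because the two objectives differ only by a constant, the assignment that minimizes one also minimizes the other, which is what allows a QUBO problem to be handed to hardware that natively minimizes an Ising energy.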
Quantum annealing works as follows. An initial Hamiltonian $H_0$ that is easy to construct is chosen. The system is evolved under a time-dependent Hamiltonian given by:
$$H(t) = \left(1 - \frac{t}{T}\right) H_0 + \frac{t}{T} H_P \tag{26}$$
where $t$ is gradually changed from $0$ to $T$ and the final Hamiltonian $H_P$ is the same as in equation (25). At $t = 0$, the system starts in the ground state of $H_0$. According to the quantum adiabatic theorem, the system remains in the ground state of the instantaneous Hamiltonian throughout its evolution provided the Hamiltonian is changed sufficiently slowly [ambainis2004elementary]. At $t = T$, the system is in the ground state of the final Hamiltonian, which encodes the solution to the problem.
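The interpolation between an initial and a final Hamiltonian can be explored numerically for a few spins. The sketch below is an illustration under several assumptions: the three-spin Ising coefficients are made up, the initial Hamiltonian is taken to be a transverse field $-\sum_i X_i$ (a common choice that the text does not specify), and the schedule is parameterized by $s = t/T$. Rather than simulating the time evolution, it diagonalizes the instantaneous Hamiltonian along the schedule, tracks the gap between the two lowest energy levels, and reads the solution from the ground state at $s = 1$.

```python
import itertools
import numpy as np

# Illustrative three-spin Ising problem (assumed values).
h = np.array([0.5, -1.0, 0.25])
J = {(0, 1): 1.0, (1, 2): -0.75, (0, 2): 0.5}
n = len(h)

# Final (problem) Hamiltonian: diagonal, one energy per spin configuration.
# Basis state bit 0 corresponds to spin +1 and bit 1 to spin -1.
configs = list(itertools.product([1, -1], repeat=n))
energies = np.array([
    sum(h[i] * s[i] for i in range(n))
    + sum(coupling * s[i] * s[j] for (i, j), coupling in J.items())
    for s in configs
])
H_problem = np.diag(energies)

def on_qubit(op, k):
    """Embed a single-qubit operator acting on qubit k into the n-qubit space."""
    result = np.eye(1)
    for i in range(n):
        result = np.kron(result, op if i == k else np.eye(2))
    return result

# Initial Hamiltonian (assumption): transverse field, ground state is the uniform superposition.
X = np.array([[0.0, 1.0], [1.0, 0.0]])
H_initial = -sum(on_qubit(X, k) for k in range(n))

def hamiltonian(s):
    """Instantaneous Hamiltonian for schedule parameter s = t / T in [0, 1]."""
    return (1 - s) * H_initial + s * H_problem

# Gap between the two lowest levels along the schedule.
gaps = []
for s in np.linspace(0, 1, 101):
    levels = np.linalg.eigvalsh(hamiltonian(s))
    gaps.append(levels[1] - levels[0])
min_gap = min(gaps)

# Ground state at the end of the schedule encodes the minimizing spin configuration.
_, vectors = np.linalg.eigh(hamiltonian(1.0))
solution = configs[int(np.argmax(np.abs(vectors[:, 0])))]

print(solution, min_gap)
```

The minimum gap matters because the adiabatic theorem ties the required evolution time to it: the smaller the gap, the more slowly the Hamiltonian must be changed for the system to stay in the ground state.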
Quantum annealing should not be conflated with the more general adiabatic quantum computing. Quantum annealing specifically solves optimization problems; adiabatic quantum computing is a model of quantum computing that is equivalent to a universal quantum computer. The Hamiltonians used in quantum annealing are classical Hamiltonians, while adiabatic quantum computing uses quantum Hamiltonians that have no classical counterparts [biswas2017nasa]. For a more comprehensive treatment of adiabatic quantum computing, we refer the reader to [albash2018adiabatic].
3.3 Quantum Algorithms for Machine Learning
In this section, we explore how the background and subroutines described in the previous sections can be applied to solve machine learning problems. The common supervised machine learning setting is as follows. The training set consists of $M$ instances $x_i$ with corresponding labels $y_i$. Each $x_i$ is represented by an N-dimensional feature vector $x_i \in \mathbb{R}^N$. Each label $y_i$ can be either a real value (for regression problems) or a discrete class label (for classification problems). In the training phase, the algorithm extracts patterns from the dataset and learns a model. In the inference phase, this model is used to process unseen instances and predict their corresponding labels.
3.3.1 k-Nearest Neighbors
The k-nearest neighbors (KNN) algorithm is one of the simplest supervised learning algorithms. To predict the label of a new unseen instance $\tilde{x}$, the algorithm looks at the $k$ instances in the training set that are closest to $\tilde{x}$ and chooses the class that appears most often in the labels of these k-nearest neighbors as the predicted label (for regression, the algorithm assigns the mean value of the k-nearest neighbors as the label). An advantage of KNN is that, unlike many other supervised algorithms, it is non-parametric and makes no assumptions about the data distribution. However, since all computation is deferred to inference, inference can become prohibitively expensive for large training sets. During inference, the distance of the test instance from all training instances is calculated; this is the most computationally intensive step in the process. Hence, quantum versions of KNN focus on faster evaluation of the distance between two instances.
Aïmeur et al. [aimeur2006machine] propose using the overlap $|\langle a|b\rangle|$ as computed by the swap test (section 3.1.3) as a measure of similarity between $a$ and $b$. Lloyd et al. [lloyd2013quantum] develop a technique based on the swap test to compute the distance between $a$ and $b$. A state $|\psi\rangle = \frac{1}{\sqrt{2}}\left(|0, a\rangle + |1, b\rangle\right)$ is constructed by setting up an ancilla. A second state given by $|\phi\rangle = \frac{1}{\sqrt{Z}}\left(|a|\,|0\rangle - |b|\,|1\rangle\right)$ is constructed where $Z = |a|^2 + |b|^2$. Using these two states, the authors make the observation that $|a - b|^2 = 2Z|\langle\phi|\psi\rangle|^2$. With this, the distance between $a$ and $b$ can be retrieved using a swap test [schuld2015introduction]. The authors use the above technique to implement the nearest-centroid classification algorithm, in which the centroids of all training instances belonging to each class are precomputed; during inference, a class label is assigned to the test instance by calculating its distance from all centroids. Given a training set of $M$ instances, this procedure solves the problem of classifying an $N$-dimensional test vector into one of several classes in $O(\log(MN))$ compared to $O(\mathrm{poly}(MN))$ required by classical algorithms.
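The distance identity used in this construction can be verified numerically. The sketch below follows our reading of the Lloyd et al. technique: for non-zero real vectors $a$ and $b$ it forms $|\psi\rangle = (|0\rangle|a\rangle + |1\rangle|b\rangle)/\sqrt{2}$ with normalized $|a\rangle$ and $|b\rangle$, forms the ancilla state $|\phi\rangle = (|a|\,|0\rangle - |b|\,|1\rangle)/\sqrt{Z}$ with $Z = |a|^2 + |b|^2$, and checks that $2Z|\langle\phi|\psi\rangle|^2$ equals the squared Euclidean distance. The overlap is computed directly here, whereas the quantum procedure would estimate it with a swap test; the function name is ours.

```python
import numpy as np

def distance_squared_via_overlap(a, b):
    """Squared Euclidean distance recovered from the overlap of |phi> with the ancilla of |psi>."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    Z = norm_a ** 2 + norm_b ** 2

    # |psi> = (|0>|a> + |1>|b>) / sqrt(2), stored as a 2 x N array indexed by the ancilla bit.
    psi = np.vstack([a / norm_a, b / norm_b]) / np.sqrt(2)

    # |phi> = (|a| |0> - |b| |1>) / sqrt(Z) lives on the ancilla only.
    phi = np.array([norm_a, -norm_b]) / np.sqrt(Z)

    # Projecting the ancilla of |psi> onto |phi> leaves an unnormalised N-dimensional vector.
    overlap = phi @ psi
    return 2 * Z * float(overlap @ overlap)

a = np.array([3.0, 4.0])
b = np.array([0.0, 1.0])
print(distance_squared_via_overlap(a, b), np.sum((a - b) ** 2))
```

The projection evaluates to $(a - b)/\sqrt{2Z}$, which is why multiplying its squared norm by $2Z$ recovers the squared distance.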
Wiebe et al. [wiebe2014quantum] argue that the nearest-centroid classification presented above can perform poorly in practice since training instances are often embedded in complicated manifolds, and the centroids may lie outside these manifolds. They propose two fast methods for computing the distance between vectors based on an alternative representation of classical information in quantum states, amplitude amplification, and Durr-Hoyer minimum finding [durr1996quantum].
3.3.2 Support Vector Machines
The support vector machine (SVM) [cortes1995support] is a popular classification algorithm that determines the optimal hyperplane that separates instances of two classes in the training data and classifies test instances based on which side of the separating hyperplane they lie on. Given $M$ training instances $\{(x_i, y_i)\}_{i=1}^{M}$ where $x_i \in \mathbb{R}^N$ and $y_i \in \{-1, +1\}$, the algorithm learns a hyperplane in the $N$-dimensional feature space given by $w \cdot x + b = 0$ that separates the instances of the two classes with maximum margin. Classification of a test instance $\tilde{x}$ is performed as:
$$\tilde{y} = \mathrm{sgn}(w \cdot \tilde{x} + b) \tag{27}$$
Since the instances may not be strictly separable, slack variables are introduced that provide a soft margin by allowing some data points to violate the margin criterion. This formulation is known as the maximum margin classifier or support vector classifier; it, however, still produces only a linear decision boundary. SVMs overcome this limitation of linear separability by what is known as the kernel trick, which implicitly transforms the feature space into a new, higher-dimensional feature space. Instances that were not linearly separable in the original feature space may be linearly separable in the new feature space. Mathematically, the kernel trick replaces the dot product between two feature vectors by a kernel function $K(x_i, x_j)$. Different kernels can be chosen depending on the data distribution. Training an SVM involves solving the quadratic programming problem [press2007numerical]:
$$\min_{\alpha}\ \frac{1}{2}\sum_{i,j=1}^{M} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{M} \alpha_i \quad \text{subject to} \quad 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{M} \alpha_i y_i = 0 \tag{28}$$
where $\alpha_i$ are the Lagrange multipliers, $C$ is the regularization parameter, and $K$ is the kernel function.
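Anguita et al. [anguita2003quantum] observe that training support vector machines may be a hard problem and propose a quantum variant that uses Durr and Hoyer’s minimum finding [durr1996quantum], based on Grover’s algorithm, to solve the optimization problem. Rebentrost et al. [rebentrost2012quantum] suggest computing inner products using a quantum method based on an approach similar to the one discussed in section 3.3.1, which leads to an exponential speed-up with respect to the dimension $N$ of the feature vector. They also describe a least-squares reformulation of the SVM algorithm with slack variables that converts the quadratic optimization problem into a problem of solving a system of linear equations, which leads to an additional exponential speed-up in terms of the number of training instances $M$:
$$\begin{pmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & K + \gamma^{-1} I \end{pmatrix} \begin{pmatrix} b \\ \alpha \end{pmatrix} = \begin{pmatrix} 0 \\ y \end{pmatrix} \tag{29}$$
where $K$ is the $M \times M$ kernel matrix with entries $K(x_i, x_j)$, $\mathbf{1}$ is the all-ones vector, $y$ is the vector of training labels, and $\gamma$ is a user-specified parameter that determines the relative weight of the training error.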
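To show what the least-squares reformulation looks like in practice, the following sketch solves the corresponding linear system classically with `np.linalg.solve`. It assumes the standard least-squares SVM block system in which the first row and column enforce $\sum_i \alpha_i = 0$ and the remaining block is $K + \gamma^{-1} I$, a linear kernel, and a small made-up dataset; the function names and the value of `gamma` are ours. The quantum proposal replaces this classical solve with a quantum linear-systems routine, which is not simulated here.

```python
import numpy as np

def train_ls_svm(X, y, gamma=10.0):
    """Solve the least-squares SVM linear system with a linear kernel; returns (b, alpha)."""
    M = len(y)
    K = X @ X.T                       # linear kernel matrix
    F = np.zeros((M + 1, M + 1))
    F[0, 1:] = 1.0                    # constraint: sum_i alpha_i = 0
    F[1:, 0] = 1.0                    # bias term
    F[1:, 1:] = K + np.eye(M) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(F, rhs)
    return solution[0], solution[1:]

def predict(X_train, b, alpha, x_new):
    """Classify x_new as sgn(sum_i alpha_i K(x_i, x_new) + b)."""
    return int(np.sign(alpha @ (X_train @ x_new) + b))

# Illustrative, linearly separable toy data (assumption).
X_train = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
                    [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y_train = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

b, alpha = train_ls_svm(X_train, y_train)
predictions = [predict(X_train, b, alpha, x) for x in X_train]
print(predictions)
print(predict(X_train, b, alpha, np.array([4.0, 4.0])))
```

Unlike the standard SVM, the least-squares variant generally gives every training instance a non-zero coefficient, so the solution is not sparse; the benefit is that training reduces to a single linear solve.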
As shown in section 3.2, quantum annealing is particularly well-suited for solving optimization problems. Willsch et al. [willsch2020support] demonstrate a practical implementation of training SVMs on the commercially available D-Wave DW2000Q quantum annealer by formulating it as a QUBO problem.
3.3.3 Neural Networks
Artificial neural networks, or simply neural networks, were originally inspired by biological neural networks, the networks of neurons found in human brains. The basic building block of a neural network is the neuron, also called a node or perceptron, which maps the input $\vec{x}$ to the output $y$ as follows:
$$y = \varphi(\vec{w} \cdot \vec{x} + b)$$
where $\vec{w}$ is the weight vector, $b$ is the bias, and $\varphi$ is the activation function. (This is the general form used in modern feedforward neural networks. The original perceptron used a step activation function and produced only binary outputs 0 and 1.) The outputs of some neurons can be fed as inputs to other neurons, thus creating layers within the network.
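As a small illustration of a single neuron, the following Python sketch computes the activation of a weighted sum of the inputs plus a bias, once with a sigmoid activation and once with the step activation of the original perceptron. The input values, weights, and function names are arbitrary choices made for this example.

```python
import math

def step(z):
    """Step activation of the original perceptron: binary output 0 or 1."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Smooth activation commonly used in feedforward networks."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, activation):
    """Return activation(w . x + b) for input x, weights w and bias b."""
    return activation(sum(wi * xi for wi, xi in zip(w, x)) + b)

x = [1.0, 2.0]
w = [0.5, -0.25]
out_sigmoid = neuron(x, w, 0.0, sigmoid)  # weighted sum is 0, so the output is 0.5
out_step = neuron(x, w, -1.0, step)       # weighted sum is -1, so the output is 0
```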
Even though neural networks were amongst the first machine learning algorithms to be proposed, research on them stagnated for several decades, from the 1940s to the 1980s, owing to the inherent difficulty of training them and the large computational power it required. They returned to popularity after the introduction of backpropagation [rumelhart1986learning], which eased these problems by offering a faster method for training. In the last decade, with GPUs and cloud computing providing cheaper access to massive computational power, neural networks have dwarfed other learning algorithms (many informal texts now relegate all other learning algorithms to the category of conventional machine learning) to become one of the biggest success stories of modern computing, finding applications in various industries including healthcare, manufacturing, finance, and analytics to solve problems in image processing, computer vision, natural language processing, predictive modeling, and many other areas. However, even today, significant resources are required to train neural networks, and training times for research and industrial problems can run into weeks or even months.
An obvious difficulty arises in considering quantum computation as a means for implementing neural networks. Quantum computation (and indeed quantum mechanics itself) is a theory fundamentally based on linear transformations, while an important practical advantage neural networks enjoy over many other learning algorithms is that they can model non-linear data distributions. Bringing non-linearity into quantum algorithms is a non-trivial task [cao2017quantum]. However, classical neural networks do make heavy use of linear algebra, and the inherent randomness of quantum mechanical effects can be leveraged to automatically introduce noise in the training process to improve model robustness - something that needs to be done purposefully in classical training [allcock2018quantum]. Numerous neural network architectures have emerged in recent times to tackle problems belonging to a wide range of supervised, unsupervised, and reinforcement learning tasks [goodfellow2016deep]. Research in this field has been scattered with different proposals addressing narrow problems in piecemeal style. We present here a select subset of these proposals.
Most early work on quantizing neural networks focussed on Hopfield networks [hopfield1982neural], which differ from the neural networks presently used in practice. (Developing a quantum alternative to a classical computation technique is often referred to as quantizing it, although we prefer that this term be used sparingly.) In a Hopfield network, all neurons have undirected connections with all other neurons, as opposed to feed-forward networks that are organized as layers; also, each neuron outputs a binary 0 or 1 instead of a real number. Hopfield networks are used to model associative memories, which allow retrieval of data based on content rather than addresses. Kak [kak1995quantum] introduces the idea of quantum neural computation, and Peruš [peruvs2000neural] describes a quantum associative network by drawing analogies between Hopfield networks and quantum information processing. Behrman et al. [behrman2000simulations] present a quantum realization of neural networks by showing that a single quantum dot molecule can act as a recurrent quantum neural network. (Quantum dots are nanometre-scale semiconductor particles.) More recently, Rebentrost et al. [rebentrost2018quantum] present a technique based on quantum Hebbian learning and fast quantum matrix inversion to train Hopfield networks.
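For readers unfamiliar with associative memories, the following classical sketch stores one pattern in a Hopfield network using the Hebbian rule and then retrieves it from a corrupted copy. It assumes the common bipolar convention of neuron states in {-1, +1} rather than {0, 1}, and uses synchronous updates; the pattern and function names are hypothetical. It illustrates classical Hebbian learning only and is not the quantum procedure of the works cited above.

```python
def train_hebbian(patterns):
    """Hebbian rule: W[i][j] is the average of p[i] * p[j] over stored patterns, zero diagonal."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / len(patterns)
    return W

def recall(W, state, sweeps=5):
    """Synchronously update all neurons until the state stops changing."""
    s = list(state)
    n = len(s)
    for _ in range(sweeps):
        new = [1 if sum(W[i][j] * s[j] for j in range(n)) >= 0 else -1
               for i in range(n)]
        if new == s:
            break
        s = new
    return s

pattern = [1, -1, 1, -1, 1, -1]
W = train_hebbian([pattern])
corrupted = [1, -1, -1, -1, 1, -1]  # third neuron flipped
retrieved = recall(W, corrupted)
```

Retrieval here is addressed by content: the network is given a noisy version of the stored pattern, not its location, and settles back to the stored pattern.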
Boltzmann machines [ackley1985learning], closely related to Hopfield networks, are stochastic generative networks that can learn a probability distribution over a set of inputs. They are trained by adjusting the interconnection weights between the neurons so that the thermal statistics of the system as described by the Boltzmann-Gibbs distribution reproduces the statistics of the data [biamonte2017quantum]. Boltzmann machines can be conveniently represented by an Ising model whose spins encode features and interactions encode statistical dependencies between the features. In a restricted Boltzmann machine, connections exist only between neurons belonging to different layers; this makes them easier to train than fully-connected Boltzmann machines. Restricted Boltzmann machines can be stacked together to form deep belief networks that can be used to learn internal representations or can be trained under supervision to perform classification. Training Boltzmann machines is exponentially hard and is performed using approximation techniques like contrastive divergence [hinton2002training][salakhutdinov2009deep] that rely on Gibbs sampling. Wiebe et al. [wiebe2014quantum] propose quantum methods to efficiently train full Boltzmann machines by preparing a coherent analog of the Gibbs state from which samples can be drawn. Adachi et al. [adachi2015application] investigate an alternative approach of performing the sampling on a D-Wave quantum annealer instead of classical Gibbs sampling.
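The classical baseline that these quantum proposals aim to replace can be sketched as follows. This is a minimal illustration of one-step contrastive divergence (CD-1) for a small restricted Boltzmann machine with binary units, in which a single round of block Gibbs sampling supplies the model statistics. The layer sizes, learning rate, toy data, and function names are all assumptions made for this example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sample(probs, rng):
    """Draw independent Bernoulli samples with the given probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]

def hidden_probs(v, W, c):
    # P(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W[i][j])
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    # P(v_i = 1 | h) = sigmoid(b_i + sum_j W[i][j] * h_j)
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]

def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step: one round of block Gibbs sampling, then an in-place parameter update."""
    ph0 = hidden_probs(v0, W, c)                # positive phase (data statistics)
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)   # reconstruction
    ph1 = hidden_probs(v1, W, c)                # negative phase (model statistics)
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])

rng = random.Random(0)  # fixed seed so that runs are repeatable
n_visible, n_hidden = 4, 2
W = [[0.0] * n_hidden for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
data = [[1, 1, 0, 0], [0, 0, 1, 1]]  # two toy binary patterns
for _ in range(200):
    for v in data:
        cd1_update(v, W, b, c, lr=0.1, rng=rng)
```

The sampling steps inside `cd1_update` are the part that the annealer-based and coherent Gibbs state approaches seek to perform on quantum hardware.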
A different line of research involves developing quantum analogs for classical perceptrons. Schuld et al. [schuld2015simulating] introduce a quantum perceptron model with a step activation function that can be used to develop superposition-based learning schemes in which a superposition of training vectors can be processed in parallel. Kapoor et al. [kapoor2016quantum] develop two quantum techniques for modeling perceptrons; the first provides a quadratic speed-up with respect to the number of training instances, and the second provides a quadratic reduction in the scaling of the training time with the margin between the two classes.
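For reference, the classical perceptron that these quantum models build on can be trained with the simple mistake-driven rule sketched below. The example learns the logical AND function with a step activation; the dataset, learning rate, epoch limit, and function names are illustrative assumptions. The sketch processes training instances one at a time, which is the sequential behaviour that the superposition-based schemes above aim to improve on.

```python
def predict(w, b, x):
    """Step activation: output 1 if w . x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(samples, labels, lr=1.0, max_epochs=100):
    """Classical perceptron rule: adjust weights only on misclassified instances."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(samples, labels):
            err = t - predict(w, b, x)
            if err != 0:
                mistakes += 1
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        if mistakes == 0:  # every training instance is classified correctly
            break
    return w, b

# Logical AND: linearly separable, so the rule converges.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
w, b = train_perceptron(samples, labels)
predictions = [predict(w, b, x) for x in samples]
```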
Feedforward networks are one of the simplest neural network architectures in which the connections between neurons do not form any loops or cycles. They are usually trained using backpropagation, and the optimization is performed using some variant of gradient descent. Most machine learning architectures used in practice today are based on feedforward networks or their derivatives such as convolutional neural networks or recurrent neural networks [goodfellow2016deep]. Allcock et al. [allcock2018quantum] define an efficient quantum subroutine for robust inner product estimation using QRAM [giovannetti2008quantum] and use it to demonstrate quadratic speed-ups in the size of the network over classical counterparts; they additionally claim that the proposed quantum method naturally imitates regularization techniques like drop-out, leading to more robust networks. Farhi et al. [farhi2000quantum] present a general framework for binary classification specifically designed for near-term quantum processors in which the input strings are mapped to computational basis states, and the predicted label is given by the outcome of a Pauli operator measured on a readout qubit. This framework extends to a full quantum neural network that can classify both classical and quantum data. Convolutional neural networks (CNNs) [lecun1998] have achieved great success [krizhevsky2012imagenet] in image classification tasks in recent times. They, however, suffer from the fact that the operation of convolution is computationally expensive. Kerenidis et al. [kerenidis2019quantum] design a quantum CNN based on quantum computation of the convolution product between two tensors. They also propose a quantum tomography sampling approach to recover classical information from the network output and a quantum backpropagation algorithm for efficient training of the quantum CNN. (Quantum tomography is the process by which a quantum state is reconstructed using measurements on an ensemble of identical quantum states.)
It is important to distinguish quantum-enhanced machine learning, which focuses on techniques to implement learning on classical data using quantum computers, from quantum generalisations of machine learning algorithms, which deal with developing fully quantum algorithms that work with quantum data. This is especially true for neural networks, for which active research is underway on both fronts. We restrict this paper to quantum-enhanced techniques and refer the reader to [cao2017quantum][beer2020training][amin2018quantum][wan2017quantum][Cong_2019] for work on quantum generalisations. The principles of quantum computing have also inspired the development of new classical randomized algorithms that show exponential speed-ups over conventional algorithms [tang2019quantum]. With these quantum-inspired algorithms, the gap between certain classical and quantum algorithms is no longer exponential but polynomial. Arrazola et al. [arrazola2019quantum] provide a study of these algorithms and observe that they work well only under stringent conditions which occur rarely in practice. We do not cover these in this paper.
Quantum computation has made great strides in the last two decades in both theory and practice. A significant corpus of research has emerged in applying the principles of quantum computation to problems across many fields of science and engineering. At the same time, several approaches for physical realization of quantum computers based on superconducting quantum bits, trapped ions, optical lattices, photonic computing, and nuclear magnetic resonance have shown promise. However, fundamental challenges remain unresolved on both fronts. While developing quantum algorithms, we must consider the input problem and the output problem [biamonte2017quantum]. The input problem refers to the fact that the cost of reading classical data and encoding it in quantum states can sometimes dominate performance and render the further downstream speed-up irrelevant. The output problem refers to the reverse process of decoding the full classical solution from quantum states. Some important hardware challenges to constructing, operating, and maintaining large-scale quantum computers include achieving longer coherence, greater circuit depth, higher qubit quality, and finer control over qubits. Quantum error correction plays an important role, and it will likely span both hardware and software in the future.
Owing to its roots in quantum physics, research in quantum computing has so far been confined within the purview of the physics community. Although realization of quantum computers in the form of hardware will remain a problem for physicists, we believe this need not be the case when it comes to applying quantum computing to solve machine learning problems. Classical computing and machine learning, like physics and many other fields, serve as prime examples of disciplines where theoretical results were obtained long before technological progress made their experimental realization possible. Small-scale quantum computers with fewer than 100 qubits and quantum annealers with around 2000 qubits have been developed and are already being sold commercially [castelvecchi2017ibm][johnston2013d]. We hope this article serves its purpose as introductory material for interested machine learning researchers and practitioners from various disciplines.