As fault-intolerant quantum devices with an intermediate number of qubits begin to be built, we near the dawn of the Noisy Intermediate Scale Quantum (NISQ) [preskill_quantum_2018] technology era. These devices will not be able to run many of the most famous algorithms thought to demonstrate exponential speedups over classical algorithms [shor_polynomial-time_1997, harrow_quantum_2009]. However, despite their limited resources, they could efficiently solve other problems which cannot be solved in polynomial time by purely classical means. Showing this to be true is referred to as a demonstration of Quantum Computational Supremacy (this and similar problems are often also referred to as quantum advantage or quantum superiority).
Many of the aforementioned proof-of-principle problems utilise the measurement process inherent in quantum computation by generating samples from a quantum distribution. Typically, the distribution is sampled by applying a sequence of quantum gates and measurements to some initial state. If this sequence consists of quantum gates drawn from a ‘universal’ set, we can sample from any quantum distribution. A more restricted scenario is one where we are not permitted to use the full suite of universal gates. The motivation here is to generate circuits that are simpler to implement experimentally on NISQ devices. Some of these simpler circuits are still thought to be usable in demonstrations of quantum supremacy, with popular examples of sub-universal models including the process underpinning BosonSampling [aaronson_computational_2013], and Instantaneous Quantum Polynomial Time (IQP) Computations [shepherd_temporally_2009].
We attempt to connect these sampling hardness results to their use in Quantum Machine Learning (QML) in more detail than has previously been attempted [du_expressive_2018]. In this spirit, we use a learning model called the Born Machine [cheng_information_2017, liu_differentiable_2018] to perform Quantum Circuit Learning (QCL) [farhi_classification_2018, benedetti_generative_2018] with NISQ devices. The core principle of the Born Machine is its ability to produce statistics which rely on the fundamental randomness of quantum mechanics, according to Born’s Measurement Rule for a state $|\psi\rangle$ and measurement outcome $\mathbf{z}$:

$$p(\mathbf{z}) = |\langle \mathbf{z} | \psi \rangle|^2 \tag{1}$$
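As a small illustration of (1), the sketch below computes Born-rule probabilities from a list of state amplitudes and draws measurement outcomes from them. It assumes a dense state vector indexed by computational basis states, with the leftmost bit as the first qubit; the helper names are hypothetical and this is not the sampling procedure of any particular hardware.

```python
import math
import random


def born_probabilities(amplitudes):
    """Born rule: p(z) = |<z|psi>|^2 for every computational basis state z.

    The amplitudes are renormalised defensively, in case the input state
    is not exactly normalised."""
    norm = sum(abs(a) ** 2 for a in amplitudes)
    return [abs(a) ** 2 / norm for a in amplitudes]


def sample_bitstrings(amplitudes, num_samples, seed=0):
    """Draw measurement outcomes z, as bitstrings, according to the Born rule."""
    n_qubits = int(math.log2(len(amplitudes)))
    probs = born_probabilities(amplitudes)
    rng = random.Random(seed)
    outcomes = rng.choices(range(len(amplitudes)), weights=probs, k=num_samples)
    return [format(z, f"0{n_qubits}b") for z in outcomes]


# Example: the two-qubit state (|00> + i|11>)/sqrt(2).
psi = [1 / math.sqrt(2), 0, 0, 1j / math.sqrt(2)]
print(born_probabilities(psi))    # equal weight on '00' and '11'
print(sample_bitstrings(psi, 5))  # only '00' or '11' can appear
```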
By utilising quantum mechanics in machine learning, we hope to develop algorithms to be applied in the classical domain, in areas such as accelerating recommendation systems or support vector machines [kerenidis_quantum_2016, harrow_quantum_2009, rebentrost_quantum_2014, kerenidis_quantum_2018]. One may also find new applications for QML algorithms which exist purely in the quantum domain. One example, which we propose in Section LABEL:ssec:automaticcompilation, is mimicking a target quantum circuit by ‘learning a circuit description’ from samples of its output distribution. This can be seen as an attempt to automatically compile the model circuit onto the target circuit in such a way that the outputs of the two circuits are indistinguishable to a classical observer.
This contrasts with previous attempts at quantum compilation, which involve directly adapting one circuit to another. We believe our method is a new way to approach the problem of compilation on quantum hardware.
It is also hoped that quantum models could achieve ‘better’ performance on certain datasets than any purely classical model. This is motivated by the supremacy arguments mentioned above, and provides a central theme of this work. A physical demonstration of such a task would provide a definitive separation between quantum and classical machine learning algorithms in practice. This is a more challenging task than simply demonstrating quantum supremacy by itself, or even verifying quantum supremacy [aaronson_complexity-theoretic_2016, bouland_quantum_2018, bremner_classical_2011, mills_information_2018], but addresses the usefulness of these near term devices. These complexity theoretic supremacy arguments are even more relevant given recent work on QML algorithm ‘dequantisations’ [tang_quantum-inspired_2018, tang_quantum-inspired_2018-1, andoni_solving_2018, chia_quantum-inspired_2018, gilyen_quantum-inspired_2018], in which quantum algorithms thought to have an exponential speedup over any classical algorithm inspired completely classical algorithms with polynomial run time. We will motivate our version of the Born Machine with complexity theoretic arguments, to rule out obvious routes to such a dequantisation. Hence we have our first guiding principle in this work.
Base learning supremacy results on solid complexity theoretic grounds.
In contrast to previous work [liu_differentiable_2018], which shows that quantum circuits with more layers, and hence more parameters, are more expressive, we investigate NISQ devices with relatively few parameters. Specifically, we define a variant of the Born Machine, called an Ising Born Machine (IBM; the model has no relation to the International Business Machines Corporation), which is defined by unitary gates derived from an Ising Hamiltonian, and from which we can recover well known circuit classes that can theoretically demonstrate quantum supremacy. Our focus on NISQ devices contrasts with ‘coherent’ QML algorithms, such as the HHL linear equation solver [harrow_quantum_2009], which require quantum technologies that may take many years to develop, such as Quantum Random Access Memory (QRAM) [lloyd_quantum_1999]. Our second guiding principle is the following.
Develop algorithms which consider the limitations of NISQ technology.
Our contribution in this paper can be summarised as follows:
Classical Hardness: We connect our model to quantum sampling hardness results in more detail than previous studies [du_expressive_2018, killoran_continuous-variable_2018], and also connect its training to these results, which has not been studied previously.
Training Procedure: Our main contribution is to introduce new training procedures for differentiable training of quantum generative models, as alternatives to those proposed previously [liu_differentiable_2018, benedetti_generative_2018, du_bayesian_2018, dallaire-demers_quantum_2018]. We validate numerically that these outperform the previous standard gradient based method [liu_differentiable_2018], both on a simulator and on quantum hardware.
Quantum Learning Supremacy: We propose the first definitions for what it would mean for a generative quantum model to outperform all possible classical models, for learning certain distribution classes. This provides a formulation of the idea of [rocchetto_learning_2018] to learn hard distributions.
Compilation: We propose a new viewpoint on quantum circuit compilation using classical data with our methods, in a similar spirit to other approaches [khatri_quantum-assisted_2018, jones_quantum_2018].
This paper is organised as follows.
- Section 2:
Required terminology and background in Machine Learning and ‘Quantum Supremacy’ are introduced, along with the quantum circuit classes we will use. We also provide the first definitions, to the best of our knowledge, of what we call ‘Quantum Learning Supremacy’: the ability of a quantum algorithm to learn a distribution in a way that is not possible efficiently by classical means.
- Section 3:
The Ising Born Machine is defined and related to previous work. We then discuss our first contribution: illustrating how the circuit underlying the model is hard to simulate classically, up to a suitable notion of error.
- Section LABEL:sec:ibmtraining:
Kernel methods are introduced, along with both ‘classical’ and ‘quantum’ kernels. We describe methods to train Born Machine models, and introduce our contributions to this area: two new cost functions, the Stein Discrepancy and the Sinkhorn Divergence, which lead to differentiable training of the model and which we argue are ‘better’ than current approaches.
- Section LABEL:sec:ibmhardness:
Many ingredients in the training algorithm are shown to be hard for a purely classical system to perform, leveraging the hardness of the underlying quantum circuit, and relating back to Section 3.
- Section LABEL:sec:numericalresults:
Numerical results are presented to illustrate these new training techniques.
- Section LABEL:sec:applications:
Two novel applications for the Ising Born Machine are discussed. The first is automatic compilation of quantum circuits using purely classical data, and the second is learning hard distributions, providing a more methodical approach to fulfilling the definitions introduced in Section 2. In-depth analysis of these final applications is left to future work.
We introduce some terminology and related work necessary for the reading of this paper.
2.1 Learning and Modelling
Machine learning broadly encapsulates the aspiration to use algorithms to perform a task without explicit instruction, instead deducing solutions from patterns or inference. Two common tasks for which ML techniques are useful are discriminative tasks, such as classification, and generative modelling. The former usually falls under the umbrella of ‘Supervised Learning’, in which an algorithm is trained using labelled data. The latter is typically used in ‘Unsupervised Learning’, in which no labels are provided, and the algorithm learns the relationships in the data by itself. The use case in this work will be parameterised generative modelling, which consists of three key components:
A phenomenon from which we can extract sample observations.
A parameterised structure which represents a characterisation of the target.
A process of updating the model parameters, based on observations of the target, by sampling from the model and the target. This is achieved by evaluating some cost or notion of ‘closeness’ and calling some optimiser to compute updates.
The goal of the training is to ensure that samples from the model match the behaviour one would expect from the target.
The particular model we introduce falls within a relatively new paradigm known as Quantum Circuit Learning [mitarai_quantum_2018]. The general methodology for training in QCL can be broken down as follows:
- Compute Phase:
A parameterised quantum circuit is applied to an initial configuration of qubits, typically the $|0\rangle^{\otimes n}$ state, resulting in a parameterised final state.
- Compare Phase:
Some information about this state is extracted, be it a series of samples via measurements in the generative case, as we explore here, or some expectation value of an observable in the classification case [farhi_classification_2018].
- Modify Phase:
Based on this information, the parameters of the circuit are updated by a classical optimiser to minimise some cost function or ‘error’. In gradient based methods, this update will depend on the gradient of the cost function.
This process is repeated multiple times, until there is some convergence or for a specified number of updates, in order to refine the model.
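The Compute, Compare and Modify phases can be sketched on a toy problem. This is a minimal, hypothetical example, not the training procedure of this work: it assumes a single-qubit model $R_x(\theta)|0\rangle$, uses exact probabilities in place of measured samples, takes a squared-error cost on $p(0)$ as a stand-in for the cost functions discussed later, and computes the gradient with the parameter-shift rule for $R_x$.

```python
import math


def model_prob_zero(theta):
    """Compute phase: p(0) for the state R_x(theta)|0>, which is cos^2(theta / 2)."""
    return math.cos(theta / 2) ** 2


def cost(theta, target_p0):
    """Compare phase: squared difference between model and target probabilities."""
    return (model_prob_zero(theta) - target_p0) ** 2


def cost_gradient(theta, target_p0):
    """d(cost)/d(theta); dp(0)/d(theta) comes from the parameter-shift rule for R_x."""
    shift = math.pi / 2
    dp0 = (model_prob_zero(theta + shift) - model_prob_zero(theta - shift)) / 2
    return 2 * (model_prob_zero(theta) - target_p0) * dp0


def train(target_p0, theta=0.1, learning_rate=1.0, steps=200):
    """Modify phase: repeated gradient-descent updates of the single parameter."""
    for _ in range(steps):
        theta -= learning_rate * cost_gradient(theta, target_p0)
    return theta


theta_opt = train(target_p0=0.8)
print(theta_opt, model_prob_zero(theta_opt))
```

With a real device, `model_prob_zero` would be replaced by an estimate from measured samples, and the repeated loop corresponds to the refinement described above.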
2.2 Classical Simulation of Quantum Computations
The central question behind Quantum Computational Supremacy is whether or not it is possible to design a classical algorithm which could produce a probability distribution, $q(\mathbf{x})$, which is close to a given quantum output distribution, $p(\mathbf{x})$. This notion of reproducing a quantum distribution can be formalised as classical simulation, of which there are two types, strong and weak. For our purposes, the more relevant notion is weak simulation, which better captures the process of sampling.
Definition 2.1 (Strong and Weak Classical Simulation).
[bremner_classical_2011, fujii_commuting_2017] A uniformly generated quantum circuit, $C$, from a family of circuits with input size $n$, is weakly simulatable if, given a classical description of the circuit, a classical algorithm can produce samples, $\mathbf{x}$, from the output distribution, $p(\mathbf{x})$, in $\mathrm{poly}(n)$ time. On the other hand, a strong simulator of the family would be able to compute the output probabilities, $p(\mathbf{x})$, and also all the marginal distributions over any arbitrary subset of the outputs. Both of these notions apply up to some notion of error, $\epsilon$.
As mentioned in [bremner_classical_2011], strong simulation (for which the suitable notion of error, $\epsilon$, is the precision to which the probabilities can be computed) is a harder task than weak simulation, and it is weak simulatability which we want to rule out as being classically hard. Which instances of a problem are classically hard is captured by worst case and average case hardness. Informally, worst case hardness implies there is at least one instance of the problem which is hard to simulate. Worst case hardness holds for IQP and QAOA circuits [bremner_classical_2011, farhi_quantum_2016], as we will illustrate in Section 2.3. A stronger notion is average case hardness, which has been proven for Random Circuit Sampling [bouland_quantum_2018] and BosonSampling [aaronson_computational_2013], but is only conjectured to hold for IQP circuits, for example.
One could ask “What if we do not care about getting samples from the exact distribution, and instead an approximation is good enough?”. Exact in this case means that the outcome probabilities of the simulator are identical to those output by the quantum device, i.e. $q(\mathbf{x}) = p(\mathbf{x})$ for all $\mathbf{x}$, or $\epsilon = 0$. This is an important and relevant question to ask when discussing quantum supremacy, since experimental noise means that even quantum computers may not produce the exact dynamics that the theory prescribes. Worse still, noise typically results in decoherence and the destruction of entanglement and interference in quantum circuits, so in the presence of noise the resulting output distribution could become classically simulatable.
We wish to have strong theoretical guarantees that experiments which claim to demonstrate supremacy do in fact behave as expected, even in the presence of reasonable noise. Since we are dealing fundamentally with probability distributions, there are many notions of error one could choose. This question is the backbone of this work, and is extremely relevant since it gives rise to many variants of the problem. One of the simplest notions of error is multiplicative error.
Definition 2.2 (Multiplicative Error).
A circuit family is weakly simulatable within multiplicative (relative) error if there exists a classical probabilistic algorithm, $\mathcal{A}$, which produces samples, $\mathbf{x}$, according to the distribution, $q(\mathbf{x})$, in time polynomial in the input size, such that it differs from the ideal quantum distribution, $p(\mathbf{x})$, by a multiplicative constant, $c$:

$$|p(\mathbf{x}) - q(\mathbf{x})| \leq c\, p(\mathbf{x}) \quad \forall \mathbf{x} \tag{2}$$
As noted in [fujii_impossibility_2018], it would be desirable to have a quantum sampler which could achieve the bound (2), but this is not believed to be an experimentally reachable goal, in the sense that a physical quantum device is not believed to be able to achieve such a multiplicative error bound on its probabilities relative to its ideal functionality (i.e. with $q$ in (2) replaced by the output distribution of a noisy quantum device). That is why much effort has been put into finding systems for which supremacy could be provably demonstrated according to the total variation distance error condition (3), which is easier to achieve on near term quantum devices.
Definition 2.3 (Total Variation (TV) Error).
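A circuit family is weakly simulatable within total variation distance error, $\epsilon$, if there exists a classical probabilistic algorithm, $\mathcal{A}$, which produces samples, $\mathbf{x}$, according to the distribution, $q(\mathbf{x})$, in polynomial time, such that it differs from the ideal quantum distribution, $p(\mathbf{x})$, in total variation distance by at most $\epsilon$:

$$\mathrm{TV}(p, q) := \frac{1}{2}\sum_{\mathbf{x}} |p(\mathbf{x}) - q(\mathbf{x})| \leq \epsilon \tag{3}$$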
Intuitively, multiplicative error sampling is ‘harder’ since the bound must hold for every outcome, i.e. the classical algorithm, $\mathcal{A}$, must capture all the fine features of the target distribution, $p$. In contrast, total variation distance error only requires the distributions to be similar ‘overall’.
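The difference between the two error notions can be made concrete by computing both for small distributions. In this sketch (hypothetical helper names, distributions stored as dictionaries from bitstrings to probabilities), one sampler is close in both senses, while another puts a tiny weight on an outcome the ideal distribution forbids: its TV error is small, but no finite multiplicative constant $c$ satisfies (2).

```python
import math


def total_variation(p, q):
    """TV(p, q) = 1/2 * sum_x |p(x) - q(x)|, as in (3); missing keys mean probability 0."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def smallest_multiplicative_constant(p, q):
    """Smallest c with |p(x) - q(x)| <= c * p(x) for every outcome x, as in (2).

    Returns math.inf when q puts weight on an outcome that p forbids."""
    c = 0.0
    for x in set(p) | set(q):
        px, qx = p.get(x, 0.0), q.get(x, 0.0)
        if px == 0.0:
            if qx > 0.0:
                return math.inf
            continue
        c = max(c, abs(px - qx) / px)
    return c


p = {"00": 0.5, "01": 0.25, "10": 0.25}                     # ideal distribution
q_close = {"00": 0.4, "01": 0.3, "10": 0.3}                 # uniformly shifted sampler
q_leaky = {"00": 0.5, "01": 0.25, "10": 0.24, "11": 0.01}   # tiny weight on a forbidden outcome

print(total_variation(p, q_close), smallest_multiplicative_constant(p, q_close))
print(total_variation(p, q_leaky), smallest_multiplicative_constant(p, q_leaky))
```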
2.3 Quantum Circuit Classes
The circuit class we introduce is strongly related to two classes which have both had their relationship to Quantum Supremacy studied extensively [bremner_classical_2011, bremner_average-case_2016, bremner_achieving_2017, farhi_quantum_2014, farhi_quantum_2016]. Both are ‘sub-universal’, in the sense that they are not powerful enough to directly simulate arbitrary quantum computations, but are believed to achieve something outside of the classically tractable regime. They are derived from an Ising-type Hamiltonian, and differ only in the final ‘measurement’ gate, a rotation applied immediately before measurement.
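Initially, a Hadamard basis preparation, $H^{\otimes n}|0\rangle^{\otimes n}$, is performed. This is followed by a unitary evolution by $U_z$ operators, each acting on the qubits in the set $S_j$ and described by:

$$U_z(\boldsymbol{\alpha}) := \prod_j U_z(\alpha_j, S_j) = \prod_j \exp\Big(\mathrm{i}\,\alpha_j \bigotimes_{k \in S_j} Z_k\Big) \tag{4}$$

This is followed by the measurement unitary, $U_f$, which is built from single qubit gates acting on each qubit:

$$U_f(\boldsymbol{\Gamma}, \boldsymbol{\Delta}, \boldsymbol{\Sigma}) := \exp\Big(\mathrm{i}\sum_{k=1}^{n} \big(\Gamma_k X_k + \Delta_k Y_k + \Sigma_k Z_k\big)\Big) \tag{5}$$

where $\{X_k, Y_k, Z_k\}$ are the canonical Pauli operators [nielsen_quantum_2010] acting on qubit $k$. The final circuit is the following:

$$|\psi(\boldsymbol{\alpha}, \boldsymbol{\Gamma}, \boldsymbol{\Delta}, \boldsymbol{\Sigma})\rangle = U_f(\boldsymbol{\Gamma}, \boldsymbol{\Delta}, \boldsymbol{\Sigma})\, U_z(\boldsymbol{\alpha})\, H^{\otimes n} |0\rangle^{\otimes n} \tag{6}$$

Sampling from distributions produced by these circuits is performed by computational basis measurements of all qubits.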
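Because every term in (4) is a tensor product of Pauli-$Z$ operators, $U_z$ is diagonal in the computational basis and only attaches a phase to each bitstring. The sketch below computes that diagonal directly. It is an illustration under our own conventions (qubit 0 is the leftmost bit, and `terms` is a hypothetical list of $(\alpha_j, S_j)$ pairs) and, since it enumerates all $2^n$ bitstrings, it is only practical for small $n$.

```python
import cmath
from itertools import product


def u_z_diagonal(n, terms):
    """Diagonal of U_z(alpha) = prod_j exp(i * alpha_j * tensor_{k in S_j} Z_k).

    `terms` is a list of (alpha_j, S_j) pairs, with S_j a tuple of qubit indices
    (qubit 0 is the leftmost character of the bitstring). Because Z_k|z> =
    (-1)^{z_k}|z>, each term only multiplies |z> by
    exp(i * alpha_j * (-1)^{sum_{k in S_j} z_k}), and since all terms commute
    their phases simply add."""
    diagonal = {}
    for bits in product((0, 1), repeat=n):
        phase = 0.0
        for alpha, subset in terms:
            parity = sum(bits[k] for k in subset) % 2
            phase += alpha * (1 - 2 * parity)
        diagonal["".join(map(str, bits))] = cmath.exp(1j * phase)
    return diagonal


# Two qubits: a single-qubit term on qubit 0 and a two-qubit Z_0 Z_1 term.
diag = u_z_diagonal(2, [(0.3, (0,)), (0.7, (0, 1))])
print(diag)
```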
We now show how the specific choices of parameters in (6) retrieve the circuit classes mentioned above.
2.3.1 Instantaneous Quantum Polynomial Time Computations (IQP)
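The first example of a sub-universal class is that of Instantaneous Quantum Polynomial Time (IQP) circuits [shepherd_temporally_2009]. IQP circuits have exactly the form of (6), but with the parameters of the measurement unitary set as follows:

$$\boldsymbol{\Gamma}_{\mathrm{IQP}} = \Big\{\frac{\pi}{2\sqrt{2}}\Big\}_{k=1}^{n}, \quad \boldsymbol{\Delta}_{\mathrm{IQP}} = \{0\}_{k=1}^{n}, \quad \boldsymbol{\Sigma}_{\mathrm{IQP}} = \Big\{\frac{\pi}{2\sqrt{2}}\Big\}_{k=1}^{n} \tag{7}$$

This results in a final Hadamard gate applied to every qubit (up to a global phase), since $H = (X + Z)/\sqrt{2}$ satisfies $H^2 = I$, and hence:

$$\exp\Big(\mathrm{i}\frac{\pi}{2\sqrt{2}}(X_k + Z_k)\Big) = \exp\Big(\mathrm{i}\frac{\pi}{2}H_k\Big) = \cos\Big(\frac{\pi}{2}\Big) I + \mathrm{i}\sin\Big(\frac{\pi}{2}\Big) H_k = \mathrm{i}H_k \tag{8}$$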
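The identity (8) can be checked numerically. This sketch assumes nothing beyond the parameter setting $\Gamma_k = \Sigma_k = \frac{\pi}{2\sqrt{2}}$, $\Delta_k = 0$: it exponentiates $\mathrm{i}\frac{\pi}{2\sqrt{2}}(X + Z)$ with a truncated Taylor series (a simple stand-in for a library matrix exponential) and compares the result with $\mathrm{i}H$.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def expm_taylor(m, terms=40):
    """Matrix exponential of a 2x2 complex matrix by a truncated Taylor series."""
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]
    term = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]  # now m^k / k!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
angle = math.pi / (2 * math.sqrt(2))  # Gamma_k = Sigma_k, Delta_k = 0
generator = [[1j * angle * (X[i][j] + Z[i][j]) for j in range(2)] for i in range(2)]
u_f = expm_taylor(generator)

hadamard = [[1 / math.sqrt(2), 1 / math.sqrt(2)], [1 / math.sqrt(2), -1 / math.sqrt(2)]]
i_hadamard = [[1j * h for h in row] for row in hadamard]
max_error = max(abs(u_f[i][j] - i_hadamard[i][j]) for i in range(2) for j in range(2))
print(max_error)
```

The global phase $\mathrm{i}^n$ accumulated over $n$ qubits does not affect measurement statistics.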
Since the gates between the two layers of Hadamard gates are all diagonal in the Pauli-$Z$ basis, and thus commute, IQP circuits are ‘instantaneous’, but they are not able to achieve the full power of universal quantum computation. However, they are still believed to be hard to classically simulate [bremner_classical_2011]:
Theorem 2.1 (informal from [bremner_classical_2011]).
If the output probability distributions generated by uniform families of IQP circuits could be weakly classically simulated to within multiplicative error, then the polynomial hierarchy (PH) would collapse to its third level.
A collapse of the PH is thought to be unlikely at any level, giving us confidence in the hardness of IQP. In some sense, such a collapse to a certain level would be a generalisation of P = NP, which would correspond to a full collapse to the zeroth level.
Theorem 2.1 and similar results in [hoban_measurement-based_2014] are remarkable in their demonstration that quantum computers which are much weaker than a universal BQP machine are still very difficult to classically simulate. In fact, supremacy results for IQP also exist in the case of the more realistic total variation distance error.
Theorem 2.2 (informal from [bremner_average-case_2016]).
Assuming either one of two conjectures, relating to the hardness of Ising partition functions and the gap of degree-3 polynomials, together with the stability (non-collapse) of the PH, it is hard to classically sample from the output probability distributions of IQP circuits in polynomial time, up to a constant total variation error.
2.3.2 Quantum Approximate Optimisation Algorithm (QAOA)
The second well known class which can be recovered from (6) is the shallowest depth version of the Quantum Approximate Optimisation Algorithm (QAOA) [farhi_quantum_2014]. The QAOA is an algorithm to approximately prepare a desired quantum state which encodes the solution to some problem, which can then be extracted by measuring the final state. The canonical example is MaxCut [farhi_quantum_2014], a constraint satisfaction problem. The QAOA is defined in terms of a ‘cost’ Hamiltonian, $H_C$, and a ‘mixer’ Hamiltonian, $H_M$ (borrowing the terminology of [verdon_quantum_2017]). The mixer Hamiltonian is assumed to be one which has an easily prepared ground state (typically a product state), for example:

$$H_M = \sum_{k=1}^{n} X_k \tag{9}$$
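The goal of the QAOA is to produce a ground (or thermal) state of the cost Hamiltonian, $H_C$, which encodes some problem solution. This cost Hamiltonian can be exactly the exponent of the unitary in (4), with the same subsets $S_j$ and coefficients $\alpha_j$:

$$H_C = \sum_j \alpha_j \bigotimes_{k \in S_j} Z_k \tag{10}$$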
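In its most general form, a QAOA circuit consists of applying the unitaries generated by (9) and (10) in an alternating fashion. A depth-$p$ QAOA has $p$ layers of these gates acting in alternation, i.e. it produces a state:

$$|\boldsymbol{\gamma}, \boldsymbol{\beta}\rangle = e^{-\mathrm{i}\beta_p H_M} e^{-\mathrm{i}\gamma_p H_C} \cdots e^{-\mathrm{i}\beta_1 H_M} e^{-\mathrm{i}\gamma_1 H_C} |+\rangle^{\otimes n} \tag{11}$$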
The parameters, $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$, are optimised to produce the required state, which is assumed to be difficult to prepare directly.
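We are interested in the shallowest depth version of the algorithm, which produces states of the form:

$$|\gamma, \beta\rangle = e^{-\mathrm{i}\beta H_M} e^{-\mathrm{i}\gamma H_C} |+\rangle^{\otimes n} \tag{12}$$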
Since the mixer Hamiltonian in (9) is 1-local (each term acts on only a single qubit), the evolution by the unitary

$$e^{-\mathrm{i}\beta H_M} = \bigotimes_{k=1}^{n} e^{-\mathrm{i}\beta X_k} \tag{13}$$

can be decomposed into a tensor product of single qubit unitaries corresponding to rotations around the Pauli-$X$ axis.
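The parameter $\gamma$ in (12) can be absorbed into the Hamiltonian parameters of (4) (as $\alpha_j \to -\gamma\alpha_j$), and we allow $\beta$ to be different for each qubit, $\beta \to \beta_k$. Therefore, this corresponds to the following setting of the parameters in (5):

$$\boldsymbol{\Gamma}_{\mathrm{QAOA}} = \{-\beta_k\}_{k=1}^{n}, \quad \boldsymbol{\Delta}_{\mathrm{QAOA}} = \{0\}_{k=1}^{n}, \quad \boldsymbol{\Sigma}_{\mathrm{QAOA}} = \{0\}_{k=1}^{n} \tag{14}$$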
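Both steps above can be checked numerically on two qubits: that $e^{-\mathrm{i}\beta(X_1 + X_2)}$ factorises into single-qubit $X$ rotations, and that setting $\Gamma_k = -\beta_k$, $\Delta_k = \Sigma_k = 0$ in (5) reproduces $e^{-\mathrm{i}\beta X}$. This is a sketch under stated assumptions: the sign of $\Gamma_k$ follows from writing the mixer evolution as $e^{-\mathrm{i}\beta H_M}$ while $U_f$ carries $+\mathrm{i}$ in its exponent, the matrix exponential is a truncated Taylor series, and `qaoa1_measurement_params` is a hypothetical helper.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)] for i in range(size)]


def expm_taylor(mat, terms=40):
    """Matrix exponential by a truncated Taylor series (adequate for small norms)."""
    n = len(mat)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, mat)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def max_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def qaoa1_measurement_params(betas):
    """Hypothetical helper: per-qubit mixer angles beta_k -> (Gamma, Delta, Sigma) of (5)."""
    n = len(betas)
    return [-b for b in betas], [0.0] * n, [0.0] * n


I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
beta = 0.37

# Closed form of a single-qubit X rotation: exp(-i beta X) = cos(beta) I - i sin(beta) X.
single = [[math.cos(beta) * I2[i][j] - 1j * math.sin(beta) * X[i][j] for j in range(2)]
          for i in range(2)]

# exp(-i beta (X_1 + X_2)) should equal exp(-i beta X) (x) exp(-i beta X).
h_m = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(kron(X, I2), kron(I2, X))]
lhs = expm_taylor([[-1j * beta * h for h in row] for row in h_m])
tensor_error = max_diff(lhs, kron(single, single))

# U_f restricted to one qubit with Gamma = -beta (Delta = Sigma = 0, so only X appears).
gammas, deltas, sigmas = qaoa1_measurement_params([beta])
u_f = expm_taylor([[1j * gammas[0] * x for x in row] for row in X])
mapping_error = max_diff(u_f, single)
print(tensor_error, mapping_error)
```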
The shallowest depth QAOA is interesting for our purposes because of the following supremacy result.
Theorem 2.3 (informal from [farhi_quantum_2016]).
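If there exists a polynomial time randomised classical algorithm that takes as input a description of a depth-one QAOA circuit and outputs a string $\mathbf{z}$ with probability $q(\mathbf{z})$ satisfying the multiplicative error bound (essentially the same form as that in Definition 2.2):

$$|p(\mathbf{z}) - q(\mathbf{z})| \leq c\, p(\mathbf{z}) \quad \forall \mathbf{z}, \tag{15}$$

for some constant $c < 1$, where $p(\mathbf{z})$ is the output distribution of the QAOA circuit, then the polynomial hierarchy collapses.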
2.4 Supremacy of Quantum Learning
Here we provide, to the best of our knowledge, the first formalisation of what we call ‘Quantum Learning Supremacy’, specifically for distribution learning. We model our definitions around those provided in [kearns_learnability_1994], which pertain to the theory of classical distribution learnability.
Intuitively, a generative quantum machine learning algorithm can be said to have demonstrated ‘Quantum Learning Supremacy’ if it is possible to efficiently learn a representation of a distribution for which there does not exist a classical learning algorithm achieving the same end. More specifically, the quantum device should be able to produce samples according to a distribution that is close in total variation distance to some target distribution, using a polynomial number of samples from that target. However, there should be no classical algorithm which could achieve this.
We now formalise this intuition. First we must understand the inputs and outputs of the learning algorithm. The inputs are samples from the distribution to be learnt, which could be either classical bitstrings or quantum states encoding a superposition of such bitstring states, i.e. qsamples [schuld_supervised_2018]. A Generator can be interpreted as a routine that simulates sampling from the distribution. As in [kearns_learnability_1994], we will assume only discrete distribution classes, $\mathcal{D}_n$, over binary vectors of length $n$.
Definition 2.4 (Generator [kearns_learnability_1994]).
A class of distributions, $\mathcal{D}_n$, has efficient Generators, $\mathrm{GEN}_D$, if for every distribution $D \in \mathcal{D}_n$, $\mathrm{GEN}_D$ produces samples in $\{0,1\}^n$ according to the exact distribution $D$, using polynomial resources. The Generator may take a string of uniformly random bits, $\mathbf{r}$, of size polynomial in $n$, as input.
The reader will notice that this definition allows, for example, the Generator to be either a classical circuit or a quantum circuit with polynomially many gates. Further, in the definition of a classical Generator [kearns_learnability_1994], a string of uniformly random bits is taken as input and then transformed into the randomness of $D$. However, a quantum Generator is able to produce its own randomness, so no such input is necessary; in this case the algorithm can ignore the input string $\mathbf{r}$.
While we are predominantly interested in efficient learning with a Generator, one can also define a similar Evaluator:
Definition 2.5 (Evaluator [kearns_learnability_1994]).
A class of distributions, $\mathcal{D}_n$, has efficient Evaluators, $\mathrm{EVAL}_D$, if for every distribution $D \in \mathcal{D}_n$, $\mathrm{EVAL}_D$ produces the weight of an input $\mathbf{y} \in \{0,1\}^n$ under the exact distribution $D$, i.e. the probability of $\mathbf{y}$ according to $D$. The Evaluator is efficient if it uses polynomial resources.
The distinction between $\mathrm{GEN}_D$ and $\mathrm{EVAL}_D$ is important and interesting in this case, since the output probabilities of even IQP circuits are #P-hard to compute [bremner_classical_2011] and also hard to sample from by classical means, yet the distributions they produce can be sampled from efficiently by a quantum computer. This draws parallels to examples in [kearns_learnability_1994] where certain classes of distributions are shown not to be learnable efficiently with an Evaluator, but are learnable with a Generator. We also wish to highlight the connection to the strong and weak simulators of quantum circuits in Definition 2.1, to reinforce the similarity between Supremacy and Learning: an Evaluator for a quantum circuit would be a strong simulator of it, and a Generator would be a weak simulator. However, we keep these definitions separate in order to connect the hardness and learnability ideas explicitly.
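To make the two access models concrete, the sketch below builds a Generator and an Evaluator for a deliberately easy class, product distributions over $\{0,1\}^n$, for which both are efficient classically. For IQP output distributions, by contrast, only the Generator is believed to be efficient, and only on a quantum device. The names `make_generator` and `make_evaluator` are hypothetical; following Definition 2.4, the Generator consumes uniformly random bits $\mathbf{r}$ (here 32 bits per output bit, an arbitrary choice).

```python
import random


def make_generator(probs_one):
    """Hypothetical GEN_D for a product distribution with P(bit k = 1) = probs_one[k].

    It consumes 32 * n uniformly random bits r and returns one sample bitstring."""
    def generator(random_bits):
        sample = []
        for k, p in enumerate(probs_one):
            # Turn 32 random bits into a uniform number u in [0, 1).
            chunk = random_bits[32 * k: 32 * (k + 1)]
            u = int("".join(map(str, chunk)), 2) / 2 ** 32
            sample.append(1 if u < p else 0)
        return "".join(map(str, sample))
    return generator


def make_evaluator(probs_one):
    """Hypothetical EVAL_D: returns the exact probability of bitstring y under D."""
    def evaluator(y):
        weight = 1.0
        for bit, p in zip(y, probs_one):
            weight *= p if bit == "1" else 1 - p
        return weight
    return evaluator


probs_one = [0.9, 0.2, 0.5]
gen, evaluate = make_generator(probs_one), make_evaluator(probs_one)
rng = random.Random(1)
r = [rng.getrandbits(1) for _ in range(32 * len(probs_one))]
print(gen(r), evaluate("100"))
```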
For our purposes, the following definitions of learnability will be used. In contrast to [kearns_learnability_1994], whose authors were concerned with defining a ‘good’ Generator to be one which achieves closeness relative to the Kullback-Leibler (KL) divergence, we wish to expand this to general cost functions, $d$. This is due to the range of cost functions we have access to, and our wish to connect to the quantum circuit hardness results mentioned above, which typically strive for closeness in total variation distance (TV).
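Definition 2.6 ($(d, \epsilon)$-Generator).
For a cost function, $d$, let $\epsilon > 0$. Let $\mathrm{GEN}_{D'}$ be a Generator for a distribution $D'$. We say $\mathrm{GEN}_{D'}$ is a $(d, \epsilon)$-Generator for $D$ if $d(D, D') \leq \epsilon$.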
A similar notion of a $(d, \epsilon)$-good Evaluator could be defined.
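Definition 2.7 ($(d, \epsilon, \mathcal{C})$-Learnable).
For a metric $d$, $\epsilon > 0$, and complexity class $\mathcal{C}$, a class of distributions $\mathcal{D}_n$ is called $(d, \epsilon, \mathcal{C})$-learnable (with a Generator) if there exists an algorithm $\mathcal{A} \in \mathcal{C}$, called a learning algorithm for $\mathcal{D}_n$, which, given $0 < \delta < 1$ as input and given access to $\mathrm{GEN}_D$ for any distribution $D \in \mathcal{D}_n$, outputs $\mathrm{GEN}_{D'}$, a $(d, \epsilon)$-Generator for $D$, with high probability:

$$\Pr\big[d(D, D') \leq \epsilon\big] \geq 1 - \delta \tag{16}$$

$\mathcal{A}$ should run in time $\mathrm{poly}(1/\epsilon, 1/\delta, n)$.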
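As a sanity check of Definition 2.7, consider again the easy class of product distributions. The hypothetical learner below uses only access to $\mathrm{GEN}_D$, estimates each marginal, and outputs a Generator for the learned product distribution $D'$. Assuming Hoeffding's inequality and a union bound, it uses $O(n^2 \log(n/\delta)/\epsilon^2)$ samples, so this class is $(\mathrm{TV}, \epsilon, \mathrm{BPP})$-learnable; a learning supremacy claim requires classes for which no such classical learner exists.

```python
import math
import random
from itertools import product


def learn_product_distribution(sample_oracle, n, epsilon, delta):
    """Hypothetical learner for product distributions over {0,1}^n.

    Hoeffding's inequality plus a union bound over the n bits shows that
    m >= ln(2n/delta) / (2 (epsilon/n)^2) samples estimate every marginal to
    within epsilon/n with probability >= 1 - delta. For product distributions
    TV(D, D') <= sum_k |p_k - p'_k|, so D' is then epsilon-close to D in TV."""
    t = epsilon / n
    m = math.ceil(math.log(2 * n / delta) / (2 * t ** 2))
    counts = [0] * n
    for _ in range(m):
        for k, bit in enumerate(sample_oracle()):
            counts[k] += bit
    return [c / m for c in counts], m


def make_generator(probs_one, rng):
    """Generator for the product distribution with P(bit k = 1) = probs_one[k]."""
    return lambda: tuple(1 if rng.random() < p else 0 for p in probs_one)


def product_prob(bits, probs_one):
    return math.prod(p if b else 1 - p for b, p in zip(bits, probs_one))


def tv_product(probs_a, probs_b):
    """Exact TV distance between two product distributions (sums 2^n terms)."""
    n = len(probs_a)
    return 0.5 * sum(abs(product_prob(x, probs_a) - product_prob(x, probs_b))
                     for x in product((0, 1), repeat=n))


target = [0.9, 0.2, 0.5]
gen_target = make_generator(target, random.Random(7))
learned, m = learn_product_distribution(gen_target, n=3, epsilon=0.1, delta=0.05)
gen_learned = make_generator(learned, random.Random(8))  # plays the role of GEN_{D'}
print(m, learned, tv_product(target, learned))
```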
In Definition 2.7, $\epsilon$ may, for example, be a function of the inputs to the learning algorithm. We may also wish to require a learnability definition which holds for all $\epsilon$. This definition would, however, be too strong for our purposes: in order to claim Quantum Learning Supremacy for a learning algorithm, we only need to achieve closeness up to a fixed distance. This will be discussed in more detail in Section LABEL:ssec:learningharddistributions. An illustration of the procedure can be seen in Figure 2. Finally, we define what it would mean for a quantum algorithm to be superior to any classical algorithm for the problem of distribution learning:
Definition 2.8 (Quantum Learning Supremacy).
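An algorithm $\mathcal{A} \in \mathrm{BQP}$ is said to have demonstrated the supremacy of quantum learning over classical learning if there exists a class of distributions $\mathcal{D}_n$ for which there exist $d, \epsilon$ such that $\mathcal{D}_n$ is $(d, \epsilon, \mathrm{BQP})$-Learnable, but $\mathcal{D}_n$ is not $(d, \epsilon, \mathrm{BPP})$-Learnable.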
As mentioned above, a typical choice of $d$ would be TV, but one could imagine weaker definitions using weaker cost functions, which will be discussed further in Section LABEL:ssec:learningharddistributions. One may also be more restrictive and look for a demonstration of learning superiority by a class which is efficiently $(d, \epsilon, \mathcal{C})$-Learnable for some restricted, sub-universal quantum class $\mathcal{C}$, but not BPP-Learnable. This case may be more challenging to prove theoretically, but may be more amenable to near term devices, which is precisely the original motivation for Quantum Supremacy; indeed, since such a class $\mathcal{C}$ is contained in BQP, it implies Definition 2.8.
3 Ising Born Machine
The Born Machine (BM) [cheng_information_2017, liu_differentiable_2018] is a natural use of the measurement postulate of quantum mechanics for generative modelling, applied here to QCL. In particular, the Born rule gives this generative model both its name and its sampling process, as detailed in Section 3.1.
The Born Machine definition originated from tensor network approaches to defining generative algorithms, and from the connection between physical systems and machine learning problems [zhang_entanglement_2017, levine_deep_2017, gao_efficient_2017, liu_machine_2017, pestun_tensor_2017, han_unsupervised_2017, chen_equivalence_2018]. Since then, other works have given variants of the original definition [du_bayesian_2018], adversarial training approaches for the model [zeng_learning_2018], and adaptations to the continuous variable regime [romero_variational_2019].
Since the statistics which the Born Machine produces are generated by the fundamental randomness of quantum mechanics, it is plausible that there is no efficient classical analogue of the model. In this regard, there is hope that a model could be defined which provably cannot be simulated by classical means and, by extension, could outperform any classical model for certain learning problems, as discussed in Section LABEL:ssec:learningharddistributions.
To accommodate our requirement for the model to be implementable on NISQ devices we will restrict the Born machine, which in general could be implemented using any quantum circuit, to an Ising version. In particular, the model will be a parameterised circuit of the form discussed in Section 2.3.
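The Ising Born Machine model is the state produced by the circuit discussed in Section 2.3, where in this case we have fixed $\{S_j\}$ to be the set of all $S_j$ such that $|S_j| \leq 2$, i.e. the computation consists of gates acting on either one or two qubits. The state obtained by this circuit is the following:

$$|\psi_{\boldsymbol{\theta}}\rangle = U_f(\boldsymbol{\Gamma}, \boldsymbol{\Delta}, \boldsymbol{\Sigma})\, U_z(\boldsymbol{\alpha})\, H^{\otimes n}|0\rangle^{\otimes n} \tag{17}$$

$$U_z(\boldsymbol{\alpha}) = \prod_{j : |S_j| \leq 2} \exp\Big(\mathrm{i}\,\alpha_j \bigotimes_{k \in S_j} Z_k\Big) \tag{18}$$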
This restriction to two qubit gates suffices for the hardness proofs we discuss in Section LABEL:ssec:ibmcircuithardness.
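For small $n$ the IBM state can be simulated directly, which is useful for checking implementations. The sketch below is a dense state-vector simulation (exponential in $n$, as expected for a circuit family believed hard to simulate) under our own conventions: qubit 0 is the most significant bit, `pair_alphas` holds $\alpha_j$ for $|S_j| = 2$ and `single_alphas` holds $\alpha_j$ for $|S_j| = 1$, and each single-qubit factor of $U_f$ uses $e^{\mathrm{i}\,\mathbf{v}\cdot\boldsymbol{\sigma}} = \cos(r) I + \mathrm{i}\sin(r)\,\hat{\mathbf{v}}\cdot\boldsymbol{\sigma}$ with $r = |\mathbf{v}|$. All function names are hypothetical.

```python
import cmath
import math
import random


def single_qubit_uf(gamma, delta, sigma):
    """exp(i(gamma X + delta Y + sigma Z)) = cos(r) I + i sin(r)/r (gamma X + delta Y + sigma Z)."""
    r = math.sqrt(gamma ** 2 + delta ** 2 + sigma ** 2)
    if r == 0:
        return [[1, 0], [0, 1]]
    c, s = math.cos(r), math.sin(r) / r
    # gamma X + delta Y + sigma Z = [[sigma, gamma - i delta], [gamma + i delta, -sigma]]
    return [[c + 1j * s * sigma, 1j * s * (gamma - 1j * delta)],
            [1j * s * (gamma + 1j * delta), c - 1j * s * sigma]]


def ibm_probabilities(n, pair_alphas, single_alphas, gammas, deltas, sigmas):
    """Born probabilities of the IBM state (17) by dense state-vector simulation."""
    dim = 2 ** n
    state = []
    for idx in range(dim):
        spins = [1 - 2 * ((idx >> (n - 1 - k)) & 1) for k in range(n)]  # Z eigenvalues
        phase = sum(a * spins[i] * spins[j] for (i, j), a in pair_alphas.items())
        phase += sum(a * spins[k] for k, a in enumerate(single_alphas))
        # H^{(x)n}|0...0> has uniform amplitudes; the diagonal U_z adds exp(i * phase).
        state.append(cmath.exp(1j * phase) / math.sqrt(dim))
    for k in range(n):  # apply the single-qubit factors of U_f
        u = single_qubit_uf(gammas[k], deltas[k], sigmas[k])
        mask = 1 << (n - 1 - k)
        for idx in range(dim):
            if (idx & mask) == 0:
                a0, a1 = state[idx], state[idx | mask]
                state[idx] = u[0][0] * a0 + u[0][1] * a1
                state[idx | mask] = u[1][0] * a0 + u[1][1] * a1
    return [abs(a) ** 2 for a in state]


def sample_ibm(probs, n, num_samples, seed=0):
    """Computational basis measurement outcomes drawn from the Born probabilities."""
    rng = random.Random(seed)
    picks = rng.choices(range(len(probs)), weights=probs, k=num_samples)
    return [format(z, f"0{n}b") for z in picks]


n = 3
probs = ibm_probabilities(n, {(0, 1): 0.4, (1, 2): -0.9}, [0.1, 0.5, -0.3],
                          [0.2, 0.7, 1.1], [0.0, 0.3, 0.0], [0.5, 0.0, 0.9])
print(sample_ibm(probs, n, 10))
```

As a quick consistency check, with all $\alpha_j = 0$ and the IQP setting $\Gamma_k = \Sigma_k = \frac{\pi}{2\sqrt{2}}$, the two Hadamard layers cancel and every sample is the all-zeros string.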
The goal of the training will be to alter the parameters so that, upon measuring the state, it produces samples according to the target distribution. This will be done using QCL discussed in Section 2.1.
This approach can be compared to [farhi_classification_2018], where the authors use QCL in a classification algorithm. Typical approaches so far [mitarai_quantum_2018, liu_differentiable_2018] make the circuit depth as large as possible and introduce extra parameters through single qubit gates. This tends to give better approximations to the data, since more parameters typically lead to more accurate fits.
However, the approach we use is somewhat different. We are interested in choosing a circuit class which is as shallow as possible, but which is sufficiently complex to be hard to simulate classically. For this purpose, we choose a model which encapsulates the sub-universal circuit classes mentioned in Section 2.3. Notice that we also choose the final gate to be $U_f$, defined by (5), rather than the more standard decomposition into a sequence of single qubit rotations found in other Born Machine works [liu_machine_2017, du_bayesian_2018]. Both are effectively equivalent, since both can generate any single qubit gate (up to a global phase). However, we chose our construction to make the hardness connection more transparent.
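With the restrictions discussed above, the term in the exponent of the diagonal unitary (18) can be written as an Ising model Hamiltonian [fujii_commuting_2017, bremner_average-case_2016]:

$$\mathcal{H}_z(\boldsymbol{\alpha}) = \sum_{i < j} J_{ij} Z_i Z_j + \sum_{k=1}^{n} b_k Z_k \tag{19}$$

so that $\boldsymbol{\alpha} = \{J_{ij}, b_k\}$. The parameters $J_{ij}$ and $b_k$ can be viewed as the couplings and local magnetic fields respectively. The evolved state is then:

$$|\psi_{\boldsymbol{\theta}}\rangle = U_f(\boldsymbol{\Gamma}, \boldsymbol{\Delta}, \boldsymbol{\Sigma})\, e^{\mathrm{i}\mathcal{H}_z(\boldsymbol{\alpha})}\, H^{\otimes n}|0\rangle^{\otimes n} \tag{20}$$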
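where $\boldsymbol{\theta} = \{\boldsymbol{\alpha}, \boldsymbol{\Gamma}, \boldsymbol{\Delta}, \boldsymbol{\Sigma}\}$ refers to all parameters of the IBM circuit. To implement this circuit on NISQ hardware it is necessary to decompose the unitary $U_z(\boldsymbol{\alpha})$ into single and two qubit gates. This is straightforward since all the terms in the Ising Hamiltonian (19) mutually commute. It is possible to find such a decomposition into two qubit controlled-$Z$ (CZ) gates, defined by the unitary matrix

$$\mathrm{CZ} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}, \tag{21}$$

and single qubit rotations around the Pauli-$Z$ axis.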
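Using the decomposition from [gao_quantum_2017], the relationship between a CZ gate acting on qubits $i$ and $j$ and an Ising interaction is as follows:

$$\mathrm{CZ}_{ij} = \exp\Big(\mathrm{i}\frac{\pi}{4}\big(I - Z_i\big)\big(I - Z_j\big)\Big) = \exp\Big(\mathrm{i}\frac{\pi}{4}\big(I - Z_i - Z_j + Z_i Z_j\big)\Big) \tag{22}$$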
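The relation between CZ and the Ising interaction can be verified entry by entry, since every operator involved is diagonal with $Z$ eigenvalues $\pm 1$. The sketch below also checks the rearranged form $e^{\mathrm{i}\frac{\pi}{4}Z_iZ_j} = e^{-\mathrm{i}\pi/4}\,\mathrm{CZ}_{ij}\, e^{\mathrm{i}\frac{\pi}{4}Z_i} e^{\mathrm{i}\frac{\pi}{4}Z_j}$, which follows from (22) because all the factors commute. It is an illustration only, not the compilation routine used for the experiments.

```python
import cmath
import math
from itertools import product


def z_eigenvalue(bit):
    """Z|b> = (-1)^b |b>."""
    return 1 - 2 * bit


cz_diagonal = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): -1}
errors = []
for bi, bj in product((0, 1), repeat=2):
    zi, zj = z_eigenvalue(bi), z_eigenvalue(bj)
    # Ising form of CZ: exp(i pi/4 (1 - z_i - z_j + z_i z_j)) on basis state |b_i b_j>.
    ising_form = cmath.exp(1j * math.pi / 4 * (1 - zi - zj + zi * zj))
    errors.append(abs(ising_form - cz_diagonal[(bi, bj)]))
    # Rearranged: exp(i pi/4 Z_i Z_j) = e^{-i pi/4} CZ exp(i pi/4 Z_i) exp(i pi/4 Z_j).
    lhs = cmath.exp(1j * math.pi / 4 * zi * zj)
    rhs = (cmath.exp(-1j * math.pi / 4) * cz_diagonal[(bi, bj)]
           * cmath.exp(1j * math.pi / 4 * zi) * cmath.exp(1j * math.pi / 4 * zj))
    errors.append(abs(lhs - rhs))
print(max(errors))
```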
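Therefore, $U_z(\boldsymbol{\alpha})$ can be expanded as follows:

$$U_z(\boldsymbol{\alpha}) = \prod_{i<j} e^{\mathrm{i}J_{ij} Z_i Z_j} \prod_{k=1}^{n} e^{\mathrm{i} b_k Z_k} \tag{23}$$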
Hence the specific circuit used will be given by (LABEL:circuit:isingbornmachinecircuit):