In the original quantum annealing for machine learning (QAML) algorithm [nature], a training set with examples of labeled data (where is an input vector and is a binary label for signal and background) is optimized with a set of weak classifiers , each of which gives for a signal or background prediction. For convenience, we define the variables:
In the original QAML algorithm, the following Ising model Hamiltonian is minimized [qml]:
where the spin of each qubit is and a regularization constant is chosen. The strong classifier constructed by the weak classifiers is a sum of the classifiers weighted by spin, normalizing to :
This construction of a strong classifier only enables the weak classifiers to be turned on or off, creating a limitation in the strength of the final classifier.
2 QAML-Z Algorithm
By iteratively performing quantum annealing, the binary weights on the weak classifiers can be made continuous, ultimately resulting in a stronger classifier. This is achieved by performing a search on the real numbers, effectively zooming in on a region of the energy surface each iteration. Hence, we denote the zooming variant of quantum annealing for machine learning as QAML-Z. Under this reformulation, the weights of the classifiers may be extended from the set to the continuous interval , enabling the subtraction of classifiers to reduce cross-correlations between weak classifiers.
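To illustrate the zooming idea in isolation, the following minimal Python sketch tracks a single weight. It is not the annealing procedure itself: the annealer is replaced by a hypothetical oracle that reports whether a known target weight lies above or below the current mean. The names `mu0`, `sigma0` and `b`, together with their default values, are assumptions chosen so that the search stays inside the interval from -1 to 1.

```python
def zoom_single_weight(target, iterations=8, mu0=0.0, sigma0=0.5, b=0.5):
    """Binary-search a single real weight by repeated +/-1 decisions.

    mu0, sigma0 and b are illustrative assumptions, not values from the text.
    """
    mu, sigma = mu0, sigma0
    history = []
    for _ in range(iterations):
        # Stand-in for the annealer: choose the half that contains the target.
        s = 1 if target >= mu else -1
        mu += s * sigma      # shift the mean toward the selected half
        sigma *= b           # narrow the search breadth
        history.append(mu)
    return mu, history


mu, history = zoom_single_weight(0.3)
print(history[:3], mu)
```

With a halving breadth and these starting values, each iteration halves the remaining uncertainty, so a target in the interval from -1 to 1 is resolved to within 2^-8 after 8 iterations.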
Let each qubit have a mean (starting at for all ) and let the search breadth be , where for iterations and is a free parameter. We then modify the Hamiltonian by substituting to zoom into the region of interest, shifting and narrowing the optimization problem. To derive the Ising model energy, the distance between predictions and data is minimized:
Applying the substitution, dropping terms independent of spin, and simplifying with the and variables previously defined, we find the following Hamiltonian:
This new Hamiltonian may be iteratively optimized for to update , resulting in the new strong classifier:
Since the zooming algorithm increases the possibility of overfitting, we propose a two-step randomization procedure to regularize the iterative process. After each iteration, for each qubit such that the energy worsens by the update (i.e., ), we apply the flip
with monotonically decreasing probability. Subsequently, all qubits are uniformly randomly flipped from to with probability where for all . This both prevents the strong classifier from overfitting and pushes it out of local minima in an annealing-inspired procedure. The functions and are specified in the SM.
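The two-step randomization can be sketched as follows. The flip probabilities are hypothetical parameters here (`p_worse` and `p_uniform`), and the per-qubit test for an energy-worsening update is assumed to be evaluated by the caller and passed in as the boolean list `worsened`. The sketch only shows the order of the two steps and the sign flip itself.

```python
import random


def randomized_flips(spins, worsened, p_worse, p_uniform, rng=random):
    """Apply the two randomization steps to a list of +/-1 spins.

    worsened[i] is True when the caller judged that the update of qubit i
    made the energy worse (the criterion itself is not implemented here).
    """
    out = list(spins)
    # Step 1: flip energy-worsening qubits with probability p_worse.
    for i, bad in enumerate(worsened):
        if bad and rng.random() < p_worse:
            out[i] = -out[i]
    # Step 2: flip every qubit independently with probability p_uniform.
    for i in range(len(out)):
        if rng.random() < p_uniform:
            out[i] = -out[i]
    return out


rng = random.Random(0)
print(randomized_flips([1, -1, 1, 1], [True, False, False, True], 0.5, 0.1, rng))
```

Passing a seeded `random.Random` instance keeps a run reproducible, which is convenient when comparing flip schedules.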
To take full advantage of these continuous weights, we augment the set of original weak classifiers that returns a value in . For each , multiple classifiers are generated by shifting the threshold to round to :
where is the number of classifiers, is the offset and is the step size. Hence, the Hamiltonian is now given by:
where is iteratively optimized for to update . Similarly to before, we have defined:
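As an illustration of the augmentation step, the sketch below generates several thresholded copies of one weak classifier by shifting its decision threshold. The function name, the symmetric range of shifts (`n_shifts`), the step size `delta` and the choice of thresholding at zero are all assumptions made for this example. The only figure taken from the text is that 36 original classifiers become 252 variables, which corresponds to 7 copies per classifier.

```python
def augment(score, n_shifts=3, delta=0.1, n_classifiers=1):
    """Return 2 * n_shifts + 1 thresholded outputs for one continuous score.

    score:         continuous output of a weak classifier before rounding
    n_shifts:      hypothetical maximum integer offset (shifts run -n..n)
    delta:         hypothetical step size between successive thresholds
    n_classifiers: normalization so that each output is +/- 1/n_classifiers
    """
    outputs = []
    for offset in range(-n_shifts, n_shifts + 1):
        shifted = score + offset * delta
        sign = 1.0 if shifted >= 0 else -1.0
        outputs.append(sign / n_classifiers)
    return outputs


copies = augment(0.05)
print(copies)
print(36 * len(copies))  # number of variables for 36 original classifiers
```

A score close to the original threshold changes sign for small shifts, so its copies disagree with one another; this is what gives the continuous weights something to act on.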
3 Application of QAML-Z to the Higgs optimization problem
As an application of QAML-Z, we revisited the Higgs optimization problem [nature]. We implemented the QAML-Z algorithm on the programmable quantum annealer at the University of Southern California built by D-Wave Systems, Inc. [dwave]. Because the D-Wave 2X architecture is not fully connected, we modified the Ising model to encourage sparsity for minor embedding on the D-Wave machine. In the augmentation scheme defined above, we set an offset of and a step size of . Additionally, we set the zoom parameter to perform a binary search over the real numbers. The weights of cross-terms in the matrix were pruned, keeping only the largest 5% of weights nonzero. Additionally, (polynomial-time) variable-fixing procedures in the D-Wave API were used to reduce the size of the Ising model encoded on the annealer. As in the original paper [nature], each annealing consisted of averaging multiple gauges [job2018test] and excited states, where the number of gauges, number of excited states and chain strength [chains] are decayed monotonically with each iteration. Due to the definition of , the marginal impact of each iteration follows an exponential decay, and thus QAML-Z was trained for only 8 iterations.
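The pruning of cross-terms can be sketched as below. Ranking the couplings by absolute value is an assumption (the text says only that the largest 5% of weights are kept), and the dictionary representation of the couplings is chosen for illustration.

```python
def prune_couplings(couplings, keep_fraction=0.05):
    """Keep the largest-magnitude fraction of couplings; drop the rest.

    couplings: dict mapping a qubit pair (i, j) to its coupling weight.
    Ranking by |weight| is an assumption made for this sketch.
    """
    if not couplings:
        return {}
    ranked = sorted(couplings.items(), key=lambda kv: abs(kv[1]), reverse=True)
    n_keep = max(1, round(keep_fraction * len(ranked)))
    return dict(ranked[:n_keep])


# Toy example: 40 couplings with magnitudes 1..40 and alternating signs.
toy = {(i, i + 1): (-1) ** i * (i + 1) for i in range(40)}
print(prune_couplings(toy))
```

Removed couplings are simply absent from the returned dictionary, which is equivalent to setting them to zero in the Ising model.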
Compared to the QAML algorithm, the area under the receiver operating characteristic curve (AUROC) is significantly improved by QAML-Z for all training set sizes (Figure 1). We select the best-performing traditional classifier (a deep neural network) from the original publication as a benchmark. A logistic regression directly optimizes the mean-squared error given in Eq. (4) and is thus also shown. When compared to classical simulated annealing, QAML-Z performs significantly better for all training set sizes due to its use of excited states in the neighborhood of the ground state, marking further improvement over the original QAML algorithm (Figure 2).
Since the original simulated annealing benchmark did not ensemble excited states, we report on SA-Z without excited states. However, to attempt to match the improved quantum annealing performance, we also propose SAE-Z, in which the supremum over a set of excited states is used to improve the area under ROC curve (described in Section 3 of the Supplementary Material) in the same manner as for QAML-Z and QAML. We find that SAE-Z performs statistically indistinguishably from QAML-Z, suggesting that excited states can be effectively used to generate stronger classifiers.
We find that an extension of QAML to the continuous space over a set of augmented weak classifiers yields strong classifiers that are more competitive with state-of-the-art classical techniques than previously reported [nature]. Although QAML-Z remains at a disadvantage to a deep neural network (DNN) for sufficiently large training sets, the performance gap between QAML and DNN has been reduced by a factor of two by applying QAML-Z. Moreover, the QAML advantage over DNN for small training sets has grown to . The extent of improvement of QAML-Z over DNN for Higgs decay classification suggests that noisy intermediate-scale quantum devices may be approaching real-world applicability in machine learning despite their limitations. Furthermore, the favorable results of zooming in on an Ising model to achieve a solution unreachable by discrete optimization provide a future direction for quantum annealing applications, potentially extending to quantum machine learning algorithms beyond QAML.
Part of this work was conducted at “iBanks,” the AI GPU cluster at Caltech. We acknowledge NVIDIA, SuperMicro and the Kavli Foundation for their support of “iBanks.” This work is partially supported by DOE/HEP QuantISED program grant, Quantum Machine Learning and Quantum Computation Frameworks (QMLQCF) for HEP, award number DE-SC0019227. The work is also supported in part by the AT&T Foundry Innovation Centers through INQNET, a program for accelerating quantum technologies. The work of DL and JJ was partially supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the U.S. Army Research Office contract W911NF-17-C-0050.
To motivate the QAML-Z algorithm and provide the necessary framework for its derivation, we first provide a brief description of the Hamiltonian to minimize the error between a strong classifier and a training set in the original QAML algorithm. Given weak classifiers and truth label over a training set of size consisting of labeled data , let be a strong classifier that ensembles the weak classifiers with coefficients . To minimize classification error, we simply minimize the distance between and :
Removing the spin-independent term and the self-spin interactions to create a problem suitable for quantum annealing, we may rewrite the Hamiltonian as follows (scaling by a factor of 2 for convenience):
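The steps just described can be written out as a short worked derivation. The notation is assumed for this sketch: $c_i$ for the weak classifiers, $(\mathbf{x}_\tau, y_\tau)$ for the $S$ training examples, $s_i = \pm 1$ for the spins and $R$ for the strong classifier. The factor of 2 is taken here to be an overall division by 2, which is also an assumption.

```latex
\begin{aligned}
\lVert \mathbf{y} - R \rVert^2
  &= \sum_{\tau=1}^{S} \Big( y_\tau - \sum_{i=1}^{N} s_i\, c_i(\mathbf{x}_\tau) \Big)^2
   = \sum_{\tau} y_\tau^2 \;-\; 2 \sum_{i} C_i\, s_i \;+\; \sum_{i,j} C_{ij}\, s_i s_j , \\
C_i &= \sum_{\tau} c_i(\mathbf{x}_\tau)\, y_\tau , \qquad
C_{ij} = \sum_{\tau} c_i(\mathbf{x}_\tau)\, c_j(\mathbf{x}_\tau) , \\
H &= -\sum_{i} C_i\, s_i \;+\; \sum_{i} \sum_{j>i} C_{ij}\, s_i s_j .
\end{aligned}
```

The first sum in the expansion does not depend on the spins, and the $i = j$ terms of the double sum are constant because $s_i^2 = 1$. The remaining double sum over $i \neq j$ counts each pair twice, so dividing everything by 2 gives the last line.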
To map this Hamiltonian to a continuous space, we wish to find a suitable set of where replaces each in the above Hamiltonian. By minimizing over , we may find a strong classifier given by . However, quantum annealing provides the constraint , and thus the search space needs to be split in a divide-and-conquer strategy. To probe either end of the search space at each iteration , we update by centering it around the previous mean and shifting it by the latest annealing result, yielding , where is the search breadth and is a free parameter. To provide intuition for this update rule, we consider a binary search with one qubit using this scheme, setting and . If, for instance, we find that at the first annealing, we wish to update and . If we then receive at the second annealing, we wish to update and . The weight given to each classifier must also be centered and shifted, yielding the substitution in the Hamiltonian:
The second bracketed term is constant, and thus we may remove it from the Hamiltonian. Furthermore, although we dropped quadratic self-spin terms earlier, we may recover the linear cross-terms that re-appeared after the substitution when . Simplifying, we find the following Hamiltonian:
where we have defined:
This is the Hamiltonian we encode on the quantum annealer.
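As a numerical check of this kind of construction, the sketch below builds a zoomed Ising model directly from the squared error with weights of the form mean plus breadth times spin, and verifies by brute force that its energy differs from the squared error only by a constant and an overall factor of 2. The normalization, the function names and the toy data are assumptions; the exact prefactors of the Hamiltonian encoded on the annealer may differ.

```python
from itertools import product


def coefficients(c, y):
    """c[i][t]: output of weak classifier i on example t; y[t]: +/-1 label."""
    n, m = len(c), len(y)
    c_lin = [sum(c[i][t] * y[t] for t in range(m)) for i in range(n)]
    c_quad = [[sum(c[i][t] * c[j][t] for t in range(m)) for j in range(n)]
              for i in range(n)]
    return c_lin, c_quad


def zoomed_ising(c_lin, c_quad, mu, sigma):
    """Local fields h and couplings J for weights w_i = mu_i + sigma * s_i."""
    n = len(mu)
    # The sum over j includes j == i: the self-term becomes linear in s_i.
    h = [sigma * (-c_lin[i] + sum(mu[j] * c_quad[i][j] for j in range(n)))
         for i in range(n)]
    J = {(i, j): sigma ** 2 * c_quad[i][j]
         for i in range(n) for j in range(i + 1, n)}
    return h, J


def ising_energy(spins, h, J):
    return (sum(h[i] * s for i, s in enumerate(spins))
            + sum(w * spins[i] * spins[j] for (i, j), w in J.items()))


def squared_error(weights, c, y):
    return sum((y[t] - sum(w * c[i][t] for i, w in enumerate(weights))) ** 2
               for t in range(len(y)))


# Toy data: 3 weak classifiers with outputs +/- 1/3 on 4 examples.
c = [[1 / 3, 1 / 3, -1 / 3, -1 / 3],
     [1 / 3, -1 / 3, 1 / 3, -1 / 3],
     [-1 / 3, 1 / 3, 1 / 3, 1 / 3]]
y = [1, 1, -1, -1]
mu, sigma = [0.5, -0.25, 0.0], 0.25

c_lin, c_quad = coefficients(c, y)
h, J = zoomed_ising(c_lin, c_quad, mu, sigma)
offsets = [squared_error([m + sigma * s for m, s in zip(mu, spins)], c, y)
           - 2 * ising_energy(spins, h, J)
           for spins in product((-1, 1), repeat=len(mu))]
print(max(offsets) - min(offsets))  # spread of the offset across all spin states
```

Because the offset is the same for every spin configuration, minimizing this Ising energy is equivalent to minimizing the squared error over the two candidate weights per classifier.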
2 Quantum Annealing
After augmenting the original set of 36 classifiers, there are 252 fully connected variables in the Hamiltonian. However, due to its Chimera graph architecture and the fact that only of the qubits are functional, the USC-based D-Wave 2X only has 33 fully connected logical qubits. Hence, we prune the cross-terms in the Ising Hamiltonian, retaining only the largest 5% of weights. This allows a minor embedding operation [embed1, embed2, klymko_adiabatic_2012, Cai:2014nx] in combination with the classical polynomial-time fix_variables procedure in the D-Wave API to program the problem on the quantum annealer. Each logical qubit is mapped to a chain of physical ferromagnetically coupled qubits on the D-Wave machine, where the internal coupling of each chain may be set to prevent thermal excitations and other noise from breaking the chain while still ensuring that the Hamiltonian drives the system dynamics [chains]. We set the ratio of the coupling within each chain to the largest coupling in the Hamiltonian, monotonically decaying it with iteration number for the first 5 iterations before fixing it at a constant value: . Moreover, we reduce random errors on the local fields and couplers by randomizing the encoding via sign flips, annealing over gauges where also varies with iteration number. For each gauge, we sample the D-Wave annealing result 200 times and measure the qubit chains with a majority vote, using a coin-toss tie-breaker for even-length chains. Additionally, we perform each annealing for μs, having observed minimal variation in the area under the ROC curve for annealing times ranging from 5 μs to 800 μs.
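The chain read-out described above (majority vote with a coin-toss tie-breaker) can be sketched in a few lines. The function name and the list-of-spins representation of a chain are illustrative choices, not part of the D-Wave API.

```python
import random


def read_chain(chain_spins, rng=random):
    """Collapse the physical spins of one chain into a logical +/-1 value."""
    total = sum(chain_spins)
    if total > 0:
        return 1
    if total < 0:
        return -1
    # Tie (possible only for even-length chains): decide by a fair coin toss.
    return rng.choice((-1, 1))


rng = random.Random(0)
print(read_chain([1, 1, -1], rng), read_chain([-1, 1, -1, -1], rng))
```

An unbroken chain has all spins equal, so the vote only matters when noise has broken the chain.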
To prevent the zooming algorithm from getting stuck in local minima and to regularize the optimization scheme for small training sets, we flip qubits at random between successive iterations through two processes: flipping qubits when an update yields a worse energy than and flipping qubits uniformly randomly. In either case, we flip to , zooming into the region opposite to the one selected by annealing. Since the first case suggests that the energy surface is more indifferent to a qubit flip, we flip qubits with a higher probability each iteration () compared to the uniform qubit flip case (). Although we find that the heuristic of constant qubit flip probabilities provides more stability than using the Metropolis update transition probability , it is quite possible that other update rules we did not explore (e.g., a genetic algorithm or an update rule dependent on training set size) could further improve the QAML-Z algorithm.
To select excited states for the classifier, we impose two criteria: a maximum distance to the lowest-energy state found (i.e., an excited state must have an energy less than for or less than for ), and a maximum total number of excited states to be selected. To prevent an exponential increase in the tree of excited states generated by the zooming algorithm, we also decay the values of and by iteration number, setting and . We then take the supremum over the set of excited states’ background rejection values for each efficiency in the ROC curve, as done in the original QAML algorithm [nature].
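The supremum over excited states can be sketched as a pointwise maximum of ROC curves. The sketch assumes that every curve is sampled at the same list of signal efficiencies and that the area is computed with the trapezoidal rule; both are illustrative choices, as are the function names and the toy numbers.

```python
def roc_envelope(curves):
    """Pointwise maximum of background rejection over several ROC curves.

    curves: list of equal-length lists, one per selected state, each giving
    the background rejection at a shared grid of signal efficiencies.
    """
    return [max(values) for values in zip(*curves)]


def area_under_curve(efficiencies, rejections):
    """Trapezoidal area under a rejection-versus-efficiency curve."""
    area = 0.0
    for k in range(1, len(efficiencies)):
        width = efficiencies[k] - efficiencies[k - 1]
        area += width * (rejections[k] + rejections[k - 1]) / 2
    return area


effs = [0.0, 0.4, 0.8, 1.0]
state_a = [1.0, 0.9, 0.3, 0.0]
state_b = [1.0, 0.7, 0.5, 0.0]
envelope = roc_envelope([state_a, state_b])
print(envelope, area_under_curve(effs, envelope))
```

Because the envelope is at least as high as every individual curve at each efficiency, its area can never be smaller than that of the best single state.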
3 Simulated Annealing
We perform simulated annealing using the Metropolis update rule, flipping a random spin to construct a trial spin vector from the spin vector . If the energy , then the new vector is accepted with probability 1. However, if , the trial vector is accepted with probability . After randomly selecting a spin to flip times (where has spins), a sweep has been completed. The inverse temperature is stepped with a linear schedule from to over sweeps, incrementing the inverse temperature by after each sweep. This process is repeated 1000 times, and the lowest-energy state is selected in the SA-Z algorithm. To assemble excited states for the SAE-Z benchmark, we perform 5000 sweeps for 5000 reads and select excited states using the same criteria as for quantum annealing, increasing the simulated annealing runtime to ensure that enough high-quality excited states are found to fill the maximum permitted number of excited states .
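A minimal sketch of this simulated annealing procedure is given below: single-spin Metropolis updates, one sweep per spin-count trials, a linear inverse temperature schedule and selection of the lowest-energy read. The default numbers of reads and sweeps and the schedule endpoints are small hypothetical values chosen so that the toy example runs quickly; they are not the settings used in the text.

```python
import math
import random


def ising_energy(spins, h, J):
    return (sum(h[i] * s for i, s in enumerate(spins))
            + sum(w * spins[i] * spins[j] for (i, j), w in J.items()))


def anneal_once(h, J, n_sweeps, beta_start, beta_end, rng):
    n = len(h)
    neighbors = [[] for _ in range(n)]
    for (i, j), w in J.items():
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    beta = beta_start
    step = (beta_end - beta_start) / max(n_sweeps - 1, 1)
    for _ in range(n_sweeps):
        for _ in range(n):  # one sweep = n random single-spin trials
            i = rng.randrange(n)
            local_field = h[i] + sum(w * spins[j] for j, w in neighbors[i])
            delta = -2.0 * spins[i] * local_field  # energy change if i flips
            # Metropolis rule: always accept downhill, sometimes accept uphill.
            if delta <= 0 or rng.random() < math.exp(-beta * delta):
                spins[i] = -spins[i]
        beta += step  # linear schedule in inverse temperature
    return spins


def simulated_annealing(h, J, n_reads=20, n_sweeps=50,
                        beta_start=0.1, beta_end=5.0, seed=0):
    rng = random.Random(seed)
    reads = [anneal_once(h, J, n_sweeps, beta_start, beta_end, rng)
             for _ in range(n_reads)]
    return min(reads, key=lambda s: ising_energy(s, h, J))


h = [-0.5, -0.5, -0.5]
J = {(0, 1): -1.0, (1, 2): -1.0, (0, 2): -1.0}
best = simulated_annealing(h, J)
print(best, ising_energy(best, h, J))
```

Keeping all reads rather than only the minimum would be the natural starting point for collecting excited states, since the same list can then be filtered by energy.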
Temperature schedules reaching as large as 10 and performing up to 100,000 sweeps per read were found to have no significant impact on the results. Moreover, we observe that QAML-Z and SA-Z reach statistically identical lowest-energy states with 5000 sweeps and 5000 reads. On the training set, the difference in the area under the ROC curve between the lowest-energy QAML-Z state and the SA-Z state is at most on the order of with a standard deviation on the order of for all training set sizes. This suggests that both annealing methods found similar ground states at the end of the zooming procedure, although they may have taken different paths to the final state due to the insertion of the Metropolis move. Additionally, with these settings for SA, we observe that SA matches or bests QA with regard to minimum observed energy on the training set when they are each supplied identical QUBOs generated during the zooming algorithm (Figure 3). Since QAML-Z and SAE-Z achieve similar performance on the test set after ensembling excited states (Figure 2), we conclude that the zooming methodology is robust to slight differences in annealing performance.
4 Zooming and Augmentation Analysis
To provide insight into the impact of iterative zooming, we examine the normalized Ising model energy obtained purely under augmentation but without zooming:
Figure 4 shows a decrease in energy with iteration number as well as less overfitting for larger training set sizes. However, because the numbers of classifiers differ, the Ising model energies of augmented and non-augmented classifiers cannot be directly compared. The area under the ROC curve (AUROC) illustrates both the impact of classifier augmentation and the impact of zooming through a direct comparison to QAML (Figure 5), showing advantages for both the classifier augmentation and zooming methodologies.