I Quantum Annealing and the Ising Minimization Problem
i.1 Quantum Annealing
D-Wave quantum processing units (QPUs) act as heuristic solvers for the Ising minimization problem, defined on a graph as follows. Given an input, or Hamiltonian, comprising a collection of fields and couplings , assign values from to spin variables so as to minimize the energy function
The Ising minimization problem is NP-complete Barahona (1982); Ref. Lucas (2014) is a useful guide containing reductions from many NP-complete problems to Ising minimization. Ising minimization inputs can trivially be converted to/from maximum weighted 2-satisfiability (MAX W2SAT) inputs or quadratic unconstrained binary optimization (QUBO) inputs.
Ising minimization problems can be approached using the simulated annealing (SA) metaheuristic Kirkpatrick et al. (1983)
. In simulated annealing, a random walk in the solution space is performed according to a temperature parameter—at high temperatures the random walk can easily move to higher-energy states and therefore performs a broad exploration of the solution space, whereas at low temperatures the random walk struggles to move to higher energy states and tends to settle in local optima. If the temperature is lowered slowly enough through the course of the anneal, the random walk is able to find a ground state (global optimum) with high probability; however, this may require the cooling time to grow exponentially with the input size.
D-Wave QPUs use a fundamentally different source of non-determinism, originating from quantum fluctuations, for heuristically finding high quality solutions of the Ising problems. The strength and the rate of change in such quantum fluctuations is controlled by a schedule known as quantum annealing (QA) Kadowaki and Nishimori (1998). QA is similar in spirit to simulated annealing (SA), but rather than lowering the temperature over the course of the anneal, a transverse field222The transverse field controls the rate of transition between states due to quantum fluctuations. is applied strongly at first and is turned down gradually. Meanwhile, the classical problem Hamiltonian, which is scaled down to zero at the beginning of the anneal, is gradually scaled up. The time-dependent Hamiltonian is given by the equation:
where is a normalized time variable ranging from at the beginning of a quantum anneal to at the end, and and are Pauli matrices acting on the qubit. The functions and provide the quantum annealing schedule, analogous to the cooling schedule in simulated annealing.
In forward annealing, the system’s initial state is the uniform superposition over all states, which is a ground state of . If the anneal is performed slowly enough in an ideal system, the system will remain in a ground state throughout the anneal; any ground state of the final Hamiltonian is a ground state of .
Each D-Wave QPU implements the quantum annealing algorithm in hardware and operates on Ising minimization inputs. The graph of an input is defined by the hardwired qubits and couplers and the input Hamiltonian is programmable. The native connectivity topology for the D-Wave 2000Q QPU is based on a Chimera graph333A Chimera graph consists of unit tiles arranged in a square lattice, with each unit tile containing four horizontal qubits and four vertical qubits connected as a complete bipartite graph. See Ref. Chi for details. containing up to vertices (qubits) and edges (couplers). In each QPU, some of the qubits and couplers may be unusable due to issues in fabrication, cooldown, or calibration; the QPU used for this study had 2013 working qubits and 5818 working couplers. Its operating temperature was approximately 12 mK (millikelvin).
i.2 Reverse Annealing
Reverse annealing is very similar to forward annealing, though instead of starting with a quantum state (such as the uniform superposition), it starts with a classical state provided by the user. Reverse annealing also uses a very different annealing schedule. Rather than starting with a high transverse field and a low scaling factor for the problem Hamiltonian , these values start at and , respectively. Examples of forward and reverse annealing schedules are given in Figure 2.
In the reverse annealing schedule shown on the right of Figure 2, the anneal starts with the fully classical Hamiltonian , evolves to over the first 2 µs of the anneal, holds the Hamiltonian for 6 µs, then spends the final 2 µs of the anneal evolving back to the final, fully classical Hamiltonian .
We can also define this schedule with a more convenient parameterization: a 10 µs reverse anneal with a pause location and a pause fraction of 0.6. This means that the anneal spends 60% of its time at the pause location with the Hamiltonian held constant as , with the rest of its time split evenly between an evolution from to and an evolution back from to . The parameterization of (annealing time, , pause fraction) allows us to more conveniently search the parameter space while optimizing reverse annealing and algorithms that use it.
Ii Specifying a Genetic Algorithm
ii.1 Algorithm Components
To demonstrate a proof-of-concept for a quantum-assisted genetic algorithm, we need to specify the three components of the genetic algorithm, as well as choose appropriate inputs for demonstration.
Recombination: For recombination, we use isoenergetic cluster moves Houdayer (2001), also known as Houdayer moves (see Figure 3). This recombination operation takes two individuals and randomly selects a connected cluster of spins in which the individuals differ, then flips this cluster in each individual to create two new individuals. Isoenergetic cluster moves identify clusters of spins that would reasonably be grouped together, e.g., as a gene. Isoenergetic cluster moves are very effective in planar or quasi-planar Ising models Houdayer (2001); Zhu et al. (2015).
Mutation: As a mutation operator, we use reverse annealing. To mutate a state , we perform a reverse anneal initialized at state . This reverse anneal applies a transverse field to evolve the classical starting state into a quantum superposition of states, then removes the transverse field to settle on a new classical state.
Selection: For selection, we use truncation selection; i.e., we simply keep the best individuals at the end of each generation. This has the advantage of simplicity, but it can lead to loss of population diversity; we address this issue in Section II.2.
We implement the quantum-assisted genetic algorithm as described by the following pseudocode:
ii.2 Preserving Diversity
Diversity-preserving selection. In early experiments we found that recombinations made up only a very small fraction of the runtime of our hybrid GA. In principle, increasing the recombination rate should be able to improve performance with a modest runtime cost. However, in practice, increasing the recombination rate while using simple truncation selection causes rapid loss of diversity in the population, leading to premature convergence. This is a well-known problem in genetic algorithms and there are numerous techniques for preserving diversity.
One method of preserving diversity that fits naturally with Ising model inputs is implicit fitness sharing Smith et al. (1993). We use a combination of the raw Ising energy function and a shared Ising energy function; this is detailed in Appendix A. Our selection method preserves diversity while also ensuring that the lowest-energy solution is kept. With this selection method in hand, we were able to improve performance by increasing the recombination rate—instead of recombining each individual with one other individual per generation, we recombined each individual with 10 individuals. Higher recombination rates saw increased recombination costs but diminishing returns.
Premature convergence. Even with diversity-preserving selection, QAGA is susceptible to premature convergence. In other words, QAGA can get ‘stuck’, and running for more generations is unlikely to improve solutions. This is a well-studied issue in the area of genetic algorithms and the problem can be solved using alternative selection schemes Srinivas and Patnaik (1994); Pandey et al. (2014). In the interest of keeping our core algorithm simple, we instead opt to stop a run of QAGA after a fixed number of generations (in this case 50) and simply restart with a newly generated random population.
Iii Performance Comparison on Spin Glass Inputs
We tested our quantum-assisted genetic algorithm against a suite of classical algorithms consisting of simulated annealing (SA) Kirkpatrick et al. (1983), parallel tempering (PT) Swendsen and Wang (1986), and parallel tempering with isoenergetic cluster moves444Our implementation of PT-ICM performs isoenergetic cluster moves at every temperature; while there are more sophisticated implementations that skip the isoenergetic cluster moves at high temperatures Zhu et al. (2015), these techniques provide only a modest constant-factor speedup while demanding more onerous tuning of parameters. (PT-ICM) Houdayer (2001). For the quantum/hybrid algorithms, we use quantum annealing (QA) and our quantum-assisted genetic algorithm (QAGA). We optimized parameters for each of the five solvers; details are given in Appendix B.
This manuscript aims to demonstrate that a quantum-assisted genetic algorithm is a viable option in the realm of hybrid quantum/classical algorithms, and that quantum-assisted genetic algorithms bear further research and development. To facilitate our development and analysis, we use simple runtime models to estimate the amount of time required for each classical operation. This makes it possible to compare quantum and classical components in a larger hybrid algorithm. The alternative to using simple runtime models is to measure wall-clock times; however, for our prototype implementations that call the QPU synchronously, wall-clock time is dominated by network latency, waveform generation, and QPU initialization time, which are not the focus of this study. To simplify our development and analysis we account for computational effort as follows:
For Metropolis or Gibbs sweeps involved in SA, PT, and PT-ICM, we define the cost as 0.2 ns per spin flip update; for the 2000-qubit inputs considered in this section, this is roughly 0.4 µs per sweep.
For isoenergetic cluster moves either used in PT-ICM, or used as a recombination operator in QAGA, we define the cost as 0.2 ns per spin flip update; for the 2000-qubit inputs considered in this section, this is roughly 0.4 µs per isoenergetic cluster move. The work involved in this step is linear in the number of variables since we must identify the variables in which two individuals differ. Our cost model is consistent with experimental results.
For reverse anneals, we define the cost as the total annealing time (including pause time), discounting QPU programming time, readout time, and other overhead.
For PT and PT-ICM, we spend considerable computational effort to optimize the temperatures used; we do not include this parameter optimization effort in our costs.
For the classical algorithms, the runtime is dominated by the Metropolis/Gibbs sweeps. Our cost model is consistent with timings reported for highly optimized solvers Isakov et al. (2014).
iii.3 Input Classes
In this study we analyzed three different input classes, each generated randomly based on the underlying graph of the D-Wave QPU and its functional qubits used for these experiments. All inputs were generated based on the full chip, meaning that they all have roughly 2000 variables. All three of these input classes have fields (i.e., values) set to zero.
RAN1—Bimodal spin glasses. These inputs are generated by randomly assigning each coupling a value from . These inputs have been studied extensively as inputs to D-Wave QPUs Rønnow et al. (2014); Selby (2014); King et al. (2016, 2015a); Steiger et al. (2015) because they are nontrivial optimization problems that are trivial to generate. The low precision of RAN1 inputs, along with the fact that they are generated based on the physical layout of the QPU, work in the favor of D-Wave QPUs. However, high degeneracy of first-excited states can make it extremely hard for quantum annealers to find ground states for these inputs King et al. (2016).
AC3—Anticluster inputs. The inputs we studied for this section were randomly generated AC3 problems. AC3 problems are random spin-glass inputs whose inter-tile couplings are multiplied by a factor of 3. This works to combat the quasi-planarity of the Chimera topology Katzgraber et al. (2014). D-Wave QPUs perform well on these inputs relative to classical solvers when it comes to finding good approximate solutions King et al. (2015a); however, at the 2000-qubit scale, D-Wave QPUs struggle to find global optima, as we demonstrate in this section.
DCL—Deceptive cluster loops. Deceptive cluster loop inputs Mandrà and Katzgraber (2018) are a variant of frustrated cluster loop (FCL) inputs King et al. (2019). FCL and DCL inputs are generated by creating dense clusters of qubits that are treated as logical variables in a constraint satisfaction problem generated from frustrated loops Hen et al. (2015); King et al. (2015b). These inputs are designed to depend on quantum tunneling locally, and to require the solution of a constraint satisfaction problem globally. When variables in a cluster are coupled weakly enough that the ground state of the input may contain broken clusters, the clusters are considered deceptive because they can fool logical solvers. We generated the inputs per Ref. Mandrà and Katzgraber (2018) with parameters .
For each problem class, we generated 100 random inputs and ran the five solvers (QA, QAGA, PT, PT-ICM, SA) on all inputs. The putative ground state energies were determined and corroborated using many independent long runs of PT-ICM.
While we attempted to run all solvers for long enough to find ground states (and therefore estimate the time to solution (TTS)), this was impractical for some of the solvers and some of the inputs. We gave each solver roughly five minutes of computation time on each input (to be precise, µs). Some RAN1 inputs caused SA, QA, and QAGA to time out, and some AC3 inputs caused SA and QA to time out.
Figure 4 shows results of QA and QAGA, providing a comparison of the TTS distributions of the two solvers on the 100 inputs from each class. These plots show that QAGA has a significantly tighter distribution of time-to-solution. Most importantly, the TTS distribution for QAGA appears to have a much smaller (or skinnier) upper tail than the distribution for QA (see Refs. Steiger et al. (2015) and King et al. (2016) for detailed discussion of heavy tails in TTS distributions for QA on RAN1 problems).
|QAGA vs. QA||QAGA vs. PT-ICM|
Figure 5 provides scatter plots that show head-to-head comparisons of QAGA versus the two solvers of greatest interest: QA (which serves as a baseline quantum algorithm) and PT-ICM (which is the best known classical algorithm for these inputs). The left column of these plots, which shows performance of QAGA versus QA, confirms a point that is suggested in Figure 4. Namely, that QA only outperforms QAGA on the easiest of the random inputs.
In the right column of Figure 5, we see the comparison between QAGA and PT-ICM. We can see that the relative performance of these two solvers is highly dependent on the input class:
DCL inputs were easy for QAGA but still somewhat challenging for PT-ICM, with QAGA beating PT-ICM by over two orders of magnitude on average.
AC3 inputs were more challenging for both solvers. QAGA outperformed PT-ICM by roughly an order of magnitude on average, with PT-ICM being faster on only 7 out of 100 inputs.
RAN1 inputs were the hardest for QAGA but easier for PT-ICM than AC3 inputs. While QAGA performed better on most of the inputs, the time-to-solution distribution for QAGA had a longer tail, with QAGA timing out on 16 out of 100 inputs. 30 of the 100 inputs took over 1 s for QAGA, whereas only 12 of the 100 inputs took over 1 s for PT-ICM.
Wall-clock time. The main caveat regarding these results is that, while all of our solvers’ actual runtimes are slower than the hypothetical ideal runtimes reported, the difference is most extreme for our quantum-assisted genetic algorithm. There are two main types of overhead associated with the use of the QPU in a hybrid algorithm: overhead that requires exclusive access to the QPU (e.g., programming and readout time) and overhead incurred “outside” the QPU (e.g., network latency and conversion of an input Hamiltonian to analog waveforms). Both types of overhead can be reduced in the coming years through engineering improvements and by bringing the CPU or GPU running the outer genetic algorithm closer to the QPU.
Because QPU overhead will never be eliminated completely, and because the quantum annealing time itself is non-negligible, near-term hybrid algorithms will need to be asynchronous to optimize performance as measured by wall-clock times. After a mutation call is sent to the QPU, the classical part of the algorithm will continue working while it waits for the QPU response to return. How exactly this should be performed is an area of active research.
We have implemented and demonstrated the utility of a quantum-assisted genetic algorithm that uses reverse quantum annealing as a mutation operator. The results are promising, suggesting that reverse annealing is a viable mutation operator and that variants of QAGAs, or more generally, hybrid quantum-classical population-based algorithms, are a potential avenue to computational advantage. It would be conceivable that future highly-competitive heuristic optimization platforms would be required to employ quantum and classical co-processor Denchev et al. (2017) to harness the complementary nature and the interplay of quantum and classical fluctuations for faster sampling of the solution space.
- Kadowaki and Nishimori (1998) T. Kadowaki and H. Nishimori, Physical Review E 58, 5355 (1998).
- Kirkpatrick et al. (1983) S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Science 220, 671 (1983).
- Wootters and Zurek (1982) W. K. Wootters and W. H. Zurek, Nature 299, 802 (1982).
- Denchev et al. (2017) V. S. Denchev, M. Mohseni, and H. Neven, “Quantum assisted optimization,” International Patent Application WO 2017/189052 Al (2017).
- D-Wave Systems (2017) D-Wave Systems, Reverse Quantum Annealing for Local Refinement of Solutions, Tech. Rep. 14-1018A-A (2017).
- Chancellor (2017) N. Chancellor, New Journal of Physics 19, 023024 (2017).
- Chancellor (2016) N. Chancellor, arXiv:1609.05875 (2016).
- Barahona (1982) F. Barahona, Journal of Physics A: Mathematical and General 15, 3241 (1982).
- Lucas (2014) A. Lucas, Frontiers in Physics 2 (2014), 10.3389/fphy.2014.00005, arXiv:1302.5843 .
- (10) “D-Wave QPU Architecture: Chimera,” https://docs.dwavesys.com/docs/latest/c_gs_4.html, D-Wave System Documentation.
- Houdayer (2001) J. Houdayer, The European Physical Journal B-Condensed Matter and Complex Systems 22, 479 (2001).
- Zhu et al. (2015) Z. Zhu, A. J. Ochoa, and H. G. Katzgraber, Physical Review Letters 115 (2015), 10.1103/PhysRevLett.115.077201, arXiv:1501.05630 .
- Smith et al. (1993) R. E. Smith, S. Forrest, and A. S. Perelson, Evolutionary Computation 1, 127 (1993).
- Srinivas and Patnaik (1994) M. Srinivas and L. M. Patnaik, IEEE Transactions on Systems, Man, and Cybernetics 24, 656 (1994).
- Pandey et al. (2014) H. M. Pandey, A. Chaudhary, and D. Mehrotra, Applied Soft Computing 24, 1047 (2014).
- Swendsen and Wang (1986) R. Swendsen and J. Wang, Phys. Rev. Lett. 57, 2607 (1986).
- Isakov et al. (2014) S. V. Isakov, I. Zintchenko, and M. Troyer, (2014), arXiv:1401.1084v1 .
- Rønnow et al. (2014) T. F. Rønnow, Z. Wang, J. Job, S. Boixo, S. V. Isakov, D. Wecker, J. M. Martinis, D. A. Lidar, and M. Troyer, Science 345, 420 (2014), arXiv:1401.2910 .
- Selby (2014) A. Selby, arXiv:1409.3934 (2014).
- King et al. (2016) A. D. King, E. Hoskinson, T. Lanting, E. Andriyash, and M. H. Amin, Phys. Rev. A 93, 52320 (2016).
- King et al. (2015a) J. King, S. Yarkoni, M. M. Nevisi, J. P. Hilton, and C. C. McGeoch, arXiv:1508.05087 (2015a).
- Steiger et al. (2015) D. S. Steiger, T. F. Rønnow, and M. Troyer, Physical Review Letters 115 (2015), 10.1103/PhysRevLett.115.230501, arXiv:1504.07991 .
- Katzgraber et al. (2014) H. G. Katzgraber, F. Hamze, and R. S. Andrist, Physical Review X 4, 21008 (2014).
- Mandrà and Katzgraber (2018) S. Mandrà and H. G. Katzgraber, Quantum Science and Technology 3, 04LT01 (2018).
- King et al. (2019) J. King, S. Yarkoni, J. Raymond, I. Ozfidan, A. D. King, M. M. Nevisi, J. P. Hilton, and C. C. McGeoch, Journal of the Physical Society of Japan 88, 61007 (2019).
- Hen et al. (2015) I. Hen, J. Job, T. Albash, T. F. Rønnow, M. Troyer, and D. A. Lidar, Physical Review A - Atomic, Molecular, and Optical Physics 92, 1 (2015), arXiv:1502.01663v2 .
- King et al. (2015b) A. D. King, T. Lanting, and R. Harris, arXiv:1502.02098 (2015b).
- Konak et al. (2006) A. Konak, D. W. Coit, and A. E. Smith, Reliability Engineering & System Safety 91, 992 (2006).
Appendix A Shared Ising Energy Function
Shared Ising energy. When using a high recombination rate, we use a form of implicit fitness sharing Smith et al. (1993) to maintain population diversity. In implicit fitness sharing, each “reward” component of the fitness function is shared evenly among all of the individuals that receive that reward. Using a shared Ising energy function promotes diversity by giving more weight to good characteristics that are rare in the population. For Ising models, a reward will be either a field that is satisfied or a coupling that is satisfied. In this study, all fields are set to zero in all inputs, so all rewards are satisfied couplings.
For example, if we have a population of 40 states, and there is a coupling that is satisfied by 5 of our 40 states, that coupling will contribute an energy of to the shared Ising energy of each of those 5 states, and will not affect the shared Ising energies of the other 35 states in our population. If there is another coupling that is only satisfied by one of our 40 states, it will contribute to that state’s shared Ising energy and nothing for the other 39 states.
To define a shared Ising energy function more explicitly, let be the number of individuals for which is negative, and let be the number of individuals for which is negative. Further, we define
With these definitions in hand, the formula for shared Ising energy is simply:
By contrast, in (explicit) fitness sharing, individuals that are closer than some sharing radius have their fitnesses reduced to represent competition for resources. We use implicit fitness sharing because it is simpler for our use case—implicit fitness sharing does not require the tuning of a sharing radius.
Pareto selection. Using a shared Ising energy function alone for selection would be problematic, since the objective of our optimization is Ising energy rather than shared Ising energy. In order to ensure that the best states are kept in our population555This property is known as elitism. while also maintaining diversity, we use a selection method involving two fitness functions. Specifically, we consider the Pareto frontier, a concept that is central to the field of multi-objective optimization (see, e.g., Konak et al. (2006)).
The Pareto frontier is the set of solutions such that, if a solution is in the set, no other solution is better than in both metrics. We order the elements on the Pareto frontier, considering the primary metric to be more important than the secondary metric. When the Pareto frontier is exhausted, we remove those solutions from the set being ordered and consider the Pareto frontier of the remaining solutions, effectively peeling a layer off of the two-dimensional point set. We continue iterating through the solutions in this manner until all have been ordered.
The resulting order is used to determine which solutions are kept and which are discarded—if our algorithm requires us to keep solutions, we simply keep the first according to the order described above.
The two fitness functions used in our selection method are raw Ising energy (i.e., the energy according to the input Hamiltonian), and shared Ising energy (i.e., the energy calculated using implicit fitness sharing). Each solution in the population is graded using raw Ising energy as the primary metric and shared Ising energy as the secondary metric. These two metrics are ways of quantifying, respectively, quality and valuable uniqueness.
Appendix B Hyper-parameter Optimization
In this section we discuss the hyperparameters that were optimized for each solver. To provide concrete examples, we give parameter values chosen for the AC3 problem class, though the parameters were optimized separately for each problem class. Parameters for each class were optimized on a set of 30 inputs which were not used in our main experiments.
The only parameter we optimized for QA was annealing time. We tried values of 10 µs, 100 µs, and 1000 µs. Overall TTS performance was roughly the same for 10 µs and 100 µs, but significantly worse for 1000 µs anneals. We opted to run the tests with 100 µs anneals because, relative to 10 µs anneals, using 100 µs anneals allows us to use a greater fraction of wall-clock time on actual annealing (i.e., on computation rather than overhead), making it possible to find ground states on harder inputs.
Quantum-assisted genetic algorithm
The quantum-assisted genetic algorithm has annealing parameters as well as standard genetic algorithm parameters. The annealing schedule used was crafted through trial and error on a small number of inputs; we settled on the schedule given by the (time, ) pairs ((0.0, 1.0), (1.0, 0.5), (7.0, 0.5), (10.0, 1.0)). In other words, we use a reverse anneal of 10 µs total annealing time, including a 1 µs evolution from to , a 6 µs dwell at , then a 3 µs evolution from to .
A population size of 40 was selected, along with a mutation rate of 1 mutation per individual. These were chosen primarily to align with the restrictions of a prototype feature known as batch reverse annealing. This feature allows reverse annealing from multiple initial states in a single API call; in this case the limit was 40 initial states in a single call. This feature accelerated experimentation significantly and it was therefore of practical interest to work within its limitations, i.e., to have a maximum of 40 mutations per generation. We tried using smaller populations but performance suffered. It is possible to use larger populations with a lower mutation rate, but we will save this parameter exploration for future work.
In addition to the fitness sharing method of preserving genetic diversity in the population, at the end of each generation we kept only 30 states, with 10 new random states being added to the population at the end of a generation. This was found to improve performance significantly.
The recombination rate was optimized and a value of 10 was chosen so that each individual would be in 10 recombinations per generation.
In parallel tempering, the parameters required to define the algorithm are simply the beta values, i.e., inverse temperatures, that define the replicas. For each input we pre-selected betas that yielded exchange rates roughly between 0.3 and 0.8. The number of temperatures used varied slightly across the inputs, but averaged around 60.
Note that preselecting parameters for each input goes against proper experimental methodology and could unfairly benefit the three classical solvers that use these temperatures. However, if we had selected parameters for a distribution of inputs, the parameters would likely lead to suboptimal performance on certain inputs. To guarantee that we are not disadvantaging classical solvers, we chose to preselect parameters for the classical solvers on a per-input basis.
For simulated annealing, we need to provide a temperature schedule, including the total number of sweeps per anneal. In this case we performed a grid search for the optimum number of sweeps per anneal, given that the schedule of betas is a linear interpolation of the optimal betas found for parallel tempering. The optimal number of sweeps per anneal was found to be.
Parallel tempering with isoenergetic cluster moves
For PT-ICM we can use the same temperatures we used for parallel tempering. The one additional parameter to optimize is the frequency of cluster moves. We found that performing cluster moves every 3 sweeps was approximately optimal.
Appendix C Time-to-Solution Calculation
The performance metric used in this study is median TTS, i.e., . We chose to analyze the median instead of distributional tail percentiles such as or
that are frequently used because the median is a far more stable statistic even for distribution-free estimates, i.e., sample medians. In certain cases, such as with simulated annealing where each sample can be considered a fixed-time Bernoulli trial, it is possible to define a simple model for the runtime distribution that yields reliable estimates of tail percentiles. However, we are not aware of any such simple parametric model for PT and PT-ICM.
For simple annealers, i.e., simulated annealing and quantum annealing, we model time to solution using a geometric distribution. With a probabilityof returning a global optimum in any given sample, and for a runtime of a single anneal, this gives
We performed anneals for simulated annealing and anneals for quantum annealing.
Quantum-assisted genetic algorithm
For the quantum-assisted genetic algorithm the calculation is similar but slightly different. The runtime per generation is fixed. Additionally, we can model the number of random restarts required (i.e., the number of times QAGA hits the maximum of 50 generations without finding a global optimum) using a geometric distribution.
Since we limit the number of generations we run for, we first model the number of restarts required to reach a global optimum using a geometric distribution, noting that, when the maximum number of generations is reached without finding a global optimum, the number of generations, and therefore the runtime, is fixed. We then add the sample median of the number of generations required when the generation limit is not hit, where this sample median is taken over all successful QAGA runs using given parameters on a given input.
Using the optimized parameter set, we performed 200 runs of QAGA on each input, with each run consisting of at most 50 generations.
For PT and PT-ICM, no simple parametric model exists for the runtime. We therefore simply use the sample median of 100 independent runs. On each run, either PT or PT-ICM was run with optimized parameters until a ground state was reached.