I Introduction
The computational task of sampling – that is producing independent configurations of random variables from a given distribution – is believed to be among the most challenging computational problems. In particular, many sampling tasks cannot be performed in polynomial time, unless strong and widely accepted conjectures in approximation theory are refuted
Sinclair and Jerrum (1989); Jerrum et al. (1986); Jerrum and Sinclair (1993). Consequently, stateoftheart algorithms for general purpose sampling are based on Monte Carlo methods that require significant computing resources to sample from distributions of practical interest.Gibbs distributions, i.e., distributions with the form , have close ties to modeling many physical systems as they capture the equilibrium configurations that matter takes at a given inverse temperature Gallavotti (2013). It has also been observed that Gibbs distributions provide a powerful foundation for building machine learning models and optimization algorithms Job and Lidar (2018); PerdomoOrtiz et al. (2018). However, the computational challenge of sampling from these distributions is often prohibitive for practical applications of machine learning and optimization. It has been observed that a variety of the emerging analog computing devices, including those based on optical Inagaki et al. (2016) and quantum Arute et al. (2019) technologies, can be used as fast Gibbs samplers because of the natural connection between probabilistic physical systems and state sampling. The emergence of these novel computing devices has the potential for a dramatic impact on stateoftheart approaches for physics simulation, machine learning and optimization.
From its inception, Quantum Annealing Kadowaki and Nishimori (1998); Brooke et al. (1999, 2001); Farhi et al. (2001) (QA) has been designed for solving optimization tasks and not sampling tasks. However, in practice, various nonideal properties of QA hardware platforms lead to outputs that are reminiscent of thermal Gibbs distributions Bian et al. (2010); Raymond et al. (2016); Marshall et al. (2017, 2019). Recent experiments have demonstrated that, in the operating regime corresponding to weak interactions, QA hardware behaves like a noisy Gibbs sampler Vuffray et al. (2020) at a hardwarespecific effective temperature, that is, a sampler from a mixture of Gibbs distributions with fluctuating parameters caused by noise. Furthermore, it was shown in Vuffray et al. (2020) that samples from this noisy mixture of distributions is indistinguishable from a single Gibbs distribution with spurious additions to interaction structure of the programmed model. In applications where sampling from a specific Gibbs distribution is required, this distortion of the model parameters may present an undesirable feature because of this mismatch in model structure.
Building on the insights of Vuffray et al. (2020), this work demonstrates that it is possible to leverage QA hardware to perform highquality Gibbs sampling from some types of input models. In particular, we identify a class of hardwarenative Ising models where the DWave 2000Q platform consistently produces samples with a total variation less than from a desired target Gibbs distribution. We also show that changing the annealing time enables the QA hardware to sample from a range of inverse temperatures between 1.9 and 5.1, which is an essential feature in a variety of applications. This work leverages the combination of three novel insights to achieve highquality Gibbs sampling on the QA hardware: (1) it focuses on the hardwarenative Ising models that are resilient to the hardware’s noise; (2) the input models are rescaled to avoid distortions caused by the transverse field in the computational model; and (3) both the annealing time and the model’s energy scale are leveraged to control the effective temperature of the target distribution. To the best of our knowledge, the approach outlined in this paper provides the first prescriptive procedure for conducting highquality Ising model Gibbs sampling on QA hardware without the need to tune the input model or perform postprocessing to correct for distorted output distributions. If required, both of these procedures can be leveraged to further improve the results presented in this work, as done in Bian et al. (2010); Benedetti et al. (2016, 2017).
This work begins by reviewing the computational model of quantum annealing and previous works exploring sampling applications in Section II. Building on these previous works, a new approach for conducting highquality thermal Gibbs sampling in proposed in Section III and evaluated on quantum annealing hardware. Section IV explores theoretical models to provide insights into the mechanisms that underpin the proposed sampling approach and Section V concludes with a discussion of future directions.
Ii Problem Formulation & Related Work
The foundational physical system of interest to this work is the Ising model, a class of graphical models where the nodes, , represent classical spins (i.e., ), and the edges, , represent pairwise spin interactions. A local field is specified for each spin and an interaction strength is specified for each edge. The energy of a given spin configuration is given by the Hamiltonian:
(1) 
This Ising model is widely used to encode challenging computational problems arising in the study of magnetic materials, machine learning, and optimization Hopfield (1982); Panjwani and Healey (1995); Lokhov et al. (2018); Kochenberger et al. (2014). The particular computational task of interest to this work is to produce i.i.d. samples from the Gibbs distribution associated with the Ising model at the inverse temperature^{1}^{1}1Throughout this work, is used to denote the energy scale of a Gibbs distribution and is used to indicate the effective inverse temperature of the quantum annealer. , that is,
(2) 
Throughout this work, we assume that the parameters are in the range of to , to provide a consistent scaling for . Given the computational challenge of Ising model sampling at finite temperature, our objective is to leverage QA hardware to conduct this task.
ii.1 A Brief Review of Quantum Annealing
The central idea of quantum annealing is to use the transverse field Ising model combined with an annealing process to find the lowenergy configurations of a classical Ising model. The elementary unit of this model is a qubit
described by the standard vector of Pauli matrices
along the three spatial directions . The hardware platform provides a programmable Ising model Hamiltonian on the axis Kadowaki and Nishimori (1998),(3) 
which encodes the Ising energy function (1
) with a onetoone mapping of qubits to Ising spins. Note, that the eigenvalues of the Ising Hamiltonian operator are in bijection with the
possible assignments of the classical Ising model from Eq. (1). The quantum annealing protocol strives to find the lowenergy assignments to a userspecifiedmodel by conducting an analog interpolation process of the following transverse field Ising model Hamiltonian:
(4) 
The interpolation process starts with and ends with . The two interpolation functions and are designed such that and , that is, starting with a Hamiltonian dominated by and slowly transitioning to a Hamiltonian dominated by . In the closed system setting and when this transition process is sufficiently slow, quantum annealing will find the ground states of , and therefore minimizing configurations of
, with high probability
Einstein (1914); M. Born and V. Fock (1928); Ehrenfest (1916); Kato (1950); Jansen et al. (2007). To that end, the length of the annealing process (in microseconds) is a user controllable parameter. The outcome of the quantum annealing process is specified by the binary string , where each element takes a value or and corresponds to the observation of the spin projection of qubit in the computational basis denoted by .ii.2 Quantum Annealing and Sampling
Given the energy minimizing nature of quantum annealing, it is not immediately clear how it may be applied to the sampling task posed in (2). It is reasonable to postulate that an adiabatic system would behave similarly to Gibbs sampling at the zero temperature limit (i.e., ), that is drawing i.i.d. samples from the ground states of . Interestingly, this is not usually the case as the transverse field component of the Hamiltonian, , induces biases in the annealing process to prefer some ground states over others Matsuda et al. (2009); Boixo et al. (2013); Zhang et al. (2017); Könz et al. (2019). Nevertheless, protocols have been proposed to help increase the fairness of ground state sampling using quantum annealing Kumar et al. (2020); Pelofske et al. (2021). In contrast to previous works, this work is concerned with sampling from thermal distributions (i.e., ), which requires accurate sampling at all of the energy levels of , presenting a formidable challenge in model accuracy.
Realworld QA hardware is an open quantum system that is subject to a wide variety of nonideal properties, including thermal excitations and relaxations that impact the results of the computation Albash and Lidar (2015); Boixo et al. (2016); Smirnov and Amin (2018). In this regard, the relevant adiabatic theorem is the open system adiabatic theorem Abou Salem (2007); Avron et al. (2012); Venuti et al. (2016): if the dynamics is governed by a master equation of Lindblad form Lindblad (1976) and if the annealing is done sufficiently slowly, the system will converge with high probability to the steady state of the dynamics. This result is promising because, for certain master equations, the thermal Gibbs state is the steadystate solution Davies (1974); Albash et al. (2012).
In addition to the fundamental impacts of open quantum systems, the DWave hardware documentation highlights five other sources of deviations from ideal system operations called integrated control errors (ICE) Inc. (2020), which include: background susceptibility; flux noise; DigitaltoAnalog Conversion quantization; Input/Ouput system effects; and variable scale across qubits. Consequently, it has long been observed that output distributions of the QA hardware produced by DWave Systems are reminiscent of a Gibbs distribution of the input Hamiltonian Bian et al. (2010); Benedetti et al. (2016); Raymond et al. (2016); Marshall et al. (2017, 2019) with a hardwarespecific effective temperature of Nelson et al. (2021); Vuffray et al. (2020). The prevailing interpretation of the hardware’s output distribution is the Freeze Out model, which proposes that the output reflects a quantum Gibbs distribution occurring at an inputdependent point towards the end of the annealing process where some small amount of remains Amin (2015). This model has been notably successful for training quantum machine learning models Amin et al. (2018); Winci et al. (2020) and generating statistics for quantum Gibbs distributions Harris et al. (2018); King et al. (2018, 2021). However, these quantum Gibbs distributions are inaccurate when targeting a desired classical Gibbs distribution for sampling applications Benedetti et al. (2016); PerdomoOrtiz et al. (2018); Marshall et al. (2020); Li et al. (2020). The recent insight from Vuffray et al. (2020) is that when this hardware is operated at a lowenergy scale (i.e., ) it behaves as a thermalized classical Gibbs sampler from but suffers from a notable amount of distortion from instantaneous noise in the local field parameters, , on the order of Nelson et al. (2021); Zaborniak and de Sousa (2021), resulting in socalled noisy Gibbs samples.
Inspired by the classical Gibbs sampling insights of Vuffray et al. (2020), this work demonstrates that there exists a class of Ising Hamiltonians where DWave’s 2000Q hardware produces highquality Gibbs samples, with minimal distortions from noise or transverse fields. In particular, we consider the class of with , that are representable on the DWave 2000Q hardware graph. The primary insight of this work is to operate the QA hardware at an energy scale, , that is large enough to be noiseresilient but small enough to avoid the degeneracy breaking properties of the transverse field. It is not guaranteed that such a ‘sweetspot’ exists but this work demonstrates that the range of on the platform considered achieves the desired properties. A discussion at the conclusion of the paper provides a qualitative study postulating why a sweetspot occurs at this particular energy scale. Additionally, this work shows that in the proposed energy scale the annealing time has a consistent effect of tuning the effective temperature (i.e., ) of the Gibbs samples generated by the hardware. Combined these observations present new opportunities for leveraging QA hardware for conducting the Ising model sampling task described by (2).
Iii Exploring Gibbs Sampling with Quantum Annealing Hardware
This work is concerned with the following four parameters: , the Ising model that one would like to sample from (restricted to ); , a scaling factor that will be used when programming the QA hardware; , the annealing time of the QA protocol; and , a scaling factor of the distribution output by the QA hardware. Given , the QA hardware is programmed with the rescaled model and executed with an annealing time . The empirical distribution output by the hardware is then compared to Gibbs distributions of at different effective temperatures, i.e., . We define as,
(5) 
that is, the effective temperature of the closest Gibbs distribution of to the results output by the hardware, using total variation distance (TV) as the measure of closeness. Details of the total variation metric are provided in Appendix A. The remainder of this section explores how changes in the and parameters impact the output distribution of the DWave 2000Q Quantum Annealer located at Los Alamos National Laboratory, known as DW_2000Q_LANL. The unexpected finding is that there exists range of values where the QA hardware is a highquality Gibbs sampler.
iii.1 Experiment Design
iii.1.1 Ising Model Selection
Given the challenges of sampling from embedded Ising models Marshall et al. (2020), this work focuses exclusively on Hamiltonians that have native representation in the QA hardware. In particular, the DWave 2000Q system implements a chimera graph Bunyk et al. (2014), which consists of a grid of unit cells each containing 8 qubits (4 horizontal and 4 vertical). In each unit cell, every horizontal qubit is connected to every vertical qubit through couplers and various qubits in adjacent unit cells are also connected through couplers. The coupling strength between qubits and can be programmed to values in the continuous range , and the local fields can be programmed to values in the range .
In selecting the Ising models for testing Gibbs sampling with QA, we strive to design models that exhibit a variety of sampling difficulties. Previous works have highlighted that QA will prefer some degenerate ground states over others as a result of residual effects from the annealing process Matsuda et al. (2009); Boixo et al. (2013); Zhang et al. (2017); Könz et al. (2019). As ground states are the most heavily weighted states in the lowtemperature Gibbs distribution, these asymmetries introduce significant sampling errors. In this work, we elicit this effect by constructing a set of 13 models with groundstate degeneracies ranging from 1 to 38 to capture both easy and challenging instances to sample from with a quantum annealer.
In particular, this work considers seven GSD (groundstate degeneracy) cases with degeneracy values of 2, 4, 6, 8, 10, 24, and 38, without local fields, and six additional GSDF models with 1, 2, 3, 4, 5, and 6 ground state degeneracies including local fields. Note that the GSDF models tend to have lower degeneracy because the inclusion of the fields breaks symmetries within energy levels. The specific instances are referred to as GSD# and GSDF# where the # indicates the amount of ground state degeneracy. The Hamiltonian’s of each instance and a discussion of how they were designed is provided in Appendix B.
iii.1.2 Quantum Annealing Data Collection
For each of the GSD models, , a family of hardware inputs is considered by sweeping between 0.0 and 1.0 with step size of 0.0125 below 0.1, 0.025 between 0.1 and 0.5, and 0.1 above 0.5. The increased step size for low scales is helpful as the output statistics tend to be more sensitive to in this regime. For each of these models and scales, samples are collected from the QA hardware. After every 100 samples, a gauge transform is applied in order to mitigate the effects of bias. For each gauge transformation, a randomly generated vector is used to redefine the Ising parameters via . This transformation preserves the energy eigenvalues of , but it relabels the spin configurations by . Using gauge transformations is a wellknown technique for eliminating these field biases, which can add additional unwanted asymmetries Rønnow et al. (2014); Albash et al. (2015); Vuffray et al. (2020). After the samples have been collected, the empirical probability of each state is recorded. This data collection process is repeated for annealing times of 1 , 5 , 25 , and 125 .
iii.1.3 Estimating the Effective Output Temperature
As discussed at the beginning of this section, we would like to identify the closest Gibbs distribution of to the empirical distribution output by the QA hardware, i.e., (5). However, the exact solution of the optimization task posed by (5) presents a significant computational challenge because of difficulties in computing candidate Gibbs distributions. The primary reason that this work has focused on the smallscale Ising models with only 16 spins is to make this optimization task computationally tractable. In this work, the optimization problem (5) is solved by a bruteforce calculation of the total variation for a discrete set of values. The
values are determined by calculating an optimistic estimate for
and then scaling this value by the same step sizes that are used for . The value is calculated by executing the QASA protocol proposed in Nelson et al. (2021) on each of the relevant spins and taking the average of the spinbyspin effective temperatures recovered by that protocol. Therefore, the minimum of the range is and the maximum is . The value from this discrete set that achieves the minimum total variation between the quantum annealing data and the Gibbs distribution is selected as . In the results, we observe that indicating that is a sufficiently optimistic value for this bruteforce optimization procedure in practice.iii.1.4 Bounding Finite Sampling Performance
Because the empirical distribution produced by the QA hardware is calculated from finite samples, there is an unavoidable error caused by finite sampling (details are discussed in Appendix A). To understand the impact of this finite sampling error, a lower bound is calculated by a simulation procedure that generates samples from the selected Gibbs distribution and calculates the total variation between this empirical distribution and the exact distribution. Because the samples are generated from a known source distribution, the only source of error is a result of finite sampling. Therefore, this value represents the best total variation that is achievable given the number of samples that were used in these experiments . This lower bound is included in the result figures to provide a measure of the total variation error that can be attributed solely to the limitations of finite sampling. In this work we find that this sampling issue is only significant when the input scale is low () as this regime reflects a distribution that is close to uniform and the probability mass is spread out among exponentially many states. The presence of errors caused by finite sampling motivates our use of a large number of samples (i.e., per data point) and provides an explanation for some of the consistent features in the results that are observed in the low regime.
iii.2 Experiment Results
iii.2.1 A Typical GSD Model
We begin by presenting detailed results on a characteristic GSD model and then provide a summary of the results across the complete collection of models. The total variation results for the GSD6 model are presented in Figure 1. As the value of is increased, the results show an updownup shape in the TV metric. At very low , the TV value is limited by finite sampling effects. As is increased, it quickly deviates from the family of Gibbs distributions and gradually becomes more Gibbslike. At some point, while increasing , a minimum TV value is reached and it begins to increase monotonically until the maximum value. This updownup trend is replicated across a variety of the models considered in this work and plausible causes for it are discussed in Section IV. The central observation of this analysis is that there is a band of between approximately 0.2 and 0.4, where the QA hardware’s output distribution deviates from the target Gibbs distribution by less than in total variation, which we designate as highquality. Outside of this particular range of , the total variation increases considerably, showing how the QA hardware’s distribution dramatically shifts away from the family of Gibbs distributions. Interestingly, there does not seem to be any significant dependence of the total variation on annealing time (i.e., each of the colored points in Figure 1 follow similar trends). These results suggest that this middle regime of is favorable for using QA hardware as a Gibbs sampler at each of the annealing times considered. Similar results on all of the GSD instances are provided in Appendix C.
The results from Figure 1 suggest there are operating regimes where the QA hardware’s is an accurate Gibbs sampler; however, the effective temperature (i.e. ) of the that distribution is not presented. Figure 2 explores this point by presenting the values that were found for each of the highquality input configurations considered (i.e., ). Given the prevailing theories of how QA hardware operates Amin (2015); Nelson et al. (2021), one expects that increasing the energy scale of the input or the annealing time will increase the effective temperature of the output distribution, . Indeed, that trend is observed in Figure 2, where the recovered values increase monotonically (or nearly so) with both and annealing time. This monotonicity is a particularly useful result as it suggest a simple procedure for tuning the effective temperature of the distribution output by the QA hardware platform, which is an essential feature for most practical sampling applications.
Generating Gibbs samples in the range of from 1.85 to 5.16 as shown in Figure 2 is an encouraging result as the critical points separating high and lowtemperature regimes of Ising spin glasses on lattices and random graphs typically occurs for values of Lokhov et al. (2018). Sampling from these lowtemperature regimes is known to present significant computational challenges for classical algorithms, giving hope that QA hardware may be able to provide a performance enhancement on these tasks. However, additional study is needed to determine if the results presented here can generalize to larger system sizes and to temperatures below the critical point of natively representable Ising spin glasses Katzgraber et al. (2014).
iii.2.2 Summary of the GSD Models
To test if the observations on the GSD6 instance generalize to a broader collection of models, we repeated the previous experiment on all 13 of the GSD models proposed in this work, seven GRD and six GSDF cases. Figure 3 displays the minimum total variation with respect to the ideal Gibbs distribution for each scale of and for each model. Only the 1 data is presented, but results with other annealing times are available in Appendix C. Although one can observe a variety of distinct features across these instances, the encouraging finding is that the range of within 0.2 and 0.4 consistently yields statistics with low total variation values, suggesting the observations on the GSD6 model do generalize to a broader class of models. In particular, achieves a low total variation for nearly all of the models considered.
It is important to briefly discuss the GSD2 and GSDF1 models, as these are the only ones that do not follow the updownup trajectory shown in Figure 1. In these cases, the total variation is high for low and then drops below 5% for greater than 0.2 as expected. However, the total variation remains low even for greater than 0.4. This result is because when
is large, the probability distribution is highly concentrated in the ground states and these instances do not exhibit ground state degeneracy breaking. As there is only one ground state in the GSDF case and two symmetrical ground states in the GSD case, the sampling task is relatively easy and effectively reduces to a ground state identification task. The QA hardware achieves low TV in these cases by simply outputting the ground states with high probability. Although one can consider these two cases to represent
easy sampling tasks, we include them to highlight the increased challenge that instances with more ground state degeneracy pose to using QA for sampling applications.The results from Figure 3 indicate that the proposed highquality Gibbs sampling regime generalizes to a variety of Ising models; however, the stability of the effective temperatures (i.e., ) of those distributions is an important question. Table 1 provides a summary of this information by presenting the minimum and maximum values that can are achieved on each model by varying both and the annealing time (see Appendix D for more detailed information). Although there is some variability in the largest values, the results indicate that the range of effective temperatures output by the hardware remains relatively stable and the hardware is suitable for sampling from all models between 1.85 and 3.97. These results also suggest that the the observations on the GSD6 model generalizes to a broader class of models.
Instance  min  max 

GSD2  1.32  3.97 
GSD4  1.85  4.95 
GSD6  1.85  5.16 
GSD8  1.59  4.37 
GSD10  1.32  4.76 
GSD24  1.59  4.58 
GSD38  1.59  4.76 
GSDF1  1.32  3.97 
GSDF2  1.32  4.37 
GSDF3  1.59  4.37 
GSDF4  1.59  4.58 
GSDF5  1.59  4.76 
GSDF6  1.59  4.37 
Iv Discussion
Using quantum annealing for Gibbs sampling has been proposed as early as 2010, Bian et al. (2010). Since then, several studies have indicated that the output distribution of a quantum annealer does not sample from the input Hamiltonian’s Gibbs distribution PerdomoOrtiz et al. (2018); Benedetti et al. (2016); Zhang et al. (2017); Könz et al. (2019). However, many of these works use an input scaling that is outside of the range identified by this work and are therefore subject to the distortions that we see in Figures 1 and 3. These figures show that if is too small () or too large () then QA hardware’s output distribution will differ greatly from that of a corresponding Gibbs distribution. Additionally, observing that the QA operating protocol proposed by this work yields highquality Gibbs samples in a variety of Ising models suggests a systematic phenomenon that produces this behavior.
To investigate what might account for the distortions in these Gibbs distributions and the unique features that occur in the range, we conduct a detailed study of the output statistics of a very small Ising Hamiltonian, i.e., a threespin Ising chain. With such a small system, we are able to create a theoretical model that reproduces the experimental data. This theoretical model suggests local field noise and residual transverse field effects as potential explanations for the observed distortions. The theoretical model found that the threespin system sampled from an Ising Hamiltonian with extraneous couplings when was in the low or highscaling regime. Only when remained in the range identified by this work did the system produce the desired Gibbs distribution. This result provides additional evidence that this restricted scaling regime is optimal for conducting Gibbs sampling with QA hardware.
iv.1 Threespin Ising Chain Statistics
In this experiment, three spins denoted by , , and are linked together in a ferromagnetic chain with coupled to and coupled to as shown in Figure 4. Notice that is not coupled to . Formally, the input Hamiltonian is defined as follows:
(6) 
where is the value of the coupling. is swept between 0.0 and 1.0 with stepsizes matching the discretization of from the 16spin experiments.^{2}^{2}2Note when programming a DWave system, negative values of are used to follow the ferromagnetic sign convention of that platform. For each value of , samples are collected from the QA hardware and the output statistics are recorded. Once again, in order to mitigate bias a spinreversal transform is applied after every 100 samples.
The output distribution is analyzed by solving the inverseIsing problem using the Interaction screening estimation algorithm Vuffray et al. (2016); Lokhov et al. (2018). This algorithm takes the empirical distribution produced by the hardware and estimates the Ising Hamiltonian that most likely produced the given output statistics. The estimated values for each of the couplings are denoted as and more specifically as , , and to indicate the specific edge in the threespin chain. These estimated values are compared to the corresponding input values in Figure 4(a). This procedure provides an understanding of the effective Hamiltonian output by the QA hardware, which may differ from the Ising Hamiltonian that has been programmed. The entire protocol is repeated for annealing times of 1 , 5 , 25 , and 125 with similar results. Only the data for a 1 annealing time is presented in Figure 4(a), see Appendix E for results for the other annealing times.
Because and are not coupled together in the input Hamiltonian, one expects from the reconstruction to be close to zero. However, as shown in Figure 4(a), appears to be negative and, hence, antiferromagnetic below and positive above that point. In Vuffray et al. (2020), this effect is referred to as a “spurious coupling” because it appears in the output distribution despite not being programmed in the input Hamiltonian. It is striking that this spurious coupling exactly cancels out when , which coincides with the optimal input scaling for for which almost all GSD models in Figure 3 are shown to have low total variation. Furthermore, the regime where the spurious coupling strength is low relative to the intended coupling strengths can be expanded to approximately include between 0.2 to 0.35, which again closely matches the optimal range for that was found for the 16 spin experiments. This provides additional support to the observation that the QA hardware considered here is an effective Gibbs sampler in this specific scaling range.
iv.2 A ThreeQubit Ising Chain Effective Model
To further explore a connection between these experimental observations and the spurious couplings observed in the hardware’s output (i.e., Figure 4(a)), we propose an extension of the effective singlequbit quantum model developed in Vuffray et al. (2020) to the threequbit context. The singlequbit model considered in Vuffray et al. (2020); Nelson et al. (2021) includes a transverse field with an intensity proportional to the input local field parameter, . This transverse field component was able to reproduce an observed saturation of the output field for large input values. Another important feature proposed in Vuffray et al. (2020) was qubit noise perturbing the input parameters of the model, which explained the spurious couplings in the regime of low input parameters. Building on these characteristics, we consider the following toy model on three qubits controlled by a single parameter in the absence of local fields. We assume the output distribution is a noiseaveraged thermal distribution:
(7) 
where is a threequbit Hamiltonian with independent binary qubit noise realized through binary random variables :
(8) 
Equation (8) describes a threequbit Ising chain, is a single input parameter controlling the strength of interactions, while and are constants rescaling the strength of the transverse field and qubit noise. Leveraging the parameters estimated in Vuffray et al. (2020), we select , and as typical values for these parameters. We would like to highlight that the model from Equation (8) is very different quantitatively and qualitatively from what is known as the Background Suceptibility model as discussed in Appendix F.
It was shown in Vuffray et al. (2020) that for small values of the input parameter , the negative spurious coupling can be explained by field noise on the involved qubits. In fact, it is informationtheoretically impossible to distinguish between a model with field noise and a model with the corresponding antiferromagnetic coupling. As seen in Figure 4(b), the toy model in Eq. (7) indeed predicts that when is small and thus field noise is significant relative to , a negative spurious coupling is preset. We therefore designate this lowscaling regime where as the noisy regime.
When increases, the field noise becomes less pronounced relative to . In this regime of higher input values, the model (7) predicts an emergence of a positive spurious coupling – an effect that is also observed in the experimental data. This suggests that a dependent residual transverse field on each of the qubits can account for the positive response in spurious link and the saturation of that link for large inputs. Similarly to how we designated the lowscaling regime as noisedominated, we can designate the highscaling regime as dominated by the effect of residual transverse fields.
An interesting intermediate regime predicted by the toy model as shown in Figure 5 happens around , where noise and transverse field influences cancel each other, leading to an effective absence of spurious couplings, which also occurs experimentally. This observation suggests that the optimal sampling parameter range for , referred to as the sweetspot previously in this work, may be explained by the interacting effects of noise and residual fields in the system.
V Conclusion
Inspired by recent work indicating that quantum annealing hardware behaves as a noisy Gibbs sampler in very low energy scales Vuffray et al. (2020), this work identified a new approach for mitigating the impacts of noise and conducting highquality Gibbs sampling in a range of effective temperatures for a class of small hardwarenative Ising models. This novel approach to using quantum annealing hardware for Gibbs sampling opens new opportunities for applications in machine learning and exploring the physics of condensed matter systems.
More broadly, the computational task of Gibbs sampling and the protocol developed in this work could provide a novel avenue for exploring the potential for quantum advantage in quantum annealing hardware. To that end, two followon works would be required. First, a class of Ising models needs to be identified that are challenging to sample from with classical algorithms (e.g., spinglasses) that also adhere to the criteria proposed in this work, i.e., naively representable on the hardware with . Such examples seem unlikely in the Chimera hardware architecture Katzgraber et al. (2014); Weigel et al. (2015); however, the new Pegasus hardware architecture Boothby et al. (2020) will likely provide new opportunities for identifying such models. Second, significant additional research is required to verify that the protocol proposed by this work will scale to larger systems, ideally with 100’s to 1000’s of qubits, so that more of the quantum annealing hardware can be used. This type of verification requires an amazing amount of computation and tuning of sophisticated Monte Carlo methods as there are no known efficient algorithms for generating Gibbs samples of the target distributions that would be required as a baseline of comparison. Another practical challenge is that scaling to larger systems may also yield unexpected side effects in practice, such as amplifying the role of hardware programming errors Zhu et al. (2016); Albash et al. (2019), putting an implicit limit on sampling accuracy at larger scales. Although significant followon investigation is required to more deeply understand the potential for performing thermal Gibbs sampling with quantum annealing hardware, this work provides a foundation for maximizing the performance available hardware platforms when conducting these computational tasks.
Vi Acknowledgements
This work was supported by the U.S. DOE through the Laboratory Directed Research and Development (LDRD) program of LANL under project number 20210114ER. This research used computing resources provided by the LANL Institutional Computing Program, which is supported by the U.S. Department of Energy National Nuclear Security Administration under Contract No. 89233218CNA000001. This material is also based upon work supported by the National Science Foundation the Quantum Leap Big Idea under Grant No. OMA1936388.
References

Sinclair and Jerrum (1989)
Alistair Sinclair and Mark Jerrum, “Approximate counting, uniform generation and rapidly mixing markov chains,” Information and Computation
82, 93–133 (1989). 
Jerrum et al. (1986)
Mark R Jerrum, Leslie G Valiant, and Vijay V Vazirani, “Random generation of combinatorial structures from a uniform distribution,” Theoretical computer science
43, 169–188 (1986).  Jerrum and Sinclair (1993) Mark Jerrum and Alistair Sinclair, “Polynomialtime approximation algorithms for the ising model,” SIAM Journal on computing 22, 1087–1116 (1993).
 Gallavotti (2013) Giovanni Gallavotti, Statistical mechanics: A short treatise (Springer Science & Business Media, 2013).
 Job and Lidar (2018) Joshua Job and Daniel Lidar, “Testdriving 1000 qubits,” Quantum Science and Technology 3, 030501 (2018).
 PerdomoOrtiz et al. (2018) Alejandro PerdomoOrtiz, Marcello Benedetti, John RealpeGómez, and Rupak Biswas, “Opportunities and challenges for quantumassisted machine learning in nearterm quantum computers,” Quantum Science and Technology 3, 030502 (2018).
 Inagaki et al. (2016) Takahiro Inagaki, Yoshitaka Haribara, Koji Igarashi, Tomohiro Sonobe, Shuhei Tamate, Toshimori Honjo, Alireza Marandi, Peter L McMahon, Takeshi Umeki, Koji Enbutsu, et al., “A coherent ising machine for 2000node optimization problems,” Science 354, 603–606 (2016).
 Arute et al. (2019) Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando GSL Brandao, David A Buell, et al., “Quantum supremacy using a programmable superconducting processor,” Nature 574, 505–510 (2019).
 Kadowaki and Nishimori (1998) Tadashi Kadowaki and Hidetoshi Nishimori, “Quantum annealing in the transverse ising model,” Phys. Rev. E 58, 5355–5363 (1998).
 Brooke et al. (1999) J. Brooke, D. Bitko, T. F., Rosenbaum, and G. Aeppli, “Quantum annealing of a disordered magnet,” Science 284, 779–781 (1999).
 Brooke et al. (2001) J. Brooke, T. F. Rosenbaum, and G. Aeppli, “Tunable quantum tunnelling of magnetic domain walls,” Nature 413, 610–613 (2001).
 Farhi et al. (2001) Edward Farhi, Jeffrey Goldstone, Sam Gutmann, Joshua Lapan, Andrew Lundgren, and Daniel Preda, “A quantum adiabatic evolution algorithm applied to random instances of an npcomplete problem,” Science 292, 472–475 (2001).
 Bian et al. (2010) Zhengbing Bian, Fabian Chudak, William G. Macready, and Geordie Rose, “The ising model: teaching an old problem new tricks,” Published online at https://www.dwavesys.com/sites/default/files/weightedmaxsat_v2.pdf (2010), accessed: 04/28/2017.
 Raymond et al. (2016) Jack Raymond, Sheir Yarkoni, and Evgeny Andriyash, “Global warming: Temperature estimation in annealers,” Frontiers in ICT 3, 23 (2016).
 Marshall et al. (2017) Jeffrey Marshall, Eleanor G. Rieffel, and Itay Hen, “Thermalization, freezeout, and noise: Deciphering experimental quantum annealers,” Phys. Rev. Applied 8, 064025 (2017).
 Marshall et al. (2019) Jeffrey Marshall, Davide Venturelli, Itay Hen, and Eleanor G. Rieffel, “Power of pausing: Advancing understanding of thermalization in experimental quantum annealers,” Phys. Rev. Applied 11, 044083 (2019).
 Vuffray et al. (2020) Marc Vuffray, Carleton Coffrin, Yaroslav A Kharkov, and Andrey Y Lokhov, “Programmable quantum annealers as noisy gibbs samplers,” arXiv preprint arXiv:2012.08827 (2020).

Benedetti et al. (2016)
Marcello Benedetti, John RealpeGómez, Rupak Biswas, and Alejandro PerdomoOrtiz, “Estimation of effective temperatures in quantum annealers for sampling applications: A case study with possible applications in deep learning,”
Phys. Rev. A 94, 022308 (2016).  Benedetti et al. (2017) Marcello Benedetti, John RealpeGómez, Rupak Biswas, and Alejandro PerdomoOrtiz, “Quantumassisted learning of hardwareembedded probabilistic graphical models,” Phys. Rev. X 7, 041052 (2017).

Hopfield (1982)
John J Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proceedings of the national academy of sciences
79, 2554–2558 (1982).  Panjwani and Healey (1995) Dileep Kumar Panjwani and Glenn Healey, “Markov random field models for unsupervised segmentation of textured color images,” IEEE Transactions on pattern analysis and machine intelligence 17, 939–954 (1995).
 Lokhov et al. (2018) Andrey Y. Lokhov, Marc Vuffray, Sidhant Misra, and Michael Chertkov, “Optimal structure and parameter learning of Ising models,” Science Advances 4 (2018), 10.1126/sciadv.1700791.
 Kochenberger et al. (2014) Gary Kochenberger, JinKao Hao, Fred Glover, Mark Lewis, Zhipeng Lü, Haibo Wang, and Yang Wang, “The unconstrained binary quadratic programming problem: a survey,” Journal of Combinatorial Optimization 28, 58–81 (2014).
 Einstein (1914) A. Einstein, “Beiträge z. quantentheorie,” Verh. d. D. phys. Ges. 16, 826 (1914).
 M. Born and V. Fock (1928) M. Born and V. Fock, “Beweis des adiabatensatzes,” Zeit. f. Physik 51, 165–196 (1928).
 Ehrenfest (1916) P. Ehrenfest, “Over adiabatische veranderingen van een stelsel in verband met de theorie der quanta,” Verslagen Kon. Akad. Amserdam 25, 412 – 433 (1916).
 Kato (1950) T. Kato, “On the adiabatic theorem of quantum mechanics,” J. Phys. Soc. Jap. 5, 435 (1950).
 Jansen et al. (2007) Sabine Jansen, MaryBeth Ruskai, and Ruedi Seiler, “Bounds for the adiabatic approximation with applications to quantum computation,” J. Math. Phys. 48, 102111 (2007).
 Matsuda et al. (2009) Yoshiki Matsuda, Hidetoshi Nishimori, and Helmut G Katzgraber, “Quantum annealing for problems with groundstate degeneracy,” Journal of Physics: Conference Series 143, 012003 (2009).
 Boixo et al. (2013) Sergio Boixo, Tameem Albash, Federico M. Spedalieri, Nicholas Chancellor, and Daniel A. Lidar, “Experimental signature of programmable quantum annealing,” Nature Communications 4, 2067 (2013).
 Zhang et al. (2017) Brian Hu Zhang, Gene Wagenbreth, Victor MartinMayor, and Itay Hen, “Advantages of unfair quantum groundstate sampling,” Scientific Reports 7, 1044 (2017).
 Könz et al. (2019) Mario S. Könz, Guglielmo Mazzola, Andrew J. Ochoa, Helmut G. Katzgraber, and Matthias Troyer, “Uncertain fate of fair sampling in quantum annealing,” Phys. Rev. A 100, 030303 (2019).
 Kumar et al. (2020) Vaibhaw Kumar, Casey Tomlin, Curt Nehrkorn, Daniel O’Malley, and Joseph Dulny III, “Achieving fair sampling in quantum annealing,” (2020), arXiv:2007.08487 .
 Pelofske et al. (2021) Elijah Pelofske, John Golden, Andreas Bärtschi, Daniel O’Malley, and Stephan Eidenbenz, “Sampling on nisq devices: ”who’s the fairest one of all?”,” (2021), arXiv:2107.06468 .
 Albash and Lidar (2015) Tameem Albash and Daniel A. Lidar, “Decoherence in adiabatic quantum computation,” Phys. Rev. A 91, 062320 (2015).
 Boixo et al. (2016) Sergio Boixo, Vadim N. Smelyanskiy, Alireza Shabani, Sergei V. Isakov, Mark Dykman, Vasil S. Denchev, Mohammad H. Amin, Anatoly Yu Smirnov, Masoud Mohseni, and Hartmut Neven, “Computational multiqubit tunnelling in programmable quantum annealers,” Nature Communications 7, 10327 (2016).
 Smirnov and Amin (2018) Anatoly Yu Smirnov and Mohammad H Amin, “Theory of open quantum dynamics with hybrid noise,” New Journal of Physics 20, 103037 (2018).
 Abou Salem (2007) Walid K. Abou Salem, “On the quasistatic evolution of nonequilibrium steady states,” Annales Henri Poincaré 8, 569–596 (2007).
 Avron et al. (2012) J. E. Avron, M. Fraas, G. M. Graf, and P. Grech, “Adiabatic theorems for generators of contracting evolutions,” Communications in Mathematical Physics 314, 163–191 (2012).
 Venuti et al. (2016) Lorenzo Campos Venuti, Tameem Albash, Daniel A. Lidar, and Paolo Zanardi, “Adiabaticity in open quantum systems,” Phys. Rev. A 93, 032118 (2016), 1508.05558 .
 Lindblad (1976) G. Lindblad, “On the generators of quantum dynamical semigroups,” Communications in Mathematical Physics 48, 119–130 (1976).
 Davies (1974) E. B. Davies, “Markovian master equations,” Communications in Mathematical Physics 39, 91–110 (1974).
 Albash et al. (2012) Tameem Albash, Sergio Boixo, Daniel A Lidar, and Paolo Zanardi, “Quantum adiabatic markovian master equations,” New Journal of Physics 14, 123016 (2012), 1206.4197 .
 Inc. (2020) DWave Systems Inc., “Dwave system documentation,” Published online at https://docs.dwavesys.com/docs/latest/ (2020), accessed: 03/17/2021.
 Nelson et al. (2021) Jon Nelson, Marc Vuffray, Andrey Y. Lokhov, and Carleton Coffrin, “Singlequbit fidelity assessment of quantum annealing hardware,” IEEE Transactions on Quantum Engineering , 1–1 (2021).
 Amin (2015) Mohammad H. Amin, “Searching for quantum speedup in quasistatic quantum annealers,” Phys. Rev. A 92, 052323 (2015).

Amin et al. (2018)
Mohammad H. Amin, Evgeny Andriyash, Jason Rolfe, Bohdan Kulchytskyy, and Roger Melko, “Quantum boltzmann machine,”
Phys. Rev. X 8, 021050 (2018).  Winci et al. (2020) Walter Winci, Lorenzo Buffoni, Hossein Sadeghi, Amir Khoshaman, Evgeny Andriyash, and Mohammad H Amin, “A path towards quantum advantage in training deep generative models with quantum annealers,” Machine Learning: Science and Technology 1, 045028 (2020).

Harris et al. (2018)
R. Harris, Y. Sato, A. J. Berkley, M. Reis, F. Altomare, M. H. Amin, K. Boothby, P. Bunyk, C. Deng, C. Enderud, S. Huang, E. Hoskinson, M. W. Johnson, E. Ladizinsky, N. Ladizinsky, T. Lanting, R. Li, T. Medina, R. Molavi, R. Neufeld, T. Oh, I. Pavlov, I. Perminov, G. PoulinLamarre, C. Rich, A. Smirnov, L. Swenson, N. Tsai, M. Volkmann, J. Whittaker, and J. Yao, “Phase transitions in a programmable quantum spin glass simulator,”
Science 361, 162–165 (2018).  King et al. (2018) Andrew D. King, Juan Carrasquilla, Jack Raymond, Isil Ozfidan, Evgeny Andriyash, Andrew Berkley, Mauricio Reis, Trevor Lanting, Richard Harris, Fabio Altomare, Kelly Boothby, Paul I. Bunyk, Colin Enderud, Alexandre Fréchette, Emile Hoskinson, Nicolas Ladizinsky, Travis Oh, Gabriel PoulinLamarre, Christopher Rich, Yuki Sato, Anatoly Yu. Smirnov, Loren J. Swenson, Mark H. Volkmann, Jed Whittaker, Jason Yao, Eric Ladizinsky, Mark W. Johnson, Jeremy Hilton, and Mohammad H. Amin, “Observation of topological phenomena in a programmable lattice of 1,800 qubits,” Nature 560, 456–460 (2018).
 King et al. (2021) Andrew D. King, Cristiano Nisoli, Edward D. Dahl, Gabriel PoulinLamarre, and Alejandro LopezBezanilla, “Qubit spin ice,” Science (2021), 10.1126/science.abe2824.
 Marshall et al. (2020) Jeffrey Marshall, Andrea Di Gioacchino, and Eleanor G. Rieffel, “Perils of embedding for sampling problems,” Phys. Rev. Research 2, 023020 (2020).
 Li et al. (2020) Richard Y Li, Tameem Albash, and Daniel A Lidar, “Limitations of error corrected quantum annealing in improving the performance of boltzmann machines,” Quantum Science and Technology 5, 045010 (2020).
 Zaborniak and de Sousa (2021) Tristan Zaborniak and Rogério de Sousa, “Benchmarking hamiltonian noise in the dwave quantum annealer,” IEEE Transactions on Quantum Engineering 2, 1–6 (2021).
 Bunyk et al. (2014) P. I. Bunyk, E. M. Hoskinson, M. W. Johnson, E. Tolkacheva, F. Altomare, A. J. Berkley, R. Harris, J. P. Hilton, T. Lanting, A. J. Przybysz, and J. Whittaker, “Architectural considerations in the design of a superconducting quantum annealing processor,” IEEE Transactions on Applied Superconductivity 24, 1–10 (2014).
 Rønnow et al. (2014) Troels F. Rønnow, Zhihui Wang, Joshua Job, Sergio Boixo, Sergei V. Isakov, David Wecker, John M. Martinis, Daniel A. Lidar, and Matthias Troyer, “Defining and detecting quantum speedup,” Science 345, 420–424 (2014).
 Albash et al. (2015) Tameem Albash, Walter Vinci, Anurag Mishra, Paul A. Warburton, and Daniel A. Lidar, “Consistency tests of classical and quantum models for a quantum annealer,” Phys. Rev. A 91, 042314 (2015).
 Katzgraber et al. (2014) Helmut G. Katzgraber, Firas Hamze, and Ruben S. Andrist, “Glassy chimeras could be blind to quantum speedup: Designing better benchmarks for quantum annealing machines,” Phys. Rev. X 4, 021008 (2014).
 Vuffray et al. (2016) Marc Vuffray, Sidhant Misra, Andrey Lokhov, and Michael Chertkov, “Interaction screening: Efficient and sampleoptimal learning of Ising models,” in Advances in Neural Information Processing Systems, Vol. 29, edited by D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett (Curran Associates, Inc., 2016).
 Weigel et al. (2015) Martin Weigel, Helmut G. Katzgraber, Jonathan Machta, Firas Hamze, and Ruben S. Andrist (Octomore Collaboration), “Erratum: Glassy chimeras could be blind to quantum speedup: Designing better benchmarks for quantum annealing machines [phys. rev. x 4, 021008 (2014)],” Phys. Rev. X 5, 019901 (2015).
 Boothby et al. (2020) Kelly Boothby, Paul Bunyk, Jack Raymond, and Aidan Roy, “NextGeneration Topology of DWave Quantum Processors,” arXiv eprints , arXiv:2003.00133 (2020), arXiv:2003.00133 [quantph] .
 Zhu et al. (2016) Zheng Zhu, Andrew J. Ochoa, Stefan Schnabel, Firas Hamze, and Helmut G. Katzgraber, “Bestcase performance of quantum annealers on native spinglass benchmarks: How chaos can affect success probabilities,” Phys. Rev. A 93, 012317 (2016).
 Albash et al. (2019) Tameem Albash, Victor MartinMayor, and Itay Hen, “Analog errors in ising machines,” Quantum Science and Technology 4, 02LT03 (2019).
 Boixo et al. (2014) Sergio Boixo, Troels F. Rønnow, Sergei V. Isakov, Zhihui Wang, David Wecker, Daniel A. Lidar, John M. Martinis, and Matthias Troyer, “Evidence for quantum annealing with more than one hundred qubits,” Nature Physics 10, 218–224 (2014).

Coffrin et al. (2019)
Carleton Coffrin, Harsha Nagarajan, and Russell Bent, “Evaluating ising processing units with integer programming,” in
Integration of Constraint Programming, Artificial Intelligence, and Operations Research
, edited by LouisMartin Rousseau and Kostas Stergiou (Springer International Publishing, Cham, 2019) pp. 163–181.
Appendix A Comparing Distributions with Total Variation Distance
Given two distributions and over a binary string of size , i.e. , the total variation between them is defined as,
(9) 
that is, the absolute difference between the probability of each state divided by 2, to normalize the metric to the range of 0 to 1. Given that the metric is in the range of 0 to 1, it is often presented as a percentage between 0 and 100.
The TV distance can be interpreted as the maximum discrepancy between the probability of an event computed with the distribution instead of , that is,
(10) 
where is an element of the power set of . Therefore, it comes as the distance of choice to measure the error that can be made by estimating probabilities using a surrogate distribution of a target distribution . However, note that this sets a very strong quality criteria for estimating the reliability of a surrogate distribution as the TV is equal to the worsecase estimation error. It is important to note that if or are only accessible through a set of finite samples, there is an unavoidable error caused by sampling that will be captured by the TV distance. To avoid finite sampling error in the TV estimate, one needs to use a number of samples that is on the order of the typical support of and . This is required to estimate the absolute difference in Eq. (9) accurately for each state . Typically, the number of samples required grows exponentially in and is maximal for distribution and that are close to a uniform distribution where it is in the order of .
It is fairly common to consider KL divergence as an alternative to the TV distance, especially when the nonconvexity of the latter becomes a computational bottleneck, taking advantage of Pinsker’s inequality, i.e., . However, the KL divergence does not share the same meaning as the TV distance: the KL divergence estimates the inefficiency in compressing a source using a code optimized for . In general, it is not suitable for measuring the quality of probability estimates between a target distribution and its surrogate. A simple situation that illustrates this last point is when the distribution is almost identical to except that it has an infinitesimally small probability mass that lies outside of the support of . The TV distance between these distribution is infinitesimally small, while the KL divergence becomes infinite.
Appendix B The Ground State Degeneracy Models
The Ising model instances used in this work were generated by the following procedure. First, 16 spins (2 unit cells) on the DWave 2000Q Quantum Annealer DW_2000Q_LANL hardware graph were selected with qubits 296 to 303 making up one unit cell and qubits 304 to 311 making up an adjacent unit cell in the Chimera graph architecture. Then the values for incident to these qubits were assigned randomly from the set for each coupler in the hardware graph. Second, a bruteforce calculation was used to compute the groundstate degeneracy for a given instance.^{3}^{3}3This approach is only computationally feasible for small systems spins, such the ones considered here. This procedure was repeated many times and a subset of cases were selected to highlight a smooth range of groundstate degeneracies. This yielded a total of seven GSD (groundstate degeneracy) cases with degeneracy values of 2, 4, 6, 8, 10, 24, and 38, which are presented in Table 2. The same procedure was repeated while also assigning random values to the local fields to , yielding six additional GSDF models with 1, 2, 3, 4, 5, and 6 ground state degeneracies, which are presented in Table 3. The specific instances are referred to as GSD# and GSDF#, where the # indicates the amount of ground state degeneracy. Note that the GSDF models tend to have lower degeneracy because the inclusion of the fields breaks symmetries within energy levels.
It is worth noting that these models are reminiscent of the RAN and RANF models that have been used to benchmark earlier generations of QA processors Boixo et al. (2014); Coffrin et al. (2019). The RAN model class on Chimera graphs has been shown to exhibit a zero temperature phase transition Katzgraber et al. (2014); Weigel et al. (2015), indicating that sampling from this model class at finite temperature is not expected to be computationally hard. Here however instances are handpicked from this class to exhibit varying degrees of ground state degeneracy breaking by the transverse field. In this sense, we do not expect that the instances of this work to be representative samples from the RAN class, and instead expect them to be challenging for a quantum annealer to sample from.
Appendix C Total Variation at Different Scales and Annealing Times
In this section, we present supplementary figures for each of the experiments in the main text. Figures 7 and 8 display the total variation results for every GSD model as a function of . As in Figure 1 where only GSD6 is shown, the results for each of the annealing times are overlaid with different colors. Note how GSD2 and GSDF1 are the only instances where total variation decreases as increases. This is because models with low degeneracy are easier to sample from in the high scale regime, which was previously explained in Section III.2.2. For the rest of the instances, the consistent Ushaped trend supports our claim that the most effective for achieving high accuracy is in the medium scale regime. This optimal band between of 0.2 and 0.4 remains consistent across all GSD models. In addition, these figures show that varying annealing time does not shift the optimal region by a significant amount, with the exception of GSD10. However, there is a slight yet noticeable shift to the left in almost all instances as anneal times increase.
Figure 9 communicates the same information as Figures 7 and 8, but more directly compares various GSD models by stacking their total variation values in a heatmap. Only the plot was presented in Figure 3 in the main text, but here the results for all anneal times are shown. Once again, it is evident that the optimal region for remains the same for various anneal times.
Appendix D Effective Temperatures at Different Scales and Annealing Times
Figures 9 and 10 present the values that correspond to the optimal range between 0.2 and 0.4. The minimum and maximum value in each heatmap was used to construct Table 1. These figures show how and anneal time can be used to smoothly tune and that the effect is consistent across all GSD models. Although the exact values vary slightly from model to model, the trend is predictable and so these figures suggest a straitforward approach for producing Gibbs samples at at desired value.
Appendix E ThreeSpin Ising Chain Annealing Times
Finally, Figure 13 shows results for the threespin Ising chain experiment for various anneal times. The most important feature of these plots is the point at which the spurious coupling reduces to zero, as we hypothesize that this is the optimal input scale for sampling. From the plots, this optimal value slightly decreases from 0.275 to a low of 0.2 as anneal time increases. However, this value still remains within the optimal region seen in the 16spin scaling experiments. As anneal time increases, both programmed and spurious couplings appear to increase more rapidly with the increase of . This increase results in the spurious coupling almost saturating by the time reaches 0.4 for anneal times of 25 and 125 . However, the majority of our claimed optimal region between 0.2 and 0.4 remains in the regime where the spurious coupling is small compared to the programmed couplings.
Appendix F Background Susceptibility Model for the ThreeSpin Ising Chain
It is believed that physically neighboring qubits are not perfectly isolated from each other and give rise to uncontrolled interactions. These spurious couplings are described by the Background Susceptibility model Inc. (2020) and take the form of an interaction that is proportional to the coupling intensities emanating from neighboring qubits. For the threespin Ising chain discussed in Section IV, it results in the following Hamiltonian with background susceptibility,
(11) 
where is the background susceptibility constant^{4}^{4}4In Inc. (2020), is a negative quantity following the DWave programming convention that ferromagnetic couplings are negative. between qubits one and three. Notice that the intensity of the spurious link between and is quadratic in the input coupling intensity and is of ferromagnetic nature when is positive. Because the Hamiltonian in Eq. (11) is diagonal, we immediately see that this model behaves very differently than the effective Hamiltonian found experimentally (see Figure 5). The background susceptibility model predicts no saturation in the input coupling or spurious coupling intensity, as the earlier grow linearly and the later grows quadratically. Moreover, the spurious coupling of the background susceptibility model remains ferromagnetic unlike what is measured experimentally for small values of . This behavior is depicted in Figure 11 for the typically encountered values of and .
GSD2  GSD4  GSD6  GSD8  GSD10  GSD24  GSD38  

1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  1.0 
GSDF1  GSDF2  GSDF3  GSDF4  GSDF5  GSDF6  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0  
1.0  1.0  1.0  1.0  1.0  1.0 
Comments
There are no comments yet.