# Very strong evidence in favor of quantum mechanics and against local hidden variables from a Bayesian analysis

The data of four recent experiments --- conducted in Delft, Vienna, Boulder, and Munich with the aim of refuting nonquantum hidden-variables alternatives to the quantum-mechanical description --- are evaluated from a Bayesian perspective of what constitutes evidence in statistical data. We find that each of the experiments provides strong, or very strong, evidence in favor of quantum mechanics and against the nonquantum alternatives. This Bayesian analysis supplements the previous non-Bayesian ones, which refuted the alternatives on the basis of small p-values, but could not support quantum mechanics.


## I Introduction

Four recent experiments in Delft Delft , Vienna Vienna , Boulder Boulder , and Munich Munich tested the variants of Bell’s inequality Bell:64 introduced by Clauser et al. Clauser+3:69 and Eberhard Eberhard:93 . The shared aim of these experiments was the refutation of descriptions in terms of local hidden variables (LHV) that Bell and others had proposed as an alternative to the description offered by quantum mechanics (QM). Upon extracting small p-values from the respective data, with values between (Vienna) and (Delft), each of the four groups of scientists concluded that their data refute the LHV hypothesis. Putting aside all other caveats about, objections against, and other issues with the use of p-values Evans:p-value ; ASA:16 ; Benjamin+71:17 , let us merely note that the use of p-values can only make a case against LHV but not in support of QM. Yet, a clear-cut demonstration that the data give evidence in favor of QM is surely desirable.

We present here an evaluation of the data of the four experiments that shows that there is very strong evidence in favor of QM and also against LHV. Our analysis does not rely on p-values or any other concepts of frequentist statistics. We use Bayesian logic and measure evidence — in favor of alternatives or against them — by comparing the posterior with the prior probabilities of the alternatives to be distinguished.

The basic notion is both simple and natural Evans:15 : If an alternative is more probable in view of the data than before acquiring them, then the data provide evidence in favor of this alternative; and, conversely, if an alternative is less probable after taking note of the data than before, then the data give evidence against this alternative.

In our analysis, we only employ this principle of evidence and no particular measure for quantifying the strength of the evidence strength . As it happens, all alternatives save one are extremely improbable in view of the data so that the evidence in favor of the privileged alternative is overwhelming, and a quantification of the strength of the evidence is not needed here.

While Bell’s inequality and its variants are central to the design of the experiments, they play no role in our evaluation of the data. What matters are the probabilities of occurrence of the various measurement outcomes in the experiments. As discussed in Sec. III, the permissible probabilities make up an eight-dimensional set. It is composed of three subsets: one accessible only by QM, another only by LHV, and the third by both; see Fig. 1. We then ask, “Do the data provide evidence in favor of or against each of the three subsets?” and, from the data of each of the four experiments, we find strong evidence for the QM-only subset and against the other two.

An essential part of the Bayesian analysis is the choice of prior — the assignment of prior probabilities to the three regions in Fig. 1, thereby accounting for our prior knowledge about the experiment and the assumptions behind its design. If we were to strictly follow the rules of Bayesian reasoning, we would endow the set of QM-permitted probabilities with a prior close to 100% and allocate a very tiny prior to the subset of LHV-only probabilities. For, generations of physicists have accumulated a very large body of solid experimental and theoretical knowledge that makes us extremely confident that QM is correct. Not one observed effect contradicts the predictions of QM, while there is not a single documented phenomenon in support of all those speculations about LHV. In fact, this was already the situation in the mid 1960s when Bell published Ref. Bell:64 and gave a physical interpretation to an inequality known to Boole a century before Boole:62 .

Moreover, for the evaluation of the data from the four experiments, a by-the-rules prior, namely a properly elicited prior, would have to reflect our strong conviction that the experimenters managed to implement the experiment as planned in a highly reliable fashion, with the desired probabilities from the QM-only subset. Accordingly, we really should assign a very large prior probability to the “QM only” region symbolized in Fig. 1, a much smaller one to the “both” region, and an even smaller one to the “LHV only” region.

Such a prior, however, could bias the data evaluation in favor of QM and against LHV. Therefore, we deliberately violate the rules and use a prior that treats QM and LHV on equal footing; see Sec. V. To demonstrate that our choice of prior is not biased toward QM, we check for such a bias and confirm that there is none; more about this in Sec. V.3. Yet, all this tilting of the procedure does not help the LHV case: The data speak clearly and loudly that QM rules and LHV are out.

This contributes also to the development of Bayesian methodology, inasmuch as we demonstrate that the subjective biases inherent in a Bayesian statistical analysis through, for example, the choice of the prior, can be assessed a priori. To the best of our knowledge this is one of the first applications of this type of computation to ensure that such choices are not producing foregone conclusions.

We recall the experimental scheme common to all four experiments (Sec. II) and the ways in which the probabilities of detecting the various events are parameterized in the QM formalism or by LHV (Sec. III). This is followed by a discussion of how the difference between the prior and the posterior content of a region gives evidence in favor of this region or against it (Sec. IV). Then we explain our choice of prior on the eight-dimensional set of permissible probabilities — permitted either by QM or by LHV (Sec. V); more specifically, we define the prior by the algorithm that yields the large sample of permissible probabilities needed for the Monte Carlo integrations over the three regions symbolized in Fig. 1.

Then, having thus set the stage, we present, as a full illustration of the reasoning and methodology, the detailed account of the various aspects of our evaluation of the data recorded in one run of the Boulder experiment (Sec. VI.1). This includes the estimation, from the data, of an experimental parameter for which the value given in Ref. Boulder is not accurate. While an accurate value is not needed for calculating the p-value reported in Ref. Boulder , it is crucial for the QM account of the experiment. The results of processing the data from three other runs of the Boulder experiment are reported in Sec. VI.2. The evaluation of the data from three runs of the Vienna experiment also requires the estimation of the analogous parameter (Sec. VII.1) whereas there is no need for that in the context of the experiments conducted in Delft and Munich (Sec. VII.2).

All four experiments separately provide strong evidence in favor of QM and against LHV. Jointly, they convey the very clear message that this verdict is final.

## II Experimental scheme

The four experiments realize variations of one theme; see Fig. 2. Upon receiving a trigger signal, the source of qubit pairs equips Alice and Bob with one qubit each; the success probability for this is denoted by $\gamma$. Alice chooses one of two settings, denoted by $\vec{a}$ and $\vec{a}'$, for her selector in front of her qubit detector, which fires with efficiency $\eta_A$. Likewise Bob chooses between settings $\vec{b}$ and $\vec{b}'$ for his selector and then detects the selected qubits with efficiency $\eta_B$. For each trigger signal, the outcome is recorded and counts as an event of one of four kinds: a “$++$ event” if Alice’s and Bob’s detectors both fire; a “$+0$ event” if Alice’s detector fires and Bob’s does not; a “$0+$ event” if Alice’s detector does not fire and Bob’s does; or a “$00$ event” if neither detector fires. The data consist of the numbers of events observed of the four kinds, for the four settings available by choosing $\vec{a}$ or $\vec{a}'$ and $\vec{b}$ or $\vec{b}'$; together there are sixteen counts $n^{(S)}_{\alpha\beta}$, one for each kind of event $\alpha\beta$ in each setting $S$, and

$$D=\bigl(n^{(ab)}_{++},\,n^{(ab)}_{+0},\,\dots,\,n^{(ab')}_{0+},\,\dots,\,n^{(a'b')}_{00}\bigr) \tag{1}$$

reports the data for one run of the experiment as a 16-element string of natural numbers. Their sum

$$N=n^{(ab)}_{++}+n^{(ab)}_{+0}+\cdots+n^{(ab')}_{0+}+\cdots+n^{(a'b')}_{00} \tag{2}$$

is the total number of trigger signals.

Table 1 lists the parameters of the four experiments fn:data . The Vienna and Boulder experiments exploit the polarization qubits of photon pairs generated by down-conversion processes that happen rarely ($\gamma\ll1$). The unit vectors $\vec{a}$, $\vec{a}'$ and $\vec{b}$, $\vec{b}'$ that specify the selections refer to the orientation of polarization filters, and the detectors register the photons that are let through.

The qubits in the Delft and Munich experiments are in superpositions of hyperfine states of two atoms in spatially separated traps. The preparation of the initial state is achieved by entanglement swapping and is, therefore, heralded so that $\gamma=1$ in these event-ready setups. The selection and detection are implemented by probing for a chosen superposition, specified by the Bloch vectors $\vec{a}$, $\vec{a}'$ and $\vec{b}$, $\vec{b}'$.

Since the physics is independent of the coordinate systems adopted for the description, we can regard $\vec{a}$ and $\vec{a}'$ as vectors in the $xz$ plane of the Bloch ball for Alice’s qubit, and likewise $\vec{b}$ and $\vec{b}'$ are in the $xz$ plane for Bob’s qubit. All that matters is the angle $\theta_A$ between $\vec{a}$ and $\vec{a}'$, and the angle $\theta_B$ between $\vec{b}$ and $\vec{b}'$. Our choice of coordinate systems is then such that

$$\left.\begin{array}{c}\vec{a}\\ \vec{a}'\end{array}\right\} = \pm\vec{e}_x\sin\bigl(\tfrac12\theta_A\bigr)+\vec{e}_z\cos\bigl(\tfrac12\theta_A\bigr),\qquad \left.\begin{array}{c}\vec{b}\\ \vec{b}'\end{array}\right\} = \pm\vec{n}_x\sin\bigl(\tfrac12\theta_B\bigr)+\vec{n}_z\cos\bigl(\tfrac12\theta_B\bigr), \tag{3}$$

where $\vec{e}_x$, $\vec{e}_y$, $\vec{e}_z$ are the cartesian unit vectors for Alice’s qubit and $\vec{n}_x$, $\vec{n}_y$, $\vec{n}_z$ are those for Bob’s. Table 1 reports the respective values of $\theta_A$ and $\theta_B$.

A comment is in order about the table entries for and . The information given in Refs. Delft ; Vienna ; Boulder ; Munich is in terms of angles with respect to a reference direction, such as the setting of a wave plate relative to the conventional direction of vertical polarization. This tells us, for each setting, the magnitudes of the probability amplitudes in the superposition of vertical and horizontal polarizations but not their complex phases. It appears that the experimenters assumed that there are no relative phases, and this assumption leads to the values of and in Table 1. When the assumption is not made, we get a range of values for some of the s and s, such as in the Boulder experiment. It is possible to estimate and from the data (in a manner analogous to that of Sec. VI.1.1) and we performed such an estimation for the Boulder data in Table 2 below, with the outcome that is reasonable. We regard this as assurance that there are no relative phases to be concerned about.

Another comment concerns the uncertainties of the s and s, and of the s and s in Table 1, such as and for the Boulder experiment Boulder . While the evaluation of the data reported in Secs. VI and VII refers to the parameter values in Table 1, we also used slightly different values for comparison and found that our conclusions are not affected at all.

## III Permissible probabilities

For each setting $S=ab$ or $ab'$ or $a'b$ or $a'b'$, we have the four probabilities $p^{(S)}_{++}$, $p^{(S)}_{+0}$, $p^{(S)}_{0+}$, and $p^{(S)}_{00}$ of recording the respective events for the next trigger signal. These have unit sum,

$$\sum_{\alpha,\beta=+,0}p^{(S)}_{\alpha\beta}=1, \tag{4}$$

but are otherwise unrestricted. Accordingly, the quartets of probabilities for one setting compose the whole standard probability 3-simplex. There are four simplices of this kind, one for each setting, which are linked because Alice’s detection probabilities do not depend on Bob’s setting,

$$\begin{aligned}
p^{(a)}_{+} &\equiv p^{(ab)}_{++}+p^{(ab)}_{+0}=p^{(ab')}_{++}+p^{(ab')}_{+0},\\
p^{(a')}_{+} &\equiv p^{(a'b)}_{++}+p^{(a'b)}_{+0}=p^{(a'b')}_{++}+p^{(a'b')}_{+0},
\end{aligned} \tag{5}$$

and Bob’s detection probabilities do not depend on Alice’s setting,

$$\begin{aligned}
p^{(b)}_{+} &\equiv p^{(ab)}_{++}+p^{(ab)}_{0+}=p^{(a'b)}_{++}+p^{(a'b)}_{0+},\\
p^{(b')}_{+} &\equiv p^{(ab')}_{++}+p^{(ab')}_{0+}=p^{(a'b')}_{++}+p^{(a'b')}_{0+}.
\end{aligned} \tag{6}$$

These are the so-called “no signaling” conditions.

Together, then, the sixteen probabilities obey eight constraints and, therefore, the probability space is eight-dimensional. The regions sketched in Fig. 1 are regions in this eight-dimensional probability space. It is fully parameterized by the four probabilities in Eqs. (5) and (6) and the four null-event probabilities $p^{(S)}_{00}$, as illustrated by

$$\begin{aligned}
p^{(ab)}_{++} &= p^{(a)}_{+}+p^{(b)}_{+}+p^{(ab)}_{00}-1,\\
p^{(ab)}_{+0} &= 1-p^{(ab)}_{00}-p^{(b)}_{+},\\
p^{(ab)}_{0+} &= 1-p^{(ab)}_{00}-p^{(a)}_{+}
\end{aligned} \tag{7}$$

for setting $S=ab$.
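The parameterization of a single setting by two marginals and one null-event probability can be sketched in Python as follows. This is only an illustration of Eq. (7); the function names and the dictionary layout are hypothetical choices made for this sketch and do not come from the original analysis.

```python
def setting_probabilities(p_a, p_b, p_00):
    """Four event probabilities of one setting, following Eq. (7).

    p_a  : Alice's detection probability for her chosen direction
    p_b  : Bob's detection probability for his chosen direction
    p_00 : null-event probability of the setting
    (argument names are illustrative assumptions)
    """
    return {
        "++": p_a + p_b + p_00 - 1.0,
        "+0": 1.0 - p_00 - p_b,
        "0+": 1.0 - p_00 - p_a,
        "00": p_00,
    }


def all_nonnegative(probs, tol=1e-12):
    """True if no entry is negative beyond the rounding tolerance."""
    return all(value >= -tol for value in probs.values())


example = setting_probabilities(p_a=0.3, p_b=0.4, p_00=0.5)
```

By construction the four values add up to one, and the sum of the `"++"` and `"+0"` entries reproduces Alice's marginal, which is the content of the no-signaling conditions for this setting.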

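### III.1 QM probabilities

In the description offered by QM, a trigger signal results in a qubit pair with probability $\gamma$ and yields nothing with probability $1-\gamma$. With $\rho$ denoting the statistical operator of the qubit pair, $\boldsymbol{\sigma}$ the Pauli vector operator of a qubit, and $\mathbf{1}$ the identity operator, we have

$$\begin{aligned}
p^{(a)}_{+} &= \gamma\eta_A\,\mathrm{tr}\bigl\{\tfrac12(\mathbf{1}+\vec{a}\cdot\boldsymbol{\sigma})\otimes\mathbf{1}\,\rho\bigr\},\\
p^{(a')}_{+} &= \gamma\eta_A\,\mathrm{tr}\bigl\{\tfrac12(\mathbf{1}+\vec{a}'\cdot\boldsymbol{\sigma})\otimes\mathbf{1}\,\rho\bigr\},\\
p^{(b)}_{+} &= \gamma\eta_B\,\mathrm{tr}\bigl\{\mathbf{1}\otimes\tfrac12(\mathbf{1}+\vec{b}\cdot\boldsymbol{\sigma})\,\rho\bigr\},\\
p^{(b')}_{+} &= \gamma\eta_B\,\mathrm{tr}\bigl\{\mathbf{1}\otimes\tfrac12(\mathbf{1}+\vec{b}'\cdot\boldsymbol{\sigma})\,\rho\bigr\}
\end{aligned} \tag{8}$$

for Alice’s and Bob’s individual probabilities Munich-dark , and the null-event probabilities are

$$p^{(ab)}_{00} = \gamma\,\mathrm{tr}\bigl\{\bigl[\mathbf{1}-\eta_A\tfrac12(\mathbf{1}+\vec{a}\cdot\boldsymbol{\sigma})\bigr]\otimes\bigl[\mathbf{1}-\eta_B\tfrac12(\mathbf{1}+\vec{b}\cdot\boldsymbol{\sigma})\bigr]\rho\bigr\}+(1-\gamma) \tag{9}$$

and analogous expressions for $p^{(ab')}_{00}$, $p^{(a'b)}_{00}$, and $p^{(a'b')}_{00}$.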
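As a hedged illustration of how these QM probabilities could be evaluated numerically, the following NumPy sketch computes the marginals and the null-event probability for each setting and then completes the remaining entries via the relations between marginals, null-event probabilities, and joint probabilities given earlier in this section. The function names, the dictionary keys, and the assignment of the plus sign to the unprimed direction are assumptions of this sketch.

```python
import numpy as np

SIGMA_X = np.array([[0.0, 1.0], [1.0, 0.0]])
SIGMA_Z = np.array([[1.0, 0.0], [0.0, -1.0]])
ID2 = np.eye(2)


def projector(sign, theta):
    """(1 + v.sigma)/2 for v = sign*sin(theta/2) e_x + cos(theta/2) e_z."""
    return 0.5 * (ID2 + sign * np.sin(theta / 2) * SIGMA_X
                  + np.cos(theta / 2) * SIGMA_Z)


def qm_probabilities(rho, gamma, eta_a, eta_b, theta_a, theta_b):
    """Sixteen event probabilities for a real 4x4 density matrix rho."""
    result = {}
    for sign_a, name_a in ((+1, "a"), (-1, "a'")):
        for sign_b, name_b in ((+1, "b"), (-1, "b'")):
            proj_a = projector(sign_a, theta_a)
            proj_b = projector(sign_b, theta_b)
            # Individual detection probabilities of Alice and Bob
            p_a = gamma * eta_a * np.trace(np.kron(proj_a, ID2) @ rho)
            p_b = gamma * eta_b * np.trace(np.kron(ID2, proj_b) @ rho)
            # Null-event probability: no pair created, or neither detector fires
            no_click = np.kron(ID2 - eta_a * proj_a, ID2 - eta_b * proj_b)
            p_00 = gamma * np.trace(no_click @ rho) + (1.0 - gamma)
            result[name_a + name_b] = {
                "++": p_a + p_b + p_00 - 1.0,
                "+0": 1.0 - p_00 - p_b,
                "0+": 1.0 - p_00 - p_a,
                "00": p_00,
            }
    return result


# Usage with the maximally mixed state and made-up apparatus parameters
probs = qm_probabilities(np.eye(4) / 4, gamma=0.5, eta_a=0.8, eta_b=0.8,
                         theta_a=0.7, theta_b=1.1)
```

For the maximally mixed state used here, the coincidence probability equals one quarter of the product of `gamma`, `eta_a`, and `eta_b` in every setting, which gives a quick sanity check of the bookkeeping.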

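Owing to their small $\gamma$ values, the probabilities in the Vienna and Boulder experiments occupy only a very small portion of the linked 3-simplices because the probabilities are bounded by

$$\begin{aligned}
&p^{(S)}_{++}\le\gamma\eta_A\eta_B,\qquad p^{(S)}_{00}\ge1-\gamma,\\
&p^{(a)}_{+},\,p^{(a')}_{+}\le\gamma\eta_A,\qquad p^{(b)}_{+},\,p^{(b')}_{+}\le\gamma\eta_B.
\end{aligned} \tag{10}$$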

For the Delft and Munich experiments, which have and , no major portions of the 3-simplices are excluded.

The set of permissible QM probabilities, enclosed by the symbolic ellipse in Fig. 1, is made up by the probabilities obtained from all thinkable $\rho$s in accordance with Eqs. (7)–(9). Each statistical operator is represented by a density matrix, that is, a hermitian, nonnegative, unit-trace matrix. As a consequence of Eq. (3), the $p$s are linear combinations of the expectation values of the eight operators

$$\begin{aligned}
&\sigma_x\otimes\mathbf{1},\quad \sigma_z\otimes\mathbf{1},\quad \mathbf{1}\otimes\sigma_x,\quad \mathbf{1}\otimes\sigma_z,\\
&\sigma_x\otimes\sigma_x,\quad \sigma_z\otimes\sigma_x,\quad \sigma_x\otimes\sigma_z,\quad \sigma_z\otimes\sigma_z,
\end{aligned} \tag{11}$$

all represented by real matrices if we employ the standard real matrices for $\sigma_x$ and $\sigma_z$. Therefore, only the real parts of the matrix elements of $\rho$ matter, and we only need to consider $\rho$s represented by real density matrices, which make up a nine-dimensional convex set. The ninth parameter is the expectation value of $\sigma_y\otimes\sigma_y$.

### III.2 LHV probabilities

In the LHV reasoning, the sequence “first create a qubit pair, then select, finally detect” of Sec. III.1 is meaningless; all that has meaning is “detection event after trigger signal” Brunner+4:14 . There are no sequential processes controlled by the hidden variables step-by-step; they control the overall process. Therefore, the trigger-to-pair probability $\gamma$ and the detection probabilities $\eta_A$, $\eta_B$, which are central to the correct application of Born’s rule in Eqs. (8) and (9), play no role when relating the $p$s to hidden variables.

Following Wigner Wigner:70 and others (see, for example, Refs. Fine:82 ; Kaszlikowski:00 ), we parameterize the LHV probabilities in terms of sixteen hypothetical probabilities $w(\alpha\alpha'\beta\beta')$ from the 15-simplex,

$$\sum_{\substack{\alpha,\alpha'=+,0\\ \beta,\beta'=+,0}}w(\alpha\alpha'\beta\beta')=1\quad\text{with}\quad w(\alpha\alpha'\beta\beta')\ge0. \tag{12}$$

Here, $w(\alpha\alpha'\beta\beta')$ is the fictitious joint probability of obtaining, for the next trigger signal, result $\alpha$ for Alice’s setting $\vec{a}$, and result $\alpha'$ for her setting $\vec{a}'$, and result $\beta$ for Bob’s setting $\vec{b}$, and also result $\beta'$ for his setting $\vec{b}'$ (never mind that she has setting $\vec{a}$ or $\vec{a}'$ and he has $\vec{b}$ or $\vec{b}'$). The eight marginal probabilities

$$\begin{aligned}
p^{(a)}_{+} &= \sum_{\alpha',\beta,\beta'}w(+\alpha'\beta\beta'), & p^{(a')}_{+} &= \sum_{\alpha,\beta,\beta'}w(\alpha+\beta\beta'),\\
p^{(b)}_{+} &= \sum_{\alpha,\alpha',\beta'}w(\alpha\alpha'+\beta'), & p^{(b')}_{+} &= \sum_{\alpha,\alpha',\beta}w(\alpha\alpha'\beta+),\\
p^{(ab)}_{00} &= \sum_{\alpha',\beta'}w(0\alpha'0\beta'), & p^{(ab')}_{00} &= \sum_{\alpha',\beta}w(0\alpha'\beta0),\\
p^{(a'b)}_{00} &= \sum_{\alpha,\beta'}w(\alpha00\beta'), & p^{(a'b')}_{00} &= \sum_{\alpha,\beta}w(\alpha0\beta0)
\end{aligned} \tag{13}$$

then determine the sixteen $p$s in accordance with Eq. (7).
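A minimal sketch of this marginalization, assuming the sixteen hidden probabilities are stored in a NumPy array of shape (2, 2, 2, 2) with index 0 standing for “+” and index 1 for “0” along each of the four axes (Alice unprimed, Alice primed, Bob unprimed, Bob primed). The storage convention and the function name are assumptions made for illustration only.

```python
import numpy as np


def lhv_probabilities(w):
    """Sixteen event probabilities from the hidden joint distribution w."""
    w = np.asarray(w, dtype=float).reshape(2, 2, 2, 2)
    # Single-party marginals: sum over everything except the fired detector
    p_plus = {
        "a": w[0, :, :, :].sum(), "a'": w[:, 0, :, :].sum(),
        "b": w[:, :, 0, :].sum(), "b'": w[:, :, :, 0].sum(),
    }
    # Null-event probabilities: both selected outcomes are "0"
    p_null = {
        "ab": w[1, :, 1, :].sum(), "ab'": w[1, :, :, 1].sum(),
        "a'b": w[:, 1, 1, :].sum(), "a'b'": w[:, 1, :, 1].sum(),
    }
    result = {}
    for name_a in ("a", "a'"):
        for name_b in ("b", "b'"):
            p_a, p_b = p_plus[name_a], p_plus[name_b]
            p_00 = p_null[name_a + name_b]
            result[name_a + name_b] = {
                "++": p_a + p_b + p_00 - 1.0,
                "+0": 1.0 - p_00 - p_b,
                "0+": 1.0 - p_00 - p_a,
                "00": p_00,
            }
    return result


uniform = lhv_probabilities(np.full(16, 1.0 / 16.0))
```

With the uniform hidden distribution of the usage line, every event probability in every setting equals one quarter.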

The hidden probabilities control all aspects of the experiment, and they are such that they mislead us into regarding QM as correct. That is, the LHV probabilities should have as many properties of the QM probabilities as possible. Therefore, we require that all inequalities in Eq. (10) are respected by the LHV probabilities. Through these inequalities, then, the values of $\gamma$, $\eta_A$, and $\eta_B$, which are properties of the experimental apparatus, enter the LHV formalism. Accordingly, the set of permissible LHV probabilities, enclosed by the symbolic triangle in Fig. 1, is made up by the probabilities obtained from all thinkable $w$s in accordance with Eqs. (7) and (13), subject to the constraints in Eq. (10).

Note that, as a consequence of the different values of , , and , we have different sets of permissible probabilities for the four experiments. Symbolically, there are several different ellipses and several different triangles in Fig. 1.

Note also that the “LHV only” region is not empty. For example, if we choose for all hidden probabilities except for and , then the constraints of Eq. (10) are obeyed and we get for all four settings; there is no statistical operator for which the QM probabilities of Eqs. (8) and (9) are like this. And any $\rho$ for which a Bell-type inequality, such as , is violated will give probabilities in the “QM only” region.

## IV Prior and posterior content; evidence

We write $(\mathrm{d}p)\,w_0(p)$ for the prior probability assigned to the infinitesimal vicinity of a point $p$ in the probability space, where the differential element

$$(\mathrm{d}p)=\mathrm{d}p^{(ab)}_{++}\,\mathrm{d}p^{(ab)}_{+0}\cdots\mathrm{d}p^{(a'b')}_{00}\,w_{\mathrm{cstr}}(p) \tag{14}$$

incorporates the constraints that restrict $p$ to the set of permissible values, symbolized by the union of the regions enclosed by the ellipse and the triangle in Fig. 1. In particular, we have $w_{\mathrm{cstr}}(p)=0$ when Eqs. (4)–(6) are not obeyed. Other constraints result from the nonnegativity of the statistical operator and the hidden probabilities, and from the restrictions imposed by Eq. (10). Although there are algorithms for checking whether the constraints are obeyed by any given $p$, we do not have an explicit expression for $w_{\mathrm{cstr}}(p)$; we also do not need one.

While $w_{\mathrm{cstr}}(p)$ depends on the parameters of the experiments, with different constraints for the four experiments because they differ in the values of $\gamma$, $\eta_A$, $\eta_B$, $\theta_A$, and $\theta_B$ (see Table 1), the factor $w_0(p)$ reflects what we know about the experiments before the data are acquired. Our choice for $w_0(p)$ is discussed in Sec. V; here, we shall assume that a certain choice has been made.

Then

$$S_R=\int_R(\mathrm{d}p)\,w_0(p) \tag{15}$$

is the prior content of region $R$ (its “size”). The three regions of interest are the ones symbolized by the red, blue, and green areas in Fig. 1, that is: the sets of probabilities permitted only by QM, only by LHV, or by both. The three prior contents have unit sum,

$$S_{\mathrm{QM\,only}}+S_{\mathrm{LHV\,only}}+S_{\mathrm{both}}=1, \tag{16}$$

which states the normalization of $w_0(p)$ to unit integral.

The likelihood function $L(D|p)$ tells us how likely are the data $D$ if the probabilities $p$ are the case. Since successive trigger signals and the resulting detection events are statistically independent runs-test , the likelihood has the multinomial form

$$L(D|p)=\frac{N!}{4^N}\prod_{S,\alpha,\beta}\frac{\bigl(p^{(S)}_{\alpha\beta}\bigr)^{n^{(S)}_{\alpha\beta}}}{n^{(S)}_{\alpha\beta}!}, \tag{17}$$

where we assume that the four settings are chosen randomly with equal probability, and there is a new setting for each trigger signal. The joint probability of having $p$ inside the region $R$ and observing the data $D$ is

$$\int_R(\mathrm{d}p)\,w_0(p)\,L(D|p)=L(D)\,C_R(D), \tag{18}$$

where

$$L(D)=\int_{\mathrm{all}}(\mathrm{d}p)\,w_0(p)\,L(D|p) \tag{19}$$

is the overall probability of obtaining the data $D$, and $C_R(D)$ is the conditional probability that $p$ is inside the region $R$ given the data $D$. There is evidence in favor of the region $R$ when $C_R(D)>S_R$, and there is evidence against the region when $S_R>C_R(D)$ Evans:15 .

In this posterior content of the region (its “credibility”),

$$C_R=\frac{1}{L(D)}\int_R(\mathrm{d}p)\,w_0(p)\,L(D|p)=\int_R(\mathrm{d}p)\,w_D(p), \tag{20}$$

we recognize the posterior density

$$w_D(p)=\frac{w_0(p)\,L(D|p)}{L(D)}, \tag{21}$$

the Bayesian update of the prior density in the face of the data $D$. The posterior contents of the three particular regions of interest also add up to unity,

$$C_{\mathrm{QM\,only}}+C_{\mathrm{LHV\,only}}+C_{\mathrm{both}}=1, \tag{22}$$

as $w_D(p)$ is normalized, too. Owing to the unit sums in Eqs. (16) and (22), whatever the data, there will be evidence in favor of one of the regions, and evidence against another, and we can have evidence in favor of the third region or evidence against it.

Regarding the $p$-independent combinatorial factor in Eq. (17) we note the following. This particular combination of factorials refers to the situation in which one takes data until $N$, the number of trigger signals, reaches a pre-chosen value. Such is the stopping rule of the Delft and Munich experiments. Other stopping rules have other combinatorial factors. For example, in the Boulder and Vienna experiments, the value of is pre-chosen and sets the stopping rule. Further, the factor $4^{-N}$ in the combinatorial factor does not apply when the settings are not equally likely. Other modifications are required if several consecutive events are recorded before the setting changes, as is the case in the Boulder experiment; see Sec. VI.

In the context of our investigation here, however, it does not matter what the stopping rule is. The combinatorial factor associated with the rule cancels in Eq. (21) and is of no further consequence. Therefore, we shall use the combinatorial factor of Eq. (17) for all datasets we evaluate, irrespective of the actual stopping rule. The lack of dependence on the stopping rule is characteristic of Bayesian inferences generally.
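The cancellation of the combinatorial factor can be made concrete with a small sketch of the multinomial log-likelihood. The code below is an illustration only; the function name, the flat 16-element ordering of counts and probabilities, and the optional flag are hypothetical choices of this sketch. Working with logarithms is a practical assumption here, motivated by the very large event counts.

```python
import math


def log_likelihood(counts, probs, with_combinatorial_factor=True):
    """Logarithm of the multinomial likelihood for 16 counts and 16 probabilities.

    counts and probs are flat sequences in the same order (four events for
    each of the four settings); each setting's probabilities sum to one.
    """
    value = 0.0
    for n, p in zip(counts, probs):
        if n > 0:
            if p == 0.0:
                return -math.inf  # observed an event that has zero probability
            value += n * math.log(p)
    if with_combinatorial_factor:
        n_total = sum(counts)
        # log of N!/4^N minus the sum of log n! over all sixteen counts
        value += math.lgamma(n_total + 1) - n_total * math.log(4.0)
        value -= sum(math.lgamma(n + 1) for n in counts)
    return value


counts = [3, 1, 0, 2] * 4
p_one = [0.25, 0.25, 0.25, 0.25] * 4
p_two = [0.40, 0.10, 0.20, 0.30] * 4

# The factor shifts every log-likelihood by the same p-independent constant
shift_one = log_likelihood(counts, p_one) - log_likelihood(counts, p_one, False)
shift_two = log_likelihood(counts, p_two) - log_likelihood(counts, p_two, False)
```

Because `shift_one` and `shift_two` coincide, the factor drops out of any ratio of likelihoods for the same data, and hence out of the posterior density.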

## V Choice of prior

We need to choose the prior density $w_0(p)$ in order to give specific meaning to the integrals in Eqs. (15), (19), and (20). These eight-dimensional integrals are computed by Monte Carlo integration, for which we need a large sample of permissible $p$s such that the number of sample points in a region $R$ is proportional to its prior content $S_R$. It is, therefore, expedient to define $w_0(p)$ by the sampling algorithm, and this is what we do.

For the reasons mentioned in the Introduction, we shall not choose the prior following the rules of proper Bayesian reasoning. Instead, we opt for a prior that, under the ideal circumstances of perfect detectors, does not distinguish between QM and LHV for a single setting $S$.

Our samples are composed of sets of probabilities generated from randomly chosen quantum states plus another sets from random LHV. We employ two sampling algorithms, one for QM and the other for LHV, so that half of our sample points are from the symbolic ellipse of Fig. 1, and the other half from the triangle. When marginalized over the other three settings, the sample points inside the 3-simplex of the fourth setting have equal density for both algorithms under the ideal circumstances of , with the same marginalized prior for each of the four 3-simplices.

After completing the QM sampling and the LHV sampling, described in Secs. V.1 and V.2, and confirming that there is no hidden bias in the sample (see Sec. V.3), we have a suitable random sample of points in the space of permissible probabilities — permitted either by QM or by LHV, that is — and we also know how many sample points are in the three regions of interest. Put differently, we know the three prior contents that are added in Eq. (16). Owing to the random process of sampling, the sample has fluctuations which give rise to sampling errors in , , and , and also in other quantities computed by Monte Carlo integration with this sample. For the applications reported in Secs. VI and VII, however, we find that a sample with entries is large enough to ensure that the sampling errors do not affect the conclusions; more about this in Sec. VI.1.2.
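The Monte Carlo estimates of the prior and posterior contents can be sketched as follows, assuming that each sample point drawn from the prior carries a region label and a log-likelihood value for the observed data. The function names, the label strings, and the use of a subtracted maximum for numerical stability are assumptions of this sketch rather than details taken from the analysis.

```python
import numpy as np

REGIONS = ("QM only", "LHV only", "both")


def region_contents(labels, log_lik):
    """Prior content (fraction of points) and posterior content
    (likelihood-weighted fraction) of each region."""
    labels = np.asarray(labels)
    log_lik = np.asarray(log_lik, dtype=float)
    # Subtracting the maximum avoids underflow; the constant cancels below
    weights = np.exp(log_lik - log_lik.max())
    total = weights.sum()
    prior = {r: float(np.mean(labels == r)) for r in REGIONS}
    posterior = {r: float(weights[labels == r].sum() / total) for r in REGIONS}
    return prior, posterior


def evidence(prior, posterior):
    """Direction of the evidence: posterior content compared with prior content."""
    verdict = {}
    for r in REGIONS:
        if posterior[r] > prior[r]:
            verdict[r] = "in favor"
        elif prior[r] > posterior[r]:
            verdict[r] = "against"
        else:
            verdict[r] = "neither"
    return verdict


# Toy sample of four points with made-up likelihood values 1, 1, 1, 5
toy_labels = ["QM only", "QM only", "LHV only", "both"]
toy_prior, toy_post = region_contents(toy_labels, np.log([1.0, 1.0, 1.0, 5.0]))
toy_verdict = evidence(toy_prior, toy_post)
```

In this toy sample the “both” region gains content (from 0.25 to 0.625) while the other two regions lose content, so the evidence is in favor of “both” and against the other two.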

### V.1 QM contribution to the sample

In all four experiments, the source yields the qubit pairs in an entangled state of high purity, a very good approximation of the pure target state that motivates the experimental effort — the two-qubit state that requires the smallest threshold detector efficiency (Vienna and Boulder) or leads to the strongest violation of the Bell-type inequality (Delft and Munich). With this in mind, we produce the QM sample by the following five-step procedure.

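Step 1 Draw four independent real numbers $x_1$, $x_2$, $x_3$, $x_4$ from a normal distribution with zero mean and unit variance, i.e., the probability element is

$$\mathrm{d}x\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac12x^2}; \tag{23}$$

then

$$\varrho=\frac{1}{x_1^2+x_2^2+x_3^2+x_4^2}\begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix}\begin{pmatrix}x_1&x_2&x_3&x_4\end{pmatrix} \tag{24}$$

is a real pure-state density matrix. Repeat three times, thus producing $\varrho_1$, $\varrho_2$, $\varrho_3$, and $\varrho_4$.

Step 2 Use the convex sum

$$\rho\mathrel{\widehat{=}}(1-3\epsilon)\varrho_1+\epsilon(\varrho_2+\varrho_3+\varrho_4) \tag{25}$$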

with to make up the density matrix of a high-purity full-rank statistical operator .

Step 3 Calculate the probabilities $p$ of Eqs. (8) and (9) and enter $p$ into the sample.

Step 4 By checking if $p$ is inside Fine’s polytope Fine:82 ; Froissard:81 ; Brunner+4:14 , or by any other method, determine whether $p$ belongs to the “QM only” or the “both” set of probabilities.

Step 5 Repeat Steps 1–4 until the sample has entries.
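Steps 1 and 2 of this procedure can be sketched with NumPy as shown below. The function names are hypothetical, and the value passed for the mixing parameter `eps` is a placeholder chosen for illustration only; it is not the value used in the analysis.

```python
import numpy as np


def random_real_pure_state(rng):
    """Step 1: normalized outer product of four standard-normal numbers."""
    x = rng.standard_normal(4)
    return np.outer(x, x) / np.dot(x, x)


def random_high_purity_state(rng, eps):
    """Step 2: convex sum of one dominant and three admixed pure states."""
    rho_1, rho_2, rho_3, rho_4 = (random_real_pure_state(rng) for _ in range(4))
    return (1.0 - 3.0 * eps) * rho_1 + eps * (rho_2 + rho_3 + rho_4)


rng = np.random.default_rng(seed=0)
rho = random_high_purity_state(rng, eps=0.01)  # eps = 0.01 is a placeholder
purity = float(np.trace(rho @ rho))
```

The result is a real symmetric, unit-trace, nonnegative matrix whose purity is at least the square of the weight of the dominant pure state, since all four terms of the convex sum are nonnegative matrices.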

Some comments are in order: (i) The probability element in Eq. (23) is such that the pure-state density matrices of Eq. (24) are uniformly distributed over the 3-sphere; put differently, the distribution is uniform for the Haar measure on O(4). Then, for each pure-state $\varrho$ of Step 1 and each setting $S$, the marginal distribution on the 3-simplex has the prior element

$$\frac{\mathrm{d}p^{(S)}_{++}\,\mathrm{d}p^{(S)}_{+0}\,\mathrm{d}p^{(S)}_{0+}\,\mathrm{d}p^{(S)}_{00}\;\delta\bigl(p^{(S)}_{++}+p^{(S)}_{+0}+p^{(S)}_{0+}+p^{(S)}_{00}-1\bigr)}{\pi^2\gamma\sqrt{p^{(S)}_{++}\,p^{(S)}_{+0}\,p^{(S)}_{0+}\,\bigl[p^{(S)}_{00}-(1-\gamma)\bigr]}} \tag{26}$$

if , where all factors in the argument of the square root must be positive. (ii) The value chosen for in Step 2 is a compromise. Values that are much bigger result in a prior content of the “QM only” region that is too small to be useful; values that are much smaller, by contrast, yield a sample of quantum states with unreasonably high purity. That said, other small values of could be chosen in Step 2, or one could determine small s at random by a suitable lottery.

### V.2 LHV contribution to the sample

The algorithm for sampling from the LHV-permissible probabilities consists of the following five steps.

Step 1 Draw sixteen independent positive numbers $y_1$, $y_2$, …, $y_{16}$ from a gamma distribution, i.e., the probability element is

$$\mathrm{d}y\,\frac{y^{-\frac78}}{\bigl(-\frac78\bigr)!}\,e^{-y}; \tag{27}$$

then put

$$\begin{aligned}
w(++++) &= \gamma\frac{y_1}{Y},\\
w(+++0) &= \gamma\frac{y_2}{Y},\\
&\;\;\vdots\\
w(000+) &= \gamma\frac{y_{15}}{Y},\\
w(0000) &= (1-\gamma)+\gamma\frac{y_{16}}{Y},\\
\text{with}\quad Y &= y_1+y_2+\cdots+y_{16}.
\end{aligned} \tag{28}$$

Repeat three times, thus producing $w_1(\alpha\alpha'\beta\beta')$, …, $w_4(\alpha\alpha'\beta\beta')$.

Step 2 With the same value of $\epsilon$ as in Eq. (25), use the convex sum

$$w(\alpha\alpha'\beta\beta')=(1-3\epsilon)\,w_1(\alpha\alpha'\beta\beta')+\epsilon\bigl[w_2(\alpha\alpha'\beta\beta')+w_3(\alpha\alpha'\beta\beta')+w_4(\alpha\alpha'\beta\beta')\bigr] \tag{29}$$

for calculating the probabilities $p$ of Eq. (13).

Step 3 Enter $p$ into the sample if the inequalities of Eq. (10) are obeyed, and proceed to Step 4; otherwise discard this $p$ and return to Step 1.

Step 4 Use the procedure described in Sec. 4.3 of Ref. Seah+4:15 , or any other method, to determine whether this $p$ belongs to the “LHV only” or the “both” set of probabilities.

Step 5 Repeat Steps 1–4 until the sample has entries.
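Steps 1 and 2 of the LHV sampling can be sketched in the same spirit. The probability element of Step 1 is that of a gamma distribution with shape parameter 1/8, which NumPy provides directly. The function names, the array layout (index 0 for “+”, index 1 for “0”), and the numerical values of `gamma` and `eps` in the usage line are assumptions of this sketch; the acceptance test of Step 3 and the region assignment of Step 4 are not included.

```python
import numpy as np


def random_hidden_probabilities(rng, gamma):
    """Step 1: sixteen gamma-distributed numbers (shape 1/8), normalized,
    scaled by gamma, with the remainder 1 - gamma added to w(0000)."""
    y = rng.gamma(shape=1.0 / 8.0, scale=1.0, size=16)
    w = gamma * y / y.sum()
    w[15] += 1.0 - gamma  # last entry in C order is w(0000)
    return w.reshape(2, 2, 2, 2)


def random_lhv_point(rng, gamma, eps):
    """Step 2: convex sum of one dominant and three admixed distributions."""
    w_1, w_2, w_3, w_4 = (random_hidden_probabilities(rng, gamma) for _ in range(4))
    return (1.0 - 3.0 * eps) * w_1 + eps * (w_2 + w_3 + w_4)


rng = np.random.default_rng(seed=1)
w = random_lhv_point(rng, gamma=0.2, eps=0.01)  # placeholder parameter values
```

Building the offset into the last entry guarantees that every null-event probability is at least `1 - gamma` before any acceptance test is applied, which is the design choice discussed in comment (ii) below.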

Here, too, some comments are in order: (i) The probability element in Eq. (27), with the particular power $-\tfrac78$, is such that we get, for each setting $S$, the same single-setting marginal distribution on the 3-simplex as for the QM sampling, that is: Eq. (26) applies to the LHV sample as well. (ii) We include the constraint $p^{(S)}_{00}\ge1-\gamma$ into the parameterization of the $w$s in Step 1 rather than into the acceptance or rejection procedure of Step 3, for the technical reason that this gives us a much higher acceptance rate when $\gamma\ll1$, as is the case for the Vienna and Boulder experiments. (iii) Having ensured that the respective Steps 1 of the QM and the LHV sampling give the same single-setting marginal distribution, we choose the same $\epsilon$ in the mixing in the respective Steps 2 to keep the single-setting distributions on equal footing.

### V.3 Checking the prior for bias

It is important to confirm that there is no bias in the prior that would make us unfairly prefer one conclusion over the others. For example, if we were to conclude regularly that there is evidence in favor of the “QM only” region for data that are typical for $p$s in the “both” region, that would indicate a procedural bias for the “QM only” region.

Accordingly, our test for a bias proceeds as follows (see Sec. 4.6 in Ref. Evans:15 ). We draw a random $p$ from the prior for the experiment in question and simulate data for this “true $p$” for as many trigger signals as in the experimental data. The simulated data give evidence in favor of some regions and against others. This is repeated for many such mock-true probabilities $p$, one thousand or more for each of the three regions.
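The data-simulation step of such a bias check can be sketched as a single multinomial draw, assuming that the four settings are equally likely and that a new setting is chosen for each trigger signal. The function name and the 4-by-4 array layout (rows for settings, columns for the four kinds of events) are hypothetical choices for this illustration.

```python
import numpy as np


def simulate_counts(rng, probs, n_triggers):
    """Draw sixteen event counts for n_triggers trigger signals.

    probs has shape (4, 4): one row per setting, and each row lists the
    probabilities of the four kinds of events and sums to one.
    """
    probs = np.asarray(probs, dtype=float)
    joint = (probs / 4.0).ravel()  # each setting occurs with probability 1/4
    return rng.multinomial(n_triggers, joint).reshape(4, 4)


rng = np.random.default_rng(seed=2)
mock_true_p = np.array([
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [1.00, 0.00, 0.00, 0.00],
])
mock_counts = simulate_counts(rng, mock_true_p, n_triggers=1000)
```

The simulated counts would then be evaluated in the same way as the experimental data, and the resulting verdicts tallied by the region from which the mock-true probabilities were drawn.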

In our tests, we almost never get evidence in favor of the “QM only” region for true $p$s from another region when evaluating the data from the experiments conducted in Boulder and Vienna (Tables 6 and 9 in Sec. VI, Table 13 in Sec. VII.1). Less rare are cases with evidence in favor of the “both” region for a mock-true $p$ in the “QM only” region, but that is of no concern. Owing to the much smaller counts of events in the Delft and Munich experiments, for them it happens more often that we find evidence for the “QM only” region for true $p$s in the “both” region, and even for true $p$s in the “LHV only” region (Table 17 in Sec. VII.2). This is understandable since statistical fluctuations in the simulated data have a much larger chance of producing somewhat untypical data when the data are few; indeed, such evidence for a “wrong” region occurs more often when simulating the Delft experiment than the Munich experiment, which has more than one-hundred times as many counts. In summary, the bias checks establish that there is no procedural bias in favor of the “QM only” region.

## VI The Boulder experiment

In the Boulder experiment Boulder , every one of the settings was active for about ns before a random switch to another (or the same) setting occurred. Pulses of short-wavelength light, ns apart, were impinging on the nonlinear crystal that generated down-converted photon pairs with a longer wavelength. The fifteen pulses per setting constitute a trial, and a selected subset of corresponding pulses from all trials make up the trigger signals of a run. When selecting one pulse only (the 6th), one gets the run with one trigger signal per trial; likewise selecting three pulses (the 5th, 6th, and 7th) yields the run with three trigger signals per trial; there are also runs with five or seven trigger signals per trial, obtained by selecting the 4th to 8th pulses or the 3rd to 9th pulses, respectively. In the runs with three, five, or seven trigger signals per trial, then, there is the same setting for this many consecutive events before the setting is changed at random. Further, since the raw-data trials, of fifteen pulses each, are the same for all four runs, these runs are not referring to independently collected data. Roughly one third of the events in the run with three trigger signals per trial are also contained in the run with one trigger signal per trial, and correspondingly for the other runs.

### VI.1 Trials with five trigger signals

We give here a detailed evaluation of the run with five trigger signals per trial. In total, there are trigger signals in this run Boulder-run ; see Table 2 for the observed data and Table 1 for the parameters of the experiment.

The left part of Table 3 summarizes our findings. While almost all of the prior is shared, roughly equally, between the “both” and “LHV only” regions, the “QM only” region contains merely of the prior. This is a consequence of the detector efficiencies of about — above, but not far above, the threshold found by Eberhard Eberhard:93 . The posterior, by contrast, is entirely confined to the “both” region, so that these data are inconclusive: very strong evidence in favor of “both” and against “QM only” and also against “LHV only”.

This verdict is completely at odds with that reached by the authors of Ref. Boulder , who confidently reject the hypothesis of LHV on the basis of their data. A careful consideration of all aspects of the experiment convinced us that the discrepancy originates in the inaccurate value of the trigger-signal–to–qubit-pair conversion probability $\gamma$, given as “” in Ref. Boulder . When we use our best guess for $\gamma$ (estimated from the data as described below), namely , which is some 40% larger than the quoted value, we get the numbers in the right part of Table 3. While there is little change in the prior contents of the three regions, the posterior is now entirely contained in the “QM only” region, so that we have very strong evidence in favor of this region and against the other two, that is, against LHV. Accordingly, we do confirm that the LHV hypothesis is rejected.

It is worth noting here that $\gamma$ plays a very different role in the QM formalism than in the LHV formalism. The QM probabilities in Eqs. (III.1) and (9) involve $\gamma$ quite explicitly, whereas it restricts the LHV probabilities through the bounds in Eq. (III.1). Therefore, a change in the value of $\gamma$ has quite different consequences for the points of view offered by QM and LHV. This is clearly demonstrated by the numbers reported in Table 3 and also by those in Tables 4 and 5 as well as Fig. 3 in the next section.

#### VI.1.1 Estimating γ from the data

As an exercise in quantum state estimation (QSE; see, for example, Refs. LNP649 ; Shang+4:13 ; Teo:16 ), we determine the QM probabilities that maximize the likelihood of Eq. (17) and so find the QM-based maximum-likelihood estimator (QM-MLE; see Ref. Shang+2:17 for a fast and reliable algorithm). Another maximization of the likelihood, now over the LHV-permissible probabilities, identifies the LHV-based maximum-likelihood estimator (LHV-MLE). In the top part of Table 4, we compare the probabilities of the two MLEs with those of the target state (the ideal two-qubit quantum state that the source should make available) and with the relative frequencies associated with the counts in Table 2. The subtables are composed of the four corresponding probabilities, with substantial variation within most of the subtables.

What is particularly unsettling is the colossal ratio of the maximum values of the likelihood: (QM) versus (LHV). The data are much much more likely for LHV than for QM — by more than orders of magnitude. What is often termed “the largest discrepancy in physics” Adler+2:95 , a modest orders of magnitude, pales in comparison.

Now, the methods of QSE can be used for determining parameters of the experiment in addition to the $p$s of the MLE. One then speaks of self-calibrating QSE; see, e.g., Refs. Mogilevtsev:10 ; Branczyk+5:12 ; Quesada+2:13 . In particular, one can optimize both the statistical operator of Eqs. (III.1) and (9) and also the value of $\gamma$ when maximizing the likelihood. As stated above, the best guess we thus obtain is , for which the maximum value of the likelihood is (QM); the choice for $\gamma$ (in this range) has no effect on the LHV value of the maximum likelihood. For this optimized $\gamma$ value, then, the data are much more likely for QM than for LHV, by eleven orders of magnitude.
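The role of $\gamma$ as a fit parameter can be illustrated with a deliberately simplified sketch. It assumes that the likelihood has the product form $\prod p^{n}$ over all settings and outcomes, and it uses the parametrization of Eq. (30) with the $q$s held fixed, whereas the analysis described above also optimizes the statistical operator. The function names, the array layout (rows are settings, column 0 is the 00 outcome), the grid, and all numbers are hypothetical and are not the Boulder data.

```python
import numpy as np

def q_to_p(q, gamma):
    """Eq. (30): p = 4*gamma*q, with (1 - gamma) added to the 00 outcome (column 0)."""
    p = 4.0 * gamma * np.asarray(q, dtype=float)
    p[:, 0] += 1.0 - gamma
    return p

def log10_likelihood(counts, q, gamma):
    """log10 of prod p**n, the product form ASSUMED here for the likelihood."""
    return float(np.sum(counts * np.log10(q_to_p(q, gamma))))

def profile_gamma(counts, q, gammas):
    """Scan a grid of gamma values with q held fixed (a simplification)."""
    values = np.array([log10_likelihood(counts, q, g) for g in gammas])
    return float(gammas[int(np.argmax(values))]), values

# Hypothetical toy input, NOT the observed counts of the experiment.
q_toy = np.tile([0.10, 0.05, 0.05, 0.05], (4, 1))  # 4 settings x 4 outcomes, unit sum
gamma_true = 7e-4                                   # made-up value for the toy data
trials_per_setting = 10_000_000
counts_toy = np.rint(trials_per_setting * q_to_p(q_toy, gamma_true))

gammas = np.linspace(3e-4, 1.2e-3, 91)
gamma_hat, values = profile_gamma(counts_toy, q_toy, gammas)
drop = values.max() - log10_likelihood(counts_toy, q_toy, 5e-4)
print(f"best gamma on the grid: {gamma_hat:.2e}")
print(f"orders of magnitude lost at gamma = 5e-4: {drop:.1f}")
```

Even in this toy setting, a value of $\gamma$ that is off by a few tens of percent costs many orders of magnitude in the likelihood, which is the qualitative effect discussed in the text.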

In passing we note that such small values of the likelihood are not surprising if there are so many counts, simply because there is a huge number of similar data, with a slight redistribution of counts, that could have been observed equally well. An absolute upper bound is given by the maximum of the likelihood over all $p$s that obey the no-signaling constraints but are otherwise unrestricted. This establishes , less than 15% in excess of the maximum of the likelihood over the QM-permissible probabilities Signaling .
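As a rough illustration of why such tiny numbers are expected, suppose (as an assumption made here, since Eq. (17) is not reproduced in this section) that the likelihood has the product form $\prod_k p_k^{n_k}$ for counts $n_k$ with total $N=\sum_k n_k$, and consider a single normalization $\sum_k p_k=1$ for simplicity. The maximum over all such probabilities is attained at the relative frequencies $\nu_k=n_k/N$:

$$\max_{p}\prod_k p_k^{n_k}=\prod_k \nu_k^{n_k}=e^{-N H(\nu)},\qquad H(\nu)=-\sum_k \nu_k\ln\nu_k .$$

The maximum thus decreases exponentially with the number of counts. For instance, two equally frequent outcomes and $N=10^6$ give $2^{-10^6}\approx 10^{-301030}$, even though no other probabilities could make these data more likely.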

The bottom part of Table 4 reports the probabilities for . There is less variation within the subtables and, in particular, the relative frequencies resemble the probabilities of the target state much better, and also those of the QM-MLE.

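This observation can be quantified, for which purpose we use (a variant of) the Bhattacharyya angle Bhattacharyya:43 between two sets of $p$s, computed by the following algorithm. First, for each set of $p$s we introduce a corresponding set of $q$s in accordance with

$$p^{(S)}_{\alpha\beta} = 4\gamma\, q^{(S)}_{\alpha\beta} \quad \text{if } \alpha\beta \neq 00, \qquad p^{(S)}_{00} = (1-\gamma) + 4\gamma\, q^{(S)}_{00}; \tag{30}$$

the $q$s are positive and have unit sum,

$$\sum_{S}\sum_{\alpha,\beta} q^{(S)}_{\alpha\beta} = 1, \tag{31}$$

as implied by Eqs. (III.1) and (4). Second, for any two sets of $p$s we compute the Bhattacharyya fidelity $F_{\mathrm{B}}$,

$$F_{\mathrm{B}}(p,p') = \sum_{S}\sum_{\alpha,\beta} \sqrt{q^{(S)}_{\alpha\beta}\, {q'}^{(S)}_{\alpha\beta}}, \tag{32}$$

and then the Bhattacharyya angle

$$\phi_{\mathrm{B}}(p,p') = \cos^{-1}\bigl(F_{\mathrm{B}}(p,p')\bigr), \tag{33}$$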

whereby $0 \le F_{\mathrm{B}} \le 1$ and $0 \le \phi_{\mathrm{B}} \le \pi/2$. The smaller the value of $\phi_{\mathrm{B}}$, the more similar are the two sets of probabilities.
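The two steps of Eqs. (30) to (33) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the array layout (rows are the four settings, columns the four outcome pairs, with column 0 standing for the 00 outcome), the value of `gamma`, and the toy probabilities are ours and are not taken from the experiments. Eq. (30) is inverted as written, with the probabilities of each setting summing to one.

```python
import numpy as np

def q_to_p(q, gamma):
    """Eq. (30): p = 4*gamma*q, with (1 - gamma) added to the 00 outcome (column 0)."""
    p = 4.0 * gamma * np.asarray(q, dtype=float)
    p[:, 0] += 1.0 - gamma
    return p

def p_to_q(p, gamma):
    """Invert Eq. (30) to recover the q's from the p's."""
    p = np.asarray(p, dtype=float)
    q = p / (4.0 * gamma)
    q[:, 0] = (p[:, 0] - (1.0 - gamma)) / (4.0 * gamma)
    if np.any(q < -1e-9):
        raise ValueError("these p's are incompatible with this gamma (negative q)")
    return np.clip(q, 0.0, None)  # remove tiny negative rounding residue

def bhattacharyya_angle(p1, p2, gamma):
    """Eqs. (32) and (33): angle in radians between two sets of probabilities."""
    q1, q2 = p_to_q(p1, gamma), p_to_q(p2, gamma)
    fidelity = np.sum(np.sqrt(q1 * q2))
    return float(np.arccos(min(fidelity, 1.0)))  # guard against rounding above 1

# Hypothetical toy input, NOT data from any of the experiments.
gamma = 7e-4
q_a = np.full((4, 4), 1.0 / 16.0)
q_b = np.tile([0.10, 0.05, 0.05, 0.05], (4, 1))
p_a, p_b = q_to_p(q_a, gamma), q_to_p(q_b, gamma)
print(bhattacharyya_angle(p_a, p_b, gamma))
```

Because the $q$s are obtained from the $p$s through Eq. (30), the angle between two fixed sets of probabilities depends on the value of $\gamma$ that is used in the conversion.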

Table 5 shows the Bhattacharyya angles between the relative frequencies and the probabilities of the target state and the two MLEs, both for and for . Clearly, the relative frequencies resemble the target-state probabilities and the QM-MLE probabilities much better for than for ; for the LHV-MLE probabilities, the difference between the angles for the two $\gamma$ values is minimal and it originates entirely in the implicit $\gamma$ dependence of the Bhattacharyya angle that we introduce in Eq. (30).

We close this discussion with a look at Fig. 3. It shows, for $\gamma$ between and , the maximum value of the likelihood on the set of QM probabilities (solid black curve) and on the set of LHV probabilities (dashed black line). We observe that the QM value ranges over very many orders of magnitude while the LHV value is independent of $\gamma$ in this interval.

This assures us that there is no need for an accurate value of $\gamma$ if one is only interested in the LHV description of the experiment when, for example, refuting the LHV hypothesis on the basis of small p-values. In our Bayesian evaluation of the data, however, we look for evidence in favor of QM in addition to evidence against LHV, and the QM treatment of the data requires an accurate value for $\gamma$. It is fortunate that we can estimate $\gamma$ reliably from the data themselves.

The grey strip in Fig. 3 marks the values for which the probabilities of the QM-MLE violate a Bell inequality of the Eberhard kind Eberhard:93 ; is clearly outside. This tells us once more that the actually observed data are typical for but not typical at all for