1 SETI as One-Shot Hypothesis Testing
As mentioned, communicating and detecting are, in general, related but distinct tasks. Moreover, detecting whom one wishes to communicate with is necessary before trying to extract information from a physical system that is encoding a message. Abstractly, one can view the receiver in SETI as taking in massive amounts of data in various forms and trying to determine whether it is the result of natural processes (the null hypothesis) or of processes engineered by an extraterrestrial intelligence (the alternative hypothesis). Without essential loss of generality, we can think of the receiver device as taking in many bands of electromagnetic waves. (Footnote 1: We say without essential loss of generality in the sense that we presume sound and smell will not be used, and even if the signal were not electromagnetic itself, such as for inscribed matter, one must interact with it through electromagnetic radiation (sight, sensing equipment, etc.), which may then be data-processed to more coarse-grained properties.) Formally, by discretizing time and power of the input signals, at each time step $i$ the detector reads an outcome $x_i \in \mathcal{X}$, where $\mathcal{X}$ is a finite alphabet of possible detector readings. Therefore, if the time monitored includes $n$ time bins, the total observed sequence would be $x^n = (x_1, \ldots, x_n) \in \mathcal{X}^n$.
As such, SETI supposes the sequence $x^n \in \mathcal{X}^n$ came from either the natural process distribution, $P$, or the ETI process distribution, $Q$. Note that $Q$ may be a mixture of many possible extraterrestrial received signals (detected sequences). The receiver then must ‘decode’ whether this sequence corresponds to $P$ or $Q$. As noted in [cover2006, Section 11.7], an intuitive way of ‘decoding’ is to define a set of sequences $A \subseteq \mathcal{X}^n$ so the null hypothesis is declared if a sequence in $A$ is observed and the alternate hypothesis is declared otherwise. We however relax this decision function to be probabilistic, and sufficiently general to encapsulate both quantum mechanical and classical signals. (See SI for a brief introduction to quantum information theory.) Given $\mathcal{X}^n$, we define a vector space $\mathbb{C}^{|\mathcal{X}|^n}$ with orthonormal basis vectors $\{|x^n\rangle\}$ pertaining to the possible sequences. Probability distributions over the original sequence space, such as the null and alternative hypotheses, $P$ and $Q$, can then be written as diagonal matrices contained in the space of linear operators from $\mathbb{C}^{|\mathcal{X}|^n}$ to itself. We will denote the space of probability distributions over $\mathcal{X}^n$ in the vector space by $\mathcal{P}(\mathcal{X}^n)$. A deterministic classical decision function can then be written as a projector
$$\Pi_A = \sum_{x^n \in A} |x^n\rangle\langle x^n| ,$$
where we have used bra-ket notation. The type I and type II error probabilities of this decision function, $\alpha$ and $\beta$ respectively, can then be expressed as
$$\alpha = \mathrm{Tr}[\Pi_A^{\perp} P], \qquad \beta = \mathrm{Tr}[\Pi_A Q],$$
where $\Pi_A^{\perp} = I - \Pi_A$ is the projector onto the orthogonal complement of the subspace onto which $\Pi_A$ projects. Given this intuition, we then can generalize to consider any (quantum) probabilistic decision function using the transformations $\Pi_A \to \Lambda$, $\Pi_A^{\perp} \to I - \Lambda$, $P \to \rho$, and $Q \to \sigma$, so that
$$\alpha = \mathrm{Tr}[(I - \Lambda)\rho], \qquad \beta = \mathrm{Tr}[\Lambda\sigma]. \qquad (1)$$
Here $I$ is the identity matrix on $\mathbb{C}^{|\mathcal{X}|^n}$, $\rho$ and $\sigma$ are (finite) quantum probability distributions over $\mathcal{X}^n$, and $\Lambda$ is an arbitrary positive semidefinite operator such that $I - \Lambda$ is also positive semidefinite. It follows that $\{\Lambda, I - \Lambda\}$ forms a positive operator-valued measure (POVM), which means it represents a physically implementable measurement device. In the sequel, we remain agnostic to whether $\rho, \sigma$ are quantum or classical unless stated explicitly. We denote the set of quantum probability distributions as $\mathcal{D}(\mathcal{X}^n)$. Note that $\mathcal{P}(\mathcal{X}^n) \subset \mathcal{D}(\mathcal{X}^n)$, and so any statement that holds in the quantum case includes the more common classical case.
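To make the classical case concrete, the following Python sketch computes the type I and type II error probabilities for a deterministic test and for a randomized one. The three-outcome distributions and the function name `error_probabilities` are hypothetical and chosen only for illustration; the sketch assumes the convention that each weight is the probability of declaring the null hypothesis on that outcome.

```python
def error_probabilities(p_null, q_alt, accept_null):
    """Return (alpha, beta) for a classical, possibly randomized, test.

    accept_null[x] is the probability of declaring the null hypothesis
    when outcome x is observed; entries of 0 or 1 give a deterministic test.
    """
    # Type I error: the null is true but the alternative is declared.
    alpha = sum(p * (1.0 - w) for p, w in zip(p_null, accept_null))
    # Type II error: the alternative is true but the null is declared.
    beta = sum(q * w for q, w in zip(q_alt, accept_null))
    return alpha, beta


# Hypothetical three-outcome distributions (illustration only).
p_null = [0.7, 0.2, 0.1]
q_alt = [0.1, 0.3, 0.6]

# Deterministic test: declare the null on outcomes 0 and 1.
deterministic = error_probabilities(p_null, q_alt, [1.0, 1.0, 0.0])
# Randomized test: flip a fair coin on outcome 1.
randomized = error_probabilities(p_null, q_alt, [1.0, 0.5, 0.0])
```

The randomized test trades a larger false-alarm probability for a smaller miss probability, which is the tradeoff the generalized decision function is meant to capture.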
Before we move forward, we note two points. First, the arrival time of a signal in SETI is unknown. As such, its starting time in the sequence is best viewed as distributed over all time steps according to a (hidden) random variable. Furthermore, any length of time that we consider should be seen as greater than the value that upper bounds the arrival time. This is because if the signal never arrived, one would be considering a sequence sampled from the null hypothesis regardless. Also note that if there were a fixed delay, the arrival time would no longer be a random variable but a constant, reducing the alternative hypothesis to a distribution which samples from the null hypothesis until that time. This means the test should be unaffected by the delay, since one can remove the elements of each sequence for times before the arrival. This mirrors the insight that communication capacity with fixed delay is the same as capacity with no delay [Chandar12].
1.1 Information-Theoretic Limits of Detection of ETI
With this formalization, we can view the SETI problem from the sender’s side or from the receiver’s side. We begin with the receiver’s problem of constructing the optimal binary test. This depends on the electromagnetic radiation one is trying to detect, as this determines the alternative hypothesis $\sigma$ and the cone of space the detector takes information from, and in turn determines the noisy channel and the null hypothesis $\rho$. With these fixed, the problem is just designing the optimal decision function. This is fundamentally a one-shot problem where at best the receiver saves the whole sequence and adaptively updates the decision function as more signals are acquired (i.e. as the number of time bins $n$ increases). Assuming $\rho$ and $\sigma$ are not mutually exclusive/orthogonal, the two error probabilities (1) cannot both be zero, and so the tradeoff between them must be considered.
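Given the consequence of claiming the detection of ETI, the probability of false alarm should be bounded above by $\varepsilon$, which is small, and then the probability of missing an ETI’s signal should be minimized under this constraint [Neyman1933]. Formally, the receiver aims to minimize the type II error $\beta$ while guaranteeing that the type I error satisfies $\alpha \leq \varepsilon$ for $\varepsilon \in (0,1)$. To write the optimization problem cleanly, note that
$$\alpha = \mathrm{Tr}[(I - \Lambda)\rho] = 1 - \mathrm{Tr}[\Lambda\rho], \qquad (2)$$
where the second equality follows from the unit trace of the null hypothesis $\rho$, as it is a (possibly quantum) probability distribution. By the definition of $\beta$, (1), and the identity (2), the optimization problem that determines the optimal decision function $\Lambda$ in this setting is:
$$\min_{\Lambda} \ \mathrm{Tr}[\Lambda\sigma] \quad \text{subject to} \quad \mathrm{Tr}[\Lambda\rho] \geq 1 - \varepsilon, \quad 0 \leq \Lambda \leq I, \qquad (3)$$
where $\sigma$ is the alternative hypothesis. Note that if $\rho, \sigma$ are classical, the optimal decision function is also classical. Moreover, in this case, the optimizer is guaranteed to be diagonal, so (3) reduces to a linear program (LP) which can be efficiently solved for large data sets. More generally, (3) is always a semidefinite program (SDP), so if the dimension $|\mathcal{X}|^n$ is small, it can be efficiently evaluated. Most interestingly, the optimal value of (3) is the argument of the $\varepsilon$-hypothesis testing relative divergence [wang12], which is therefore the fundamental limit for ETI detection.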
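In the classical case the linear program has a simple structure: one minimizes the alternative-hypothesis mass that is accepted as null, subject to accepting at least $1 - \varepsilon$ of the null-hypothesis mass. The sketch below solves this small LP greedily (a fractional-knapsack argument, in the spirit of the Neyman-Pearson lemma) rather than calling an LP solver. The function name, the example distributions, and the choice of `eps` are hypothetical illustrations, not taken from the analysis above.

```python
def optimal_classical_test(p_null, q_alt, eps):
    """Greedy solution of the classical version of the optimization:

        minimize   sum_x w[x] * q_alt[x]
        subject to sum_x w[x] * p_null[x] >= 1 - eps,  0 <= w[x] <= 1,

    where w[x] is the probability of declaring the null on outcome x.
    Returns (w, beta) with beta the optimal type II error.
    """
    n = len(p_null)
    # Outcomes with zero null probability never help the constraint,
    # so they keep weight 0. The rest are sorted by likelihood ratio q/p.
    order = sorted((x for x in range(n) if p_null[x] > 0),
                   key=lambda x: q_alt[x] / p_null[x])
    w = [0.0] * n
    remaining = 1.0 - eps  # null mass that still has to be accepted
    for x in order:
        if remaining <= 0:
            break
        take = min(1.0, remaining / p_null[x])
        w[x] = take
        remaining -= take * p_null[x]
    beta = sum(wx * qx for wx, qx in zip(w, q_alt))
    return w, beta


# Hypothetical three-outcome example (illustration only).
p_null = [0.7, 0.2, 0.1]
q_alt = [0.1, 0.3, 0.6]
weights, beta = optimal_classical_test(p_null, q_alt, eps=0.05)
```

For sequence spaces that are too large to enumerate, or for quantum hypotheses, one would instead pass the same objective and constraints to a general LP or SDP solver.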
The generalized Quantum Stein’s Lemma [generalizedQuantumStein] determines the fundamental limit of one-shot hypothesis testing for a large class of approximately repetitive signals. Informally (see the Supplementary Information for the formal statement), it states that if for any number of time steps $n$ the set of possible alternative hypotheses is closed, convex, only contains permutation-invariant hypotheses, and satisfies a few other consistency conditions, then there exists a sequence of decision functions such that the asymptotic type II error is given by:
The requirement that the null hypothesis is independently and identically distributed (i.i.d.) is not very restrictive, since natural processes, such as noise from space, are commonly modeled as i.i.d. The primary limitation of this theorem is that it requires the alternative hypotheses to be permutation-invariant over time. However, for repetitive signals, this seems reasonable as one can often model the initial signal as i.i.d. over some time scale, and as long as the noise the signal experiences is memoryless, the received signal will be i.i.d.
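For a single pair of classical i.i.d. hypotheses, the classical Chernoff-Stein lemma [cover2006, Theorem 11.8.3] gives the error exponent as the relative entropy of the null with respect to the alternative. The following sketch evaluates that exponent and a rough estimate of how many time bins are needed to reach a target miss probability. The per-bin distributions and the target of $10^{-6}$ are hypothetical, and the estimate ignores sub-exponential corrections, so it should be read as an order-of-magnitude guide only.

```python
import math


def relative_entropy(p, q):
    """D(p||q) in nats; returns math.inf when supp(p) is not inside supp(q)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


# Hypothetical per-bin distributions over {no click, click}.
noise = [0.9, 0.1]    # null hypothesis: natural background only
signal = [0.6, 0.4]   # alternative hypothesis: background plus ETI pulse

# Asymptotic exponent of the type II error at fixed type I error.
exponent = relative_entropy(noise, signal)

# Rough number of i.i.d. bins for a miss probability near 1e-6.
bins_needed = math.ceil(math.log(1e6) / exponent)
```

The exponent diverges exactly when the null places mass where the alternative does not, which corresponds to perfect asymptotic distinguishability.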
In the case that the ETI signals are of finite duration, we prove convergence of the decision function (see SI for derivation).
Theorem 1. For all $\varepsilon \in (0,1)$, a finite set of signals with finite maximum length and a finite set of possible i.i.d. null hypotheses has an optimal asymptotic type II error achieved in finite time. Moreover, it may be determined using a semidefinite program.
Note this holds for multiple i.i.d. null hypotheses unlike the generalized quantum Stein’s lemma. This result along with the quantum Stein’s lemma covers all cases relevant to SETI except unbounded uncertain arrival time and infinite length non-periodic signals, neither of which one would expect to converge in general. We refer to the Supplementary Information for further analysis.
Quantum or Classical Information
Given these limits for the sender, we might ask if quantum signals could provide any advantage. We first note that one common issue in using quantum information to one’s advantage is the need for aligned reference frames [bartlett07]. This arises because one uses quantized degrees of freedom of an object to transmit data (for example, the polarization of single photons). It follows that one not only needs a codebook for quantum information theory, but also a notion of alignment between the two parties’ reference frames (for example, one party may send a single photon in what they view as horizontal polarization, but at arrival it is diagonal polarization as defined by the other party). In this work, this issue is not explicit because by defining the distributions to evaluate (3), one would have to define the hypotheses in the fixed local reference frame. We do note, however, that this suggests whatever signal is to be sent should not rely on the alignment of reference frames, which at least complicates the advantages of quantum mechanics. Macroscopic quantum signals [Friedman00] or superpositions of degrees of freedom that do not rely on a reference frame [Loredo19] do, however, allow the possibility of quantum ETI signals.
If quantum signals are possible, it is further possible they provide an advantage. For example, one possibility for an advantage comes from sending highly entangled states. If the receiver assumes detecting many entangled states is not a common natural process, then under the i.i.d. assumption and in the asymptotic limit, the type II error rate would be non-zero for any classical null hypotheses, and the worst case asymptotic behavior would be given by the relative entropy of entanglement [vedral1998entanglement] of the alternative hypothesis. This is distinct from the classical case, since if the alternative hypotheses are classical, no such general claim could be made. Yet, our current technologies would not be able to preserve entanglement between particles sent over many light years. Beyond this aspect, we note that for all relevant classical distributions, if quantum and classical signals were equally achievable and the assumed local reference frame were correct or not relevant, we can say that quantum signals could only help, as is stated in the following proposition. The proposition can be viewed as an immediate consequence of the data-processing inequality for the hypothesis testing relative divergence (proven in the SI).
Proposition 1. For any classical null and alternative distributions, if quantum signals are implementable, we can achieve an optimal type II error at least as small using quantum signals. Moreover, there exist cases where the advantage is strict.
Moving from receiver-side to sender-side analysis, the goal is to construct a signal that can be decoded with small error probability. Given the previous analysis, there are two options. The first is to simply construct a ‘single’ signal which could not be generated by nature (with almost any probability). This leads to the null and alternative hypotheses being largely orthogonal, which allows for a hypothesis test in which both error probabilities are small, as follows from optimization problem (3). Such signals, however, may require significant energy. Examples of such ‘single’ signals include inscribed matter [Rose04] and possibly overhead meteors, which we discuss in subsections 2.3 and 2.2 respectively. The second is to send a series of i.i.d. signals through a memoryless channel to utilize the Quantum Stein Lemma. To do so, the senders ought to conjecture a memoryless channel so as to build a memoryless device such that the received message is distinguishable from the sender’s assumed noise model, at least asymptotically. Formally, if the device and noise are memoryless processes, both hypotheses are i.i.d. products over pulses: the alternative is built from the distribution over signals the device produces each pulse, and the null from the distribution when the device is off, which may be taken to be vacuum. Given these conditions, by the Quantum Stein Lemma, the transmitter aims to construct a device, i.e. distribution, such that its source distribution satisfies some set of constraints. Over the feasible set of devices under said constraints, the optimal source is then
where we note that this optimization problem can be unbounded if there exists a feasible source such that the null hypothesis does not lie in the support of the alternative hypothesis. In that special case, there is perfect distinguishability asymptotically. Examples of signals for which this optimization problem applies would be radio signals or the use of transits, which we discuss in subsections 2.1 and 2.4 respectively.
1.2 Analyzing Measured Data
Note that our discussion so far has been in terms of comparing processes as a way of evaluating preferable methods of sending/receiving ETI announcements while taking the one-shot nature of the problem seriously. Moreover, the decision function once computed could be used on incoming data, though this would be under the assumption the incoming data was truly from one of the two hypotheses. One problem that may seem somewhat distinct from this is that of having obtained data and then trying to make a decision based on this data. This does not make much of a difference in the one-shot setting, as there is no difference (assuming classical data) between storing the data and then implementing the decision function once, and implementing the decision function as the data comes in. (Footnote 3: This is only true if one does not condition on something in the observed data.) The procedure is as follows:
1. Denote the obtained data by $x$.
2. Construct model(s) which give rise to probability distribution(s) over the space of possible data such that the probability of $x$ is non-zero for each.
3. Choose $\varepsilon$. For each model, solve for the optimal error (3). While the value of the optimal error is not relevant in the case of obtained data, the optimizer of the problem is the optimal decision function, which in general is also a function of $\varepsilon$.
4. Let $|x\rangle\langle x|$ represent the obtained data in the vector space.
5. If the optimal decision function declares the null hypothesis with certainty on this input, then the data should certainly not be considered evidence of the alternative hypothesis.
6. Otherwise, implement the decision function and apply it to the input $|x\rangle\langle x|$.
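For classical data, the last steps of this procedure amount to looking up one diagonal entry of the decision function and, if it is not 0 or 1, flipping a biased coin. The sketch below illustrates this under the assumption that the decision function is stored as a list of weights, each being the probability of declaring the null hypothesis on the corresponding outcome. The function name `decide` and the example weights are hypothetical.

```python
import random


def decide(accept_null_weights, observed_index, rng=random):
    """Apply a classical (diagonal) decision function to one observed datum.

    accept_null_weights[x] is assumed to be the probability of declaring
    the null hypothesis when outcome x is observed.
    """
    w = accept_null_weights[observed_index]
    if w >= 1.0:
        # The test never declares the alternative on this datum.
        return "null"
    if w <= 0.0:
        return "alternative"
    # Randomized region of the test: declare the null with probability w.
    return "null" if rng.random() < w else "alternative"


# Hypothetical decision function over three possible data outcomes.
example_weights = [1.0, 0.5, 0.0]
verdict_for_first = decide(example_weights, 0)
verdict_for_last = decide(example_weights, 2)
```

A seeded `random.Random` instance can be passed as `rng` when the randomized part of the decision needs to be reproducible.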
Although a positive decision is not definitive in determining the presence of ETI, as one cannot guarantee the assumed model holds, if nothing else this gives a rigorous way of eliminating possible data for any given model using our framework. Of course, for finite sets of finite length signals (Theorem 1) or convex sets of i.i.d. signals (4), strong conclusions may be drawn for reasonable sets of hypotheses. In other words, the basic approach can be extended to any generalized hypothesis testing setting such as universal or composite testing [Levitan02, Berta21].
2 Analyzing Specific Kinds of Signals
Having introduced a new formalism for analyzing detection and announcement for SETI, we now consider previously proposed signaling methods for announcing the existence of a civilization under this framework, both in terms of fundamental limits and using the numerical tools available to us because $\varepsilon$-hypothesis testing is an SDP (an LP for classical distributions). We consider both orthodox and unorthodox proposals to better see the generality of framing SETI as hypothesis testing.
2.1 Electromagnetic Signals
Perhaps the most orthodox approach to SETI is sending radio signals, though more recently the consideration of laser signals (continuous wave and laser pulse) has grown [kingsley01]. Roughly speaking, in this approach the limiting factor is the power of the transmitter [Shannon49]. Clearly if the signal had enough power, the signal would be detectable, much in the same way the capacity of a Gaussian channel is limited by the power. In principle there is the further issue of how many planets the civilization would like to signal at once which will increase the number of transmitters necessary (and the total amount of necessary energy). If one could generate a sufficiently powerful burst from a laser, assuming it were detected, it would be sufficient. However, it is commonly held that an ETI would more likely periodically pulse a laser at their target, due to the limitation of generating sufficient energy. For periodic signals, the longer the time-span the signal is sent, the closer one is to achieving the Stein’s lemma limit in our framework. Therefore we can make predictions about an optimal transmitter under given power constraints using (5). For example, in the case of an average and peak power constraint, (5) might be written as:
where the quantities involved are the initial distribution of the signal, a function that calculates the power cost, the upper bound on the power, the noise at the receiver’s end, the noise during the transmission, the noise from the sender’s end, and the expectation used in the average power constraint. Fixing the noise models, this gives one a close approximation of the fundamental limit of the distinguishability of the signals and the probability of false positive detection as a function of the power bound using (5).
2.1.1 Distinguishability Does Not Universally Necessitate Strong Signal
We now consider a simple but counterintuitive example using this equation. A more in-depth derivation is presented in the Supplementary Information. As the example is fully classical, for simplicity we view probability distributions as vectors in bra-ket notation, so that we can express the distributions by the non-zero probability sequences.
Consider a pulsed laser. Assume one discretizes the total signal as a sequence of finite length. The alphabet for each element of the sequence is a bounded interval of power levels, obtained by discretizing the power and choosing a cutoff for the possible power of an observed signal. (Footnote 4: One reason for such a cutoff is the tolerated input of the device.) Assuming the laser is a square pulse, the expected optimal choice of the initial distribution places all probability on a single sequence, i.e. a delta distribution. We can imagine that while there is no noise at the source, there is memoryless jitter in the laser which with some probability shifts the sequence forward or backward one time bin. We therefore define the distribution
We assume that the noise during travel is loss-only, so for each time bin the power is attenuated by a factor that is a function of the distance travelled and possibly the conditions over the travel path. Finally, we assume the noise at the receiver is the composition of two maps. First, we assume the data is taken over a short enough time (as lasers can pulse reasonably quickly) that the sun contributes a constant additive power in each time bin. The second map assumes that with some small probability any given possible sequence is observed. (Footnote 5: Technically, the introduction of this map is to guarantee absolute continuity for the sake of our example. The ad-hoc introduction of such a map to guarantee this is largely an aspect of the simplicity of our model. However, it is in general a rigorous way to guarantee both the null and alternative hypothesis are full rank so as to guarantee a finite value, and, by the data-processing inequality for relative entropy along with the Chernoff-Stein lemma [cover2006, Theorem 11.8.3], we know we can only have made the asymptotic error exponent worse by doing this.) This is modeled by a linear map on distributions that mixes the input with the uniform distribution, written in terms of the all-ones vector. Given these maps, one can determine the received alternative hypothesis from the initial distribution. Under the stated assumptions, one finds that so long as the received pulse power stays within a certain range, the relative entropy between the hypotheses is the same constant. Consequently, the asymptotic error rate for all powers in this range is the same.
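The qualitative conclusion can be checked numerically in a small instance of this kind of model. The sketch below is not the model from the Supplementary Information: the sequence length of three bins, the power cutoff, the loss factor, the solar background level, the jitter probability (split evenly between the two shift directions), and the mixing weight are all hypothetical values picked for illustration.

```python
import math
from itertools import product


def received_distributions(tx_power, loss=0.5, sun=1, jitter=0.2,
                           mix=0.01, cutoff=5):
    """Return (null, alt) distributions over 3-bin power sequences.

    All parameter values are illustrative assumptions. Powers are integers
    in 0..cutoff; the pulse is attenuated, added to the solar background,
    and both hypotheses are mixed with the uniform distribution.
    """
    seqs = list(product(range(cutoff + 1), repeat=3))
    uniform = 1.0 / len(seqs)
    pulse = math.floor(loss * tx_power)   # loss-only travel, discretized
    assert sun + pulse <= cutoff, "received power exceeds detector cutoff"
    null = {s: mix * uniform for s in seqs}
    alt = dict(null)
    null[(sun, sun, sun)] += 1.0 - mix    # background only
    # Pulse centred with prob. 1 - jitter, else shifted by one bin.
    for pos, prob in ((1, 1.0 - jitter), (0, jitter / 2), (2, jitter / 2)):
        seq = [sun, sun, sun]
        seq[pos] += pulse
        alt[tuple(seq)] += (1.0 - mix) * prob
    return null, alt


def relative_entropy(p, q):
    """D(p||q) in nats for dictionaries with full support."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p if p[s] > 0)


# Error exponents for three transmit powers within the detector's range.
exponents = [relative_entropy(*received_distributions(tx)) for tx in (2, 4, 6)]
# With no pulse at all the two hypotheses coincide.
no_signal = relative_entropy(*received_distributions(0))
```

In this toy instance the exponent depends only on whether the pulse moves the observed sequence away from the background sequence, not on how far it moves it, which is the independence of power described above.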
While this model is extremely simple, and so we would not expect such independence to hold in standard cases, it exemplifies the important conceptual aspect of the problem: the goal of the signal is to distort the sample enough to be distinguishable from the pure noise case, and it is not a priori necessary that the transmission must significantly overpower the noise to achieve this. We note this point had been made previously and quantitatively in [kingsley93]; this example simply shows a particularly simple situation where it holds.
Finally, while in many cases the optimal signalling device may seem obvious (pick the largest average power allowed), in more elaborate cases it may not be the case, which would give an advantage to analyzing the process using (6). Moreover, even if the optimal is straightforward, we will see in a later example (Subsection 2.3) that in the one-shot setting the error probability of the optimal decision function may not scale linearly in resources for generating the ETI process, in which case further tradeoffs may be worth considering.
2.2 Near-Earth Projectiles
A less orthodox approach of recent interest is near-earth projectiles. Most generally, we take near-earth projectile signals as the construction of any series of macroscopic objects (projectiles) which are directed in a trajectory which passes near the Earth without colliding into it. It seems unlikely that it would be most efficient to construct a large number of such objects, and it seems perhaps most rational to expect there to be only a few such projectiles in the message. In this case the one-shot nature of the problem is very important as the data will only be collected once before the projectiles continue on their overhead trajectory or burn up in the atmosphere, and so it is crucial to have some notion of error probability of a false positive for such a signal detection, which is exactly what the one-shot hypothesis testing interpretation provides.
Indeed, the ability to handle the false positive probability in this setting has become a reality given the recent interstellar object that passed through our solar system, ‘Oumuamua, and the debate as to whether its origins were ETI or natural [Loeb21, bannister19]. While there have been arguments that ‘Oumuamua was of ETI origin [Loeb21], analysis concluded ‘Oumuamua was most likely of natural origin, while noting some not-yet-explained aspects [bannister19]. The arguments presented in [bannister19] are largely about considering different observed properties of ‘Oumuamua and how they deviate from expected observations. This is exactly what one-shot hypothesis testing does in a mathematical sense. Note that this is trying to make a conclusion on already obtained data, and so the methodology of Subsection 1.2 applies.
2.2.1 Simple Numerical Example
We consider a toy example of overhead meteors to show the application of our framework. Meteors often burn up in our atmosphere. Indeed, this happens consistently enough that it is used as a tool in telecommunications by bouncing signals off of the ionization trail of the meteors, known as meteor bursts [meteorBurstCommunication]. Both meteors simply falling and meteor bursts are generally held to be Poisson processes, and both have data consistently collected on them, so a reasonable approach to an ETI signal would be to produce meteors that differ from how we expect, as this would at least be recorded. Therefore, assuming an ETI that knows that meteors are not uniformly distributed on every planet, an ETI may send a small number of meteors in rapid uniform succession at the Earth for a short period. It would make sense for the meteors to be small to save energy and to guarantee they do not harm the Earth. We therefore can compare the meteor detection when the ETI meteors are and are not included and look at how distinguishable the two cases are.
Mathematically, as meteor bursts are a Poisson process, for any time interval of length $t$ the number of bursts is distributed as $\mathrm{Poisson}(\lambda t)$, where the rate $\lambda$ may depend on many things, such as the time of day and of year [meteorBurstCommunication]. For simplicity, we assume a scaling such that $t = 1$ and assume that at least the start of the ETI message begins in this time interval. We look at the probability of missing an ETI signal as a function of how many ETI rocks appear over the time interval for two choices of $\lambda$ and three choices of $\varepsilon$, which corresponds to the maximum allowed error probability, using (3) to compare the original Poisson process and the Poisson process with this additive noise. We note that our numerical analysis must be finite, whereas the Poisson process has a countably infinite number of outcomes. This can be rigorously handled by truncating the tail of the distribution, given the tail property of the distribution and the data-processing inequality. For completeness, this is elaborated on in the Supplementary Information. We numerically construct results for our simple example in Figure 1.
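Figure 1: Probability of missing an ETI signal as a function of the number of ETI meteors, for two choices of the Poisson rate, the second being larger (i.e. larger variance). Each plot is plotted for three different tolerances of false positive probability.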
General properties of the method can be seen in Figure 1. First, the optimal error probability does not in general scale linearly in $\varepsilon$. This can be seen in the graphs, as $\varepsilon$ is the constraint and, as $\varepsilon$ shrinks, the optimal probability of missing the signal does not change linearly. This is one reason it is advantageous for it to be computationally efficient to construct the optimal decision function for more general models. Second, it is highly sensitive to the null hypothesis. For example, we see that as the Poisson distribution broadens (as $\lambda$ goes from three to six) and because the ETI signal is additive, the number of meteors needed to signal with low risk of false negatives increases. This implies an increase in needed energy for this method, thereby allowing one to infer the feasibility/cost-benefit of this signal from the ETI’s perspective conditioned on their knowledge of the receiver’s local atmospheric conditions.
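A toy calculation of this kind can be sketched in a few lines of Python. The sketch assumes the ETI contribution is a fixed number of extra meteors added to the natural Poisson count, in which case the likelihood ratio is non-decreasing in the count and the optimal test is a randomized threshold test, so no general LP solver is needed. The truncation point, the rates, and the tolerance below are illustrative choices and are not claimed to reproduce Figure 1.

```python
import math


def poisson_pmf(lam, k):
    """Poisson probability of k events at rate lam (log-space for stability)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def optimal_miss_probability(lam, extra, eps, cutoff=100):
    """Smallest type II error with type I error at most eps.

    Null: count ~ Poisson(lam). Alternative (assumed): count is a
    Poisson(lam) draw plus `extra` ETI meteors. Counts above `cutoff`
    are truncated, which is negligible for the small rates used here.
    """
    pmf = [poisson_pmf(lam, k) for k in range(cutoff + 1)]
    tail = 0.0   # null probability of counts strictly above threshold t
    t = cutoff
    while t >= 0 and tail + pmf[t] <= eps:
        tail += pmf[t]
        t -= 1
    if t < 0:
        return 0.0   # eps is so large that ETI can always be declared
    # Randomize at the threshold so the false-alarm budget is used exactly.
    declare_eti_at_t = (eps - tail) / pmf[t]

    def alt_pmf(x):
        return pmf[x - extra] if x >= extra else 0.0

    below = sum(alt_pmf(x) for x in range(t))
    return below + (1.0 - declare_eti_at_t) * alt_pmf(t)


# Miss probability versus number of ETI meteors for two hypothetical rates.
misses = {lam: [optimal_miss_probability(lam, k, eps=0.05) for k in range(11)]
          for lam in (3.0, 6.0)}
```

With zero extra meteors the two hypotheses coincide and the miss probability is $1 - \varepsilon$, which serves as a sanity check on the threshold construction.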
2.3 Inscribed Matter
A related but distinct approach to near-earth projectiles is inscribed matter [Rose04]. In [Rose04], the authors show that under many circumstances one can encode and transmit more information for less energy by encoding information densely into matter. It follows that the known advantage of inscribed matter only holds for messages carrying a lot of information, which is the opposite information-theoretic regime from the one-shot detection problem we consider here. Regardless, one can view communicating with inscribed matter as a signal for ETI detection that simply happens to have more data on board, a strategy that seems quite reasonable and has been studied in other settings [Varshney13, Varshney19]. However, if one sends inscribed matter (that should get trapped in orbit about some planet or successfully crash land), one would expect the energy cost to largely be a function of the distinguishability from its new local environment, as was observed in our previous example. In particular, since the optimal type II error can only increase under data processing, one expects one wants macroscopic design properties that would not get coarse-grained so as to be detectable, and this could lead to energy costs not considered in [Rose04]. (Footnote 6: In [hippke18inscribedmatter], the authors suggest that the optimal method is to send inscribed matter shielded in a long cylinder. It is not clear that this would be optimal for detection, however.) This question, at least from the sender side, is well suited for investigation via a (non-linear) variation of the one-shot hypothesis testing optimization problem:
where the optimization is over designs in the Euclidean space the designed signal is defined over, subject to an energy cost function being bounded by a constraint on the total energy, and with a linear map representing the noise introduced to the design during transmission. Therefore, we believe the one-shot hypothesis testing framework remains relevant for the inscribed matter approach because, while [Rose04] makes a good case for inscribed matter being more energy efficient, it does not escape the detection problem that one-shot hypothesis testing encompasses.
2.4 Artificial Transits
One final unorthodox method proposed for SETI is to look for signals of large extraterrestrially-engineered objects orbiting stars, referred to as artificial transits [Arnold05a, Arnold05b, Arnold13]. The initial motivation for this approach is that we might achieve such detections in our search for exoplanets because the stellar flux detected is dependent on the shape of the transit, and so an artificial transit with strange shape could be detected. This was numerically demonstrated in [Arnold05a] where the author compared various simulations of transit signals.
Like the previously mentioned methods, the primary limitation seems to be energy. Whereas the advantage of inscribed matter was the amount of data that can be sent as a function of energy, the advantage of artificial transits is both that they could be discovered in standard astronomical research and that they can stay in orbit for a long time. The advantage of this presented in [Arnold13] is that it allows for a signal (the pulsed stellar flux) over a much longer time scale than pulsed electromagnetic radiation from a laser, which is both limited by continuously generating power and the decline of a given civilization. Artificial transits also have the advantage of not needing to be aimed like the other methods generally would, as they continue to orbit around the star, i.e. they seem to be the best proposed broadcast signal to-date.
The duration of the orbiting process suggests another advantage of constructing transits beyond those given in [Arnold05a, Arnold05b, Arnold13], which is that it is the longest lasting i.i.d. signal and so the most promising to achieve the fundamental limit of hypothesis testing. (Footnote 7: There is the previously noted caveat that the signal will only be approximated as i.i.d. if the noise is memoryless, but this problem is not unique to artificial transits.) This only adds to the credibility of this possible method.
Finally, we note an open problem in the quantitative analysis of SETI through one-shot hypothesis testing that may be beneficial for future analysis. In [Arnold05a] it is noted that artificial transits whose projected cross-section is triangular produce a detected stellar flux waveform that is similar to the waveform generated by a planet with rings, which could complicate the detectability of at least that choice of cross-section. Using the one-shot hypothesis testing optimization program on discretized waveforms could provide a stronger understanding of this particular complication. Of course, this could then be extended to analyze various signal forms, providing a quantitative benchmark for which signal forms seem more likely.
3 Conclusion & Outlook
In this work we have presented a new interpretation of SETI as one-shot hypothesis testing. The crux of the argument is that communication with ETI civilizations is a related but distinct task to that of detecting said civilizations. Specifically, detection is the answer to a ‘yes’ or ‘no’ question, ‘is this process natural or generated by extraterrestrial intelligence?,’ whereas communication is exchanging significantly more information. Moreover, we stress the one-shot aspect of SETI as the signals we are trying to detect may be brief and/or non-i.i.d. Using these insights we show how SETI can be formalized as one-shot hypothesis testing, and present the optimization problem which constructs the optimal decision function for a hypothesis test between an ETI process and a natural process, where optimality is in terms of minimum false negatives given some demand on the rarity of false positives for the decision function. In the special case of (mixtures of) periodic ETI signals, such as from a pulsed laser or an artificial transit, we use the generalized Quantum Stein’s lemma [generalizedQuantumStein] to recover the fundamental limit of hypothesis testing in these settings. This in turn dictates how the sender should design their signal, (5).
To clarify that viewing SETI as a case of one-shot hypothesis testing does not hold only in the abstract, we considered various proposals for ETI signals and explained how they relate to the one-shot hypothesis testing framework. Moreover, we presented a simple numerical example to illustrate how to analyze given proposals by making use of the data-processing property of hypothesis testing and that the one-shot hypothesis testing optimization problem (3) is a semidefinite program in the general case and a linear program in the common case where all data is classical.
As final remarks, we note where one could further this line of investigation. The most natural critique of this formulation is that we are considering binary hypothesis testing, and so one cannot draw conclusions about multiple alternative hypotheses at the same time. Of course, binary hypothesis testing stemmed from the argument that we are trying to answer a ‘yes’ or ‘no’ question: we are not concerned with comparing different alternative hypotheses, just with distinguishing whether a received signal is from an ETI civilization or not. This conceptual point is lost if we consider discriminating between alternative hypotheses. Secondly, we have shown the ease of the one-shot hypothesis testing optimization problem (3). Indeed, in the case where all the data is classical, the problem becomes a linear program and, given the modern state of linear programming, this could be implemented over realistic large data sets in reasonable time for practical benefit. However, the optimization problems pertaining to cost constraints (5), (7) are not so trivial, as they are concave and nonlinear optimization problems respectively. [Footnote 8: (5) is only a concave optimization program if the constraint set is convex. Otherwise it is a nonlinear optimization problem as well.] Due to this, while we think the problems could be useful regardless and possibly even tractable some of the time, further investigation would be necessary for their application. Regardless, we believe viewing SETI as one-shot hypothesis testing can provide rigorous quantitative analysis techniques and lead to new insights in the field.
We acknowledge Kartik Kumar Kansal as a member of the initial project that led to this work.
4 Supplementary Information
4.1 Quantum Information Background and Previous Results
In this appendix we state the quantum information definitions, constructions, and properties that we will need. We refer to [WatrousBook, Wilde2011, Tomamichel2016] for further information.
Given a finite alphabet , there exists a complex Euclidean space . Denote the set of positive semidefinite operators over this space as . A quantum probability distribution over is such that . The set of quantum probability distributions over the space is denoted . We denote the set of classical probability distributions over the same basis by .
A quantum probability distribution is a generalization of a probability distribution as by the spectral decomposition theorem may always be diagonalized, and, as it is trace one and positive semidefinite, this diagonalization has all entries between zero and one such that they sum to one. In other words, the diagonalization of a quantum probability distribution is a classical probability distribution over some preferred basis. Furthermore, sampling i.i.d. from a quantum state times is the same as imagining you have access to copies of the state, i.e. considering the state is the quantum version of considering times for some distribution over , where is the Kronecker product.
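As a small illustration of the diagonalization remark, the following sketch computes the spectrum of a single-qubit state in closed form and checks that it is a classical probability distribution. The function name `qubit_spectrum` and the example entries are hypothetical choices made for illustration; they do not appear in the main text.

```python
from math import sqrt

def qubit_spectrum(a, b):
    """Eigenvalues of the Hermitian, trace-one matrix [[a, b], [conj(b), 1 - a]]."""
    radius = sqrt((2 * a - 1) ** 2 + 4 * abs(b) ** 2)
    return ((1 - radius) / 2, (1 + radius) / 2)

# Hypothetical example state; it is positive semidefinite because a(1-a) >= |b|^2.
a, b = 0.7, 0.2 + 0.1j
eigenvalues = qubit_spectrum(a, b)

# The eigenvalues lie in [0, 1] and sum to one, i.e. they form a classical
# probability distribution over the eigenbasis of the state.
is_distribution = all(0 <= e <= 1 for e in eigenvalues) and abs(sum(eigenvalues) - 1) < 1e-12
```

For larger dimensions one would use a numerical eigenvalue routine instead of the closed form, which is specific to the two-dimensional case.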
Definition 2 (Bra-Ket Notation).
For a fixed basis of a complex Euclidean space of dimension , the basis vectors can be denoted by , where is the column vector with a one in the element.
In the main text we make use of taking classical Cartesian products of sequences of elements of finite alphabets, to a vector space . This can be done in the following manner as is clear from the above definition. Imagine that is a -length sequence of elements from a finite alphabet . By overloading notation, define a complex Euclidean space whose basis vectors are . Then using the Kronecker product, , the complex Euclidean space is a vector space whose basis vectors are the -length sequences of the basis vectors . That is to say, is a vector space whose basis vectors are the possible sequences contained in .
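The construction above can be sketched in a few lines, assuming a three-letter alphabet chosen purely for illustration. Each symbol is mapped to a one-hot basis vector, and a sequence is mapped to the Kronecker product of the basis vectors of its symbols; the helper names are hypothetical.

```python
from itertools import product

def kron(u, v):
    """Kronecker product of two vectors given as lists."""
    return [ui * vj for ui in u for vj in v]

def basis_vector(symbol, alphabet):
    """One-hot column vector labelling `symbol` within `alphabet`."""
    return [1 if s == symbol else 0 for s in alphabet]

def sequence_vector(seq, alphabet):
    """Basis vector of the tensor-product space labelling the sequence `seq`."""
    vec = [1]
    for symbol in seq:
        vec = kron(vec, basis_vector(symbol, alphabet))
    return vec

alphabet = ["a", "b", "c"]          # illustrative alphabet
seq = ("b", "a", "c")
vec = sequence_vector(seq, alphabet)
index = vec.index(1)

# The basis vectors of the product space are in one-to-one correspondence
# with the length-3 sequences, enumerated in lexicographic order.
all_sequences = list(product(alphabet, repeat=len(seq)))
```

The single non-zero entry of `vec` sits at the position of `seq` in the lexicographic enumeration of all sequences, which is the sense in which sequences label the basis of the product space.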
Definition 3 (Born’s Rule).
Given a complex Euclidean space and the space of linear operators over to itself, , Born’s rule states that for an operator and a quantum probability distribution , the probability of observing the property corresponds to is given by , .
Given finite alphabets , a quantum measurement on with output over is a function that satisfies the constraint
This can be interpreted as saying the outcomes of the measurement device are indexed by , and the measurements that result in said outcome are defined by the operators by Born’s rule. Note that a decoder that outputs classical symbols over a finite alphabet from an input linear space over is always a quantum measurement .
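To make Born’s rule and the measurement definition concrete, the sketch below evaluates outcome probabilities for a two-outcome measurement on a qubit. It assumes the standard completeness constraint that the measurement operators sum to the identity; the state, the operator entries, and the outcome labels are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

def born_probability(element, rho):
    """Probability Tr[element rho] of the outcome associated with `element`."""
    return trace(matmul(element, rho)).real

# Hypothetical qubit state and a noisy two-outcome detector.
rho = [[0.7, 0.2 + 0.1j], [0.2 - 0.1j, 0.3]]
measurement = {
    "click": [[0.9, 0.0], [0.0, 0.2]],
    "no click": [[0.1, 0.0], [0.0, 0.8]],
}

# Completeness: the operators sum to the identity (assumed form of the constraint).
completeness = [[sum(m[i][j] for m in measurement.values()) for j in range(2)] for i in range(2)]

probabilities = {y: born_probability(m, rho) for y, m in measurement.items()}
```

Because the operators sum to the identity and the state has unit trace, the outcome probabilities sum to one for any input state.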
One-Shot Hypothesis Testing Background Results
Here we state the results relevant to the one-shot hypothesis testing: the generalized Quantum Stein’s Lemma [generalizedQuantumStein], its reduction to the standard Quantum Stein’s Lemma, and the Data Processing Inequality for one-shot hypothesis testing.
[generalizedQuantumStein] Let be a set of sets satisfying the following for all :
is closed and convex
contains where is full rank
If , for all
If , for all where is the unitary that permutes the copies of according to the permutation .
Under these conditions, it holds that for any , there exists a sequence of two-outcome POVMs , such that and for all , ,
Note this can be simplified to the standard (Quantum) Stein Lemma by letting for all as then the equation simplifies to , which implies a single equation to evaluate for the i.i.d. case.
The requirement that are closed convex sets allows this fundamental limit to hold when one wishes to consider mixtures of multiple i.i.d. alternative hypotheses at the same time. This cannot be handled by the standard Quantum Stein Lemma as convex combinations of i.i.d. signals are not i.i.d. in general. As mentioned in the main text, for non-repetitive signals, where permutation invariance over time does not hold for many signals, the limit of as time grows must be considered directly. We note that if one models the unknown arrival time of the signal by a random variable , the alternative hypothesis will not be permutation invariant over time. This is because the alternative hypothesis will be of the form
where is the distribution before the signal arrives at time , is the distribution of the signal, is the distribution after the signal ends at time , and is the partial trace over the random variable , constructing the hypothesis from which we sample. We note that there are asymptotic limit results that hold for the classical regime for handling Gauss-Markov processes [Sung06] and hypothesis testing for sets of null and alternative hypotheses in the ergodic setting [Luschgy93]. Moreover, in the quantum setting, if one can guarantee they are only interested in i.i.d. signals, [Berta21] may be used instead of the generalized Quantum Stein’s lemma.
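A classical toy version of this construction may help. The sketch below assumes a binary alphabet, a signal occupying a single time bin, the same noise distribution before and after the signal, and a non-uniform arrival-time distribution; all numerical values and names are hypothetical. It builds the mixture over arrival times and compares the probabilities of two sequences that are permutations of each other.

```python
from itertools import product
from math import prod

def arrival_mixture(p_noise, q_signal, arrival, n):
    """Distribution over binary length-n sequences for a one-bin signal
    arriving at a random time bin t drawn from `arrival`."""
    dist = {}
    for seq in product((0, 1), repeat=n):
        total = 0.0
        for t, weight in arrival.items():
            # Bin t is drawn from the signal distribution, all others from noise.
            factors = [(q_signal if k == t else p_noise)[x] for k, x in enumerate(seq)]
            total += weight * prod(factors)
        dist[seq] = total
    return dist

p_noise = [0.9, 0.1]                  # hypothetical noise distribution per bin
q_signal = [0.2, 0.8]                 # hypothetical signal distribution
arrival = {0: 0.7, 1: 0.2, 2: 0.1}    # hypothetical non-uniform arrival times
alt = arrival_mixture(p_noise, q_signal, arrival, 3)

early, late = alt[(1, 0, 0)], alt[(0, 0, 1)]
```

Here `early` and `late` differ, so this particular mixture is not invariant under permuting time bins. The sketch only covers this one set of assumed parameters and is not a general statement about every arrival-time distribution.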
In Section 4.2 below, we prove that, for a finite set of i.i.d. null hypotheses and a finite set of alternative hypotheses which end in finite time and return to the i.i.d. null hypothesis, the optimal decision function converges in finite time (Theorem 7). This applies to any finite-length message as well as to unknown arrival time distributions, so long as the unknown arrival time is assumed to take a maximum value such that with the longest finite time message , , where is the final time bin for the decision function. As the generalized Quantum Stein’s lemma implies convergence for i.i.d. signals, the only case not handled by these two results is when the arrival time and message length do not satisfy the above conditions. However, in that case it is not obvious that the decision function would converge in general.
Finally, we present the following lemma which allows us to conclude that the error exponent can only increase under not only data-processing, but also post-selection of the data. We note this property has been proven in the literature previously [Wang19].
Lemma 3 (Data-Processing of -Hypothesis Testing for Trace Non-Increasing Maps).
Let be a completely-positive trace non-increasing map (i.e. if , and for all operators , ). Let , and . It follows,
We prove the property for the inequality, the other follows directly from the definition of in terms of . Let be feasible for , i.e. . By definition of the adjoint of a linear map, we have
Moreover, is completely-positive and sub-unital () as is completely-positive and trace non-increasing. As ,
The relations (9) and (10) imply that is a feasible point for . Therefore, using that is a minimization problem and that we considered arbitrary feasible (so we have included the minimizer) we may conclude . ∎
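The lemma can be checked numerically in the classical case. The sketch below assumes the classical form of the one-shot quantity, namely minimizing the alternative-hypothesis weight of a test `d` subject to the null-hypothesis weight of `d` being at least `1 - eps` with `0 <= d <= 1`. For classical data this linear program is a fractional knapsack, so it is solved here by sorting outcomes by likelihood ratio rather than by calling an LP solver. The distributions and the sub-stochastic map are hypothetical.

```python
def beta(p_null, q_alt, eps):
    """min sum(q*d) subject to sum(p*d) >= 1 - eps and 0 <= d <= 1 (greedy solution)."""
    need = 1.0 - eps
    if need <= 0:
        return 0.0
    # Accept outcomes in order of increasing likelihood ratio q/p.
    items = sorted(((q / p, p, q) for p, q in zip(p_null, q_alt) if p > 0),
                   key=lambda item: item[0])
    cost = 0.0
    for _, p, q in items:
        take = min(1.0, need / p)      # fraction of this outcome that is accepted
        cost += take * q
        need -= take * p
        if need <= 1e-12:
            return cost
    return float("inf")                # infeasible: the null carries too little mass

def apply_map(kernel, dist):
    """Apply a non-negative matrix with column sums <= 1 (a trace non-increasing map)."""
    return [sum(row[i] * dist[i] for i in range(len(dist))) for row in kernel]

p_null = [0.5, 0.3, 0.2]
q_alt = [0.1, 0.3, 0.6]
eps = 0.1

# Hypothetical processing: merge the last two outcomes and lose 5% of their mass.
kernel = [[1.0, 0.0, 0.0],
          [0.0, 0.95, 0.95]]

before = beta(p_null, q_alt, eps)
after = beta(apply_map(kernel, p_null), apply_map(kernel, q_alt), eps)
```

In this example `after` is at least `before`, in line with the lemma: processing or post-selecting the data cannot reduce the optimal type II error. If the processed null hypothesis retains less than `1 - eps` of its mass, the program is infeasible and the function returns infinity.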
Proof of Proposition 1
Proof of Proposition 1.
The first half of the proof is an immediate consequence of Lemma 3 as can be seen in the following manner. Consider any classical distributions Recall these are diagonal matrices. Consider any quantum states such that the diagonal entries of are the same as respectively. Consider the pinching channel which zeros out every non-diagonal entry of a matrix. It is easy to verify this is trace-preserving and completely positive, and thus a channel. Thus by data-processing (Lemma 3), for any , Consider any
This proves we can always do at least just as well with quantum signals.
To show there exist cases where the quantum advantage is strict, we consider the following simple but physically reasonable example. Let and . This models the scenario where the null hypothesis is signal 1, whereas the alternative hypothesis contains either signal 1 or signal 2 with equal probabilities. Now, for any , the requirement that implies , where the first inequality follows from the fact that is positive. Therefore . However, if the alternative hypothesis is (i.e., a quantum superposition of signal 1 and 2), then we can choose (note that ). This particular detection gives , making it a feasible decision function, and if . This proves that there exist quantum strategies that achieve error probability strictly less than . ∎
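The gap in this example can be reproduced numerically under a few assumptions that are ours rather than the proof’s: the null hypothesis is the basis state for signal 1, the superposed alternative is the equal-weight superposition of the two signals, and the test is the rank-one projector onto the vector `sqrt(1 - eps)|0> - sqrt(eps)|1>`. This projector is one feasible choice and is not necessarily the operator used in the proof.

```python
from math import sqrt

def classical_bound(eps):
    """Lower bound (1 - eps) / 2 on the type II error for the classical mixture."""
    return (1 - eps) / 2

def projector_test(eps):
    """Acceptance probability on the null and on the superposed alternative
    for the assumed test vector sqrt(1 - eps)|0> - sqrt(eps)|1>."""
    amp0, amp1 = sqrt(1 - eps), -sqrt(eps)
    accept_null = amp0 ** 2                   # squared overlap with |0>
    overlap_plus = (amp0 + amp1) / sqrt(2)    # overlap with (|0> + |1>) / sqrt(2)
    return accept_null, overlap_plus ** 2

eps = 0.1
accept_null, quantum_error = projector_test(eps)
classical_error = classical_bound(eps)
```

For `eps = 0.1` the test accepts the null hypothesis with probability `1 - eps`, so it is feasible, and its error on the superposed alternative is below the classical bound. Under these assumptions the error equals `(1 - 2*sqrt(eps*(1 - eps)))/2`, which is below `(1 - eps)/2` whenever `0 < eps < 4/5`.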
4.2 Decision Function Convergence in Finite Time for Finite Sets of Finite Time Messages
In this section we prove that if all signals are finite time and the distribution returns to the i.i.d. null hypothesis after the signal ends, then the decision function can be guaranteed to converge. The primary lemma shows that when the alternative and null hypotheses become the same and independent of the message, the decision function will not improve. While intuitive, to the best of our knowledge, this has not been proven in the generality we consider. Our primary tool in proving this is strong duality for semidefinite programs, which we summarize first. This is the presentation given in [WatrousBook], except that we have the primal problem be a minimization rather than the dual. This is known to be equivalent.
Semidefinite Programs and Duality Theory
Let be a Hermitian-preserving map, , and . A semidefinite program is a triple , with the following associated optimization problems:
These sets are referred to as the feasible set of the primal problem and dual problem, respectively.
By weak duality, for all semidefinite programs, the optimal value of the primal problem, denoted by , is always greater than or equal to the optimal value of the dual problem, denoted by . If a semidefinite program has that , it is said to have strong duality. A sufficient condition to show strong duality for an SDP is Slater’s condition.
(Slater’s Condition) For a semidefinite program , if is finite and there exists a Hermitian operator which strictly satisfies the dual problem, that is, , then and the optimal value is obtained in the primal problem.
4.2.1 Derivation of Result
First note that we could re-express in the following manner:
This obtains the same value as as for any feasible the optimal choice for is . It is then straightforward to generalize this to consider a finite number of null and alternative hypotheses. Let be finite sets with index alphabets respectively. Then we define the following optimization problem:
where it is clear that if , the problem simplifies to , which justifies the notation. Furthermore, if , then
is the optimal decision function such that the type I error is less than for all null hypotheses and is the minimum type II error that will hold for all the alternative hypotheses at once. This confirms this is the definition we want.
where are all slack variables to satisfy the equality in the standard form (11), , means that the operator is defined over the whole space but for our purposes we only need to label these diagonal blocks, and diag means that the operator is block-diagonal.
With this, we need to obtain the adjoint map of . To do this, we first note that the adjoint map of is for Hermitian as can be verified:
where we used that we assumed is Hermitian. Using this, one can determine that the adjoint map is given by
where we have used . This allows one to write down the dual problem which after simplification is of the form
With the primal and dual problem specified, we can now prove strong duality of the problem.
We will use Slater’s condition (Theorem 4). First note is always finite as it is lower bounded by zero. Therefore all we need is to prove there exists a dual feasible point with strict feasibility. Let . Let for all and for all . Let , where
is the positive eigenspace of the Hermitian operator. Then all the inequalities hold strictly, and so this is a solution which is strictly feasible. ∎
Let be finite sets and . It holds
where and .
Let and be the optimizers for the primal and dual problem of respectively. They obtain the same value as strong duality holds under these settings (Lemma 5). That is to say, . Consider and . This is feasible for the primal problem of as
where we have used the multiplicativity of trace over tensor products and that has unit trace. Next consider , , . This is feasible as
which is always true given being the optimizer of . Furthermore, by again using the unit trace of . Thus we have constructed primal and dual optimizers that achieve the same value, showing they are optimal, and obtain the same optimal value as , which completes the proof. ∎
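A quick classical sanity check of this property is sketched below for a single null and a single alternative hypothesis. It assumes the statement takes the form that appending the same distribution to both hypotheses leaves the optimal value unchanged, and it solves the classical linear program by the likelihood-ratio greedy rule. The distributions `p_null`, `q_alt`, and `tau` are hypothetical.

```python
def beta(p_null, q_alt, eps):
    """min sum(q*d) subject to sum(p*d) >= 1 - eps and 0 <= d <= 1 (greedy solution)."""
    need = 1.0 - eps
    if need <= 0:
        return 0.0
    items = sorted(((q / p, p, q) for p, q in zip(p_null, q_alt) if p > 0),
                   key=lambda item: item[0])
    cost = 0.0
    for _, p, q in items:
        take = min(1.0, need / p)
        cost += take * q
        need -= take * p
        if need <= 1e-12:
            return cost
    return float("inf")

def kron(u, v):
    """Kronecker product of two distributions, i.e. the independent joint distribution."""
    return [ui * vj for ui in u for vj in v]

p_null = [0.5, 0.3, 0.2]
q_alt = [0.1, 0.3, 0.6]
tau = [0.25, 0.75]     # common distribution appended to both hypotheses
eps = 0.1

without_tail = beta(p_null, q_alt, eps)
with_tail = beta(kron(p_null, tau), kron(q_alt, tau), eps)
```

The two values agree up to floating-point error: the appended samples have the same distribution under both hypotheses, so they carry no information for the test.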
With this property of the one-shot hypothesis testing SDP proven, we may prove that the optimal decision function converges in finite time for finite size messages if the null hypothesis is i.i.d. and the received signal returns to the null hypothesis after the end of the transmitted message. We note this final point is not limiting as if it were not to return to the null hypothesis, then in effect the message has not ended.
For all , given a finite set of finite length signals of maximal alphabet size and a finite set of possible i.i.d. null hypotheses, , the optimal asymptotic type II error is achieved in finite time and characterized by .
Let . We begin with a single null hypothesis. Consider a null hypothesis for any . Consider a finite set of possible signals, . Under the assumption that once the signal ends one starts sampling from the null hypothesis, the type II error is given by for all . By Lemma 6, for all . Thus in this setting .
Now consider a finite set . We assume that once the message ends, it samples from the considered null hypothesis. Formally, this means the only null and alternative hypothesis pairs assumed acceptable are of the form where , and we stress that is the same in both parts of the pair. It follows for fixed , the relevant quantity is as this is the single null hypothesis case. This holds for all . It follows that the minimal asymptotic type II error which has at most type I error for is given by . That is, in this setting, . ∎
4.3 Models for Examples
In this section we provide complete calculations of the examples in the text.
Distinguishability Does Not Universally Necessitate Strong Signal
We consider a finite alphabet and sequence , where . The distribution over this sample space will be sampled an infinite number of times in an i.i.d. manner. As noted in the main text, the whole discussion will hold for sequences of any length , but we let for clarity. Following the exposition in the main text, we consider a signal generated by a laser which sends a pulse in one time bin of every five, but suffers from time jitter so that the pulse may be shifted by one time bin forward or backward with probability for both. This leads to an initial probability distribution
We assume that the noise during travel is loss-only and acts on every time bin identically by removing an amount of power , which is a function of the distance travelled and possibly the conditions over the travel path. The map may be defined on its action on the basis sequences as
for all sequences . We assume the noise at the receiver is the composition of two maps. First we assume the data is taken over a short enough time that the sun contributes additive power, which can be described by the linear map
for all sequences . The second map assumes that, with some probability , any given possible sequence may occur. [Footnote 9: Technically, we assume the existence of this map so as to guarantee the support of the alternative hypothesis is contained in the support of the null hypothesis for the sake of our example. The ad-hoc introduction of such a map to guarantee this is largely an aspect of the simplicity of our model. However, it is in general a rigorous way to guarantee both the null and alternative hypothesis are full rank so as to guarantee a finite value, and, by the data-processing inequality for relative entropy along with the Chernoff-Stein lemma [cover2006, Theorem 11.8.3], we know we can only have made the asymptotic error exponent worse by doing this.] This may be defined as
where is the vector of all ’s and is any probability distribution over . Given these maps, and under the assumption , we have our null hypothesis and alternative hypothesis are:
where . Then using the definition of the KL divergence along with the fact that only for the four specified sequences, we have
where . Noting that was assumed, we see that the relative entropy is always a non-zero constant, which completes the derivation.
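The structure of this calculation can be illustrated with a toy model. The sketch below makes several assumptions that are not taken from the derivation: a binary alphabet marking whether a bin is brighter than the solar background, a uniform background distribution with weight `eta`, and hypothetical values for `eta` and the jitter probability. It also assumes the relative entropy is evaluated with the null hypothesis as the first argument.

```python
from itertools import product
from math import log

def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats for dicts over the same sample space."""
    return sum(px * log(px / q[x]) for x, px in p.items() if px > 0)

def with_background(point_masses, eta, sequences):
    """Mix a distribution of point masses with a uniform background of weight eta."""
    uniform = eta / len(sequences)
    return {s: (1 - eta) * point_masses.get(s, 0.0) + uniform for s in sequences}

sequences = list(product((0, 1), repeat=5))
quiet = (0, 0, 0, 0, 0)     # no pulse in any of the five bins
jitter = 0.1                # hypothetical jitter probability per direction
eta = 0.05                  # hypothetical background weight

pulse = {
    (0, 1, 0, 0, 0): jitter,          # pulse shifted one bin earlier
    (0, 0, 1, 0, 0): 1 - 2 * jitter,  # pulse in the intended bin
    (0, 0, 0, 1, 0): jitter,          # pulse shifted one bin later
}

null = with_background({quiet: 1.0}, eta, sequences)
alt = with_background(pulse, eta, sequences)

exponent = kl_divergence(null, alt)
differing = [s for s in sequences if abs(null[s] - alt[s]) > 1e-15]
```

In this toy model the two hypotheses differ on exactly four sequences, so only those four terms contribute to the relative entropy, and the result is strictly positive.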
Overhead Meteor Numerics Derivation
As noted in the main text, a Poisson distribution has a countably infinite number of possible outcomes, but our numerical method requires finite dimensional distributions. This can be handled in the following manner: first we consider an alternate distribution which is the same as the Poisson distribution until an arbitrarily large but finite number . Then where is the cumulative distribution of the Poisson. Thus is a probability distribution on a finite number of mass points and can be made arbitrarily close to the original Poisson distribution. One could then define the alternative hypothesis with regard to . As is still too large, one chooses a maximum cut-off number and projects the distributions onto this truncated space without re-normalizing. This projection is a linear trace non-increasing map, and thus for a fixed , the error can only increase by constructing the decision function on this truncated distribution given Lemma 3. Using this truncated distribution we construct the results for our simple example without loss of rigour.
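The two-step truncation can be sketched as follows. The rate, the lumping point `n_max`, and the cut-off are hypothetical values chosen for illustration, and the function names are ours.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability mass at k for rate lam."""
    return exp(-lam) * lam ** k / factorial(k)

def lumped_poisson(lam, n_max):
    """Poisson distribution on {0, ..., n_max}, with the whole tail mass
    1 - F(n_max - 1) placed on the final point n_max."""
    head = [poisson_pmf(k, lam) for k in range(n_max)]
    return head + [1.0 - sum(head)]

def project(dist, cutoff):
    """Keep outcomes 0..cutoff without re-normalizing (a trace non-increasing map)."""
    return dist[:cutoff + 1]

lam, n_max, cutoff = 3.0, 50, 12      # hypothetical parameters
finite = lumped_poisson(lam, n_max)
truncated = project(finite, cutoff)
lost_mass = 1.0 - sum(truncated)
```

The list `finite` is a normalized distribution on finitely many points, while `truncated` is sub-normalized. The discarded mass `lost_mass` shrinks as the cut-off grows, and by Lemma 3 a decision function built on the truncated distributions can only overestimate the error.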