Agreement Implies Accuracy for Substitutable Signals

Inspired by Aumann's agreement theorem, Scott Aaronson studied the amount of communication necessary for two Bayesian experts to approximately agree on the expectation of a random variable. Aaronson showed that, remarkably, the number of bits does not depend on the amount of information available to each expert. However, in general the agreed-upon estimate may be inaccurate: far from the estimate they would settle on if they were to share all of their information. We show that if the experts' signals are substitutes (meaning the experts' information has diminishing marginal returns), then whenever the experts are close to agreement, they are close to the truth. We prove this result for a broad class of agreement and accuracy measures that includes squared distance and KL divergence. Additionally, we show that although these measures capture fundamentally different kinds of agreement, Aaronson's agreement result generalizes to them as well.


1 Introduction

Suppose that Alice and Bob are honest, rational Bayesians who wish to estimate some quantity—say, the unemployment rate one year from now. Alice is an expert on historical macroeconomic trends, while Bob is an expert on contemporary monetary policy. They convene to discuss and share their knowledge with each other until they reach an agreement about the expected value of the future unemployment rate. Alice and Bob could reach agreement by sharing everything they had ever learned, at which point they would have the same information, but the process would take years. How then should they proceed?

In the seminal work “Agreeing to Disagree,” Aumann [aum76] observed that Alice and Bob can reach agreement simply by taking turns sharing their current expected value for the quantity. In addition to modeling communication between Bayesian agents, protocols similar to this one model financial markets: each trader shares partial information about their expected value on their turn (discussed in Section 5). A remarkable result by Scott Aaronson [aar05] shows that if Alice and Bob follow certain protocols of this form, they will agree to within $\epsilon$ with probability at least $1 - \delta$ by communicating $O(1/(\delta\epsilon^2))$ bits. (Footnote 1: To ensure that each message is short, Alice and Bob share discretized versions of their estimates; we discuss this in Section 2.) Notably, this bound only depends on the error Alice and Bob are willing to tolerate, and not on the amount of information available to them.

Absent from Aaronson’s results, however, is what Alice and Bob agree on. In particular, there is no guarantee that Alice and Bob will be accurate, meaning that their agreed-upon estimate will be close (in, e.g., expected squared distance) to what they would believe if they shared all of their information. In fact, they might agree on something highly inaccurate. Suppose that Alice and Bob have independent, uniformly random bits $\sigma$ and $\tau$, and wish to estimate the XOR $\sigma \oplus \tau$. Alice and Bob agree from the outset, since from each of their perspectives the expected value of $\sigma \oplus \tau$ is $1/2$. Yet this expectation is far from the best estimate given their collective knowledge, which is either $0$ or $1$. So while agreement is fundamental to understanding communication between Bayesians (in Aumann’s terms, they cannot “agree to disagree”), agreement is far from the whole story. An important open question is therefore: what assumptions guarantee that Alice and Bob are accurate once they agree?
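As a concrete check of the XOR example, the following minimal Python sketch enumerates the four equally likely signal pairs and computes exact expectations with `fractions.Fraction`. The helper names `alice_expectation` and `bob_expectation` are our own. We measure disagreement by $\mathbb{E}[(a-b)^2/4]$ and inaccuracy by the squared distance to the full-information estimate, which here equals $Y$ itself.

```python
from fractions import Fraction
from itertools import product

# XOR information structure: sigma and tau are independent uniform bits, Y = sigma XOR tau.
P = {(s, t): Fraction(1, 4) for s, t in product((0, 1), repeat=2)}
Y = {(s, t): s ^ t for s, t in P}  # Y is determined by the pair, so mu_{sigma tau} = Y


def alice_expectation(s):
    """E[Y | sigma = s]: Alice knows only her own bit."""
    mass = sum(P[s, t] for t in (0, 1))
    return sum(P[s, t] * Y[s, t] for t in (0, 1)) / mass


def bob_expectation(t):
    """E[Y | tau = t]: Bob knows only his own bit."""
    mass = sum(P[s, t] for s in (0, 1))
    return sum(P[s, t] * Y[s, t] for s in (0, 1)) / mass


# Expected disagreement E[(a - b)^2 / 4] and Alice's expected squared error to mu_{sigma tau}.
disagreement = sum(p * (alice_expectation(s) - bob_expectation(t)) ** 2 / 4 for (s, t), p in P.items())
alice_error = sum(p * (alice_expectation(s) - Y[s, t]) ** 2 for (s, t), p in P.items())

print(disagreement, alice_error)  # prints: 0 1/4
```

Perfect agreement (zero expected disagreement) coexists here with an expected squared error of $1/4$.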

We address this open problem by introducing a natural condition, called rectangle substitutes, under which agreement implies accuracy. Rectangle substitutes is a notion of informational substitutes: the property that additional information has diminishing marginal returns. The notion of substitutes is ubiquitous in optimization problems, and informational substitutes conditions have recently been used to analyze equilibria in markets [cw16]. We show that under the rectangle substitutes condition, any protocol leading to agreement will also lead to accuracy. We then extend these results beyond the case of squared error, to a broad family of measures of agreement and accuracy including KL divergence.

1.1 Overview of approach and results

In [aar05], Alice and Bob are said to agree if the squared distance between their estimates is small. Likewise, we can say that Alice and Bob are accurate if the squared distance between each of their estimates and the truth is small. In Section 3 we present our first main result: under these definitions, if the information structure satisfies rectangle substitutes, then agreement implies accuracy. In other words, under this assumption, when two Bayesians agree (regardless of how little information they have shared), they necessarily agree on the truth.

The proof involves carefully partitioning the space of posterior beliefs induced by the protocol. Agreement is used to show that Alice’s and Bob’s expectations usually fall into the same partition element, which means that Bob would not learn much from learning the partition element containing Alice’s expectation. Then, the rectangle substitutes condition is used to show that if Bob were to learn Alice’s partition element, then he would be very close to knowing the truth.

Aaronson measures agreement in terms of squared error, yet other measurements like KL divergence may be better suited for some settings. For example, suppose Alice and Bob estimate the probability of a catastrophic event as two very small numbers that differ by orders of magnitude. Under squared error they are said to agree closely, but arguably they disagree strongly, as reflected by their large KL divergence. Motivated by these different ways to measure agreement, we next ask:

1. Can Aaronson’s protocols be generalized to other notions of agreement, such that the number of bits communicated is independent of the amount of information available to Alice and Bob?

2. Do other notions of agreement necessarily imply accuracy under rectangle substitutes?

In Section 4, we give our second and third main results: the answer to both questions is yes. Specifically, the positive results apply when measuring agreement and accuracy using Bregman divergences, a class of error measures that includes both squared distance and KL divergence. (Footnote 2: The third result holds under an “approximate triangle inequality” condition on the Bregman divergence, which is satisfied by most or all natural choices; indeed, it is nontrivial to construct a Bregman divergence that does not satisfy this property.)
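To see the gap between these two measures numerically, the sketch below compares the squared distance and the KL divergence between two probability estimates, $10^{-3}$ and $10^{-6}$. These values are hypothetical and chosen only for illustration; the function names are also our own. The squared distance is below $10^{-6}$, while the KL divergence (in nats) is thousands of times larger.

```python
import math


def squared_distance(y, x):
    return (y - x) ** 2


def bernoulli_kl(y, x):
    """KL(Bernoulli(y) || Bernoulli(x)) in nats; assumes 0 < x, y < 1."""
    return y * math.log(y / x) + (1 - y) * math.log((1 - y) / (1 - x))


# Hypothetical probability estimates for a rare event (illustrative values only).
alice, bob = 1e-3, 1e-6

sq = squared_distance(alice, bob)
kl = bernoulli_kl(alice, bob)
print(f"squared distance: {sq:.3g}, KL divergence: {kl:.3g}, ratio: {kl / sq:.3g}")
```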

Aaronson’s proof of his agreement theorem turns out to be specific to squared distance. Our agreement theorem (Theorem 4.11) modifies Aaronson’s protocol to depend on the particular Bregman divergence, i.e. the relevant error measure. It then proceeds in a manner inspired by Aaronson but using several new ideas. Our proof that agreement implies accuracy under rectangle substitutes for general Bregman divergences also involves some nontrivial changes to our proof for squared distance. In particular, the fact that the length of an interval cannot be inferred from the Bregman divergence between its endpoints necessitates a closer analysis of the partition of Alice’s and Bob’s beliefs.

We conclude in Section 5 with a discussion of connections between agreement protocols and information revelation in financial markets, and discuss an interesting potential avenue for future work.

1.2 Related Work

Our setting is related to but distinct from communication complexity. In that field (e.g. [rao2020communication]), the goal is for Alice and Bob to correctly compute a function of their inputs while communicating as few bits as possible and using any protocol necessary. By contrast, [aar05] considered a goal of agreement, not correctness, and focused on specific natural protocols. He showed that these achieve the goal with a number of bits that does not depend on the size of Alice’s and Bob’s inputs. Our work focuses on Aaronson’s setting. We discuss how our results might be framed in terms of communication complexity in Appendix E.

Our introduction of the substitutes condition is inspired by its usefulness in prediction markets [cw16]. The “expectation-sharing” agreement protocols we study bear a strong similarity to the dynamics of market prices. [ostrovsky2012information] introduced a condition under which convergence of prices in a market implies that all information is aggregated. This can be viewed as an “agreement implies accuracy” condition. We discuss these works and the connection of our work to markets in Section 5. Another similar definition of informational substitutes is used by [nr21b] in the context of robust aggregation of forecasts.

Finally, we note that the “agreement protocols” we study are not related to key agreement protocols in cryptography, where the goal is for two communicating parties to jointly construct a shared string for cryptographic use.

2 Preliminaries

2.1 Information Structures

We consider a set $\Omega$ of states of the world, with a probability distribution $\mathbb{P}$ over the world states. There are two experts, Alice and Bob. Alice learns the value of a random variable $\sigma: \Omega \to \mathcal{S}$; we call $\sigma$ Alice’s signal and $\mathcal{S}$ her signal set. Correspondingly, Bob learns the value of a random variable $\tau: \Omega \to \mathcal{T}$. These signals each convey partial information about the true state $\omega \in \Omega$. Alice and Bob are interested in a third random variable $Y: \Omega \to [0, 1]$. We use the term information structure to refer to the tuple $\mathcal{I} = (\Omega, \mathbb{P}, \mathcal{S}, \mathcal{T}, Y)$.

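We denote by $\mu_{\sigma\tau}$ the random variable that is equal to the expected value of $Y$ conditioned on both Alice’s signal $\sigma$ and Bob’s signal $\tau$. We also define $\mu_\sigma := \mathbb{E}[Y \mid \sigma]$ and $\mu_\tau := \mathbb{E}[Y \mid \tau]$. For a measurable set $S \subseteq \mathcal{S}$, we define $\mu_S := \mathbb{E}[Y \mid \sigma \in S]$; we define $\mu_T$ analogously for $T \subseteq \mathcal{T}$. Additionally, for $T \subseteq \mathcal{T}$, we define $\mu_{\sigma T} := \mathbb{E}[Y \mid \sigma, \tau \in T]$, i.e. the expected value of $Y$ conditioned on the particular value of $\sigma$ and the knowledge that $\tau \in T$. If Alice knows that Bob’s signal belongs to $T$ (and nothing else about his signal), then the expected value of $Y$ conditional on her information is $\mu_{\sigma T}$; we refer to this as Alice’s expectation. Likewise, for $S \subseteq \mathcal{S}$, we define $\mu_{S\tau} := \mathbb{E}[Y \mid \sigma \in S, \tau]$. Finally, we define $\mu_{ST} := \mathbb{E}[Y \mid \sigma \in S, \tau \in T]$. This is the expectation of a third party who only knows that $\sigma \in S$ and $\tau \in T$.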

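In general we often wish to take expectations conditioned on $\sigma \in S, \tau \in T$ (for some $S \subseteq \mathcal{S}$, $T \subseteq \mathcal{T}$). We will use the shorthand $\mathbb{E}[\cdot \mid S, T]$ for $\mathbb{E}[\cdot \mid \sigma \in S, \tau \in T]$ in such cases.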

2.2 Agreement Protocols

The notion of agreement between Alice and Bob is central to our work. We first define agreement in terms of squared error, and generalize to other error measures in Section 4.

Definition 2.1 (ϵ-agree).

Let $a$ and $b$ be Alice’s and Bob’s expectations, respectively ($a$ and $b$ are random variables on $\Omega$). Alice and Bob $\epsilon$-agree if $\mathbb{E}\left[\tfrac{1}{4}(a - b)^2\right] \le \epsilon$.

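The constant $\tfrac{1}{4}$ makes the left-hand side equal to the expected squared distance from each of Alice’s and Bob’s expectations to the average of the two, since $\left(a - \tfrac{a+b}{2}\right)^2 = \tfrac{1}{4}(a - b)^2$.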

Our setting follows [aar05], which examined communication protocols that cause Alice and Bob to agree. In a (deterministic) communication protocol, Alice and Bob take turns sending each other messages. On Alice’s turns, Alice communicates a message that is a deterministic function of her input (i.e. her signal $\sigma$) and all previous communication, and likewise for Bob on his turns. A rectangle is a set of the form $S \times T$ where $S \subseteq \mathcal{S}$ and $T \subseteq \mathcal{T}$. The transcript of the protocol at a time step $t$ (i.e. after $t$ messages have been sent) partitions $\mathcal{S} \times \mathcal{T}$ into rectangles. That is, for any given sequence of $t$ messages, there are subsets $S_t \subseteq \mathcal{S}$ and $T_t \subseteq \mathcal{T}$ such that the protocol transcript at time $t$ is equal to this sequence if and only if $(\sigma, \tau) \in S_t \times T_t$. For a given communication protocol, we may think of $S_t$ and $T_t$ as random variables. Alice’s expectation at time $t$ (i.e. after the $t$-th message has been sent) is $\mu_{\sigma T_t}$ and Bob’s expectation at time $t$ is $\mu_{S_t \tau}$. Finally, the protocol terminates at a certain time $t_{\mathrm{end}}$ (which need not be known in advance of the protocol). While typically in communication complexity a protocol is associated with a final output, in this case we are interested in Alice’s and Bob’s expectations, so we do not require an output.

It will be convenient to hypothesize a third party observer, whom we call Charlie, who observes the protocol but has no other information. At time $t$, Charlie has expectation $\mu_{S_t T_t}$. Charlie’s expectation can also be interpreted as the expectation of $Y$ according to Alice and Bob’s common knowledge.

The following definition formalizes the relationship between communication protocols and agreement.

Definition 2.2 (ϵ-agreement protocol).

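Given an information structure $\mathcal{I}$, a communication protocol causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$ if Alice and Bob $\epsilon$-agree at the end of the protocol. That is, if $\mathbb{E}\left[\tfrac{1}{4}\left(\mu_{\sigma T_{t_{\mathrm{end}}}} - \mu_{S_{t_{\mathrm{end}}} \tau}\right)^2\right] \le \epsilon$, where $t_{\mathrm{end}}$ is the time at which the protocol terminates and the expected value is over Alice’s and Bob’s inputs. We say that a communication protocol is an $\epsilon$-agreement protocol if the protocol causes Alice and Bob to $\epsilon$-agree on every information structure.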

Aaronson defines and analyzes two $\epsilon$-agreement protocols. (Footnote 3: A minor difference to our framing is that [aar05] focuses on probable approximate agreement: protocols that cause the absolute difference between Alice’s and Bob’s expectations to be at most $\epsilon$ with probability at least $1 - \delta$. The results as presented in this section are stronger than those in [aar05] (the original results follow from these as a consequence of Markov’s inequality), but they follow from a straightforward modification of his proofs.) The first of these is the standard protocol, in which Alice and Bob take turns stating their expectations. They do so for a number of time steps that can be computed by Alice and Bob independently in advance of the protocol, and which is guaranteed to be at most $O(1/\epsilon)$.
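The following minimal Python sketch simulates the standard protocol on a finite information structure. It rests on modeling assumptions of our own. The prior `P` is a dictionary from signal pairs $(\sigma, \tau)$ to probabilities, and `MU` stores $\mu_{\sigma\tau}$ for each pair. Exact rational arithmetic is used so that announcements can be compared for equality. The number of rounds is supplied by the caller rather than computed as in [aar05]. The helper names `cond_mean` and `standard_protocol` are hypothetical.

```python
from fractions import Fraction


def cond_mean(P, MU, S, T):
    """mu_{ST} = E[Y | sigma in S, tau in T]; None if that event has probability zero."""
    cells = [(s, t) for s in S for t in T if (s, t) in P]
    mass = sum(P[c] for c in cells)
    return None if mass == 0 else sum(P[c] * MU[c] for c in cells) / mass


def standard_protocol(P, MU, sigma, tau, rounds):
    """Alice and Bob alternately announce their exact expectations for `rounds` turns.

    Returns the final rectangle S x T and the (Alice, Bob) expectations after each turn.
    """
    S = {s for s, _ in P}
    T = {t for _, t in P}
    history = []
    for r in range(rounds):
        if r % 2 == 0:  # Alice announces mu_{sigma T}
            msg = cond_mean(P, MU, {sigma}, T)
            # Everyone discards Alice-signals that would have produced a different announcement.
            S = {s for s in S if cond_mean(P, MU, {s}, T) == msg}
        else:  # Bob announces mu_{S tau}
            msg = cond_mean(P, MU, S, {tau})
            T = {t for t in T if cond_mean(P, MU, S, {t}) == msg}
        history.append((cond_mean(P, MU, {sigma}, T), cond_mean(P, MU, S, {tau})))
    return S, T, history


# XOR example: every announcement is 1/2, so nothing is revealed and the rectangle never shrinks.
P_xor = {(s, t): Fraction(1, 4) for s in (0, 1) for t in (0, 1)}
MU_xor = {(s, t): Fraction(s ^ t) for s, t in P_xor}
S, T, history = standard_protocol(P_xor, MU_xor, sigma=1, tau=0, rounds=4)
print(sorted(S), sorted(T), history[-1])
```

On the XOR structure, both expectations stay at $1/2$: the protocol reaches agreement immediately, but not accuracy. By contrast, with $Y = (\sigma + \tau)/2$ and independent uniform bits, two turns reveal both signals.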

The fact that exchanging their expectations for $O(1/\epsilon)$ time steps results in $\epsilon$-agreement is profound and compelling. However, the standard protocol may require an unbounded number of bits of communication, since Alice and Bob are exchanging real numbers. To address this, Aaronson defines another agreement protocol that is truly polynomial-communication (which we slightly modify for our purposes):

Definition 2.3 (Discretized protocol, [aar05]).

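Choose $\epsilon > 0$. In the discretized protocol with parameter $\epsilon$, on her turn (at time $t$), Alice compares her expectation $\mu_{\sigma T_t}$ with Charlie’s expectation $\mu_{S_t T_t}$, using a fixed threshold determined by $\epsilon$. She sends “low” if her expectation is smaller than Charlie’s by more than the threshold, “high” if it is larger than Charlie’s by more than the threshold, and “medium” otherwise. Bob acts analogously on his turn. At the start of the protocol, Alice and Bob use the information structure to independently compute the time at which the protocol ends, chosen to minimize an objective that both of them can evaluate in advance. The protocol ends at this time.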

Theorem 2.4 ([aar05, Theorem 4]).

The discretized protocol with parameter $\epsilon$ is an $\epsilon$-agreement protocol with transcript length $O(1/\epsilon)$ bits.
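Below is a minimal Python sketch of the discretized protocol on a finite information structure. The prior `P` and the table `MU` of values $\mu_{\sigma\tau}$ are dictionaries keyed by signal pairs, an assumption of ours. The cutoff `threshold` and the number of `rounds` are hypothetical inputs chosen by the caller; the sketch does not reproduce how Definition 2.3 derives them from $\epsilon$. Each message is one of three symbols, so $r$ turns use about $r \log_2 3$ bits.

```python
from fractions import Fraction


def cond_mean(P, MU, S, T):
    """mu_{ST} = E[Y | sigma in S, tau in T]; None if that event has probability zero."""
    cells = [(s, t) for s in S for t in T if (s, t) in P]
    mass = sum(P[c] for c in cells)
    return None if mass == 0 else sum(P[c] * MU[c] for c in cells) / mass


def three_way(own, charlie, threshold):
    """Discretized message comparing the speaker's expectation with Charlie's."""
    if own is None:  # this signal has probability zero given the current rectangle
        return None
    if own < charlie - threshold:
        return "low"
    if own > charlie + threshold:
        return "high"
    return "medium"


def discretized_protocol(P, MU, sigma, tau, threshold, rounds):
    """Run `rounds` turns (Alice first); return the transcript and final expectations."""
    S = {s for s, _ in P}
    T = {t for _, t in P}
    transcript = []
    for r in range(rounds):
        charlie = cond_mean(P, MU, S, T)  # observer's expectation mu_{S_t T_t}
        if r % 2 == 0:  # Alice speaks; listeners keep Alice-signals that send the same message
            msg = three_way(cond_mean(P, MU, {sigma}, T), charlie, threshold)
            S = {s for s in S if three_way(cond_mean(P, MU, {s}, T), charlie, threshold) == msg}
        else:  # Bob speaks
            msg = three_way(cond_mean(P, MU, S, {tau}), charlie, threshold)
            T = {t for t in T if three_way(cond_mean(P, MU, S, {t}), charlie, threshold) == msg}
        transcript.append(msg)
    return transcript, cond_mean(P, MU, {sigma}, T), cond_mean(P, MU, S, {tau})


# Y = (sigma + tau) / 2 with independent uniform bits; the threshold 1/8 is an arbitrary choice.
P = {(s, t): Fraction(1, 4) for s in (0, 1) for t in (0, 1)}
MU = {(s, t): Fraction(s + t, 2) for s, t in P}
print(discretized_protocol(P, MU, sigma=1, tau=0, threshold=Fraction(1, 8), rounds=2))
```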

In general, we refer to Aaronson’s standard and discretized protocols as examples of expectation-sharing protocols. We will define other examples in Section 4, similar to Aaronson’s discretized protocol but with different cutoffs for low, medium, and high. We also interpret expectation-sharing protocols in the context of markets in Section 5.

2.3 Accuracy and Informational Substitutes

Most of our main results give conditions under which, if Alice and Bob $\epsilon$-agree, then Alice’s and Bob’s estimates are accurate. By accurate, we mean that Alice’s and Bob’s expectations are close to $\mu_{\sigma\tau}$, i.e., what they would believe if they knew each other’s signals. (After all, they cannot hope to have a better estimate of $Y$ than $\mu_{\sigma\tau}$; for this reason we sometimes refer to $\mu_{\sigma\tau}$ as the “truth.”) Formally:

Definition 2.5 (ϵ-accurate).

Let $a$ be Alice’s expectation. Alice is $\epsilon$-accurate if $\mathbb{E}[(a - \mu_{\sigma\tau})^2] \le \epsilon$. We define $\epsilon$-accuracy analogously for Bob.

One cannot hope for an unconditional result stating that if Alice and Bob agree, then they are accurate. Consider for instance the XOR information structure from the introduction: Alice and Bob each receive independent, uniformly random bits as input, and $Y$ is the XOR of these bits. Then from the start Alice and Bob agree that the expected value of $Y$ is exactly $1/2$, but this value is far from $\mu_{\sigma\tau}$, which is either $0$ or $1$.

Intuitively, this situation arises because Alice’s and Bob’s signals are informational complements: each signal is not informative by itself, but they are informative when taken together. On the other hand, we say that signals are informational substitutes if learning one signal is less valuable if you already know the other signal. An extreme example is if $\sigma = \tau = Z$ for an arbitrary random variable $Z$. Here $\sigma$ becomes useless upon learning $\tau$ and vice versa. In [cw16], the authors discuss formalizations of several notions of informational substitutes. (Footnote 4: We recommend the arXiv version for the most up-to-date introduction to informational substitutes.) All of these notions capture “diminishing marginal value,” in the sense that, roughly speaking, the value of partial information is a submodular set function. The various definitions proposed by [cw16] only differ in how finely they allow decomposing $\sigma$ and $\tau$ to obtain a marginal unit. Our definition has the same format, but uses a decomposition inspired by information rectangles in communication complexity. Recall that we write $\mathbb{E}[\cdot \mid S, T]$ as shorthand for $\mathbb{E}[\cdot \mid \sigma \in S, \tau \in T]$.

Definition 2.6.

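An information structure satisfies rectangle substitutes if it satisfies weak substitutes on every sub-rectangle, i.e., if for every $S \subseteq \mathcal{S}$, $T \subseteq \mathcal{T}$, we have

$$\mathbb{E}[(Y - \mu_{S\tau})^2 \mid S, T] - \mathbb{E}[(Y - \mu_{\sigma\tau})^2 \mid S, T] \le \mathbb{E}[(Y - \mu_{ST})^2 \mid S, T] - \mathbb{E}[(Y - \mu_{\sigma T})^2 \mid S, T].$$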

We will show that under rectangle substitutes, if Alice and Bob approximately agree, then they are approximately accurate.

Interpreting substitutes.

Both sides of the inequality in Definition 2.6 represent the “value” of learning $\sigma$ as measured by a decrease in error. The left-hand side gives the decrease if one already knows $\tau$ and that $\sigma \in S$; the right-hand side gives the decrease if one only knows that $\sigma \in S$ and $\tau \in T$. Substitutes thus says: the marginal value of learning $\sigma$ is smaller if one already knows $\tau$ than if one does not. This statement should hold for every sub-rectangle $S \times T$. We remark that the inequality can be rearranged to focus instead on the marginal value of $\tau$ rather than $\sigma$. We also note that in the XOR information structure (taking $S$ and $T$ to be the full signal sets), the left-hand side of the inequality is $1/4$ while the right-hand side is zero: a large violation of the substitutes condition. In the example $\sigma = \tau$, the left side is always zero.

[cw16] discusses three interpretations of substitutes, which motivate it as a natural condition. (1) Each side of the inequality measures an improvement in prediction error, here the squared loss, due to learning $\sigma$. Under substitutes, the improvement is smaller if one already knows $\tau$. (2) Each side measures a decrease in uncertainty due to learning $\sigma$. Under substitutes, $\sigma$ provides less information about $Y$ if one already knows $\tau$. (Footnote 5: Here, uncertainty is measured by the variance of one’s belief. Under the KL divergence analogue covered in Section 4.1, uncertainty is measured in bits via Shannon entropy.) (3) Each side measures the decrease in distance of a posterior expectation from the truth when learning $\sigma$. The distance to the truth changes less if one already knows $\tau$.

2.4 The Pythagorean Theorem

We will use the following fact throughout. We defer the proof to Appendix C, where we establish a more general version of this statement.

Proposition 2.7 (Pythagorean theorem).

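Let $A$ be a random variable, let $B := \mathbb{E}[A \mid \mathcal{F}]$, where $\mathcal{F}$ is a sigma-algebra, and let $C$ be a random variable defined on $\mathcal{F}$ (i.e., $\mathcal{F}$-measurable). Then

$$\mathbb{E}[(A - C)^2] = \mathbb{E}[(A - B)^2] + \mathbb{E}[(B - C)^2].$$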

We use the phrase Pythagorean theorem in part because of its form, and in part because it is precisely the familiar Pythagorean theorem when the random variables are viewed as points in a Hilbert space with inner product $\langle X, Z \rangle := \mathbb{E}[XZ]$. (Footnote 6: We do not make use of this abstraction in our work, but we refer the interested reader to [zidak57].)

Informally, $A$ is a random variable, $B$ is the expected value of $A$ conditional on some partial information, and $C$ is a random variable that only depends on this information. So the theorem applies when $B$ is a coarse estimate of $A$ and $C$ is at least as coarse as $B$, a scenario that often occurs in our setting.
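A quick numerical sanity check of Proposition 2.7 on a finite probability space is given below. It is our own toy example with made-up probabilities and values. The sigma-algebra $\mathcal{F}$ is generated by a two-block partition, $B$ averages $A$ within each block, and $C$ is an arbitrary function that is constant on blocks. Exact rational arithmetic makes the identity hold with equality rather than up to rounding.

```python
from fractions import Fraction

# Toy probability space: four outcomes, a random variable A, and a partition generating F.
prob = {"w1": Fraction(1, 6), "w2": Fraction(1, 3), "w3": Fraction(1, 4), "w4": Fraction(1, 4)}
A = {"w1": 3, "w2": -1, "w3": 5, "w4": 2}
block = {"w1": "x", "w2": "x", "w3": "y", "w4": "y"}  # the F-block containing each outcome
C_on_block = {"x": Fraction(7), "y": Fraction(-2)}    # C is F-measurable: constant on blocks


def expect(f):
    """E[f] under prob."""
    return sum(prob[w] * f(w) for w in prob)


def B(w):
    """B = E[A | F] at outcome w: the probability-weighted mean of A over w's block."""
    same = [v for v in prob if block[v] == block[w]]
    return sum(prob[v] * A[v] for v in same) / sum(prob[v] for v in same)


def C(w):
    return C_on_block[block[w]]


lhs = expect(lambda w: (A[w] - C(w)) ** 2)
rhs = expect(lambda w: (A[w] - B(w)) ** 2) + expect(lambda w: (B(w) - C(w)) ** 2)
print(lhs, rhs, lhs == rhs)
```

The identity holds because the cross term $\mathbb{E}[(A - B)(B - C)]$ vanishes: $B - C$ is constant on each block, and $A - B$ averages to zero there.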

One application of the Pythagorean theorem in our context takes $A = Y$, $B = \mu_{\sigma\tau}$ (the expected value of $Y$ conditioned on the experts’ signals), and $C = \mu_{\sigma T}$ (Alice’s expected value, which only depends on her signal and thus on the signal pair). This particular application, along with the symmetric one taking $C = \mu_{S\tau}$, allows us to rewrite the rectangle substitutes condition in a form that we will find more convenient:

Remark 2.8.

An information structure satisfies rectangle substitutes if and only if

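$$\mathbb{E}[(\mu_{\sigma\tau} - \mu_{S\tau})^2 \mid S, T] \le \mathbb{E}[(\mu_{\sigma T} - \mu_{ST})^2 \mid S, T] \qquad (1)$$

for all $S \subseteq \mathcal{S}$, $T \subseteq \mathcal{T}$.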
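Equation (1) only involves the values $\mu_{\sigma\tau}$, so for finite signal sets the condition can be checked by brute force. The following Python sketch is our own illustration, and the names `substitutes_gap` and `worst_rectangle` are hypothetical. It represents the prior `P` and the table `MU` of values $\mu_{\sigma\tau}$ as dictionaries keyed by signal pairs. It then computes the left-hand side minus the right-hand side of (1) on every rectangle with positive probability and reports the largest gap. A positive maximum means rectangle substitutes fails.

```python
from fractions import Fraction
from itertools import chain, combinations


def nonempty_subsets(xs):
    xs = sorted(xs)
    return chain.from_iterable(combinations(xs, k) for k in range(1, len(xs) + 1))


def cond_mean(P, MU, S, T):
    """mu_{ST} = E[Y | sigma in S, tau in T]; None if that event has probability zero."""
    cells = [(s, t) for s in S for t in T if (s, t) in P]
    mass = sum(P[c] for c in cells)
    return None if mass == 0 else sum(P[c] * MU[c] for c in cells) / mass


def substitutes_gap(P, MU, S, T):
    """LHS minus RHS of Equation (1) on S x T; a positive value is a violation."""
    cells = [(s, t) for s in S for t in T if P.get((s, t), 0) > 0]
    mass = sum(P[c] for c in cells)
    mu_ST = cond_mean(P, MU, S, T)
    lhs = sum(P[s, t] * (MU[s, t] - cond_mean(P, MU, S, {t})) ** 2 for s, t in cells)
    rhs = sum(P[s, t] * (cond_mean(P, MU, {s}, T) - mu_ST) ** 2 for s, t in cells)
    return (lhs - rhs) / mass


def worst_rectangle(P, MU):
    """Brute force over all rectangles (exponential; only for tiny signal sets)."""
    signals_a = {s for s, _ in P}
    signals_b = {t for _, t in P}
    worst = None
    for S in map(set, nonempty_subsets(signals_a)):
        for T in map(set, nonempty_subsets(signals_b)):
            if not any(P.get((s, t), 0) > 0 for s in S for t in T):
                continue  # conditioning on a null event is undefined
            gap = substitutes_gap(P, MU, S, T)
            if worst is None or gap > worst[0]:
                worst = (gap, S, T)
    return worst


P_xor = {(s, t): Fraction(1, 4) for s in (0, 1) for t in (0, 1)}
MU_xor = {(s, t): Fraction(s ^ t) for s, t in P_xor}
P_same = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}  # sigma = tau
MU_same = {(0, 0): Fraction(0), (1, 1): Fraction(1)}
print(worst_rectangle(P_xor, MU_xor)[0], worst_rectangle(P_same, MU_same)[0])
```

For XOR the largest gap is $1/4$, attained on the full rectangle. For $\sigma = \tau$ it is $0$, so that structure satisfies rectangle substitutes.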

3 Results for Squared Distance

Our main results show that, under the rectangle substitutes condition, any communication protocol that causes Alice and Bob to agree also causes them to be accurate. We now show the first of these results, which is specific to the squared distance error measure that we have been discussing.

3.1 Agreement Implies Accuracy

Theorem 3.1.

Let $\mathcal{I}$ be an information structure that satisfies rectangle substitutes. For any communication protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$, Alice and Bob are $10\epsilon^{1/3}$-accurate after the protocol terminates.

The crux of the argument is the following lemma.

Lemma 3.2.

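Let $\mathcal{I}$ be an information structure that satisfies rectangle substitutes. Let $\epsilon := \mathbb{E}[(\mu_\sigma - \mu_\tau)^2]$. Then

$$\mathbb{E}[(\mu_{\sigma\tau} - \mu_\tau)^2] \le 6\epsilon^{1/3}.$$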

Let us first prove Theorem 3.1 assuming Lemma 3.2 is true.

Proof of Theorem 3.1.

Consider any protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$. Let $S$ be the set of possible signals of Alice at the end of the protocol which are consistent with the protocol transcript, and define $T$ likewise for Bob. Intuitively, $S \times T$ is the set of plausible signal pairs according to an external observer of the protocol. Observe that $S$ and $T$ are random variables, each a function of both $\sigma$ and $\tau$. We have

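$$\begin{aligned}
\mathbb{E}[(\mu_{\sigma\tau} - \mu_{S\tau})^2] &= \mathbb{E}_{S,T}\big[\mathbb{E}[(\mu_{\sigma\tau} - \mu_{S\tau})^2 \mid S, T]\big] \\
&\le \mathbb{E}_{S,T}\big[6\,(\mathbb{E}[(\mu_{\sigma T} - \mu_{S\tau})^2 \mid S, T])^{1/3}\big] \\
&\le 6\,\big(\mathbb{E}_{S,T}\big[\mathbb{E}[(\mu_{\sigma T} - \mu_{S\tau})^2 \mid S, T]\big]\big)^{1/3} \\
&= 6\,\mathbb{E}[(\mu_{\sigma T} - \mu_{S\tau})^2]^{1/3} \le 6(4\epsilon)^{1/3} \le 10\epsilon^{1/3}.
\end{aligned}$$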

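In the second step, we apply Lemma 3.2 to the information structure restricted to $S \times T$. This is the information structure obtained by conditioning $\mathbb{P}$ on the event $(\sigma, \tau) \in S \times T$, with signal sets $S$ and $T$. (Note that we use the fact that if $\mathcal{I}$ satisfies rectangle substitutes, then so does this restricted structure; this is because a rectangle of the restricted structure is also a rectangle of $\mathcal{I}$.) The third step follows by the concavity of $x \mapsto x^{1/3}$ (Jensen’s inequality). The step that introduces $4\epsilon$ uses the fact that Alice and Bob $\epsilon$-agree, i.e. $\mathbb{E}[(\mu_{\sigma T} - \mu_{S\tau})^2] \le 4\epsilon$. Therefore, Bob is $10\epsilon^{1/3}$-accurate (and Alice is likewise by symmetry). ∎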
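The numerical constant in the final step of the display comes from the following arithmetic:

```latex
6(4\epsilon)^{1/3} = 6 \cdot 4^{1/3}\,\epsilon^{1/3} \approx 6 \times 1.587\,\epsilon^{1/3} \approx 9.52\,\epsilon^{1/3} \le 10\,\epsilon^{1/3}.
```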

The proof of Lemma 3.2 relies on the following claim. We defer the proof of Lemma 3.2 (and Claim 3.3) to Appendix B, and instead sketch the proofs here.

Claim 3.3.

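For any positive integer $N$, it is possible to partition $[0, 1]$ into $N$ intervals $I_1, \dots, I_N$ so that each interval has length $O(1/N)$, and

$$\mathbb{P}[k(\sigma) \ne k(\tau)] \le N\sqrt{\epsilon},$$

where $k(\sigma)$ denotes the $k$ such that $\mu_\sigma \in I_k$, and $k(\tau)$ is defined analogously. (Footnote 7: For convenience we define and to be some number greater than .)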

Intuitively, Claim 3.3 is true because if $\epsilon$ is small, then $\mu_\sigma$ and $\mu_\tau$ are likely to fall into the same interval.

We now sketch the proof of Lemma 3.2. To see why Claim 3.3 is relevant, recall that we wish to upper bound the expectation of $(\mu_{\sigma\tau} - \mu_\tau)^2$. Let $S(k) := \{\sigma' \in \mathcal{S} : \mu_{\sigma'} \in I_k\}$. By the Pythagorean theorem, we have

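$$\mathbb{E}[(\mu_{\sigma\tau} - \mu_\tau)^2] = \mathbb{E}[(\mu_{\sigma\tau} - \mu_{S(k(\sigma))\tau})^2] + \mathbb{E}[(\mu_{S(k(\sigma))\tau} - \mu_\tau)^2].$$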

By using the rectangle substitutes condition for $S(k) \times \mathcal{T}$ for every $k$, we find that

$$\mathbb{E}[(\mu_\sigma - \mu_{S(k(\sigma))})^2] \ge \mathbb{E}[(\mu_{\sigma\tau} - \mu_{S(k(\sigma))\tau})^2]. \qquad (2)$$

Therefore, we have

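$$\mathbb{E}[(\mu_{\sigma\tau} - \mu_\tau)^2] \le \mathbb{E}[(\mu_\sigma - \mu_{S(k(\sigma))})^2] + \mathbb{E}[(\mu_{S(k(\sigma))\tau} - \mu_\tau)^2]. \qquad (3)$$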

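Claim 3.3 lets us argue that both terms are small. The first term is small because $\mu_\sigma$ and $\mu_{S(k(\sigma))}$ both lie in the interval $I_{k(\sigma)}$, and hence are always within $O(1/N)$ of each other. The second term is also small because, conditioned on $\tau$, $k(\sigma)$ is known with high probability. We find that choosing $N$ on the order of $\epsilon^{-1/6}$, which balances the two terms, gives us the bound in Lemma 3.2.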
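To see why the exponent $1/3$ appears, here is a rough balancing calculation under simplifying assumptions of our own; these are not the constants of the full proof in Appendix B. Suppose the first term in (3) is at most $(c/N)^2$, because both points lie in an interval of length at most $c/N$. Suppose also that the second term is at most $C N \sqrt{\epsilon}$ for some constant $C$, mirroring the probability bound of Claim 3.3. Then:

```latex
\Big(\frac{c}{N}\Big)^2 + C N \sqrt{\epsilon}\ \Big|_{N = \epsilon^{-1/6}}
  = c^2 \epsilon^{1/3} + C\, \epsilon^{-1/6}\, \epsilon^{1/2}
  = (c^2 + C)\, \epsilon^{1/3}.
```

Since $N$ must be an integer, rounding $\epsilon^{-1/6}$ only changes the constants.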

Theorem 3.1 is a general result about agreement protocols. Applying the result to Aaronson’s discretized protocol gives us the following result.

Corollary 3.4.

Let $\mathcal{I}$ be any information structure that satisfies rectangle substitutes. For any $\epsilon > 0$, Alice and Bob will be $\epsilon$-accurate after running Aaronson’s discretized protocol with parameter $\epsilon^3/1000$ (and this takes $O(1/\epsilon^3)$ bits of communication).
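The parameter choice can be checked directly. Theorem 3.1 turns $\epsilon'$-agreement into $10\epsilon'^{1/3}$-accuracy, as in the final line of its proof. We take the discretized protocol with parameter $\epsilon'$ to use $O(1/\epsilon')$ bits, per Theorem 2.4:

```latex
10\left(\frac{\epsilon^3}{1000}\right)^{1/3} = 10 \cdot \frac{\epsilon}{10} = \epsilon,
\qquad
O\!\left(\frac{1}{\epsilon'}\right)\Big|_{\epsilon' = \epsilon^3/1000}
  = O\!\left(\frac{1000}{\epsilon^3}\right) = O\!\left(\frac{1}{\epsilon^3}\right).
```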

Remark 3.5.

The discretized protocol is not always the most efficient agreement protocol. For example, Proposition B.1 shows that if the rectangle substitutes condition holds, agreement (and therefore accuracy) can be reached with fewer bits of communication, an improvement on Corollary 3.4. We discuss communication complexity further in Appendix E. Even if more efficient protocols are sometimes possible, expectation-sharing protocols are of interest because they model naturally-occurring communication processes. For example, they capture the dynamics of prices in markets, which we also discuss in Section 5. More generally, we find it remarkable that Alice and Bob become accurate by running the agreement protocol (indeed any agreement protocol), despite such protocols being designed with only agreement in mind.

Finally, we observe the following important consequence of Theorem 3.1: once Alice and Bob agree, they continue to agree.

Corollary 3.6.

Let $\mathcal{I}$ be an information structure that satisfies rectangle substitutes. Consider a communication protocol with the property that Alice and Bob $\epsilon$-agree after round $t$. Then Alice and Bob $10\epsilon^{1/3}$-agree on all subsequent time steps.

Proof.

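If Alice and Bob $\epsilon$-agree after round $t$, then by Theorem 3.1 (applied to the protocol stopped after round $t$) they are $10\epsilon^{1/3}$-accurate at that time. In particular, $\mathbb{E}[(\mu_{\sigma\tau} - \mu_{\sigma T_t})^2] \le 10\epsilon^{1/3}$. Note that $\mathbb{E}[(\mu_{\sigma\tau} - \mu_{\sigma T_s})^2]$ is a non-increasing function of $s$, since for any $s_1 \le s_2$ we have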

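$$\mathbb{E}[(\mu_{\sigma\tau} - \mu_{\sigma T_{s_1}})^2] = \mathbb{E}[(\mu_{\sigma\tau} - \mu_{\sigma T_{s_2}})^2] + \mathbb{E}[(\mu_{\sigma T_{s_2}} - \mu_{\sigma T_{s_1}})^2]$$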

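by the Pythagorean theorem. Therefore, for any $s \ge t$, we have $\mathbb{E}[(\mu_{\sigma\tau} - \mu_{\sigma T_s})^2] \le 10\epsilon^{1/3}$. Symmetrically, we have $\mathbb{E}[(\mu_{\sigma\tau} - \mu_{S_s\tau})^2] \le 10\epsilon^{1/3}$. Since $(x - z)^2 \le 2(x - y)^2 + 2(y - z)^2$, it follows that

$$\mathbb{E}\left[\tfrac{1}{4}(\mu_{\sigma T_s} - \mu_{S_s\tau})^2\right] \le \tfrac{1}{2}\mathbb{E}[(\mu_{\sigma T_s} - \mu_{\sigma\tau})^2] + \tfrac{1}{2}\mathbb{E}[(\mu_{\sigma\tau} - \mu_{S_s\tau})^2] \le 10\epsilon^{1/3},$$

which means that after round $s$, Alice and Bob $10\epsilon^{1/3}$-agree. ∎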

Corollary 3.6 stands in contrast to the more general case, in which it is possible that Alice and Bob “nearly agree for the first time steps, then disagree violently at the -th step” [aar05, §2.2]. Thus, while Theorem 3.1 is primarily a statement about accuracy, an agreement property falls out naturally: under the rectangle substitutes condition, once Alice and Bob are close to agreement, they will remain in relatively close agreement into the future.

3.2 Graceful Decay Under Closeness to Rectangle Substitutes

In a sense, the rectangle substitutes condition is quite strong: it requires that the weak substitutes condition be satisfied on every sub-rectangle. One might hope for a result that generalizes Theorem 3.1 to information structures that almost satisfy the rectangle substitutes condition but do not quite. Let us formally define a notion of closeness to rectangle substitutes.

Definition 3.7.

An information structure $\mathcal{I}$ satisfies $\delta$-approximate rectangle substitutes if, for every partition of $\mathcal{S} \times \mathcal{T}$ into rectangles, the rectangle substitutes condition holds in expectation over the partition, up to an additive constant of $\delta$. (Footnote 8: There are partitions into rectangles that cannot arise from a communication protocol. Our results would apply equally if this condition were instead defined for every partition that could arise from a communication protocol, but we state this condition more generally so that it could be applicable in a broader context than the analysis of communication protocols.) That is, we require

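$$\mathbb{E}_{\sigma,\tau}\big[(\mu_{\sigma\tau} - \mu_{S_{\sigma,\tau}\tau})^2\big] \le \mathbb{E}_{\sigma,\tau}\big[(\mu_{\sigma T_{\sigma,\tau}} - \mu_{S_{\sigma,\tau} T_{\sigma,\tau}})^2\big] + \delta, \qquad (4)$$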

where $S_{\sigma,\tau} \times T_{\sigma,\tau}$ is the rectangle containing $(\sigma, \tau)$.

Remark 3.8.

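The $\delta$-approximate rectangle substitutes property is a relaxation of the rectangle substitutes property, in the sense that the two are equivalent if $\delta = 0$. To see this, first observe that if $\mathcal{I}$ satisfies rectangle substitutes, then Equation 1 holds on every rectangle of any partition, so Equation 4 holds with $\delta = 0$ after taking the expectation over the partition. In the other direction, suppose that $\mathcal{I}$ satisfies $0$-approximate rectangle substitutes. Let $S \subseteq \mathcal{S}$, $T \subseteq \mathcal{T}$, and consider the partition of $\mathcal{S} \times \mathcal{T}$ into rectangles that contains $S \times T$ and, separately, every other signal pair in its own rectangle. Singleton rectangles contribute zero to both sides of Equation 4. For this partition, Equation 4 therefore reduces to Equation 1 (the rectangle substitutes condition for $S$ and $T$), multiplied by $\mathbb{P}[(\sigma, \tau) \in S \times T]$.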

Theorem 3.1 generalizes to approximate rectangle substitutes as follows.

Theorem 3.9.

Let $\mathcal{I}$ be an information structure that satisfies $\delta$-approximate rectangle substitutes. For any communication protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$, Alice and Bob are $(10\epsilon^{1/3} + \delta)$-accurate after the protocol terminates.

Proof.

We first observe that Lemma 3.2 can be modified as follows.

Lemma 3.10.

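Let $\mathcal{I}$ be an information structure that satisfies $\delta$-approximate rectangle substitutes. Let $\epsilon := \mathbb{E}[(\mu_\sigma - \mu_\tau)^2]$. Then

$$\mathbb{E}[(\mu_{\sigma\tau} - \mu_\tau)^2] \le 6\epsilon^{1/3} + \delta.$$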

The proof of Lemma 3.10 is exactly the same as that of Lemma 3.2, except that Equation 2 (Equation 6 in the full proof) includes an additive $\delta$ term on the left-hand side:

$$\mathbb{E}[(\mu_\sigma - \mu_{S(k(\sigma))})^2] + \delta \ge \mathbb{E}[(\mu_{\sigma\tau} - \mu_{S(k(\sigma))\tau})^2].$$

This modified inequality follows immediately from the $\delta$-approximate rectangle substitutes condition, noting that one partition of $\mathcal{S} \times \mathcal{T}$ into rectangles is $\{S(k) \times \mathcal{T}\}_{k=1}^{N}$. The extra $\delta$ term produces the $\delta$ term in the lemma statement.

To prove the theorem, let $S$ be the set of possible signals of Alice at the end of the protocol which are consistent with the protocol transcript, and define $T$ likewise for Bob. Let $\delta_{ST}$ be the minimum $\delta'$ such that the information structure restricted to $S \times T$ satisfies $\delta'$-approximate rectangle substitutes. Note that $\mathbb{E}_{S,T}[\delta_{ST}] \le \delta$. Otherwise, taking the union of the worst-case partitions for each $S \times T$ would exhibit a partition of $\mathcal{S} \times \mathcal{T}$ into rectangles that violates the $\delta$-approximate rectangle substitutes property. Therefore we have

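$$\begin{aligned}
\mathbb{E}[(\mu_{\sigma\tau} - \mu_{S\tau})^2] &= \mathbb{E}_{S,T}\big[\mathbb{E}[(\mu_{\sigma\tau} - \mu_{S\tau})^2 \mid S, T]\big] \\
&\le \mathbb{E}_{S,T}\big[6\,(\mathbb{E}[(\mu_{\sigma T} - \mu_{S\tau})^2 \mid S, T])^{1/3} + \delta_{ST}\big] \\
&\le 6\,\big(\mathbb{E}_{S,T}\big[\mathbb{E}[(\mu_{\sigma T} - \mu_{S\tau})^2 \mid S, T]\big]\big)^{1/3} + \delta \\
&= 6\,\mathbb{E}[(\mu_{\sigma T} - \mu_{S\tau})^2]^{1/3} + \delta \le 6(4\epsilon)^{1/3} + \delta \le 10\epsilon^{1/3} + \delta.
\end{aligned}$$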

As in the proof of Theorem 3.1, the second step follows by applying Lemma 3.10 to the information structure restricted to $S \times T$, which satisfies $\delta_{ST}$-approximate rectangle substitutes. ∎

4 Results for Other Divergence Measures

Squared distance is a compelling error measure because it elicits the mean. That is, suppose you wish to estimate a random variable $Y$ and will be penalized according to the squared distance between $Y$ and your estimate. Then the strategy that minimizes your expected penalty is to report the expected value of $Y$ (conditional on the information you have). This is in contrast to, e.g., absolute distance as an error measure, which would instead elicit the median of your distribution. The class of error measures that elicit the mean is precisely the class of Bregman divergences (defined below).
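A quick numerical illustration of eliciting the mean versus the median is given below. The Python sketch takes a small, skewed sample as a stand-in for the distribution of $Y$ (made-up values). It then searches a grid for the estimate with the smallest average penalty under each error measure. With squared error the minimizer is the sample mean, $0.26$; with absolute error it is the sample median, $0.1$.

```python
from statistics import mean, median

# A small, skewed sample standing in for the distribution of Y (illustrative values only).
ys = [0.0, 0.0, 0.1, 0.2, 1.0]
grid = [i / 1000 for i in range(1001)]  # candidate estimates x in [0, 1]


def expected_loss(loss, x):
    """Average penalty loss(y, x) over the sample, i.e. E[loss(Y, x)] for Y uniform on ys."""
    return sum(loss(y, x) for y in ys) / len(ys)


best_squared = min(grid, key=lambda x: expected_loss(lambda y, z: (y - z) ** 2, x))
best_absolute = min(grid, key=lambda x: expected_loss(lambda y, z: abs(y - z), x))
print(best_squared, mean(ys))     # squared loss is minimized at (a grid point next to) the mean
print(best_absolute, median(ys))  # absolute loss is minimized at the median
```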

In this section, our main result is a generalization of Theorem 3.1 to (almost) arbitrary Bregman divergences (see e.g. Theorem 4.14). Additionally, we provide a generalization of Aaronson’s discretized protocol to arbitrary Bregman divergences (Theorem 4.11).

4.1 Preliminaries on Bregman Divergences

Definition 4.1.

Given a differentiable, strictly convex function $G$ (Footnote 9: When we say “differentiable,” we mean differentiable on the interior of the interval on which $G$ is defined.) and points $x, y$ in its domain, the Bregman divergence from $y$ to $x$ is

$$D_G(y \parallel x) := G(y) - G(x) - (y - x)\,G'(x).$$

Proposition 4.2 ([banerjee2005clustering]).

Given a random variable $Y$, the quantity $\mathbb{E}[D_G(Y \parallel x)]$ is minimized by $x = \mathbb{E}[Y]$.

An intuitive formulation of Bregman divergence is that $D_G(y \parallel x)$ can be found by drawing the line tangent to $G$ at $x$ and computing how far below the point $(y, G(y))$ this line passes. We illustrate this in Figure 1. Note that the Bregman divergence is not in general symmetric in its arguments. Indeed, up to positive scaling and the addition of affine terms, $G(x) = x^2$ is the only choice of $G$ for which it is.

The Bregman divergence with respect to $G(x) = x^2$ is precisely the squared distance. Another common Bregman divergence is the KL divergence, which corresponds to $G(x) = x \ln x + (1 - x)\ln(1 - x)$, the negative of Shannon entropy.
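The following Python sketch is a direct transcription of Definition 4.1; the helper names are ours. It checks numerically that $G(x) = x^2$ gives squared distance, and that negative binary entropy (with natural logarithms) gives the KL divergence between Bernoulli distributions. It also shows that the latter is not symmetric.

```python
import math


def bregman(G, dG, y, x):
    """D_G(y || x) = G(y) - G(x) - (y - x) G'(x): how far the tangent to G at x lies below G(y)."""
    return G(y) - G(x) - (y - x) * dG(x)


def square(x):
    return x * x


def d_square(x):
    return 2 * x


def neg_entropy(x):
    """Negative binary Shannon entropy in nats, used here on the open interval (0, 1)."""
    return x * math.log(x) + (1 - x) * math.log(1 - x)


def d_neg_entropy(x):
    return math.log(x) - math.log(1 - x)


def bernoulli_kl(y, x):
    """KL(Bernoulli(y) || Bernoulli(x)) in nats."""
    return y * math.log(y / x) + (1 - y) * math.log((1 - y) / (1 - x))


y, x = 0.3, 0.6
print(bregman(square, d_square, y, x), (y - x) ** 2)                  # squared distance
print(bregman(neg_entropy, d_neg_entropy, y, x), bernoulli_kl(y, x))  # KL divergence
print(bregman(neg_entropy, d_neg_entropy, x, y))                      # swapped arguments
```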

We generalize relevant notions such as agreement and accuracy to arbitrary Bregman divergences as follows. In the definitions below, is a differentiable, strictly convex function.

Definition 4.3.

Let $a$ be Alice’s expectation. Alice is $\epsilon$-accurate if $\mathbb{E}[D_G(\mu_{\sigma\tau} \parallel a)] \le \epsilon$, and likewise for Bob.

We discuss our choice of the order of these two arguments (i.e., why we do not instead consider the expected Bregman divergence from Alice’s expectation to $\mu_{\sigma\tau}$) in Appendix D. We now define $\epsilon$-agreement; to do so, we first define the Jensen-Bregman divergence.

Definition 4.4.

For $a, b \in [0,1]$, the Jensen-Bregman divergence between $a$ and $b$ with respect to $G$ is

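$$JB_G(a, b) := \frac{1}{2}\left(D_G\!\left(a \,\middle\|\, \frac{a+b}{2}\right) + D_G\!\left(b \,\middle\|\, \frac{a+b}{2}\right)\right) = \frac{G(a) + G(b)}{2} - G\!\left(\frac{a+b}{2}\right).$$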

The second equality follows directly from the definition of Bregman divergence: writing $m = \frac{a+b}{2}$, the linear terms $(a - m)G'(m)$ and $(b - m)G'(m)$ cancel because $(a - m) + (b - m) = 0$. Note that the Jensen-Bregman divergence, unlike the Bregman divergence, is symmetric in its arguments. The Jensen-Bregman divergence is a lower bound on the average Bregman divergence from Alice’s and Bob’s expectations to any other point (see Proposition C.1, part 1).
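The following minimal sketch checks these properties numerically for the negative Shannon entropy on random points. It also checks a slightly stronger identity that a short calculation from the definitions gives (our own derivation, not a statement from the text): for any $x$, the average $\frac{1}{2}(D_G(a \parallel x) + D_G(b \parallel x))$ equals $JB_G(a, b) + D_G(m \parallel x)$ with $m = \frac{a+b}{2}$, which immediately yields the lower bound. Function names are illustrative.

```python
import math
import random


def xlogx(t):
    return 0.0 if t == 0 else t * math.log(t)


def neg_entropy(t):
    return xlogx(t) + xlogx(1 - t)


def d_neg_entropy(t):
    return math.log(t / (1 - t))


def bregman(G, dG, y, x):
    return G(y) - G(x) - (y - x) * dG(x)


def jb_from_definition(G, dG, a, b):
    """First expression in Definition 4.4: average divergence to the midpoint."""
    m = (a + b) / 2
    return 0.5 * (bregman(G, dG, a, m) + bregman(G, dG, b, m))


def jb(G, a, b):
    """Closed form (G(a) + G(b)) / 2 - G((a + b) / 2)."""
    return (G(a) + G(b)) / 2 - G((a + b) / 2)


rng = random.Random(0)
for _ in range(1000):
    a, b, x = (rng.uniform(0.01, 0.99) for _ in range(3))
    value = jb(neg_entropy, a, b)
    # The two expressions in Definition 4.4 agree, and JB_G is symmetric.
    assert math.isclose(value, jb_from_definition(neg_entropy, d_neg_entropy, a, b), abs_tol=1e-12)
    assert math.isclose(value, jb(neg_entropy, b, a), abs_tol=1e-12)
    # Average divergence from a and b to an arbitrary x equals JB_G(a, b) + D_G(m || x),
    # so it is never smaller than JB_G(a, b).
    avg = 0.5 * (bregman(neg_entropy, d_neg_entropy, a, x)
                 + bregman(neg_entropy, d_neg_entropy, b, x))
    m = (a + b) / 2
    assert math.isclose(avg, value + bregman(neg_entropy, d_neg_entropy, m, x), abs_tol=1e-12)
    assert avg >= value - 1e-12
```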

Definition 4.5.

Let $a$ and $b$ be Alice’s and Bob’s expectations, respectively. Alice and Bob $\epsilon$-agree with respect to $G$ if $\mathbb{E}[JB_G(a, b)] \le \epsilon$.

In Appendix D we discuss alternative definitions of agreement and accuracy. The upshot of this discussion is that our definition of agreement is the weakest reasonable one, and our definition of accuracy is the strongest reasonable one. This means that the main result of this section (that under a wide class of Bregman divergences, agreement implies accuracy) is quite powerful: it starts with a weak premise and proves a strong conclusion.

Definition 4.6.

Given an information structure $\mathcal{I}$, a communication protocol causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$ with respect to $G$ if Alice and Bob $\epsilon$-agree with respect to $G$ at the end of the protocol. A communication protocol is an $\epsilon$-agreement protocol with respect to $G$ if the protocol causes Alice and Bob to $\epsilon$-agree with respect to $G$ on every information structure.

We also generalize the notion of rectangle substitutes to this domain, following [cw16], which explored notions of substitutes for arbitrary Bregman divergences.

Definition 4.7.

Let $G$ be a differentiable, strictly convex function. An information structure satisfies rectangle substitutes with respect to $G$ if for all sets $S$ of Alice’s signals and $T$ of Bob’s signals, we have

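$$\mathbb{E}[D_G(Y \parallel \mu_{S\tau}) \mid S, T] - \mathbb{E}[D_G(Y \parallel \mu_{\sigma\tau}) \mid S, T] \le \mathbb{E}[D_G(Y \parallel \mu_{ST}) \mid S, T] - \mathbb{E}[D_G(Y \parallel \mu_{\sigma T}) \mid S, T].$$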

The Pythagorean theorem (Proposition 2.7) generalizes to arbitrary Bregman divergences:

Proposition 4.8.

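Let $A$ be a random variable, let $B := \mathbb{E}[A \mid \mathcal{F}]$, where $\mathcal{F}$ is a sigma-algebra, and let $C$ be a random variable defined on $\mathcal{F}$ (i.e., $C$ is $\mathcal{F}$-measurable). Then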

$$\mathbb{E}[D_G(A \parallel C)] = \mathbb{E}[D_G(A \parallel B)] + \mathbb{E}[D_G(B \parallel C)].$$

Although the proof of this observation is fairly straightforward, to our knowledge Proposition 4.8 is original to this work. We provide a proof in Appendix C. Just as we did with squared error, this general Pythagorean theorem allows us to rewrite the rectangle substitutes condition for Bregman divergences.
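To illustrate Proposition 4.8, here is a minimal numerical sketch on a finite probability space. We assume (for illustration only) that $\mathcal{F}$ is generated by a partition of the outcomes into blocks, so that $B = \mathbb{E}[A \mid \mathcal{F}]$ is the probability-weighted block average and an $\mathcal{F}$-measurable $C$ is a value per block. The helper names are hypothetical.

```python
import math
import random


def xlogx(t):
    return 0.0 if t == 0 else t * math.log(t)


def neg_entropy(t):
    return xlogx(t) + xlogx(1 - t)


def d_neg_entropy(t):
    return math.log(t / (1 - t))


def bregman(G, dG, y, x):
    return G(y) - G(x) - (y - x) * dG(x)


def pythagorean_sides(probs, blocks, A, C, G, dG):
    """Return (E[D(A||C)], E[D(A||B)] + E[D(B||C)]) with B = E[A | F].

    F is generated by the partition `blocks` (block label of each outcome);
    C maps each block label to a value, so C is F-measurable.
    """
    mass, weighted = {}, {}
    for p, k, a in zip(probs, blocks, A):
        mass[k] = mass.get(k, 0.0) + p
        weighted[k] = weighted.get(k, 0.0) + p * a
    B = {k: weighted[k] / mass[k] for k in mass}  # conditional expectation per block
    lhs = sum(p * bregman(G, dG, a, C[k]) for p, k, a in zip(probs, blocks, A))
    rhs = sum(p * (bregman(G, dG, a, B[k]) + bregman(G, dG, B[k], C[k]))
              for p, k, a in zip(probs, blocks, A))
    return lhs, rhs


rng = random.Random(1)
n = 12
raw = [rng.random() + 0.1 for _ in range(n)]
total = sum(raw)
probs = [w / total for w in raw]
blocks = [i % 3 for i in range(n)]  # F generated by three blocks
A = [rng.uniform(0.05, 0.95) for _ in range(n)]
C = {k: rng.uniform(0.05, 0.95) for k in range(3)}
lhs, rhs = pythagorean_sides(probs, blocks, A, C, neg_entropy, d_neg_entropy)
print(lhs, rhs)  # equal up to floating-point error
```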

Remark 4.9.

An information structure satisfies rectangle substitutes with respect to $G$ if and only if for all $S$, $T$ we have

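$$\mathbb{E}[D_G(\mu_{\sigma\tau} \parallel \mu_{S\tau}) \mid S, T] \le \mathbb{E}[D_G(\mu_{\sigma T} \parallel \mu_{ST}) \mid S, T]. \tag{5}$$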

Given the interpretation of Bregman divergences as measures of error, we can interpret the left side as Bob’s expected error in predicting the truth, and the right side as Charlie’s expected error in predicting Alice’s expectation. Both sides measure a prediction error due to not having Alice’s signal, but from different starting points.
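For small finite information structures, Equation (5) can be checked by brute force over all rectangles. The sketch below is our own illustration (names such as `worst_violation` are hypothetical): it returns the largest value of the left side minus the right side of (5), so a nonpositive result means rectangle substitutes holds. As an example we use the information structure of Appendix A with squared distance; restricting both players to the “heads” signals gives a left side of 0.09 (Bob cannot predict the XOR) and a right side of 0 (Alice’s bit does not move her expectation), so a violation is expected.

```python
from itertools import chain, combinations


def nonempty_subsets(items):
    items = sorted(items)
    return chain.from_iterable(combinations(items, r) for r in range(1, len(items) + 1))


def worst_violation(P, mu, D):
    """Largest LHS(5) - RHS(5) over all rectangles S x T with positive mass.

    P maps (alice_signal, bob_signal) to a probability; mu maps the same keys
    to E[Y | sigma, tau]. A result <= 0 means rectangle substitutes holds
    with respect to the divergence D (up to floating-point error).
    """
    def cond_mean(S, T):
        cells = [(p, mu[w]) for w, p in P.items() if w[0] in S and w[1] in T and p > 0]
        mass = sum(p for p, _ in cells)
        return sum(p * m for p, m in cells) / mass

    alice = {s for s, _ in P}
    bob = {t for _, t in P}
    worst = float("-inf")
    for S in map(set, nonempty_subsets(alice)):
        for T in map(set, nonempty_subsets(bob)):
            cells = [(w, p) for w, p in P.items() if w[0] in S and w[1] in T and p > 0]
            mass = sum(p for _, p in cells)
            if mass == 0:
                continue
            mu_ST = cond_mean(S, T)
            # LHS of (5): Bob's error (knowing S and tau) in predicting mu_{sigma tau}.
            lhs = sum(p * D(mu[(s, t)], cond_mean(S, {t})) for (s, t), p in cells)
            # RHS of (5): Charlie's error (knowing S and T) in predicting mu_{sigma T}.
            rhs = sum(p * D(cond_mean({s}, T), mu_ST) for (s, t), p in cells)
            worst = max(worst, (lhs - rhs) / mass)
    return worst


def squared(y, x):
    return (y - x) ** 2


# Appendix A structure: Alice sees (coin, a), Bob sees (coin, b), Y = a XOR b.
P, mu = {}, {}
for coin, p_same, p_diff in (("H", 0.45, 0.05), ("T", 0.05, 0.45)):
    for a in (0, 1):
        for b in (0, 1):
            w = ((coin, a), (coin, b))
            P[w] = 0.5 * (p_same if a == b else p_diff)
            mu[w] = float(a ^ b)

print(worst_violation(P, mu, squared))
```

The enumeration is exponential in the number of signals, so this is only meant for toy examples.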

4.2 Generalizing the Discretized Protocol

Later in this work, we will show that under some weak conditions, protocols that cause Alice and Bob to agree with respect to $G$ also cause Alice and Bob to be accurate with respect to $G$. This raises an interesting question: are there protocols that cause Alice and Bob to agree with respect to $G$? In particular, we are interested in natural expectation-sharing protocols. Aaronson’s discretized protocol is specific to $G(x) = x^2$ (i.e., squared distance), and it is not immediately obvious how to generalize it. We present the following generalization.

Definition 4.10.

Let $G$ be a differentiable, strictly convex function, and let . Choose . In the discretized protocol with respect to $G$ with parameter $\epsilon$, on her turn (at time ), Alice sends “medium” if , and otherwise either “low” or “high”, depending on whether is smaller or larger (respectively) than . Bob acts analogously on his turn. At the start of the protocol, Alice and Bob use the information structure to independently compute the time that minimizes . The protocol ends at this time.

Theorem 4.11.

The discretized protocol with respect to $G$ with parameter $\epsilon$ is an $\epsilon$-agreement protocol that involves bits of communication.

Our proof draws inspiration from Aaronson’s proof for the discretized protocol, but has significant differences. The key idea is to keep track of a monovariant: Charlie’s expected error after time step $t$, as measured by the Bregman divergence from the correct answer $\mu_{\sigma\tau}$ to Charlie’s expectation. (Recall that Charlie is our name for a third-party observer of the protocol.) Note that this quantity is at most and at least $0$. Hence, if we show that the quantity decreases by at least some value every time Alice and Bob do not $\epsilon$-agree, then we will have shown that Alice and Bob must $\epsilon$-agree within time steps. We defer the proof to Appendix C.
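The monovariant can be made concrete on a toy example. The sketch below is not the discretized protocol of Definition 4.10; it assumes a hypothetical, simplified protocol in which Alice and Bob alternately announce their exact current expectations, and it tracks Charlie’s expected error $\mathbb{E}[D_G(\mu_{\sigma\tau} \parallel \cdot)]$ after each message, using squared distance and exact rational arithmetic. Because Charlie’s information only grows, the Pythagorean theorem implies this quantity is nonincreasing. The toy information structure (uniform signals in $\{0,1,2\}$ with $\mu_{\sigma\tau} = (\sigma+\tau)/4$) is our own choice, and the code assumes full support.

```python
from fractions import Fraction
from itertools import product


def cond_mean(P, mu, S, T):
    """E[Y | sigma in S, tau in T], computed exactly with Fractions."""
    mass = sum(P[(s, t)] for s in S for t in T)
    return sum(P[(s, t)] * mu[(s, t)] for s in S for t in T) / mass


def charlie_errors(P, mu, alice_signals, bob_signals, rounds, D):
    """Charlie's expected error E[D(mu_{sigma tau} || Charlie's estimate)] after each message.

    Hypothetical simplified protocol: Alice (even turns) and Bob (odd turns)
    announce their exact current expectations. Assumes P has full support.
    """
    errors = [Fraction(0)] * (rounds + 1)
    for (s0, t0), p in P.items():  # run the protocol in every possible world
        S, T = set(alice_signals), set(bob_signals)
        for r in range(rounds + 1):
            errors[r] += p * D(mu[(s0, t0)], cond_mean(P, mu, S, T))
            if r == rounds:
                break
            if r % 2 == 0:  # Alice announces mu_{sigma T}; keep signals consistent with it
                said = cond_mean(P, mu, {s0}, T)
                S = {s for s in S if cond_mean(P, mu, {s}, T) == said}
            else:  # Bob announces mu_{S tau}
                said = cond_mean(P, mu, S, {t0})
                T = {t for t in T if cond_mean(P, mu, S, {t}) == said}
    return errors


signals = range(3)
P = {(s, t): Fraction(1, 9) for s, t in product(signals, signals)}
mu = {(s, t): Fraction(s + t, 4) for s, t in product(signals, signals)}
errors = charlie_errors(P, mu, signals, signals, rounds=3, D=lambda y, x: (y - x) ** 2)
print(errors)  # [Fraction(1, 12), Fraction(1, 24), Fraction(0, 1), Fraction(0, 1)]
```

Here Alice’s first announcement reveals $\sigma$ and Bob’s reply reveals $\tau$, so Charlie’s error drops from $1/12$ to $1/24$ to $0$.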

4.3 Approximate Triangle Inequality

Our results will hold for a class of Jensen-Bregman divergences that satisfy an approximate version of the triangle inequality. Specifically, we will require $JB_G$ to satisfy the following $c$-approximate triangle inequality for some $c > 0$.

Definition 4.12.

Given a differentiable, strictly convex function $G$ and a positive number $c$, we say that $JB_G$ satisfies the $c$-approximate triangle inequality if for all $a, x, b \in [0,1]$ we have

$$JB_G(a, x) + JB_G(x, b) \ge c \cdot JB_G(a, b).$$

It is possible to construct functions $G$ such that there is no positive $c$ for which $JB_G$ satisfies the $c$-approximate triangle inequality. However, $JB_G$ satisfies the $c$-approximate triangle inequality for some positive $c$ for essentially all natural choices of $G$.

Proposition 4.13.

Let $G$ be a differentiable, strictly convex function.

1. If $\sqrt{JB_G}$ satisfies the triangle inequality, then $JB_G$ satisfies the $\frac{1}{2}$-approximate triangle inequality.

2. If $G(x) = x^2$ (i.e., $D_G$ is squared distance) or if $G(x) = x \log x + (1 - x)\log(1 - x)$ (i.e., $D_G$ is KL divergence), then $\sqrt{JB_G}$ satisfies the triangle inequality (and so $JB_G$ satisfies the $\frac{1}{2}$-approximate triangle inequality).

Proof.

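Regarding Fact 1, suppose that $\sqrt{JB_G}$ satisfies the triangle inequality. Then for all $a, x, b$ we have $\sqrt{JB_G(a, b)} \le \sqrt{JB_G(a, x)} + \sqrt{JB_G(x, b)}$. Squaring both sides and observing that $(p + q)^2 \le 2(p^2 + q^2)$ for all real $p, q$ completes the proof.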

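Fact 2 is trivial for $G(x) = x^2$, since then $JB_G(a, b) = \frac{(a - b)^2}{4}$, so $\sqrt{JB_G}$ is the absolute distance metric (times a constant factor). As for the negative of Shannon entropy, $JB_G$ is the Jensen-Shannon divergence, whose square root is a metric; we refer the reader to [es03]. ∎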

The question of when satisfies the triangle inequality has been explored in previous work; we refer the interested reader to [abb13] and [ccr08].
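Proposition 4.13 can be probed numerically. The randomized check below is only a sanity test, not a proof, and the helper names are our own. It confirms the $\frac{1}{2}$-approximate triangle inequality for squared distance and for the negative Shannon entropy on random triples, and shows that $c = \frac{1}{2}$ is tight for squared distance: with $a = 0$, $x = \frac{1}{2}$, $b = 1$ both sides coincide, so no larger $c$ works.

```python
import math
import random


def xlogx(t):
    return 0.0 if t == 0 else t * math.log(t)


def neg_entropy(t):
    return xlogx(t) + xlogx(1 - t)


def square(t):
    return t * t


def jb(G, a, b):
    return (G(a) + G(b)) / 2 - G((a + b) / 2)


def satisfies_approx_triangle(G, c, trials=20000, seed=0, tol=1e-12):
    """Randomized check of JB_G(a, x) + JB_G(x, b) >= c * JB_G(a, b) on [0, 1]."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, x, b = (rng.random() for _ in range(3))
        if jb(G, a, x) + jb(G, x, b) < c * jb(G, a, b) - tol:
            return False
    return True


print(satisfies_approx_triangle(square, 0.5), satisfies_approx_triangle(neg_entropy, 0.5))
print(satisfies_approx_triangle(square, 1.0))  # fails whenever x lies strictly between a and b
print(jb(square, 0, 0.5) + jb(square, 0.5, 1), 0.5 * jb(square, 0, 1))  # tight case
```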

4.4 Generalized Agreement Implies Generalized Accuracy

In all of the results in this subsection, we consider the following setting: $G$ is a differentiable convex function; $c$ is a positive real number such that $JB_G$ satisfies the $c$-approximate triangle inequality; and $\mathcal{I}$ is an information structure that satisfies rectangle substitutes with respect to $G$.

We prove generalizations of Theorem 3.1, showing that under the rectangle substitutes condition, if a protocol ends with Alice and Bob in approximate agreement, then Alice and Bob are approximately accurate. The first result we state assumes that $G$ is symmetric, but is otherwise quite general.

Theorem 4.14.

Assume that $G$ is symmetric about the line $x = \frac{1}{2}$. For any communication protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$, and for any $\beta > 0$, Alice and Bob are

$$\left(8c^2\beta + 16\left(G(0) - G\!\left(\left(\frac{\epsilon}{\beta}\right)^{1/(1 - \log_2 c)}\right)\right)\right)\text{-accurate}$$

after the protocol terminates.

This result is not our most general, as it assumes that $G$ is symmetric, but this assumption likely holds for most use cases. To apply the result optimally, one must first optimize $\beta$ as a function of $\epsilon$. For example, an appropriate choice of $\beta$ (in terms of quantities defined below) gives us the following corollary. (Footnote 10: Corollary 4.15 as stated, without the symmetry assumption, is actually a corollary of Theorem 4.18.)

Corollary 4.15.

Assume that . For any communication protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$, Alice and Bob are -accurate after the protocol terminates, where the constant hidden by $O(\cdot)$ depends on .

Remark 4.16.

Concretely, if is bounded then we can choose , in which case our bound simplifies to . If instead we assume that (as is the case if is a metric), then the bound is . If both of these are true, as is the case for $G(x) = x^2$, then the bound is , which recovers our result in Theorem 3.1.

For $G$ equal to the negative of Shannon entropy (i.e., the $G$ for which $D_G$ is KL divergence), an appropriate choice of $\beta$ in Theorem 4.14 gives us the following corollary.

Corollary 4.17.

If , then for any communication protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$, Alice and Bob are -accurate after the protocol terminates.

Theorem 4.14 follows from our most general result about agreement implying accuracy:

Theorem 4.18.

Let $\tilde{G}(x)$ be the maximum possible difference in $G$-values of two points that differ by at most $x$, and let $\tilde{G}^*$ be the concave envelope of $\tilde{G}$, i.e.,

$$\tilde{G}^*(x) := \max_{\substack{0 \le a, b, w \le 1 \\ wa + (1 - w)b = x}} \left( w\,\tilde{G}(a) + (1 - w)\,\tilde{G}(b) \right).$$

For any communication protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$, and for any $\beta > 0$, Alice and Bob are

$$\left(8c^2\beta + 16\,\tilde{G}^*\!\left(\left(\frac{\epsilon}{\beta}\right)^{1/(1 - \log_2 c)}\right)\right)\text{-accurate}$$

after the protocol terminates.

Proof.

To prove Theorem 4.18, it suffices to prove the following lemma.

Lemma 4.19.

Let $G$ be a differentiable convex function on $[0,1]$ and $c$ be such that $JB_G$ satisfies the $c$-approximate triangle inequality. Let $\mathcal{I}$ be an information structure that satisfies rectangle substitutes with respect to $G$. Let $\epsilon := \mathbb{E}[JB_G(\mu_\sigma, \mu_\tau)]$. Then for any $\beta > 0$, we have

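$$\mathbb{E}[D_G(\mu_{\sigma\tau} \parallel \mu_\tau)] \le 8c^2\beta + 16\,\tilde{G}^*\!\left(\left(\frac{\epsilon}{\beta}\right)^{1/(1 - \log_2 c)}\right).$$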

Let us first prove Theorem 4.18 assuming Lemma 4.19 is true.

Consider any protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$. Let $S$ be the set of Alice’s possible signals that are consistent with the protocol transcript at the end of the protocol, and define $T$ likewise for Bob.

Let $\epsilon_{ST} := \mathbb{E}[JB_G(\mu_{\sigma T}, \mu_{S\tau}) \mid S, T]$. Note that

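$$\mathbb{E}_{S,T}[\epsilon_{ST}] = \mathbb{E}_{S,T}\big[\mathbb{E}[JB_G(\mu_{\sigma T}, \mu_{S\tau}) \mid S, T]\big] = \mathbb{E}[JB_G(\mu_{\sigma T}, \mu_{S\tau})] \le \epsilon.$$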

Therefore, for any $\beta > 0$ we have

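$$\begin{aligned}
\mathbb{E}[D_G(\mu_{\sigma\tau} \parallel \mu_{S\tau})]
&\le \mathbb{E}_{S,T}\left[8c^2\beta + 16\,\tilde{G}^*\!\left(\left(\frac{\epsilon_{ST}}{\beta}\right)^{1/(1 - \log_2 c)}\right)\right] \\
&\le 8c^2\beta + 16\,\tilde{G}^*\!\left(\mathbb{E}_{S,T}\left[\left(\frac{\epsilon_{ST}}{\beta}\right)^{1/(1 - \log_2 c)}\right]\right) \\
&\le 8c^2\beta + 16\,\tilde{G}^*\!\left(\left(\frac{\mathbb{E}_{S,T}[\epsilon_{ST}]}{\beta}\right)^{1/(1 - \log_2 c)}\right) \\
&\le 8c^2\beta + 16\,\tilde{G}^*\!\left(\left(\frac{\epsilon}{\beta}\right)^{1/(1 - \log_2 c)}\right).
\end{aligned}$$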

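In the first step, we apply Lemma 4.19 to the information structure restricted to $S \times T$, i.e., conditioned on $\sigma \in S$ and $\tau \in T$. The next two steps follow from Jensen’s inequality, using the concavity of $\tilde{G}^*$ and of $x \mapsto x^{1/(1 - \log_2 c)}$, respectively (the exponent is at most 1 because $c \le 1$, as one sees by taking $x = a$ in Definition 4.12); the third step also uses that $\tilde{G}^*$ is nondecreasing. The last step uses $\mathbb{E}_{S,T}[\epsilon_{ST}] \le \epsilon$ together with the same monotonicity. ∎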

The basic outline of the proof of Lemma 4.19 is similar to that of Lemma 3.2. Once again, we partition $[0,1]$ into intervals. Analogously to Equation 3, and with $k(\sigma)$ and $S(\cdot)$ defined analogously, we find that

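$$\mathbb{E}[D_G(\mu_{\sigma\tau} \parallel \mu_\tau)] \le \mathbb{E}[D_G(\mu_\sigma \parallel \mu_{S(k(\sigma))})] + \mathbb{E}[D_G(\mu_{S(k(\sigma))\tau} \parallel \mu_\tau)].$$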

As before, we wish to upper bound each summand. However, the fact that the Bregman divergence is now arbitrary introduces complications. First, it is no longer the case that we can directly relate the length of an interval to the Bregman divergence between its endpoints. Second, we consider functions that become infinitely steep near $0$ and $1$ (such as the negative of Shannon entropy), which makes matters more challenging. This means that we need to be more careful when partitioning $[0,1]$ into intervals: see Algorithm C.3 for our new approach. Additionally, bounding the second summand involves reasoning carefully about the behavior of the function $\tilde{G}$, which is responsible for the introduction of $\tilde{G}^*$ into the lemma statement. We defer the full proof of Lemma 4.19 to Appendix C.
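To get a feel for the bound in Theorem 4.18, the following sketch approximates $\tilde{G}$ and $\tilde{G}^*$ on a uniform grid and evaluates $8c^2\beta + 16\,\tilde{G}^*((\epsilon/\beta)^{1/(1-\log_2 c)})$ for the negative Shannon entropy with $c = \frac{1}{2}$ (Proposition 4.13). In one dimension, the two-point maximum defining $\tilde{G}^*$ coincides with the upper convex hull of the graph of $\tilde{G}$, which is what the code computes. This is an illustrative approximation under our own assumptions: the grid underestimates $\tilde{G}$ between grid points, which matters near the steep endpoints, and the chosen $\epsilon$, $\beta$, and grid size are arbitrary.

```python
import math


def xlogx(t):
    return 0.0 if t == 0 else t * math.log(t)


def neg_entropy(t):
    return xlogx(t) + xlogx(1 - t)


def g_tilde_on_grid(G, n):
    """Grid approximation: entry k is max |G(u) - G(v)| over grid points with |u - v| <= k / n."""
    vals = [G(i / n) for i in range(n + 1)]
    out, best = [], 0.0
    for k in range(n + 1):
        best = max(best, max(abs(vals[i + k] - vals[i]) for i in range(n + 1 - k)))
        out.append(best)
    return out


def upper_hull(points):
    """Upper convex hull of points sorted by x; its graph is the concave envelope."""
    hull = []
    for px, py in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Drop hull[-1] if it lies on or below the segment from hull[-2] to (px, py).
            if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    return hull


def evaluate(hull, x):
    """Evaluate the piecewise-linear concave envelope at x in [0, 1]."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    return hull[-1][1]


def theorem_4_18_bound(hull, c, eps, beta):
    """8 c^2 beta + 16 * tilde_G^*((eps / beta)^(1 / (1 - log2 c)))."""
    z = (eps / beta) ** (1 / (1 - math.log2(c)))
    return 8 * c ** 2 * beta + 16 * evaluate(hull, min(z, 1.0))


n = 200
g_tilde = g_tilde_on_grid(neg_entropy, n)
hull = upper_hull([(k / n, g_tilde[k]) for k in range(n + 1)])
print(theorem_4_18_bound(hull, c=0.5, eps=1e-6, beta=1e-2))
```

For this symmetric $G$, $\tilde{G}(x) = G(0) - G(\min(x, \frac{1}{2}))$, which is already concave, so the envelope step changes nothing here; it matters for less regular choices of $G$.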

5 Connections to Markets

In this work, we established a natural condition on information structures, rectangle substitutes, under which any agreement protocol results in accurate beliefs. As we saw, a particularly natural class of agreement protocols is the class of expectation-sharing protocols, where Alice and Bob take turns stating their current expected value, or discretizations thereof.

Expectation-sharing protocols have close connections to financial markets. In markets, the actions of traders reveal partial information about their believed value for an asset, i.e., their expectation. Specifically, a trader’s decision about whether to buy or sell, and how much, can be viewed as revealing a discretization of this expectation. In many theoretical models of markets (see, e.g., [ostrovsky2012information]), traders eventually reach agreement. The intuition behind this phenomenon is that a trader who disagrees with the price leaves money on the table by refusing to trade. Our work thus provides a lens into a well-studied question: when are market prices accurate? (Footnote 11: This is related to the efficient market hypothesis, the problem of when market prices reflect all available information, which traces back at least to [fama1970efficient] and [hayek1945use]. Modern models of financial markets are often based on [kyle1985continuous]; we refer the reader to, e.g., [ostrovsky2012information] and references therein for further information.)

An important caveat, however, is that traders behave strategically and may not disclose their true expected value. For example, a trader may choose to withhold information until a later point when doing so would be more profitable. Therefore, to interpret the actions of traders as revealing discretized versions of their expected value, one first has to understand the Bayes-Nash equilibria of the market. [cw16] studies conditions under which traders are incentivized to reveal all of their information on their first trading opportunity. They call a market equilibrium all-rush if every trader is incentivized to reveal their information immediately. Their main result, roughly speaking, is that there is an all-rush equilibrium if and only if the information structure satisfies strong substitutes, a different strengthening of the weak substitutes condition. This result is specific to settings in which traders have the option to reveal all of their information on their turn, a setting that would be considered trivial from the standpoint of communication theory.

An exciting question for further study is therefore: under what information structure conditions and market settings is it a Bayes-Nash equilibrium to follow an agreement protocol leading to accurate beliefs? In other words, what conditions give not only that agreement implies accuracy, but also that the market incentivizes participants to follow the protocol? Together with [cw16], our work suggests that certain substitutes-like conditions could suffice.

Appendix A Details Omitted From Section 2

Above we stated that the weak substitutes condition is not sufficient for agreement to imply accuracy. To see this, consider the following information structure:

• Nature flips a coin. Alice’s and Bob’s signals are each a pair consisting of the outcome of the coin flip and an additional bit:

• If the coin lands heads, Alice and Bob are given highly correlated bits as part of their signals: the pair of bits is $(0, 0)$ or $(1, 1)$ with probability 45% each, and $(0, 1)$ or $(1, 0)$ with probability 5% each.

• If the coin lands tails, Alice and Bob are given highly anticorrelated bits as part of their signals: the pair of bits is $(0, 0)$ or $(1, 1)$ with probability 5% each, and $(0, 1)$ or $(1, 0)$ with probability 45% each.

The value $Y$ is the XOR of Alice’s and Bob’s bits. It can be verified that this information structure satisfies weak substitutes; the intuition is that a majority of the value comes from knowing the outcome of the coin flip, which can be inferred from Alice’s (or Bob’s) signal alone.

Alice and Bob 0-agree at the very start. However, they are not 0-accurate, since their expectations are either 10% or 90%, and the right answer is either 0 or 1. Therefore, weak substitutes alone is insufficient for agreement to imply accuracy.
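The two claims above (0-agreement at the start, but no accuracy) can be verified by direct computation. The following sketch enumerates the eight worlds $(\text{coin}, a, b)$ with exact probabilities and computes each player’s initial expectation of $Y = a \oplus b$; the helper names are our own.

```python
from fractions import Fraction
from itertools import product

# Joint distribution over worlds (coin, a, b); Alice sees (coin, a), Bob sees (coin, b).
P = {}
for coin, a, b in product("HT", (0, 1), (0, 1)):
    likely = (a == b) == (coin == "H")  # heads: equal bits likely; tails: unequal bits likely
    P[(coin, a, b)] = Fraction(1, 2) * (Fraction(45, 100) if likely else Fraction(5, 100))


def Y(world):
    _, a, b = world
    return a ^ b


def expectation(given):
    """E[Y | event], where the event is a predicate on worlds."""
    mass = sum(p for w, p in P.items() if given(w))
    return sum(p * Y(w) for w, p in P.items() if given(w)) / mass


def alice(world):
    coin, a, _ = world
    return expectation(lambda w: w[0] == coin and w[1] == a)


def bob(world):
    coin, _, b = world
    return expectation(lambda w: w[0] == coin and w[2] == b)


assert sum(P.values()) == 1
agree = all(alice(w) == bob(w) for w in P)                    # 0-agreement at the start
estimates = {alice(w) for w in P}                               # 10% or 90%
error = sum(p * (Y(w) - alice(w)) ** 2 for w, p in P.items())  # expected squared error vs. Y
print(agree, sorted(estimates), error)  # True [Fraction(1, 10), Fraction(9, 10)] 9/100
```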

Appendix B Details Omitted From Section 3

Proof of Claim 3.3.

We claim that in fact we can choose the ’s so that each is in . This ensures that each interval has length at most .

For $x \in [0,1]$, let $\rho(x)$ be the probability that $x$ is between $\mu_\sigma$ and $\mu_\tau$, inclusive. Note that .

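Observe that if $x$ is selected uniformly from $[0,1]$, the expected value of $\rho(x)$ is equal to $\mathbb{E}[|\mu_\sigma - \mu_\tau|]$, because both quantities are equal to the probability that $x$ is between $\mu_\sigma$ and $\mu_\tau$. Therefore, if $(\sigma, \tau)$ is additionally chosen according to the information structure, we have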

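$$\mathbb{E}_{x \leftarrow [0,1]}[\rho(x)] = \mathbb{E}[|\mu_\sigma - \mu_\tau|] \le \sqrt{\mathbb{E}[(\mu_\sigma - \mu_\tau)^2]} = \sqrt{\epsilon}.$$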

This means that

Thus, if each is selected uniformly at random from , the expected value of would be at most . In particular this means that there exist choices of the ’s such that . ∎
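The averaging identity used in this proof, $\mathbb{E}_{x \leftarrow [0,1]}[\rho(x)] = \mathbb{E}[|\mu_\sigma - \mu_\tau|]$, followed by $\mathbb{E}[|\mu_\sigma - \mu_\tau|] \le \sqrt{\mathbb{E}[(\mu_\sigma - \mu_\tau)^2]}$, can be checked numerically. The sketch below uses a hypothetical three-point joint distribution of $(\mu_\sigma, \mu_\tau)$ chosen only for illustration, and integrates $\rho$ over $[0,1]$ with the midpoint rule.

```python
import math

# Hypothetical joint distribution of (mu_sigma, mu_tau): (probability, mu_sigma, mu_tau).
pairs = [(0.5, 0.2, 0.35), (0.3, 0.6, 0.55), (0.2, 0.9, 0.4)]


def rho(x):
    """Probability that x lies between mu_sigma and mu_tau, inclusive."""
    return sum(p for p, ms, mt in pairs if min(ms, mt) <= x <= max(ms, mt))


N = 100_000
integral = sum(rho((i + 0.5) / N) for i in range(N)) / N  # midpoint rule for E_x[rho(x)]
mean_abs = sum(p * abs(ms - mt) for p, ms, mt in pairs)   # E|mu_sigma - mu_tau|
rms = math.sqrt(sum(p * (ms - mt) ** 2 for p, ms, mt in pairs))
print(integral, mean_abs, rms)
```

Here $\mathbb{E}[|\mu_\sigma - \mu_\tau|] = 0.19$ and $\sqrt{\mathbb{E}[(\mu_\sigma - \mu_\tau)^2]} = \sqrt{0.062} \approx 0.249$, consistent with the inequality.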

Proof of Lemma 3.2.

Fix a large positive integer (we will later find it optimal to set ). Consider a partition of $[0,1]$ into intervals satisfying the conditions of Claim 3.3. Let