# A Lower Bound on the Interactive Capacity of Binary Memoryless Symmetric Channels

The interactive capacity of a channel is defined in this paper as the maximal rate at which the transcript of any interactive protocol can be reliably simulated over the channel. It is shown that the interactive capacity of any binary memoryless symmetric (BMS) channel is at least 0.0302 of its Shannon capacity. To that end, a rewind-if-error coding scheme for the simpler binary symmetric channel (BSC) is presented, achieving the lower bound for any crossover probability. The scheme is based on extended-Hamming codes combined with randomized error detection. The bound is then shown to hold for any BMS channel using extremes of the Bhattacharyya parameter. Finally, it is shown that the public randomness required for error detection can be reduced to private randomness in a standard fashion, and can be extracted from the channel without affecting the overall asymptotic rate. This gives rise to a fully deterministic interactive coding scheme achieving our lower bound over any BMS channel.


## I Introduction

In the classical Shannon one-way communication problem, a transmitter (Alice) wishes to send a message reliably to a receiver (Bob) over a memoryless noisy channel. She does so by mapping her message into a sequence of channel inputs (codeword) in a predetermined way, which is corrupted by the channel and then observed by Bob, who tries to recover the original message. The Shannon capacity of the channel, which is the maximal number of message bits per channel use that Alice can convey to Bob with vanishingly low error probability, quantifies the most efficient way to do so. In the two-way channel setup [1], both parties draw independent messages and wish to exchange them over a two-input two-output memoryless noisy channel, and the Shannon capacity (region) is defined similarly. Unlike the one-way case, both parties can now employ adaptive coding by incorporating their respective observations of the past channel outputs into their transmission processes. However, just as in the one-way setup, the messages they wish to exchange are determined before communication begins. In other words, if Alice and Bob had been connected by a noiseless bit pipe, they could have simply sent their messages without any regard to the message of their counterpart.

In a different two-way communication setup, generally referred to as interactive communication, the latter assumption no longer holds. In this interactive communication setup, Alice and Bob do not necessarily wish to disclose all their local information. What they want to tell each other depends, just like in human conversation, on what the other would tell them. A simple instructive example (taken from [2]) is the following. Suppose that Alice and Bob play chess remotely, by announcing their moves over a communication channel (using, say, bits per move, which is clearly sufficient). If the moves are conveyed without error, then both parties can keep track of the state of the board, and the game can proceed to its termination. The sequence of moves occurring over the course of this noiseless game is called a transcript, and it is dictated by the protocol of the game, which constitutes Alice and Bob’s respective strategies determining their moves at any given state of the board.

Now, assume that Alice and Bob play the game over a noisy two-way channel, yet wish to simulate the transcript as if no noise were present. In other words, they would like to communicate back and forth in a way that ensures, once communication is over, that the transcript of the noiseless game can be reproduced by both parties with a small error probability. They would also like to achieve this goal as efficiently as possible, i.e., with the least number of channel uses. One direct way to achieve this is by having both parties describe their entire protocol to their counterpart, i.e., each and every move they might take given each and every possible state of the board. This reduces the interactive problem to a non-interactive one, with the protocol becoming a pair of messages to be exchanged. However, this solution is grossly inefficient; the parties now know much more than they really need in order to simply reconstruct the transcript. At the other extreme, Alice and Bob may choose to describe the transcript itself by encoding each move separately on the fly, using a short error correcting code. Unfortunately, this code must have some fixed error probability and hence an undetected error is bound to occur at some unknown point, causing the states of the board held by the two parties to diverge, and rendering the remainder of the game useless. It is important to note that if Alice and Bob had wanted to play sufficiently many games in parallel, then they could have used a long error-correcting code to simultaneously protect the set of all moves taken at each time point, which in principle would have let them operate at the one-way Shannon capacity (which is the best possible). The crux of the matter therefore lies in the fact that the interactive problem is one-shot, namely, only a single instance of the game is being played.

In light of the above, it is perhaps surprising that it is nevertheless possible to simulate any one-shot interactive protocol using a number of channel uses that is proportional to the length of the transcript, or in other words, that there is a positive interactive capacity whenever the Shannon capacity is positive. This fact was originally proved by Schulman [3], who was also the first to introduce the notion of interactive communication over noisy channels. However, the interactive capacity has never been quantified; it is only known to be some nonzero fraction of the Shannon capacity. In this paper, we show that for a large class of channels, the interactive capacity is at least 0.0302 of the Shannon capacity.

The rest of the paper is organized as follows. In Section II we present the problem formulation and a high level description of the techniques. In Section III we put our work in context of existing results in the literature. We provide some necessary preliminaries in Section IV, and then state the main results in Section V. The coding scheme used in the proof for the binary symmetric channel (BSC) is presented and analyzed in Sections VI and VII respectively, and then generalized to binary memoryless symmetric (BMS) channels in Section VIII. Finally, in Section IX, we explain how the randomized coding scheme can be modified to be fully deterministic.

## II Problem Formulation and the Main Contribution

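In this paper, a length-$n$ interactive protocol is the triplet $\pi \triangleq \{\phi^{\text{Alice}}, \phi^{\text{Bob}}, \psi\}$, where

$$\phi^{\text{Alice}} \triangleq \left\{\phi^{\text{Alice}}_i : \{0,1\}^{i-1} \mapsto \{0,1\}\right\}_{i=1}^{n} \tag{1}$$

$$\phi^{\text{Bob}} \triangleq \left\{\phi^{\text{Bob}}_i : \{0,1\}^{i-1} \mapsto \{0,1\}\right\}_{i=1}^{n} \tag{2}$$

$$\psi \triangleq \left\{\psi_i : \{0,1\}^{i-1} \mapsto \{\text{Alice},\text{Bob}\}\right\}_{i=1}^{n}. \tag{3}$$

The functions $\phi^{\text{Alice}}$ are known only to Alice, and the functions $\phi^{\text{Bob}}$ are known only to Bob. The speaker order functions $\psi$ are known to both parties. The transcript $\tau$ associated with the protocol is sequentially generated by Alice and Bob as follows

$$\tau_i = \begin{cases} \phi^{\text{Alice}}_i(\tau^{i-1}) & \sigma_i = \text{Alice} \\ \phi^{\text{Bob}}_i(\tau^{i-1}) & \sigma_i = \text{Bob} \end{cases} \tag{4}$$

where $\sigma_i$ is the identity of the speaker at time $i$, which is given by:

$$\sigma_i = \psi_i(\tau^{i-1}). \tag{5}$$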
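The sequential generation rule in (4) and (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the type aliases, the function name `generate_transcript`, and the helper `toy_protocol` (with its particular transmission and speaker-order functions) are hypothetical and do not appear in the paper.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases: every function receives the transcript prefix
# tau^{i-1} as a sequence of bits.
TransmitFn = Callable[[Sequence[int]], int]  # phi_i : {0,1}^{i-1} -> {0,1}
SpeakerFn = Callable[[Sequence[int]], str]   # psi_i : {0,1}^{i-1} -> {"Alice","Bob"}


def generate_transcript(phi_alice: List[TransmitFn],
                        phi_bob: List[TransmitFn],
                        psi: List[SpeakerFn]) -> List[int]:
    """Noiseless transcript: at each time the speaker is chosen by (5)
    and the next bit is produced by (4)."""
    tau: List[int] = []
    for i in range(len(psi)):
        speaker = psi[i](tau)                                 # eq. (5)
        phi = phi_alice[i] if speaker == "Alice" else phi_bob[i]
        tau.append(phi(tau) & 1)                              # eq. (4)
    return tau


def toy_protocol(n: int, adaptive: bool
                 ) -> Tuple[List[TransmitFn], List[TransmitFn], List[SpeakerFn]]:
    """Illustrative protocol (an assumption, not from the paper)."""
    # Alice sends the complement of the prefix parity; Bob echoes the last bit.
    phi_alice = [lambda prefix: (sum(prefix) + 1) % 2 for _ in range(n)]
    phi_bob = [lambda prefix: prefix[-1] if prefix else 0 for _ in range(n)]
    if adaptive:
        # Alice's first bit decides who speaks for the rest of the protocol.
        psi = [lambda prefix: "Alice" if not prefix or prefix[0] == 1 else "Bob"
               for _ in range(n)]
    else:
        # Predetermined alternating order: Alice first, then Bob, and so on.
        psi = [lambda prefix, i=i: "Alice" if i % 2 == 0 else "Bob"
               for i in range(n)]
    return phi_alice, phi_bob, psi
```

The adaptive variant of the toy protocol is the kind of speaker order that a simulating scheme with a predetermined order cannot follow directly.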

In the interactive simulation problem Alice and Bob would like to simulate the transcript $\tau$ by communicating back and forth over a noisy memoryless channel $P_{Y|X}$. Specifically, we restrict our discussion to channels with a binary input alphabet $\mathcal{X} = \{0,1\}$ and a general (possibly continuous) output alphabet $\mathcal{Y}$. Note that while the order of speakers in the interactive protocol itself might be determined on the fly (by the sequence of functions $\psi$), we restrict the simulating protocol to use a predetermined order of speakers, due to the fact that our physical channel model does not allow simultaneous transmissions.

To achieve their goal, Alice and Bob employ a length- coding scheme that uses the channel times. The coding scheme consists of a disjoint partition where (resp. ) is the set of time indices where Alice (resp. Bob) speaks. This disjoint partition can be a function of , but not of . At time (resp. ), Alice (resp. Bob) sends some Boolean function of ) (resp. )), and of everything she has received so far from her counterpart. The rate of the scheme is bits per channel use. When communication terminates, Alice and Bob produce their simulations of the transcript , denoted by and respectively. The error probability attained by the coding scheme is the probability that either of these simulations is incorrect, i.e.,

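$$P_e(\Sigma,\pi) \triangleq \Pr\left(\hat{\tau}_A(\Sigma,\phi^{\text{Alice}},\psi) \neq \tau \;\vee\; \hat{\tau}_B(\Sigma,\phi^{\text{Bob}},\psi) \neq \tau\right). \tag{6}$$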

A rate is called achievable if there exists a sequence of length- coding schemes with rates , such that

$$\lim_{n\to\infty} \max_{\pi \text{ of length } n} P_e(\Sigma_n,\pi) = 0, \tag{7}$$

where the maximum is taken over all length-$n$ interactive protocols. Accordingly, we define the interactive capacity $C_I(P_{Y|X})$ as the maximum of all achievable rates for the channel $P_{Y|X}$. Note that this definition parallels the definition of maximal error capacity in the one-way setting, as we require the error probability attained by the sequence of coding schemes to be upper bounded by a vanishing term uniformly for all protocols.

It is clear that at least $n$ bits need to be exchanged in order to reliably simulate a general protocol, hence the interactive capacity satisfies $C_I(P_{Y|X}) \le 1$. In the special case of a noiseless channel, i.e., where the output deterministically reveals the input bit, and assuming that the order of speakers is predetermined (namely $\psi$ contains only constant functions), this upper bound can be trivially achieved; Alice and Bob can simply evaluate and send $\tau_i$ sequentially according to (4) and (5). Note however, that if the order of speakers is general, then this is not a valid solution, since we required the order of speakers in the coding scheme to be fixed in advance. Nevertheless, any general interactive protocol can be sequentially simulated using the channel $2n$ times with an alternating order of speakers, where each party sends a dummy bit whenever it is not their time to speak. Conversely, a factor-two blow-up in the protocol length in order to account for a non-predetermined order of speakers is also necessary. To see this, consider an example of a protocol where Alice’s first bit determines the identity of the speaker for the rest of time; in order to simulate this protocol using a predetermined order of speakers, it is easy to see that at least $n$ channel uses must be allocated to each party in advance. We conclude that under our restrictive capacity definition, the interactive capacity of a noiseless channel is exactly $1/2$.

When the channel is noisy, a tighter trivial upper bound holds:

$$C_I(P_{Y|X}) \le \frac{1}{2}\, C_{\text{Sh}}(P_{Y|X}), \tag{8}$$

where $C_{\text{Sh}}(P_{Y|X})$ is the Shannon capacity of the channel. To see this, consider the same example given above, and note that each party must have sufficient time to reliably send $n$ bits over the noisy channel. Hence, the problem reduces to a pair of one-way communication problems, in which the Shannon capacity is the fundamental limit. We remark that it is reasonable to expect the bound (8) to be loose, since general interactive protocols cannot be trivially reduced to one-way communication as the parties cannot generate their part of the transcript without any interaction. However, the tightness of the bound remains a wide open question. We note in passing that if we had considered simulating only protocols with a predetermined order of speakers, the corresponding upper bound would have been $C_{\text{Sh}}(P_{Y|X})$.

### Channel Models and Capacity Lower Bounds

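The first noisy channel model we consider is the memoryless binary symmetric channel with crossover probability $\varepsilon$, BSC($\varepsilon$). The input to output relation of the BSC($\varepsilon$) is given by

$$Y = X \oplus Z, \tag{9}$$

where $X, Y, Z \in \{0,1\}$, $\oplus$ denotes addition over $\mathrm{GF}(2)$, and $Z$ is statistically independent of $X$ with $\Pr(Z=1)=\varepsilon$. We denote its Shannon capacity by

$$C_{\text{Sh}}(\varepsilon) \triangleq 1 - h(\varepsilon), \tag{10}$$

where $h(\varepsilon) \triangleq -\varepsilon\log\varepsilon-(1-\varepsilon)\log(1-\varepsilon)$ is the binary entropy function. We also use $C_I(\varepsilon)$ to denote the interactive capacity of the BSC($\varepsilon$).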
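As a numerical companion to (10), the following sketch evaluates the binary entropy function and the BSC Shannon capacity. It assumes logarithms are taken to base 2 (rates in bits per channel use); the function names are illustrative, and `interactive_capacity_lower_bound` simply multiplies by the constant 0.0302 quoted in the abstract.

```python
import math


def binary_entropy(p: float) -> float:
    """h(p) in bits, with the usual convention 0*log(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(eps: float) -> float:
    """Shannon capacity of BSC(eps), eq. (10)."""
    return 1.0 - binary_entropy(eps)


def interactive_capacity_lower_bound(eps: float) -> float:
    """Lower bound on the interactive capacity implied by the 0.0302 ratio."""
    return 0.0302 * bsc_capacity(eps)
```

For instance, `bsc_capacity(0.11)` is close to one half, so the bound guarantees an interactive rate of roughly 0.015 bits per channel use on that channel.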

A richer channel model which is commonly used in the coding literature is the binary memoryless symmetric (BMS) channel [4, 5, 6, 7, 8]. While several equivalent definitions exist, we choose to define a BMS channel as a collection of BSC with various crossover probabilities [9] as follows:

###### Definition 1.

[BMS channels] A memoryless channel with binary input output and a conditional distributions is called binary memoryless symmetric channel (BMS()) if there exists a sufficient statistic of for : , where are statistically independent of ,

is a binary random variable with

, and with probability one.

Consequently, the Shannon capacity of BMS() channel is

$$C_{\text{Sh}}(P_{Y|X}) = 1 - \mathbb{E}\,h(T). \tag{11}$$

Important BMS channels other than the BSC include the binary erasure channel (BEC) and the binary additive white Gaussian noise (BiAWGN) channel, among others, as elaborated in Section VIII.

The main contribution of this paper is the following bound for the ratio between the interactive capacity and the Shannon capacity for any BMS channel.

###### Theorem 1.

For any BMS($P_{Y|X}$) channel with positive Shannon capacity $C_{\text{Sh}}(P_{Y|X})$ and interactive capacity $C_I(P_{Y|X})$,

$$\frac{C_I(P_{Y|X})}{C_{\text{Sh}}(P_{Y|X})} \ge 0.0302. \tag{12}$$

Theorem 1 is proved by first analyzing the BSC special case stated in the following theorem, and then extending the result for a general BMS channel.

###### Theorem 2.

For any BSC with crossover probability $\varepsilon$, Shannon capacity $C_{\text{Sh}}(\varepsilon)$ and interactive capacity $C_I(\varepsilon)$, the following bound holds:

$$\frac{C_I(\varepsilon)}{C_{\text{Sh}}(\varepsilon)} \ge 0.0302. \tag{13}$$

The first step in the proof is the standard symmetrization of the order of speakers by possibly adding dummy transmissions, such that Alice speaks at odd times and Bob speaks at even times. In the sequel we refer to this order of speakers as bit-vs.-bit. This reduces the rate by a factor of two at most. Theorem 2 is then proved by using a rewind-if-error scheme in the spirit of [3, 10] designed for simulating the transcript of protocols with an alternating order of speakers. As mentioned in the chess game example, in the general case, the transcript bits of an interactive protocol should be decoded instantaneously, which implies that error correction codes (that typically use long blocks) cannot be straightforwardly used. Instead, rewind-if-error schemes are based on uncoded transmission followed by error detection and retransmission. Namely, the transcript is simulated in blocks, assuming no errors are present. Then, an error detection phase takes place, initiating the retransmission of the block whenever errors are detected. Since the probability of error in a block increases with the block length, such schemes assume that the channel is almost error free, namely, that $\varepsilon$ is very small compared to the reciprocal of the block length. The scheme presented in Sections VI and VII of this paper is based on layered error detection and retransmission. The error detection is implemented by an extended-Hamming code in the first layer, and by a standard randomized error detection algorithm [11] at higher layers. As will be shown in the sequel, the total rate of the proposed scheme is mostly affected by the efficiency of the error detection in the first layer. For this reason, the error detection in the first layer is performed using an extended-Hamming code, which is known to be highly efficient for errors generated by a BSC [12].

We analyze the rate of the scheme for a fixed small $\varepsilon$. For a BSC with larger values of $\varepsilon$, we reduce the crossover probability via repetition coding and account for the incurred rate loss. The result is then generalized in Section VIII to the case of a general BMS channel, recalling that the BSC has the largest Bhattacharyya parameter among all the BMS channels with the same Shannon capacity [8]. This property implies that the BSC requires the largest number of repetitions per target crossover probability among all BMS channels with the same capacity and can therefore be regarded as the worst case BMS channel with a given capacity, for the proposed scheme.

The requirements for randomness in the scheme are discussed in Section IX. For simplicity of exposition, the scheme described in Section VI uses randomness for the error detection. In Section IX we show that the requirement for randomness can be circumvented. This is done by first reducing the number of required random bits to and then standardly extracting them from the noisy channels [13, 14] without reducing the overall rate of the scheme.

## III Connections to the Existing Work

The interactive communication problem introduced by Schulman [3, 15] is motivated by Yao’s communication complexity scenario [16]. In this scenario, the input of a function $f$ is distributed between Alice and Bob, who wish to compute $f$ with negligible error by exchanging (noiseless) bits using some interactive protocol. The length of the shortest protocol achieving this is called the communication complexity of $f$, and denoted by $CC(f)$. In the interactive communication setup, Alice and Bob must achieve their goal by communicating through a pair of independent BSC($\varepsilon$). The minimal length of an interactive protocol attaining this goal is now denoted by $CC_\varepsilon(f)$.

In [10], Kol and Raz defined the interactive capacity as

$$C^{\text{KR}}_I(\varepsilon) \triangleq \lim_{n\to\infty} \min_{f:\,CC(f)=n} \frac{n}{CC_\varepsilon(f)}, \tag{14}$$

and proved that

$$C^{\text{KR}}_I(\varepsilon) \ge 1 - O\left(\sqrt{h(\varepsilon)}\right) \tag{15}$$

in the limit of $\varepsilon \to 0$, under the additional assumption that the communication complexity of $f$ is computed with the restriction that the order of speakers is predetermined and has some fixed period. The former assumption on the order of speakers is important. Indeed, consider again the example where the function is either Alice’s input or Bob’s input as decided by Alice. In this case, the communication complexity with a predetermined order of speakers is double that without this restriction, and hence considering such protocols renders . For further discussion on the impact of the speaking order, as well as channel models that allow collisions, see [17]. For a fixed nonzero $\varepsilon$, the coding scheme presented in [3] (which precedes [10]) already showed that , but the constant has not been computed (and to the best of our knowledge, has not been computed for any scheme hitherto). Both [3] and [10] based their proofs on rewind-if-error coding schemes, i.e., schemes based on a hierarchical and layered error detection and appropriate retransmissions, which is also the approach we take in this paper. Note that our definition of the BSC interactive capacity is stricter than (14), at least in principle, as it requires reconstruction of the entire transcript. For this reason, $C_I(\varepsilon) \le C^{\text{KR}}_I(\varepsilon)$, hence our lower bound applies to $C^{\text{KR}}_I(\varepsilon)$ as well (and also achieves the asymptotic behavior (15) when simulating bit-vs.-bit protocols). Our capacity definition further enjoys the property of being decoupled from any source coding problem such as function communication complexity.

Another aspect we would like to discuss is the type of randomness used by a coding scheme, which can be either public, private or none. The scheme in [3] requires only private randomness, while [10] requires public randomness. It is interesting to note that Schulman’s tree code scheme [15] is not randomized. However, it is not designed to be rate-wise efficient, and does not achieve the lower bound in (15). A non-random coding scheme was recently proposed by Gelles et al. [18], which is based on a concatenation of a de-randomized interactive coding scheme and a tree code.

The rewind-if-error scheme presented in this paper is inspired by the scheme in [10] but its error detection is not based on random hashes but rather on extended-Hamming codes and randomized (yet structured) error detection. Our deterministic coding scheme presented in Section IX is not based on de-randomization of a randomized coding scheme as [18] but rather on adapting the error detection so it requires a relatively small number of random bits and then standardly extracting them from the noisy channels at hand.

To summarize the discussion above, there are various setups one may consider in interactive communication. Our scheme, and its corresponding lower bound, are based on the most restrictive set of assumptions: the order of speakers can be adaptive in the simulated protocol but is predetermined in the simulating protocol, neither private nor public randomness is allowed, and the entire transcript must be reconstructed by both parties. Thus, our capacity lower bounds remain valid for any other set of standard assumptions.

The current paper extends the preliminary results presented in [19] in the following aspects: i) The error detection in the scheme is structured and is not based on random hashes. ii) The rate of the resulting scheme is improved and consequently the lower bound for the ratio between the interactive capacity and Shannon’s capacity is also improved. iii) The scheme described in this paper can be modified to operate on private randomness, which can be fully extracted from the channels and not on public randomness. iv) The results are generalized for BMS channels.

## IV Preliminaries

Let $D(P\|Q)$ denote the Kullback-Leibler divergence between the distributions $P$ and $Q$. Let $d(p\|q)$ denote the Kullback-Leibler divergence between two Bernoulli random variables with probabilities $p$ and $q$. In the sequel we use $\mathbb{1}(\cdot)$ to denote the indicator function, which equals one if the condition is satisfied and zero otherwise.

The following simple results are used throughout the paper:

###### Lemma 1 (Repetition coding over BSC).

Let a bit be sent over BSC($\varepsilon$) using $\rho$ repetitions and decoded by a majority vote (if $\rho$ is even, ties are broken by tossing a fair coin). The decoding error probability can be upper bounded by

$$P_e \le \beta^{\rho} = 2^{-\rho\, d\left(\frac{1}{2}\,\|\,\varepsilon\right)}, \tag{16}$$

where $\beta \triangleq 2\sqrt{\varepsilon(1-\varepsilon)}$ is the Bhattacharyya parameter of the BSC($\varepsilon$). The induced channel from the input bit to its decoded value is thus a BSC($P_e$).

The proof is standard (see for example [5]) and can be regarded as special case of Lemma 8 stated and proved in Section VIII. Note that the random tie breaking is done in order to simplify the scheme and its analysis. It does, however, assume private randomness at both parties. In Section IX we show how the random tie breaking can be circumvented.
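The bound (16) can be checked numerically against the exact majority-vote error probability. The sketch below is illustrative only: the function names are assumptions, logarithms are base 2, and ties for even $\rho$ are counted with weight one half, matching the fair-coin tie breaking in Lemma 1.

```python
import math


def bhattacharyya(eps: float) -> float:
    """Bhattacharyya parameter of BSC(eps): beta = 2*sqrt(eps*(1-eps))."""
    return 2.0 * math.sqrt(eps * (1.0 - eps))


def kl_half_vs(eps: float) -> float:
    """d(1/2 || eps) in bits, the Bernoulli KL divergence used in (16)."""
    return 0.5 * math.log2(0.5 / eps) + 0.5 * math.log2(0.5 / (1.0 - eps))


def majority_error(eps: float, rho: int) -> float:
    """Exact error probability of majority decoding of rho repetitions
    over BSC(eps); a tie (possible only for even rho) errs w.p. 1/2."""
    total = 0.0
    for flips in range(rho + 1):
        p = math.comb(rho, flips) * eps**flips * (1.0 - eps)**(rho - flips)
        if 2 * flips > rho:
            total += p
        elif 2 * flips == rho:
            total += 0.5 * p
    return total


def repetition_bound(eps: float, rho: int) -> float:
    """Right-hand side of (16): beta**rho."""
    return bhattacharyya(eps) ** rho
```

For example, with $\varepsilon = 0.1$ and $\rho = 5$ the bound evaluates to $0.6^5 \approx 0.078$, while the exact majority-vote error probability is below $0.01$, which shows the bound is valid but not tight.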

We now introduce two error detection methods that are used in the coding scheme. The first one assumes the errors are generated by BSCs and is based on error correction codes:

###### Definition 2 (Error detection using an extended-Hamming code).

Let $X^A$ and $X^B$ be binary (row) vectors of length $k$ held by Alice and Bob respectively. Let $H$ be the parity check matrix of an extended-Hamming code with parameters $(k, k-\log k-1)$. Let $\mathrm{NEQ}$ be a variable set to one if the parties decide that $X^A \neq X^B$ and set to zero otherwise, calculated according to the following algorithm:

1. Alice calculates her syndrome vector $X^A H^T$

2. Bob calculates his syndrome vector $X^B H^T$

3. Alice sends her syndrome vector ($\log k + 1$ bits) to Bob

4. Bob calculates $\mathrm{NEQ} = \mathbb{1}\left(X^A H^T \neq X^B H^T\right)$

5. Bob sends $\mathrm{NEQ}$ ($1$ bit) to Alice

The overall number of bits communicated between Alice and Bob is $\log k + 2$.
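The steps of Definition 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the parity check matrix is built in one standard form (an all-ones row plus the binary expansions of the column indices), the exchange is assumed noiseless, and all function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def extended_hamming_parity_check(k: int) -> List[List[int]]:
    """Parity check matrix with log2(k)+1 rows for block length k = 2^m.
    Row 0 is all ones; row b+1 holds bit b of each column index."""
    m = k.bit_length() - 1
    if k < 2 or (1 << m) != k:
        raise ValueError("k must be a power of two, k >= 2")
    rows = [[1] * k]
    for b in range(m):
        rows.append([(j >> b) & 1 for j in range(k)])
    return rows


def syndrome(x: List[int], H: List[List[int]]) -> List[int]:
    """x * H^T over GF(2)."""
    return [sum(h * xi for h, xi in zip(row, x)) % 2 for row in H]


def detect_mismatch(x_a: List[int], x_b: List[int],
                    H: List[List[int]]) -> Tuple[int, int]:
    """Returns (NEQ, number of bits exchanged) for a noiseless exchange."""
    s_a = syndrome(x_a, H)      # step 1 (Alice)
    s_b = syndrome(x_b, H)      # step 2 (Bob)
    # step 3: Alice sends s_a, which is len(H) = log2(k) + 1 bits
    neq = int(s_a != s_b)       # step 4 (Bob)
    # step 5: Bob sends NEQ back, one more bit
    return neq, len(s_a) + 1


def undetected_patterns(k: int, weight: int) -> int:
    """Counts error patterns of the given weight that yield NEQ = 0."""
    H = extended_hamming_parity_check(k)
    zero = [0] * k
    count = 0
    for positions in combinations(range(k), weight):
        z = [1 if j in positions else 0 for j in range(k)]
        neq, _ = detect_mismatch(z, zero, H)
        count += 1 - neq
    return count
```

Because the code has minimum distance four, every pattern of one, two or three errors is detected; for $k = 8$ exactly 14 of the 70 weight-four patterns go undetected.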

The performance of this scheme over a BSC() is given in the following lemma:

###### Lemma 2.

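Assume that

$$X^A = X^B \oplus Z, \tag{17}$$

where $Z$ is an i.i.d. Bernoulli($\varepsilon$) vector. The probability of a mis-detected error of the scheme in Definition 2 is given by

$$\Pr(\mathrm{NEQ}=0, X^A\neq X^B) = \frac{1}{2k}\left(1+2(k-1)(1-2\varepsilon)^{k/2}+(1-2\varepsilon)^{k}\right)-(1-\varepsilon)^{k}. \tag{18}$$

The corresponding probability of a false error detection is

$$\Pr(\mathrm{NEQ}=1, X^A= X^B) = 0. \tag{19}$$

###### Proof.

First, it is clear that for any $X^A = X^B$ we have $\mathrm{NEQ}=0$ with probability one, so the probability of false error detection is zero. For the probability of error mis-detection, note that $X^A H^T \oplus X^B H^T = Z H^T$. Therefore, the event $\{\mathrm{NEQ}=0, X^A \neq X^B\}$ is identical to the event in which $ZH^T = 0^T$ and $Z \neq 0^T$, i.e., $Z$ is a nonzero codeword of the extended-Hamming code. All in all

$$\Pr(\mathrm{NEQ}=0, X^A\neq X^B) = \Pr(ZH^T = 0^T, Z \neq 0^T) \tag{20}$$

$$= \frac{1}{2k}\left(1+2(k-1)(1-2\varepsilon)^{k/2}+(1-2\varepsilon)^{k}\right)-(1-\varepsilon)^{k}, \tag{21}$$

where (21) is standardly calculated using the dual code [12, p. 52]. ∎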
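The closed form in (18) can be cross-checked by brute force for small block lengths. The sketch below enumerates all noise vectors and sums the probabilities of the nonzero codewords; it assumes the same parity check structure as a standard extended-Hamming code (even weight and zero XOR of the indices of the ones), and the function names are illustrative.

```python
from itertools import product
from typing import Sequence


def is_extended_hamming_codeword(z: Sequence[int]) -> bool:
    """z is a codeword iff its weight is even and the XOR of the
    indices of its ones is zero (length must be a power of two)."""
    acc = 0
    for j, bit in enumerate(z):
        if bit:
            acc ^= j
    return sum(z) % 2 == 0 and acc == 0


def misdetection_formula(eps: float, k: int) -> float:
    """Closed form of eq. (18)."""
    return ((1.0 + 2.0 * (k - 1) * (1.0 - 2.0 * eps) ** (k // 2)
             + (1.0 - 2.0 * eps) ** k) / (2.0 * k)
            - (1.0 - eps) ** k)


def misdetection_brute_force(eps: float, k: int) -> float:
    """Pr(Z is a nonzero codeword) for Z i.i.d. Bernoulli(eps)."""
    total = 0.0
    for z in product((0, 1), repeat=k):
        w = sum(z)
        if w > 0 and is_extended_hamming_codeword(z):
            total += eps**w * (1.0 - eps) ** (k - w)
    return total
```

The enumeration is exponential in $k$, so it is only meant as a sanity check for short blocks such as $k = 4$ or $k = 8$.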

The second error detection scheme is a randomized scheme based on [11, p. 30], which applies for arbitrary vectors:

###### Definition 3 (Randomized error detection using polynomials).

Let $X^A$ and $X^B$ be arbitrary binary vectors of length $\ell$ held by Alice and Bob respectively. Let , where . Let $q$ be a prime number such that (by Bertrand’s postulate such a number must exist). Let $\mathrm{NEQ}^{\mathrm{Poly}}$ be a variable set to one if the parties decide that $X^A \neq X^B$ and set to zero otherwise, calculated according to the following algorithm:

1. Alice uniformly draws

2. Alice calculates

3. Alice sends Bob and

4. Bob calculates

5. Bob calculates

6. Bob sends to Alice

All in all, Alice needs to send at most bits for the representation of , and at most bits for the representation of . Bob sends Alice one bit.
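A minimal sketch of polynomial-based randomized error detection in the spirit of Definition 3 is given below. Several details are assumptions made only for illustration, since they are not legible in the text above: the prime $q$ is taken as the smallest prime that is at least $\gamma\ell$, the point $U$ is drawn uniformly from $\{0,\dots,q-1\}$, each party evaluates $\sum_{i=1}^{\ell} X_i U^{i-1} \bmod q$, and the exchange is noiseless. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small moduli used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def pick_prime(gamma: int, ell: int) -> int:
    """Smallest prime q >= gamma*ell (assumed choice); Bertrand's
    postulate guarantees q <= 2*gamma*ell."""
    q = gamma * ell
    while not is_prime(q):
        q += 1
    return q


def poly_eval(x: Sequence[int], u: int, q: int) -> int:
    """Evaluates sum_i x[i] * u**i mod q by Horner's rule (x[0] is the
    constant coefficient)."""
    acc = 0
    for coeff in reversed(x):
        acc = (acc * u + coeff) % q
    return acc


def randomized_neq(x_a: Sequence[int], x_b: Sequence[int],
                   gamma: int, rng: random.Random) -> int:
    """Returns NEQ^Poly: 1 if the evaluations differ at a random point."""
    q = pick_prime(gamma, len(x_a))
    u = rng.randrange(q)               # Alice draws U uniformly
    v_a = poly_eval(x_a, u, q)         # Alice's evaluation, sent with U
    v_b = poly_eval(x_b, u, q)         # Bob's evaluation
    return int(v_a != v_b)             # Bob's decision, sent back (1 bit)


def collision_points(x_a: Sequence[int], x_b: Sequence[int], q: int) -> List[int]:
    """All points u at which differing vectors would be mis-detected."""
    return [u for u in range(q)
            if poly_eval(x_a, u, q) == poly_eval(x_b, u, q)]
```

Counting the collision points directly shows why the mis-detection probability is small: a nonzero polynomial of degree at most $\ell-1$ over a field has at most $\ell-1$ roots, so at most $\ell-1$ of the $q$ possible points can fool the test.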

###### Lemma 3.

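The error detection scheme of Definition 3 obtains an error mis-detection probability of

$$\Pr\left(\mathrm{NEQ}^{\mathrm{Poly}}=0 \mid X^A \neq X^B\right) \le \frac{1}{\gamma}, \tag{22}$$

and a false error detection probability of

$$\Pr\left(\mathrm{NEQ}^{\mathrm{Poly}}=1 \mid X^A = X^B\right) = 0. \tag{23}$$

###### Proof.

Note that the values calculated by Alice and Bob are the evaluations at the point $U$ of two polynomials over $\mathrm{GF}(q)$ whose (binary) coefficients are the elements of $X^A$ and $X^B$ respectively. Clearly, if $X^A = X^B$, then the two evaluations coincide for every value of $U$, hence $\mathrm{NEQ}^{\mathrm{Poly}}=0$. On the other hand, if $X^A \neq X^B$, then $\mathrm{NEQ}^{\mathrm{Poly}}=0$ implies that $U$ is a root of the polynomial

$$\sum_{i=1}^{\ell}\left(X^A_i - X^B_i\right)U^{i-1} \pmod q. \tag{24}$$

Since the degree of the polynomial is at most $\ell-1$, there are at most $\ell-1$ such roots, so

$$\Pr\left(\mathrm{NEQ}^{\mathrm{Poly}}=0 \mid X^A \neq X^B\right) \le \frac{\ell-1}{q} < \frac{\ell}{\gamma\ell} = \frac{1}{\gamma}. \tag{25}$$

∎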

## V Main results

We first lower bound for . Our bound is stated in the following theorem:

###### Theorem 3.

The transcript of any protocol with a bit-vs.-bit order of speakers (i.e., Alice sends a bit at odd times and Bob sends a bit at even times) can be reliably simulated over BSC($\varepsilon$) at the following rate

 RBSC(ε,k)≜1−kε−(3+logk)β~a−k2k−1(Pe1+3βa+4klogk2−β2k(1−β2k)2)−3βa+4k2logk2−β2k(1−β2k)21+~a(3+logk)k+3logk[a(2k−1)(k−1)2+4k(k−1)3+4k−2k(k−1)2] (26)

where

$$P_{e1} \le \frac{1}{2k}\left(1+2(k-1)(1-2\varepsilon)^{k/2}+(1-2\varepsilon)^{k}\right)-(1-\varepsilon)^{k}+(3+\log k)\beta^{\tilde{a}}. \tag{27}$$

Let , , and . $k$ can be taken as any integer power of two satisfying .

Using this theorem, for protocols with a bit-vs.-bit order of speakers and for protocols with a general (possibly adaptive) order of speakers.

The proof of Theorem 3 is by the construction and analysis of a rewind-if-error scheme and appears in Sections VI and VII. We note that the presented scheme is randomized and in Section IX we explain how to modify it to be deterministic.

The following corollary proved in Appendix A states that the scheme obtains the rate lower bound (15) from [10]:

###### Corollary 1.

For

$$\max_{k} R_{\text{BSC}}(\varepsilon,k) \ge 1 - O\left(\sqrt{h(\varepsilon)}\right) \tag{28}$$

As stated before, the presented rewind-if-error scheme is designed for a BSC with a sufficiently small $\varepsilon$. For larger values of $\varepsilon$, the channel can be converted to a BSC($\delta$) with $\delta < \varepsilon$ using repetitions followed by a majority vote according to Lemma 1. The following lemma bounds the interactive capacity by using an interactive coding scheme augmented by a repetition code:

###### Lemma 4.

For every and

$$\frac{C_I(\varepsilon)}{C_{\text{Sh}}(\varepsilon)} \ge \frac{C_I(\delta)}{\log\frac{1}{\delta}+1}. \tag{29}$$

###### Proof.

Let be the smallest integer such that , where is the Bhattacharyya parameter of the BSC() as above. By Lemma 1, using repetitions, the BSC() can be converted to a BSC() with . Normalizing by and noting that we obtain

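$$\frac{C_I(\varepsilon)}{C_{\text{Sh}}(\varepsilon)} \ge \frac{C_I(\delta)}{\rho(\varepsilon,\delta)\,C_{\text{Sh}}(\varepsilon)}. \tag{30}$$

By Lemma 1

$$\rho \le \rho(\varepsilon,\delta) \triangleq \frac{\log\frac{1}{\delta}}{\log\frac{1}{\beta}}+1, \tag{31}$$

where ‘$+1$’ accounts for rounding to the nearest larger integer. Furthermore,

$$\rho(\varepsilon,\delta)\,C_{\text{Sh}}(\varepsilon) = \left(\frac{\log\frac{1}{\delta}}{\log\frac{1}{\beta}}+1\right)C_{\text{Sh}}(\varepsilon) \tag{32}$$

$$\le \frac{I(X;Y)}{L(X;Y)}\log\frac{1}{\delta}+I(X;Y), \tag{33}$$

where $X$ is the input of a BSC($\varepsilon$) channel and $Y$ is its respective output,

$$I(X;Y) = D(P_{XY}\,\|\,P_X P_Y) = C_{\text{Sh}}(\varepsilon) \tag{34}$$

is the mutual information between $X$ and $Y$, and

$$L(X;Y) = D(P_X P_Y\,\|\,P_{XY}) = d\left(\tfrac{1}{2}\,\|\,\varepsilon\right) = \log\frac{1}{\beta} \tag{35}$$

is the lautum information between $X$ and $Y$ [20]. Using the facts that for the BSC, $L(X;Y) \ge I(X;Y)$ [20, Theorem 12] and that trivially $I(X;Y) \le 1$, concludes the proof. ∎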
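The chain (31) to (35) can be checked numerically. The sketch below finds the smallest number of repetitions $\rho$ with $\beta^{\rho} \le \delta$ and compares $\rho\,C_{\text{Sh}}(\varepsilon)$ with $\log\frac{1}{\delta}+1$, the denominator of (29). It assumes base-2 logarithms; the function names are illustrative and not taken from the paper.

```python
import math


def binary_entropy(p: float) -> float:
    """h(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def mutual_information(eps: float) -> float:
    """I(X;Y) for a uniform input over BSC(eps), eq. (34)."""
    return 1.0 - binary_entropy(eps)


def lautum_information(eps: float) -> float:
    """L(X;Y) = d(1/2 || eps) = log2(1/beta), eq. (35)."""
    beta = 2.0 * math.sqrt(eps * (1.0 - eps))
    return math.log2(1.0 / beta)


def repetitions_needed(eps: float, delta: float) -> int:
    """Smallest integer rho with beta**rho <= delta (0 < eps < 1/2)."""
    beta = 2.0 * math.sqrt(eps * (1.0 - eps))
    rho = 1
    while beta ** rho > delta:
        rho += 1
    return rho


def normalized_overhead(eps: float, delta: float) -> float:
    """rho * C_Sh(eps), the quantity bounded by log2(1/delta) + 1."""
    return repetitions_needed(eps, delta) * mutual_information(eps)
```

For example, with $\varepsilon = 0.45$ and $\delta = 10^{-2}$ several hundred repetitions are needed, yet the normalized overhead stays below $\log_2 100 + 1 \approx 7.64$, as Lemma 4 requires.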

Theorem 2 now follows by using Theorem 3 with and in order to calculate , dividing the rate by two in order to symmetrize the order of speakers and finally applying Lemma 4.

## VI Description of the coding scheme for the BSC

The rewind-if-error scheme is based on two concepts: uncoded transmission and retransmissions based on error detection. The uncoded transmission is motivated by the fact that in a general interactive protocol, even in a noise-free environment, the parties cannot predict the transcript bits to be output by their counterpart, and hence might not always know some of their own future outputs. For this reason, long blocks of bits, which are essential for efficient block codes, cannot be generated.

The concept of retransmissions based on error detection can be viewed as an extension of the classic example of the one-way BEC with feedback [5, p. 506]. In this simple setup, channel errors occur independently with probability $\varepsilon$ and errors are detected and marked as erasures, whose locations are immediately revealed to both parties. The coding scheme simply resends the erased bits, yielding an average rate of $1-\varepsilon$, which is exactly Shannon’s capacity for the BEC. In addition, since all the channel errors are marked as erasures, the probability of decoding error is zero.

In the interactive communication setup for a general BMS channel (other than the BEC), channel errors are not necessarily marked as erasures and perfect feedback is not present. However, the fact that the parties have a (noisy) two-way communication link enables them to construct a coding scheme in a similar spirit as follows. The parties start by simulating the transcript in a window (or a block) of $k$ consecutive bits, operating as if the channel is error free. The probability of error in the window can be upper bounded using the union bound by $k\varepsilon$, and this number is assumed to be small. Next, the parties exchange bits in order to decide if the window is correct, i.e., no errors occurred, which would lead to the simulation of the consecutive window, or incorrect, i.e., some errors occurred, which would lead to retransmission (i.e., re-simulation of the window).

Unfortunately, error detection using less than bits of communication has an inherent failure probability. In addition, performing the error detection over a noisy channel can cause further errors, including a disagreement between the parties regarding the mere presence of the errors. For this purpose, the error detection is done in a hierarchical and layered fashion. Namely, after windows are simulated, error detection is applied on all of them, including on the outcome of the previous error detections, possibly initiating their entire retransmission. After windows are simulated, error detection is applied on all of them, and so on. An illustrated example for this concept for is given in Table I.

We are now ready to describe the coding scheme. We note that it can be viewed both as a sequential algorithm and as a recursive algorithm. For sake of clarity and simplicity of exposition, we chose the sequential interpretation for the description and the recursive interpretation for the analysis.

### VI-A Building blocks

In the sequel we assume that the order of speakers is alternating, Alice speaking at odd times and Bob speaking at even times. We denote the input of the channel by and its corresponding output by . The following notions are used as the building blocks of the scheme:

• The uncoded simulation of the transcript is a sequence of bits, generated by the parties and the channel, using the transmission functions in and disregarding the channel errors. Alice’s and Bob’s uncoded simulation vectors are for odd : , and respectively. For even they are , and respectively.

• The cursor variables indicate the time indexes of the transmission functions (i.e. the appropriate function in ) used by Alice or Bob in the previous transmission. We denote Alice’s and Bob’s cursors by and respectively. We note that and are random variables and may not be identical.

• The rewind bits are the result of the error detection procedure and are calculated at predetermined points throughout the scheme. They determine whether the simulation of the transcript should proceed forward, or rewind. We denote and separate the rewind bits into layers : . At layer there are rewind bits, denoted by for Alice and for Bob. The value of Alice’s and Bob’s rewind bits might differ in the general case. The rewind bits and are calculated after exactly bits of uncoded simulation, and are calculated according to their respective rewind windows. In the sequel we use the term active to denote that a rewind bit is set to one, and inactive if it is set to zero.

• The rewind window of Alice (resp. of Bob) contains the bits according to which (resp. ) is calculated. It contains the uncoded simulation bits of the respective party, between times and . In addition it contains all the rewind bits of levels the party has calculated between these times.

We note that at every point of the simulation, having the uncoded simulation bits and the rewind bits calculated so far, both parties can calculate their cursors and and their simulations of the transcript. We denote these simulation vectors by: and for Alice and Bob respectively. We are now ready to introduce the coding scheme.

### The coding scheme

#### Initialization

. . , , where denotes an empty vector.

#### Iteration

• Simulate the transcript for consecutive times, disregarding the channel errors, as follows. The parties start by advancing and their respective cursors, by one. At odd cursors Alice sends , and at even cursors Bob sends . At odd cursors Alice updates her uncoded simulation vector by and her simulation of the transcript by whereas Bob updates and . The update for even cursors is done similarly with appropriate replacements. We note that since the block length is a power of two (and is therefore even) and since rewinding is done in full blocks, the parties will agree on the parity of the cursors even in the case where their cursors differ. Thus, the parties will always agree which one of them transmits, at every time point.

• For to , if for some integer , then rewind windows and have ended. Alice computes her rewind bit according to the procedure explained in the sequel. If she does nothing. If she rewinds to the value it had at the beginning of and deletes the corresponding values from . She also sets all the bits of in her uncoded simulation vector to zero, so they will not be re-detected as errors in the future. Bob does the same with the appropriate replacements.
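The check performed after each simulated block can be sketched as follows. The sketch assumes that a rewind window of layer `l` spans `k**l` uncoded bits, so that layer `l` has `k**(L-l)` windows in total, which matches the summation limits used later in (37). The function name and the 1-based window index are illustrative choices, not taken from the paper.

```python
from typing import List, Tuple


def ended_windows(t: int, k: int, num_layers: int) -> List[Tuple[int, int]]:
    """Return (layer, window_index) for every rewind window that ends
    after exactly t uncoded simulation bits (t >= 1)."""
    ended = []
    for layer in range(1, num_layers + 1):
        window_len = k ** layer  # assumed length of a layer-`layer` window
        if t % window_len == 0:
            ended.append((layer, t // window_len))
    return ended
```

With `k = 2` and three layers, `ended_windows(4, 2, 3)` reports that the second window of layer 1 and the first window of layer 2 have both ended, so two rewind bits are computed at that point, lowest layer first.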

### Calculation of the rewind bits

For the first layer, , the rewind bits are calculated using the algorithm for error detection using an extended-Hamming code, described in Definition 2. The reason for the choice of this procedure is the fact that in the first layer the difference between and is only the channel noise, which is i.i.d. Bernoulli(), and the fact that the extended-Hamming code is a good error detection code for such a noise. In particular, this code is proper [12], which means that the probability of error mis-detection is monotonically increasing for . As the probability of mis-detection for is equal to that of random hashing with the same number of bits, for we obtain favorable performance without randomness. The procedure is implemented as follows:

1. Alice calculates the syndrome vector as explained in Definition 2 according to appropriate rewind window . She then sends to Bob over the channel using repetitions per bit.

2. Bob decodes Alice’s syndrome using a majority vote for every bit. He then calculates his syndrome according to and sets his rewind bit to .

3. Bob sends Alice using repetitions per bit. Alice sets according to the respective majority vote.
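The three steps above can be sketched in code. Definition 2 is not reproduced in this section, so the sketch assumes one standard form of the extended-Hamming parity check, in which column `i` of the check matrix is a leading 1 followed by the binary expansion of `i`. The BSC model, the helper names, and the use of an odd repetition count are likewise assumptions for illustration.

```python
import random
from typing import List, Tuple


def ext_hamming_syndrome(bits: List[int]) -> List[int]:
    """Syndrome of a length-2^r window: overall parity followed by the
    XOR of the indices of all non-zero positions (r bits)."""
    n = len(bits)
    r = n.bit_length() - 1
    if n < 2 or n != 1 << r:
        raise ValueError("window length must be a power of two")
    parity, index_xor = 0, 0
    for i, b in enumerate(bits):
        if b:
            parity ^= 1
            index_xor ^= i
    return [parity] + [(index_xor >> j) & 1 for j in range(r)]


def send_repeated(bit: int, reps: int, eps: float, rng: random.Random) -> List[int]:
    """Send one bit `reps` times over a BSC with crossover probability eps."""
    return [bit ^ int(rng.random() < eps) for _ in range(reps)]


def majority(received: List[int]) -> int:
    """Majority vote; `reps` is assumed odd so that ties cannot occur."""
    return int(2 * sum(received) > len(received))


def first_layer_rewind_bits(alice_window: List[int], bob_window: List[int],
                            reps: int, eps: float,
                            rng: random.Random) -> Tuple[int, int]:
    # Step 1: Alice sends her syndrome, each bit repeated `reps` times.
    s_alice = ext_hamming_syndrome(alice_window)
    s_decoded = [majority(send_repeated(b, reps, eps, rng)) for b in s_alice]
    # Step 2: Bob compares the decoded syndrome with his own.
    b_bob = int(s_decoded != ext_hamming_syndrome(bob_window))
    # Step 3: Bob sends his rewind bit back; Alice takes a majority vote.
    b_alice = majority(send_repeated(b_bob, reps, eps, rng))
    return b_alice, b_bob
```

Under this parity check, any difference of one, two, or three bits between the two windows changes the syndrome, while some four-bit differences (for example positions 0, 1, 2, 3) do not, which is the mis-detection event discussed above.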

For all other layers, , the procedure is implemented according to the polynomial based randomized error detection scheme from Definition 3. We start by assuming that the parties agree on the prime number for every layer . We also assume for simplicity of exposition, that for every rewind window, the parties commonly and independently draw a test point using a common random string. We denote the set comprising all the test points used by the scheme by , which contains elements. In Section IX we show how the common randomness assumption can be relaxed. The error detection is implemented as follows:

1. Alice uses the appropriate test point and the bits of the rewind window to calculate . She then sends the bits representing to Bob over the channel using repetitions per bit.

2. Bob decodes and calculates .

3. Bob sets his rewind bit to and sends it to Alice using repetitions.

4. Alice sets according to her respective majority vote.
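A minimal sketch of the comparison in steps 1 to 3 is given below. Definition 3 is not reproduced in this section, so the sketch assumes the usual polynomial fingerprint: the window bits are read as coefficients of a polynomial over the integers modulo a prime, which is evaluated at the shared test point. The function names are hypothetical, and the noisy transmission of the fingerprint is omitted here.

```python
from typing import List


def poly_fingerprint(window_bits: List[int], test_point: int, prime: int) -> int:
    """Evaluate the polynomial whose coefficients are the window bits
    (highest degree first) at test_point, modulo prime (Horner's rule)."""
    acc = 0
    for b in window_bits:
        acc = (acc * test_point + b) % prime
    return acc


def rewind_bit(alice_fingerprint: int, bob_window: List[int],
               test_point: int, prime: int) -> int:
    """Bob's rewind bit: 1 if his fingerprint disagrees with Alice's."""
    return int(alice_fingerprint != poly_fingerprint(bob_window, test_point, prime))
```

Two different windows of length `n` collide only when the test point is a root of their non-zero difference polynomial, which has degree at most `n - 1`. For a uniformly drawn test point this happens with probability at most `(n - 1) / prime`, which is why the prime is chosen to grow with the window size.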

Let us now bound the number of bits required for this procedure. First, we generously bound the number of bits in a rewind window of layer , which contains all the uncoded simulation bits and the nested rewind bits of the previous layers, by . For layer , the parties set to be the first prime number between and . Therefore, a number in can be represented by no more than bits. All in all, the procedure described above requires bits for layer . For simplicity of calculation, from this point on, we bound this number by

$$3+(2+l)\log k < 3l\log k, \tag{36}$$

which applies for any and .
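The inequality in (36) is easy to probe numerically. The helper below simply evaluates both sides for a given layer index `l` and block length `k`. It assumes the logarithm is base 2, which is an assumption of this sketch, and it makes no claim about the exact parameter range stated in the paper.

```python
import math


def bound_36_holds(l: int, k: int) -> bool:
    """Check 3 + (2 + l) * log2(k) < 3 * l * log2(k) for the given l and k."""
    log_k = math.log2(k)
    return 3 + (2 + l) * log_k < 3 * l * log_k
```

Rearranging gives the equivalent condition `(2 * l - 2) * log2(k) > 3`, so the bound cannot hold for `l = 1` and holds for `l = 2` once `k` is at least 4 (when `k` is a power of two).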

## VII Analysis of the coding scheme: A Proof of Theorem 3

We start by giving the following notation:

• is the minimum between Alice’s and Bob’s cursor at any moment

• denote the respective values of at the end of the simulation

• and denote the first bits of Alice’s and Bob’s simulations of the transcript respectively, at the end of the simulation. We also assume that if or then the parties continue the protocol by transmitting zeros.

• We denote . Namely, it is defined as the disjunction between Alice’s and Bob’s respective rewind bits

The following two error events will be analyzed:

• is the event in which

• is the event in which either or

The simulation error event is included in and we would like it to vanish with .

We start by analyzing and do it by lower bounding . We recall that by construction of the scheme, (resp. ) will rewind (resp. ) to the value it had at the beginning of the rewind window. Namely (resp. ) will be reduced by at most . It is now instrumental to use the definitions of and and observe that if either or (namely, if ) then the minimal among and (namely, ) will be reduced by at most . Recalling that we can now write

$$j(T) \ge T - \sum_{l=1}^{L}\sum_{m=1}^{k^{L-l}} b_l(m)\,k^{l} = T\left(1-\sum_{l=1}^{L}\bar{b}_l\right), \tag{37}$$

where

$$\bar{b}_l \triangleq \frac{\sum_{m=1}^{k^{L-l}} b_l(m)}{k^{L-l}} \tag{38}$$

denotes the fraction of active (i.e., non-zero) rewind bits at level . We note that by construction of the scheme (including its use of randomness), the processes of the error generation and detection are identical for all blocks at level . For this reason, the probability of having an active rewind bit is also identical for all the blocks at level . We denote this probability by

$$P_{b_l} = \Pr\left(b_l(1)=1\right) = \dots = \Pr\left(b_l(k^{L-l})=1\right). \tag{39}$$

Taking the expectation over (37) yields

$$\mathbb{E}\,j(T) \ge T\left(1-\sum_{l=1}^{L}P_{b_l}\right). \tag{40}$$
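The bookkeeping behind (37) and (38) can be checked on a small example. The sketch below computes the right-hand side of (37) in both of its forms from a table of rewind bits. It assumes `T = k**L`, which is what makes the two forms agree term by term, and it represents the rewind bits as a dictionary from layer index to a list of `k**(L-l)` bits. This data layout is an assumption of the sketch.

```python
from typing import Dict, List


def cursor_bound_direct(rewind_bits: Dict[int, List[int]], k: int) -> int:
    """T minus k**l for every active rewind bit at layer l (middle of (37))."""
    L = len(rewind_bits)
    T = k ** L  # assumed total number of uncoded simulation bits
    lost = 0
    for l, bits in rewind_bits.items():
        if len(bits) != k ** (L - l):
            raise ValueError("layer l must have k**(L-l) rewind bits")
        lost += sum(bits) * k ** l
    return T - lost


def cursor_bound_from_averages(rewind_bits: Dict[int, List[int]], k: int) -> float:
    """T * (1 - sum of per-layer averages), the right-hand side of (37)."""
    T = k ** len(rewind_bits)
    averages = [sum(bits) / len(bits) for bits in rewind_bits.values()]
    return T * (1 - sum(averages))
```

For `k = 2`, `L = 2` and rewind bits `{1: [1, 0], 2: [0]}`, one active first-layer bit costs `k**1 = 2` positions out of `T = 4`, and both functions return 2. Replacing each average by its expectation gives (40).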

In order to proceed with the calculation of , we define as the probability that either