    # Coalgebraic Tools for Randomness-Conserving Protocols

We propose a coalgebraic model for constructing and reasoning about state-based protocols that implement efficient reductions among random processes. We provide basic tools that allow efficient protocols to be constructed in a compositional way and analyzed in terms of the tradeoff between latency and loss of entropy. We show how to use these tools to construct various entropy-conserving reductions between processes.


## 1 Introduction

In low-level performance-critical computations—for instance, data-forwarding devices in packet-switched networks—it is often desirable to minimize local state in order to achieve high throughput. But if the situation requires access to a source of randomness, say to implement randomized routing or load-balancing protocols, it may be necessary to convert the output of the source to a form usable by the protocol. As randomness is a scarce resource to be conserved like any other, these conversions should be performed as efficiently as possible and with a minimum of machinery.

In this paper we propose a coalgebraic model for constructing and reasoning about state-based protocols that implement efficient reductions among random processes. Efficiency is measured by the ratio of entropy produced to entropy consumed. The efficiency cannot exceed the information-theoretic bound of unity, but it should be as close to unity as can be achieved with simple state-based devices. We provide basic tools that allow efficient protocols to be constructed in a compositional way and analyzed in terms of the tradeoff between latency and loss of entropy.

We use these tools to construct the following reductions between processes, where $k$ is the latency parameter:

• $d$-uniform to $c$-uniform

• $d$-uniform to arbitrary rational

• $d$-uniform to arbitrary

• arbitrary to $d$-uniform

• Bernoulli to $d$-uniform

In each case the entropy loss vanishes as the latency parameter $k$ grows (§4).

### 1.1 Related Work

Since von Neumann's classic paper showing how to simulate a fair coin with a coin of unknown bias, many authors have studied variants of this problem. Our work is heavily inspired by the work of Elias, who studies entropy-optimal generation of uniform distributions from known sources; definitions of entropy conservation and a concept related to latency appear there. Mossel, Peres, and Hillar characterize the functions $f$ for which it is possible to simulate an $f(p)$-biased coin with a $p$-biased coin when $p$ is unknown. Peres shows how to iterate von Neumann's procedure for producing a fair coin from a biased coin to approximate the entropy bound. Blum shows how to extract a fair coin from a Markov chain. Pae and Loui [23, 22, 21] present several simulations for optimal conversions between discrete distributions, known and unknown. The main innovation in this paper is the coalgebraic model that allows compositional reasoning about such reductions.

There is also a large body of related work on extracting randomness from weak random sources (e.g. [20, 19, 29, 28, 8]). These models typically work with imperfect knowledge of the input source and provide only approximate guarantees on the quality of the output. Here we assume that the statistical properties of the input and output are known completely, and simulations must be exact.

## 2 Definitions

Informally, a reduction from a stochastic process $P$ to another stochastic process $Q$ is a deterministic protocol that consumes a finite or infinite stream of letters from an alphabet $\Sigma$ and produces a finite or infinite stream of letters from another alphabet $\Gamma$. If the letters of the input stream are distributed as $P$, then the letters of the output stream should be distributed as $Q$. Of particular interest are reductions between Bernoulli processes, in which the letters of the input and output streams are independent and identically distributed according to distributions $\mu$ on $\Sigma$ and $\nu$ on $\Gamma$, respectively. In this case, we say that the procedure is a reduction from $\mu$ to $\nu$.

To say that the protocol is deterministic means that the only source of randomness is the input stream. It makes sense to talk about the expected number of input letters read before halting or the probability that the first letter emitted is a particular $b \in \Gamma$, but any such statistical measurements are taken with respect to the distribution of the input stream.

There are several ways to formalize the notion of a reduction. One approach is to model a reduction as a map $f : \Sigma^* \to \Gamma^*$ that is monotone with respect to the prefix relation on strings; that is, if $x$ is a prefix of $y$, then $f(x)$ is a prefix of $f(y)$. Monotonicity implies that $f$ can be extended uniquely by continuity to domain $\Sigma^\omega$ and range $\Gamma^\omega$. The map would then constitute a reduction from the stochastic process $P$ to $Q$. To be a reduction from $\mu$ to $\nu$, it must be that if the letters of the input stream are independent and identically distributed as $\mu$, and if $Y_n$ is the value of the $n$th letter of the output, then the $Y_n$ are independent and identically distributed as $\nu$.

In this paper we propose an alternative state-based approach in which protocols are modeled as coalgebras $\delta : S \times \Sigma \to S \times \Gamma^*$, where $S$ is a (possibly infinite) set of states. This approach allows a more streamlined treatment of common programming constructions such as composition, which is perhaps more appealing from a programming perspective.

### 2.1 Protocols and Reductions

Let $\Sigma$, $\Gamma$ be finite alphabets. Let $\Sigma^*$ denote the set of finite words and $\Sigma^\omega$ the set of $\omega$-words (streams) over $\Sigma$. We use $x, y, z, \ldots$ for elements of $\Sigma^*$ and $\alpha, \beta, \gamma, \ldots$ for elements of $\Sigma^\omega$. The symbols $\preceq$ and $\prec$ denote the prefix and proper prefix relations, respectively.

If $\mu$ is a probability measure on $\Sigma$, we endow $\Sigma^\omega$ with the product measure in which each symbol is distributed as $\mu$. The notation $\Pr(E)$ for an event $E$ refers to this measure. The measurable sets of $\Sigma^\omega$ are the Borel sets of the Cantor space topology, whose basic open sets are the intervals $I_x = \{\alpha \in \Sigma^\omega \mid x \preceq \alpha\}$ for $x \in \Sigma^*$, where $\mu(I_x) = \mu(x)$.

A protocol is a coalgebra $\delta : S \times \Sigma \to S \times \Gamma^*$, where $S$ is a set of states. We can immediately extend $\delta$ to domain $S \times \Sigma^*$ by induction:

$$\delta(s,\varepsilon) = (s,\varepsilon) \qquad \delta(s,ax) = \mathrm{let}\ (t,y) = \delta(s,a)\ \mathrm{in}\ \mathrm{let}\ (u,z) = \delta(t,x)\ \mathrm{in}\ (u,yz).$$

Since the two functions agree on $S \times \Sigma$, we use the same name. It follows that

$$\delta(s,xy) = \mathrm{let}\ (t,z) = \delta(s,x)\ \mathrm{in}\ \mathrm{let}\ (u,w) = \delta(t,y)\ \mathrm{in}\ (u,zw).$$

By a slight abuse of notation, we define the length $|\delta(s,x)|$ of the output to be the length of its second component as a string in $\Gamma^*$; that is, $|\delta(s,x)| = |y|$, where $\delta(s,x) = (t,y)$.
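As an illustration, the one-step coalgebra and its inductive extension to finite words can be sketched in code. The von Neumann coin-flip trick mentioned in §1.1 serves as the example step function; all names here are our own, not from the paper.

```python
from typing import Callable, Tuple

# A protocol is a coalgebra step : S x Sigma -> S x Gamma*.
Protocol = Callable[[str, str], Tuple[str, str]]

def vn_step(state: str, a: str) -> Tuple[str, str]:
    """Von Neumann's trick as a one-step coalgebra: buffer the first flip,
    then on the second flip emit 0 for HT, 1 for TH, nothing for HH/TT."""
    if state == "":
        return (a, "")  # remember the first flip, emit nothing
    pair = (state, a)
    out = "0" if pair == ("H", "T") else "1" if pair == ("T", "H") else ""
    return ("", out)    # return to the start state in every case

def run(step: Protocol, s: str, x: str) -> Tuple[str, str]:
    """Extension of a coalgebra to finite input words, mirroring
    delta(s, ax) = let (t, y) = delta(s, a) in
                   let (u, z) = delta(t, x) in (u, yz)."""
    y = ""
    for a in x:
        s, z = step(s, a)
        y += z
    return (s, y)
```

For example, `run(vn_step, "", "HTTHHH")` consumes six flips and returns the pair `("", "01")`.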

A protocol also induces a partial map $\delta_\omega : S \times \Sigma^\omega \rightharpoonup \Gamma^\omega$ by coinduction:

$$\delta_\omega(s,a\alpha) = \mathrm{let}\ (t,z) = \delta(s,a)\ \mathrm{in}\ z \cdot \delta_\omega(t,\alpha).$$

It follows that

$$\delta_\omega(s,x\alpha) = \mathrm{let}\ (t,z) = \delta(s,x)\ \mathrm{in}\ z \cdot \delta_\omega(t,\alpha).$$

Given $s$ and $\alpha$, this defines a unique infinite string in $\Gamma^\omega$ except in the degenerate case in which only finitely many output letters are ever produced. A protocol is said to be productive (with respect to a given probability measure on input streams) if, starting in any state, an output symbol is produced within finite expected time. It follows that infinitely many output letters are produced with probability 1.
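The induced stream map $\delta_\omega(s,-)$ can likewise be sketched as a generator that consumes an input iterator and yields output symbols as they are produced. The helper and the toy step function below are our own illustrations:

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Tuple

def stream(step: Callable[[object, str], Tuple[object, str]],
           s: object, alpha: Iterable[str]) -> Iterator[str]:
    """Lazy sketch of delta_omega(s, alpha): feed input symbols one at a
    time and yield each output symbol as it is produced."""
    for a in alpha:
        s, z = step(s, a)
        yield from z

# A trivial productive protocol: duplicate every input symbol.
def duplicate(s, a):
    return (s, a + a)

first_six = "".join(islice(stream(duplicate, None, iter("ab" * 100)), 6))
```

For a non-productive protocol the generator may simply stop yielding, matching the degenerate case above.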

Now let $\nu$ be a probability measure on $\Gamma$. Endow $\Gamma^\omega$ with the product measure in which each symbol is distributed as $\nu$, and define

$$\nu(a_1a_2\cdots a_n) = \nu(a_1)\nu(a_2)\cdots\nu(a_n), \qquad a_i \in \Gamma.$$

We say that a protocol $\delta$ with start state $s$ is a reduction from $\mu$ to $\nu$ if for all $y \in \Gamma^*$,

$$\Pr(y \preceq \delta_\omega(s,\alpha)) = \nu(y), \tag{1}$$

where the probability is with respect to the product measure on $\Sigma^\omega$. This implies that the symbols of the output stream are independent and identically distributed as $\nu$.

### 2.2 Restart Protocols

A prefix code is a subset $A \subseteq \Sigma^*$ such that every element of $\Sigma^*$ has at most one prefix in $A$. Thus the elements of a prefix code are $\preceq$-incomparable. A prefix code $A$ is exhaustive (with respect to a given probability measure on input streams) if $\Pr(\text{some } x \in A \text{ is a prefix of } \alpha) = 1$. By König's lemma, if every $\alpha \in \Sigma^\omega$ has a prefix in $A$, then $A$ is finite.
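Both conditions on $A$ are easy to check mechanically for small codes. The helpers below are our own sketch; the code $\{HH, HT, TH, TT\}$ over a fair coin is the example:

```python
def is_prefix_code(A) -> bool:
    """No element of A is a proper prefix of another."""
    return not any(x != y and y.startswith(x) for x in A for y in A)

def measure(A, mu) -> float:
    """mu(A): total probability that the input stream begins with some
    codeword of A; A is exhaustive iff this equals 1."""
    total = 0.0
    for x in A:
        p = 1.0
        for a in x:
            p *= mu[a]
        total += p
    return total

A = {"HH", "HT", "TH", "TT"}
fair = {"H": 0.5, "T": 0.5}
```

Here `is_prefix_code(A)` holds and `measure(A, fair)` is 1, so $A$ is an exhaustive prefix code, whereas `{"H", "HT"}` fails the prefix-code test.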

A restart protocol is a protocol of a special form determined by a function $f : A \to \Gamma^*$, where $A \subseteq \Sigma^*$ is an exhaustive prefix code. Here $\varepsilon$ is the designated start state. Intuitively, starting in $\varepsilon$, we read symbols from the input stream until encountering a string $x \in A$, output $f(x)$, then return to $\varepsilon$ and repeat. Note that we are not assuming $A$ to be finite.

Formally, we can take the state space to be

$$S = \{u \in \Sigma^* \mid x \not\preceq u \text{ for any } x \in A\}$$

and define $\delta$ by

$$\delta(u,a) = \begin{cases} (ua,\varepsilon), & ua \notin A, \\ (\varepsilon,z), & ua \in A \text{ and } f(ua) = z, \end{cases}$$

with start state $\varepsilon$. Then $\delta(\varepsilon,x) = (\varepsilon,f(x))$ for all $x \in A$.

As with the more general protocols, we can extend $\delta$ to a function on streams, but here the definition takes a simpler form: for $x \in A$ and $\alpha \in \Sigma^\omega$,

$$\delta_\omega(\varepsilon,x\alpha) = f(x) \cdot \delta_\omega(\varepsilon,\alpha).$$

A restart protocol is positive recurrent (with respect to a given probability measure on input streams) if, starting in the start state $\varepsilon$, the probability of eventually returning to $\varepsilon$ is 1, and moreover the expected time before the next visit to $\varepsilon$ is finite. All finite-state restart protocols are positive recurrent, but infinite-state ones need not be.
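A restart protocol can be generated directly from the map $f$. The sketch below builds the coalgebra of this section; the concrete $f$, encoding von Neumann's trick, is our illustrative choice:

```python
def restart_step(f):
    """Coalgebra of the restart protocol for f : A -> Gamma*.
    States are the words read so far that have no prefix in A."""
    def step(u: str, a: str):
        ua = u + a
        if ua in f:
            return ("", f[ua])  # codeword complete: emit f(ua), restart
        return (ua, "")         # keep reading
    return step

# Von Neumann's trick: A = {HH, HT, TH, TT}, emitting 0, 1, or nothing.
f = {"HT": "0", "TH": "1", "HH": "", "TT": ""}
step = restart_step(f)

s, out = "", ""
for a in "HTTHHH":
    s, z = step(s, a)
    out += z
```

After consuming `HTTHHH` the protocol is back in the start state with output `01`.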

### 2.3 Convergence

We will have occasion to discuss the convergence of random variables. There are several notions of convergence in the literature, but for our purposes the most useful is convergence in probability. Let $X$ and $X_n$, $n \ge 0$, be bounded nonnegative random variables. We say that the sequence $X_n$ converges to $X$ in probability, and write $X_n \xrightarrow{P} X$, if for all fixed $\delta > 0$,

$$\Pr(|X_n - X| > \delta) = o(1).$$

Let $\mathrm{E}\,X$ denote the expected value of $X$ and $\mathrm{Var}\,X$ its variance.

###### Lemma 1
1. If $X_n \xrightarrow{P} X$ and $X_n \xrightarrow{P} Y$, then $X = Y$ with probability 1.

2. If $X_n \xrightarrow{P} X$ and $Y_n \xrightarrow{P} Y$, then $X_n + Y_n \xrightarrow{P} X + Y$ and $X_nY_n \xrightarrow{P} XY$.

3. If $X_n \xrightarrow{P} X$ and $X$ is bounded away from 0, then $1/X_n \xrightarrow{P} 1/X$.

4. If $\mathrm{E}\,X_n = e$ for all $n$ and $\mathrm{Var}\,X_n = o(1)$, then $X_n \xrightarrow{P} e$.

###### Proof

For (iv), by the Chebyshev bound, for all fixed $\delta > 0$,

$$\Pr(|X_n - e| > \delta) < \delta^{-2}\,\mathrm{Var}\,X_n,$$

and $\mathrm{Var}\,X_n = o(1)$ by assumption. ∎

See [5, 12, 13, 9, 15, 16, 10] for a more thorough introduction.

### 2.4 Efficiency

The efficiency of a protocol is the long-term ratio of entropy production to entropy consumption. Formally, for a fixed protocol $\delta$, state $s$, and $\alpha \in \Sigma^\omega$, define the random variable

$$E_n(\alpha) = \frac{|\delta(s,\alpha_n)|}{n} \cdot \frac{H(\nu)}{H(\mu)}, \tag{2}$$

where $H$ is the Shannon entropy

$$H(p_1,\ldots,p_n) = -\sum_{i=1}^n p_i \log p_i$$

(logarithms are base 2 if not otherwise annotated), $\mu$ and $\nu$ are the input and output distributions, respectively, and $\alpha_n$ is the prefix of $\alpha$ of length $n$. Intuitively, the Shannon entropy measures the number of fair coin flips the distribution is worth, and the random variable $E_n(\alpha)$ measures the ratio of entropy production to consumption after $n$ steps of $\delta$ starting in state $s$. Here $|\delta(s,\alpha_n)| \cdot H(\nu)$ (respectively, $n \cdot H(\mu)$) is the contribution along $\alpha$ to the production (respectively, consumption) of entropy in the first $n$ steps. We write $E_n^{\delta,s}$ when we need to distinguish the $E_n$ associated with different protocols and start states.

In most cases of interest, $E_n$ converges in probability to a unique constant value independent of start state and history. When this occurs, we call this constant value the efficiency of the protocol and denote it by $\mathrm{Eff}\,\delta$. Notationally,

$$E_n \xrightarrow{P} \mathrm{Eff}\,\delta.$$

One must be careful when analyzing infinite-state protocols: the efficiency is well-defined for finite-state protocols, but may not exist in general. For restart protocols, it is enough to measure the ratio of production to consumption for one iteration of the protocol.

In §3.2 we will give sufficient conditions for the existence of $\mathrm{Eff}\,\delta$ that are satisfied by all protocols considered in §4.

### 2.5 Latency

The latency of a protocol from a given state $s$ is the expected entropy consumption before producing at least one output symbol, starting from state $s$. This is proportional to the expected number of input letters consumed before emitting at least one symbol. The latency of a protocol is finite if and only if the protocol is productive. All positive recurrent restart protocols that emit at least one symbol are productive. We will often observe a tradeoff between latency and efficiency.

Suppose we iterate a positive recurrent restart protocol only until at least one output symbol is produced. That is, we start in the start state and consume one string in the prefix code, chosen randomly according to $\mu$. If at least one output symbol is produced, we stop. Otherwise, we repeat the process. The sequence of iterations up to and including the first to produce at least one output symbol is called an epoch. The latency is the expected consumption during an epoch. If $q$ is the probability of producing at least one output symbol in one iteration, then the sequence of iterations in an epoch forms a Bernoulli process with success probability $q$. The latency is thus $1/q$, the expected stopping time of the Bernoulli process, times the expected consumption in one iteration, which is finite due to the assumption that the protocol is positive recurrent.
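For a positive recurrent restart protocol, both quantities can be computed exactly from a single iteration, as noted above. The sketch below does this for the von Neumann protocol on a coin of bias $p$; the bias value and all names are our own assumptions:

```python
from math import log2

def H(probs):
    """Shannon entropy in bits."""
    return -sum(q * log2(q) for q in probs if q > 0)

def iteration_stats(f, mu):
    """Expected symbols consumed/produced in one iteration of the restart
    protocol f : A -> Gamma*, and the probability q of emitting anything."""
    consume = produce = q = 0.0
    for x, y in f.items():
        px = 1.0
        for a in x:
            px *= mu[a]
        consume += px * len(x)
        produce += px * len(y)
        if y:
            q += px
    return consume, produce, q

p = 0.3                                  # assumed input coin bias
mu = {"H": p, "T": 1 - p}
f = {"HT": "0", "TH": "1", "HH": "", "TT": ""}
consume, produce, q = iteration_stats(f, mu)

# Efficiency: produced entropy over consumed entropy, per iteration.
efficiency = produce * H([0.5, 0.5]) / (consume * H([p, 1 - p]))
# Latency: expected consumption per epoch = (1/q) * consumption/iteration.
latency = consume * H([p, 1 - p]) / q
```

Here each iteration consumes 2 flips and emits a bit with probability $q = 2p(1-p) = 0.42$, giving efficiency roughly 0.24 and latency roughly 4.2 bits.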

## 3 Basic Results

Let $\delta$ be a protocol. We can associate with each $y \in \Gamma^*$ and state $s$ a prefix code $A_y^s$ in $\Sigma^*$, namely

$$A_y^s = \{\text{minimal-length strings } x \in \Sigma^* \text{ such that } y \preceq \delta(s,x)\}.$$

The string $y$ is generated as a prefix of the output if and only if exactly one $x \in A_y^s$ is consumed as a prefix of the input. These events must occur with the same probability, so

$$\nu(y) = \Pr(y \preceq \delta_\omega(s,\alpha)) = \mu(A_y^s). \tag{3}$$

Note that $A_y^s$ need not be finite.
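For small alphabets, the code $A_y^s$ can be enumerated by breadth-first search over input words, pruning branches whose output has already diverged from $y$. This helper is our own sketch, shown on the von Neumann protocol:

```python
def vn_step(state, a):
    """Von Neumann's trick as a coalgebra (0 for HT, 1 for TH)."""
    if state == "":
        return (a, "")
    out = "0" if (state, a) == ("H", "T") else "1" if (state, a) == ("T", "H") else ""
    return ("", out)

def code_of(step, s0, y, sigma, max_len):
    """Minimal-length inputs x with y a prefix of the output delta(s0, x),
    truncated at input length max_len (the full code may be infinite)."""
    A, frontier = [], [(s0, "", "")]     # (state, input so far, output so far)
    for _ in range(max_len):
        nxt = []
        for s, x, o in frontier:
            for a in sigma:
                t, z = step(s, a)
                x2, o2 = x + a, o + z
                if o2.startswith(y):
                    A.append(x2)         # y just appeared: x2 is minimal
                elif y.startswith(o2):
                    nxt.append((t, x2, o2))
                # otherwise the output has diverged from y: prune
        frontier = nxt
    return A
```

For $y = 0$ and inputs of length at most 4 this yields $\{HT, HHHT, TTHT\}$, and the truncated measures $\mu(A_y^s)$ approach $\nu(0) = 1/2$ as in (3).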

###### Lemma 2

If $A \subseteq \Gamma^*$ is a prefix code, then so is $\bigcup_{y \in A} A_y^s$, and

$$\nu(A) = \mu\Big(\bigcup_{y \in A} A_y^s\Big).$$

If $A$ is exhaustive, then so is $\bigcup_{y \in A} A_y^s$.

###### Proof

We have observed that each $A_y^s$ is a prefix code. If $y$ and $y'$ are $\preceq$-incomparable, and if $x \in A_y^s$ and $x' \in A_{y'}^s$, then $x$ and $x'$ are $\preceq$-incomparable, thus $\bigcup_{y \in A} A_y^s$ is a prefix code. By (3), we have

$$\nu(A) = \sum_{y \in A} \nu(y) = \sum_{y \in A} \mu(A_y^s) = \mu\Big(\bigcup_{y \in A} A_y^s\Big).$$

If $A$ is exhaustive, then so is $\bigcup_{y \in A} A_y^s$, since the events both occur with probability 1 in their respective spaces. ∎

###### Lemma 3
1. The partial function $\delta_\omega(s,-)$ is continuous, thus Borel measurable.

2. $\delta_\omega(s,\alpha)$ is almost surely infinite; that is, $\Pr(\delta_\omega(s,\alpha) \in \Gamma^\omega) = 1$.

3. The measure $\nu$ on $\Gamma^\omega$ is the push-forward measure $\mu \circ \delta_\omega(s,-)^{-1}$.

###### Proof

(i) Let $y \in \Gamma^*$. The preimage of $\{\beta \mid y \preceq \beta\}$, a basic open set of $\Gamma^\omega$, is open in $\Sigma^\omega$:

$$\delta_\omega(s,-)^{-1}(\{\beta \mid y \preceq \beta\}) = \{\alpha \mid y \preceq \delta_\omega(s,\alpha)\} = \bigcup_{x \in A_y^s} \{\alpha \mid x \preceq \alpha\}.$$

(ii) We have assumed finite latency; that is, starting from any state, the expected time before the next output symbol is generated is finite. Thus the probability that infinitely many symbols are generated is 1.

(iii) From (i) and (3) we have

$$(\mu \circ \delta_\omega(s,-)^{-1})(\{\beta \mid y \preceq \beta\}) = \mu\Big(\bigcup_{x \in A_y^s} \{\alpha \mid x \preceq \alpha\}\Big) = \mu(A_y^s) = \nu(y) = \nu(\{\beta \mid y \preceq \beta\}).$$

Since $\mu \circ \delta_\omega(s,-)^{-1}$ and $\nu$ agree on the basic open sets $\{\beta \mid y \preceq \beta\}$, they are equal. ∎

###### Lemma 4

If $\delta$ is a reduction from $\mu$ to $\nu$, then the random variables $E_n$ defined in (2) are continuous and uniformly bounded by an absolute constant depending only on $\mu$ and $\nu$.

###### Proof

For $x \in \Sigma^*$, let $y$ be the string of output symbols produced after consuming $x$. The protocol cannot produce $y$ from $s$ with greater probability than allowed by $\nu$, thus

$$\Big(\min_{a \in \Sigma}\mu(a)\Big)^{|x|} \le \mu(x) \le \nu(y) \le \Big(\max_{b \in \Gamma}\nu(b)\Big)^{|y|}.$$

Taking logs, $|x| \log \min_a \mu(a) \le |y| \log \max_b \nu(b)$, so $|y|/|x| \le \log(\min_a \mu(a)) / \log(\max_b \nu(b))$, thus we can choose the bound accordingly.

To show continuity, note that $E_n(\alpha)$ depends only on the finite prefix $\alpha_n$; thus the preimage under $E_n$ of any set of reals is a union of basic intervals $\{\alpha \mid x \preceq \alpha\}$ with $x \in \Sigma^n$, an open set. ∎

### 3.1 Composition

Protocols can be composed sequentially as follows. If

$$\delta_1 : S \times \Sigma \to S \times \Gamma^* \qquad \delta_2 : T \times \Gamma \to T \times \Delta^*,$$

then

$$(\delta_1 ; \delta_2) : S \times T \times \Sigma \to S \times T \times \Delta^* \qquad (\delta_1 ; \delta_2)((s,t),a) = \mathrm{let}\ (u,y) = \delta_1(s,a)\ \mathrm{in}\ \mathrm{let}\ (v,z) = \delta_2(t,y)\ \mathrm{in}\ ((u,v),z).$$

Intuitively, we run $\delta_1$ for one step and then run $\delta_2$ on the output of $\delta_1$; here $\delta_2$ is applied to the word $y \in \Gamma^*$ via its extension to finite words. The following theorem shows that the map on infinite strings induced by the sequential composition of protocols is almost everywhere equal to the functional composition of the induced maps of the component protocols.
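Sequential composition can be coded directly from the definition. The two toy component protocols below are our own illustrations, chosen so that their composite acts as the identity on streams:

```python
def run(step, s, x):
    """Extension of a one-step coalgebra to a finite input word."""
    y = ""
    for a in x:
        s, z = step(s, a)
        y += z
    return (s, y)

def compose(d1, d2):
    """(d1 ; d2)((s, t), a) = let (u, y) = d1(s, a) in
                              let (v, z) = d2(t, y) in ((u, v), z)."""
    def step(state, a):
        s, t = state
        u, y = d1(s, a)
        v, z = run(d2, t, y)   # feed the whole word y through d2
        return ((u, v), z)
    return step

def doubler(s, a):
    return (s, a + a)          # duplicate every symbol

def halver(t, b):
    return (1 - t, b if t == 0 else "")  # keep every other symbol

identityish = compose(doubler, halver)
```

For instance, `run(identityish, (None, 0), "abc")` returns `((None, 0), "abc")`.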

###### Theorem 3.1

The partial maps $(\delta_1;\delta_2)_\omega((s,t),-)$ and $\delta_{2\omega}(t,\delta_{1\omega}(s,-))$ of type $\Sigma^\omega \rightharpoonup \Delta^\omega$ are defined and agree on all but a $\mu$-nullset.

###### Proof

We restrict inputs to the subset of $\Sigma^\omega$ on which $\delta_{1\omega}(s,-)$ is defined and produces a string in $\Gamma^\omega$ on which $\delta_{2\omega}(t,-)$ is defined. These sets are of measure 1. To show that the two maps agree there, we show that the binary relation

$$\beta \mathrel{R} \gamma \iff \exists \alpha \in \Sigma^\omega\ \exists s \in S\ \exists t \in T\ \ \beta = (\delta_1;\delta_2)_\omega((s,t),\alpha) \wedge \gamma = \delta_{2\omega}(t,\delta_{1\omega}(s,\alpha))$$

on $\Delta^\omega \times \Delta^\omega$ is a bisimulation. Unwinding the definitions,

$$\begin{aligned}
(\delta_1;\delta_2)_\omega((s,t),a\alpha) &= \mathrm{let}\ ((u,v),z) = (\delta_1;\delta_2)((s,t),a)\ \mathrm{in}\ z \cdot (\delta_1;\delta_2)_\omega((u,v),\alpha) \\
&= \mathrm{let}\ (u,y) = \delta_1(s,a)\ \mathrm{in}\ \mathrm{let}\ (v,z) = \delta_2(t,y)\ \mathrm{in}\ z \cdot (\delta_1;\delta_2)_\omega((u,v),\alpha) \\
\delta_{2\omega}(t,\delta_{1\omega}(s,a\alpha)) &= \mathrm{let}\ (u,y) = \delta_1(s,a)\ \mathrm{in}\ \mathrm{let}\ \zeta = \delta_{1\omega}(u,\alpha)\ \mathrm{in}\ \delta_{2\omega}(t,y\zeta) \\
&= \mathrm{let}\ (u,y) = \delta_1(s,a)\ \mathrm{in}\ \mathrm{let}\ \zeta = \delta_{1\omega}(u,\alpha)\ \mathrm{in}\ \mathrm{let}\ (v,z) = \delta_2(t,y)\ \mathrm{in}\ z \cdot \delta_{2\omega}(v,\zeta) \\
&= \mathrm{let}\ (u,y) = \delta_1(s,a)\ \mathrm{in}\ \mathrm{let}\ (v,z) = \delta_2(t,y)\ \mathrm{in}\ z \cdot \delta_{2\omega}(v,\delta_{1\omega}(u,\alpha)),
\end{aligned}$$

so if $(u,y) = \delta_1(s,a)$ and $(v,z) = \delta_2(t,y)$, then

$$\begin{aligned}
(\delta_1;\delta_2)_\omega((s,t),a\alpha) &= z \cdot (\delta_1;\delta_2)_\omega((u,v),\alpha) \\
\delta_{2\omega}(t,\delta_{1\omega}(s,a\alpha)) &= z \cdot \delta_{2\omega}(v,\delta_{1\omega}(u,\alpha)).
\end{aligned}$$

Since the left-hand sides satisfy the relation $R$, so do the pair $(\delta_1;\delta_2)_\omega((u,v),\alpha)$ and $\delta_{2\omega}(v,\delta_{1\omega}(u,\alpha))$. ∎

###### Corollary 1

If $\delta_1$ with start state $s$ is a reduction from $\mu$ to $\nu$ and $\delta_2$ with start state $t$ is a reduction from $\nu$ to $o$, then $\delta_1;\delta_2$ with start state $(s,t)$ is a reduction from $\mu$ to $o$.

###### Proof

By the assumptions in the statement of the corollary, $\nu = \mu \circ \delta_{1\omega}(s,-)^{-1}$ and $o = \nu \circ \delta_{2\omega}(t,-)^{-1}$. By Theorem 3.1,

$$o = \mu \circ \delta_{1\omega}(s,-)^{-1} \circ \delta_{2\omega}(t,-)^{-1} = \mu \circ (\delta_{2\omega}(t,-) \circ \delta_{1\omega}(s,-))^{-1} = \mu \circ (\delta_{2\omega}(t,\delta_{1\omega}(s,-)))^{-1} = \mu \circ ((\delta_1;\delta_2)_\omega((s,t),-))^{-1}. \qquad ∎$$

###### Theorem 3.2

If $\delta_1$ is a reduction from $\mu$ to $\nu$ and $\delta_2$ is a reduction from $\nu$ to $o$, and if $\mathrm{Eff}\,\delta_1$ and $\mathrm{Eff}\,\delta_2$ exist, then $\mathrm{Eff}\,(\delta_1;\delta_2)$ exists and

$$\mathrm{Eff}\,(\delta_1;\delta_2) = \mathrm{Eff}\,\delta_1 \cdot \mathrm{Eff}\,\delta_2.$$
###### Proof

Let $s$ and $t$ be the start states of $\delta_1$ and $\delta_2$, respectively. Let $\beta = \delta_{1\omega}(s,\alpha)$ and $m = |\delta_1(s,\alpha_n)|$, so that $\beta_m = \delta_1(s,\alpha_n)$. Then

$$\begin{aligned}
\frac{|(\delta_1;\delta_2)((s,t),\alpha_n)|}{n} \cdot \frac{H(o)}{H(\mu)} &= \frac{|\delta_2(t,\delta_1(s,\alpha_n))|}{n} \cdot \frac{H(o)}{H(\mu)} \\
&= \frac{|\delta_2(t,\delta_1(s,\alpha_n))|}{|\delta_1(s,\alpha_n)|} \cdot \frac{|\delta_1(s,\alpha_n)|}{n} \cdot \frac{H(o)}{H(\nu)} \cdot \frac{H(\nu)}{H(\mu)} \\
&= \Big(\frac{|\delta_1(s,\alpha_n)|}{n} \cdot \frac{H(\nu)}{H(\mu)}\Big)\Big(\frac{|\delta_2(t,\beta_m)|}{m} \cdot \frac{H(o)}{H(\nu)}\Big) \\
&= E_n^{\delta_1,s}(\alpha) \cdot E_m^{\delta_2,t}(\beta).
\end{aligned}$$

Since $m \to \infty$ with probability 1, by Lemma 1(ii) this quantity converges in probability to $\mathrm{Eff}\,\delta_1 \cdot \mathrm{Eff}\,\delta_2$. ∎

In the worst case, the latency of compositions of protocols is also the product of their latencies: if the first protocol only outputs one character at a time, then the second protocol may have to wait the full latency of the first protocol for each of the characters it needs to read in order to emit a single one.

### 3.2 Serial Protocols

Consider a sequence of positive recurrent restart protocols $\delta_n$, $n \ge 0$, defined in terms of maps $f_n : A_n \to \Gamma^*$, where the $A_n$ are exhaustive prefix codes, as described in §2.2. These protocols can be combined into a single serial protocol $\delta$ that executes one iteration of each $\delta_n$, then goes on to the next. Formally, the states of $\delta$ are the disjoint union of the states of the $\delta_n$, and $\delta$ is defined so that the restart transition of $\delta_n$ leads to the start state of $\delta_{n+1}$, while within the states of $\delta_n$ it behaves like $\delta_n$.

Let $C_n$ and $P_n$ be the number of input symbols consumed and output symbols produced, respectively, in one iteration of the $n$-th component protocol starting from its start state. Let $e(n)$ be the index of the component protocol in which the $n$-th step of the combined protocol occurs. These are random variables whose values depend on the input sequence $\alpha$. Let $c_n = \mathrm{E}\,C_n$ and $p_n = \mathrm{E}\,P_n$.
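A serial protocol can be sketched by tagging each state with the index of the current component: completing one codeword of component $n$ moves to the start state of component $n+1$. The concrete component codes below are our own toy choices:

```python
def serial_step(fs):
    """Serial protocol over restart components f_0, f_1, ... (here a finite
    list; the last component is repeated to keep the sketch total)."""
    def step(state, a):
        n, u = state
        f = fs[min(n, len(fs) - 1)]
        ua = u + a
        if ua in f:
            return ((n + 1, ""), f[ua])  # iteration done: next component
        return ((n, ua), "")
    return step

# Two toy components over input alphabet {0, 1}.
fs = [{"0": "a", "1": ""}, {"00": "bb", "01": "b", "10": "b", "11": ""}]
step = serial_step(fs)

state, out = (0, ""), ""
for a in "001":
    state, z = step(state, a)
    out += z
```

On input `001` the first component consumes `0` and emits `a`, then the second consumes `01` and emits `b`, leaving the protocol at the start of component 2.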

To derive the efficiency of serial protocols, we need a form of the law of large numbers (see [5, 12]). Unfortunately, the law of large numbers as usually formulated does not apply verbatim, as the random variables in question are bounded but not independent, or (under a different formulation) independent but not bounded. Our main result, Theorem 3.3 below, can be regarded as a specialized version of this result adapted to our needs.

Our version requires that the variances of certain random variables vanish in the limit. This holds under a mild condition (4) on the growth rate of $m_n$, the maximum consumption in the $n$-th component protocol, and is true for all serial protocols considered in this paper. In particular, condition (4) is satisfied whenever the $m_n$ are bounded, since the $c_i$ are bounded below by 1.

###### Lemma 5

Let $\mathrm{Var}\,X$ denote the variance of $X$, and let $m_n$ be the maximum consumption in one iteration of the $n$-th component protocol. If

$$m_n = o\Big(\sum_{i=0}^{n-1} c_i\Big), \tag{4}$$

then

$$\mathrm{Var}\,\frac{\sum_{i=0}^n C_i}{\sum_{i=0}^n c_i} = o(1) \qquad \mathrm{Var}\,\frac{C_n}{\sum_{i=0}^{n-1} c_i} = o(1), \tag{5}$$

$$\mathrm{Var}\,\frac{\sum_{i=0}^n P_i}{\sum_{i=0}^n p_i} = o(1) \qquad \mathrm{Var}\,\frac{P_n}{\sum_{i=0}^{n-1} p_i} = o(1). \tag{6}$$
###### Proof

Given $\varepsilon > 0$, choose $m$ such that $m_i \le \varepsilon \sum_{j=0}^{i-1} c_j$ for all $i \ge m$, then choose $n_0 \ge m$ such that $m_i \le \varepsilon \sum_{j=0}^{n} c_j$ for all $i < m$ and $n \ge n_0$. Then for $n \ge n_0$, using the independence of the $C_i$,

$$\begin{aligned}
\mathrm{Var}\,\frac{\sum_{i=0}^n C_i}{\sum_{i=0}^n c_i} &= \frac{\sum_{i=0}^n \mathrm{Var}\,C_i}{(\sum_{j=0}^n c_j)^2} \le \frac{\sum_{i=0}^n \mathrm{E}\,C_i^2}{(\sum_{j=0}^n c_j)^2} \\
&= \sum_{i=0}^{m-1} \mathrm{E}\,\frac{C_i}{\sum_{j=0}^n c_j} \cdot \frac{C_i}{\sum_{j=0}^n c_j} + \sum_{i=m}^{n} \mathrm{E}\,\frac{C_i}{\sum_{j=0}^n c_j} \cdot \frac{C_i}{\sum_{j=0}^n c_j} \\
&\le \sum_{i=0}^{m-1} \mathrm{E}\,\frac{m_i}{\sum_{j=0}^n c_j} \cdot \frac{C_i}{\sum_{j=0}^n c_j} + \sum_{i=m}^{n} \mathrm{E}\,\frac{m_i}{\sum_{j=0}^{i-1} c_j} \cdot \frac{C_i}{\sum_{j=0}^n c_j} \\
&\le \sum_{i=0}^{m-1} \mathrm{E}\,\frac{\varepsilon\,C_i}{\sum_{j=0}^n c_j} + \sum_{i=m}^{n} \mathrm{E}\,\frac{\varepsilon\,C_i}{\sum_{j=0}^n c_j} = \sum_{i=0}^n \frac{\varepsilon\,c_i}{\sum_{j=0}^n c_j} = \varepsilon, \\
\mathrm{Var}\,\frac{C_n}{\sum_{i=0}^{n-1} c_i} &\le \frac{\mathrm{E}\,C_n^2}{(\sum_{j=0}^{n-1} c_j)^2} \le \frac{m_n^2}{(\sum_{j=0}^{n-1} c_j)^2} \le \varepsilon^2 < \varepsilon.
\end{aligned}$$

As $\varepsilon$ was arbitrary, (5) holds. If in addition $\sum_{i=0}^n p_i = \Omega(\sum_{i=0}^n c_i)$, then since $P_i \le kC_i$ for a constant $k$ by Lemma 4, we also have

$$\mathrm{Var}\,\frac{\sum_{i=0}^n P_i}{\sum_{i=0}^n p_i} = O\Big(\mathrm{Var}\,\frac{\sum_{i=0}^n C_i}{\sum_{i=0}^n c_i}\Big) \qquad \mathrm{Var}\,\frac{P_n}{\sum_{i=0}^{n-1} p_i} = O\Big(\mathrm{Var}\,\frac{C_n}{\sum_{i=0}^{n-1} c_i}\Big),$$

thus (6). ∎

The following is our main theorem.

###### Theorem 3.3

Let $\delta$ be a serial protocol with finite-state components satisfying (4). If the limit

$$\ell = \lim_n \frac{\sum_{i=0}^n p_i}{\sum_{i=0}^n c_i} \tag{7}$$

exists, then the efficiency of the serial protocol exists and is equal to $\ell \cdot H(\nu)/H(\mu)$.

###### Proof

The expected time in each component protocol is finite, thus $e(n)$ is unbounded with probability 1. By definition of $e(n)$, we have

$$\sum_{i=0}^{e(n)-1} C_i \le n \le \sum_{i=0}^{e(n)} C_i \qquad \sum_{i=0}^{e(n)-1} P_i \le |\delta(s,\alpha_n)| \le \sum_{i=0}^{e(n)} P_i,$$

therefore

$$\frac{\sum_{i=0}^{e(n)-1} P_i}{\sum_{i=0}^{e(n)} C_i} \cdot \frac{H(\nu)}{H(\mu)} \le E_n \le \frac{\sum_{i=0}^{e(n)} P_i}{\sum_{i=0}^{e(n)-1} C_i} \cdot \frac{H(\nu)}{H(\mu)}. \tag{8}$$

The condition (7) implies that $\sum_{i=0}^n p_i = \Theta(\sum_{i=0}^n c_i)$. By Lemma 5, the variance conditions (5) and (6) hold. Then by Lemma 1(iv),

$$\frac{\sum_{i=0}^n C_i}{\sum_{i=0}^n c_i} \xrightarrow{P} 1 \qquad \frac{\sum_{i=0}^n P_i}{\sum_{i=0}^n p_i} \xrightarrow{P} 1 \qquad \frac{C_n}{\sum_{i=0}^{n-1} c_i} \xrightarrow{P} 0 \qquad \frac{P_n}{\sum_{i=0}^{n-1} p_i} \xrightarrow{P} 0.$$

Using Lemma 1(i)–(iii), we have

$$\frac{\sum_{i=0}^n P_i}{\sum_{i=0}^{n-1} C_i} = \Big(\frac{P_n}{\sum_{i=0}^{n-1} p_i} + \frac{\sum_{i=0}^{n-1} P_i}{\sum_{i=0}^{n-1} p_i}\Big) \cdot \frac{\sum_{i=0}^{n-1} p_i}{\sum_{i=0}^{n-1} c_i} \cdot \frac{\sum_{i=0}^{n-1} c_i}{\sum_{i=0}^{n-1} C_i} \xrightarrow{P} \ell$$

and similarly $\sum_{i=0}^{n-1} P_i / \sum_{i=0}^{n} C_i \xrightarrow{P} \ell$. The conclusion now follows from (8). ∎

## 4 Reductions

In this section we present a series of reductions between distributions of certain forms. Each example defines a sequence of positive recurrent restart protocols (§2.2) indexed by a latency parameter $k$, with efficiency tending to 1 as $k \to \infty$. By Theorem 3.3, these can be combined into a serial protocol (§3.2) with asymptotically optimal efficiency, albeit at the cost of unbounded latency.

### 4.1 Uniform → Uniform

Let $c, d \ge 2$. In this section we construct a family of restart protocols with latency $k$ mapping $d$-uniform streams to $c$-uniform streams with efficiency $1 - \Theta(k^{-1})$. The Shannon entropies of the input and output distributions are $\log d$ and $\log c$, respectively.

Let $m = \lfloor \log_c d^k \rfloor$. Then $c^m \le d^k < c^{m+1}$. It follows that

$$\frac{c^m}{d^k} = \Theta(1). \tag{9}$$

Let the $c$-ary expansion of $d^k$ be

$$d^k = \sum_{i=0}^m a_i c^i, \tag{10}$$

where $0 \le a_i < c$, $a_m \ne 0$.

The protocol is defined as follows. Make $k$ draws from the $d$-uniform distribution, yielding one of $d^k$ equally likely outcomes. For each $i$, assign $a_i c^i$ of the possible outcomes to the $c$-ary strings of length $i$, each such string being assigned exactly $a_i$ outcomes, and emit the assigned string. For the $a_0$ outcomes assigned strings of length 0, nothing is emitted, and this is lost entropy, but it occurs with probability $a_0 d^{-k} < c\,d^{-k}$. After that, restart the protocol.
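One round of this protocol amounts to a table from the $d^k$ equally likely outcomes to emitted $c$-ary strings, built from the base-$c$ digits of $d^k$. A sketch (function and variable names are ours):

```python
from itertools import product

def emission_table(d: int, c: int, k: int):
    """Map each of the d**k outcomes of k d-uniform draws to the c-ary
    string it emits: a_i * c**i outcomes get the length-i strings, each
    string repeated a_i times, where d**k = sum_i a_i * c**i."""
    n = d ** k
    digits = []                      # a_0, a_1, ..., a_m
    while n:
        digits.append(n % c)
        n //= c
    table = []
    for i, a_i in enumerate(digits):
        for w in product(range(c), repeat=i):
            table.extend(["".join(map(str, w))] * a_i)
    return table

# Example: d = 2, c = 3, k = 3, so 2**3 = 8 = 2*3 + 2.
table = emission_table(2, 3, 3)
```

In the example, two outcomes emit nothing (the lost entropy), and each of the three length-1 ternary strings is emitted for exactly $a_1 = 2$ outcomes, so the output is uniform.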

By elementary combinatorics,

$$\sum_{i=0}^{m-1} (m-i)\,a_i c^i \le \sum_{i=0}^{m-1} (m-i)(c-1)c^i = \frac{c(c^m-1)}{c-1} - m. \tag{11}$$

In each run of the protocol, the expected number of $c$-ary digits produced is

$$\begin{aligned}
\sum_{i=0}^m i\,a_i c^i d^{-k} &= d^{-k}\Big(\sum_{i=0}^m m\,a_i c^i - \sum_{i=0}^m (m-i)\,a_i c^i\Big) \\
&\ge m - d^{-k}\Big(\frac{c(c^m-1)}{c-1} - m\Big) && \text{by (10) and (11)} \\
&= m - \Theta(1) && \text{by (9),}
\end{aligned}$$

thus the entropy production is at least $(m - \Theta(1))\log c$. The number of $d$-ary digits consumed is $k$, thus the entropy consumption is $k \log d$. The efficiency is

$$\frac{m \log c - \Theta(1)}{k \log d} \ge 1 - \Theta(k^{-1}).$$

The output is uniformly distributed: for each $j$, there are equally many equal-probability outcomes that produce a string of length $j$ or greater, and each output letter appears as the $j$-th output letter in equally many strings of the same length, thus each letter is output with equal probability.

### 4.2 Uniform → Rational

Let $d \ge 2$. In this section, we present a family of restart protocols mapping $d$-uniform streams to streams over a finite alphabet with rational symbol probabilities having a common denominator. Unlike the protocols in the previous section, here we emit a fixed number of symbols in each round while consuming a variable number of input symbols according to a particular prefix code. The family exhibits a tradeoff between latency and efficiency similar to that of the previous one.

To define the protocol, we construct an exhaustive prefix code over the source alphabet, partitioned into pairwise disjoint sets, one associated with each output word of a fixed length. All input strings in the set associated with an output word $y$ map to $y$.

By analogy with the input measure $\mu$, let $\nu(y)$ denote the probability of the word $y$ in the output process. Since the symbols of $y$ are chosen independently,