The Slepian-Wolf (SW) coding system is one of the best-known multiterminal source coding systems. In this coding system (see Fig. 1), two encoders independently encode source sequences from two correlated sources into codewords, and the decoder reconstructs both source sequences from the codewords. For this coding system, Slepian and Wolf characterized the achievable rate region for discrete stationary memoryless sources (DMSs), where the achievable rate region is the set of rate pairs of the encoders for which the decoding error probability vanishes as the blocklength tends to infinity.
Discrete-time source symbols can be regarded as samples of a discrete-time process such as coin flips or of a continuous-time process such as a wave. In the above SW coding system, it is assumed that these processes are sampled at the encoders without delay. Thus, the encoders can encode a pair of source sequences with the expected correlation. In other words, the two encoders are synchronous. In practice, however, the encoders are not always synchronous. It is natural to assume that these processes are sampled at the encoders with some unknown delays. In other words, the encoders are asynchronous. There are two reasons (i) and (ii) to justify this assumption: (i) Since the encoders are independent in the system and cannot exchange any information, it is difficult to align the sampling start times of the encoders exactly. Thus, in general, uncontrollable unknown delays occur. (ii) Even if the encoders can align their sampling start times by exchanging information or referring to a shared clock, the encoders become asynchronous when the processes arrive late at the encoders. An example of this case is as follows: Observatories (encoders) on islands sample, per unit time, wave heights (source sequences) caused by a breeze, an earthquake, a typhoon, etc. Since the islands are separated, a wave reaches one island later than it reaches the other. The observatories send the sequences to a weather center (decoder) in a distant coastal city. In this example, the observatories (and also the weather center) do not know the actual delay of the wave in advance because there are many uncertainties such as the direction of the breeze, the location of the earthquake center, obstacles on the sea, etc. Thus, even if the observatories can align their sampling start times, they sample the wave with some unknown delay.
To make matters worse, unknown delays may cause uncertainty in the statistical properties of the sources. To justify this, we give an example in which discrete-time source symbols are a quantized and sampled version of a continuous signal. Let us consider the correlated wave signals shown in Fig. 2 (subfig: wave 1). The upper wave is merely a constant wave in which large changes occasionally occur. The lower wave is a noisy version of it, obtained by adding a Gaussian process. In this example, we assume that each wave is sampled at each encoder per unit time at the dotted lines and quantized with some resolution. We also assume that a large change does not extend over two unit times. These sampled and quantized signals can be regarded as discrete-time and discrete-valued source symbols. If the waves are sampled without any delay (see Fig. 2 (subfig: wave 1)), the encoders are synchronous. In that case, the correlation between two symbols is characterized by a single channel induced by the Gaussian process. On the other hand, if the waves are sampled with a tiny delay as shown in Fig. 2 (subfig: wave 2), the correlation between two symbols no longer corresponds to that of the case without delay. When the delay is unknown, this causes uncertainty in the statistical properties of the sources, and the properties are now characterized by a set of channels rather than a singleton, i.e., only one channel. We also note that when the delay corresponds to a unit time as shown in Fig. 2 (subfig: wave 3), the encoders encode source sequences with integer-valued delays as shown in Fig. 3.
As a consequence of the discussion so far, the asynchronous SW coding system can be represented as the SW coding system with integer-valued delays (Fig. 3), where the delays and statistical properties of sources are unknown. Thus, in what follows, a delay refers to an integer-valued delay unless otherwise stated. We note that when a continuous signal is not assumed behind source symbols, the case where delays are unknown but statistical properties are known is also worth considering. We also note that, in general, uncertainty of statistical properties does not only come from unknown (real-valued) delays.
There are some studies related to the asynchronous SW coding system in the case where the statistical properties of the sources are known in advance. Willems considered the situation in which delays are unknown to the encoders but known to the decoder. He showed that the achievable rate region for DMSs coincides with that of the synchronous SW coding system. Under the same assumption as Willems, Rimoldi and Urbanke and Sun et al. gave coding schemes based on source splitting. In these studies [4, 5, 6], it is implicitly assumed that, for a given finite blocklength, the encoders continue to transmit codewords indefinitely and the decoder has infinitely large memory to receive those infinitely many codewords. Most importantly, this assumption eliminates the effect of delays because the decoder can wait infinitely long until it receives correlated codewords even if the blocklength is finite. In practice, however, the decoder cannot have infinitely large memory and cannot wait infinitely long before decoding. Moreover, when delays are very large, the decoding delay is also very large regardless of the blocklength. This justifies considering a coding system that encodes only one pair of source sequences for a given blocklength, as in the system shown in Fig. 3. In this coding system, the decoder outputs an estimate from a pair of codewords and does not wait for the next pair. Obviously, this system includes the case where a codeword consists of sub-codewords. Thus, the decoder waits until it receives a pair of codewords or finitely many pairs of sub-codewords, and does not wait infinitely long for a given finite blocklength. Here, for fairness among the encoders and simplicity of the system, the blocklengths of the source sequences are assumed to be the same. Oki and Oohama considered this coding system, i.e., the system shown in Fig. 3, where the statistical properties of the sources are known in advance. They assumed that delays are unknown to the encoders and also to the decoder, but that the maximum and minimum values of possible delays are bounded and known to the decoder. They showed that the achievable rate region for DMSs coincides with that of the synchronous SW coding system.
In this paper, we also consider the asynchronous SW coding system. Specifically, we consider the SW coding system with delays under the following two assumptions (i) and (ii): (i) Delays are unknown, but the maximum and minimum values of possible delays are known to the encoders and the decoder. In the above example of islands, the maximum delay depends on the distance between the islands and is naturally known to the observatories and the weather center because the distance is known in advance. (ii) The sources are DMSs and the probability mass function (PMF) of the sources is unknown, but a set of PMFs including it is known to the encoders and the decoder. Unlike the assumption of Oki and Oohama, we allow delays to be unbounded and allow the maximum and minimum values of possible delays to change with the blocklength. This allows more detailed analyses, such as the case where delays affect half of the source sequences, the case where a delay always occurs, etc. This can also be seen as the following situation: Each encoder has a FIFO (or LIFO) memory (i.e., a source sequence is always new or old). Since the encoding and decoding delay of a preceding sequence has an order depending on the blocklength, which sequence is stored in the memory depends on the blocklength. Consequently, possible delays also depend on the blocklength.
For this asynchronous SW coding system, we clarify the achievable rate region and show that the region does not always coincide with that of the synchronous SW coding system. This result is completely different from the results of the above related studies. We use a standard information-theoretic technique for the proof of the converse part. On the other hand, the direct part is the challenging part, and its proof is somewhat different from the usual one. Instead of directly dealing with coding for sources with delays, we deal with coding for a mixed source given by the mixture of all sources with possible delays, and apply Gallager's random coding techniques to the mixed source. We note that the resulting encoders and decoder are universal in the sense that they can encode and decode (asymptotically) correctly even if statistical parameters such as the delays and the PMF of the DMSs are unknown. In earlier work, we used an analogous technique based on a mixed source to prove the existence of a code. However, since we could not apply Gallager's techniques to the mixed source there, we did not give an exponential bound. We also give an extension of our coding scheme: If possible delays are bounded, our coding scheme can be extended to a scheme which does not require knowledge of the actual bound of delays. We note that this is also an extension of the result by Oki and Oohama.
The rest of this paper is organized as follows. In Section II, we give some notations and the formal definition of the asynchronous SW coding system and the achievable rate region. In Section III, we show the achievable rate region and some properties of it. In Sections IV and V, we show the converse part and the direct part to clarify the achievable rate region, respectively. In Section VI, we give the extension of our coding scheme. In Section VII, we conclude the paper.
In this section, we provide some notations and the precise definition of the asynchronous SW coding system.
We will denote an n-length sequence of symbols by , a sequence of symbols by , and a pair of sequences of symbols , , , by . If , we assume . For any finite sets and , we will denote the set of all PMFs over by , and the set of all conditional PMFs over given elements of by . Unless otherwise stated, the PMF of a random variable (RV) on will be denoted by , and the conditional PMF of on given will be denoted by . We will denote the nth power of a PMF by , i.e., , and the nth power of a conditional PMF by , i.e., . In what follows, all logarithms and exponentials are taken to the base 2.
We assume that and are finite sets. We will denote a general source (i.e., a sequence of -length RVs) by the corresponding boldface letter (cf. ). Since a pair of DMSs is represented by a sequence of independent copies of a pair of RVs , we simply write it as .
In the asynchronous SW coding system, two -length sequences from DMSs are independently encoded by encoder 1 and encoder 2, respectively. Hence, for positive integers and , encoder 1 and encoder 2 are defined by the mappings
and the rates of these encoders are defined as
respectively. Since the encoders are asynchronous, encoder 1 might encode a source sequence while encoder 2 might encode a source sequence . In general, encoder 1 and encoder 2 encode sequences and , respectively, where is an integer which represents a relative delay (see Tables I and II). (Footnote 2: Suppose that encoder 1 and encoder 2 run with delays and , respectively. Then, for a source sequence encoded by the encoders, it holds that . Hence, we may only consider a relative delay.)
Without loss of generality, we assume that because, for any or , is independent of . We denote by for the sake of brevity. Note that
where is the PMF of the pair of DMSs , and are the marginal PMFs of , and
We denote DMSs with a delay simply by which implies the sequence of RVs . We introduce the maximum and the minimum of possible delays, and denote the sequence by . Hence, any possible delay satisfies for any blocklength . We allow the maximum and the minimum of delays to be changed with the blocklength.
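The delayed-source model can be made concrete with a small simulation. The following is a minimal sketch, not the paper's construction: it assumes the indexing convention that encoder 1 observes X_1, ..., X_n while encoder 2 observes Y_{1+d}, ..., Y_{n+d} for a relative delay d, and it uses a hypothetical binary source in which Y is X flipped with probability `flip_prob`. The function names are illustrative only.

```python
import random


def sample_pair(rng, flip_prob):
    """Draw one correlated pair (X, Y) from a hypothetical binary DMS."""
    x = rng.randint(0, 1)
    y = x ^ int(rng.random() < flip_prob)  # Y equals X, flipped with prob. flip_prob
    return x, y


def delayed_blocks(n, d, seed=0, flip_prob=0.1):
    """Return (x_block, y_block) where y_block is shifted by the relative delay d.

    Assumed convention: x_block = (X_1..X_n), y_block = (Y_{1+d}..Y_{n+d}).
    """
    rng = random.Random(seed)
    first, last = min(1, 1 + d), max(n, n + d)
    # One i.i.d. pair per time index covering both observation windows.
    stream = {t: sample_pair(rng, flip_prob) for t in range(first, last + 1)}
    x_block = [stream[t][0] for t in range(1, n + 1)]
    y_block = [stream[t + d][1] for t in range(1, n + 1)]
    return x_block, y_block


def overlap_count(n, d):
    """Number of time indices observed by both encoders."""
    return max(0, n - abs(d))
```

Under this convention, only `overlap_count(n, d)` positions of the two blocks come from the same time indices and are therefore correlated; the remaining symbols of each block are independent of the other block, which is why delays with magnitude beyond the blocklength carry no further information.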
The decoder receives two codewords and , and outputs an estimate of the pair of sequences . Hence, the decoder is defined by the mapping
Then, for DMSs and a delay , the error probability is defined as
More generally, we will denote the error probability for a general source by , i.e.,
We will sometimes omit in the notation of when it is clear from the context.
In this coding system, we assume that the actual delay is unknown but the bound of delays is known to the encoders and the decoder. Furthermore, we assume that the PMF of the pair of DMSs is an element of a given set of PMFs and that the PMF is unknown but the set of PMFs is known to the encoders and the decoder. More precisely, the code is independent of and , but is allowed to depend on and . Hence, as we mentioned earlier, the code is universal in the sense that it is required to encode and decode (asymptotically) correctly even if the statistical parameters and are unknown.
We now define achievability and achievable rate region for the asynchronous SW coding system for a given set of PMFs of sources and a bound of delays.
Definition 1 (Achievability).
A pair is called achievable for a set of PMFs and a bound of delays if and only if there exists a sequence of codes satisfying
where is the set of possible delays.
Definition 2 (Achievable rate region).
For a set of PMFs and a bound of delays, the achievable rate region is defined by
When is a singleton, we simply write as . Moreover, when there is no delay, i.e., , we simply write as .
When is a singleton and , our coding system corresponds to the usual synchronous SW coding system. Hence, denotes the achievable rate region of the synchronous SW coding system.
Oki and Oohama considered the case where is a singleton and delays are bounded in the sense that for a constant , . They assumed that the delay is unknown to the encoders and the decoder but is known to the decoder. For this special case, they showed that the achievable rate region coincides with that of the synchronous SW coding system, i.e.,
where denotes the entropy of , and denotes the conditional entropy of given . We will also denote by , and by to make their PMFs explicit.
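For reference, the synchronous SW region is the classical set of rate pairs with R1 >= H(X|Y), R2 >= H(Y|X), and R1 + R2 >= H(X,Y). The following sketch computes these three quantities from a joint PMF and tests membership of a rate pair. The dictionary representation of the PMF, the function names, and the example source (a binary symmetric pair with crossover probability 0.11) are assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def sw_quantities(joint):
    """Return H(X|Y), H(Y|X), H(X,Y) for joint = {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    h_xy = entropy(joint.values())
    h_x, h_y = entropy(px.values()), entropy(py.values())
    return {"H(X|Y)": h_xy - h_y, "H(Y|X)": h_xy - h_x, "H(X,Y)": h_xy}


def in_sw_region(r1, r2, joint, tol=1e-12):
    """Check the three Slepian-Wolf inequalities for the rate pair (r1, r2)."""
    q = sw_quantities(joint)
    return (r1 >= q["H(X|Y)"] - tol
            and r2 >= q["H(Y|X)"] - tol
            and r1 + r2 >= q["H(X,Y)"] - tol)


# Hypothetical example source: Y is X passed through a BSC(0.11), X uniform.
example_joint = {(0, 0): 0.445, (0, 1): 0.055, (1, 0): 0.055, (1, 1): 0.445}
```

For this example source, H(X|Y) and H(Y|X) are both close to 0.5 bit, so a pair such as (1.0, 0.6) lies in the region while (0.7, 0.7) violates the sum-rate constraint.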
In what follows, we sometimes use the following notations for the sake of simplicity: , , , and . Due to these notations, (4) is also written as
III Achievable Rate Region
In this section, we show the achievable rate region of the asynchronous SW coding system. We also show that the obtained region does not always coincide with that of the synchronous SW coding system.
The next theorem clarifies the achievable rate region (see also Fig. 4).
Theorem 1.
For a set of PMFs of DMSs and a bound of delays, we have
By the definition of , it holds that .
The proof of this theorem is given later in Sections IV and V. In this theorem, denotes the ratio of the maximum delay to the blocklength. For example, for an arbitrarily fixed constant , suppose that and . Then , and hence we have for any subset ,
Since , the rate region does not coincide with , i.e., that of the synchronous SW coding system, where we assume that and . This means that if delays have a significant influence relative to the blocklength, as in this example, we cannot achieve every rate pair that is achievable in the synchronous SW coding system. As shown in a later section, this is because delays affect the first-order term of the rates. Although this fact is very important, it is not clear from the previous studies.
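One way to see why a delay proportional to the blocklength can cost rate is to compute the joint entropy of the two observed blocks. The sketch below is not the statement of Theorem 1. It assumes the block model in which, for a relative delay d with |d| <= n, exactly n - |d| time indices are seen by both encoders (each contributing H(X,Y)) and the remaining |d| symbols of each block are independent of the other block (contributing H(X) and H(Y)). Under this assumption the per-symbol joint entropy equals H(X,Y) + (|d|/n) I(X;Y). All function names and the example PMF are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def basic_entropies(joint):
    """Return (H(X), H(Y), H(X,Y)) for joint = {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return entropy(px.values()), entropy(py.values()), entropy(joint.values())


def mutual_information(joint):
    h_x, h_y, h_xy = basic_entropies(joint)
    return h_x + h_y - h_xy


def delayed_joint_entropy_rate(joint, n, d):
    """Per-symbol joint entropy of the two blocks under the assumed model."""
    if abs(d) > n:
        raise ValueError("this sketch assumes |d| <= n")
    h_x, h_y, h_xy = basic_entropies(joint)
    overlap = n - abs(d)  # time indices observed by both encoders
    return (overlap * h_xy + abs(d) * (h_x + h_y)) / n


# Hypothetical example source: Y is X passed through a BSC(0.11), X uniform.
example_joint = {(0, 0): 0.445, (0, 1): 0.055, (1, 0): 0.055, (1, 1): 0.445}
```

With `n = 100` and `d = 50`, the value exceeds H(X,Y) by half of I(X;Y), whereas any delay that stays fixed as n grows contributes a vanishing excess.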
More generally, we have the following corollaries.
Corollary 1.
Let . Then, if and only if for some .
If it holds that for some , we have
For such , we choose a boundary point such that . Then, according to Theorem 1, we have
This means that , and hence . This gives the if part.
On the other hand, if it holds that for all , it immediately holds that . This completes the proof. ∎
Corollary 2.
Let be a singleton. Then, if and only if and .
According to Corollary 1, if and only if
This holds if and only if and . ∎
IV Converse Part
In this section, we give a proof of the converse part of Theorem 1.
Before the proof of the converse part, we show the next fundamental lemma.
Lemma 1.
For any and , we have
According to (1), for any and delay , we have
Similarly, we have
In order to emphasize the effect of delays, we give a converse bound for a given finite blocklength in the next theorem.
Theorem 2.
For any and any code , we have
where , , and
According to this theorem, the delay affects the first-order term of the rates through the term . Hence, the larger becomes, the greater the difference between the rates of synchronous SW coding systems and those of asynchronous SW coding systems. As an instance, we consider the case where the delay occurs at a ratio of for a given blocklength , i.e., and . Since the error probability must be small, this theorem implies that any pair of rates must satisfy
Now, we prove the converse part.
Suppose that . By definition of the achievability, there exists a sequence of codes such that
According to Theorem 2, for this sequence of codes and any , we have
where we use the fact that
Since this inequality holds for any , we have
Since this holds for any , we have
This completes the proof of the converse part of Theorem 1. ∎
V Direct Part and a Universal Coding Scheme
In this section, we give a proof of the direct part of Theorem 1. To this end, we will show a universal coding scheme for asynchronous SW coding systems.
The universal coding scheme using the minimum entropy decoder is well known. However, since it is too specialized to DMSs without delay, we cannot apply it to our coding system directly. A natural extension of the minimum entropy decoder may be as follows: