 # Exponential Strong Converse for Successive Refinement with Causal Decoder Side Information

We revisit the successive refinement problem with causal decoder side information considered by Maor and Merhav (2008) and strengthen their result by deriving an exponential strong converse theorem. To be specific, we show that for any rate-distortion tuple outside the rate-distortion region of the successive refinement problem with causal decoder side information, the excess-distortion probability approaches one exponentially fast. Our proof follows by judiciously adapting the recently proposed strong converse technique by Oohama using the information spectrum method, the variational form of the rate-distortion region and Hölder's inequality. The lossy source coding problem with causal decoder side information considered by El Gamal and Weissman is a special case of the current problem. Therefore, the exponential strong converse theorem for the El Gamal and Weissman problem follows as a corollary of our result.


## I Introduction

We revisit the successive refinement problem with causal decoder side information shown in Figure 1, which we refer to as the causal successive refinement problem. There are two encoders and two decoders. The decoders aim to recover the source sequence based on the encoded symbols and causally available private side information sequences. Specifically, given the source sequence $X^n$, encoders $f_1$ and $f_2$ compress $X^n$ into codewords $S_1$ and $S_2$ respectively. At time $i\in[n]$, decoder $\phi_1$ aims to recover the $i$-th source symbol using the codeword $S_1$ from encoder $f_1$ and side information $Z^i$ up to time $i$, i.e., $\hat{X}_{1,i}=\phi_{1,i}(S_1,Z^i)$. Similarly, at time $i$, decoder $\phi_2$ aims to recover the $i$-th source symbol as $\hat{X}_{2,i}=\phi_{2,i}(S_1,S_2,Y^i)$. Finally, at time $n$, for $k\in[2]$, decoder $\phi_k$ outputs the source estimate $\hat{X}_k^n$ whose distortion, under a distortion measure $d_k$, is required to be less than or equal to a specified distortion level $D_k$. Throughout the paper, it is required that since decoder $\phi_2$ has both codewords at each time while decoder $\phi_1$ has access to $S_1$ only.

This problem was first considered by Maor and Merhav in [1], who fully characterized the rate-distortion region for the problem. Maor and Merhav showed that, unlike the case with non-causal side information, no special structure, e.g., degradedness, is required between the side information $Y$ and $Z$. However, Maor and Merhav only presented a weak converse in [1]. In this paper, we strengthen the result in [1] by providing an exponential strong converse theorem, which states that the joint excess-distortion probability approaches one exponentially fast if the rate-distortion tuple falls outside the rate-distortion region derived by Maor and Merhav.

### I-A Related Works

We first briefly summarize existing works on the successive refinement problem. The successive refinement problem was first considered by Equitz and Cover and by Koshelev, who considered necessary and sufficient conditions for a source-distortion triple to be successively refinable. Rimoldi fully characterized the rate-distortion region of the successive refinement problem under the joint excess-distortion probability criterion while Kanlis and Narayan derived the excess-distortion exponent in the same setting. The second-order asymptotic analysis of No and Weissman, which provides approximations to finite blocklength performance and implies strong converse theorems, was derived under the marginal excess-distortion probabilities criteria. This analysis was extended to the joint excess-distortion probability criterion by Zhou, Tan and Motani. Other frameworks for successive refinement decoding include [9, 10, 11, 12].

The study of source coding with causal decoder side information was initiated by Weissman and El Gamal in  where they derived the rate-distortion function for the lossy source coding problem with causal side information at the decoders (see also [14, Chapter 11.2]). Subsequently, Timo and Vellambi  characterized the rate-distortion regions of the Gu-Effros two-hop network  and the Gray-Wyner problem  with causal decoder side information; Maor and Merhav  derived the rate-distortion region for the successive refinement of the Heegard-Berger problem  with causal side information available at the decoders; Chia and Weissman  considered the cascade and triangular source coding problem with causal decoder side information. However, to the best of our knowledge, no strong converse theorems exist for these problems.

As the information spectrum method will be used in this paper to derive an exponential strong converse theorem for the causal successive refinement problem, we briefly summarize the previous applications of this method to network information theory problems. In [21, 22, 23], Oohama used this method to derive exponential strong converses for the lossless source coding problem with one-helper [24, 25] (i.e., the Wyner-Ahlswede-Körner (WAK) problem), the asymmetric broadcast channel problem , and the Wyner-Ziv problem  respectively. Furthermore, Oohama’s information spectrum method was also used to derive exponential strong converse theorems for content identification with lossy recovery  by Zhou, Tan, Yu and Motani  and for Wyner’s common information problem under the total variation distance measure  by Yu and Tan .

### I-B Main Contribution and Challenges

We revisit the causal successive refinement problem and present an exponential strong converse theorem. For given rates and blocklength, define the joint excess-distortion probability as the probability that either decoder incurs a distortion level greater than the specified distortion level (see (4)) and define the non-excess-distortion probability as the probability that both decoders satisfy the specified distortion levels (see (24)). Our proof proceeds as follows. First, we derive a non-asymptotic converse (finite blocklength upper) bound on the non-excess-distortion probability of any code for the causal successive refinement problem using the information spectrum method. Subsequently, by using Cramér’s inequality and the variational formulation of the rate-distortion region, we show that the non-excess-distortion probability decays exponentially fast to zero as the blocklength tends to infinity if the rate-distortion tuple falls outside the rate-distortion region of the causal successive refinement problem.

As far as we are aware, this paper is the first to establish a strong converse theorem for any lossy source coding problem with causal decoder side information. Furthermore, our methods can be used to derive exponential strong converse theorems for other lossy source coding problems with causal decoder side information discussed in Section I-A. In particular, since the lossy source coding problem with causal decoder side information  is a special case of the causal successive refinement problem, the exponential strong converse theorem for the problem in  follows as a corollary of our result.

In order to establish the strong converse in this paper, we must overcome several major technical challenges. The main difficulty lies in the fact that for the causal successive refinement problem, the side information is available to the decoder causally instead of non-causally. This causal nature of the side information makes the design of the decoder much more complicated and involved, which complicates the analysis of the excess-distortion probability. We find that classical strong converse techniques like the image size characterization  and the perturbation approach  cannot lead to a strong converse theorem due to the above-mentioned difficulty. However, it is possible that other approaches different from ours can be used to obtain a strong converse theorem for the current problem. For example, it is interesting to explore whether two recently proposed strong converse techniques in [34, 35] can be used for this purpose considering the fact that the methods in [34, 35] have been successfully applied to problems including the Wyner-Ziv problem  and the Wyner-Ahlswede-Körner (WAK) problem [24, 25].

## II Problem Formulation and Existing Results

### Notation

Random variables and their realizations are in upper (e.g., $X$) and lower case (e.g., $x$) respectively. Sets are denoted in calligraphic font (e.g., $\mathcal{X}$). We use $\mathcal{X}^{\mathrm{c}}$ to denote the complement of $\mathcal{X}$ and use $X^n:=(X_1,\ldots,X_n)$ to denote a random vector of length $n$. We use $\mathbb{R}_+$ and $\mathbb{N}$ to denote the set of positive real numbers and integers respectively. Given a real number $a\in[0,1]$, we often use the shorthand $\bar{a}:=1-a$. Given two integers $a$ and $b$, we use $[a:b]$ to denote the set of all integers between $a$ and $b$ and use $[a]$ to denote $[1:a]$. The set of all probability distributions on $\mathcal{X}$ is denoted as $\mathcal{P}(\mathcal{X})$ and the set of all conditional probability distributions from $\mathcal{X}$ to $\mathcal{Y}$ is denoted as $\mathcal{P}(\mathcal{Y}|\mathcal{X})$. For information-theoretic quantities such as entropy and mutual information, we follow standard notation. In particular, when the joint distribution of $(X,Y)$ is $P_{XY}=P_XP_{Y|X}$, we use $I(P_X,P_{Y|X})$ and $I(X;Y)$ interchangeably.

### II-A Problem Formulation

Let $P_{XYZ}$ be a joint probability mass function (pmf) on the finite alphabet $\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}$ with its marginals denoted in the customary way, e.g., $P_X$, $P_{XY}$. Throughout the paper, we consider memoryless sources $(X^n,Y^n,Z^n)$, which are generated i.i.d. according to $P_{XYZ}$. Let $\hat{\mathcal{X}}_k$ be the alphabet of the reproduced source symbol at decoder $\phi_k$, where $k\in[2]$. Recall the encoder-decoder system model for the causal successive refinement problem illustrated in Figure 1.

A formal definition of a code for the causal successive refinement problem is as follows.

###### Definition 1.

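An $(n,M_1,M_2)$-code for the causal successive refinement problem consists of

• two encoding functions

$$f_k:\mathcal{X}^n\to\mathcal{M}_k:=\{1,\ldots,M_k\},\quad k\in[2], \tag{1}$$

• and decoding functions: for any $i\in[n]$,

\begin{align}
\phi_{1,i}&:\mathcal{M}_1\times\mathcal{Z}^i\to\hat{\mathcal{X}}_1, \tag{2}\\
\phi_{2,i}&:\mathcal{M}_1\times\mathcal{M}_2\times\mathcal{Y}^i\to\hat{\mathcal{X}}_2. \tag{3}
\end{align}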

For $k\in[2]$, let $d_k$ be two distortion measures. Given the source sequence $x^n$ and a reproduced version $\hat{x}_k^n$, we measure the distortion between them using the additive distortion measure $d_k(x^n,\hat{x}_k^n):=\frac{1}{n}\sum_{i\in[n]}d_k(x_i,\hat{x}_{k,i})$. To evaluate the performance of any code for the causal successive refinement problem, given specified distortion levels $(D_1,D_2)$, we consider the following joint excess-distortion probability

$$P_{\mathrm{e}}^{(n)}(D_1,D_2):=\Pr\big\{d_1(X^n,\hat{X}_1^n)>D_1\ \text{or}\ d_2(X^n,\hat{X}_2^n)>D_2\big\}. \tag{4}$$
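To make the joint criterion in (4) concrete, the following is a minimal Monte Carlo sketch. Everything in it is an illustrative assumption rather than part of the paper: a binary source, side information obtained through binary symmetric channels with crossover 0.2 (for $Z$) and 0.1 (for $Y$), Hamming distortion normalized by the blocklength, and a trivial zero-rate code whose causal decoders simply output the current side information symbol.

```python
import random


def hamming(a, b):
    """Per-symbol Hamming distortion."""
    return 0.0 if a == b else 1.0


def additive_distortion(x, x_hat, d):
    """Normalized additive distortion (1/n) * sum_i d(x_i, x_hat_i)."""
    return sum(d(a, b) for a, b in zip(x, x_hat)) / len(x)


def flip(bit, p, rng):
    """Pass one bit through a binary symmetric channel with crossover p."""
    return bit ^ 1 if rng.random() < p else bit


def joint_excess_probability(n, D1, D2, trials=2000, seed=0):
    """Estimate Pr{d1 > D1 or d2 > D2} for a hypothetical toy code."""
    rng = random.Random(seed)
    excess = 0
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        z = [flip(b, 0.2, rng) for b in x]  # assumed side information at decoder 1
        y = [flip(b, 0.1, rng) for b in x]  # assumed side information at decoder 2
        # Hypothetical causal decoders: ignore the codewords, output the
        # side information symbol observed at the current time.
        x_hat1, x_hat2 = z, y
        d1 = additive_distortion(x, x_hat1, hamming)
        d2 = additive_distortion(x, x_hat2, hamming)
        if d1 > D1 or d2 > D2:  # joint event: either decoder fails
            excess += 1
    return excess / trials
```

Calling `joint_excess_probability(200, 0.05, 0.02)` asks for distortion levels well below what this toy code can deliver, so the estimate sits near one, whereas generous levels such as `(0.35, 0.25)` drive it toward zero. The point of the sketch is only that a single failing decoder is enough to count as an excess-distortion event.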

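Given $\varepsilon\in(0,1)$, the $\varepsilon$-rate-distortion region for the causal successive refinement problem is defined as follows.

###### Definition 2.

Given any $\varepsilon\in(0,1)$, a rate-distortion tuple $(R_1,R_2,D_1,D_2)$ is said to be $\varepsilon$-achievable if there exists a sequence of $(n,M_1,M_2)$-codes such that

\begin{align}
\limsup_{n\to\infty}\frac{1}{n}\log M_1 &\le R_1, \tag{5}\\
\limsup_{n\to\infty}\frac{1}{n}\log M_2 &\le R_2-R_1, \tag{6}\\
\limsup_{n\to\infty}P_{\mathrm{e}}^{(n)}(D_1,D_2) &\le \varepsilon. \tag{7}
\end{align}

The closure of the set of all $\varepsilon$-achievable rate-distortion tuples is called the $\varepsilon$-rate-distortion region for the causal successive refinement problem and is denoted as $\mathcal{R}(\varepsilon)$.

Note that in Definition 2, $R_2$ is the sum rate of the two decoders. Using Definition 2, the rate-distortion region for the problem is defined as

$$\mathcal{R}:=\bigcap_{\varepsilon\in(0,1)}\mathcal{R}(\varepsilon). \tag{8}$$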

### II-B Existing Results

In this section, we recall the characterization of the rate-distortion region by Maor and Merhav [1, Theorem 1]. For $k\in[2]$, let $W_k$ be a random variable taking values in a finite alphabet $\mathcal{W}_k$. For simplicity, throughout the paper, we let

$$T:=(X,Y,Z,W_1,W_2,\hat{X}_1,\hat{X}_2), \tag{9}$$

and let $t$ and $\mathcal{T}$ be a particular realization of $T$ and its alphabet set, respectively.

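Define the following set of joint distributions:

\begin{align}
\mathcal{P}^*:=\Big\{Q_T\in\mathcal{P}(\mathcal{T}):\ &|\mathcal{W}_1|\le|\mathcal{X}|+3,\ |\mathcal{W}_2|\le|\mathcal{X}|(|\mathcal{X}|+3)+1,\ Q_{XYZ}=P_{XYZ},\nonumber\\
&(W_1,W_2)-X-(Y,Z),\nonumber\\
&\hat{X}_1=\phi_1(W_1,Z)\ \text{for some}\ \phi_1:\mathcal{W}_1\times\mathcal{Z}\to\hat{\mathcal{X}}_1,\nonumber\\
&\hat{X}_2=\phi_2(W_1,W_2,Y)\ \text{for some}\ \phi_2:\mathcal{W}_1\times\mathcal{W}_2\times\mathcal{Y}\to\hat{\mathcal{X}}_2\Big\}. \tag{10}
\end{align}

Given any joint distribution $Q_T$, define the following set of rate-distortion tuples

\begin{align}
\mathcal{R}(Q_T):=\Big\{(R_1,R_2,D_1,D_2):\ &R_1\ge I(Q_X,Q_{W_1|X}),\ R_2-R_1\ge I(Q_{X|W_1},Q_{W_2|XW_1}|Q_{W_1}),\nonumber\\
&D_1\ge\mathbb{E}[d_1(X,\phi_1(W_1,Z))],\ D_2\ge\mathbb{E}[d_2(X,\phi_2(W_1,W_2,Y))]\Big\}. \tag{11}
\end{align}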

Maor and Merhav [1] defined the following information-theoretic set of rate-distortion tuples

$$\mathcal{R}^*:=\bigcup_{Q_T\in\mathcal{P}^*}\mathcal{R}(Q_T). \tag{12}$$
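As an illustration of how the constraints in (11) are evaluated for one test distribution, here is a small sketch. The choice of distribution is purely an assumption for the example: $X$ is uniform binary, $W_1$ is $X$ passed through a binary symmetric channel with crossover `q1`, $W_2=X$, and both reproduction functions ignore the side information ($\phi_1(w_1,z)=w_1$, $\phi_2(w_1,w_2,y)=w_2$), so only the joint pmf of $(X,W_1,W_2)$ is needed. Distortion is Hamming and logarithms are natural.

```python
from collections import defaultdict
from math import log


def marginal(pmf, idx):
    """Marginalize a pmf keyed by tuples onto the coordinates in idx."""
    out = defaultdict(float)
    for key, prob in pmf.items():
        out[tuple(key[i] for i in idx)] += prob
    return out


def cond_mutual_information(pmf, a, b, c=()):
    """I(A;B|C) in nats; a, b, c are tuples of coordinate indices."""
    p_abc = marginal(pmf, a + b + c)
    p_ac, p_bc, p_c = marginal(pmf, a + c), marginal(pmf, b + c), marginal(pmf, c)
    total = 0.0
    for key, prob in p_abc.items():
        if prob <= 0.0:
            continue
        ka = key[:len(a)]
        kb = key[len(a):len(a) + len(b)]
        kc = key[len(a) + len(b):]
        total += prob * log(prob * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


def corner_point(q1):
    """Smallest (R1, R2 - R1, D1, D2) allowed by (11) for the toy Q_T."""
    pmf = {}
    for x in (0, 1):
        for w1 in (0, 1):
            # Keys are (x, w1, w2) with w2 = x (assumed perfect refinement).
            pmf[(x, w1, x)] = 0.5 * (q1 if w1 != x else 1.0 - q1)
    r1 = cond_mutual_information(pmf, (0,), (1,))               # I(X; W1)
    r2_minus_r1 = cond_mutual_information(pmf, (0,), (2,), (1,))  # I(X; W2 | W1)
    d1 = sum(p for (x, w1, w2), p in pmf.items() if w1 != x)    # E[d1(X, W1)]
    d2 = sum(p for (x, w1, w2), p in pmf.items() if w2 != x)    # E[d2(X, W2)]
    return r1, r2_minus_r1, d1, d2
```

For this particular choice the two rate terms add up to $\log 2$ nats, since the second layer resolves exactly the uncertainty about $X$ left after observing $W_1$. A distribution that exploits the side information through $\phi_1$ and $\phi_2$ would need the full joint pmf of $T$ and is not covered by this simplified sketch.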
###### Theorem 1.

The rate-distortion region for the causal successive refinement problem satisfies

$$\mathcal{R}=\mathcal{R}^*. \tag{13}$$

We remark that in [1], Maor and Merhav considered the average distortion criterion

$$\limsup_{n\to\infty}\mathbb{E}\big[d_k(X^n,\hat{X}_k^n)\big]\le D_k,\quad k\in[2], \tag{14}$$

instead of the joint excess-distortion probability criterion (see (7)) in Definition 2. However, with a slight modification to the proof of [1], it can be verified (see Appendix -A) that the rate-distortion region under the joint excess-distortion probability criterion is identical to the rate-distortion region derived by Maor and Merhav under the average distortion criterion.

Theorem 1 implies that if a rate-distortion tuple falls outside the rate-distortion region, i.e., $(R_1,R_2,D_1,D_2)\notin\mathcal{R}$, then the excess-distortion probability is bounded away from zero. We strengthen the converse proof of Theorem 1 by showing that if $(R_1,R_2,D_1,D_2)\notin\mathcal{R}$, the excess-distortion probability approaches one exponentially fast as the blocklength tends to infinity.

## III Main Results

### III-A Preliminaries

In this subsection, we present necessary definitions and a key lemma before stating our main result.

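Define the following set of distributions

$$\mathcal{Q}:=\Big\{Q_T\in\mathcal{P}(\mathcal{T}):\ |\mathcal{W}_1|\le|\mathcal{X}||\mathcal{Y}||\mathcal{Z}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2|,\ |\mathcal{W}_2|\le\big(|\mathcal{X}||\mathcal{Y}||\mathcal{Z}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2|\big)^2\Big\}. \tag{15}$$

Recall that, for any number $a\in[0,1]$, we use $\bar{a}$ and $1-a$ interchangeably. Given any $(\mu,\alpha,\beta,\gamma)\in\mathbb{R}_+\times[0,1]^3$, for any $Q_T\in\mathcal{Q}$, define the following linear combination of log likelihoods

\begin{align}
\omega^{(\mu,\alpha,\beta,\gamma)}_{Q_T}(t):=\ &\log\frac{Q_X(x)}{P_X(x)}+\log\frac{Q_{YZ|XW_1W_2}(y,z|x,w_1,w_2)}{P_{YZ|X}(y,z|x)}+\log\frac{Q_{XYW_2|ZW_1\hat{X}_1}(x,y,w_2|z,w_1,\hat{x}_1)}{Q_{XYW_2|ZW_1}(x,y,w_2|z,w_1)}\nonumber\\
&+\log\frac{Q_{\hat{X}_2|XYZW_1W_2\hat{X}_1}(\hat{x}_2|x,y,z,w_1,w_2,\hat{x}_1)}{Q_{\hat{X}_2|YW_1W_2}(\hat{x}_2|y,w_1,w_2)}+\mu\alpha\bar{\beta}\log\frac{Q_{X|W_1}(x|w_1)}{P_X(x)}\nonumber\\
&+\mu\alpha\beta\log\frac{Q_{X|W_1W_2}(x|w_1,w_2)}{Q_{X|W_1}(x|w_1)}+\mu\bar{\alpha}\big(d_1(x,\hat{x}_1)+\gamma d_2(x,\hat{x}_2)\big). \tag{16}
\end{align}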

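Given any $\theta\in\mathbb{R}_+$ and any $Q_T\in\mathcal{Q}$, define the cumulant generating function of $\omega^{(\mu,\alpha,\beta,\gamma)}_{Q_T}(T)$ as

$$\Omega^{(\theta,\mu,\alpha,\beta,\gamma)}(Q_T):=-\log\mathbb{E}_{Q_T}\Big[\exp\big(-\theta\,\omega^{(\mu,\alpha,\beta,\gamma)}_{Q_T}(T)\big)\Big]. \tag{17}$$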
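The functional form in (17) is easy to experiment with numerically. The sketch below evaluates $-\log\mathbb{E}[\exp(-\theta\,\omega(T))]$ for a generic finite pmf and a generic real-valued function `omega`; the two-point distribution used in the usage note is a made-up example and does not correspond to any $Q_T$ from the paper.

```python
from math import exp, log


def cumulant(theta, pmf, omega):
    """Return -log E[exp(-theta * omega(T))] for a finite pmf {t: prob}."""
    return -log(sum(p * exp(-theta * omega[t]) for t, p in pmf.items()))


def mean(pmf, omega):
    """Return E[omega(T)] under the same pmf."""
    return sum(p * omega[t] for t, p in pmf.items())
```

With `pmf = {"a": 0.25, "b": 0.75}` and `omega = {"a": 2.0, "b": -1.0}`, the function vanishes at `theta = 0`, its slope at zero equals `mean(pmf, omega)`, and by Jensen's inequality `cumulant(theta, pmf, omega) <= theta * mean(pmf, omega)` for every `theta >= 0`. These are standard properties of such functions and give a feel for the quantity being optimized, but they are not a substitute for the analysis of the exponent in the paper.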

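Furthermore, define the minimal cumulant generating function over distributions in $\mathcal{Q}$ as

$$\Omega^{(\theta,\mu,\alpha,\beta,\gamma)}:=\min_{Q_T\in\mathcal{Q}}\Omega^{(\theta,\mu,\alpha,\beta,\gamma)}(Q_T). \tag{18}$$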

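Finally, given any rate-distortion tuple $(R_1,R_2,D_1,D_2)$, define the exponent functions

\begin{align}
F^{(\theta,\mu,\alpha,\beta,\gamma)}(R_1,R_2,D_1,D_2)&:=\frac{\Omega^{(\theta,\mu,\alpha,\beta,\gamma)}-\theta\mu\big(\alpha(\bar{\beta}R_1+\beta(R_2-R_1))+\bar{\alpha}(D_1+\gamma D_2)\big)}{1+6\theta+\theta\mu\alpha(1+\beta)}, \tag{19}\\
F(R_1,R_2,D_1,D_2)&:=\sup_{(\theta,\mu,\alpha,\beta,\gamma)\in\mathbb{R}_+^2\times[0,1]^3}F^{(\theta,\mu,\alpha,\beta,\gamma)}(R_1,R_2,D_1,D_2). \tag{20}
\end{align}

With the above definitions, we have the following lemma establishing the properties of the exponent function $F(R_1,R_2,D_1,D_2)$.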

###### Lemma 2.

The following claims hold.

1. For any rate-distortion tuple outside the rate-distortion region, i.e., $(R_1,R_2,D_1,D_2)\notin\mathcal{R}$, we have

$$F(R_1,R_2,D_1,D_2)>0. \tag{21}$$

2. For any rate-distortion tuple inside the rate-distortion region, i.e., $(R_1,R_2,D_1,D_2)\in\mathcal{R}$, we have

$$F(R_1,R_2,D_1,D_2)=0. \tag{22}$$

The proof of Lemma 2 is inspired by [23, Property 4], [29, Lemma 2] and is given in Section V. As will be shown in Theorem 3, the exponent function $F(R_1,R_2,D_1,D_2)$ is a lower bound on the exponent of the probability of non-excess-distortion for the causal successive refinement problem. Thus, Claim 1 in Lemma 2 is crucial to establish the exponential strong converse theorem, which states that the excess-distortion probability (see (4)) approaches one exponentially fast with respect to the blocklength of the source sequences.

### III-B Main Result

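Define the probability of non-excess-distortion as

\begin{align}
P_{\mathrm{c}}^{(n)}(D_1,D_2)&:=1-P_{\mathrm{e}}^{(n)}(D_1,D_2) \tag{23}\\
&=\Pr\big\{d_1(X^n,\hat{X}_1^n)\le D_1\ \text{and}\ d_2(X^n,\hat{X}_2^n)\le D_2\big\}. \tag{24}
\end{align}
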
###### Theorem 3.

For any $(n,M_1,M_2)$-code for the causal successive refinement problem such that

$$\log M_1\le nR_1,\quad \log M_2\le n(R_2-R_1), \tag{25}$$

we have the following non-asymptotic upper bound on the probability of non-excess-distortion

$$P_{\mathrm{c}}^{(n)}(D_1,D_2)\le 7\exp\big(-nF(R_1,R_2,D_1,D_2)\big). \tag{26}$$

The proof of Theorem 3 is given in Section IV. Several remarks are in order.

First, our result is non-asymptotic, i.e., the bound in (26) holds for any $n\in\mathbb{N}$. In order to prove Theorem 3, we adapt the recently proposed strong converse technique by Oohama to analyze the probability of non-excess-distortion. We first obtain a non-asymptotic upper bound using the information spectrum of the log-likelihoods involved in the definition of $\omega^{(\mu,\alpha,\beta,\gamma)}_{Q_T}$ (see (16)) and then apply Cramér’s bound on large deviations (see e.g., [29, Lemma 13]) to obtain an exponential-type non-asymptotic upper bound. Subsequently, we apply the recursive method and proceed similarly as in  to obtain the desired result. Our method can also be used to establish similar results for other source coding problems with causal decoder side information [15, 20, 18].
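Because (26) holds at every blocklength, it can be evaluated directly once a value of the exponent is available. The sketch below treats the exponent as a given positive number (the value 0.05 in the usage note is hypothetical, not computed from any source) and reports both the bound, clipped at one, and the first blocklength at which the bound becomes non-trivial, i.e., the smallest $n$ with $7\exp(-nF)<1$.

```python
from math import exp, floor, log


def nonexcess_upper_bound(n, exponent):
    """Evaluate min(1, 7 * exp(-n * F)) for a given exponent value F."""
    return min(1.0, 7.0 * exp(-n * exponent))


def first_nontrivial_blocklength(exponent):
    """Smallest integer n with 7 * exp(-n * F) < 1, i.e. n > log(7) / F."""
    if exponent <= 0.0:
        raise ValueError("the bound never drops below one when F <= 0")
    return floor(log(7.0) / exponent) + 1
```

For instance, with a hypothetical exponent of 0.05 the bound is vacuous up to $n=38$ and first falls below one at $n=39$. The constant 7 therefore only delays the onset of the exponential decay by roughly $\log 7/F$ symbols.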

Second, we believe that classical strong converse techniques including the image size characterization  and the perturbation approach  cannot lead to the strong converse theorem for the causal successive refinement problem. The main obstacle is that the side information is available causally and thus complicates the decoding analysis significantly.

Invoking Lemma 2 and Theorem 3, we conclude that the exponent on the right hand side of (26) is positive if and only if the rate-distortion tuple is outside the rate-distortion region, which implies the following exponential strong converse theorem.

###### Theorem 4.

For any sequence of $(n,M_1,M_2)$-codes satisfying the rate constraints in (25), given any distortion levels $(D_1,D_2)$, we have that if $(R_1,R_2,D_1,D_2)\notin\mathcal{R}$, then the probability of correct decoding $P_{\mathrm{c}}^{(n)}(D_1,D_2)$ decays exponentially fast to zero as the blocklength of the source sequences tends to infinity.

As a result of Theorem 4, we conclude that for every $\varepsilon\in(0,1)$, the $\varepsilon$-rate-distortion region (see Definition 2) satisfies

$$\mathcal{R}(\varepsilon)=\mathcal{R}, \tag{27}$$

i.e., the strong converse holds for the causal successive refinement problem. Using the strong converse theorem and Marton’s change-of-measure technique, similarly to [29, Theorem 5], we can also derive an upper bound on the excess-distortion probability. Furthermore, applying the one-shot techniques in , we can also establish a non-asymptotic achievability bound. Applying the Berry-Esseen theorem to the achievability bound and analyzing the non-asymptotic converse bound in Theorem 3, similarly to , we conclude that the backoff from the rate-distortion region at finite blocklength scales on the order of $1/\sqrt{n}$. However, nailing down the exact second-order asymptotics [39, 40] is challenging and is left for future work.

Our main results in Lemma 2, Theorems 3 and 4 can be specialized to the lossy source coding problem with causal decoder side information  because it is a special case of the causal successive refinement problem.

## IV Proof of the Non-Asymptotic Converse Bound (Theorem 3)

### IV-A Preliminaries

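Given any $(n,M_1,M_2)$-code with encoding functions $f_1$ and $f_2$ and decoding functions $\{\phi_{1,i},\phi_{2,i}\}_{i\in[n]}$, we define the following induced conditional distributions:

\begin{align}
P_{S_k|X^n}(s_k|x^n)&:=\mathbb{1}\{s_k=f_k(x^n)\},\quad k\in[2], \tag{28}\\
P_{\hat{X}_1^n|S_1Z^n}(\hat{x}_1^n|s_1,z^n)&:=\prod_{i\in[n]}\mathbb{1}\{\hat{x}_{1,i}=\phi_{1,i}(s_1,z^i)\}, \tag{29}\\
P_{\hat{X}_2^n|S_1S_2Y^n}(\hat{x}_2^n|s_1,s_2,y^n)&:=\prod_{i\in[n]}\mathbb{1}\{\hat{x}_{2,i}=\phi_{2,i}(s_1,s_2,y^i)\}. \tag{30}
\end{align}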

For simplicity, in the following, we let

$$G:=(X^n,Y^n,Z^n,S_1,S_2,\hat{X}_1^n,\hat{X}_2^n), \tag{31}$$

and let $g$ and $\mathcal{G}$ be a particular realization and the alphabet of $G$ respectively. With the above definitions, the distribution $P_G$ satisfies, for any $g\in\mathcal{G}$,

$$P_G(g):=P_{XYZ}^n(x^n,y^n,z^n)\,P_{S_1|X^n}(s_1|x^n)\,P_{S_2|X^n}(s_2|x^n)\,P_{\hat{X}_1^n|S_1Z^n}(\hat{x}_1^n|s_1,z^n)\,P_{\hat{X}_2^n|S_1S_2Y^n}(\hat{x}_2^n|s_1,s_2,y^n). \tag{32}$$

In the remaining part of this section, all distributions denoted by $P$ are induced by the joint distribution $P_G$.

Let auxiliary random variables be and for all . Note that as a function of , the sequence

is a Markov chain under

. Throughout the paper, for each , we let

$$T_i:=(X_i,Y_i,Z_i,W_{1,i},W_{2,i},\hat{X}_{1,i},\hat{X}_{2,i}), \tag{33}$$

and let be a particular realization and the alphabet of , respectively. For , let be arbitrary distributions and let , , , , and be induced distributions of . Given any positive real number , define the following subsets of :

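\begin{align}
\mathcal{B}_1&:=\Big\{g:0\ge\frac{1}{n}\sum_{i\in[n]}\log\frac{Q_{X_i}(x_i)}{P_X(x_i)}-\eta\Big\}, \tag{34}\\
\mathcal{B}_2&:=\Big\{g:0\ge\frac{1}{n}\sum_{i\in[n]}\log\frac{Q_{Y_iZ_i|X_iW_{1,i}W_{2,i}}(y_i,z_i|x_i,w_{1,i},w_{2,i})}{P_{YZ|X}(y_i,z_i|x_i)}-\eta\Big\}, \tag{35}\\
\mathcal{B}_3&:=\Big\{g:0\ge\frac{1}{n}\sum_{i\in[n]}\log\frac{Q_{X_iY_iW_{2,i}|Z_iW_{1,i}\hat{X}_{1,i}}(x_i,y_i,w_{2,i}|z_i,w_{1,i},\hat{x}_{1,i})}{P_{X_iY_iW_{2,i}|Z_iW_{1,i}}(x_i,y_i,w_{2,i}|z_i,w_{1,i})}-\eta\Big\}, \tag{36}\\
\mathcal{B}_4&:=\Big\{g:0\ge\frac{1}{n}\sum_{i\in[n]}\log\frac{Q_{\hat{X}_{2,i}|X_iY_iZ_iW_{1,i}W_{2,i}\hat{X}_{1,i}}(\hat{x}_{2,i}|x_i,y_i,z_i,w_{1,i},w_{2,i},\hat{x}_{1,i})}{P_{\hat{X}_{2,i}|Y_iW_{1,i}W_{2,i}}(\hat{x}_{2,i}|y_i,w_{1,i},w_{2,i})}-\eta\Big\}, \tag{37}\\
\mathcal{B}_5&:=\Big\{g:R_1\ge\frac{1}{n}\sum_{i\in[n]}\log\frac{Q_{X_i|W_{1,i}}(x_i|w_{1,i})}{P_X(x_i)}-\eta\Big\}, \tag{38}\\
\mathcal{B}_6&:=\Big\{g:R_2-R_1\ge\frac{1}{n}\sum_{i\in[n]}\log\frac{Q_{X_i|W_{1,i}W_{2,i}}(x_i|w_{1,i},w_{2,i})}{P_{X_i|W_{1,i}}(x_i|w_{1,i})}-\eta\Big\}, \tag{39}\\
\mathcal{B}_7&:=\Big\{g:D_1\ge\frac{1}{n}\sum_{i\in[n]}\log\exp\big(d_1(x_i,\hat{x}_{1,i})\big)\Big\}, \tag{40}\\
\mathcal{B}_8&:=\Big\{g:D_2\ge\frac{1}{n}\sum_{i\in[n]}\log\exp\big(d_2(x_i,\hat{x}_{2,i})\big)\Big\}. \tag{41}
\end{align}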

### IV-B Proof Steps

###### Lemma 5.

For any $(n,M_1,M_2)$-code for the causal successive refinement problem satisfying (25), given any distortion levels $(D_1,D_2)$, we have

$$P_{\mathrm{c}}^{(n)}(D_1,D_2)\le\Pr\Big\{\bigcap_{i\in[8]}\mathcal{B}_i\Big\}+6\exp(-n\eta). \tag{42}$$

The proof of Lemma 5 is given in Appendix -B and is divided into two steps. First, we derive an $n$-letter non-asymptotic upper bound which holds for certain arbitrary $n$-letter auxiliary distributions. Subsequently, we single-letterize the derived bound by a proper choice of auxiliary distributions and a careful decomposition of the induced distributions of $P_G$.

For simplicity, in the following, we will use to denote and use to denote . Given any and any , define

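\begin{align}
f^{(\mu,\alpha,\beta,\gamma)}_{Q_i,P_i}(t_i):=\ &\frac{Q_{X_i}(x_i)}{P_X(x_i)}\cdot\frac{Q_{Y_iZ_i|X_iW_{1,i}W_{2,i}}(y_i,z_i|x_i,w_{1,i},w_{2,i})}{P_{YZ|X}(y_i,z_i|x_i)}\cdot\frac{Q_{X_iY_iW_{2,i}|Z_iW_{1,i}\hat{X}_{1,i}}(x_i,y_i,w_{2,i}|z_i,w_{1,i},\hat{x}_{1,i})}{P_{X_iY_iW_{2,i}|Z_iW_{1,i}}(x_i,y_i,w_{2,i}|z_i,w_{1,i})}\nonumber\\
&\times\frac{Q_{\hat{X}_{2,i}|X_iY_iZ_iW_{1,i}W_{2,i}\hat{X}_{1,i}}(\hat{x}_{2,i}|x_i,y_i,z_i,w_{1,i},w_{2,i},\hat{x}_{1,i})}{P_{\hat{X}_{2,i}|Y_iW_{1,i}W_{2,i}}(\hat{x}_{2,i}|y_i,w_{1,i},w_{2,i})}\cdot\frac{Q^{\mu\alpha\bar{\beta}}_{X_i|W_{1,i}}(x_i|w_{1,i})}{P^{\mu\alpha\bar{\beta}}_X(x_i)}\cdot\frac{Q^{\mu\alpha\beta}_{X_i|W_{1,i}W_{2,i}}(x_i|w_{1,i},w_{2,i})}{P^{\mu\alpha\beta}_{X_i|W_{1,i}}(x_i|w_{1,i})}\nonumber\\
&\times\exp\Big(\mu\bar{\alpha}\big(d_1(x_i,\hat{x}_{1,i})+\gamma d_2(x_i,\hat{x}_{2,i})\big)\Big). \tag{43}
\end{align}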

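Furthermore, given any non-negative real number $\lambda$, define

$$\Omega^{(\lambda,\mu,\alpha,\beta,\gamma)}(\{P_i,Q_i\}_{i\in[n]}):=-\log\mathbb{E}\Big[\exp\Big(-\lambda\sum_{i\in[n]}\log f^{(\mu,\alpha,\beta,\gamma)}_{Q_i,P_i}(T_i)\Big)\Big]. \tag{44}$$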

Finally, define the following linear combination of the rates and distortion levels:

$$\kappa^{(\lambda,\mu,\alpha,\beta,\gamma)}:=\lambda\mu\big(\alpha(\bar{\beta}R_1+\beta(R_2-R_1))+\bar{\alpha}(D_1+\gamma D_2)\big). \tag{45}$$

Using Cramér’s bound [29, Lemma 13], we obtain the following non-asymptotic exponential-type upper bound on the probability of non-excess-distortion, whose proof is given in Appendix -D.

###### Lemma 6.

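For any $(n,M_1,M_2)$-code satisfying the conditions in Lemma 5, given any distortion levels $(D_1,D_2)$, we have

$$P_{\mathrm{c}}^{(n)}(D_1,D_2)\le 7\exp\bigg(-n\,\frac{\frac{1}{n}\Omega^{(\lambda,\mu,\alpha,\beta,\gamma)}(\{P_i,Q_i\}_{i\in[n]})-\kappa^{(\lambda,\mu,\alpha,\beta,\gamma)}}{1+\lambda(4+\mu\alpha)}\bigg). \tag{46}$$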
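The right-hand side of (46) is a simple function of the parameters once the normalized quantity $\frac{1}{n}\Omega^{(\lambda,\mu,\alpha,\beta,\gamma)}$ is known. The sketch below takes that quantity as an input called `omega_per_symbol` (a hypothetical name; computing it from an actual code is the hard part and is not attempted here) and combines it with $\kappa$ from (45).

```python
from math import exp


def kappa(lam, mu, alpha, beta, gamma, R1, R2, D1, D2):
    """Linear combination of rates and distortion levels as in (45)."""
    rate_part = alpha * ((1.0 - beta) * R1 + beta * (R2 - R1))
    distortion_part = (1.0 - alpha) * (D1 + gamma * D2)
    return lam * mu * (rate_part + distortion_part)


def lemma6_exponent(omega_per_symbol, lam, mu, alpha, beta, gamma, R1, R2, D1, D2):
    """Exponent multiplying -n inside the exponential of (46)."""
    k = kappa(lam, mu, alpha, beta, gamma, R1, R2, D1, D2)
    return (omega_per_symbol - k) / (1.0 + lam * (4.0 + mu * alpha))


def lemma6_bound(n, exponent):
    """Evaluate 7 * exp(-n * exponent)."""
    return 7.0 * exp(-n * exponent)
```

As a purely numerical illustration, `lam=0.5, mu=1, alpha=0.5, beta=0.5, gamma=1, R1=1, R2=2, D1=0.1, D2=0.1` gives $\kappa=0.3$, and an assumed `omega_per_symbol` of 0.95 yields an exponent of 0.2. The bound is only informative when `omega_per_symbol` exceeds $\kappa$, which is why the subsequent optimization over the parameters matters.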

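Furthermore, let

$$\underline{\Omega}^{(\lambda,\mu,\alpha,\beta,\gamma)}(\{P_i\}_{i\in[n]}):=\inf_{n\in\mathbb{N}}\sup_{\{Q_i\}_{i\in[n]}}\Omega^{(\lambda,\mu,\alpha,\beta,\gamma)}(\{P_i,Q_i\}_{i\in[n]}), \tag{47}$$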

and given any such that , let

$$\theta:=\frac{\lambda}{1-2\lambda-\lambda\mu\alpha\beta}. \tag{48}$$

Then we have

$$\lambda=\frac{\theta}{1+2\theta+\theta\mu\alpha\beta}. \tag{49}$$
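As a quick sanity check on the change of variables between (48) and (49), the algebra can be spelled out as follows. This is a worked verification under the assumption that $1-2\lambda-\lambda\mu\alpha\beta>0$, so that $\theta$ is well defined. The last line is our own observation: substituting (49) into the denominator $1+\lambda(4+\mu\alpha)$ of (46) produces a numerator that matches the denominator appearing in (19).

```latex
\begin{align*}
\theta=\frac{\lambda}{1-2\lambda-\lambda\mu\alpha\beta}
&\iff \theta\,(1-2\lambda-\lambda\mu\alpha\beta)=\lambda\\
&\iff \theta=\lambda\,(1+2\theta+\theta\mu\alpha\beta)\\
&\iff \lambda=\frac{\theta}{1+2\theta+\theta\mu\alpha\beta},\\[4pt]
1+\lambda(4+\mu\alpha)
&=\frac{1+2\theta+\theta\mu\alpha\beta+4\theta+\theta\mu\alpha}{1+2\theta+\theta\mu\alpha\beta}
=\frac{1+6\theta+\theta\mu\alpha(1+\beta)}{1+2\theta+\theta\mu\alpha\beta}.
\end{align*}
```

The identity suggests how the $\lambda$-parametrized bound in (46) is converted into the $\theta$-parametrized exponent in (19), although the full argument also requires relating the two cumulant generating functions.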

The following lemma which relates