 # The Structure of the Realizations of the Causal Information Rate-Distortion Function for Markovian Sources: Realizations with Densities

The main purpose of this note is to show that in a realization $(\mathrm{x}_1^n,\mathrm{y}_1^n)$ of the causal information rate-distortion function (IRDF) for a $\kappa$-th order Markovian source $\mathrm{x}_1^n$, under a single-letter sum distortion constraint, the smallest integer $\ell$ for which the Markov chain $\mathrm{y}_k \leftrightarrow (\mathrm{y}_1^{k-1},\mathrm{x}_{k-\ell+1}^{k}) \leftrightarrow \mathrm{x}_1^{k-\ell}$ holds is $\ell=\kappa$. This result is derived under the assumption that the sequences $(\mathrm{x}_1^n,\mathrm{y}_1^n)$ have a joint probability density function (PDF).

## I Introduction

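Consider the causal information rate-distortion function (IRDF) for a random source $\mathrm{x}_1^n$, defined as

$$R^{it}_{c,n}(D)\triangleq\frac{1}{n}\inf I(\mathrm{x}_1^n;\mathrm{y}_1^n), \tag{1}$$

where the minimization is over all conditional PDFs $f_{\mathrm{y}_1^n|\mathrm{x}_1^n}$ satisfying the distortion constraint

$$\frac{1}{n}\,\mathbb{E}\left[\sum_{i=1}^{n}\rho(\mathrm{x}_i,\mathrm{y}_i)\right]\le D \tag{2}$$

and the causality Markov chains

$$\mathrm{y}_1^{i}\longleftrightarrow\mathrm{x}_1^{i}\longleftrightarrow\mathrm{x}_{i+1}^{n},\qquad i=1,\ldots,n. \tag{3}$$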

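If the infimum is achieved by some conditional distribution, the associated pair of sequences $(\mathrm{x}_1^n,\mathrm{y}_1^n)$ is called a realization of $R^{it}_{c,n}(D)$. Here we assume that such a distribution exists and that the corresponding realization has a joint PDF. This assumption is satisfied if, for example, $\mathrm{x}_1^n$ is Gaussian and .

The first purpose of this note is to show that in a realization $(\mathrm{x}_1^n,\mathrm{y}_1^n)$ of the causal IRDF for a $\kappa$-th order Markovian source $\mathrm{x}_1^n$, under the average distortion constraint (2), and supposing that in such a realization the sequences $(\mathrm{x}_1^n,\mathrm{y}_1^n)$ have a joint PDF, it holds that

$$f_{\mathrm{y}_k|\mathrm{x}_1^n,\mathrm{y}_1^{k-1}}(y_k|x_1^n,y_1^{k-1})=\frac{e^{-s\rho(x_k,y_k)}\,\breve{F}_k(x_{k-\kappa+1}^{k},y_1^{k})}{\int e^{-s\rho(x_k,y_k)}\,\breve{F}_k(x_{k-\kappa+1}^{k},y_1^{k})\,dy_k}, \tag{4a}$$

where $f_{\mathrm{x}_1^n}$ is the PDF of $\mathrm{x}_1^n$ and

$$\breve{F}_k(x_{k-\kappa+1}^{k},y_1^{k})=\exp\left(\int\ln\left(\int e^{-s\rho(x_{k+1},y_{k+1})}\,\breve{F}_{k+1}(x_{k-\kappa+2}^{k+1},y_1^{k+1})\,dy_{k+1}\right)f_{\mathrm{x}_{k+1}^n|\mathrm{x}_{k-\kappa+1}^{k}}(x_{k+1}^n|x_{k-\kappa+1}^{k})\,dx_{k+1}^n\right). \tag{4b}$$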

The expressions given in (4) are a special case of the ones given by [1, equations (16),(17),(18)] for abstract spaces, where their derivation is not included. The value of our first result lies in the following:

• We provide a proof for the validity of (4) (absent in [1]).

• In this proof, we pose the causal IRDF optimization problem with $f_{\mathrm{y}_1^n|\mathrm{x}_1^n}$ as the decision variable (instead of the collection $\{f_{\mathrm{y}_k|\mathrm{y}_1^{k-1},\mathrm{x}_1^{k}}\}_{k=1}^{n}$, as would be the case for probability measures having an associated PDF). Accordingly, we impose an explicit causality constraint on $f_{\mathrm{y}_1^n|\mathrm{x}_1^n}$, instead of enforcing causality structurally by restricting $f_{\mathrm{y}_1^n|\mathrm{x}_1^n}$ to be the product of the conditional PDFs $f_{\mathrm{y}_k|\mathrm{y}_1^{k-1},\mathrm{x}_1^{k}}$, as done in [1, 2].

The second (and main) goal of this document is to note that from (4a) it is clear that

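$$\mathrm{y}_k\longleftrightarrow(\mathrm{y}_1^{k-1},\mathrm{x}_{k-\kappa+1}^{k})\longleftrightarrow\mathrm{x}_1^{k-\kappa} \tag{5}$$

holds, and that

$$\mathrm{y}_k\longleftrightarrow(\mathrm{y}_1^{k-1},\mathrm{x}_k)\longleftrightarrow\mathrm{x}_1^{k-1} \tag{6}$$

does not hold, except for $\kappa=1$. Crucially, (6) does not become true by supposing that the joint PDF of $(\mathrm{x}_1^n,\mathrm{y}_1^n)$ is stationary, thus contradicting [2, Remark IV.5] and what is stated in the discussion paragraph at the end of [1, Section V].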
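As a quick check of the index bookkeeping, the following worked instance substitutes small values of $\kappa$ into (5). It is only an illustration of the notation, not an additional result.

$$\begin{aligned}
\kappa=1:&\quad \mathrm{y}_k\longleftrightarrow(\mathrm{y}_1^{k-1},\mathrm{x}_k)\longleftrightarrow\mathrm{x}_1^{k-1},\\
\kappa=2:&\quad \mathrm{y}_k\longleftrightarrow(\mathrm{y}_1^{k-1},\mathrm{x}_{k-1}^{k})\longleftrightarrow\mathrm{x}_1^{k-2}.
\end{aligned}$$

The first line coincides with (6). For $\kappa=2$ the sample $\mathrm{x}_{k-1}$ remains in the conditioning set, which is why (6) cannot be expected to hold in general.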

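## II Proof

The causal IRDF under the above conditions is given by the solution to the following optimization problem:

$$\begin{aligned}
\text{minimize:}\quad & I(\mathrm{x}_1^n;\mathrm{y}_1^n) && \text{(7a)}\\
\text{subject to:}\quad & \left(\int f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,dy_1^n-1\right)f_{\mathrm{x}_1^n}(x_1^n)=0,\quad\forall x_1^n && \text{(7b)}\\
& \iint f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,f_{\mathrm{x}_1^n}(x_1^n)\sum_{k=1}^{n}\rho(x_k,y_k)\,dy_1^n\,dx_1^n\le D && \text{(7c)}\\
& \left(f_{\mathrm{y}_1^k|\mathrm{x}_1^k}(y_1^k|x_1^k)-f_{\mathrm{y}_1^k|\mathrm{x}_1^n}(y_1^k|x_1^n)\right)f_{\mathrm{x}_1^n}(x_1^n)=0,\quad\forall y_1^k,\,x_1^n,\ k=1,\ldots,n, && \text{(7d)}
\end{aligned}$$

where the minimization is over the conditional PDF $f_{\mathrm{y}_1^n|\mathrm{x}_1^n}$. Notice that (7d) is an explicit causality constraint equivalent to (3).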
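The role of the explicit causality constraint can be illustrated on a discrete analogue with $n=2$ and binary alphabets. The sketch below is an assumption-laden toy, not part of the proof: PDFs are replaced by probability mass functions, and the helper names (`random_pmf`, `causality_violation`) are hypothetical. It evaluates the $k=1$ case of (7d) for a conditional built as a causal product of kernels and for one in which $y_1$ depends on the future sample $x_2$.

```python
import itertools
import random

B = (0, 1)               # binary alphabet for every x_i and y_i (illustrative)
rng = random.Random(3)   # fixed seed so the run is reproducible

def random_pmf(keys, rng):
    """Strictly positive random pmf over the given keys."""
    w = {k: rng.uniform(0.1, 1.0) for k in keys}
    tot = sum(w.values())
    return {k: v / tot for k, v in w.items()}

fx = random_pmf(list(itertools.product(B, B)), rng)   # source pmf f_x(x1, x2)
fx1 = {x1: fx[(x1, 0)] + fx[(x1, 1)] for x1 in B}     # marginal of x1

def causality_violation(f):
    """Largest |f(y1|x1) - f(y1|x1,x2)| * f_x(x1,x2): the k = 1 case of (7d)."""
    worst = 0.0
    for x1 in B:
        for y1 in B:
            # f(y1 | x1, x2), obtained by summing out y2
            full = {x2: sum(f[(x1, x2)][(y1, y2)] for y2 in B) for x2 in B}
            # f(y1 | x1), obtained by averaging over x2 given x1
            reduced = sum(full[x2] * fx[(x1, x2)] for x2 in B) / fx1[x1]
            for x2 in B:
                worst = max(worst, abs(reduced - full[x2]) * fx[(x1, x2)])
    return worst

# Causal conditional: f(y1, y2 | x1, x2) = q1(y1 | x1) * q2(y2 | y1, x1, x2)
q1 = {x1: random_pmf(B, rng) for x1 in B}
q2 = {(y1, x1, x2): random_pmf(B, rng) for y1 in B for x1 in B for x2 in B}
causal = {
    (x1, x2): {(y1, y2): q1[x1][y1] * q2[(y1, x1, x2)][y2] for y1 in B for y2 in B}
    for x1 in B for x2 in B
}

# Non-causal conditional: y1 copies the *future* sample x2 with probability 0.9
noncausal = {
    (x1, x2): {(y1, y2): (0.9 if y1 == x2 else 0.1) * 0.5 for y1 in B for y2 in B}
    for x1 in B for x2 in B
}

v_causal = causality_violation(causal)
v_noncausal = causality_violation(noncausal)
print(v_causal, v_noncausal)
```

In this toy the causal product satisfies the constraint up to rounding, while the conditional that looks ahead at $x_2$ does not. For $k=n=2$ the constraint holds trivially, so only $k=1$ is checked.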

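Let $f'_{\mathrm{y}_1^n|\mathrm{x}_1^n}$ be any conditional PDF, and define

$$\begin{aligned}
g_{\mathrm{y}_1^n|\mathrm{x}_1^n} &\triangleq f'_{\mathrm{y}_1^n|\mathrm{x}_1^n}-f_{\mathrm{y}_1^n|\mathrm{x}_1^n} && \text{(8)}\\
g_{\mathrm{y}_1^n}(y_1^n) &\triangleq \int g_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,f_{\mathrm{x}_1^n}(x_1^n)\,dx_1^n && \text{(9)}\\
f^{\epsilon}_{\mathrm{y}_1^n|\mathrm{x}_1^n} &\triangleq f_{\mathrm{y}_1^n|\mathrm{x}_1^n}+\epsilon\,g_{\mathrm{y}_1^n|\mathrm{x}_1^n} && \text{(10)}\\
f^{\epsilon}_{\mathrm{y}_1^n}(y_1^n) &\triangleq \int f^{\epsilon}_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,f_{\mathrm{x}_1^n}(x_1^n)\,dx_1^n && \text{(11)}
\end{aligned}$$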

where .

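Before writing the Lagrangian and taking its Gateaux differential, let us obtain the Gateaux differential of $I(\mathrm{x}_1^n;\mathrm{y}_1^n)$ in the direction $g_{\mathrm{y}_1^n|\mathrm{x}_1^n}$, given by

$$\begin{aligned}
\left.\frac{dI(\mathrm{x}_1^n;\mathrm{y}_1^n)}{d\epsilon}\right|_{\epsilon=0}
&=\left.\frac{d}{d\epsilon}\left[\iint f^{\epsilon}_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,f_{\mathrm{x}_1^n}(x_1^n)\ln\left(\frac{f^{\epsilon}_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)}{f^{\epsilon}_{\mathrm{y}_1^n}(y_1^n)}\right)dy_1^n\,dx_1^n\right]\right|_{\epsilon=0} && \text{(12)}\\
&=\iint g_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,f_{\mathrm{x}_1^n}(x_1^n)\ln\left(\frac{f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)}{f_{\mathrm{y}_1^n}(y_1^n)}\right)dy_1^n\,dx_1^n+R, && \text{(13)}
\end{aligned}$$

where

$$\begin{aligned}
R &\triangleq \iint f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,f_{\mathrm{x}_1^n}(x_1^n)\left(\frac{g_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)}{f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)}-\frac{g_{\mathrm{y}_1^n}(y_1^n)}{f_{\mathrm{y}_1^n}(y_1^n)}\right)dy_1^n\,dx_1^n && \text{(14)}\\
&=\iint g_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,f_{\mathrm{x}_1^n}(x_1^n)\,dy_1^n\,dx_1^n-\iint f_{\mathrm{y}_1^n,\mathrm{x}_1^n}(y_1^n,x_1^n)\,\frac{g_{\mathrm{y}_1^n}(y_1^n)}{f_{\mathrm{y}_1^n}(y_1^n)}\,dy_1^n\,dx_1^n && \text{(15)}\\
&=\int g_{\mathrm{y}_1^n}(y_1^n)\,dy_1^n-\int\frac{g_{\mathrm{y}_1^n}(y_1^n)}{f_{\mathrm{y}_1^n}(y_1^n)}\left(\int f_{\mathrm{y}_1^n,\mathrm{x}_1^n}(y_1^n,x_1^n)\,dx_1^n\right)dy_1^n && \text{(16)}\\
&=0. && \text{(17)}
\end{aligned}$$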
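The conclusion that $R=0$, so that only the first term of (13) survives, can be checked numerically on a discrete analogue. The sketch below is an illustration under stated assumptions: densities are replaced by probability mass functions on small alphabets, the alphabet sizes and source pmf are arbitrary, a central finite difference stands in for the derivative with respect to $\epsilon$, and all function names are hypothetical.

```python
import math
import random

def mutual_information(cond, px):
    """I(X;Y) in nats for source pmf px[x] and conditional pmf cond[x][y]."""
    ny = len(cond[0])
    py = [sum(px[x] * cond[x][y] for x in range(len(px))) for y in range(ny)]
    total = 0.0
    for x, p in enumerate(px):
        for y in range(ny):
            if cond[x][y] > 0.0:
                total += p * cond[x][y] * math.log(cond[x][y] / py[y])
    return total

def random_channel(nx, ny, rng):
    """Random conditional pmf with strictly positive entries."""
    rows = []
    for _ in range(nx):
        w = [rng.uniform(0.1, 1.0) for _ in range(ny)]
        s = sum(w)
        rows.append([v / s for v in w])
    return rows

def gateaux_formula(f, f_alt, px):
    """Discrete version of (13) with R = 0: sum of g * px * ln(f / f_y)."""
    ny = len(f[0])
    py = [sum(px[x] * f[x][y] for x in range(len(px))) for y in range(ny)]
    return sum(
        (f_alt[x][y] - f[x][y]) * px[x] * math.log(f[x][y] / py[y])
        for x in range(len(px)) for y in range(ny)
    )

def gateaux_numeric(f, f_alt, px, eps=1e-6):
    """Central finite difference of I along f + eps * (f_alt - f)."""
    def perturbed(e):
        return [[f[x][y] + e * (f_alt[x][y] - f[x][y]) for y in range(len(f[0]))]
                for x in range(len(px))]
    return (mutual_information(perturbed(eps), px)
            - mutual_information(perturbed(-eps), px)) / (2 * eps)

rng = random.Random(0)
px = [0.2, 0.5, 0.3]              # arbitrary source pmf
f = random_channel(3, 4, rng)     # plays the role of f_{y|x}
f_alt = random_channel(3, 4, rng) # plays the role of f'_{y|x}
analytic = gateaux_formula(f, f_alt, px)
numeric = gateaux_numeric(f, f_alt, px)
print(analytic, numeric)
```

The two numbers should agree to within the finite-difference error, which is consistent with the remainder term vanishing.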

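On the other hand, for each $i\in\{1,\ldots,n\}$, the causality constraint (7d) appears in the Lagrangian as

$$\begin{aligned}
&\iint\lambda_i(x_1^n,y_1^i)\left[f_{\mathrm{y}_1^i|\mathrm{x}_1^i}(y_1^i|x_1^i)-f_{\mathrm{y}_1^i|\mathrm{x}_1^n}(y_1^i|x_1^n)\right]f_{\mathrm{x}_1^n}(x_1^n)\,dy_1^i\,dx_1^n && \text{(18)}\\
&=\iint\lambda_i(x_1^n,y_1^i)\left(\int\left[f_{\mathrm{y}_1^n|\mathrm{x}_1^i}(y_1^n|x_1^i)-f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\right]dy_{i+1}^n\right)f_{\mathrm{x}_1^n}(x_1^n)\,dy_1^i\,dx_1^n && \text{(19)}\\
&=\int\left(\int\lambda_i(x_1^n,y_1^i)\,f_{\mathrm{y}_1^n|\mathrm{x}_1^i}(y_1^n|x_1^i)\,f_{\mathrm{x}_1^n}(x_1^n)\,dx_1^n-\int\lambda_i(x_1^n,y_1^i)\,f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,f_{\mathrm{x}_1^n}(x_1^n)\,dx_1^n\right)dy_1^n. && \text{(20)}
\end{aligned}$$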

It will be convenient to manipulate this expression so as to give it a structure similar to the other terms in the Lagrangian. For this purpose, notice that

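$$\begin{aligned}
&\int\lambda_i(x_1^n,y_1^i)\,f_{\mathrm{y}_1^n|\mathrm{x}_1^i}(y_1^n|x_1^i)\,f_{\mathrm{x}_1^n}(x_1^n)\,dx_1^n && \text{(21)}\\
&=\int\lambda_i(x_1^n,y_1^i)\,f_{\mathrm{y}_1^n,\mathrm{x}_1^i}(y_1^n,x_1^i)\,f_{\mathrm{x}_{i+1}^n|\mathrm{x}_1^i}(x_{i+1}^n|x_1^i)\,dx_1^n && \text{(22)}\\
&=\int f_{\mathrm{y}_1^n,\mathrm{x}_1^i}(y_1^n,x_1^i)\left(\int\lambda_i(x_1^n,y_1^i)\,f_{\mathrm{x}_{i+1}^n|\mathrm{x}_1^i}(x_{i+1}^n|x_1^i)\,dx_{i+1}^n\right)dx_1^i && \text{(23)}\\
&=\int f_{\mathrm{y}_1^n,\mathrm{x}_1^i}(y_1^n,x_1^i)\,\bar{\lambda}_i(x_1^i,y_1^i)\,dx_1^i && \text{(24)}\\
&=\int\left(\int f_{\mathrm{y}_1^n,\mathrm{x}_1^n}(y_1^n,x_1^n)\,dx_{i+1}^n\right)\bar{\lambda}_i(x_1^i,y_1^i)\,dx_1^i && \text{(25)}\\
&=\int f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,f_{\mathrm{x}_1^n}(x_1^n)\,\bar{\lambda}_i(x_1^i,y_1^i)\,dx_1^n, && \text{(26)}
\end{aligned}$$

where

$$\bar{\lambda}_i(x_1^i,y_1^i)\triangleq\int\lambda_i(x_1^n,y_1^i)\,f_{\mathrm{x}_{i+1}^n|\mathrm{x}_1^i}(x_{i+1}^n|x_1^i)\,dx_{i+1}^n,\qquad i=1,\ldots,n. \tag{27}$$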

Substituting this into (20) we obtain

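$$\begin{aligned}
&\iint\lambda_i(x_1^n,y_1^i)\left(f_{\mathrm{y}_1^i|\mathrm{x}_1^i}(y_1^i|x_1^i)-f_{\mathrm{y}_1^i|\mathrm{x}_1^n}(y_1^i|x_1^n)\right)f_{\mathrm{x}_1^n}(x_1^n)\,dy_1^i\,dx_1^n && \text{(28)}\\
&=\iint\left(\bar{\lambda}_i(x_1^i,y_1^i)-\lambda_i(x_1^n,y_1^i)\right)f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,f_{\mathrm{x}_1^n}(x_1^n)\,dy_1^n\,dx_1^n. && \text{(29)}
\end{aligned}$$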
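The equality between (28) and (29) is a purely algebraic identity, so it can be sanity-checked on a discrete analogue. The sketch below takes $n=2$, $i=1$ and binary alphabets, with integrals replaced by sums. The source pmf, the conditional pmf and the multiplier are all drawn at random (assumptions made only for illustration), and the variable names are hypothetical.

```python
import itertools
import random

rng = random.Random(1)
B = (0, 1)
PAIRS = list(itertools.product(B, B))

# Source pmf f_x(x1, x2), normalised.
w = {x: rng.uniform(0.1, 1.0) for x in PAIRS}
tot = sum(w.values())
fx = {x: v / tot for x, v in w.items()}
fx1 = {x1: fx[(x1, 0)] + fx[(x1, 1)] for x1 in B}

# Arbitrary conditional pmf f(y1, y2 | x1, x2); causality is NOT imposed here.
fyx = {}
for x in PAIRS:
    w = {y: rng.uniform(0.1, 1.0) for y in PAIRS}
    tot = sum(w.values())
    for y, v in w.items():
        fyx[(y, x)] = v / tot

# Arbitrary multiplier lambda_1(x1, x2, y1).
lam = {(x1, x2, y1): rng.uniform(-1.0, 1.0) for x1 in B for x2 in B for y1 in B}

# Averaged multiplier of (27): expectation of lambda_1 over x2 given x1.
lam_bar = {
    (x1, y1): sum(lam[(x1, x2, y1)] * fx[(x1, x2)] / fx1[x1] for x2 in B)
    for x1 in B for y1 in B
}

# f(y1 | x1, x2) and f(y1 | x1), obtained by marginalisation.
fy1_x = {(y1, x): sum(fyx[((y1, y2), x)] for y2 in B) for y1 in B for x in PAIRS}
fy1_x1 = {
    (y1, x1): sum(fy1_x[(y1, (x1, x2))] * fx[(x1, x2)] for x2 in B) / fx1[x1]
    for y1 in B for x1 in B
}

# Discrete counterpart of (28).
lhs = sum(
    lam[(x1, x2, y1)] * (fy1_x1[(y1, x1)] - fy1_x[(y1, (x1, x2))]) * fx[(x1, x2)]
    for x1 in B for x2 in B for y1 in B
)

# Discrete counterpart of (29).
rhs = sum(
    (lam_bar[(x1, y1)] - lam[(x1, x2, y1)]) * fyx[((y1, y2), (x1, x2))] * fx[(x1, x2)]
    for x1 in B for x2 in B for y1 in B for y2 in B
)
print(lhs, rhs)
```

Because the conditional pmf in this check is arbitrary, agreement of the two sums suggests that the rewriting does not itself rely on causality; it only repackages the constraint term so that it multiplies $f_{\mathrm{y}_1^n|\mathrm{x}_1^n}f_{\mathrm{x}_1^n}$ like the other terms of the Lagrangian.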

We can now write the Lagrangian associated with optimization problem (7) as

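$$\begin{aligned}
\mathcal{L}(f_{\mathrm{y}_1^n|\mathrm{x}_1^n}) \triangleq{}& I(\mathrm{x}_1^n;\mathrm{y}_1^n)+\int\eta(x_1^n)\left(\int f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,dy_1^n-1\right)f_{\mathrm{x}_1^n}(x_1^n)\,dx_1^n && \text{(30)}\\
&+s\left(\iint f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,f_{\mathrm{x}_1^n}(x_1^n)\left(\sum_{i=1}^{n}\rho(x_i,y_i)\right)dx_1^n\,dy_1^n-D\right) && \text{(31)}\\
&+\sum_{i=1}^{n}\iint\left(\bar{\lambda}_i(x_1^i,y_1^i)-\lambda_i(x_1^n,y_1^i)\right)f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,f_{\mathrm{x}_1^n}(x_1^n)\,dy_1^n\,dx_1^n. && \text{(32)}
\end{aligned}$$

From the theory of Lagrangian optimization on vector spaces, $f_{\mathrm{y}_1^n|\mathrm{x}_1^n}$ is a solution to optimization problem (7) only if

$$\begin{aligned}
0 &=\left.\frac{d}{d\epsilon}\mathcal{L}(f^{\epsilon}_{\mathrm{y}_1^n|\mathrm{x}_1^n})\right|_{\epsilon=0} && \text{(33)}\\
&=\iint\left[\ln\left(\frac{f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)}{f_{\mathrm{y}_1^n}(y_1^n)}\right)+\eta(x_1^n)+\sum_{i=1}^{n}\left(s\rho(x_i,y_i)+\bar{\lambda}_i(x_1^i,y_1^i)-\lambda_i(x_1^n,y_1^i)\right)\right]g_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)\,f_{\mathrm{x}_1^n}(x_1^n)\,dy_1^n\,dx_1^n && \text{(34)}
\end{aligned}$$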

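for every function $g_{\mathrm{y}_1^n|\mathrm{x}_1^n}$ as defined in (8), i.e., for every conditional PDF $f'_{\mathrm{y}_1^n|\mathrm{x}_1^n}$. This holds if and only if for every $(x_1^n,y_1^n)$:

$$\begin{aligned}
\ln\left(\frac{f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)}{f_{\mathrm{y}_1^n}(y_1^n)}\right) &=-\eta(x_1^n)-\sum_{i=1}^{n}\left(s\rho(x_i,y_i)+\bar{\lambda}_i(x_1^i,y_1^i)-\lambda_i(x_1^n,y_1^i)\right) && \text{(35)}\\
\Longleftrightarrow\quad f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n) &=e^{-\eta(x_1^n)-\sum_{i=1}^{n}\left(s\rho(x_i,y_i)+\bar{\lambda}_i(x_1^i,y_1^i)-\lambda_i(x_1^n,y_1^i)\right)}\,f_{\mathrm{y}_1^n}(y_1^n). && \text{(36)}
\end{aligned}$$

The Lagrange multiplier function $\eta$ must enforce the constraint (7b). Hence,

$$f_{\mathrm{y}_1^n|\mathrm{x}_1^n}(y_1^n|x_1^n)=\frac{e^{-\sum_{i=1}^{n}\left(s\rho(x_i,y_i)+\bar{\lambda}_i(x_1^i,y_1^i)-\lambda_i(x_1^n,y_1^i)\right)}\,f_{\mathrm{y}_1^n}(y_1^n)}{K_1(x_1^n)}, \tag{37}$$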

where

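$$K_1(x_1^n)\triangleq\int e^{-\sum_{i=1}^{n}\left(s\rho(x_i,y_i)+\bar{\lambda}_i(x_1^i,y_1^i)-\lambda_i(x_1^n,y_1^i)\right)}\,f_{\mathrm{y}_1^n}(y_1^n)\,dy_1^n. \tag{38}$$

Marginalizing over $y_{k+1}^n$ we obtain

$$f_{\mathrm{y}_1^k|\mathrm{x}_1^n}(y_1^k|x_1^n)=\frac{e^{-\sum_{i=1}^{k}\left(s\rho(x_i,y_i)+\bar{\lambda}_i(x_1^i,y_1^i)-\lambda_i(x_1^n,y_1^i)\right)}\displaystyle\int e^{-\sum_{i=k+1}^{n}\left(s\rho(x_i,y_i)+\bar{\lambda}_i(x_1^i,y_1^i)-\lambda_i(x_1^n,y_1^i)\right)}\,f_{\mathrm{y}_1^n}(y_1^n)\,dy_{k+1}^n}{K_1(x_1^n)}. \tag{39}$$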

Using Bayes’ rule we can write

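$$\begin{aligned}
f_{\mathrm{y}_k|\mathrm{x}_1^n,\mathrm{y}_1^{k-1}}(y_k|x_1^n,y_1^{k-1}) &=\frac{f_{\mathrm{y}_1^k|\mathrm{x}_1^n}(y_1^k|x_1^n)}{f_{\mathrm{y}_1^{k-1}|\mathrm{x}_1^n}(y_1^{k-1}|x_1^n)} && \text{(40)}\\
&=\frac{e^{-s\rho(x_k,y_k)}\,F_k(x_1^n,y_1^k)}{\int e^{-s\rho(x_k,y_k)}\,F_k(x_1^n,y_1^k)\,dy_k}, && \text{(41)}
\end{aligned}$$

where

$$F_k(x_1^n,y_1^k)\triangleq e^{-\left(\bar{\lambda}_k(x_1^k,y_1^k)-\lambda_k(x_1^n,y_1^k)\right)}\int e^{-\sum_{i=k+1}^{n}\left(s\rho(x_i,y_i)+\bar{\lambda}_i(x_1^i,y_1^i)-\lambda_i(x_1^n,y_1^i)\right)}\,f_{\mathrm{y}_1^n}(y_1^n)\,dy_{k+1}^n. \tag{42}$$

These functions can be written recursively as

$$\begin{aligned}
F_n(y_1^n) &=f_{\mathrm{y}_1^n}(y_1^n) && \text{(43a)}\\
F_k(x_1^n,y_1^k) &=e^{-\left(\bar{\lambda}_k(x_1^k,y_1^k)-\lambda_k(x_1^n,y_1^k)\right)}\int e^{-s\rho(x_{k+1},y_{k+1})}\,F_{k+1}(x_1^n,y_1^{k+1})\,dy_{k+1}. && \text{(43b)}
\end{aligned}$$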
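The factorization (41) together with the recursion (43) can be verified mechanically on a small discrete instance. The sketch below is a toy check under explicit assumptions: $n=2$, binary alphabets, Hamming distortion, an arbitrary value of $s$, a random positive weight standing in for $f_{\mathrm{y}_1^n}$, and a random table `d1` standing in for $\bar{\lambda}_1-\lambda_1$ (the corresponding term for $i=n=2$ is taken as zero, in line with (43a)). None of these choices come from the source; they only exercise the algebra.

```python
import itertools
import math
import random

rng = random.Random(2)
B = (0, 1)
PAIRS = list(itertools.product(B, B))
s = 0.7  # hypothetical multiplier of the distortion constraint

def rho(x, y):
    """Hamming distortion, an illustrative choice."""
    return 0.0 if x == y else 1.0

# Arbitrary positive weight standing in for f_y(y1, y2).
w = {y: rng.uniform(0.1, 1.0) for y in PAIRS}
tot = sum(w.values())
fy = {y: v / tot for y, v in w.items()}

# d1 stands in for lambda_bar_1 - lambda_1 as a function of (x1, x2, y1).
d1 = {(x1, x2, y1): rng.uniform(-1.0, 1.0) for x1 in B for x2 in B for y1 in B}

def joint_conditional(x):
    """f(y1, y2 | x1, x2) built as in (37)-(38)."""
    x1, x2 = x
    un = {
        (y1, y2): math.exp(-s * rho(x1, y1) - d1[(x1, x2, y1)] - s * rho(x2, y2))
        * fy[(y1, y2)]
        for y1 in B for y2 in B
    }
    k1 = sum(un.values())
    return {y: v / k1 for y, v in un.items()}

def F2(y1, y2):
    """Terminal condition (43a)."""
    return fy[(y1, y2)]

def F1(x, y1):
    """Recursion step (43b) with k = 1."""
    x1, x2 = x
    inner = sum(math.exp(-s * rho(x2, y2)) * F2(y1, y2) for y2 in B)
    return math.exp(-d1[(x1, x2, y1)]) * inner

max_err = 0.0
for x in PAIRS:
    x1, x2 = x
    f = joint_conditional(x)
    z1 = sum(math.exp(-s * rho(x1, y)) * F1(x, y) for y in B)
    for y1 in B:
        bayes1 = sum(f[(y1, y2)] for y2 in B)               # f(y1 | x) by marginalising
        step1 = math.exp(-s * rho(x1, y1)) * F1(x, y1) / z1  # (41) with k = 1
        max_err = max(max_err, abs(bayes1 - step1))
        z2 = sum(math.exp(-s * rho(x2, y)) * F2(y1, y) for y in B)
        for y2 in B:
            bayes2 = f[(y1, y2)] / bayes1                        # f(y2 | x, y1) by Bayes' rule
            step2 = math.exp(-s * rho(x2, y2)) * F2(y1, y2) / z2  # (41) with k = 2
            max_err = max(max_err, abs(bayes2 - step2))
print(max_err)
```

In this toy the per-step conditionals obtained from Bayes' rule and from (41) with the recursively defined $F_k$ agree up to rounding. Note that the $k=1$ conditional still depends on $x_2$ through `d1`, which is exactly the dependence that the choice of multipliers has to remove.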

In order to attain causality in (41), the functions $F_k$ must depend only on $x_1^k$ and $y_1^k$. Since, for each $k$, the function $F_k$ does not depend on terms $\lambda_i$ with $i<k$, the causality constraint is met if and only if we choose $\lambda_k$ in (43b) such that, for each $k$,

$$F_k(x_1^n,y_1^k)=e^{-\left(\bar{\lambda}_k(x_1^k,y_1^k)-\lambda_k(x_1^n,y_1^k)\right)}\int e^{-s\rho(x_{k+1},y_{k+1})}\,F_{k+1}(x_1^n,y_1^{k+1})\,dy_{k+1}=\breve{F}_k(x_1^k,y_1^k) \tag{44}$$

for some function $\breve{F}_k$.

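For $k=n$, the causality constraint is satisfied automatically since $F_n(y_1^n)=f_{\mathrm{y}_1^n}(y_1^n)$ (see (43a)). (This reflects the fact that there is no need to enforce the causality constraint for $k=n$, since there are no source samples for times beyond $n$.) Suppose now that (44) (i.e., causality) is satisfied for $k+1$, for some $k\in\{1,\ldots,n-1\}$. In such a case, one can replace $F_{k+1}(x_1^n,y_1^{k+1})$ in (44) by $\breve{F}_{k+1}(x_1^{k+1},y_1^{k+1})$ and, defining

$$K_{k+1}(x_1^{k+1},y_1^k)\triangleq\int e^{-s\rho(x_{k+1},y_{k+1})}\,\breve{F}_{k+1}(x_1^{k+1},y_1^{k+1})\,dy_{k+1},$$

write (44) as

$$\bar{\lambda}_k(x_1^k,y_1^k)-\lambda_k(x_1^n,y_1^k)=\ln K_{k+1}(x_1^n,y_1^k)-\ln\breve{F}_k(x_1^k,y_1^k). \tag{45}$$

Multiplying both sides by $f_{\mathrm{x}_{k+1}^n|\mathrm{x}_1^k}(x_{k+1}^n|x_1^k)$ and integrating over $x_{k+1}^n$ we obtain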

 0 =∫(¯λk(xk1,yk1)−λk(xn1,yk1))f\rvaxnk+1|\rvaxk1(xnk+1|xk1)dxnk+1 (46) =∫(lnKk+1(xn1,yk1)−ln˘Fk(xk1,yk1))f\rvaxnk+1|\rvaxk1(xn