I Introduction
The information RDF (IRDF) for a given one-sided random source process can be defined as the infimum of the mutual information rate [gray11, Section 8.2]
(1) 
between source and reconstruction such that a given fidelity criterion does not exceed a distortion value [berger71, covtho06, gray11]. If one adds to this definition the restriction that the decoder output can depend only causally upon the source, one obtains what is known as the causal [neugil82, derost12], nonanticipative [gorpin73, pingor87, chasta14], or sequential IRDF [tatiko03, tatsah04, tankim15]. All of these notions are equivalent and will be denoted as , defined in terms of the mutual information [gray11, covtho06] as
(2) 
where the infimum is taken over all joint distributions of
given such that the causality Markov chains (which will be referred to as the
short causality constraint)
(3) 
hold, and which yield a distortion not greater than , for some fidelity criterion. Notice that, if one is instead given a two-sided random source process , and one is interested only in encoding and reconstructing the samples , then the causality constraints may be stated as
(4) 
as done in [gorpin73, pingor87, derost12]. This notion of causality will be referred to as the long causality constraint.
The motivation for considering in this work one-sided instead of two-sided sequences (and thus (3) instead of (4)) arises from the aim of building encoder-decoder systems which operate with zero delay (the same motivation behind the causality constraint). To see this, notice that the causality constraint (4) for two-sided sources corresponds to the situation in which source samples in the infinite past exist and are available to the encoder. This may require an infinite delay before actually beginning to encode and decode. By contrast, the causality constraint (3) describes the case in which the source is a one-sided process and depends only upon (as in [chasta14, woolin17]).
Remark 1.
It is important to highlight at this point that, even though the causality condition (3) can also be applied to a two-sided source process , it would not ensure causality in that case. To see why, consider the situation in which is a binary i.i.d. source where each takes the values or
with equal probability. Suppose
is built as , where denotes the exclusive-OR operator. It is easy to see that satisfies (3), even though depends non-causally on . The above observation reveals that, if the source is two-sided but only the samples are encoded and the decoded process is one-sided (), then one needs to impose instead the (more general) causality constraint
(5) 
which implies (3). Besides causality, these Markov chains guarantee that, even if the source is a two-sided process, its encoding and reconstruction proceed as if it were a one-sided process.
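The counterexample in Remark 1 can be verified by exact enumeration. The following sketch is a minimal rendering under stated assumptions: a binary {0,1} alphabet with the XOR operator, and the case k = 2; both the short-causality Markov chain and the non-causal dependence on the time-zero sample are computed as exact conditional mutual informations.

```python
from itertools import product
from collections import defaultdict
from math import log2

def cond_mi(pmf, A, B, C):
    """Exact conditional mutual information I(A;B|C) in bits.
    pmf maps outcome tuples to probabilities; A, B, C extract sub-tuples."""
    pabc, pac, pbc, pc = (defaultdict(float) for _ in range(4))
    for w, p in pmf.items():
        pabc[(A(w), B(w), C(w))] += p
        pac[(A(w), C(w))] += p
        pbc[(B(w), C(w))] += p
        pc[C(w)] += p
    return sum(p * log2(p * pc[c] / (pac[(a, c)] * pbc[(b, c)]))
               for (a, b, c), p in pabc.items() if p > 0)

# Two-sided binary i.i.d. source: outcome w = (x0, x1, x2, x3), each bit fair.
pmf = {w: 1.0 / 16 for w in product((0, 1), repeat=4)}
y = lambda w, k: w[k] ^ w[0]          # reconstruction y(k) = x(k) XOR x(0)

# Short causality chain for k = 2:  x(3) -- (x(1), x(2)) -- (y(1), y(2)).
i_causal = cond_mi(pmf,
                   lambda w: w[3],
                   lambda w: (y(w, 1), y(w, 2)),
                   lambda w: (w[1], w[2]))       # zero: the chain holds

# ...yet, given x(1), the output y(1) still reveals the past sample x(0).
i_noncausal = cond_mi(pmf,
                      lambda w: w[0],
                      lambda w: y(w, 1),
                      lambda w: w[1])            # one full bit
```

The first quantity is exactly zero, so the reconstruction satisfies the short causality constraint; the second equals one bit, confirming that (3) alone does not rule out dependence on the two-sided past.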
As we shall see in Sections III and IV, this situation, in which at time the encoder can take only as input, entails significant challenges due to the unavoidable need to deal with transient phenomena.
The operational significance of stems from its relation to the causal operational RDF (ORDF), denoted as . The latter is defined as the infimum of the average data rates achievable by a sequence of causal encoder-decoder functions [neugil82, derost12] yielding a distortion not greater than . Characterizing is important because every zero-delay source code (suitable for applications such as low-delay streaming [chenwu15] or networked control [naifag07, silder16]) must be causal.
An IRDF is said to be achievable if it equals the ORDF under the same constraints [berger71, covtho06]. As far as the authors are aware, the achievability of has not yet been demonstrated for any source and distortion measure, and thus the gap between and is unknown in general. However, it is known that [derost12, Section II]
(6) 
and for Gaussian sources it is possible to construct causal codes with an operational data rate exceeding by less than (approximately) 0.254 bits/sample (1.254 bits/sample for zero-delay codes), once the statistics which realize the latter are known [derost12]. This underlines the importance of studying the causal IRDF .
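As a quick sanity check on the figures just quoted (assuming, as in [derost12], that the 0.254 bits/sample come from the space-filling loss (1/2)log2(2πe/12) of a scalar ECSDUQ, and that zero-delay operation costs at most one extra bit per sample), one can reproduce them directly:

```python
from math import pi, e, log2

# Space-filling (divergence) loss of an entropy-coded subtractively dithered
# uniform quantizer: one half of log2(2*pi*e/12) bits per sample.
causal_gap = 0.5 * log2(2 * pi * e / 12)

# Zero-delay operation is assumed to add at most one extra bit per sample.
zero_delay_gap = causal_gap + 1.0
```

Evaluating the constant gives roughly 0.2546, matching the approximate 0.254 bits/sample (and 1.254 bits/sample in the zero-delay case) cited above.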
To the best of the authors’ knowledge, no closed-form expressions are known for , except when considering mean-squared-error (MSE) distortion and for Gaussian i.i.d. or Gaussian first-order autoregressive (AR) sources, either scalar [derost12, Section IV] or vector-valued [staost17].^{1}

^{1}Although for i.i.d. sources and for a single-letter distortion criterion a realization of the (non-causal) RDF satisfies causality [covtho06, berger71], the formulas available in the literature for expressing it require iterative numerical procedures and cannot be regarded as “closed-form”, except for the Gaussian case with MSE distortion.

However, various structural properties of the causal IRDF have been found in the literature when admits (or is assumed to admit) a stationary realization. Indeed, the stationarity of the realizations of the causal IRDF has played a crucial role in simplifying the computation of for Gaussian first-order Markovian sources and MSE distortion in [tanaka15]. It has also been a key implicit assumption in [tatsah04], and an explicit assumption in works such as [chasta14] and [derost12]. In particular, for a stationary two-sided random source , [derost12, Definition 6] introduced the stationary causal IRDF
(7) 
where the infimum is taken over all distributions of given which yield a one-sided reconstruction process jointly stationary with , satisfying (4) and an asymptotic average MSE distortion constraint on . For the case of a Gaussian source, it was shown in [derost12] that an operational data rate exceeding by less than bits/sample is achievable using an entropy-coded subtractively dithered uniform quantizer (ECSDUQ) surrounded by linear time-invariant (LTI) filters operating in steady state. These examples illustrate the relevance of determining whether (or in which cases) the causal IRDF admits a stationary realization.
To the best of our knowledge, the only work which has given an answer to this question in a general framework is [gorpin73]. Under a set of assumptions (discussed in Section II below), it is shown in [gorpin73, Theorem 4] that the search for the causal IRDF for a large class of two-sided sources and distortion criteria can be restricted to reconstructions which are jointly stationary with the source. Unfortunately, as we show in Section II-B, the assumptions on the fidelity criteria utilized in [gorpin73] leave out some common distortions (such as the family of asymptotic average single-letter fidelity criteria), and the statement of [gorpin73, Theorem 4] contains an assumption whose validity remains to be proved. More importantly, the entire analysis of [gorpin73] is built for two-sided processes (using the causality constraint (4)), which raises the question of whether its results could apply to one-sided processes as well, under the causality constraint (3).
In this paper we give an answer to these questions and use the results to prove some novel properties of the causal IRDF associated with the stationarity of its realizations. Specifically, our main contributions are the following:

We show in Theorem 2 that if a pair of one-sided random processes is jointly stationary, with the latter depending causally on the former according to (5) (but otherwise arbitrarily distributed), then it must also satisfy the Markov chains
(8) 
which is a fairly restrictive condition. In particular, as we show in Theorem 3, if are jointly Gaussian and depends causally upon , then joint stationarity implies that is an i.i.d. or first-order Markovian process. This stands in stark contrast with what was shown in [gorpin73] for two-sided stationary processes and constitutes a counterexample to what is stated in [stakou15, Theorem III.6].

Despite the above, we show in Theorem 4 that, for any th-order Markovian one-sided stationary source and a large class of distortion constraints, the search for the causal IRDF (as defined in (2)) can be restricted to output sequences causally related to the source and jointly stationary with it after samples, and such that . We refer to such pairs of processes as being quasi-jointly stationary (QJS) (this notion is formally introduced in Definition 2 below). A consequence of this result is that, for any th-order two-sided Markovian stationary source , equals for the corresponding one-sided stationary source . The relevance of this finding is that, for Gaussian stationary sources and asymptotic MSE distortion, an operational data rate exceeding (and thus ) by less than approximately bits/sample, when operating causally, and bits/sample, in zero-delay operation, is achievable by using a scalar ECSDUQ as in [derost12].
The remainder of this paper begins with Section II, in which the assumptions leading to [gorpin73, Theorem 4] are revisited and the limitations of that theorem are discussed. In Section III we prove that, in general, it is not possible to have two one-sided processes which are jointly stationary and, at the same time, satisfy the causality constraint (3). Section IV presents our main theorem (Theorem 4), which shows that the search for the causal IRDF for one-sided th-order Markovian stationary sources can be restricted to QJS processes. Finally, Section LABEL:sec:Conclusions draws the main conclusions of this work. All proofs are presented in Section LABEL:sec:appendix (the Appendix), which also contains some technical lemmas required by these proofs.
Notation
denotes the real numbers, denotes the integers, is the set of natural numbers (positive integers), and . For every , the ceiling operator yields the smallest integer not less than
. We use non-italic letters for scalar random variables, such as
. Random sequences are denoted as . For a random (one-sided) process we will sometimes use the shorthand notation whenever this meaning is clear from the context. When convenient, we write a random sequence , , as the column vector (the indices and are swapped so that the smallest index goes above the largest one, thus mimicking the usual index order in a column vector). The entry on the th row and th column of a matrix is denoted as , with being the submatrix of containing its rows to , . For a random element in a given alphabet (set) , we write to denote a sigma-algebra associated with and
to denote its probability distribution (or probability measure). We write
to describe the fact that has the same probability distribution as , and to state that and are independent. We write the condition in which two random elements are independent given a third random element using the Markov chain notation . If is a set of probability distributions, then denotes the set of all random elements whose probability distribution belongs to . The expectation operator is denoted as . We write as a shorthand for . The mutual information between two random elements is defined as [gray11, Lemma 7.14]
(9) 
where the supremum is taken over all quantizers and of and , respectively, and , and , are the joint and marginal distributions of and . If have joint and marginal probability density functions (PDFs) , and , respectively, then [covtho06]
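Since the supremum in (9) runs over finite quantizations, coarsening a quantizer can never increase the resulting mutual information (a data-processing argument). The toy computation below (with a hypothetical joint pmf chosen only for illustration) exhibits this monotonicity for a four-letter pair and a two-cell quantizer:

```python
from collections import defaultdict
from math import log2

def mi(pmf):
    """Exact I(X;Y) in bits for a pmf given as a dict (x, y) -> prob."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in pmf.items():
        px[x] += p
        py[y] += p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in pmf.items() if p > 0)

# x uniform on {0, 1, 2, 3}; y equals x with probability 0.9, else uniform.
pmf = {(x, y): 0.25 * (0.9 * (x == y) + 0.1 / 4)
       for x in range(4) for y in range(4)}

# Coarse quantizer q merging {0, 1} and {2, 3} into two cells.
q = lambda v: v // 2
pmf_q = defaultdict(float)
for (x, y), p in pmf.items():
    pmf_q[(q(x), q(y))] += p

i_full = mi(pmf)            # achieved by the identity (finest) quantizers
i_coarse = mi(dict(pmf_q))  # never exceeds i_full
```

For finite alphabets the supremum in (9) is attained by the identity quantizers, so `i_full` is the mutual information itself, while every coarser quantizer pair yields a smaller value.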
The conditional mutual information
is defined via the chain rule (cr) of mutual information
. The mutual information rate between two processes and is defined as in (1). The variance of a real-valued random variable
is denoted as . The autocorrelation function of a random process is denoted , , . The following properties of the mutual information, involving any random elements , will be utilized and referred to throughout this work:
P 1.
, with equality if and only if .
P 2.
, with equality if and only if .
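Properties P1 and P2 can be checked exactly on a small discrete example. The sketch below (hypothetical distributions, chosen only for illustration) builds a Markov chain in which a and b are independent noisy copies of a fair bit c, so the conditional mutual information in P2 vanishes while the unconditional one in P1 does not:

```python
from itertools import product
from collections import defaultdict
from math import log2

def cond_mi(pmf, A, B, C):
    """Exact I(A;B|C) in bits; pmf maps outcomes w to probabilities."""
    pabc, pac, pbc, pc = (defaultdict(float) for _ in range(4))
    for w, p in pmf.items():
        pabc[(A(w), B(w), C(w))] += p
        pac[(A(w), C(w))] += p
        pbc[(B(w), C(w))] += p
        pc[C(w)] += p
    return sum(p * log2(p * pc[c] / (pac[(a, c)] * pbc[(b, c)]))
               for (a, b, c), p in pabc.items() if p > 0)

# Markov chain a <-> c <-> b: c is a fair bit; a and b flip it independently
# with probability 0.1 each. Outcome w = (a, b, c).
pmf = defaultdict(float)
for c, n1, n2 in product((0, 1), repeat=3):
    p = 0.5 * (0.9, 0.1)[n1] * (0.9, 0.1)[n2]
    pmf[(c ^ n1, c ^ n2, c)] += p

i_ab_given_c = cond_mi(pmf, lambda w: w[0], lambda w: w[1], lambda w: w[2])
i_ab = cond_mi(pmf, lambda w: w[0], lambda w: w[1], lambda w: 0)
```

The second call conditions on a constant, which reduces the conditional mutual information to the plain mutual information between a and b; it is strictly positive (P1: the copies are dependent), while the first is zero, matching the equality condition in P2 for the Markov chain.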
We will also make use of the following fact:
Fact 1.
Let be three random elements with an arbitrary joint distribution. Then, there exists a random element (equivalently, a joint distribution ) such that
(10)  
(11) 
II Revisiting [gorpin73] and its Inapplicability to One-Sided Sources
In order to assess whether (or to what extent) [gorpin73, Theorem 4] could provide support to the stationarity assumptions made in, e.g., [tatsah04, dersil08, chasta14, stakou15], it is necessary to take a closer look at the assumptions made in [gorpin73] and at the statement of its Theorem 4. For that purpose, the first part of this section is an exposition of the definitions and assumptions leading to [gorpin73, Theorem 4].^{2} The second part is an analysis which reveals the limitations of [gorpin73, Theorem 4] and its inapplicability to the case in which the source and reconstruction are one-sided processes. At the same time, this section also introduces definitions and part of the notation to be utilized in the remainder of this paper (for convenience, a summary of these is presented in Table I below).

^{2}We believe this re-exposition of [gorpin73] to be valuable in itself since, on the one hand, it selects the minimal set of notions required to formulate and understand its Theorem 4 and, on the other hand, it provides an arguably clearer presentation than the one found in [gorpin73] (an English translation from Russian), which is not easy to read due to its notation, some mathematical typos, and the low resolution of its available digitized form.
II-A A Brief Review of [gorpin73]
Throughout [gorpin73], the search in the infimizations associated with various types of “nonanticipatory” (i.e., causal) rate-distortion functions is carried out over sets of joint probability distributions between source and reconstruction (as opposed to the usual definitions, in which the search is over conditional distributions; see (2) and [covtho06, Chapter 10], [berger71]). Since the distribution of the source is given, it is required that, for every , all the joint distributions considered yield having the same (given) distribution of the source for the corresponding block, say . This requirement can be formalized as requiring that , for a set of admissible joint distributions defined as
(12) 
where and are, respectively, the alphabets to which and belong. In [gorpin73], this admissibility requirement is embedded in the definition of the sets of distributions which meet the distortion constraint, described next.
The fidelity criterion for every pair of integers (the analysis in [gorpin73] considered both discrete- and continuous-time processes, but here we refer only to the discrete-time scenario) is expressed in [gorpin73] as requiring to belong to a non-empty set of distributions (hereafter referred to as a distortion-feasible set) , a condition written as . In this definition, the number represents an admissible distortion level. Notice that such a general formulation of a fidelity criterion does not need a distortion function and does not necessarily involve an expectation.
As mentioned above, the admissibility requirement is embedded in the distortion-feasible sets in [gorpin73, eqn. (2.1)]. The latter equation can be written as
(13) 
In [gorpin73, eqs. (2.4) and (2.5)], the distortion-feasible sets are assumed to satisfy the “concatenation” condition
(14) 
With this, [gorpin73, eqn. (2.9)] defines the “nonanticipatory epsilon entropy” of the set of distributions (the actual term employed in [gorpin73] is “nonanticipatory epsilon entropy of the message ”, where the term “message” refers to the random ensembles in ) as
(15) 
where the infimum is taken over all pairs of random sequences such that the causality Markov chains
(16) 
are satisfied. Then [gorpin73, eq. (2.13)] defines the “nonanticipatory message generation rate” as
(17) 
(when the limit exists). An alternative “nonanticipatory message generation rate” is also considered in [gorpin73] by defining the set of distortion-admissible process distributions as follows:
Definition 1.
The set consists of all two-sided random process pairs for which there exist integers such that and
(18) 
With this, [gorpin73, eq. (2.12)] defines
(19) 
(when the limit exists), where the infimum is taken over all pairs of processes satisfying the causality Markov chains
(20) 
Notice that these Markov chains imply (4) and differ from the latter in that here the reconstruction is a two-sided random process.
Now assume that and , for all , for some alphabets and . Define, for any given nonnegative sequence , such that , the distribution
(21) 
We can now restate Theorem 4 in [gorpin73] as follows:
Theorem 1 (Theorem 4 in [gorpin73]).
Suppose that

is stationary.

Stationary distortion-feasible sets: For every , , and are identical sets (in [gorpin73], this condition together with the stationarity of is referred to as “a stationary source”; see its description between (2.8) and (2.9) in [gorpin73]).

The concatenation condition (14) holds.

.

For every set of non-negative numbers , such that ,
(22) 
where the processes are distributed according to (21).
Then, the analysis of the lower bound in (19) can be confined to jointly stationary pairs of random processes satisfying the causality constraint (20).
For convenience, Table I presents a summary of the definitions and notation described so far, together with some which will be defined in the following sections.
The joint probability distribution of  

The set of all joint distributions such that the associated marginal distribution equals the given distribution of the source sequence , i.e., (see (12)).  
Distortion-feasible set. The set of all joint distributions which satisfy a given constraint given by (see the comments before (12)).  
The set of all pairs of sequences such that . (See also the Notation subsection at the end of Section I.)  
Generic distortion-feasible set of probability distributions for pairs of one-sided processes . In this paper, we state some minimal conditions on in Assumption 1 and some additional structural properties in Assumption 2.  
,  The set of all joint distributions of pairs of one-sided random processes such that are jointly stationary and (see Definition 2). 
and  The sets of causally related one-sided pairs of sequences (see Definition 3). 
The set of one-sided pairs of processes causally related according to the short causality constraint (3) (see Definition 3).  
The set of causal distributions for processes of the form . Such processes satisfy the long causality constraint (4) (see Definition LABEL:def:Overline_RCitd_redef). 
II-B Analysis of Theorem 1 and its Inapplicability to One-Sided Sources
We now discuss three limitations of Theorem 1 which are relevant when trying to establish whether the causal IRDF of a one-sided stationary source admits a stationary realization.
Limitation 1
The first obvious limitation is that, even if source and reconstruction are two-sided processes, any distortion criterion which considers only their “positive-time” part cannot be expressed by a distortion-feasible set given by Definition 1 if the sets satisfy condition ii) in Theorem 1. To see this, notice that if , then such a distortion criterion (which neglects non-positive times) would require to admit all joint probability distributions satisfying (13). Combining this with condition ii) in Theorem 1 yields that every set with , which amounts to imposing no restriction on the distortion at all.
It is natural to think that such an elemental shortcoming could be avoided by simply replacing condition ii) in Theorem 1 with a one-sided version of the form:
For every , such that : and are identical sets.  (23) 
Leaving aside the fact that this alternative condition is not sufficient for Theorem 1 to hold, it is worth pointing out that, using (23), the commonly utilized family of asymptotic single-letter fidelity criteria [berger71] cannot be expressed by a distortion-feasible set given by Definition 1, as the following lemma shows (its proof can be found in Appendix LABEL:proof:of_lem_Wsp_D_cannot_represent_asymptotic_SLDC).
Lemma 1.
Let be any given distortion functional which takes as argument a joint distribution and yields a non-negative real value. Let be the set of all pairs of processes where is stationary, with pairwise distributions which satisfy the asymptotic single-letter fidelity criterion
(24) 
Then there does not exist an infinite collection of distortion-feasible sets satisfying (23) such that the associated set given by Definition 1 satisfies .
Limitation 2
The second limitation associated with Theorem 1 is that its application requires one to prove its condition iv), i.e., the unproven supposition that holds. The only work we are aware of which builds upon Theorem 1 is [stakou15] and, accordingly, [stakou15] provides [stakou15, Theorem III.5], which states that a similar equality holds. Unfortunately, as shown in [derpic17], the proof of [stakou15, Theorem III.5] is flawed.
Limitation 3
The third limitation of Theorem 1 regarding its applicability to one-sided sources is the fact that the entire framework built in [gorpin73] is stated for two-sided processes (and, crucially, for the corresponding causality restriction given by the Markov chains (20)). This difference cannot simply be neglected while expecting Theorem 1 to remain valid. Indeed, as we show in the next section (Theorem 2), a pair of random processes can be jointly stationary and at the same time satisfy the causality Markov chains (3) only if is independent of when is given. Moreover, we prove that joint stationarity and causality are incompatible when the source is a th-order Markovian Gaussian one-sided process with .
III Conditions for Joint Stationarity and Causality to Hold Together
In this section we address the question of whether there exists a one-sided reconstruction process which is jointly stationary with a source and also satisfies the causality constraint (3).
Each source random sample belongs to some given set (source alphabet) and is allowed to have an arbitrary distribution. Recall that a random process , where is the reconstruction alphabet and , is said to be jointly stationary with if and only if, for every , the distribution of does not depend on , for .
The next theorem shows that, for such one-sided processes, joint stationarity and causality may hold together only if is independent of when is given.
Theorem 2.
If and are jointly stationary and is causally related to according to (3), then
(25) 
Proof.
To illustrate how restrictive condition (25) is, the next theorem shows that, for a Gaussian th-order Markovian stationary source , causality and joint stationarity are possible together only if is i.i.d. () or first-order Markovian (). Recall that a random (vector- or scalar-valued) process is th-order Markovian if is the smallest non-negative integer such that
(27) 
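As a concrete instance of this definition, the sketch below (assuming a stable AR(2) model with illustrative coefficients a1 = 0.5, a2 = -0.3) verifies numerically that a Gaussian AR(2) process is second-order Markovian: the best linear predictor of a sample given three past samples puts zero weight on the third lag, and for jointly Gaussian variables the best linear predictor coincides with the conditional mean.

```python
import numpy as np

# Stable AR(2) source: x(k) = a1*x(k-1) + a2*x(k-2) + w(k), with white w(k).
a1, a2 = 0.5, -0.3

# Normalized autocorrelations rho(k) from the Yule-Walker recursion.
rho = [1.0, a1 / (1.0 - a2)]
for k in range(2, 4):
    rho.append(a1 * rho[k - 1] + a2 * rho[k - 2])

# Normal equations for predicting x(4) from (x(3), x(2), x(1)).
R = np.array([[rho[abs(i - j)] for j in range(3)] for i in range(3)])
r = np.array([rho[1], rho[2], rho[3]])
beta = np.linalg.solve(R, r)  # optimal linear predictor coefficients

# beta recovers (a1, a2, 0): the third lag carries no extra information,
# i.e., the process is (exactly) second-order Markovian.
```

The same computation with a longer past would keep producing zeros beyond lag 2, in agreement with (27).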
Theorem 3.
Suppose is a zero-mean Gaussian stationary process, and assume that, for some , are jointly Gaussian and jointly stationary, with being causally related to according to (3). Then is th-order Markovian with .
Proof.
Since and are jointly Gaussian and the latter depends causally upon the former, it holds that
(28) 
for some lower triangular matrix having entries . On the other hand, the fact that and are jointly stationary implies that and are Toeplitz matrices. From (28), considering the entries on the first and second rows of and defining
this Toeplitz condition implies that
Therefore, , which for a Gaussian stationary sequence implies that , . For Gaussian random variables the latter is equivalent to the Markov chains , which define a first-order Markovian process (if ) or an i.i.d. process (if ). This completes the proof. ∎
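The last step of the proof uses the standard Gaussian fact that a geometric autocorrelation is equivalent to first-order Markovianity. A small numerical illustration (the value 0.8 is an arbitrary choice): with autocorrelation 0.8^|k|, the partial correlation between the first and third samples given the second vanishes, which for jointly Gaussian variables is exactly the stated Markov chain.

```python
import numpy as np

# Covariance of three consecutive samples of a unit-variance stationary
# Gaussian process with geometric autocorrelation rho(k) = 0.8**|k|.
rho = 0.8
K = np.array([[rho ** abs(i - j) for j in range(3)] for i in range(3)])

# Conditional covariance of the (first, third) samples given the second:
# the Schur complement of the middle entry.
idx = [0, 2]
S = K[np.ix_(idx, idx)] - np.outer(K[idx, 1], K[1, idx]) / K[1, 1]

# Zero partial correlation <=> first -- second -- third Markov chain
# (an equivalence specific to the jointly Gaussian case).
partial_corr = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])
```

Any autocorrelation that is not geometric would leave a non-zero partial correlation here, so no Gaussian process of Markov order two or higher can pass this test.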
IV The Set of Quasi-Jointly Stationary Realizations is Sufficient
In this section we show that, for any th-order Markovian one-sided stationary source , the search for the causal IRDF (as defined in (2) and for a large class of distortion criteria) can be restricted to output sequences causally related to the source, jointly stationary with it after samples, and such that . We refer to such pairs of processes as being quasi-jointly stationary (QJS), and define the set which contains them as follows:
Definition 2 (Set of quasi-jointly stationary processes).
The set of QJS distributions is composed of all joint distributions of pairs of one-sided random processes which satisfy
are jointly stationary  
Notice that corresponds to the set of joint distributions associated with all jointly stationary one-sided process pairs.
As in [gorpin73], we write when the distribution of belongs to the distortion-feasible set , defined as in (13).
One can define a distortion-feasible set for pairs of one-sided processes , say , from the finite-length distortion-feasible sets in more than one manner. A minimal condition we shall require of such a definition is the following.
Assumption 1.
The distortion-feasible set of distributions for pairs of one-sided processes satisfies the following:

If , then has the given probability distribution of the source process, say . That is, (see (12)).

If is any given pair of one-sided processes, and there exists an infinite collection of increasing integers such that, for all , , then .

For any pair of sequences , , and if , then the concatenated processes , satisfy .
Notice that, if satisfies this assumption and the integers in Definition 1 were restricted to be positive, then we would have (see Definition 1). However, the one-way implications in Assumption 1 allow to be larger than .
We now define the sets of causally related pairs of sequences and processes.
Definition 3 (Set of Causal Distributions).
Define