In the literature, various measures are available to quantify the strength of dependence between two random variables, including the Pearson correlation coefficient, the correlation ratio, and the maximal correlation coefficient. The Pearson correlation coefficient is a well-known measure that quantifies the linear dependence between two real-valued random variables. For real-valued random variables $X$ and $Y$, it is defined as
$$\rho(X,Y) := \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$
The correlation ratio was introduced by Pearson (see e.g. ), and studied by Rényi [15, 14]. For a real-valued random variable $Y$ and an arbitrary random variable $X$, the correlation ratio of $Y$ on $X$ is defined by
$$\theta(Y|X) := \sup_{f} \rho\bigl(Y, f(X)\bigr),$$
where the supremum is taken over all Borel-measurable real-valued functions $f$ such that $0 < \mathrm{Var}(f(X)) < \infty$. It was shown that
$$\theta^2(Y|X) = \frac{\mathrm{Var}(\mathbb{E}[Y|X])}{\mathrm{Var}(Y)}.$$
Another related dependence measure is the Hirschfeld-Gebelein-Rényi maximal correlation (or simply maximal correlation), which measures the maximum possible (Pearson) correlation between square-integrable real-valued random variables generated by either of two random variables. For two arbitrary random variables $X$ and $Y$, the maximal correlation of $X$ and $Y$ is defined by
$$\rho_m(X,Y) := \sup_{f,g} \mathbb{E}\bigl[f(X)\, g(Y)\bigr],$$
where the supremum is taken over all Borel-measurable real-valued functions $f, g$ such that $\mathbb{E}[f(X)] = \mathbb{E}[g(Y)] = 0$ and $\mathbb{E}[f^2(X)] = \mathbb{E}[g^2(Y)] = 1$. This measure was first introduced by Hirschfeld  and Gebelein , then studied by Rényi , and recently it has been exploited to study some interesting problems in information theory, such as measuring non-local correlations , maximal correlation secrecy , and deriving a converse result for distributed communication . Furthermore, the maximal correlation also indicates the existence of Gács-Körner's or Wyner's common information [7, 17].
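For finitely supported pmfs, the maximal correlation just defined admits a closed-form computation: it equals the second largest singular value of the matrix with entries $p(x,y)/\sqrt{p(x)\,p(y)}$ (the unconditional version of the singular value characterization discussed later in this paper). A minimal numerical sketch, using an assumed toy pmf:

```python
import numpy as np

# Toy joint pmf of (X, Y): a doubly symmetric binary source (assumed example).
p = np.array([[0.4, 0.1],
              [0.1, 0.4]])
px = p.sum(axis=1)  # marginal pmf of X
py = p.sum(axis=0)  # marginal pmf of Y

# Matrix with entries p(x, y) / sqrt(p(x) p(y)): its largest singular value
# is always 1, and its second largest equals the maximal correlation.
Q = p / np.sqrt(np.outer(px, py))
s = np.linalg.svd(Q, compute_uv=False)
rho_m = s[1]
print(rho_m)  # 0.6 for this pmf
```

For this pmf, taking $f, g$ as $\pm 1$-valued functions already attains the value $0.6$, consistent with the supremum in the definition.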
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Let $(X, Y, Z): (\Omega, \mathcal{F}) \to (\mathbb{R}^3, \mathcal{B}(\mathbb{R}^3))$ be a real-valued random vector, where $\mathcal{B}(\mathbb{R}^3)$ denotes the Borel $\sigma$-algebra on $\mathbb{R}^3$. For a random variable (or random vector) $X$, we denote its probability measure (a.k.a. distribution) as $P_X$. If $X$ is discrete, then we use $p_X$ to denote the probability mass function (pmf). If $X$ is absolutely continuous (i.e., the distribution $P_X$ is absolutely continuous with respect to the Lebesgue measure), then we use $f_X$ to denote the probability density function (pdf).
In the following, we define several conditional correlations, including the conditional (Pearson) correlation, conditional correlation ratio, and conditional maximal correlation.
The conditional (Pearson) correlation¹ of $X$ and $Y$ given $Z$ is defined by
$$\rho(X,Y|Z) := \frac{\mathbb{E}[\mathrm{Cov}(X,Y|Z)]}{\sqrt{\mathbb{E}[\mathrm{Var}(X|Z)]\,\mathbb{E}[\mathrm{Var}(Y|Z)]}}.$$
¹Here $Z$ does not need to be real-valued, but for brevity we assume it is. Similarly, in the following, $X$ does not need to be real-valued in the definition of the conditional correlation ratio, and $X, Y$ do not need to be real-valued in the definition of the conditional maximal correlation.
The conditional correlation ratio of $Y$ on $X$ given $Z$ is defined by
$$\theta(Y|X;Z) := \sup_{f} \rho\bigl(Y, f(X,Z)\,\big|\,Z\bigr),$$
where the supremum is taken over all Borel-measurable real-valued functions $f$ such that $0 < \mathrm{Var}(f(X,Z)) < \infty$.
The conditional maximal correlation of $X$ and $Y$ given $Z$ is defined by
$$\rho_m(X,Y|Z) := \sup_{f,g} \mathbb{E}\bigl[f(X,Z)\, g(Y,Z)\bigr],$$
where the supremum is taken over all Borel-measurable real-valued functions $f, g$ such that $\mathbb{E}[f(X,Z)|Z] = \mathbb{E}[g(Y,Z)|Z] = 0$ a.s., $\mathbb{E}[f^2(X,Z)] = \mathbb{E}[g^2(Y,Z)] = 1$.
If $Z$ is degenerate, then these three conditional correlations reduce to their unconditional versions.
Note that $\rho(X,Y|Z) = \rho(Y,X|Z)$ and $\rho_m(X,Y|Z) = \rho_m(Y,X|Z)$, but in general $\theta(Y|X;Z) \neq \theta(X|Y;Z)$. That is, the conditional correlation and the conditional maximal correlation are symmetric, but the conditional correlation ratio is not.
By the definitions, it is easy to verify that
$$\rho_m(X,Y|Z) = \sup_{f} \theta\bigl(f(X,Z)\,\big|\,Y;Z\bigr),$$
where the supremum is taken over all Borel-measurable real-valued functions $f$ such that $0 < \mathrm{Var}(f(X,Z)) < \infty$.
Note that the unconditional versions of the correlation coefficient, correlation ratio, and maximal correlation have been well studied in the literature; see [15, 14]. The conditional version of the maximal correlation was first introduced by Ardestanizadeh et al. . Later, Beigi and Gohari  used it to study the problem of non-local correlations. In this paper, we study these conditional correlations, especially the conditional maximal correlation, and derive some useful properties. Furthermore, to state our results clearly, we also need to define event conditional correlations as follows.
Given an event $A$, denote the conditional distribution of $(X,Y)$ given $A$ as $P_{XY|A}$. Assume $(X', Y')$ is a pair of random variables satisfying $P_{X'Y'} = P_{XY|A}$. Then we define $\rho(X,Y|A) := \rho(X',Y')$, $\theta(Y|X;A) := \theta(Y'|X')$, and $\rho_m(X,Y|A) := \rho_m(X',Y')$ as the event conditional correlations of $X$ and $Y$ given $A$, where the right-hand sides denote the corresponding unconditional correlations of $X'$ and $Y'$.
Obviously, event conditional correlations are special cases of the corresponding conditional correlations. Moreover, if the distribution of $(X',Y')$ is the same as the conditional distribution of $(X,Y)$ given $A$, then the unconditional correlations of $(X',Y')$ respectively equal the corresponding event conditional correlations of $(X,Y)$ given $A$, i.e., $\rho(X',Y') = \rho(X,Y|A)$, $\theta(Y'|X') = \theta(Y|X;A)$, and $\rho_m(X',Y') = \rho_m(X,Y|A)$. If the distribution of $(X,Y,Z)$ satisfies $P_Z(\{z\}) = 1$ for some $z$, then the conditional correlations of $(X,Y)$ given $Z$ respectively equal the corresponding event conditional correlations of $(X,Y)$ given $\{Z = z\}$, i.e., $\rho(X,Y|Z) = \rho(X,Y|\{Z=z\})$, $\theta(Y|X;Z) = \theta(Y|X;\{Z=z\})$, and $\rho_m(X,Y|Z) = \rho_m(X,Y|\{Z=z\})$.
3.1 Basic Properties: Other Characterizations, Continuity, and Concavity
In this subsection, we provide other characterizations for the conditional correlation ratio and conditional maximal correlation, and then study continuity (or discontinuity) and concavity properties of the conditional maximal correlation. First by the definitions, we have the following basic properties.
For any random variables $X, Y, Z$, the following inequalities hold:
$$|\rho(X,Y|Z)| \le \theta(Y|X;Z) \le \rho_m(X,Y|Z) \le 1.$$
Next we characterize the conditional correlation ratio and conditional maximal correlation by ratios of variances.
(Characterization by the ratio of variances). For any random variables $X, Y, Z$, the following properties hold.
1) $\theta^2(Y|X;Z) = \dfrac{\mathbb{E}[\mathrm{Var}(\mathbb{E}[Y \mid X, Z] \mid Z)]}{\mathbb{E}[\mathrm{Var}(Y \mid Z)]}$.
2) $\rho_m^2(X,Y|Z) = \sup_{g} \dfrac{\mathbb{E}[\mathrm{Var}(\mathbb{E}[g(Y,Z) \mid X, Z] \mid Z)]}{\mathbb{E}[\mathrm{Var}(g(Y,Z) \mid Z)]}$, where the supremum is taken over all Borel-measurable real-valued functions $g$ such that $0 < \mathrm{Var}(g(Y,Z)) < \infty$.
The correlation ratio is also closely related to the minimum mean square error (MMSE). The optimal MMSE estimator of $Y$ given $(X,Z)$ is $\mathbb{E}[Y|X,Z]$; hence the variance of the MMSE error for estimating $Y$ given $(X,Z)$ is $\mathbb{E}[\mathrm{Var}(Y|X,Z)]$.
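The MMSE connection can be written out explicitly. Writing $\theta(Y|X;Z)$ for the conditional correlation ratio of $Y$ on $X$ given $Z$, and assuming the variance-ratio characterization $\theta^2(Y|X;Z) = \mathbb{E}[\mathrm{Var}(\mathbb{E}[Y|X,Z] \mid Z)]/\mathbb{E}[\mathrm{Var}(Y|Z)]$, the law of total variance gives the following identity (a standard computation, not specific to this paper):

```latex
\mathbb{E}[\mathrm{Var}(Y \mid Z)]
  = \underbrace{\mathbb{E}[\mathrm{Var}(Y \mid X, Z)]}_{\text{MMSE}}
  + \mathbb{E}\bigl[\mathrm{Var}\bigl(\mathbb{E}[Y \mid X, Z] \,\big|\, Z\bigr)\bigr],
\qquad\text{hence}\qquad
\theta^2(Y|X;Z) \;=\; 1 - \frac{\mathbb{E}[\mathrm{Var}(Y \mid X, Z)]}{\mathbb{E}[\mathrm{Var}(Y \mid Z)]}.
```

That is, the conditional correlation ratio is large exactly when observing $X$ (in addition to $Z$) substantially reduces the MMSE for estimating $Y$.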
The unconditional version of Theorem 2 was proven by Rényi . Theorem 2 can be proven by a proof similar to that in , and hence the proof is omitted here. Next we characterize the conditional correlations by their event conditional versions.
(Characterization by event conditional correlations). For any random variables $X, Y, Z$,
$$\rho_m(X,Y|Z) = \operatorname*{ess\,sup}_{z \sim P_Z} \rho_m(X,Y|\{Z=z\}), \qquad (6)$$
where $\operatorname*{ess\,sup}_{z \sim P_Z}$ and $\operatorname*{ess\,inf}_{z \sim P_Z}$ respectively denote the essential supremum and the essential infimum with respect to $P_Z$.
It is worth noting that the analogous identity $\theta(Y|X;Z) = \operatorname*{ess\,sup}_{z \sim P_Z} \theta(Y|X;\{Z=z\})$ does not hold in general for the conditional correlation ratio. This can be seen from the following example. Assume are three numbers such that . Suppose that is a pair of random variables such that and . (It is obvious that there are many random variable pairs satisfying these conditions.) Denote the distribution of as . Now we consider a triple of random variables such that and and . Then we have . Hence . However, . Hence for this example.
If $Z$ is an absolutely continuous random variable, then the essential supremum in (6) can equivalently be taken with respect to the Lebesgue measure restricted to the set $\{z : f_Z(z) > 0\}$, where $f_Z$ denotes the pdf of $Z$.
Finally, we prove (6). Similarly to the proof above, we denote $s := \rho_m(X,Y|Z)$ and $s(z) := \rho_m(X,Y|\{Z=z\})$, and write $s^* := \operatorname*{ess\,sup}_{z \sim P_Z} s(z)$. Therefore, to prove (6), we only need to show $s = s^*$. On one hand, by derivations similar to (8)-(11), we can upper bound $s$ as $s \le s^*$. On the other hand, fix $\epsilon > 0$ and let $B := \{z : s(z) > s^* - \epsilon\}$; by the definition of the essential supremum, $P_Z(B) > 0$. For each $z$, we assume $(f_z, g_z)$ is a pair of functions such that $\mathbb{E}[f_z(X)\, g_z(Y) \mid Z = z] \ge s(z) - \epsilon$, where $\mathbb{E}[f_z(X) \mid Z = z] = \mathbb{E}[g_z(Y) \mid Z = z] = 0$ and $\mathbb{E}[f_z^2(X) \mid Z = z] = \mathbb{E}[g_z^2(Y) \mid Z = z] = 1$. The existence of $(f_z, g_z)$ follows from the definition of $s(z)$. Set $f(x,z) := f_z(x)\,\mathbf{1}\{z \in B\}/\sqrt{P_Z(B)}$ and $g(y,z) := g_z(y)\,\mathbf{1}\{z \in B\}/\sqrt{P_Z(B)}$. Then $\mathbb{E}[f(X,Z) \mid Z] = \mathbb{E}[g(Y,Z) \mid Z] = 0$, $\mathbb{E}[f^2(X,Z)] = \mathbb{E}[g^2(Y,Z)] = 1$, and
$$s \ge \mathbb{E}[f(X,Z)\, g(Y,Z)] = \frac{1}{P_Z(B)}\,\mathbb{E}\bigl[\mathbf{1}\{Z \in B\}\,\mathbb{E}[f_Z(X)\, g_Z(Y) \mid Z]\bigr] \ge s^* - 2\epsilon.$$
Since $\epsilon$ is arbitrary, $s \ge s^*$. Combining the two points above, we have $s = s^*$. ∎
For discrete $(X,Y,Z)$ with finite supports, without loss of generality, the supports of $X$ and $Y$ are assumed to be $\mathcal{X} = \{1, \ldots, |\mathcal{X}|\}$ and $\mathcal{Y} = \{1, \ldots, |\mathcal{Y}|\}$, respectively. For this case, denote $\sigma_2(z)$ as the second largest singular value of the matrix $\mathbf{Q}_z$ with entries
$$Q_z(x,y) := \frac{p_{XY|Z}(x,y|z)}{\sqrt{p_{X|Z}(x|z)\, p_{Y|Z}(y|z)}}.$$
For absolutely continuous $(X,Y,Z)$, denote $\sigma_2(z)$ as the second largest singular value of the bivariate function $(x,y) \mapsto \frac{f_{XY|Z}(x,y|z)}{\sqrt{f_{X|Z}(x|z)\, f_{Y|Z}(y|z)}}$, where $f_{XY|Z}$ denotes a conditional pdf of $(X,Y)$ given $Z$. Then we have the following singular value characterization of the conditional maximal correlation.
(Singular value characterization). Assume $(X,Y,Z)$ are discrete random variables with finite supports, or absolutely continuous random variables such that $\sigma_2(Z) < \infty$ a.s. Then
$$\rho_m(X,Y|Z) = \operatorname*{ess\,sup}_{z \sim P_Z} \sigma_2(z). \qquad (14)$$
This property is consistent with the unconditional version, obtained by setting $Z$ to a constant, i.e., $\rho_m(X,Y) = \sigma_2$.
The unconditional version of this theorem was proven in . That is, for discrete $(X,Y)$ with finite supports, $\rho_m(X,Y)$ equals the second largest singular value of the matrix with entries $p_{XY}(x,y)/\sqrt{p_X(x)\, p_Y(y)}$; for absolutely continuous $(X,Y)$ such that this value is finite, with $f_{XY}$ denoting a pdf of $(X,Y)$, $\rho_m(X,Y)$ equals the second largest singular value of the bivariate function $f_{XY}(x,y)/\sqrt{f_X(x)\, f_Y(y)}$. Combining this with Theorem 3, we have (14). ∎
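The singular value characterization makes the conditional maximal correlation directly computable for finitely supported pmfs. A numerical sketch (the pmf below is an assumed toy example; the helper follows the characterization stated above):

```python
import numpy as np

def rho_m_cond(p_xyz):
    """rho_m(X,Y|Z) for a finite pmf p_xyz[x, y, z], via the singular value
    characterization: the maximum, over z with p_Z(z) > 0, of the second
    largest singular value of the matrix p(x,y|z) / sqrt(p(x|z) p(y|z))."""
    best = 0.0
    p_z = p_xyz.sum(axis=(0, 1))
    for z in range(p_xyz.shape[2]):
        if p_z[z] <= 0:
            continue  # the essential supremum ignores zero-probability z
        p_xy = p_xyz[:, :, z] / p_z[z]
        px, py = p_xy.sum(axis=1), p_xy.sum(axis=0)
        # Restrict to symbols with positive conditional probability.
        Q = p_xy[np.ix_(px > 0, py > 0)] / np.sqrt(np.outer(px[px > 0], py[py > 0]))
        s = np.linalg.svd(Q, compute_uv=False)
        if len(s) > 1:
            best = max(best, s[1])
    return best

# Toy pmf: given z = 0, X and Y are independent; given z = 1, they follow
# a doubly symmetric binary source with maximal correlation 0.6.
p_xyz = np.zeros((2, 2, 2))
p_xyz[:, :, 0] = 0.5 * np.array([[0.25, 0.25], [0.25, 0.25]])
p_xyz[:, :, 1] = 0.5 * np.array([[0.4, 0.1], [0.1, 0.4]])
print(rho_m_cond(p_xyz))  # max(0, 0.6) = 0.6
```

Note that the answer is the maximum over $z$, not the $P_Z$-average: the conditionally independent slice $z = 0$ does not dilute the result.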
Note that, for fixed $P_{XY|Z}$, $P_Z \mapsto \rho_m(X,Y|Z)$ is a mapping that maps a distribution to a real number in $[0,1]$. Now we study the concavity of this mapping.
(Concavity). Given $P_{XY|Z}$, $\rho_m(X,Y|Z)$ is concave in $P_Z$. That is, for any distributions $P_Z$ and $Q_Z$, and any $\lambda \in [0,1]$,
$$\rho_m^{(\lambda P_Z + (1-\lambda) Q_Z)\, P_{XY|Z}}(X,Y|Z) \ge \lambda\, \rho_m^{P_Z P_{XY|Z}}(X,Y|Z) + (1-\lambda)\, \rho_m^{Q_Z P_{XY|Z}}(X,Y|Z),$$
where $\rho_m^{P}(X,Y|Z)$ denotes the conditional maximal correlation under distribution $P$.
This theorem directly follows from the characterization in (6). ∎
For a discrete random variable, the distribution is uniquely determined by its pmf. Therefore, for discrete random variables $(X,Y,Z)$, $\rho_m(X,Y|Z)$ can also be seen as a mapping that maps a pmf to a real number in $[0,1]$. Assume $\mathcal{X}, \mathcal{Y}, \mathcal{Z}$ are three finite sets. Denote $\mathcal{P}(\mathcal{X} \times \mathcal{Y} \times \mathcal{Z})$ as the set of pmfs defined on $\mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$ (i.e., the $(|\mathcal{X}||\mathcal{Y}||\mathcal{Z}| - 1)$-dimensional probability simplex). Consider $\rho_m(X,Y|Z)$ as a mapping from $\mathcal{P}(\mathcal{X} \times \mathcal{Y} \times \mathcal{Z})$ to $[0,1]$. Now we study the continuity (or discontinuity) of this mapping.
(Continuity and discontinuity). For finite sets $\mathcal{X}, \mathcal{Y}, \mathcal{Z}$, $\rho_m(X,Y|Z)$ is continuous (under the total variation distance) at any pmf $p_{XYZ}$ with full support. But in general, $\rho_m(X,Y|Z)$ is discontinuous at $p_{XYZ}$ such that $p_Z(z) = 0$ for some $z \in \mathcal{Z}$.
For a pmf $p_{XYZ}$ with full support, $\rho_m(X,Y|Z) = \max_{z \in \mathcal{Z}} \sigma_2(z)$. On the other hand, singular values are continuous in the entries of the matrix (see [9, Corollary 8.6.2]); hence $\sigma_2(z)$ is continuous in $p_{XYZ}$. Furthermore, since $p_{XYZ} > 0$, convergence $p^{(n)}_{XYZ} \to p_{XYZ}$ in the total variation distance sense implies $p^{(n)}_{XY|Z} \to p_{XY|Z}$ and $p^{(n)}_Z \to p_Z$. Therefore, $\rho^{(n)}_m(X,Y|Z) \to \rho_m(X,Y|Z)$, where $\rho^{(n)}_m(X,Y|Z)$ denotes the conditional maximal correlation under distribution $p^{(n)}_{XYZ}$. However, suppose there exists $z_0$ such that $p_Z(z_0) = 0$ and the conditional pmf at $z_0$ can be chosen with $\sigma_2(z_0) > \rho_m(X,Y|Z)$. Then letting $p^{(n)}_{XYZ} \to p_{XYZ}$ in a direction such that $p^{(n)}_Z(z_0) > 0$ always holds, we have that $\rho^{(n)}_m(X,Y|Z) \ge \sigma_2(z_0) > \rho_m(X,Y|Z)$ always holds. This implies $\rho_m(X,Y|Z)$ is discontinuous at such $p_{XYZ}$. ∎
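The boundary discontinuity can be illustrated numerically (a sketch with an assumed toy pmf; the helper recomputes the singular value characterization inline so the snippet is self-contained):

```python
import numpy as np

def rho_m_cond(p_xyz):
    # rho_m(X,Y|Z) = max over z with p_Z(z) > 0 of the second largest
    # singular value of the matrix p(x,y|z) / sqrt(p(x|z) p(y|z)).
    best = 0.0
    p_z = p_xyz.sum(axis=(0, 1))
    for z in range(p_xyz.shape[2]):
        if p_z[z] <= 0:
            continue
        p_xy = p_xyz[:, :, z] / p_z[z]
        px, py = p_xy.sum(axis=1), p_xy.sum(axis=0)
        Q = p_xy[np.ix_(px > 0, py > 0)] / np.sqrt(np.outer(px[px > 0], py[py > 0]))
        s = np.linalg.svd(Q, compute_uv=False)
        if len(s) > 1:
            best = max(best, s[1])
    return best

# Boundary pmf: p_Z puts zero mass on z = 1; given z = 0, X and Y are
# independent and uniform, so the conditional maximal correlation is 0.
base = np.zeros((2, 2, 2))
base[:, :, 0] = 0.25

# Perturbation: move mass eps onto z = 1, where X = Y holds exactly.
eps = 1e-6
pert = (1 - eps) * base
pert[0, 0, 1] = eps / 2
pert[1, 1, 1] = eps / 2
# rho_m_cond(base) = 0, but rho_m_cond(pert) = 1 for every eps > 0,
# so the mapping is discontinuous at the boundary pmf `base`.
print(rho_m_cond(base), rho_m_cond(pert))
```

However small `eps` is, the perturbed value stays at $1$, matching the proof's construction of a direction along which the limit is not attained.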
1) For any random variables $X, Y, Z$, $0 \le \rho_m(X,Y|Z) \le 1$.
2) Moreover, $\rho_m(X,Y|Z) = 0$ if and only if $X$ and $Y$ are conditionally independent given $Z$. Furthermore, for discrete random variables $(X,Y,Z)$ with finite supports, $\rho_m(X,Y|Z) = 1$ if and only if $f(X,Z) = g(Y,Z)$ a.s. for some functions $f$ and $g$ such that $H(f(X,Z)|Z) > 0$ (i.e., $X$ and $Y$ have nontrivial conditional Gács-Körner common information given $Z$), where $H(\cdot|Z)$ denotes the conditional entropy given $Z$.
Next we show that the three conditional correlations are equal for the Gaussian case.
(Gaussian case). For jointly Gaussian random variables $(X, Y, Z)$, we have
$$|\rho(X,Y|Z)| = \theta(Y|X;Z) = \rho_m(X,Y|Z). \qquad (15)$$
Given $Z = z$, the pair $(X,Y)$ also follows a jointly Gaussian distribution, and the conditional correlation coefficient $\rho(X,Y|\{Z=z\})$ is the same for any $z$. Hence
$$\rho_m(X,Y|Z) = \operatorname*{ess\,sup}_{z \sim P_Z} \rho_m(X,Y|\{Z=z\}) = |\rho(X,Y|\{Z=z\})|,$$
where the last equality follows from the fact that the maximal correlation of a bivariate Gaussian pair equals the absolute value of its correlation coefficient.
Furthermore, both $|\rho(X,Y|Z)|$ and $\theta(Y|X;Z)$ are between $|\rho(X,Y|\{Z=z\})|$ and $\rho_m(X,Y|Z)$, and these two bounds coincide. Hence (15) holds. ∎
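For concreteness, when $X, Y, Z$ are scalar jointly Gaussian with pairwise correlation coefficients $\rho_{XY}, \rho_{XZ}, \rho_{YZ}$ satisfying $|\rho_{XZ}|, |\rho_{YZ}| < 1$, the common value in (15) can be computed from these coefficients via the standard partial correlation formula (a well-known fact, stated here as a complement):

```latex
\rho(X, Y \mid Z)
  = \frac{\rho_{XY} - \rho_{XZ}\,\rho_{YZ}}
         {\sqrt{\bigl(1 - \rho_{XZ}^2\bigr)\bigl(1 - \rho_{YZ}^2\bigr)}}.
```

In particular, the three conditional correlations in (15) are all determined by the unconditional second-order statistics in the Gaussian case.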
3.2 Other Properties: Tensorization, DPI, Correlation ratio equality, and Conditioning reducing covariance gap
The tensorization property and the data processing inequality for the unconditional maximal correlation were proven in [17, Thm. 1] and [20, Lem. 2.1], respectively. Here we extend them to the conditional case.
(Tensorization). Assume that, given $Z$, $\{(X_i, Y_i)\}_{i=1}^n$ is a sequence of conditionally independent pairs of random variables, with $X^n := (X_1, \ldots, X_n)$ and $Y^n := (Y_1, \ldots, Y_n)$. Then we have
$$\rho_m(X^n, Y^n | Z) = \max_{1 \le i \le n} \rho_m(X_i, Y_i | Z).$$
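The tensorization identity can be checked numerically in the unconditional case (i.e., $Z$ constant), using the singular value characterization: for independent pairs, the matrix $p(x,y)/\sqrt{p(x)p(y)}$ of the product source is the Kronecker product of the individual matrices, so its singular values are the pairwise products of theirs. A sketch with assumed toy pmfs:

```python
import numpy as np

def second_sv(p_xy):
    # Unconditional maximal correlation of a finite pmf p_xy: the second
    # largest singular value of the matrix p(x,y) / sqrt(p(x) p(y)).
    px, py = p_xy.sum(axis=1), p_xy.sum(axis=0)
    Q = p_xy / np.sqrt(np.outer(px, py))
    return np.linalg.svd(Q, compute_uv=False)[1]

# Two independent pairs with maximal correlations 0.6 and 0.2.
p1 = np.array([[0.4, 0.1], [0.1, 0.4]])
p2 = np.array([[0.3, 0.2], [0.2, 0.3]])

# Joint pmf of ((X1, X2), (Y1, Y2)): the Kronecker product of p1 and p2.
# Its Q matrix is the Kronecker product of the individual Q matrices, whose
# singular values are the pairwise products {1, 0.6} x {1, 0.2}.
p12 = np.kron(p1, p2)
print(second_sv(p12))  # max(0.6, 0.2) = 0.6
```

Since each individual matrix has top singular value $1$, the second largest product is $\max(0.6, 0.2) \cdot 1$, which is exactly the maximum in the theorem.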
The unconditional version was proven in [17, Thm. 1]; the conditional version follows from similar derivations, where (18) follows from the following lemma.
Assume $\mathcal{Z}$ is a countable set, and $P_Z$ is an arbitrary distribution on $\mathcal{Z}$. Then for any function $f: \mathcal{Z} \to \mathbb{R}$, we have
$$\operatorname*{ess\,sup}_{z \sim P_Z} f(z) = \sup_{z : P_Z(z) > 0} f(z).$$
This lemma follows from the following two points. On one hand, assume $z$ satisfies $P_Z(z) > 0$. Then for any number $a < f(z)$, we have $P_Z(f(Z) > a) \ge P_Z(z) > 0$, and hence $a \le \operatorname*{ess\,sup}_{z' \sim P_Z} f(z')$. Since $a$ and $z$ are arbitrary, we have $\sup_{z : P_Z(z) > 0} f(z) \le \operatorname*{ess\,sup}_{z \sim P_Z} f(z)$.
On the other hand, denote $A := \{z : P_Z(z) = 0\}$. Then by the definition of $A$, we have $P_Z(z) = 0$ for all $z \in A$. Hence by the union bound (recall that $\mathcal{Z}$ is countable), we have $P_Z(A) = 0$. Furthermore, for a number $a$, $P_Z(f(Z) > a) > 0$ implies that there exists a $z \notin A$ such that $f(z) > a$; hence
$$a < \sup_{z : P_Z(z) > 0} f(z).$$
By the definition of the essential supremum, every $a < \operatorname*{ess\,sup}_{z \sim P_Z} f(z)$ satisfies $P_Z(f(Z) > a) > 0$. Since $a$ is arbitrary, we have $\operatorname*{ess\,sup}_{z \sim P_Z} f(z) \le \sup_{z : P_Z(z) > 0} f(z)$. ∎
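The lemma can be illustrated with a small numerical example (assumed toy values): on a countable alphabet, the essential supremum ignores zero-probability atoms, so it equals the supremum restricted to atoms of positive mass.

```python
import numpy as np

# Toy pmf on a countable (here finite) alphabet with one zero-mass atom.
p_z = np.array([0.5, 0.5, 0.0])
f = np.array([1.0, 2.0, 100.0])  # f is largest on the zero-probability atom

# The essential supremum w.r.t. p_z ignores zero-probability atoms,
# so it equals the supremum restricted to atoms of positive mass.
ess_sup = f[p_z > 0].max()
print(ess_sup)  # 2.0, even though sup over the whole alphabet is 100.0
```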