# Generalized Cramér's coefficient via f-divergence for contingency tables

This study proposes measures describing the strength of association between the row and column variables via the $f$-divergence. Cramér's coefficient is one of the most widely used measures for the analysis of two-way contingency tables. Tomizawa et al. (2004) proposed more general measures, including Cramér's coefficient, using the power-divergence. In this paper, we propose even more general measures via the $f$-divergence and show some of their properties, demonstrating that the proposed measures are beneficial for comparing the strength of association in several tables.


## 1 Introduction

Contingency tables and their analysis are critical for a myriad of fields, such as medicine, psychology, education, and social science. Accordingly, many coefficients have been proposed to measure the strength of association between the row and column variables, namely to measure the degree of departure from independence. Pearson's coefficient of mean square contingency $\phi^2$ and coefficient of contingency $P$ serve as prime examples. For an $r \times c$ contingency table, let $p_{ij}$ denote the probability that an observation falls in the $i$th row and $j$th column of the table ($i = 1, \dots, r$; $j = 1, \dots, c$). The measure $\phi^2$ is defined by

$$
\phi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(p_{ij} - p_{i\cdot}p_{\cdot j})^2}{p_{i\cdot}p_{\cdot j}} = \left(\sum_{i=1}^{r}\sum_{j=1}^{c} \frac{p_{ij}^2}{p_{i\cdot}p_{\cdot j}}\right) - 1,
$$

where $p_{i\cdot} = \sum_{j=1}^{c} p_{ij}$ and $p_{\cdot j} = \sum_{i=1}^{r} p_{ij}$. The measure $P$, defined by

$$
P = \left(\frac{\phi^2}{\phi^2 + 1}\right)^{1/2},
$$

lies between zero and one but cannot always attain the upper limit of one. Furthermore, the maximum value of $P$ depends on the number of rows and columns in the table under complete association. Tschuprow's coefficient (Tschuprow, 1925, 1939) and Cramér's coefficient (Cramér, 1946) were introduced to avoid these limitations of $P$ (see, e.g., Bishop et al., 2007; Everitt, 1992; Agresti, 2003). Tschuprow's coefficient is defined by

$$
T = \left(\frac{\phi^2}{[(r-1)(c-1)]^{1/2}}\right)^{1/2},
$$

attaining a value of one in the case of complete association in an $r \times r$ table but unable to do so when $r \neq c$. Cramér's coefficient is defined by

$$
V^2 = \frac{\phi^2}{\min(r-1,\, c-1)},
$$

obtaining a value of one for all values of $r$ and $c$ in the case of complete association. Notably, $0 \leq V^2 \leq 1$.
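As a concrete reference point, the quantities above can be computed directly from a table of counts or probabilities. The following is a minimal sketch (the function name `cramers_v2` and the NumPy layout are ours, not notation from the paper):

```python
import numpy as np

def cramers_v2(table):
    """Cramér's coefficient V^2 = phi^2 / min(r - 1, c - 1)."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                        # joint probabilities p_ij
    prow = p.sum(axis=1, keepdims=True)    # row marginals p_{i.}
    pcol = p.sum(axis=0, keepdims=True)    # column marginals p_{.j}
    indep = prow * pcol                    # independence probabilities p_{i.} p_{.j}
    phi2 = ((p - indep) ** 2 / indep).sum()
    r, c = p.shape
    return phi2 / min(r - 1, c - 1)
```

For a diagonal table (complete association) the value is one, and for any table of the form $p_{ij} = p_{i\cdot}p_{\cdot j}$ (null association) it is zero.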

Tomizawa et al. (2004) considered some generalizations of Cramér's coefficient via the power-divergence (Cressie and Read, 1984; Read and Cressie, 1988). Consider the row variable as the response variable and the column variable as the explanatory variable. Tomizawa et al. (2004) proposed the following measure:

$$
V^2_{1(\lambda)} = \frac{I^{(\lambda)}(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\})}{K^{(\lambda)}_1} \quad \text{for } \lambda \geq 0,
$$

where

$$
I^{(\lambda)}(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\}) = \frac{1}{\lambda(\lambda+1)} \sum_{i=1}^{r}\sum_{j=1}^{c} p_{ij}\left[\left(\frac{p_{ij}}{p_{i\cdot}p_{\cdot j}}\right)^{\lambda} - 1\right], \qquad K^{(\lambda)}_1 = \frac{1}{\lambda(\lambda+1)}\left(\sum_{i=1}^{r} p_{i\cdot}^{1-\lambda} - 1\right),
$$

and the value at $\lambda = 0$ is taken to be the continuous limit as $\lambda \to 0$. When $\lambda = 0$, this measure is expressed as:

$$
V^2_{1(0)} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{c} p_{ij} \log\left(\dfrac{p_{ij}}{p_{i\cdot}p_{\cdot j}}\right)}{-\sum_{i=1}^{r} p_{i\cdot} \log p_{i\cdot}}.
$$

The measure $V^2_{1(1)}$ is identical to Cramér's coefficient $V^2$ when $r \leq c$. Furthermore, $I^{(\lambda)}$ is the power-divergence between $\{p_{ij}\}$ and $\{p_{i\cdot}p_{\cdot j}\}$ (see, for details, Cressie and Read, 1984; Read and Cressie, 1988). The measure $V^2_{1(\lambda)}$ lies between zero and one for $\lambda \geq 0$. When $V^2_{1(\lambda)} = 0$, the table has a structure of null association (i.e., $p_{ij} = p_{i\cdot}p_{\cdot j}$ for all $i, j$), and when $V^2_{1(\lambda)} = 1$, it has a structure of complete association: for each column $j$, there uniquely exists a row $i_j$ such that $p_{i_j j} > 0$ and $p_{ij} = 0$ for all other $i$ (i.e., $p_{i_j j} = p_{\cdot j}$). Thus, $V^2_{1(\lambda)} = 1$ signifies that the row category of an individual can be predicted perfectly when the column category is known. Conversely, $V^2_{1(\lambda)} = 0$ outlines that the row marginal distribution is identical to the conditional row distribution given the value of the column category. Hence, $V^2_{1(\lambda)}$ indicates how much the prediction of the individual row category could be improved if knowledge about the column category is available.
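For $\lambda > 0$, the measure $V^2_{1(\lambda)}$ can be sketched as follows (the function name `tomizawa_v1` is ours, and the $\lambda = 0$ limiting case is omitted for brevity):

```python
import numpy as np

def tomizawa_v1(table, lam):
    """V^2_1(lambda): power-divergence I^(lambda) over K^(lambda)_1, for lam > 0."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    prow = p.sum(axis=1)
    pcol = p.sum(axis=0)
    indep = np.outer(prow, pcol)
    mask = p > 0                            # cells with p_ij = 0 contribute zero
    num = (p[mask] * ((p[mask] / indep[mask]) ** lam - 1)).sum()
    den = (prow ** (1 - lam)).sum() - 1
    return num / den                        # the 1/(lam*(lam+1)) factors cancel
```

At $\lambda = 1$ this reduces to $\phi^2/(r-1)$, which agrees with Cramér's coefficient when $r \leq c$.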

Meanwhile, consider the case wherein the row and column variables are the explanatory and response variables, respectively. Tomizawa et al. (2004) proposed the following measure:

$$
V^2_{2(\lambda)} = \frac{I^{(\lambda)}(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\})}{K^{(\lambda)}_2} \quad \text{for } \lambda \geq 0,
$$

where

$$
K^{(\lambda)}_2 = \frac{1}{\lambda(\lambda+1)}\left(\sum_{j=1}^{c} p_{\cdot j}^{1-\lambda} - 1\right).
$$

Notably, $V^2_{2(1)}$ is identical to Cramér's coefficient $V^2$ when $c \leq r$. Moreover, $V^2_{2(\lambda)}$ for $\lambda \geq 0$ highlights how much the prediction of the individual column category could be improved if knowledge about the row category is available.

When the explanatory and response variables are undefined, Miyamoto et al. (2007) proposed the following measure by combining the ideas of both measures $V^2_{1(\lambda)}$ and $V^2_{2(\lambda)}$:

$$
G^2_{(\lambda)} = g^{-1}\left(\frac{1}{2}\left(g(V^2_{1(\lambda)}) + g(V^2_{2(\lambda)})\right)\right) \quad \text{for } \lambda \geq 0,
$$

where $g$ is a monotonic function. When $g(x) = 1/x$ and $g(x) = \log x$, $G^2_{(\lambda)}$ is identical to the harmonic mean and the geometric mean of $V^2_{1(\lambda)}$ and $V^2_{2(\lambda)}$, respectively. Accordingly, when $\lambda = 1$ with $r = c$, the measure $G^2_{(1)}$ is consistent with Cramér's coefficient $V^2$. As $0 \leq V^2_{1(\lambda)} \leq 1$ and $0 \leq V^2_{2(\lambda)} \leq 1$, the measure $G^2_{(\lambda)}$ also lies on the interval $[0, 1]$. While predicting the values of an individual's categories, $G^2_{(\lambda)}$ indicates how much the prediction could be improved if knowledge about the value of one variable is available. We are now interested in generalizing these measures further using the $f$-divergence.

The rest of this paper is organized as follows. Section 2 proposes a new measure describing the strength of association for two-way contingency tables. Section 3 presents the approximate confidence intervals of the proposed measures, and Section 4 demonstrates the relationships between the measures and two-way contingency tables formed from a bivariate normal distribution. Lastly, Section 5 presents numerical examples.

## 2 Generalized measure

The $f$-divergence between $\{p_{ij}\}$ and $\{q_{ij}\}$ is defined as:

$$
I_f(\{p_{ij}\}; \{q_{ij}\}) = \sum_{i}\sum_{j} q_{ij}\, f\!\left(\frac{p_{ij}}{q_{ij}}\right),
$$

where $f$ is a convex function on $(0, \infty)$ with $f(1) = 0$, $f(0) = \lim_{t \to 0^{+}} f(t)$, $0 f(0/0) = 0$, and $0 f(a/0) = a \lim_{t \to \infty} f(t)/t$ (Csiszár and Shields, 2004). Assume $f$ is a once-differentiable and strictly convex function.

Consider an $r \times c$ contingency table. We assume that $p_{i\cdot} > 0$ for all $i$ and $p_{\cdot j} > 0$ for all $j$. Measures that present the strength of association between the row and column variables are proposed in three cases: (i) when the row and column variables are response and explanatory variables, respectively; (ii) when those are explanatory and response variables, respectively; and (iii) when response and explanatory variables are undefined. We define measures for the asymmetric situations (cases (i) and (ii)) and for the symmetric situation (case (iii)).

### 2.1 Case 1

For the asymmetric situation wherein the column variable is the explanatory variable and the row variable is the response variable, the following measure presenting the strength of association between the row and column variables is proposed:

$$
V^2_{1(f)} = \frac{I_f(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\})}{K_{1(f)}},
$$

where

$$
I_f(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\}) = \sum_{i=1}^{r}\sum_{j=1}^{c} p_{i\cdot}p_{\cdot j}\, f\!\left(\frac{p_{ij}}{p_{i\cdot}p_{\cdot j}}\right), \qquad K_{1(f)} = \sum_{i=1}^{r} p_{i\cdot}^{2}\, f\!\left(\frac{1}{p_{i\cdot}}\right).
$$
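The definition above can be sketched with a user-supplied convex $f$; the helper name `v1_f` is ours. The supplied $f$ must respect the conventions $f(1) = 0$ and $0 f(0/0) = 0$; polynomial choices such as $f(x) = x(x-1)$, for which $I_f$ becomes Pearson's divergence $\phi^2$, satisfy them automatically:

```python
import numpy as np

def v1_f(table, f):
    """V^2_{1(f)} = I_f({p_ij}; {p_i. p_.j}) / K_{1(f)} for a vectorized convex f."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    prow = p.sum(axis=1)
    pcol = p.sum(axis=0)
    q = np.outer(prow, pcol)                # independence probabilities
    i_f = (q * f(p / q)).sum()              # f-divergence I_f
    k1 = (prow ** 2 * f(1.0 / prow)).sum()  # normalizing constant K_{1(f)}
    return i_f / k1
```

With `f = lambda x: x * (x - 1)`, the numerator is $\phi^2$ and the denominator is $r - 1$, so the value coincides with Cramér's coefficient whenever $r \leq c$.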

Next, the following theorem for the measure is obtained.

###### Theorem 1.

For each convex function $f$,

1. $V^2_{1(f)} = 0$ if and only if there is a structure of null association in the table (i.e., $p_{ij} = p_{i\cdot}p_{\cdot j}$ for all $i, j$).

2. $V^2_{1(f)} = 1$ if and only if there is a structure of complete association: for each column $j$, there uniquely exists a row $i_j$ (note that $i_{j_1} = i_{j_2}$ may hold when $j_1 \neq j_2$) such that $p_{i_j j} > 0$ and $p_{ij} = 0$ for all other $i$ (i.e., $p_{i_j j} = p_{\cdot j}$).

Before proving the theorem for the measure $V^2_{1(f)}$, the following lemma is introduced.

###### Lemma 1.

Let $f$ be a strictly convex function on $(0, \infty)$ and

$$
f(x) = \begin{cases} x g(x) & (x > 0) \\ 0 & (x = 0). \end{cases} \tag{1}
$$

Then, $g$ is a strictly monotonically increasing function.

The proof of Lemma 1 is provided in the appendix.

###### Proof of Theorem 1.

The $f$-divergence is first transformed as follows:

$$
I_f(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\}) = \sum_{i=1}^{r}\sum_{j=1}^{c} p_{ij}\left(\frac{p_{i\cdot}p_{\cdot j}}{p_{ij}}\, f\!\left(\frac{p_{ij}}{p_{i\cdot}p_{\cdot j}}\right)\right) = \sum_{i=1}^{r}\sum_{j=1}^{c} p_{ij}\, g\!\left(\frac{p_{ij}}{p_{i\cdot}p_{\cdot j}}\right),
$$

where $g$ is given by (1). From Lemma 1, $g$ is strictly monotonically increasing; since $p_{ij} \leq p_{\cdot j}$ implies $p_{ij}/(p_{i\cdot}p_{\cdot j}) \leq 1/p_{i\cdot}$, it holds that

$$
I_f(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\}) \leq \sum_{i=1}^{r}\sum_{j=1}^{c} p_{ij}\, g\!\left(\frac{1}{p_{i\cdot}}\right) = \sum_{i=1}^{r} p_{i\cdot}^{2}\, f\!\left(\frac{1}{p_{i\cdot}}\right) = K_{1(f)}.
$$

Furthermore, from Jensen's inequality, we have:

$$
I_f(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\}) \geq f\!\left(\sum_{i=1}^{r}\sum_{j=1}^{c} p_{i\cdot}p_{\cdot j}\, \frac{p_{ij}}{p_{i\cdot}p_{\cdot j}}\right) = f(1) = 0. \tag{2}
$$

Hence, $0 \leq V^2_{1(f)} \leq 1$ is obtained.

Next, $V^2_{1(f)} = 0$ follows from the property of the $f$-divergence if a structure of null association is observable (i.e., $p_{ij} = p_{i\cdot}p_{\cdot j}$). Conversely, when $V^2_{1(f)} = 0$, it holds that:

$$
I_f(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\}) = \sum_{i=1}^{r}\sum_{j=1}^{c} p_{i\cdot}p_{\cdot j}\, f\!\left(\frac{p_{ij}}{p_{i\cdot}p_{\cdot j}}\right) = 0.
$$

Since equality then holds in (2) and $f$ is strictly convex, we have $p_{ij} = p_{i\cdot}p_{\cdot j}$ for all $i, j$; that is, the table has a structure of null association.

Finally, if there uniquely exists $i_j$ for each column $j$ such that $p_{i_j j} > 0$ and $p_{ij} = 0$ for all other $i$, the measure can be expressed as:

$$
V^2_{1(f)} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{c} p_{i\cdot}\, p_{ij}\, f\!\left(\dfrac{1}{p_{i\cdot}}\right)}{\sum_{i=1}^{r} p_{i\cdot}^{2}\, f\!\left(\dfrac{1}{p_{i\cdot}}\right)} = 1.
$$

Conversely, when $V^2_{1(f)} = 1$, we have:

$$
0 = I_f(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\}) - K_{1(f)} = \sum_{i=1}^{r}\sum_{j=1}^{c} p_{ij}\left(g\!\left(\frac{p_{ij}}{p_{i\cdot}p_{\cdot j}}\right) - g\!\left(\frac{1}{p_{i\cdot}}\right)\right).
$$

From Lemma 1, as $g$ is strictly monotonically increasing, the equality is satisfied if and only if, for each column $j$, there is only one row $i_j$ such that $p_{i_j j} > 0$ and $p_{ij} = 0$ for all other $i$. ∎

Similar to the interpretation of the measure $V^2_{1(\lambda)}$, $V^2_{1(f)}$ indicates the degree to which the prediction of the row category of an individual may be improved if knowledge regarding the column category of the individual is available. In this sense, the measure shows the strength of association between the row and column variables. Examples of the $f$-divergence are given below. When $f(x) = x \log x$, $I_f$ is identical to the Kullback-Leibler divergence:

$$
I_{KL}(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\}) = \sum_{i=1}^{r}\sum_{j=1}^{c} p_{ij} \log\left(\frac{p_{ij}}{p_{i\cdot}p_{\cdot j}}\right).
$$

When $f(x) = x(x-1)$, Pearson's divergence is derived and $V^2_{1(f)}$ is identical to Cramér's coefficient $V^2$ with $r \leq c$; when $f(x) = \frac{x^{\lambda+1} - x}{\lambda(\lambda+1)}$, $I_f$ is identical to the power-divergence measure

$$
I_{CR}(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\}) = \frac{1}{\lambda(\lambda+1)} \sum_{i=1}^{r}\sum_{j=1}^{c} p_{ij}\left[\left(\frac{p_{ij}}{p_{i\cdot}p_{\cdot j}}\right)^{\lambda} - 1\right],
$$

and $V^2_{1(f)}$ is identical to $V^2_{1(\lambda)}$. When $f(x) = \frac{(x-1)^2}{\theta x + (1-\theta)}$ for $0 \leq \theta < 1$, $I_f$ is identical to the $\theta$-divergence (Ichimori, 2013),

$$
I_{\theta}(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\}) = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(p_{ij} - p_{i\cdot}p_{\cdot j})^2}{\theta p_{ij} + (1-\theta) p_{i\cdot}p_{\cdot j}}.
$$

This measure is also one of the generalizations of Cramér's coefficient, and $I_{\theta}$ is identical to Pearson's coefficient $\phi^2$ when $\theta = 0$.

### 2.2 Case 2

For the asymmetric situation wherein the row variable is the explanatory variable and the column variable is the response variable, we propose the following measure, which presents the strength of association between the row and column variables:

$$
V^2_{2(f)} = \frac{I_f(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\})}{K_{2(f)}},
$$

where

$$
K_{2(f)} = \sum_{j=1}^{c} p_{\cdot j}^{2}\, f\!\left(\frac{1}{p_{\cdot j}}\right).
$$

Then, the following theorem is obtained for the measure $V^2_{2(f)}$.

###### Theorem 2.

For each convex function $f$,

1. $V^2_{2(f)} = 0$ if and only if there is a structure of null association in the table (i.e., $p_{ij} = p_{i\cdot}p_{\cdot j}$ for all $i, j$).

2. $V^2_{2(f)} = 1$ if and only if there is a structure of complete association: for each row $i$, there uniquely exists a column $j_i$ (note that $j_{i_1} = j_{i_2}$ may hold when $i_1 \neq i_2$) such that $p_{i j_i} > 0$ and $p_{ij} = 0$ for all other $j$ (i.e., $p_{i j_i} = p_{i\cdot}$).

The proof of Theorem 2 is similar to the proof of Theorem 1. Note that when $f(x) = x(x-1)$ with $c \leq r$, $V^2_{2(f)}$ is identical to Cramér's coefficient $V^2$, while for $f(x) = \frac{x^{\lambda+1} - x}{\lambda(\lambda+1)}$, $V^2_{2(f)}$ is consistent with $V^2_{2(\lambda)}$.

Similar to the interpretation of the measure $V^2_{2(\lambda)}$, the measure $V^2_{2(f)}$ specifies the degree to which the prediction of the column category of an individual may be improved if knowledge regarding the row category of the individual is available. Accordingly, the measure relays the strength of association between the row and column variables.

### 2.3 Case 3

In an $r \times c$ contingency table wherein explanatory and response variables are undefined, using the measures $V^2_{1(f)}$ and $V^2_{2(f)}$ is inappropriate if we are interested in knowing the degree to which knowledge about the value of one variable can help us predict the value of the other variable. For such a symmetric situation, the following measure is proposed by combining the ideas of both measures $V^2_{1(f)}$ and $V^2_{2(f)}$:

$$
V^2_{3(f)} = h^{-1}\left(w_1 h(V^2_{1(f)}) + w_2 h(V^2_{2(f)})\right),
$$

where $h$ is a monotonic function and $w_1 + w_2 = 1$ with $w_1, w_2 > 0$. Then, the following theorem is attained for the measure $V^2_{3(f)}$.

###### Theorem 3.

For each convex function $f$,

1. $V^2_{3(f)} = 0$ if and only if there is a structure of null association in the table (i.e., $p_{ij} = p_{i\cdot}p_{\cdot j}$ for all $i, j$).

2. $V^2_{3(f)} = 1$ if and only if there is a structure of complete association, namely: (i) for each row $i$, there uniquely exists a column $j_i$ such that $p_{i j_i} > 0$ and $p_{ij} = 0$ for all other $j$, and (ii) for each column $j$, there uniquely exists a row $i_j$ such that $p_{i_j j} > 0$ and $p_{ij} = 0$ for all other $i$.

###### Proof of Theorem 3.

First, for the weighted average of $h(V^2_{1(f)})$ and $h(V^2_{2(f)})$,

$$
\min\left(h(V^2_{1(f)}),\, h(V^2_{2(f)})\right) \leq w_1 h(V^2_{1(f)}) + w_2 h(V^2_{2(f)}) \leq \max\left(h(V^2_{1(f)}),\, h(V^2_{2(f)})\right).
$$

From this relation, we can show that:

$$
\min\left(V^2_{1(f)},\, V^2_{2(f)}\right) \leq V^2_{3(f)} \leq \max\left(V^2_{1(f)},\, V^2_{2(f)}\right).
$$

As $0 \leq V^2_{1(f)} \leq 1$ and $0 \leq V^2_{2(f)} \leq 1$ from Theorems 1 and 2, $0 \leq V^2_{3(f)} \leq 1$ holds.

Next, if there is a structure of null association, then $V^2_{1(f)} = V^2_{2(f)} = 0$ from Theorems 1 and 2, so $V^2_{3(f)} = 0$ is obvious. Conversely, if $V^2_{3(f)} = 0$, then:

$$
h(0) = w_1 h(V^2_{1(f)}) + w_2 h(V^2_{2(f)}).
$$

Notably, $h$ is a monotonic function, so the equality is satisfied only at $V^2_{1(f)} = V^2_{2(f)} = 0$. Hence, we obtain the structure of null association via Theorems 1 and 2.

Besides, $V^2_{3(f)} = 1$ is obvious in the case of a structure of complete association, as $V^2_{1(f)} = V^2_{2(f)} = 1$ according to Theorems 1 and 2. Conversely, if $V^2_{3(f)} = 1$, then:

$$
h(1) = w_1 h(V^2_{1(f)}) + w_2 h(V^2_{2(f)}).
$$

As mentioned previously, $h$ is a monotonic function, so the equality is satisfied only at $V^2_{1(f)} = V^2_{2(f)} = 1$. Thus, $V^2_{3(f)} = 1$ is satisfied exactly under the structure of complete association. ∎

We can show that if $h(x) = \log x$ and $w_1 = w_2 = 1/2$, the measure is denoted by

$$
V^2_{G(f)} = \frac{I_f(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\})}{\sqrt{K_{1(f)} K_{2(f)}}} = \sqrt{V^2_{1(f)}\, V^2_{2(f)}},
$$

and if $h(x) = 1/x$ and $w_1 = w_2 = 1/2$, the measure is represented by

$$
V^2_{H(f)} = \frac{2\, I_f(\{p_{ij}\}; \{p_{i\cdot}p_{\cdot j}\})}{K_{1(f)} + K_{2(f)}} = \frac{2\, V^2_{1(f)}\, V^2_{2(f)}}{V^2_{1(f)} + V^2_{2(f)}}.
$$

Notably, $V^2_{G(f)}$ and $V^2_{H(f)}$ are the geometric mean and harmonic mean of $V^2_{1(f)}$ and $V^2_{2(f)}$, respectively. We confirm that when $f(x) = x(x-1)$ with $r = c$, $V^2_{3(f)}$ is identical to Cramér's coefficient $V^2$, while for $f(x) = \frac{x^{\lambda+1} - x}{\lambda(\lambda+1)}$, $V^2_{3(f)}$ is consistent with $G^2_{(\lambda)}$.
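Both combined measures can be computed together; reusing the quantities $I_f$, $K_{1(f)}$, and $K_{2(f)}$ makes the geometric/harmonic structure explicit (the function name `v_g_and_v_h` is ours):

```python
import numpy as np

def v_g_and_v_h(table, f):
    """Return (V^2_{G(f)}, V^2_{H(f)}) for a vectorized convex f."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    prow, pcol = p.sum(axis=1), p.sum(axis=0)
    q = np.outer(prow, pcol)
    i_f = (q * f(p / q)).sum()
    k1 = (prow ** 2 * f(1.0 / prow)).sum()
    k2 = (pcol ** 2 * f(1.0 / pcol)).sum()
    v_g = i_f / np.sqrt(k1 * k2)            # geometric mean of V^2_1 and V^2_2
    v_h = 2.0 * i_f / (k1 + k2)             # harmonic mean of V^2_1 and V^2_2
    return v_g, v_h
```

Since a harmonic mean never exceeds the corresponding geometric mean, `v_h <= v_g` always holds.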

For an $r \times r$ square contingency table with the same row and column classifications, $V^2_{3(f)} = 1$ if and only if the main diagonal cell probabilities in the table are nonzero and the off-diagonal cell probabilities are all zero after interchanging some row and column categories; in that case, all observations concentrate on the main diagonal cells. While predicting the values of the categories of an individual, $V^2_{3(f)}$ specifies the degree to which the prediction could be improved if knowledge about the value of one variable is available. In this sense, the measure $V^2_{3(f)}$ also indicates the strength of association between the row and column variables. If only the marginal distributions $\{p_{i\cdot}\}$ and $\{p_{\cdot j}\}$ are known, consider predicting the values of the individual row and column categories in terms of probabilities with an independence structure.

###### Theorem 4.

For any fixed convex function $f$ and monotonic function $h$,

1. $\min(V^2_{1(f)},\, V^2_{2(f)}) \leq V^2_{3(f)} \leq \max(V^2_{1(f)},\, V^2_{2(f)})$;

2. $V^2_{H(f)} \leq V^2_{G(f)}$.

###### Proof of Theorem 4.

The inequality 1 in Theorem 4 has already been validated in the proof of Theorem 3, so its proof is omitted. We show the inequality 2. Let $h_1(x) = 1/x$ and $h_2(x) = \log x$. With $w_1 = w_2 = 1/2$, it holds that:

$$
V^2_{H(f)} = h_1^{-1}\left(w_1 h_1(V^2_{1(f)}) + w_2 h_1(V^2_{2(f)})\right).
$$

As $h_2 \circ h_1^{-1}$ is a convex function and $h_2$ is increasing, Jensen's inequality gives

$$
V^2_{H(f)} = h_2^{-1}\left(h_2 \circ h_1^{-1}\left(w_1 h_1(V^2_{1(f)}) + w_2 h_1(V^2_{2(f)})\right)\right) \leq h_2^{-1}\left(w_1 h_2(V^2_{1(f)}) + w_2 h_2(V^2_{2(f)})\right) = V^2_{G(f)}. \qquad ∎
$$

When $f(x) = x(x-1)$ with $r = c$, we see that $V^2_{H(f)} = V^2_{G(f)} = V^2$ (being Cramér's coefficient).

## 3 Approximate confidence intervals for measure

Let $n_{ij}$ denote the observed frequency in cell $(i, j)$ arising from a multinomial distribution, and let $n$ denote the total number of observations, namely $n = \sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij}$. The approximate standard error and the large-sample confidence interval are obtained for $V^2_{t(f)}$ ($t = 1, 2, 3$) using the delta method, which is described in, e.g., Agresti (2003) and Bishop et al. (2007). The estimator $\hat{V}^2_{t(f)}$ of $V^2_{t(f)}$ is given by $V^2_{t(f)}$ with $\{p_{ij}\}$ replaced by $\{\hat{p}_{ij}\}$, where $\hat{p}_{ij} = n_{ij}/n$. Using the delta method, $\sqrt{n}(\hat{V}^2_{t(f)} - V^2_{t(f)})$ has an asymptotically normal distribution (as $n \to \infty$) with mean zero and variance $\sigma^2[V^2_{t(f)}]$. Refer to the Appendix for the values of $\sigma^2[V^2_{t(f)}]$.

We define $f$ as once-differentiable and strictly convex, and write $f'(x)$ for the derivative of $f$ with respect to $x$. Let $\hat{\sigma}^2[V^2_{t(f)}]$ be $\sigma^2[V^2_{t(f)}]$ with $\{p_{ij}\}$ replaced by $\{\hat{p}_{ij}\}$. Then, an estimated standard error of $\hat{V}^2_{t(f)}$ is $\hat{\sigma}[V^2_{t(f)}]/\sqrt{n}$, and an approximate $100(1-\alpha)$ percent confidence interval of $V^2_{t(f)}$ is $\hat{V}^2_{t(f)} \pm z_{\alpha/2}\, \hat{\sigma}[V^2_{t(f)}]/\sqrt{n}$, where $z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the standard normal distribution.
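The delta-method variance $\sigma^2[V^2_{t(f)}]$ is deferred to the appendix. As an illustrative alternative only (not the paper's procedure), a percentile bootstrap gives a quick numerical check of such intervals; the function `bootstrap_ci` and its defaults are our own:

```python
import numpy as np

def bootstrap_ci(counts, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for an association measure.

    counts : r x c table of observed frequencies n_ij
    stat   : function mapping a table of counts to the measure's estimate
    """
    counts = np.asarray(counts)
    n = int(counts.sum())
    probs = counts.ravel() / n              # hat{p}_ij = n_ij / n
    rng = np.random.default_rng(seed)
    stats = [stat(rng.multinomial(n, probs).reshape(counts.shape))
             for _ in range(n_boot)]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

Passing an estimator such as Cramér's $\hat{V}^2$ as `stat` yields an interval comparable in spirit to the delta-method one for large $n$.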

## 4 Numerical study

In this section, numerical studies based on a bivariate normal distribution are demonstrated. Consider an underlying bivariate normal distribution with means $\mu_1, \mu_2$, variances $\sigma_1^2, \sigma_2^2$, and correlation $\rho$. When forming an $r \times c$ contingency table, the cutpoints for the rows (columns) are the percentage points for which the marginal probabilities are equal for each row (column). As an example, Table 1 provides the $4 \times 4$ tables formed using three cutpoints for each of the row and column variables, with the correlation $\rho$ increased in equal increments. By applying the measures $V^2_{t(f)}$ ($t = 1, 2, 3$), setting $f$ to the power-divergence for several values of $\lambda$ and to the $\theta$-divergence for several values of $\theta$, we consider the degree to which the measures reflect (i) the correlation and (ii) the number of rows and columns.
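The table construction can be reproduced numerically. The sketch below uses Monte Carlo sampling with sample quantiles as cutpoints, rather than exact bivariate normal cell probabilities; the function `normal_table` and its defaults are ours:

```python
import numpy as np

def normal_table(rho, r, c, n_samples=100_000, seed=0):
    """r x c probability table from a standard bivariate normal with correlation rho.

    Cutpoints are sample quantiles, so each row (column) has roughly equal
    marginal probability, mirroring the equal-probability cutpoints used here.
    """
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n_samples)
    row_cuts = np.quantile(x[:, 0], np.linspace(0, 1, r + 1)[1:-1])
    col_cuts = np.quantile(x[:, 1], np.linspace(0, 1, c + 1)[1:-1])
    table = np.zeros((r, c))
    np.add.at(table, (np.searchsorted(row_cuts, x[:, 0]),
                      np.searchsorted(col_cuts, x[:, 1])), 1)
    return table / n_samples
```

Feeding such tables to the measures of Section 2 lets one observe numerically how the association strengthens as $|\rho|$ grows.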

Table 2 presents the values of the measures for each of the tables in Table 1. Notably, the table formed with $\rho = 0$ has an independence structure, and all of the measures equal zero for it. Table 2 shows that as the correlation $\rho$ moves away from 0, the measures move closer to 1. Moreover, the measures equal 0 if and only if there is a structure of null association in the table, and equal 1 if and only if there is a structure of complete association.

Table 3 conveys the values of the measures $V^2_{1(f)}$, $V^2_{2(f)}$, and $V^2_{3(f)}$ (especially $V^2_{G(f)}$ and $V^2_{H(f)}$) in some artificial data. In Table 3, as the number of columns increases, $V^2_{1(f)}$ increases but $V^2_{2(f)}$ decreases for each $\lambda$ and $\theta$. On the other hand, for each $\lambda$ and $\theta$, as the number of rows increases, $V^2_{1(f)}$ decreases but $V^2_{2(f)}$ increases. The measure $V^2_{3(f)}$ combines both measures $V^2_{1(f)}$ and $V^2_{2(f)}$ to capture the extent to which knowledge of the value of one variable can help us predict the value of the other variable. Therefore, for each $\lambda$ and $\theta$, as the number of rows or columns increases, $V^2_{G(f)}$ and $V^2_{H(f)}$ decrease, and the values remain the same even if the numbers of rows and columns are interchanged.

## 5 Examples

In this section, some relevant examples are provided. We apply the measures with $f$ set to the power-divergence for several values of $\lambda$ and to the $\theta$-divergence for several values of $\theta$, and observe the estimates of the measures and their confidence intervals.

### Example 1

Consider the data in Table 4, taken from Andersen (1994). These are data from a Danish Welfare Study that describe the cross-classification of alcohol consumption and social rank. Alcohol consumption in the contingency table is grouped according to the number of "units" consumed per day, where a unit is typically a beer, half a bottle of wine, or 2 cl of 40% alcohol. By applying the measures, we consider to what degree the prediction of alcohol consumption can be improved when the social rank of an individual is known.

Table 5 shows the estimates of the measures, standard errors, and confidence intervals. The confidence intervals do not contain zero for any $\lambda$ and any $\theta$ considered. For example, one of the estimates indicates that the strength of association between alcohol consumption and social rank is estimated to be 0.015 times the complete association. Hence, when predicting the alcohol consumption of an individual, we can predict it better when the social rank is known than when it is not. Table 5 also gives the combined measures, which consider the case where knowledge regarding only one of alcohol consumption or social rank is available. From these estimates for each $\lambda$ and $\theta$, we can predict better when we know this information than when we do not.

### Example 2

Consider the data in Table 6, taken from Read and Cressie (1988). These are data on 4831 car accidents, cross-classified according to accident type and accident severity. By applying the measures, we consider the degree to which the prediction of accident severity can be improved when the accident type is known.

Table 7 gives the estimates of the measures, standard errors, and confidence intervals. Likewise, the confidence intervals do not contain zero for any $\lambda$ and any $\theta$ considered. For instance, one of the estimates indicates that the strength of the association between accident type and accident severity is estimated to be 0.060 times the complete association. Therefore, when predicting the accident severity of an individual, we can predict it better when the accident type is known than when it is not. Table 7 also gives the combined measures, which consider the case where knowledge regarding only one of accident type or accident severity is available. From these estimates for each $\lambda$ and $\theta$, we can predict better when we know the information of the other variable than when we do not.

When we compare the strength of association in the data of Tables 4 and 6 by using the measures, we can see that the degree of association is greater for Table 6 than for Table 4, because the values in the confidence intervals are greater for Table 6 than for Table 4.

### Example 3

Consider the data in Table 8, taken from Stuart (1955). This table provides information on the unaided distance vision of 7477 women aged 30 to 39 years and employed in Royal Ordnance factories in Britain from 1943 to 1946. As the right eye grade and the left eye grade have similar classifications, we apply the symmetric measure $V^2_{3(f)}$.

Table 9 gives the estimates of the measures, standard errors, and confidence intervals. The confidence intervals do not contain zero for any $\lambda$ and any $\theta$ considered. From the values of the estimated measures, the strength of the association between the right eye grade and the left eye grade is estimated to be 0.361 times the complete association. Therefore, when we want to predict a woman's right and left eye grades, we can predict them better when we use the joint distribution $\{p_{ij}\}$ than when we only utilize the marginal distributions $\{p_{i\cdot}\}$ and $\{p_{\cdot j}\}$.

## 6 Concluding remarks

In this paper, we have proposed generalizations of Cramér's coefficient via the $f$-divergence. The measures always range between 0 and 1, independent of the dimensions $r$ and $c$ and the sample size $n$. Thus, they are useful for comparing the strength of association between the row and column variables in several tables, and they play a critical role in checking the relative magnitude of the degree of association between the row and column variables against the degree of complete association. Specifically, $V^2_{1(f)}$ would be effective when the row and column variables are the response and explanatory variables, respectively, while $V^2_{3(f)}$ would be useful when explanatory and response variables are not defined.

Furthermore, to analyze the strength of association between the row and column variables, we first need to check whether independence holds by using a test statistic, such as Pearson's chi-squared statistic. Then, if it is determined that there is a structure of association, the next step would be to measure the strength of the association by using the proposed measures. However, if it is determined that the table is independent, employing the measures may not be meaningful. Furthermore, the measures are invariant under any permutation of the categories. Therefore, we can apply them to the analysis of data on a nominal or ordinal scale.

We observe that (i) the estimate of the strength of association needs to be considered in terms of an approximate confidence interval for the measure rather than the point estimate itself; and (ii) the measures help to describe relative magnitudes of the strength of association rather than absolute magnitudes.

## Acknowledgments

This work was supported by JSPS Grant-in-Aid for Scientific Research (C) Number JP20K03756.

## References

• Agresti (2003) Agresti, A. (2003). Categorical data analysis. John Wiley & Sons.
• Andersen (1994) Andersen, E. B. (1994). The statistical analysis of categorical data. Springer Science & Business Media.
• Bishop et al. (2007) Bishop, Y. M., Fienberg, S. E., and Holland, P. W. (2007). Discrete multivariate analysis: theory and practice. Springer Science & Business Media.