1 Introduction
Contingency tables and their analysis are critical in a myriad of fields, such as medicine, psychology, education, and the social sciences. Accordingly, many coefficients have been proposed to measure the strength of association between the row and column variables, namely, to measure the degree of departure from independence. Pearson's coefficient of mean square contingency $\phi^2$ and coefficient of contingency $P$ serve as prime examples. For an $r \times c$ contingency table, let $p_{ij}$ denote the probability that an observation falls in the $i$th row and $j$th column of the table ($i = 1, \dots, r$; $j = 1, \dots, c$). The measure $\phi^2$ is defined by
$$\phi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(p_{ij} - p_{i \cdot} p_{\cdot j})^2}{p_{i \cdot} p_{\cdot j}},$$
where $p_{i \cdot} = \sum_{j=1}^{c} p_{ij}$ and $p_{\cdot j} = \sum_{i=1}^{r} p_{ij}$. The measure $P$, defined by
$$P = \left( \frac{\phi^2}{1 + \phi^2} \right)^{1/2},$$
lies between zero and one but cannot always attain the upper limit of one. Furthermore, the maximum value of $P$ depends on the number of rows and columns in the table under complete association. Tschuprow's coefficient $T$ (Tschuprow, 1925, 1939) and Cramér's coefficient $V$ (Cramér, 1946) were introduced to avoid these limitations of $P$ (see, e.g., Bishop et al., 2007; Everitt, 1992; Agresti, 2003). Tschuprow's coefficient is defined by
$$T = \left( \frac{\phi^2}{\sqrt{(r-1)(c-1)}} \right)^{1/2},$$
attaining a value of one in the case of complete association in an $r \times r$ table but unable to do so when $r \neq c$. Cramér's coefficient is defined by
$$V = \left( \frac{\phi^2}{\min(r-1, c-1)} \right)^{1/2},$$
attaining a value of one for all values of $r$ and $c$ in the case of complete association. Notably, $T \leq V$.
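As a computational illustration (not part of the original derivation), the coefficients above can be evaluated directly from a joint probability table. In the following sketch, the function name and the nested-list table layout are our own choices:

```python
import math

def association_coefficients(p):
    """Return (phi^2, P, T, V) for a joint probability table p[i][j]."""
    r, c = len(p), len(p[0])
    row = [sum(p[i]) for i in range(r)]                       # p_{i.}
    col = [sum(p[i][j] for i in range(r)) for j in range(c)]  # p_{.j}
    # Pearson's coefficient of mean square contingency phi^2
    phi2 = sum((p[i][j] - row[i] * col[j]) ** 2 / (row[i] * col[j])
               for i in range(r) for j in range(c))
    P = math.sqrt(phi2 / (1.0 + phi2))                    # contingency coefficient
    T = math.sqrt(phi2 / math.sqrt((r - 1) * (c - 1)))    # Tschuprow's T
    V = math.sqrt(phi2 / min(r - 1, c - 1))               # Cramér's V
    return phi2, P, T, V

# Independence: all coefficients are zero.
print(association_coefficients([[0.25, 0.25], [0.25, 0.25]]))  # (0.0, 0.0, 0.0, 0.0)

# Complete association on the diagonal of a 2x2 table.
print(association_coefficients([[0.5, 0.0], [0.0, 0.5]])[1:])  # P ≈ 0.707, T = 1.0, V = 1.0
```

The diagonal table illustrates the limitation discussed above: $T = V = 1$ under complete association, while $P = \sqrt{1/2} \approx 0.707$ cannot reach one.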
Tomizawa et al. (2004) considered some generalizations of Cramér's coefficient via the power-divergence (Cressie and Read, 1984; Read and Cressie, 1988). Consider the response variable and the explanatory variable as the row and column variables, respectively. Tomizawa et al. (2004) proposed the following measure, where
the value at $\lambda = 0$ is assumed to be the continuous limit as $\lambda \to 0$. When $\lambda = 0$, this measure is expressed as:
This measure is identical to Cramér's coefficient when with . Furthermore, is the power-divergence between and (see Cressie and Read, 1984; Read and Cressie, 1988, for details). The measure lies between zero and one for . When , the table has a structure of null association (i.e., ), and when , it has a structure of complete association (i.e., for each column , there uniquely exists such that and for all other , or for all ). For each , a value of one signifies that the row category of an individual can be predicted perfectly when the column category is known. Conversely, a value of zero indicates that the row marginal distribution is identical to the conditional row marginal distribution given the value of the column category. Hence, the measure indicates how much the prediction of the individual row category could be improved if knowledge about the column category is available.
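As a brief illustration of the power-divergence family underlying these measures, the following sketch evaluates the Cressie-Read power-divergence between two discrete distributions; the function name is ours, and the $\lambda = 0$ case is handled as the Kullback-Leibler limit:

```python
import math

def power_divergence(p, q, lam):
    """Cressie-Read power-divergence I_lam(p; q) between discrete distributions."""
    if lam == 0:
        # Continuous limit as lam -> 0: the Kullback-Leibler divergence.
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return sum(pi * ((pi / qi) ** lam - 1.0)
               for pi, qi in zip(p, q) if pi > 0) / (lam * (lam + 1.0))

p = [0.2, 0.3, 0.5]
q = [1 / 3] * 3
print(power_divergence(p, q, 0.0))  # Kullback-Leibler divergence
print(power_divergence(p, q, 1.0))  # half of Pearson's chi-squared divergence
```

Setting $\lambda = 1$ recovers one half of Pearson's chi-squared divergence, which is the connection to Cramér's coefficient noted in the text.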
Meanwhile, consider the case in which the row and column variables are the explanatory and response variables, respectively. Tomizawa et al. (2004) proposed the following measure
where
Notably, this measure is identical to Cramér's coefficient when with . Moreover, for , it highlights how much the prediction of the individual column category could be improved if knowledge about the row category is available.
When the explanatory and response variables are undefined, Miyamoto et al. (2007) proposed the following measure by combining the ideas of both measures and :
where is a monotonic function. When and ,
it is identical to the harmonic mean and the geometric mean of
and , respectively. Accordingly, when with , the measure is consistent with Cramér's coefficient . As and , the measure lies in the interval . When predicting the values of an individual's categories, it indicates how much the prediction could be improved if knowledge about the value of one variable is available. We are now interested in further generalizing these measures using the $f$-divergence. The rest of this paper is organized as follows. Section 2 proposes a new measure describing the strength of association for two-way contingency tables. Section 3
presents the approximate confidence intervals of the proposed measures, and Section 4
demonstrates the relationships between the measures and two-way contingency tables formed from a bivariate normal distribution. Lastly, Section 5
presents the numerical examples.

2 Generalized measure
The $f$-divergence between $\{p_{ij}\}$ and $\{q_{ij}\}$ is defined as:
$$I_f(\{p_{ij}\}; \{q_{ij}\}) = \sum_{i} \sum_{j} q_{ij} f\!\left(\frac{p_{ij}}{q_{ij}}\right),$$
where $f$ is a convex function on $(0, \infty)$ with $f(1) = 0$, $\lim_{t \to 0} f(t) = f(0)$, $0 f(0/0) = 0$, and $0 f(a/0) = a \lim_{t \to \infty} f(t)/t$ (Csiszár and Shields, 2004). Assume $f$ is a once-differentiable and strictly convex function.
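The definition above translates directly into code. The following sketch (function name ours; cells with $q_j = 0$ are skipped for simplicity rather than handled by the limiting conventions) evaluates $I_f$ for a user-supplied convex $f$:

```python
import math

def f_divergence(p, q, f):
    """I_f(p; q) = sum_j q_j * f(p_j / q_j) for discrete distributions."""
    # Terms with q_j = 0 would be handled by the limiting conventions in the
    # text; this sketch simply skips them.
    return sum(qj * f(pj / qj) for pj, qj in zip(p, q) if qj > 0)

p, q = [0.2, 0.8], [0.5, 0.5]
print(f_divergence(p, q, lambda t: t * math.log(t)))  # Kullback-Leibler divergence
print(f_divergence(p, q, lambda t: (t - 1.0) ** 2))   # Pearson divergence: 0.36
```

Different choices of the convex function $f$ recover the familiar special cases: $f(t) = t \log t$ gives the Kullback-Leibler divergence and $f(t) = (t-1)^2$ gives Pearson's divergence.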
Consider an $r \times c$ contingency table. We assume that when , and when . Measures that present the strength of association between the row and column variables are proposed in three cases: (i) when the row and column variables are response and explanatory variables, respectively; (ii) when they are explanatory and response variables, respectively; and (iii) when response and explanatory variables are undefined. We define measures for the asymmetric situation (cases (i) and (ii)) and for the symmetric situation (case (iii)).
2.1 Case 1
For the asymmetric situation wherein the column variable is the explanatory variable and the row variable is the response variable, the following measure presenting the strength of association between the row and column variables is proposed:
where
Next, the following theorem for the measure is obtained.
Theorem 1.
For each convex function ,


if and only if there is a structure of null association in the table (i.e., ).

if and only if there is a structure of complete association. For each column , there uniquely exists (note that , , may hold when ) such that and for all other (or for all ).
Before proving the theorem for the measure , the following lemma for the proof is introduced.
Lemma 1.
Let $f$ be a strictly convex function on $(0, \infty)$ and
(1) 
Then, is a strictly monotonically increasing function.
The proof of Lemma 1 is provided in the appendix.
Proof of Theorem 1.
The $f$-divergence is first transformed as follows:
where is given by (1). From Lemma 1, since is a strictly monotonically increasing function, it holds that
Furthermore, from Jensen’s inequality, we have:
(2) 
Hence, is obtained.
Next, follows from the property of the $f$-divergence if there is a structure of null association (i.e., ). When , it holds that:
Since the equality in (2) holds, we have for all . Thus, we obtain from the properties of the $f$-divergence.
Finally, if there uniquely exists for each column such that and for all other , the measure can be expressed as:
Contrariwise, when , we have:
From Lemma 1, as is a strictly monotonically increasing function, the equality is satisfied if, for each column, there is only one such that and for all other . ∎
Similar to the interpretation of the measure , this measure indicates the degree to which the prediction of the row category of an individual may be improved if knowledge regarding the column category of the individual is available. In this sense, the measure shows the strength of association between the row and column variables. Examples of the $f$-divergence are given below. When $f(t) = t \log t$,
it is identical to the Kullback-Leibler divergence:
When $f(t) = (t - 1)^2$, Pearson's divergence is derived, and the measure is identical to Cramér's coefficient ; with , it is identical to the power-divergence measure
and is identical to . When for , it is identical to the divergence (Ichimori, 2013).
This measure is also one of the generalizations of Cramér's coefficient and is identical to Pearson's coefficient when .
2.2 Case 2
For the asymmetric situation wherein the row variable is the explanatory variable and the column variable is the response variable, we propose the following measure, which presents the strength of association between the row and column variables:
where
Then, the following theorem is obtained for the measure .
Theorem 2.
For each convex function ,


if and only if there is a structure of null association in the table (i.e., ).

if and only if there is a structure of complete association; namely, for each row , there uniquely exists (note that , , may hold when ) such that and for all other (or for all ).
The proof of Theorem 2 is similar to the proof of Theorem 1. Note that when with , is identical to Cramér's coefficient , while for , is consistent with .
Similar to the interpretation of the measure , the measure specifies the degree to which prediction of the column category of an individual may be improved if knowledge regarding the row category of the individual is available. Accordingly, the measure relays the strength of association between the row and column variables.
2.3 Case 3
In an $r \times c$ contingency table wherein explanatory and response variables are undefined, using measures and is inappropriate if we are interested in knowing the degree to which knowledge about the value of one variable can help us predict the value of the other variable. For such a symmetric situation, the following measure is proposed by combining the ideas of both measures and :
where is a monotonic function and . Then, the following theorem is attained for the measure .
Theorem 3.
For each convex function ,


if and only if there is a structure of null association in the table (i.e., ).

if and only if there is a structure of complete association; namely, (i) when , for each row there uniquely exists (where , ) such that and for all other (or for all ), and (ii) when , for each column there uniquely exists (where , ) such that and for all other (or for all ).
Proof of Theorem 3.
First, consider the weighted average of and :
From this relation, we can show that:
We can show that if and , the measure is denoted by
and if and , the measure is represented by
Notably, and are the geometric mean and harmonic mean of and , respectively. We confirm that when with , is identical to Cramér's coefficient , while for , is consistent with .
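The geometric- and harmonic-mean combinations of the two asymmetric measures can be sketched as follows; the function and argument names are illustrative, not from the original text:

```python
import math

def combined_measure(v_row, v_col, kind="geometric"):
    """Symmetric combination of the two asymmetric association measures."""
    if kind == "geometric":
        return math.sqrt(v_row * v_col)
    if kind == "harmonic":
        if v_row + v_col == 0:
            return 0.0  # both asymmetric measures vanish under null association
        return 2.0 * v_row * v_col / (v_row + v_col)
    raise ValueError(f"unknown kind: {kind!r}")

print(combined_measure(0.4, 0.9))              # geometric mean = 0.6
print(combined_measure(0.4, 0.9, "harmonic"))
```

Both combinations agree with the common value when the two asymmetric measures coincide, and both stay in $[0, 1]$ when their arguments do, consistent with the properties stated above.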
For an $r \times r$ square contingency table with the same row and column classifications, if and only if the main diagonal cell probabilities in the table are nonzero and the off-diagonal cell probabilities are all zero after interchanging some row and column categories; that is, all observations concentrate on the main diagonal cells. When predicting the values of the categories of an individual, specifies the degree to which the prediction could be improved if knowledge about the value of one variable is available. In this sense, the measure also indicates the strength of association between the row and column variables. If only the marginal distributions and are known, we consider predicting the values of the individual row and column categories in terms of probabilities with independent structures.
Theorem 4.
For any fixed convex functions and monotonic functions ,



Proof of Theorem 4.
When with , we see that (being Cramér’s coefficient).
3 Approximate confidence intervals for measure
Let denote the observed frequency from a multinomial distribution, and let $n$ denote the total number of observations, namely, . The approximate standard error and large-sample confidence interval for are obtained using the delta method, which is described in, e.g., Agresti (2003) and Bishop et al. (2007). The estimator of
(i.e., ) is given by with replaced by , where . By the delta method, has an asymptotically normal distribution (i.e., as $n \to \infty$) with mean and variance
. Refer to the Appendix for the values of . We define $f$ as once-differentiable and strictly convex, and $f'(x)$ as the derivative of $f$ with respect to $x$. Let be with replaced by
. Then, an estimated standard error of
is , and an approximate $100(1 - \alpha)$ percent confidence interval of is , where $z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the standard normal distribution.

4 Numerical study
In this section, numerical studies based on a bivariate normal distribution are presented. Consider an underlying bivariate normal distribution with means , variances , and correlation . When forming the contingency table, the cut points are the ( ) percentage points at which the probabilities are equal for each row (column). As an example, Table 1 provides the tables formed using three cut points for each of the row and column variables, , , and with increasing in increments of from to . By applying the measures ( ), setting the power-divergence for any and the $f$-divergence for any , we consider the degree to which the relationship with (i) the correlation and (ii) the number of rows and columns is captured.
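The table construction described above can be sketched by Monte Carlo simulation under the stated assumptions (zero means, unit variances, equal-probability cut points); the function name, sample size, and seed are our own choices:

```python
import math
import random
from statistics import NormalDist

def simulated_table(r, c, rho, n=100_000, seed=0):
    """Sample an r x c table of counts from a discretized bivariate normal."""
    rng = random.Random(seed)
    nd = NormalDist()
    # Equal-probability cut points: standard normal quantiles at k/r and k/c.
    row_cuts = [nd.inv_cdf(k / r) for k in range(1, r)]
    col_cuts = [nd.inv_cdf(k / c) for k in range(1, c)]
    table = [[0] * c for _ in range(r)]
    for _ in range(n):
        # (X, Y) bivariate normal with zero means, unit variances, correlation rho.
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        i = sum(x > t for t in row_cuts)   # row category of x
        j = sum(y > t for t in col_cuts)   # column category of y
        table[i][j] += 1
    return table

table = simulated_table(4, 4, 0.6, n=20_000)
print(sum(map(sum, table)))  # 20000
```

Dividing each cell by $n$ gives an empirical probability table to which the measures can then be applied.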
Table 2 presents the values of the measures for each of the tables in Table 1. Notably, the artificial contingency table is formed by ; thus, . Table 2 shows that as the correlation moves away from 0, the measures ( ) approach . Moreover, if and only if the measures show that there is a structure of null association in the table, and if and only if the measures confirm that there is a structure of complete association.
Table 3 reports the values of the measures , , and (especially and ) for some artificial data with . In Table 3, as the number of columns increases, increases but decreases for each and . On the other hand, for each and , as the number of rows increases, decreases but increases. The measure combines both measures and to capture the extent to which knowledge of the value of one variable can help us predict the value of the other. Therefore, for each and , as the number of rows or columns increases, and decrease, and the values remain the same even if the numbers of rows and columns are interchanged.
5 Examples
In this section, some relevant examples are provided. We apply the measures, setting the power-divergence for any and the $f$-divergence for any . Let us observe the estimates of the measures and their confidence intervals.
Example 1
Consider the data in Table 4, taken from Andersen (1994). These are data from a Danish Welfare Study that describe the cross-classification of alcohol consumption and social rank. Alcohol consumption in the contingency table is grouped according to the number of "units" consumed per day, where a unit is typically a beer, half a bottle of wine, or 2 cl of 40% alcohol. By applying the measures , we consider to what degree the prediction of alcohol consumption can be improved when the social rank of an individual is known.
Table 5 shows the estimates of the measures, their standard errors, and confidence intervals. The confidence intervals for all do not contain zero for any and any . For example, when and , the estimate indicates that the strength of association between alcohol consumption and social rank is 0.015 times the complete association. Hence, when predicting the alcohol consumption of an individual whose social rank is known, we can predict it better than when the social rank is unknown. Table 5 also gives , which considers the case where knowledge regarding only one of alcohol consumption or social rank is available. From for and , we can predict better when we know this information than when we do not.
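The interval construction used in these examples (estimate plus or minus a normal quantile times the standard error, as described in Section 3) can be sketched as follows; the function name is illustrative:

```python
from statistics import NormalDist

def wald_interval(estimate, std_error, alpha=0.05):
    """Approximate 100(1 - alpha)% confidence interval: estimate +/- z * SE."""
    # z is the upper alpha/2 percentage point of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * std_error, estimate + z * std_error

low, high = wald_interval(0.30, 0.04)
print(round(low, 4), round(high, 4))  # 0.2216 0.3784
```

An interval that excludes zero, as in Table 5, is what supports the conclusion that the association is present.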
Example 2
Consider the data in Table 6, taken from Read and Cressie (1988). These are data on 4831 car accidents, cross-classified according to accident type and accident severity. By applying the measures , we consider the degree to which the prediction of accident severity can be improved when the accident type is known.

Table 7 gives the estimates of the measures, standard errors, and confidence intervals. Likewise, the confidence intervals for all do not contain zero for any and any . For instance, when and , the estimate indicates that the strength of the association between accident type and accident severity is 0.060 times the complete association. Therefore, when predicting the severity of an accident whose type is known, we can predict it better than when the type is unknown. Table 7 also gives , which considers the case where knowledge regarding only one of accident type or accident severity is available. From for and , we can predict better when we know the information of the other than when we do not.
Example 3
Consider the data in Table 8 taken from Stuart (1955). This table provides information on the unaided distance vision of 7477 women aged 30 to 39 years and employed in Royal Ordnance factories in Britain from 1943 to 1946. As the right eye grade and the left eye grade have similar classifications, we apply the measure .
Table 9 gives the estimates of the measures, standard errors, and confidence intervals. The confidence intervals for all do not contain zero for any and any . From the values of the estimated measures, when and , the strength of the association between the right eye grade and the left eye grade is estimated to be 0.361 times the complete association. Therefore, when we want to predict a woman's right and left eye grades, we can predict them better when we use the full joint distribution than when we only utilize the marginal distributions and .
6 Concluding remarks
In this paper, we have proposed generalizations of Cramér's coefficient via the $f$-divergence. The measures always range between and , independent of the dimensions $r$ and $c$ and the sample size $n$. Thus, they are useful for comparing the strength of association between the row and column variables across several tables. They play a critical role in checking the relative magnitude of the degree of association between the row and column variables against the degree of complete association. Specifically, would be effective when the row and column variables are the response and explanatory variables, respectively, while would be useful when explanatory and response variables are not defined.
Furthermore, before analyzing the strength of association between the row and column variables, we first need to check whether independence holds by using a test statistic, such as Pearson's chi-squared statistic
. Then, if it is determined that there is a structure of association, the next step would be to measure its strength by using . However, if it is determined that the table is independent, employing may not be meaningful. Furthermore, the measure is invariant under any permutation of the categories. Therefore, we can apply it to the analysis of data on a nominal or ordinal scale. We observe that (i) the estimate of the strength of association needs to be considered in terms of an approximate confidence interval for the measure rather than the point estimate itself, and (ii) the measure helps to describe relative magnitudes (of the strength of association) rather than absolute magnitudes.
Acknowledgments
This work was supported by JSPS Grant-in-Aid for Scientific Research (C) Number JP20K03756.
References
 Agresti (2003) Agresti, A. (2003). Categorical data analysis. John Wiley & Sons.
 Andersen (1994) Andersen, E. B. (1994). The statistical analysis of categorical data. Springer Science & Business Media.

 Bishop et al. (2007) Bishop, Y. M., Fienberg, S. E., and Holland, P. W. (2007). Discrete multivariate analysis: theory and practice. Springer Science & Business Media.
 Cramér (1946) Cra