In this essay we comment on Collins' (2002, 2011) statement "correspondence analysis makes you blind". Collins made this statement twice, in replies to Whitlark and Smith (2001) and to Bock (2011a), who had analyzed two different brand image count data sets by correspondence analysis (CA). In their replies to Collins (2002, 2011), both Whitlark and Smith (2002) and Bock (2011b) accepted without hesitation the insightful and original observations described by Collins. Beh and Lombardo (2014, pp. 131-132) provide a cursory report on these dialogues.
|Table 1: WS brand-attribute count table.|
|Table 2: Seriated WS brand-attribute data structure uncovered by Collins.|
1.1 WS brand-attribute count data
Table 1, from Whitlark and Smith (2001), shows brand-attribute count data. Whitlark and Smith (2001) analyzed it by CA, obtaining a map, Figure 1, very similar to the taxicab CA (TCA) map in Figure 2; they interpreted Figure 1 using the adjusted chi-square residuals, arguing that these correct "inaccuracies introduced by the dimensionality problem". TCA is an $\ell_1$ variant of CA, see Choulakian (2006). The interpretation of Figures 1 and 2 does not correspond to the simple seriated structure uncovered by Collins (2002) and given in Table 2, where the rows and columns of Table 1 are permuted according to the row and column averages in descending order. Collins interpreted the seriated structure in Table 2 as:
"a relief map of a plain sloping steeply down from north (FedEx) to south (Brand C) and less steeply from west (Trusted) to east (Essential). On the plain would be a molehill (Nokia is more innovative). But isn't the table easy enough?"
Note that this simple structure uncovered by Collins is not found in Figures 1 and 2. Why? Lemma 4 provides a partial answer: the brands and the attributes are not independent, because of the existence of multiple 'molehills', the most important being (Nokia is more innovative), as mentioned by Collins. The molehills (in bold in Table 2) represent positive associations, and can be analyzed and visualized in complementary form in Figures 1 and 4 or in Figures 2 and 3. Figures 3 and 4 are the TCA and CA contribution plots, which help us interpret Figures 2 and 1, respectively. This is a summary of the paper.
1.2 Issues raised by Collins
Collins (2002) summarized his arguments against the use of CA in the Whitlark and Smith (2001) paper as follows:
"The popularity of CA and other techniques seems to arise from two feelings: The data are too complex to be handled by the human brain, and pictures communicate more than tables. In fact, the analysis of brand 'image' data like that given by the authors is easy, and quantitative patterns are best communicated through simple tables supported by words. A picture like the correspondence 'map' shown by W&S may say something qualitative about patterns in the data, but it says nothing quantitative or testable. Not even, for example, that the two dimensions of the map account for nearly 90% of the variance of the data, as reported in the text."
The highlighted parts are ours. There are two main points raised by Collins.
First, the qualitative patterns in the CA map of Figure 1 do not reflect the quantitative patterns in Table 2. Friendly and Kwan (2011) divided statisticians (data analysts) into two categories: graph people and table people. Clearly Collins belongs to the table people. In this paper, we will show that both tables and maps are needed, for global and for local analysis. Additionally, in section 2 we discuss the important notions of quality and quantity in data analysis, based on Benzécri's reflections.
Second, a CA map is difficult to interpret, and so is its relation to the percentage of inertia (variance) explained. Here we have to discuss two interrelated issues in CA. a) The first concerns the interpretation of CA maps, which, as we mentioned, was raised by Whitlark and Smith (2001), then clearly stated by Bock (2011b, pp. 587–588), and further discussed by Bock (2017) in an R blog post titled "How to Interpret Correspondence Analysis Plots (It Probably Isn't the Way You Think)", which discusses nine complex issues in the interpretation of CA maps. Beh and Lombardo (2014, p.132) summarize it as, in CA, "the inability of the principal coordinate to provide a meaningful interpretation of the distance between a row and column point in these plots". Greenacre (2013) also proposed contribution biplots to tackle this issue. b) In this paper, we distinguish between the intrinsic quality and the extrinsic quality of a principal dimension, and introduce indices that quantify the intrinsic quality of a principal axis. In CA maps, the "variance accounted for" reflects the extrinsic quality, whereas the intrinsic quality of a principal dimension examines the four quadrants of the residual cross-covariance matrix. We tackle these issues by considering CA and TCA as a stepwise Hotelling/Tucker decomposition of the residual cross-covariance matrix of the row and column categories into four quadrants.
1.3 Hotelling and Tucker decompositions
In mathematics, a data set $X = (x_{ij})$ for $i = 1, \dots, I$ and $j = 1, \dots, J$ can be interpreted as three kinds of mapping, see Benzécri (1973, p.56) and Choulakian (2016a): first, as a linear mapping $X: \mathbb{R}^J \to \mathbb{R}^I$; second, as a linear mapping $X': \mathbb{R}^I \to \mathbb{R}^J$; third, as a bilinear mapping $X: \mathbb{R}^I \times \mathbb{R}^J \to \mathbb{R}$, $(v, u) \mapsto v'Xu$. Hotelling's (1933) principal components analysis (PCA) is developed within the first two settings, while Hotelling's (1936) canonical correlation analysis is developed within the third setting, where $X$ represents a cross-covariance matrix. Benzécri (1973) emphasized the development of CA within the first two settings, as a weighted PCA method. In this paper, we shall emphasize the third setting.
It is well known that CA is a particular kind of Hotelling's canonical correlation analysis, see for instance Goodman (1991), where the two sets of variables are the indicator variables of the categories of the two nominal variables. Another method, similar in perspective to canonical correlation analysis, is Tucker's (1958) interbattery analysis, which maximizes the covariance between linear combinations of the two sets of variables (here, the two indicator sets). When CA did not produce interpretable maps of contingency tables, Tenenhaus and Augendre (1996) proposed Tucker's interbattery analysis as an alternative to CA.
The parameters in Hotelling's canonical correlation and Tucker's covariance analyses are generally estimated by singular value decomposition (SVD). When we estimate the parameters by the taxicab SVD (TSVD) introduced by Choulakian (2006) in place of SVD, we notice, surprisingly, that these two analyses complement each other, because they are linearly related, see equation (17). For further details, see Choulakian, Simonetti and Gia (2014).
Figure 3, named the TCov map, displays the taxicab interbattery analysis map of the WS data. Figures 2 and 3 (TCA and TCov maps) are different, as are Figures 1 and 4 (CA and CA contribution maps); thus they provide different, and sometimes confusing, information (for instance, observe the positions of the brands B, E and F in Figures 2 and 3). One of the major novelties of this paper is that we interpret Figure 3, the taxicab interbattery analysis TCov map, as a TCA contribution map, and consequently we provide a new perspective on the interpretation of the associated TCA map, Figure 2, via Lemma 6. For the interpretation of the row and column labels on the TCov map, we use a quantification of the intrinsic quality of a principal dimension, named the quality of signs of residuals (QSR) index, complemented by a look at the seriated residual covariance matrix. We then extend these ideas to CA, where we also discuss sparse contingency tables having a quasi-two-block diagonal structure, which, according to Benzécri (1973, p.188-190), is quite common. Greenacre (2013) introduced and discussed CA contribution biplots, but did not relate them to CA maps; Lemma 6 accomplishes this task.
This paper is organized as follows: Section 2 sketches Benzécri’s reflections on quality and quantity in data analysis. Section 3 presents an overview of taxicab singular value decomposition (TSVD); section 4 presents preliminaries. In section 5 we develop the main subject matter, the quantification of the intrinsic quality of a principal dimension in CA and TCA. Section 6 presents applications. Finally we conclude in section 7.
2 Quantity and Quality
Benzécri (1982, 1988) wrote two papers on quality and quantity; in the second, he discussed the relationship between quantity and quality historically, starting with Aristotle and finishing with his own description within the philosophy of data analysis, that is, the CA framework. Here we quote from Benzécri (1988, section 1.7):
"For data analysis, we retain first, following Aristotle, that 'the proper character of quantity is that one can attribute to it the equal and the unequal', whereas 'similar or dissimilar is said only of qualities'. From this point of view, a multidimensional description is always qualitative, even if it comprises only precise numerical variables, because the multiplicity of possible descriptions is such that one will never encounter equal ones, but only similar ones."
The following two definitions and the corollary provide a succinct summary of the quote.
Definition 1 (Aristotle on quantity): X is a quantitative variable if, given two realizations $x_1$ and $x_2$ of X, either $x_1 = x_2$ or $x_1 \neq x_2$.
Definition 2 (Aristotle on quality): X is a qualitative variable if, given two realizations $x_1$ and $x_2$ of X, either $x_1$ is similar to $x_2$ or $x_1$ is dissimilar to $x_2$.
Corollary 3 (Benzécri): Any multidimensional description is always qualitative, even though its components are precisely numerical.
Similar ideas are also expressed in a foreword by Benzécri in Murtagh (2005).
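Benzécri's schematic conceptual formulation of data analysis is the following directed diagram:
$$\text{Quality}_{\text{data}} \longrightarrow \text{Quantity}_{\text{factors}} \longrightarrow \text{Quality}_{\text{clusters}}$$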
The first step, $\text{Quality}_{\text{data}} \to \text{Quantity}_{\text{factors}}$, is done by dimension reduction. Each factor (latent variable) is quantitative in nature, and there is almost always more than one factor. Even though each latent variable is quantitative, its interpretation is qualitative. According to Benzécri (1988, section 1.3, in comments on Descartes): "Toute qualité n'est que l'expression d'un rapport de quantités"; that is, quality is the expression of a ratio of quantities. For the interpretation of a principal dimension, we apply the Aristotle/Benzécri principle. In his book PHYSICS, Aristotle stated that "principles are contraries", citing as examples, taken from his predecessors, "hot and cold", "the rare and the dense" and "plenum and void", see Aristotle (1960, p.14). In CA, Benzécri (1973, p.227), following Aristotle, based the interpretation of a principal dimension on contraries (dichotomies, oppositions) and gradations, where an opposition or a gradation represents a latent variable. In another context, Choulakian (2014, 2016b) used Euclid's principle of contradiction for the interpretation of the first principal dimension in the analysis of rank data.
The second step, $\text{Quantity}_{\text{factors}} \to \text{Quality}_{\text{clusters}}$, is done by usual methods such as k-means.
Murtagh (2005, section 1.1) described Benzécri's paradigm as "a tale of three metrics", which clearly characterizes the diagram: the chi-squared and the Euclidean metrics are used in the first step, and ultrametrics in the second step. A description similar to the above diagram is also given by De Leeuw (2005). This is also reflected in the first printed work, Benzécri (1973), titled DATA ANALYSIS, which is composed of two volumes: the first volume's subtitle is La Taxonomie, and the second volume's subtitle is Analyse des Correspondances.
3 An overview of taxicab singular value decomposition
Consider a matrix $X$ of size $I \times J$ with $k = \operatorname{rank}(X)$. Taxicab singular value decomposition (TSVD) of $X$ is a decomposition similar to the SVD of $X$; see Choulakian (2006, 2016a).
In TSVD, the dispersion measures $\lambda_\alpha$, principal axes $(u_\alpha, v_\alpha)$ and principal scores $(a_\alpha, b_\alpha)$, for $\alpha = 1, \dots, k$, are calculated in a stepwise manner. We put $X_1 = X$, and let $X_\alpha$ be the residual matrix at the $\alpha$-th iteration.
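The variational definitions of the TSVD at the $\alpha$-th iteration are
$$\lambda_\alpha = \max_{u \in \mathbb{R}^J} \frac{\|X_\alpha u\|_1}{\|u\|_\infty} = \max_{v \in \mathbb{R}^I} \frac{\|X_\alpha' v\|_1}{\|v\|_\infty} = \max_{u \in \mathbb{R}^J,\ v \in \mathbb{R}^I} \frac{v' X_\alpha u}{\|u\|_\infty \, \|v\|_\infty}. \qquad (1)$$
The $\alpha$-th principal axes are
$$u_\alpha = \arg\max_{u \in \{-1,+1\}^J} \|X_\alpha u\|_1 \quad \text{and} \quad v_\alpha = \arg\max_{v \in \{-1,+1\}^I} \|X_\alpha' v\|_1,$$
and the $\alpha$-th principal vectors are
$$a_\alpha = X_\alpha u_\alpha \quad \text{and} \quad b_\alpha = X_\alpha' v_\alpha. \qquad (2)$$
Furthermore, the following relations are also useful:
$$u_\alpha = \operatorname{sgn}(b_\alpha) \quad \text{and} \quad v_\alpha = \operatorname{sgn}(a_\alpha), \qquad (3)$$
where $\operatorname{sgn}(\cdot)$ is the coordinatewise sign function: $\operatorname{sgn}(x) = 1$ if $x > 0$ and $\operatorname{sgn}(x) = -1$ if $x \le 0$.
The $\alpha$-th taxicab dispersion measure $\lambda_\alpha$ can be represented in many different ways:
$$\lambda_\alpha = \|X_\alpha u_\alpha\|_1 = \|a_\alpha\|_1 = a_\alpha' v_\alpha = \|b_\alpha\|_1 = b_\alpha' u_\alpha = v_\alpha' X_\alpha u_\alpha. \qquad (4)$$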
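The $(\alpha+1)$-th residual correspondence matrix is
$$X_{\alpha+1} = X_\alpha - a_\alpha b_\alpha' / \lambda_\alpha. \qquad (5)$$
An interpretation of the term $a_\alpha b_\alpha' / \lambda_\alpha$ in (5) is that it represents the best rank-1 approximation of the residual correspondence matrix $X_\alpha$, in the sense of the taxicab norm.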
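Thus TSVD of $X$ corresponds to
$$X = \sum_{\alpha=1}^{k} a_\alpha b_\alpha' / \lambda_\alpha, \qquad (6)$$
a decomposition similar to SVD, but where the vectors $(a_\alpha, b_\alpha)$ for $\alpha = 1, \dots, k$ are conjugate, a property weaker than orthogonality. That is,
$$u_\alpha' b_\beta = 0 \quad \text{and} \quad v_\alpha' a_\beta = 0 \quad \text{for } 1 \le \alpha < \beta \le k. \qquad (7)$$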
In TSVD, the principal component weights, $u_\alpha$ and $v_\alpha$, and the principal scores, $a_\alpha$ and $b_\alpha$, can be computed by two algorithms. The first is based on complete enumeration, using equation (1). The second iterates the transition formulae (2, 3). This is an ascent algorithm; that is, it increases the value of the objective function at each iteration, see Choulakian (2006, 2016a). The iterative algorithm may converge to a local maximum, so it should be restarted from several initial configurations; the rows or the columns of the data can be used as starting values.
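The two algorithms can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not code from Choulakian (2006): the helper names `sgn`, `tsvd_step_enumeration`, `tsvd_step_ascent` and `tsvd` are hypothetical, the stopping rule (stop as soon as $\|X_\alpha u\|_1$ no longer increases) is our choice, and restarts use the rows of the current residual matrix as starting values. Complete enumeration visits all $2^J$ sign vectors, so it is only practical for small $J$.

```python
import itertools

import numpy as np


def sgn(x):
    """Coordinatewise sign with sgn(0) = -1, as in the text."""
    return np.where(x > 0, 1.0, -1.0)


def tsvd_step_enumeration(X):
    """lambda = max ||X u||_1 over u in {-1,+1}^J, by complete enumeration (1)."""
    best_lam, best_u = -np.inf, None
    for signs in itertools.product((-1.0, 1.0), repeat=X.shape[1]):
        u = np.array(signs)
        lam = np.abs(X @ u).sum()
        if lam > best_lam:
            best_lam, best_u = lam, u
    return best_lam, best_u


def tsvd_step_ascent(X, u0):
    """One TSVD step by iterating the transition formulae (2, 3) from u0."""
    u = sgn(u0)
    lam = np.abs(X @ u).sum()
    while True:
        v = sgn(X @ u)                 # v = sgn(a) with a = X u
        u_new = sgn(X.T @ v)           # u = sgn(b) with b = X' v
        lam_new = np.abs(X @ u_new).sum()
        if lam_new <= lam * (1 + 1e-12):   # no strict increase: stop
            break
        u, lam = u_new, lam_new
    a = X @ u
    v = sgn(a)
    b = X.T @ v
    return lam, u, v, a, b


def tsvd(X, k):
    """Up to k TSVD steps; each step restarts from every row of the residual."""
    X = np.asarray(X, dtype=float).copy()
    steps = []
    for _ in range(k):
        lam, u, v, a, b = max(
            (tsvd_step_ascent(X, row) for row in X), key=lambda s: s[0]
        )
        if lam <= 1e-12:               # residual is (numerically) zero
            break
        steps.append((lam, u, v, a, b))
        X = X - np.outer(a, b) / lam   # residual matrix (5)
    return steps
```

For instance, `tsvd(X, 2)` returns $(\lambda_\alpha, u_\alpha, v_\alpha, a_\alpha, b_\alpha)$ for the first two dimensions; on small tables, comparing its first $\lambda$ with `tsvd_step_enumeration(X)[0]` is a cheap way to detect a local maximum.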
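4 Preliminaries
Let $P = (p_{ij}) = N/n$ of size $I \times J$ be the correspondence matrix associated with a contingency table $N = (n_{ij})$, where $n = \sum_i \sum_j n_{ij}$. We define as usual $p_{i\cdot} = \sum_{j=1}^{J} p_{ij}$, $p_{\cdot j} = \sum_{i=1}^{I} p_{ij}$, the vector $r = (p_{i\cdot}) \in \mathbb{R}^I$, the vector $c = (p_{\cdot j}) \in \mathbb{R}^J$, the diagonal matrix $D_r = \operatorname{Diag}(r)$ having diagonal elements $p_{i\cdot}$, and similarly $D_c = \operatorname{Diag}(c)$. We suppose that $D_r$ and $D_c$ are positive definite metric matrices of size $I \times I$ and $J \times J$, respectively; this means that the diagonal elements of $D_r$ and $D_c$ are strictly positive. Let $X = P - rc'$ be the residual matrix with respect to the independence model. Its element $x_{ij} = p_{ij} - p_{i\cdot} p_{\cdot j}$ is the cross-covariance between the $i$th category of the row variable and the $j$th category of the column variable, and the independence model states that
$$p_{ij} - p_{i\cdot} p_{\cdot j} = 0 \quad \text{for } i = 1, \dots, I \text{ and } j = 1, \dots, J. \qquad (8)$$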
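The independence assumption can also be interpreted in another way as
$$\frac{p_{ij}}{p_{\cdot j}} = p_{i\cdot} \quad \text{and} \quad \frac{p_{ij}}{p_{i\cdot}} = p_{\cdot j}, \qquad (9)$$
which are the row and column homogeneity models, and which can be reexpressed as
$$\frac{p_{ij}}{p_{i\cdot}\, p_{\cdot j}} = 1. \qquad (10)$$
Benzécri (1973, p.31) named the vector $(p_{ij}/p_{\cdot j}$ for $i = 1, \dots, I$ and fixed $j)$ the profile of the $j$th column, and the element $p_{ij}/(p_{i\cdot} p_{\cdot j})$ the density function of the probability measure $(p_{ij})$ with respect to the product measure $(p_{i\cdot} p_{\cdot j})$. The element $p_{ij}/(p_{i\cdot} p_{\cdot j})$ is named the Pearson ratio in Goodman (1996) and Beh and Lombardo (2014, p.123).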
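As a small numerical illustration of these definitions, the sketch below computes $P$, $r$, $c$, the residual matrix $X = P - rc'$, the column profiles and the Pearson ratios. The counts are hypothetical (made up for illustration, not the WS data), and the helper `preliminaries` is our own name. For a table built as an outer product, independence holds exactly, so $X = 0$, every column profile equals $r$, and all Pearson ratios equal 1.

```python
import numpy as np


def preliminaries(N):
    """Correspondence matrix, margins, residuals, column profiles, Pearson ratios."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                  # p_ij = n_ij / n
    r = P.sum(axis=1)                # row masses p_i.
    c = P.sum(axis=0)                # column masses p_.j
    X = P - np.outer(r, c)           # x_ij = p_ij - p_i. p_.j
    col_profiles = P / c             # column j holds (p_ij / p_.j)_i
    pearson = P / np.outer(r, c)     # density p_ij / (p_i. p_.j)
    return P, r, c, X, col_profiles, pearson


# Hypothetical counts, for illustration only
N = np.array([[20, 10, 5, 5],
              [10, 15, 10, 5],
              [5, 10, 20, 15]])
P, r, c, X, col_profiles, pearson = preliminaries(N)

# An exactly independent table: outer product of two positive vectors
N_ind = np.outer([1, 2, 3], [4, 5, 6, 7])
_, r_i, _, X_i, prof_i, pearson_i = preliminaries(N_ind)
```

The residual matrix is always double-centered (its row and column sums vanish), a property used repeatedly in section 5.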
4.1 Estimation of the parameters by SVD
Suppose the independence assumption is not true; then each of the two equivalent model formulations (8) and (10) can be generalized to explain the nonindependence by adding $k$ bilinear terms, where $k = \operatorname{rank}(X)$.
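a) Cov (cross-covariance) decomposition:
$$p_{ij} - p_{i\cdot} p_{\cdot j} = \sum_{\alpha=1}^{k} a_\alpha(i)\, b_\alpha(j) / \sigma_\alpha. \qquad (11)$$
This is the interbattery analysis proposed by Tucker (1958). Tenenhaus and Augendre (1996) estimated the parameters in (11) by singular value decomposition (SVD) of the matrix $X = P - rc'$, $\sigma_\alpha$ being the $\alpha$th singular value of $X$. The parameters in (11) satisfy the following equations:
$$\sum_{i=1}^{I} a_\alpha(i) = 0 \quad \text{and} \quad \sum_{i=1}^{I} a_\alpha(i)\, a_\beta(i) = \begin{cases} \sigma_\alpha^2 & \text{if } \alpha = \beta, \\ 0 & \text{otherwise}, \end{cases} \qquad (12)$$
$$\sum_{j=1}^{J} b_\alpha(j) = 0 \quad \text{and} \quad \sum_{j=1}^{J} b_\alpha(j)\, b_\beta(j) = \begin{cases} \sigma_\alpha^2 & \text{if } \alpha = \beta, \\ 0 & \text{otherwise}. \end{cases} \qquad (13)$$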
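b) CA (correspondence analysis) decomposition:
$$p_{ij} = p_{i\cdot}\, p_{\cdot j} \Big(1 + \sum_{\alpha=1}^{k} f_\alpha(i)\, g_\alpha(j) / \sigma_\alpha\Big), \qquad (14)$$
where $\sigma_\alpha$ now denotes the $\alpha$th singular value of $D_r^{-1/2} X D_c^{-1/2}$. This decomposition has many interpretations. For data analysis purposes, Benzécri (1973) essentially interpreted it as a weighted principal components analysis of row and column profiles. Another useful interpretation, comparable to Tucker's interbattery analysis, is Hotelling's (1936) canonical correlation analysis, see Lancaster (1958) and Goodman (1991). The parameters in (14) satisfy the following equations:
$$\sum_{i=1}^{I} p_{i\cdot} f_\alpha(i) = 0 \quad \text{and} \quad \sum_{i=1}^{I} p_{i\cdot} f_\alpha(i)\, f_\beta(i) = \begin{cases} \sigma_\alpha^2 & \text{if } \alpha = \beta, \\ 0 & \text{otherwise}, \end{cases} \qquad (15)$$
$$\sum_{j=1}^{J} p_{\cdot j}\, g_\alpha(j) = 0 \quad \text{and} \quad \sum_{j=1}^{J} p_{\cdot j}\, g_\alpha(j)\, g_\beta(j) = \begin{cases} \sigma_\alpha^2 & \text{if } \alpha = \beta, \\ 0 & \text{otherwise}. \end{cases} \qquad (16)$$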
The above two decompositions, given in (11) and (14), are cross-covariance based. There are also association-based (log-ratio) decompositions; see Goodman (1991, 1996) or Greenacre and Lewi (2009).
4.2 Estimation of the parameters by TSVD
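First, we estimate the parameters $(a_\alpha, b_\alpha)$ in (11) by TSVD of $X$, with $\sigma_\alpha$ replaced by the taxicab dispersion $\lambda_\alpha$; then the parameters $(f_\alpha, g_\alpha)$ in (14) are linearly related to them by
$$f_\alpha(i) = a_\alpha(i) / p_{i\cdot} \quad \text{and} \quad g_\alpha(j) = b_\alpha(j) / p_{\cdot j}. \qquad (17)$$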
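The parameters $a_\alpha$ and $b_\alpha$ in (11) are the principal coordinates of the TCov decomposition, and they satisfy
$$\sum_{i=1}^{I} a_\alpha(i) = \sum_{j=1}^{J} b_\alpha(j) = 0 \quad \text{and} \quad \sum_{i=1}^{I} |a_\alpha(i)| = \sum_{j=1}^{J} |b_\alpha(j)| = \lambda_\alpha. \qquad (18)$$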
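Similarly, the parameters $f_\alpha$ and $g_\alpha$ in (14) are the principal coordinates of the TCA decomposition, and they satisfy
$$\sum_{i=1}^{I} p_{i\cdot} f_\alpha(i) = \sum_{j=1}^{J} p_{\cdot j}\, g_\alpha(j) = 0 \quad \text{and} \quad \sum_{i=1}^{I} p_{i\cdot} |f_\alpha(i)| = \sum_{j=1}^{J} p_{\cdot j}\, |g_\alpha(j)| = \lambda_\alpha. \qquad (19)$$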
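The relations (17) to (19) can be checked on a first dimension computed numerically. The sketch below is a minimal illustration under our own assumptions: the table is hypothetical, the helper `tsvd_first_step` is our own name, and it finds the first TSVD step by complete enumeration over $u \in \{-1,+1\}^J$, which only suits small $J$.

```python
import itertools

import numpy as np


def tsvd_first_step(X):
    """First TSVD step by complete enumeration over u in {-1,+1}^J (small J only)."""
    best_lam, best_u = -np.inf, None
    for signs in itertools.product((-1.0, 1.0), repeat=X.shape[1]):
        u = np.array(signs)
        lam = np.abs(X @ u).sum()
        if lam > best_lam:
            best_lam, best_u = lam, u
    a = X @ best_u                       # TCov row coordinates a_1
    v = np.where(a > 0, 1.0, -1.0)       # v_1 = sgn(a_1)
    b = X.T @ v                          # TCov column coordinates b_1
    return best_lam, best_u, v, a, b


# Hypothetical counts, for illustration only
N = np.array([[20, 10, 5, 5],
              [10, 15, 10, 5],
              [5, 10, 20, 15]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
X = P - np.outer(r, c)

lam, u, v, a, b = tsvd_first_step(X)
f, g = a / r, b / c                      # TCA coordinates via (17)
```

Dividing by the margins is all that (17) requires, which is why TCov and TCA share the same dispersion $\lambda_\alpha$ while weighting their coordinates differently.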
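Let $X_\alpha = (x^{(\alpha)}_{ij})$ be the $\alpha$th residual correspondence matrix, where
$$x^{(\alpha)}_{ij} = p_{ij} - p_{i\cdot}\, p_{\cdot j} - \sum_{\beta=1}^{\alpha-1} a_\beta(i)\, b_\beta(j) / \lambda_\beta. \qquad (20)$$
Similarly, let $\Delta_\alpha = (\delta^{(\alpha)}_{ij})$ be the $\alpha$th residual density matrix, where
$$\delta^{(\alpha)}_{ij} = \frac{x^{(\alpha)}_{ij}}{p_{i\cdot}\, p_{\cdot j}} = \frac{p_{ij}}{p_{i\cdot}\, p_{\cdot j}} - 1 - \sum_{\beta=1}^{\alpha-1} f_\beta(i)\, g_\beta(j) / \lambda_\beta. \qquad (21)$$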
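Let $\{1, \dots, I\} = I_+ \cup I_-$ be an optimal binary partition of the row indices and $\{1, \dots, J\} = J_+ \cup J_-$ an optimal binary partition of the column indices, such that $v_\alpha(i) = 1$ for $i \in I_+$, $v_\alpha(i) = -1$ for $i \in I_-$, $u_\alpha(j) = 1$ for $j \in J_+$ and $u_\alpha(j) = -1$ for $j \in J_-$. Besides (18), the taxicab dispersion $\lambda_\alpha$ is related to the TCov principal coordinates $a_\alpha$ and $b_\alpha$ in (11) by the following useful equations:
$$\lambda_\alpha = 2 \sum_{i \in I_+} a_\alpha(i) = -2 \sum_{i \in I_-} a_\alpha(i) = 2 \sum_{j \in J_+} b_\alpha(j) = -2 \sum_{j \in J_-} b_\alpha(j), \qquad (22)$$
$$\lambda_\alpha = 4 \sum_{i \in I_+} \sum_{j \in J_+} x^{(\alpha)}_{ij} = 4 \sum_{i \in I_-} \sum_{j \in J_-} x^{(\alpha)}_{ij} = -4 \sum_{i \in I_+} \sum_{j \in J_-} x^{(\alpha)}_{ij} = -4 \sum_{i \in I_-} \sum_{j \in J_+} x^{(\alpha)}_{ij}. \qquad (23)$$
Equations (22) and (23) follow from the fact that $X_\alpha$, for $\alpha = 1, \dots, k$, is a double-centered matrix, see Choulakian and Abou-Samra (2020). The quantification of the intrinsic quality of a principal dimension is based on (23).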
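Equations (22) and (23) can be verified numerically for the first dimension. In this minimal sketch (hypothetical table; `tsvd_first_step` and `block_sum` are our own helper names), the partitions are read off the signs of the principal axes, and the four quadrant sums of the double-centered residual matrix all have absolute value $\lambda_1/4$.

```python
import itertools

import numpy as np


def tsvd_first_step(X):
    """First TSVD step by complete enumeration over u in {-1,+1}^J (small J only)."""
    best_lam, best_u = -np.inf, None
    for signs in itertools.product((-1.0, 1.0), repeat=X.shape[1]):
        u = np.array(signs)
        lam = np.abs(X @ u).sum()
        if lam > best_lam:
            best_lam, best_u = lam, u
    a = X @ best_u
    v = np.where(a > 0, 1.0, -1.0)
    return best_lam, best_u, v, a, X.T @ v


# Hypothetical counts, for illustration only
N = np.array([[20, 10, 5, 5],
              [10, 15, 10, 5],
              [5, 10, 20, 15]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
X = P - np.outer(r, c)                  # X_1, double-centered

lam, u, v, a, b = tsvd_first_step(X)
Ip, Im = v > 0, v < 0                   # optimal row partition I+, I-
Jp, Jm = u > 0, u < 0                   # optimal column partition J+, J-


def block_sum(E, F):
    """Sum of the residuals x_ij over the quadrant E x F (boolean masks)."""
    return X[np.ix_(E, F)].sum()
```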
4.3 An observation
The TCov principal coordinates $a_\alpha$ and $b_\alpha$ are uniformly weighted, see equation (18), whereas the TCA principal coordinates $f_\alpha$ and $g_\alpha$ are marginally weighted, see equation (19). What is the consequence of this? The answer is Benzécri's principle of distributional equivalence, which states that CA (and TCA) results do not change if two proportional columns or rows are merged into one. This has the practical consequence that the effective size of sparse and large data sets can be smaller than their observed size; for further details concerning sparse contingency tables, see Choulakian (2017).
5 Main developments
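Let $L$ be an $I \times I$ permutation matrix such that the coordinates of $Lr$ are in decreasing order, that is, $(Lr)_i \ge (Lr)_{i+1}$ for $i = 1, \dots, I-1$. Similarly, let $M$ be a $J \times J$ permutation matrix such that the coordinates of $M'c$ are in decreasing order, that is, $(M'c)_j \ge (M'c)_{j+1}$ for $j = 1, \dots, J-1$.
We consider the matrix
$$P_s = (p^s_{ij}) = L P M. \qquad (24)$$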
We have the following easily proved result
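Lemma 4: Let $P_s = LPM$. A necessary condition for the independence model, $P = rc'$ or equivalently $X = 0$, is that
$$p^s_{ij} \ge p^s_{i+1,j} \quad \text{for } i = 1, \dots, I-1 \text{ and } j = 1, \dots, J, \qquad (25)$$
$$p^s_{ij} \ge p^s_{i,j+1} \quad \text{for } i = 1, \dots, I \text{ and } j = 1, \dots, J-1. \qquad (26)$$
Indeed, under independence $P_s = (Lr)(M'c)'$ is the outer product of two positive vectors with decreasing coordinates.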
Remark: Relations (25) and (26) characterize Robinson matrices, used for the seriation of artifacts or sites in archeology. That is why we call $P_s$ (see Table 2) the seriated contingency table, following its seriated row and column marginals.
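Seriation and the check of Lemma 4 take only a few lines. In this minimal sketch the helper names `seriate` and `satisfies_lemma4` are our own, and both tables are hypothetical: one is an exact outer product, the other has a local bump in one cell, in the spirit of Collins' molehills. Sorting indices by decreasing margins plays the role of $L$ and $M$.

```python
import numpy as np


def seriate(N):
    """Return P_s = L P M: rows and columns sorted by decreasing margins, as in (24)."""
    P = np.asarray(N, dtype=float) / np.sum(N)
    row_order = np.argsort(-P.sum(axis=1), kind="stable")   # defines L
    col_order = np.argsort(-P.sum(axis=0), kind="stable")   # defines M
    return P[np.ix_(row_order, col_order)], row_order, col_order


def satisfies_lemma4(Ps, tol=1e-12):
    """Check the necessary conditions (25) and (26) for independence."""
    rows_decrease = np.all(Ps[:-1, :] >= Ps[1:, :] - tol)   # (25)
    cols_decrease = np.all(Ps[:, :-1] >= Ps[:, 1:] - tol)   # (26)
    return bool(rows_decrease and cols_decrease)


# Hypothetical tables, for illustration only
N_ind = np.outer([2, 5, 3], [1, 4, 2])   # exactly independent
N_mole = np.array([[10, 8, 6],
                   [5, 4, 3],
                   [2, 9, 1]])            # a local bump in the last row
```

Because Lemma 4 gives only a necessary condition, a `True` result does not establish independence; a `False` result, however, rules it out.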
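Lemma 5: TSVD of $P_s - (Lr)(M'c)' = LXM$ is equivalent to TCov($P$), that is, to TSVD of $X$, up to the reordering of coordinates induced by $L$ and $M$.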
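Lemma 6 states that the $i$th row TCA (or CA) principal factor score $f_\alpha(i)$ is the weighted covariance of the $i$th residual density function $\delta^{(\alpha)}_{i\cdot}$ with the $\alpha$th principal axis $u_\alpha$, where $\delta^{(\alpha)}_{i\cdot}$ is the $i$th row of $\Delta_\alpha$; similarly, $g_\alpha(j)$ is the weighted covariance of $\delta^{(\alpha)}_{\cdot j}$, the $j$th column of $\Delta_\alpha$, with $v_\alpha$.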
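Lemma 6: In CA and TCA,
$$f_\alpha(i) = \sum_{j=1}^{J} p_{\cdot j}\, \delta^{(\alpha)}_{ij}\, u_\alpha(j), \qquad (27)$$
$$g_\alpha(j) = \sum_{i=1}^{I} p_{i\cdot}\, \delta^{(\alpha)}_{ij}\, v_\alpha(i), \qquad (28)$$
where $(u_\alpha, v_\alpha)$ represents the $\alpha$th standardized principal axis in each method. In CA, $u_\alpha = g_\alpha / \sigma_\alpha$ and $v_\alpha = f_\alpha / \sigma_\alpha$. In TCA, $u_\alpha = \operatorname{sgn}(g_\alpha)$ and $v_\alpha = \operatorname{sgn}(f_\alpha)$ for $\alpha = 1, \dots, k$, see equations (3) and (17).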
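Proof: Here, we provide a proof for TCA. We use the transition formula (2) for $a_\alpha$, together with (17) and (21):
$$f_\alpha(i) = \frac{a_\alpha(i)}{p_{i\cdot}} = \frac{1}{p_{i\cdot}} \sum_{j=1}^{J} x^{(\alpha)}_{ij}\, u_\alpha(j) = \sum_{j=1}^{J} p_{\cdot j}\, \frac{x^{(\alpha)}_{ij}}{p_{i\cdot}\, p_{\cdot j}}\, u_\alpha(j) = \sum_{j=1}^{J} p_{\cdot j}\, \delta^{(\alpha)}_{ij}\, u_\alpha(j),$$
which is the required result (27).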
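Remark: In CA, due to (16), equation (27), and similarly (28), can be further simplified to
$$f_\alpha(i) = \frac{1}{\sigma_\alpha} \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\cdot}}\, g_\alpha(j),$$
a well-known result in Bastin et al. (1980, p.157) or Goodman (1991, p. 1105, eq. A.1.3).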
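Lemma 6 can be checked numerically in its CA form. The sketch below is a minimal illustration, assuming the usual computation of CA by `numpy.linalg.svd` of the standardized residuals $D_r^{-1/2}(P - rc')D_c^{-1/2}$; the table is hypothetical and `residual_density` and `lemma6_rows` are our own helper names. It compares (27), evaluated with the CA residual density matrix, against the principal row coordinates, and also checks the classical transition formula of the Remark.

```python
import numpy as np

# Hypothetical counts, for illustration only
N = np.array([[20, 10, 5, 5, 2],
              [10, 15, 10, 5, 4],
              [5, 10, 20, 15, 6],
              [8, 4, 6, 12, 10]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)

# CA by SVD of the standardized residuals D_r^{-1/2} (P - rc') D_c^{-1/2}
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, s, Vt = np.linalg.svd(S, full_matrices=False)
F = U * s / np.sqrt(r)[:, None]         # principal row coordinates f_alpha
G = Vt.T * s / np.sqrt(c)[:, None]      # principal column coordinates g_alpha


def residual_density(alpha):
    """CA residual density matrix Delta_alpha (alpha is 1-based)."""
    delta = P / np.outer(r, c) - 1
    for beta in range(alpha - 1):
        delta = delta - np.outer(F[:, beta], G[:, beta]) / s[beta]
    return delta


def lemma6_rows(alpha):
    """Right-hand side of (27) with the CA axis u_alpha = g_alpha / sigma_alpha."""
    u = G[:, alpha - 1] / s[alpha - 1]
    return residual_density(alpha) @ (c * u)
```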
5.1 Quantifying the intrinsic quality of a taxicab principal axis
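Within the Euclidean framework, a measure of the quality of a principal dimension is the proportion (or percentage) of the residual variance (or inertia, in the case of CA) explained:
$$\tau_\alpha = \frac{\sigma_\alpha^2}{\sum_{\beta=\alpha}^{k} \sigma_\beta^2},$$
where the denominator equals the sum of the squared elements of the residual matrix after $\alpha - 1$ dimensions. This is an extrinsic measure of quality, because it compares the dispersion $\sigma_\alpha^2$ of a principal axis with the residual dispersion $\sum_{\beta \ge \alpha} \sigma_\beta^2$. Replacing the $\ell_2$ terms in the above equation by the corresponding $\ell_1$ terms, we obtain the measure of intrinsic quality expressed in Definition 7.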
Let $(I_+, I_-)$ be an optimal binary partition of $\{1, \dots, I\}$, and similarly $(J_+, J_-)$ an optimal binary partition of $\{1, \dots, J\}$, for the $\alpha$th principal dimension, $\alpha = 1, \dots, k$. Thus the data set is divided into four quadrants. We define a new index showing the quality of signs of the residuals (QSR) in each quadrant of the $\alpha$th residual cross-covariance matrix $X_\alpha$ in (20), for $\alpha = 1, \dots, k$.
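Definition 7: For $E \in \{I_+, I_-\}$ and $F \in \{J_+, J_-\}$, the measure of the quality of signs of the residuals in the quadrant $E \times F$ is
$$QSR_\alpha(E, F) = \frac{\sum_{i \in E} \sum_{j \in F} x^{(\alpha)}_{ij}}{\sum_{i \in E} \sum_{j \in F} |x^{(\alpha)}_{ij}|}.$$
Similarly, a quantification of the quality of signs of the optimal cut of dimension $\alpha$ is
$$QSR_\alpha = \frac{\lambda_\alpha}{\sum_{i=1}^{I} \sum_{j=1}^{J} |x^{(\alpha)}_{ij}|} = \frac{v_\alpha' X_\alpha u_\alpha}{\sum_{i=1}^{I} \sum_{j=1}^{J} |x^{(\alpha)}_{ij}|}.$$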
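Remark: The elements of $QSR_\alpha$ are easily computed as follows. We note that the $\alpha$th principal axis can be written as
$$u_\alpha = \mathbb{1}_{J_+} - \mathbb{1}_{J_-}, \quad \text{where } \mathbb{1}_{J_+} = (\mathbb{1}_J + u_\alpha)/2 \text{ and } \mathbb{1}_{J_-} = (\mathbb{1}_J - u_\alpha)/2,$$
and similarly
$$v_\alpha = \mathbb{1}_{I_+} - \mathbb{1}_{I_-}, \quad \text{where } \mathbb{1}_{I_+} = (\mathbb{1}_I + v_\alpha)/2 \text{ and } \mathbb{1}_{I_-} = (\mathbb{1}_I - v_\alpha)/2,$$
and $\mathbb{1}_m$ designates a column vector of 1's of size $m$. So, writing $|X_\alpha|$ for the matrix of absolute values of the elements of $X_\alpha$,
$$QSR_\alpha(E, F) = \frac{\mathbb{1}_E'\, X_\alpha\, \mathbb{1}_F}{\mathbb{1}_E'\, |X_\alpha|\, \mathbb{1}_F}.$$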
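The indicator-vector computation can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions: the table is hypothetical, `tsvd_first_step` and `qsr` are our own helper names, each TSVD step is found by complete enumeration (small $J$ only), and the overall index is taken as $\lambda_\alpha / \sum_i \sum_j |x^{(\alpha)}_{ij}|$, the $\ell_1$ analogue of the explained-variance ratio. The quadrant indices are computed for the first two dimensions.

```python
import itertools

import numpy as np


def tsvd_first_step(X):
    """First TSVD step by complete enumeration (small J only)."""
    best_lam, best_u = -np.inf, None
    for signs in itertools.product((-1.0, 1.0), repeat=X.shape[1]):
        u = np.array(signs)
        lam = np.abs(X @ u).sum()
        if lam > best_lam:
            best_lam, best_u = lam, u
    a = X @ best_u
    v = np.where(a > 0, 1.0, -1.0)
    return best_lam, best_u, v, a, X.T @ v


def qsr(Xa, u, v):
    """Quadrant indices QSR(E, F) and the overall index, via indicator vectors."""
    rows = {"I+": (1 + v) / 2, "I-": (1 - v) / 2}   # 1_{I+}, 1_{I-}
    cols = {"J+": (1 + u) / 2, "J-": (1 - u) / 2}   # 1_{J+}, 1_{J-}
    absX = np.abs(Xa)
    out = {(e, f): (rows[e] @ Xa @ cols[f]) / (rows[e] @ absX @ cols[f])
           for e in rows for f in cols}
    out["overall"] = (v @ Xa @ u) / absX.sum()       # lambda / sum |x_ij|
    return out


# Hypothetical counts, for illustration only
N = np.array([[20, 10, 5, 5],
              [10, 15, 10, 5],
              [5, 10, 20, 15]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
X1 = P - np.outer(r, c)

lam1, u1, v1, a1, b1 = tsvd_first_step(X1)
Q1 = qsr(X1, u1, v1)
X2 = X1 - np.outer(a1, b1) / lam1                  # residual matrix (5)
lam2, u2, v2, _, _ = tsvd_first_step(X2)
Q2 = qsr(X2, u2, v2)
```

Because each residual matrix is double-centered, the diagonal quadrants $(I_+, J_+)$ and $(I_-, J_-)$ always get positive indices and the off-diagonal ones negative indices; only their magnitudes carry information.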
To interpret the above indices, we recall from elementary probability theory the definition of association between two events, by defining an index of association $x_{ij} = p_{ij} - p_{i\cdot}\, p_{\cdot j}$ for $i = 1, \dots, I$ and $j = 1, \dots, J$.
a) When $x_{ij} = 0$, the $i$th category of the row variable and the $j$th category of the column variable are not associated (independent).
b) When $x_{ij} > 0$, the $i$th category of the row variable and the $j$th category of the column variable are attractively or positively associated; that is, the event $(i, j)$ occurs more often than expected by chance.
c) When $x_{ij} < 0$, the $i$th category of the row variable and the $j$th category of the column variable are repulsively or negatively associated; that is, the event $(i, j)$ occurs less often than expected by chance.
Based on these, the interpretation of the indices becomes evident: for instance, $QSR_\alpha(I_+, J_+)$ measures the intensity of the attractive association between the subsets $I_+$ and $J_+$, while $QSR_\alpha(I_+, J_-)$ measures the intensity of the repulsive association between the subsets $I_+$ and $J_-$.
Allard et al. (2020) used the QSR index to choose between two competing methods of data analysis, TCA and taxicab log-ratio analysis of contingency tables and compositional data.
We have the following easily proved result
Lemma 8: a) For if and only if
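The proof of part c is based on the arithmetic-harmonic means inequality, which states that for four strictly positive real numbers $q_1$, $q_2$, $q_3$ and $q_4$,
$$\frac{q_1 + q_2 + q_3 + q_4}{4} \ge \frac{4}{1/q_1 + 1/q_2 + 1/q_3 + 1/q_4};$$
equality is attained when $q_1 = q_2 = q_3 = q_4$.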
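A quick numerical check shows where this inequality enters. Taking the overall index as $\lambda_\alpha / \sum_i \sum_j |x^{(\alpha)}_{ij}|$, the double-centering relation (23) gives each quadrant sum absolute value $\lambda_\alpha / 4$, so the overall index equals the harmonic mean of the four $|QSR_\alpha(E, F)|$ and is therefore bounded by their arithmetic mean. The sketch below (hypothetical table; `tsvd_first_step` is our own helper using complete enumeration) checks this for the first dimension; it illustrates our reading of the argument rather than reproducing the authors' proof.

```python
import itertools

import numpy as np


def tsvd_first_step(X):
    """First TSVD step by complete enumeration (small J only); returns lambda, u, v."""
    best_lam, best_u = -np.inf, None
    for signs in itertools.product((-1.0, 1.0), repeat=X.shape[1]):
        u = np.array(signs)
        lam = np.abs(X @ u).sum()
        if lam > best_lam:
            best_lam, best_u = lam, u
    v = np.where(X @ best_u > 0, 1.0, -1.0)
    return best_lam, best_u, v


# Hypothetical counts, for illustration only
N = np.array([[20, 10, 5, 5],
              [10, 15, 10, 5],
              [5, 10, 20, 15]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
X = P - np.outer(r, c)

lam, u, v = tsvd_first_step(X)
absX = np.abs(X)
quadrants = [(v > 0, u > 0), (v > 0, u < 0), (v < 0, u > 0), (v < 0, u < 0)]
q = np.array([abs(X[np.ix_(E, F)].sum()) / absX[np.ix_(E, F)].sum()
              for E, F in quadrants])            # |QSR_1(E, F)| per quadrant
qsr_overall = lam / absX.sum()                   # overall QSR_1
harmonic = 4 / np.sum(1 / q)
arithmetic = q.mean()
```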
5.2 Quantifying the intrinsic quality of a principal axis in CA
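Let $\Delta^{CA}_\alpha = (\delta^{(\alpha)}_{ij})$ be the $\alpha$th residual density matrix in CA,
$$\delta^{(\alpha)}_{ij} = \frac{p_{ij}}{p_{i\cdot}\, p_{\cdot j}} - 1 - \sum_{\beta=1}^{\alpha-1} \sigma_\beta\, v_\beta(i)\, u_\beta(j), \qquad (29)$$
where $v_\beta$ and $u_\beta$ represent the $\beta$th standardized principal axis coordinates in CA.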
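Let $X^{CA}_\alpha = (x^{CA(\alpha)}_{ij})$ be the $\alpha$th residual cross-covariance matrix in CA obtained from (29),
$$x^{CA(\alpha)}_{ij} = p_{i\cdot}\, p_{\cdot j}\, \delta^{(\alpha)}_{ij}. \qquad (30)$$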
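Let $(I_+, I_-)$ be an optimal binary principal axis partition of $\{1, \dots, I\}$, and similarly $(J_+, J_-)$ an optimal principal axis partition of $\{1, \dots, J\}$ by CA. Thus the residual covariance matrix $X^{CA}_\alpha$ is divided into four quadrants: $I_+ \times J_+$, $I_+ \times J_-$, $I_- \times J_+$ and $I_- \times J_-$. Based on the observation that both $X_\alpha$ and $X^{CA}_\alpha$ are double centered, we can quantify the intrinsic quality of a CA principal dimension by replacing $X_\alpha$ by $X^{CA}_\alpha$ in subsection 5.1, and obtain QSR measures.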
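Definition 9: For $E \in \{I_+, I_-\}$, $F \in \{J_+, J_-\}$ and $\alpha = 1, \dots, k$, the CA measure of the quality of signs of the residuals in the quadrant $E \times F$ is
$$QSR^{CA}_\alpha(E, F) = \frac{\sum_{i \in E} \sum_{j \in F} x^{CA(\alpha)}_{ij}}{\sum_{i \in E} \sum_{j \in F} |x^{CA(\alpha)}_{ij}|}.$$
Similarly, the CA measure of the quality of signs of principal dimension $\alpha$ is
$$QSR^{CA}_\alpha = \frac{\operatorname{sgn}(f_\alpha)'\, X^{CA}_\alpha\, \operatorname{sgn}(g_\alpha)}{\sum_{i=1}^{I} \sum_{j=1}^{J} |x^{CA(\alpha)}_{ij}|}.$$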
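The CA version can be sketched the same way. This is a minimal illustration under our own assumptions: CA is computed by `numpy.linalg.svd` of the standardized residuals, the partitions are taken from the signs of the CA principal coordinates $f_\alpha$ and $g_\alpha$, the table is hypothetical, and `ca_residual_cov` and `qsr_ca` are our own helper names. The check confirms that the residual cross-covariance matrix obtained from (29) is double-centered, so its four quadrant sums share the same absolute value, exactly as in the taxicab case.

```python
import numpy as np

# Hypothetical counts, for illustration only
N = np.array([[20, 10, 5, 5, 2],
              [10, 15, 10, 5, 4],
              [5, 10, 20, 15, 6],
              [8, 4, 6, 12, 10]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, s, Vt = np.linalg.svd(S, full_matrices=False)
F = U * s / np.sqrt(r)[:, None]         # CA principal row coordinates f_alpha
G = Vt.T * s / np.sqrt(c)[:, None]      # CA principal column coordinates g_alpha


def ca_residual_cov(alpha):
    """Residual cross-covariance matrix from the CA residual density (29), 1-based alpha."""
    delta = P / np.outer(r, c) - 1
    for beta in range(alpha - 1):
        # sigma_beta v_beta(i) u_beta(j) = f_beta(i) g_beta(j) / sigma_beta
        delta = delta - np.outer(F[:, beta], G[:, beta]) / s[beta]
    return np.outer(r, c) * delta


def qsr_ca(alpha):
    """Quadrant QSR indices and overall index for CA dimension alpha."""
    Xa = ca_residual_cov(alpha)
    v = np.where(F[:, alpha - 1] > 0, 1.0, -1.0)   # sgn(f_alpha)
    u = np.where(G[:, alpha - 1] > 0, 1.0, -1.0)   # sgn(g_alpha)
    out = {}
    for e, E in (("I+", v > 0), ("I-", v < 0)):
        for f, Fm in (("J+", u > 0), ("J-", u < 0)):
            block = Xa[np.ix_(E, Fm)]
            out[(e, f)] = block.sum() / np.abs(block).sum()
    out["overall"] = (v @ Xa @ u) / np.abs(Xa).sum()
    return Xa, v, u, out
```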