Quantification of intrinsic quality of a principal dimension in correspondence analysis and taxicab correspondence analysis

08/24/2021
by Vartan Choulakian, et al.

Collins (2002, 2011) raised a number of issues with regard to correspondence analysis (CA), such as: the qualitative information in a CA map versus the quantitative information in the underlying contingency table; and the difficulty of interpreting a CA map and of relating it to the percentage of inertia (variance) explained. We tackle these issues by considering CA and taxicab CA (TCA) as a stepwise Hotelling/Tucker decomposition of the cross-covariance matrix of the row and column categories into four quadrants. The contents of this essay are: First, we review the notion of quality/quantity in multidimensional data analysis as discussed by Benzécri, who based his reflections on Aristotle. Second, we show the importance of unravelling the interrelated concepts of dependence/heterogeneity structure in a contingency table; to picture them, two maps are needed. Third, we distinguish between the intrinsic and the extrinsic quality of a principal dimension; the intrinsic quality is based on the signs of the residuals in the four quadrants, and hence on interpretability. Furthermore, we provide quantifications of the intrinsic quality and use them to uncover structure, in particular in sparse contingency tables. Finally, we emphasize the importance of looking at the residual cross-covariance values at each iteration.


1 Introduction

In this essay we comment on Collins' (2002, 2011) statement "correspondence analysis makes you blind". Collins stated it twice, as a reply to Whitlark and Smith (2001) and to Bock (2011a), who analyzed two different brand-image count data sets by correspondence analysis (CA). Both Whitlark and Smith (2002) and Bock (2011b), in their replies to Collins (2002, 2011), accepted without hesitation the insightful and original observations made by Collins. Beh and Lombardo (2014, pp. 131-132) provide a cursory report on these dialogues.

Table 1: WS brand-attribute count table.
Company innovative leader solution rapport efficient relevant essential trusted
Oracle 155 157 109 133 151 96 35 170
Nokia 375 350 274 318 351 284 91 408
Fedex 476 675 550 669 748 627 307 754
A 86 66 105 110 117 76 30 122
B 30 21 25 37 40 20 9 43
C 18 12 11 16 17 12 2 18
D 25 23 33 36 34 28 12 35
E 21 20 21 26 27 18 9 36
F 190 307 305 332 355 309 131 392
G 18 16 16 25 21 18 10 29
H 408 549 467 551 613 523 239 624
I 143 225 194 191 206 184 121 248

Table 2: Seriated WS brand-attribute data structure uncovered by Collins.
Company trusted efficient rapport leader relevant solution innovative essential average
Fedex 754 748 669 675 627 550 476 307 601
H 624 613 551 549 523 467 408 239 497
Nokia 408 351 318 350 284 274 375 91 306
F 392 355 332 307 309 305 190 131 290
I 248 206 191 225 184 194 143 121 189
Oracle 170 151 133 157 96 109 155 35 126
A 122 117 110 66 76 105 86 30 89
D 35 34 36 23 28 33 25 12 28
B 43 40 37 21 20 25 30 9 28
E 36 27 26 20 18 21 21 9 22
G 29 21 25 16 18 16 18 10 19
C 18 17 16 12 12 11 18 2 13
average 240 223 204 202 183 176 162 83 184

1.1 WS brand-attribute count data

Table 1, from Whitlark and Smith (2001), shows brand-attribute count data. Whitlark and Smith (2001) analyzed it by CA, resulting in a map, Figure 1, very similar to the taxicab CA (TCA) map in Figure 2; they interpreted Figure 1 using the adjusted chi-square residuals, arguing that this corrects "inaccuracies introduced by the dimensionality problem". TCA is an $L_1$ variant of CA, see Choulakian (2006). The interpretation of Figures 1 and 2 does not correspond to the simple seriated structure uncovered by Collins (2002) given in Table 2, where the rows and columns of Table 1 are permuted according to the row and column averages in descending order. Collins interpreted the seriated structure in Table 2 as:

”a relief map of a plain sloping steeply down from north (FedEx) to south (Brand C) and less steeply from west (Trusted) to east (Essential). On the plain would be a molehill (Nokia is more innovative). But isn’t the table easy enough?”

Note that this simple structure uncovered by Collins is not found in Figures 1 and 2. Why? Lemma 4 provides a partial answer: the brands and the attributes are not independent, because of the existence of multiple 'molehills', the most important molehill being (Nokia is more innovative), as mentioned by Collins. The molehills (in bold in Table 2) represent positive associations, and can be analyzed and visualized in a complementary form in Figures 1 and 4, or in Figures 2 and 3. Figures 3 and 4 are the TCA and CA contribution plots, which help us to interpret Figures 1 and 2. This is a summary of the paper.

Figure 1: CA map of WS data.
Figure 2: TCA map of WS data.
Figure 3: TCov (TCA contribution) map of WS data.
Figure 4: CA contribution map of WS data.

1.2 Issues raised by Collins

Collins (2002) summarized his arguments against CA in Whitlark and Smith (2001) paper as:

"The popularity of CA and other techniques seems to arise from two feelings: The data are too complex to be handled by the human brain, and pictures communicate more than tables. In fact, the analysis of brand "image" data like that given by the authors is easy, and quantitative patterns are best communicated through simple tables supported by words. A picture like the correspondence "map" shown by W&S may say something qualitative about patterns in the data, but it says nothing quantitative or testable. Not even, for example, that the two dimensions of the map account for nearly 90% of the variance of the data, as reported in the text."

The highlighted parts are ours. There are two main points raised by Collins.

First, qualitative patterns in the CA map, Figure 1, do not reflect quantitative patterns in Table 2. Friendly and Kwan (2011) divided statisticians (data analysts) into two categories: graph-people and table-people. Clearly Collins belongs to the group of table-people. In this paper, we will show that both tables and maps are needed, for global and for local analysis. Additionally, in Section 2 we will discuss the important notions of quality and quantity in data analysis, based on Benzécri's reflections.

Second, the interpretation of a CA map is difficult, as is its relation to the percentage of inertia (variance) explained. Here we have to discuss two interrelated issues in CA. a) The first concerns the interpretation of CA maps, which, as we mentioned, was raised by Whitlark and Smith (2001); then clearly stated by Bock (2011b, pp. 587-588), and further discussed by Bock (2017) in an R-blog titled "How to Interpret Correspondence Analysis Plots (It Probably Isn't the Way You Think)", which discusses nine complex issues in the interpretation of CA maps. Beh and Lombardo (2014, p.132) summarize it as: in CA, "the inability of the principal coordinate to provide a meaningful interpretation of the distance between a row and column point in these plots". Greenacre (2013) also proposed contribution biplots to tackle this issue. b) In this paper, we will distinguish between the intrinsic quality and the extrinsic quality of a principal dimension, and introduce indices that quantify the intrinsic quality of a principal axis. In CA maps, "variance accounted for" reflects the extrinsic quality. The intrinsic quality of a principal dimension examines the four quadrants of the residual cross-covariance matrix. We tackle these issues by considering CA and TCA as a stepwise Hotelling/Tucker decomposition of the residual cross-covariance matrix of the row and column categories into four quadrants.

1.3 Hotelling and Tucker decompositions

In mathematics, a data set $X = (x_{ij})$ for $i = 1, \ldots, I$ and $j = 1, \ldots, J$ can be interpreted as three kinds of mapping, see Benzécri (1973, p.56) and Choulakian (2016a). First, as a linear mapping $X: \mathbb{R}^J \rightarrow \mathbb{R}^I$; second, as a linear mapping $X': \mathbb{R}^I \rightarrow \mathbb{R}^J$; third, as a bilinear mapping $X: \mathbb{R}^I \times \mathbb{R}^J \rightarrow \mathbb{R}$.

Hotelling's (1933) principal components analysis (PCA) is developed within the first two settings, while Hotelling's (1936) canonical correlation analysis is developed within the third setting, where X represents a cross-covariance matrix. Benzécri (1973) emphasized the development of CA within the first two settings as a weighted PCA method. In this paper, we shall emphasize the third setting.

It is well known that CA is a particular kind of Hotelling’s canonical correlation analysis, see for instance Goodman (1991), where the two sets of variables are the indicator sets of the categories of the two nominal variables. Another method, similar in perspective to canonical correlation analysis is Tucker’s (1958) interbattery analysis, which maximizes the covariance measure between the linear combination of the two indicator sets of quantitative variables. When CA did not produce interpretable maps of contingency tables, Tenenhaus and Augendre (1996) proposed the Tucker interbattery analysis as an alternative to CA.

The parameters in Hotelling's canonical correlation and Tucker's covariance analyses are generally estimated by singular value decomposition (SVD). When we estimate the parameters by taxicab SVD (TSVD), introduced by Choulakian (2006), in place of SVD, we notice, surprisingly, that these two analyses complement each other because they are linearly related, see equation (17). For further details, see Choulakian, Simonetti and Gia (2014).

Figure 3, named the TCov map, displays the taxicab interbattery analysis map of the WS data. Figures 2 and 3 (the TCA and TCov maps) are different (similarly, Figures 1 and 4 (the CA and CA contribution maps) are different): thus they provide different information, sometimes confusing (for instance, observe the positions of the brands B, E and F in Figures 2 and 3). One of the major novelties of this paper is that we interpret Figure 3, the taxicab interbattery analysis (TCov) map, as the TCA contribution map, and consequently we provide a new perspective on the interpretation of the associated TCA map, Figure 2, via Lemma 6. For the interpretation of the row and column labels on the TCov map, we shall use a quantification of the intrinsic quality of a principal dimension, named the quality of signs of residuals (QSR) index, which will be complemented by a look at the seriated residual covariance matrix. Then we extend the development of these ideas to CA, where we also discuss sparse contingency tables having the quasi-two-block diagonal structure, which, according to Benzécri (1973, pp. 188-190), is quite common. Greenacre (2013) introduced and discussed CA contribution biplots, but did not relate them to CA maps; Lemma 6 accomplishes this task.

1.4 Organisation

This paper is organized as follows: Section 2 sketches Benzécri's reflections on quality and quantity in data analysis. Section 3 presents an overview of taxicab singular value decomposition (TSVD). Section 4 presents preliminaries. In Section 5 we develop the main subject matter, the quantification of the intrinsic quality of a principal dimension in CA and TCA. Section 6 presents applications. Finally, we conclude in Section 7.

2 Quantity and Quality

Benzécri (1982, 1988) has two papers on quality and quantity; in the second he discussed the relationship between quantity and quality historically, starting with Aristotle and finishing with his own description within the philosophy of data analysis, aka the CA framework. Here we quote from Benzécri (1988, section 1.7):

"Pour l'analyse des données, nous retenons d'abord, suivant Aristote, que "le caractère propre de la quantité est qu'on peut lui attribuer l'égal et l'inégal", tandis que "semblable ou dissemblable se dit uniquement des qualités". De ce point de vue, une description multidimensionnelle est toujours qualitative même si elle ne comporte que des variables numériques précises, parce que la multiplicité des descriptions possibles est telle qu'on ne rencontrera jamais d'égales, mais seulement de semblables."

In English: for data analysis we retain first, following Aristotle, that "the distinctive character of quantity is that the equal and the unequal can be attributed to it", whereas "similar or dissimilar is said only of qualities". From this point of view, a multidimensional description is always qualitative, even if it consists only of precise numerical variables, because the multiplicity of possible descriptions is such that one will never encounter equal ones, but only similar ones.

The following two definitions and the corollary provide a succinct summary of the quote.

Definition 1 (Aristotle on quantity): X is a quantitative variable if, given two realizations $x_1$ and $x_2$ of X, then either $x_1 = x_2$ or $x_1 \neq x_2$.

Definition 2 (Aristotle on quality): X is a qualitative variable if, given two realizations $x_1$ and $x_2$ of X, then either $x_1$ is similar to $x_2$ or $x_1$ is dissimilar to $x_2$.

Corollary 3 (Benzécri): Any multidimensional description is always qualitative even though its components are precisely numerical.

Similar ideas are also expressed in a foreword essay by Benzécri in Murtagh (2005).

Benzécri's schematic conceptual formulation of data analysis is the following directed diagram:

Quality (data) → Quantity (factors) → Quality (clusters).

The first step, Quality (data) → Quantity (factors), is done by dimension reduction. The nature of each factor (latent variable) is quantitative, and there is almost always more than one factor. Even though each latent variable is quantitative, its interpretation is qualitative. According to Benzécri (1988, section 1.3, in comments on Descartes): "Toute qualité n'est que l'expression d'un rapport de quantités"; that is, quality is the expression of a ratio of quantities. For the interpretation of a principal dimension, we apply the Aristotle/Benzécri principle. Aristotle in his book PHYSICS stated that "principles are contraries" and cited as examples taken from his predecessors "hot and cold", "the rare and the dense" and "plenum and void", see Aristotle (1960, p.14). In CA, Benzécri (1973, p.227), following Aristotle, based the interpretation of a principal dimension on contraries (dichotomies, oppositions) and gradations, where an opposition or a gradation represents a latent variable. In another context, Choulakian (2014, 2016b) used Euclid's principle of contradiction for the interpretation of the first principal dimension in the analysis of rank data.

The second step, Quantity (factors) → Quality (clusters), is done by usual methods such as k-means.

Murtagh (2005, section 1.1) described Benzécri's paradigm as "a tale of three metrics", which clearly characterizes the diagram: the chi-squared and the Euclidean metrics serve the first step, and the ultrametric serves the second step. A similar description of the above diagram is also given by De Leeuw (2005). This fact is also reflected in the first printed work, Benzécri (1973), titled DATA ANALYSIS, which is composed of two volumes: the first volume's subtitle is La Taxonomie; the second volume's subtitle is Analyse des Correspondances.

3 An overview of taxicab singular value decomposition

Consider a matrix $X = (x_{ij})$ of size $I \times J$ and of rank $k$. Taxicab singular value decomposition (TSVD) of $X$ is a decomposition similar to SVD of $X$; see Choulakian (2006, 2016a).

In TSVD the calculation of the dispersion measures $\lambda_\alpha$, principal axes $(u_\alpha, v_\alpha)$ and principal scores $(a_\alpha, b_\alpha)$ for $\alpha = 1, \ldots, k$ is done in a stepwise manner. We put $X_1 = X$ and let $X_\alpha$ be the residual matrix at the $\alpha$-th iteration.

The variational definitions of TSVD at the $\alpha$-th iteration are

$$\lambda_\alpha = \max_{u \in \mathbb{R}^J} \frac{\|X_\alpha u\|_1}{\|u\|_\infty} = \max_{v \in \mathbb{R}^I} \frac{\|X_\alpha' v\|_1}{\|v\|_\infty} = \max_{u, v} \frac{v' X_\alpha u}{\|u\|_\infty \|v\|_\infty}.$$

The $\alpha$-th principal axes are

$$u_\alpha = \arg\max_{u \in \{-1,+1\}^J} \|X_\alpha u\|_1 \quad \text{and} \quad v_\alpha = \arg\max_{v \in \{-1,+1\}^I} \|X_\alpha' v\|_1, \qquad (1)$$

and the $\alpha$-th principal vectors are

$$a_\alpha = X_\alpha u_\alpha \quad \text{and} \quad b_\alpha = X_\alpha' v_\alpha. \qquad (2)$$

Furthermore, the following transition relations are also useful:

$$v_\alpha = \mathrm{sgn}(a_\alpha) \quad \text{and} \quad u_\alpha = \mathrm{sgn}(b_\alpha), \qquad (3)$$

where $\mathrm{sgn}(\cdot)$ is the coordinatewise sign function: $\mathrm{sgn}(x) = 1$ if $x \geq 0$ and $\mathrm{sgn}(x) = -1$ if $x < 0$.

The $\alpha$-th taxicab dispersion measure $\lambda_\alpha$ can be represented in many different ways:

$$\lambda_\alpha = \|a_\alpha\|_1 = \|b_\alpha\|_1 = v_\alpha' X_\alpha u_\alpha = a_\alpha' v_\alpha = b_\alpha' u_\alpha. \qquad (4)$$

The $\alpha$-th residual correspondence matrix is

$$X_{\alpha+1} = X_\alpha - \frac{a_\alpha b_\alpha'}{\lambda_\alpha}. \qquad (5)$$

An interpretation of the term $a_\alpha b_\alpha' / \lambda_\alpha$ in (5) is that it represents the best rank-1 approximation of the residual correspondence matrix $X_\alpha$, in the sense of the taxicab norm.

Thus TSVD of X corresponds to

$$X = \sum_{\alpha=1}^{k} \frac{a_\alpha b_\alpha'}{\lambda_\alpha}, \qquad (6)$$

a decomposition similar to SVD, but where the vectors $(a_\alpha, b_\alpha)$ for $\alpha = 1, \ldots, k$ are conjugate, a weaker property than orthogonality. That is,

$$u_\alpha' b_\beta = v_\alpha' a_\beta = 0 \quad \text{for } \alpha < \beta. \qquad (7)$$

In TSVD, the calculation of the principal component weights $u_\alpha$ and $v_\alpha$ and the principal scores $a_\alpha$ and $b_\alpha$ can be accomplished by two algorithms. The first one is based on complete enumeration using equation (1). The second one is based on iterating the transition formulae (2, 3). This is an ascent algorithm; that is, it increases the value of the objective function at each iteration, see Choulakian (2006, 2016a). The iterative algorithm may converge to a local maximum, so it should be restarted from several initial configurations. The rows or the columns of the data can be used as starting values.
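To make the two-algorithm description concrete, here is a minimal NumPy sketch of the second (iterative) algorithm; the function names (`taxicab_axes`, `tsvd`) are ours, not from the paper, and the restart strategy uses the sign pattern of every row, as suggested above.

```python
import numpy as np

def taxicab_axes(X, max_iter=100):
    """One TSVD step: maximize ||X u||_1 over u in {-1,+1}^J by iterating
    the transition formulae (2, 3); restarted from the sign of every row,
    since the ascent iterations may stop at a local maximum."""
    best_lam, best_u = -np.inf, None
    for i in range(X.shape[0]):
        u = np.where(X[i] >= 0, 1.0, -1.0)            # start from the i-th row
        lam = np.abs(X @ u).sum()
        for _ in range(max_iter):
            v = np.where(X @ u >= 0, 1.0, -1.0)       # v = sgn(a), eq. (3)
            u_new = np.where(X.T @ v >= 0, 1.0, -1.0)  # u = sgn(b), eq. (3)
            lam_new = np.abs(X @ u_new).sum()         # taxicab dispersion
            if lam_new <= lam:                        # ascent has converged
                break
            u, lam = u_new, lam_new
        if lam > best_lam:
            best_lam, best_u = lam, u
    u = best_u
    a = X @ u                                         # principal scores, eq. (2)
    v = np.where(a >= 0, 1.0, -1.0)
    b = X.T @ v
    return best_lam, u, v, a, b

def tsvd(X, k):
    """Stepwise TSVD: peel off k rank-1 terms a b'/lambda, eqs. (5)-(6).
    Assumes k does not exceed the rank of X."""
    X = np.asarray(X, dtype=float).copy()
    out = []
    for _ in range(k):
        lam, u, v, a, b = taxicab_axes(X)
        out.append((lam, u, v, a, b))
        X -= np.outer(a, b) / lam                     # residual matrix, eq. (5)
    return out
```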

4 Preliminaries

Let $P = N/n$ of size $I \times J$ be the associated correspondence matrix of a contingency table $N$, where $n = \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij}$. We define as usual $p_{ij} = n_{ij}/n$, the vector of row marginals $r = P \mathbf{1}_J$, the vector of column marginals $c = P' \mathbf{1}_I$, and the diagonal matrix $D_r$ having diagonal elements $r_i$, and similarly $D_c$. We suppose that $D_r$ and $D_c$ are positive definite metric matrices of sizes $I \times I$ and $J \times J$, respectively; this means that the diagonal elements of $D_r$ and $D_c$ are strictly positive. Let

$$P_1 = P - rc',$$

or elementwise

$$p^{(1)}_{ij} = p_{ij} - r_i c_j, \qquad (8)$$

be the residual matrix with respect to the independence model. $p^{(1)}_{ij}$ is the cross-covariance between the $i$th category of the nominal row variable and the $j$th category of the nominal column variable.

The independence assumption can also be interpreted in another way as

$$\frac{p_{ij}}{r_i c_j} = 1 \quad \text{for } i = 1, \ldots, I \text{ and } j = 1, \ldots, J, \qquad (9)$$

which can be reexpressed as

$$\frac{p_{ij}}{r_i} = c_j \quad \text{and} \quad \frac{p_{ij}}{c_j} = r_i; \qquad (10)$$

these are the row and column homogeneity models. Benzécri (1973, p.31) named the vector $(p_{ij}/c_j$ for $i = 1, \ldots, I$ and $j$ fixed$)$ the profile of the $j$th column, and the element $p_{ij}/(r_i c_j)$ the density function of the probability measure $p_{ij}$ with respect to the product measure $r_i c_j$. The element $p_{ij}/(r_i c_j)$ is named the Pearson ratio in Goodman (1996) and Beh and Lombardo (2014, p.123).
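As a small illustration (our own toy 2×3 table), the correspondence matrix, the residual cross-covariances (8) and the Pearson ratios are one-liners in NumPy:

```python
import numpy as np

N = np.array([[20,  5, 10],
              [10, 15, 30]], dtype=float)  # toy contingency table
P = N / N.sum()                            # correspondence matrix
r = P.sum(axis=1)                          # row marginals r
c = P.sum(axis=0)                          # column marginals c
P1 = P - np.outer(r, c)                    # residual cross-covariances, eq. (8)
D1 = P / np.outer(r, c)                    # Pearson ratios p_ij / (r_i c_j)
print(np.round(P1, 4))                     # rows and columns sum to zero
print(np.round(D1, 3))                     # equals 1 everywhere under independence
```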

4.1 Estimation of the parameters by SVD

Suppose the independence assumption is not true; then each of the two equivalent model formulations (8, 10) can be generalized to explain the nonindependence by adding $k$ bilinear terms, where $k = \mathrm{rank}(P - rc')$.

a) Cov (cross-covariance) decomposition:

$$p_{ij} - r_i c_j = \sum_{\alpha=1}^{k} \frac{a_\alpha(i)\, b_\alpha(j)}{\lambda_\alpha}. \qquad (11)$$

This is the interbattery analysis proposed by Tucker (1958). Tenenhaus and Augendre (1996) estimated the parameters in (11) by singular value decomposition (SVD) of the matrix $P - rc'$. The parameters in (11) satisfy the following equations:

$$\sum_{i=1}^{I} a_\alpha(i) = 0 = \sum_{j=1}^{J} b_\alpha(j), \qquad (12)$$

$$\sum_{i=1}^{I} a_\alpha(i)\, a_\beta(i) = 0 = \sum_{j=1}^{J} b_\alpha(j)\, b_\beta(j) \quad \text{for } \alpha \neq \beta. \qquad (13)$$
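A sketch of this SVD estimation (our own, with the normalization $a_\alpha = \lambda_\alpha u_\alpha$ and $b_\alpha = \lambda_\alpha v_\alpha$, so that each term of (11) is $\lambda_\alpha u_\alpha v_\alpha'$; other normalizations are possible):

```python
import numpy as np

N = np.array([[20,  5, 10],
              [10, 15, 30],
              [ 5, 10, 25]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
P1 = P - np.outer(r, c)                      # residual matrix, eq. (8)
U, lam, Vt = np.linalg.svd(P1, full_matrices=False)
k = int(np.sum(lam > 1e-12))                 # number of nontrivial dimensions
A = U[:, :k] * lam[:k]                       # columns are the a_alpha
B = Vt[:k].T * lam[:k]                       # columns are the b_alpha
# reconstitution, eq. (11): P1 = sum_alpha a_alpha b_alpha' / lam_alpha
R = sum(np.outer(A[:, a], B[:, a]) / lam[a] for a in range(k))
assert np.allclose(P1, R)
assert np.allclose(A.sum(axis=0), 0)         # centering, eq. (12)
```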

b) CA (correspondence analysis) decomposition:

$$p_{ij} = r_i c_j \left( 1 + \sum_{\alpha=1}^{k} \frac{f_\alpha(i)\, g_\alpha(j)}{\lambda_\alpha} \right). \qquad (14)$$

This decomposition has many interpretations. Essentially, for data analysis purposes Benzécri (1973) interpreted it as a weighted principal components analysis of the row and column profiles. Another useful interpretation, comparable to Tucker's interbattery analysis, is Hotelling's (1936) canonical correlation analysis, see Lancaster (1958) and Goodman (1991). The parameters in (14) satisfy the following equations:

$$\sum_{i=1}^{I} r_i f_\alpha(i) = 0 = \sum_{j=1}^{J} c_j g_\alpha(j), \qquad (15)$$

$$\sum_{i=1}^{I} r_i f_\alpha(i) f_\beta(i) = 0 = \sum_{j=1}^{J} c_j g_\alpha(j) g_\beta(j) \quad \text{for } \alpha \neq \beta. \qquad (16)$$

The above two decompositions, given in (11) and (14), are cross-covariance based. There are also association (log-ratio) based decompositions, see Goodman (1991, 1996) or Greenacre and Lewi (2009).

4.2 Estimation of the parameters by TSVD

First, we estimate the parameters $(\lambda_\alpha, a_\alpha, b_\alpha)$ in (11) by TSVD; then the parameters $(f_\alpha, g_\alpha)$ in (14) are linearly related to them by

$$f_\alpha = D_r^{-1} a_\alpha \quad \text{and} \quad g_\alpha = D_c^{-1} b_\alpha. \qquad (17)$$

The parameters $a_\alpha$ and $b_\alpha$ in (11) are the principal coordinates of the TCov decomposition, and they satisfy

$$\lambda_\alpha = \sum_{i=1}^{I} |a_\alpha(i)| = \sum_{j=1}^{J} |b_\alpha(j)|. \qquad (18)$$

Similarly, the parameters $f_\alpha$ and $g_\alpha$ in (14) are the principal coordinates of the TCA decomposition, and they satisfy

$$\lambda_\alpha = \sum_{i=1}^{I} r_i |f_\alpha(i)| = \sum_{j=1}^{J} c_j |g_\alpha(j)|. \qquad (19)$$

Let $P_\alpha$ be the $\alpha$th residual correspondence matrix, where

$$P_{\alpha+1} = P_\alpha - \frac{a_\alpha b_\alpha'}{\lambda_\alpha}, \quad \text{with } P_1 = P - rc'. \qquad (20)$$

Similarly, let $D_\alpha$ be the $\alpha$th residual density matrix, where

$$D_\alpha = D_r^{-1} P_\alpha D_c^{-1}, \quad \text{elementwise } d^{(\alpha)}_{ij} = \frac{p^{(\alpha)}_{ij}}{r_i c_j}. \qquad (21)$$

Let $(I_\alpha^+, I_\alpha^-)$ be an optimal binary partition of the rows, and $(J_\alpha^+, J_\alpha^-)$ an optimal binary partition of the columns, such that $I_\alpha^+ = \{i : v_\alpha(i) = +1\}$ and $J_\alpha^+ = \{j : u_\alpha(j) = +1\}$. Besides (18), the taxicab dispersion $\lambda_\alpha$ is additionally related to the TCov principal coordinates $a_\alpha$ and $b_\alpha$ in (11) by the following useful equations:

$$\lambda_\alpha = 2 \sum_{i \in I_\alpha^+} a_\alpha(i) = -2 \sum_{i \in I_\alpha^-} a_\alpha(i) = 2 \sum_{j \in J_\alpha^+} b_\alpha(j) = -2 \sum_{j \in J_\alpha^-} b_\alpha(j), \qquad (22)$$

$$\lambda_\alpha = 4 \sum_{i \in I_\alpha^+} \sum_{j \in J_\alpha^+} p^{(\alpha)}_{ij} = 4 \sum_{i \in I_\alpha^-} \sum_{j \in J_\alpha^-} p^{(\alpha)}_{ij} = -4 \sum_{i \in I_\alpha^+} \sum_{j \in J_\alpha^-} p^{(\alpha)}_{ij} = -4 \sum_{i \in I_\alpha^-} \sum_{j \in J_\alpha^+} p^{(\alpha)}_{ij}. \qquad (23)$$

Equations (22, 23) follow from the fact that $P_\alpha$ for $\alpha \geq 1$ is a double-centered matrix, see Choulakian and Abou-Samra (2020). The quantification of the intrinsic quality of a principal dimension is based on (23).

4.3 An observation

The TCov principal coordinates $a_\alpha$ and $b_\alpha$ are uniformly weighted, see equation (18); meanwhile the TCA principal coordinates $f_\alpha$ and $g_\alpha$ are marginally weighted, see equation (19). What is the consequence of this? The answer is Benzécri's principle of distributional equivalence, which states that CA (and TCA) results are not changed if two proportional columns or rows are merged into one. This has the practical consequence that the effective size of sparse and large data sets can be smaller than the observed size; for further details concerning sparse contingency tables, see Choulakian (2017).
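Here is a quick numerical check of the principle (our own toy example), using the fact that the total inertia $\sum_{i,j} (p_{ij} - r_i c_j)^2 / (r_i c_j)$ is invariant when two proportional columns are merged:

```python
import numpy as np

def total_inertia(N):
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    E = np.outer(r, c)                       # expected under independence
    return ((P - E) ** 2 / E).sum()

N = np.array([[20,  5, 10],
              [10, 15, 30],
              [ 5, 10, 25]], dtype=float)
N2 = np.column_stack([N, 2 * N[:, 0]])       # append a column proportional to column 0
N2_merged = N2.copy()
N2_merged[:, 0] += N2_merged[:, -1]          # merge the two proportional columns
N2_merged = N2_merged[:, :-1]
print(total_inertia(N2), total_inertia(N2_merged))  # identical values
```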

5 Main developments

Let $L$ be a permutation matrix such that the coordinates of $Lr$ are in decreasing order; similarly, let $M$ be a permutation matrix such that the coordinates of $Mc$ are in decreasing order.

We consider the matrix

$$\tilde{P} = LPM'. \qquad (24)$$

We have the following easily proved result

Lemma 4: Let $\tilde{P} = LPM' = (\tilde{p}_{ij})$. A necessary condition for the independence model, $p_{ij} = r_i c_j$ for all $(i, j)$, is that

$$\tilde{p}_{ij} \geq \tilde{p}_{i(j+1)} \quad \text{for } j = 1, \ldots, J-1 \text{ and all } i, \qquad (25)$$

and

$$\tilde{p}_{ij} \geq \tilde{p}_{(i+1)j} \quad \text{for } i = 1, \ldots, I-1 \text{ and all } j. \qquad (26)$$

Remark: Relations (25) and (26) characterize Robinson matrices, used for the seriation of artifacts or sites in archeology. That is why we name $\tilde{P}$, see Table 2, a seriated contingency table, following its seriated row and column marginals.
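The seriation behind Table 2 is a one-line permutation. A sketch (our own) on a fragment of Table 1 (three brands, four attributes), with a check of conditions (25)-(26); as expected for these data, the row condition fails, reflecting departures from independence such as the (Nokia, innovative) molehill:

```python
import numpy as np

rows = ["Oracle", "Nokia", "Fedex"]
cols = ["innovative", "leader", "solution", "rapport"]
N = np.array([[155, 157, 109, 133],
              [375, 350, 274, 318],
              [476, 675, 550, 669]], dtype=float)  # fragment of Table 1

ri = np.argsort(-N.mean(axis=1))             # rows by decreasing average
ci = np.argsort(-N.mean(axis=0))             # columns by decreasing average
S = N[np.ix_(ri, ci)]                        # seriated table, as in Table 2
print([rows[i] for i in ri], [cols[j] for j in ci])

# conditions (25)-(26): entries non-increasing along rows and down columns
print(np.all(np.diff(S, axis=1) <= 0),       # (25): fails here
      np.all(np.diff(S, axis=0) <= 0))       # (26): holds for this fragment
```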

Lemma 5: TSVD of $L(P - rc')M'$ is equivalent to TSVD of $P - rc'$; that is, the TCov (and TCA) decompositions are invariant under row and column permutations of N.

Lemma 6 states that the $\alpha$th row TCA (or CA) principal factor score $f_\alpha(i)$ is the weighted covariance of the $i$th residual density function with the $\alpha$-th standardized principal axis, where $d^{(\alpha)}(i, \cdot)$ is the $i$th row of $D_\alpha$ and $d^{(\alpha)}(\cdot, j)$ is the $j$th column of $D_\alpha$.

Lemma 6: In CA and TCA,

$$f_\alpha(i) = \sum_{j=1}^{J} c_j\, d^{(\alpha)}_{ij}\, g^{std}_\alpha(j) \qquad (27)$$

and

$$g_\alpha(j) = \sum_{i=1}^{I} r_i\, d^{(\alpha)}_{ij}\, f^{std}_\alpha(i), \qquad (28)$$

where $(f^{std}_\alpha, g^{std}_\alpha)$ represents the $\alpha$-th standardized principal axis in each method. In CA, $f^{std}_\alpha = f_\alpha / \lambda_\alpha$ and $g^{std}_\alpha = g_\alpha / \lambda_\alpha$. In TCA, $f^{std}_\alpha = \mathrm{sgn}(f_\alpha) = v_\alpha$ and $g^{std}_\alpha = \mathrm{sgn}(g_\alpha) = u_\alpha$ for $\alpha = 1, \ldots, k$, see equations (3 and 17).

Proof: Here we provide a proof for TCA. We use the transition formula (2), $a_\alpha = P_\alpha u_\alpha$:

$$f_\alpha(i) = \frac{a_\alpha(i)}{r_i} = \frac{1}{r_i} \sum_{j=1}^{J} p^{(\alpha)}_{ij}\, u_\alpha(j) = \sum_{j=1}^{J} c_j\, \frac{p^{(\alpha)}_{ij}}{r_i c_j}\, u_\alpha(j) = \sum_{j=1}^{J} c_j\, d^{(\alpha)}_{ij}\, g^{std}_\alpha(j),$$

which is the required result (27).

Remark: In CA, due to (16), equation (27) (and similarly (28)) can further be simplified to

$$f_\alpha(i) = \frac{1}{\lambda_\alpha} \sum_{j=1}^{J} \frac{p_{ij}}{r_i}\, g_\alpha(j),$$

a well-known result, see Bastin et al. (1980, p.157) or Goodman (1991, p.1105, eq. A.1.3).
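This simplified transition formula is easy to verify numerically; the following sketch (our own) builds CA principal coordinates from the SVD of the standardized residual matrix and checks the formula on a random table:

```python
import numpy as np

rng = np.random.default_rng(1)
N = rng.integers(1, 50, size=(6, 5)).astype(float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
U, lam, Vt = np.linalg.svd(S, full_matrices=False)
k = min(P.shape) - 1                     # drop the structural zero dimension
U, lam, Vt = U[:, :k], lam[:k], Vt[:k]
F = (U * lam) / np.sqrt(r)[:, None]      # row principal coordinates f_alpha
G = (Vt.T * lam) / np.sqrt(c)[:, None]   # column principal coordinates g_alpha
# f_alpha(i) = (1 / lam_alpha) * sum_j (p_ij / r_i) * g_alpha(j)
F_check = (P / r[:, None]) @ G / lam
assert np.allclose(F, F_check)
```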

5.1 Quantifying the intrinsic quality of a taxicab principal axis

Within the Euclidean framework, a measure of the quality of a principal dimension $\alpha$ is the proportion (or percentage) of the residual variance (or inertia, in the case of CA) explained:

$$\frac{\lambda_\alpha^2}{\sum_{\beta \geq \alpha} \lambda_\beta^2}.$$

This is an extrinsic measure of quality, because it compares the dispersion of a principal axis with the residual dispersion. In the above equation, replacing the $\ell_2$ terms by the corresponding $\ell_1$ terms, we obtain the measure of intrinsic quality expressed in Definition 7.

Let $(I_\alpha^+, I_\alpha^-)$ be an optimal binary partition of the rows, and similarly $(J_\alpha^+, J_\alpha^-)$ an optimal binary partition of the columns, for the $\alpha$th principal dimension. Thus the data set is divided into four quadrants. We define a new index showing the quality of signs of the residuals (QSR) in each quadrant of the $\alpha$th residual cross-covariance matrix $P_\alpha$ in (20).

Definition 7: For $(q, s) \in \{+, -\}^2$, the measure of the quality of signs of the residuals in the quadrant $I_\alpha^q \times J_\alpha^s$ is

$$\mathrm{QSR}_\alpha(q, s) = 100\, qs\, \frac{\sum_{i \in I_\alpha^q} \sum_{j \in J_\alpha^s} p^{(\alpha)}_{ij}}{\sum_{i \in I_\alpha^q} \sum_{j \in J_\alpha^s} |p^{(\alpha)}_{ij}|}.$$

Similarly, a quantification of the quality of signs of the optimal cut of dimension $\alpha$ is

$$\mathrm{QSR}_\alpha = 100\, \frac{\lambda_\alpha}{\sum_{i=1}^{I} \sum_{j=1}^{J} |p^{(\alpha)}_{ij}|}.$$

Remark: The computation of the elements of $\mathrm{QSR}_\alpha(q, s)$ is done easily in the following way. We note that the $\alpha$th principal axis $u_\alpha$ can be written (after permutation of the columns) as

$$u_\alpha = \begin{pmatrix} \mathbf{1}_{J_\alpha^+} \\ -\mathbf{1}_{J_\alpha^-} \end{pmatrix},$$

where $\mathbf{1}_{J_\alpha^+}$ designates a column vector of 1's of size $|J_\alpha^+|$; and similarly

$$v_\alpha = \begin{pmatrix} \mathbf{1}_{I_\alpha^+} \\ -\mathbf{1}_{I_\alpha^-} \end{pmatrix}.$$

So

$$\lambda_\alpha = v_\alpha' P_\alpha u_\alpha = \sum_{q, s \in \{+, -\}} qs\, P_\alpha(q, s), \quad \text{where} \quad P_\alpha(q, s) = \sum_{i \in I_\alpha^q} \sum_{j \in J_\alpha^s} p^{(\alpha)}_{ij}.$$
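Under this quadrant-sum formulation (our reconstruction of Definitions 7 and 9), the indices reduce to a few masked sums; the sketch below takes any double-centered residual matrix R and sign vectors v (rows) and u (columns), so it applies to the TCA residuals of (20) as well as to the CA residuals introduced in Section 5.2:

```python
import numpy as np

def qsr(R, v, u):
    """QSR indices (as percentages) for a double-centered residual matrix R,
    given sign vectors v (rows) and u (columns) in {-1, +1};
    assumes all four quadrants are non-empty."""
    quads = {}
    num = den = 0.0
    for q in (+1, -1):
        for s in (+1, -1):
            block = R[np.ix_(v == q, u == s)]
            quads[(q, s)] = 100 * q * s * block.sum() / np.abs(block).sum()
            num += q * s * block.sum()               # contributes to lambda
            den += np.abs(block).sum()               # entrywise l1 norm of R
    return quads, 100 * num / den                    # quadrant and overall QSR
```

With the TSVD sketch of Section 3, one would call `qsr(X_alpha, v, u)` at each iteration, as the paper recommends inspecting the residual cross-covariance values step by step.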

To interpret the above indices, we recall from elementary probability theory the definition of association between two events, by defining the index of association

$$\delta_{ij} = \frac{p_{ij}}{r_i c_j}$$

for $i = 1, \ldots, I$ and $j = 1, \ldots, J$.

a) When $\delta_{ij} = 1$, the $i$th category of the row variable and the $j$th category of the column variable are not associated (independent).

b) When $\delta_{ij} > 1$, the $i$th category of the row variable and the $j$th category of the column variable are attractively or positively associated; that is, the event $(i, j)$ occurs more often than by chance.

c) When $\delta_{ij} < 1$, the $i$th category of the row variable and the $j$th category of the column variable are repulsively or negatively associated; that is, the event $(i, j)$ occurs less often than by chance.

Based on these, the interpretation of the indices becomes evident: for instance, $\mathrm{QSR}_\alpha(+, +)$ measures the intensity of the attractive association between the subsets $I_\alpha^+$ and $J_\alpha^+$, while $\mathrm{QSR}_\alpha(+, -)$ measures the intensity of the repulsive association between the subsets $I_\alpha^+$ and $J_\alpha^-$.

Allard et al. (2020) used the QSR index to choose between two competing methods of data analysis, TCA and taxicab log-ratio analysis of contingency tables and compositional data.

Notation: We write $\mathrm{QSR}_\alpha(q, s)$ for $(q, s) \in \{+, -\}^2$ for the four quadrant indices of Definition 7, and $\mathrm{QSR}_\alpha$ for the overall index of dimension $\alpha$.

We have the following easily proved result.

Lemma 8: a) For $(q, s) \in \{+, -\}^2$, $\mathrm{QSR}_\alpha(q, s) = 100$ if and only if all the residuals $p^{(\alpha)}_{ij}$ in the quadrant $I_\alpha^q \times J_\alpha^s$ are of the sign $qs$.

b) For $(q, s) \in \{+, -\}^2$, $0 < \mathrm{QSR}_\alpha(q, s) \leq 100$.

c) $\mathrm{QSR}_\alpha$ is the harmonic mean of the four quadrant indices:

$$\mathrm{QSR}_\alpha = \frac{4}{\sum_{q, s \in \{+, -\}} 1 / \mathrm{QSR}_\alpha(q, s)} \leq \frac{1}{4} \sum_{q, s \in \{+, -\}} \mathrm{QSR}_\alpha(q, s).$$

The proof of part c is based on the arithmetic-harmonic means inequality, which states that for four strictly positive real numbers $x_1, x_2, x_3, x_4$,

$$\frac{4}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \frac{1}{x_4}} \leq \frac{x_1 + x_2 + x_3 + x_4}{4},$$

and equality is attained when $x_1 = x_2 = x_3 = x_4$.

5.2 Quantifying the intrinsic quality of a principal axis in CA

Let $D^{CA}_\alpha$ be the $\alpha$th residual density matrix in CA,

$$d^{(\alpha)}_{ij} = \frac{p_{ij}}{r_i c_j} - 1 - \sum_{\beta=1}^{\alpha-1} \frac{f_\beta(i)\, g_\beta(j)}{\lambda_\beta}, \qquad (29)$$

where $f_\beta / \lambda_\beta$ and $g_\beta / \lambda_\beta$ represent the $\beta$th standardized principal axis coordinates in CA.

Let $P^{CA}_\alpha$ be the $\alpha$th residual cross-covariance matrix in CA, obtained from (29):

$$p^{(\alpha)}_{ij} = r_i c_j\, d^{(\alpha)}_{ij}. \qquad (30)$$

5.2.1

Let $(I_\alpha^+, I_\alpha^-)$ be an optimal binary principal axis partition of the rows, where $I_\alpha^+ = \{i : f_\alpha(i) \geq 0\}$, and similarly $(J_\alpha^+, J_\alpha^-)$ an optimal binary principal axis partition of the columns by CA, where $J_\alpha^+ = \{j : g_\alpha(j) \geq 0\}$. Thus the residual covariance matrix $P^{CA}_\alpha$ is divided into four quadrants $I_\alpha^q \times J_\alpha^s$ for $q, s \in \{+, -\}$. Based on the observation that both $P^{CA}_\alpha$ and $D^{CA}_\alpha$ are double centered, we can quantify the intrinsic quality of a CA principal dimension by replacing $P_\alpha$ by $P^{CA}_\alpha$ in subsection 5.1, and obtain analogous measures.

Definition 9: For $(q, s) \in \{+, -\}^2$, the CA measure of the quality of signs of the residuals in the quadrant $I_\alpha^q \times J_\alpha^s$ is

$$\mathrm{QSR}^{CA}_\alpha(q, s) = 100\, qs\, \frac{\sum_{i \in I_\alpha^q} \sum_{j \in J_\alpha^s} p^{(\alpha)}_{ij}}{\sum_{i \in I_\alpha^q} \sum_{j \in J_\alpha^s} |p^{(\alpha)}_{ij}|},$$

with $p^{(\alpha)}_{ij}$ as in (30). Similarly, the CA measure of the quality of signs of principal dimension $\alpha$ is

$$\mathrm{QSR}^{CA}_\alpha = 100\, \frac{\sum_{q, s \in \{+, -\}} qs \sum_{i \in I_\alpha^q} \sum_{j \in J_\alpha^s} p^{(\alpha)}_{ij}}{\sum_{i=1}^{I} \sum_{j=1}^{J} |p^{(\alpha)}_{ij}|},$$

where the numerator sums the residuals over the four quadrants with their expected signs.