# A random matrix perspective of cultural structure

Recent studies have highlighted interesting structural properties of empirical cultural states. Such a state is a collection of vectors of cultural traits of real individuals, based on which one defines a matrix of similarities between individuals. This study provides further insights about the structure encoded in these states, using concepts from random matrix theory. For generating random matrices that are appropriate as a structureless reference, we propose a null model that enforces, on average, the empirical occurrence frequency of each possible trait. With respect to this null model, the empirical similarity matrices show deviating eigenvalues, which may be signatures of cultural groups that might not be recognizable by other means. However, they can conceivably also be artifacts of arbitrary, dataset-dependent correlations between cultural variables. In order to understand this possibility, independently of any empirical information, we study a toy model which explicitly enforces a specified level of correlation in a minimally-biased way, in the simplest conceivable setting. In parallel, a second toy model is used to explicitly enforce group structure, in a very similar setting. By analyzing and comparing cultural states generated with these toy models, we show that a deviating eigenvalue, such as those observed for empirical data, can also be induced by correlations alone. Such a "false" group mode can still be distinguished from a "true" one, by evaluating the uniformity of the entries of the respective eigenvector, while checking whether this uniformity is statistically compatible with the null model. For empirical data, the eigenvector uniformities of all deviating eigenvalues are shown to be compatible with the null model, suggesting that the apparent group structure is not genuine, although a decisive statement requires further research.


## I Introduction

Understanding the complex behavior of social systems greatly benefits from constructively combining the increasing amount of empirical data with a variety of quantitative, theoretical approaches, often originating in the natural sciences Urry ; Lazer . Although much of this interdisciplinary research focuses on the network and connectivity aspects of social systems Kadushin , efforts are also being made for understanding a complementary aspect: the formation and dynamics of opinions, preferences, attitudes and beliefs, more generically referred to as “cultural traits” Castellano . In particular, recent studies have placed a stronger emphasis on using empirical data about the cultural traits of real individuals Valori ; Stivala ; Babeanu_1 ; Babeanu_2 ; Babeanu_3 . Such data is typically recorded within a short period of time from a random sample of people in a population, via a social survey with a large number of questions, so that a vector (or sequence) of cultural traits can be constructed for every individual, where each trait is an answer to one of the questions. The collection of all cultural vectors constructed from one empirical source is called an empirical “cultural state”, or an empirical “set of cultural vectors”, since it can be used to empirically specify the initial conditions of an Axelrod-type model of cultural dynamics Axelrod . Using previously developed tools Valori ; Stivala that relied on models of cultural and opinion dynamics, Ref. Babeanu_1 showed that empirical cultural states are characterized by properties that are highly robust across different data sets. These properties have been further explored Babeanu_2 ; Babeanu_3 but not entirely understood. This generic structure present in an empirical cultural state is largely retained by the person-person matrix of cultural similarities that can be defined based on the cultural vectors, allowing for this structure to be further investigated by means of a random matrix approach.

Random matrix theory Mehta ; Edelman has been successfully used for a variety of applications, such as the analysis of financial systems Potters . The framework deals with the properties of random matrices, under certain distributional assumptions. The associated statistical ensembles of matrices are used to compute the expected values (or even the probability distributions) of interesting, matrix-dependent quantities. These theoretical expectations can be compared to empirical counterparts evaluated on matrices that encode information about the real-world systems that are being studied. Statistically significant deviations of the empirical quantities are then interpreted as interesting, non-trivial structural properties of the respective systems. The focus is on the eigenvalue spectrum of the empirical matrix, which can be, for instance, a correlation matrix between the time series recording the price dynamics of stocks MacMahon or the activity of neurons Almog . In such cases, the appropriate assumptions of randomness are captured by the Marchenko-Pastur law Marchenko , which gives a limiting distribution for the spectrum. The eigenmodes whose eigenvalues are significantly larger than what is expected based on the Marchenko-Pastur law are interpreted as joint dynamical patterns in terms of which the non-trivial behavior of the system can be understood, while the others are interpreted as the noise components of the system. Recently, Ref. Patil extended this approach to similarity matrices constructed from categorical data, where each entry of the matrix is a similarity between two time series of discrete symbols. For instance, for one of the data sets of Ref. Patil , each sequence of symbols corresponds to an electoral constituency of India, with different symbols associated to different winning parties and successive time steps associated to successive elections.

In this study, this random matrix approach is applied to spectra of empirical matrices of cultural similarities, constructed from data previously used in Refs. Valori ; Stivala ; Babeanu_1 ; Babeanu_2 ; Babeanu_3 . Instead of relying on analytic formulas for estimating and filtering the noise, we make extensive use of numerical methods. This allows for a detailed investigation of three null models (Sec. II), among which the uniformly random generation is the simplest and conceptually closest to the analytic approaches behind the Marchenko-Pastur distribution Patil . As a second null model, we make use of trait shuffling, which is known to be important for understanding empirical cultural states, independently from spectral decomposition and random matrix notions Valori ; Stivala ; Babeanu_1 ; Babeanu_2 ; Babeanu_3 , since it reproduces exactly the empirical trait occurrence frequencies. We propose an additional null model which also reproduces these empirical trait frequencies on average, while also incorporating some mathematically desirable properties of the uniform random generation. We name this procedure “restricted random generation”. These null models are compared in terms of how well they reproduce the upper boundary of the noisy spectral region (“the bulk”), as well as the position of the highest eigenvalue. The latter is a strong outlier which can be understood as a “global mode”, which for similarity matrices is guaranteed to be present even under the uniformly random scenario Patil . As shown in Sec. II, the restricted random generation turns out to be more appropriate and is thus selected for further analysis. Based on restricted randomness, we numerically evaluate the probability distribution of the upper noise boundary, showing that there are several empirical eigenvalues significantly above this boundary. These correspond to modes that capture the structure in empirical data, since they are incompatible with the null hypothesis behind restricted randomness. Hence, this manuscript will often refer to them as “structural modes”.

It is tempting to interpret these deviating eigenmodes as signatures of group structure, in a manner similar to time series analysis MacMahon , in the sense that the individuals are organized in terms of several cultural groups or categories. This is particularly intriguing, given that Ref. Babeanu_2 provides indirect evidence for cultural structure being governed by a small number of cultural prototypes supposedly induced by universal “rationalities”. However, it is important to keep in mind that the empirical data also shows pairwise correlations between cultural variables, which are at least partly induced by arbitrary, dataset-dependent similarities between how the corresponding items are defined, as previously pointed out Valori ; Babeanu_1 ; Stivala . Since these correlations are not retained by restricted randomness and shuffling, it is possible that deviating eigenmodes are a direct consequence of them. This raises the question of whether these eigenmodes are signatures of authentic group structure or are just artifacts of arbitrary correlations between variables. First, we explicitly show that, at least in principle, one can differentiate between the “correlations scenario” and the “groups scenario”. This is not trivial, since group structure also entails, to a certain extent, pairwise correlations between features. The distinction is demonstrated in Sec. III by studying, in a highly simplified, abstract setting consisting of only binary features, two probabilistic models for generating (sets of) cultural vectors. The first model, labeled “FCI” (Sec. III.1), explicitly enforces a certain pairwise coupling between all features, in a manner that gives rise to a certain level of correlation, without introducing any additional assumption or bias in the underlying probability distribution. This is ensured by a maximum-entropy derivation Jaynes , which leads to a statistical ensemble that is mathematically equivalent to the canonical ensemble of the Ising model on a fully-connected lattice Colonna-Romano , where each feature corresponds to one lattice site and each cultural vector corresponds to a spin configuration. The second model, labeled “S2G” (Sec. III.2), explicitly enforces a dual group structure, whose strength can be analytically tuned to match the first model in terms of the level of feature-feature correlations that arise as a side effect. More details about these models and about the interpretation of deviating eigenmodes are given in Sec. III.

For any given level of feature-feature correlations, the two models are used for generating sets of cultural vectors. The structure of the resulting similarity matrices is captured by the subleading eigenvalue and its eigenvector. However, Sec. IV shows that the expected strength and significance of the subleading eigenvalue is exactly the same for the FCI and S2G models, for any given correlation level, so the subleading eigenvalue does not discriminate between the two scenarios, confirming that the presence of deviating eigenvalues does not necessarily imply the presence of group structure. We show that the essential difference between the FCI and S2G is captured by the entries of the eigenvector associated to the subleading eigenvalue. In particular, the uniformity of these entries, quantified by the “eigenvector entropy” (see Sec. IV), shows a clearly different behaviour as a function of correlation for the two models, with S2G showing an increasingly higher uniformity as the correlation level increases. Moreover, the dependence of the second-highest eigenvector entropy on the correlation level reproduces well the symmetry-breaking phase transitions that characterize the two models. In each case, the eigenvector entropy suddenly jumps from a regime of compatibility to a regime of incompatibility with the null model, exactly when the probability distribution associated to the respective model becomes bi-modal. The critical correlation associated to this transition is almost ten times smaller for S2G than for FCI. This further justifies the use of eigenvector entropy as an indicator of group structure in empirical data, as a complement to eigenvalue information.

Along these lines, Sec. V presents an enhanced analysis of empirical data, showing how the eigenmodes are distributed in terms of eigenvalue and eigenvector entropy, in comparison to expectations based on restricted randomness. Interestingly, all the structural modes previously identified in empirical data (based on eigenvalues) are actually compatible with the null model in terms of eigenvector uniformity (based on eigenvector entropy). This is the case for all three datasets used in this study, suggesting that structural modes of culture are actually artifacts of arbitrary correlations between cultural variables. However, such a conclusion is conditional on the S2G model introduced here being representative, in a qualitative way, for any type of authentic cultural groups that empirical data might capture. As explained in Sec. VI, this might actually not be the case, so the presence of groups cannot be entirely rejected based on this study, especially if these groups are highly entangled. In particular, the S2G model does not account for the “mixing”, “multiple-self” ingredient that has been shown to be crucial Babeanu_2 for an interpretation of cultural structure in terms of a small number of prototypes Stivala , inspired by social science frameworks such as Plural Rationality Theory Thompson . In fact, it appears likely that structural modes induced by mixing prototypes would not exhibit higher eigenvector uniformities than expected based on the null model, while they should still qualify as group modes. More research is needed to establish whether this is indeed the case and, if so, to find a way of distinguishing structural modes induced by mixing prototypes from those induced by correlations.

## II Eigenvalue distributions for empirical data and null models

In this section, the eigenvalue spectra of empirical matrices of cultural similarities are evaluated. At the same time, three null models are evaluated and compared. Each null model is used to numerically generate similarity matrices, by randomly sampling from the associated statistical ensemble, which enforces, to a certain extent, the empirical information that is expected to not be of interest: information that, on a priori grounds, clearly has more to do with arbitrary survey design choices than with any authentic cultural structure. One of these models, namely the “restricted random” model, which is first introduced here, is chosen as a good benchmark with respect to which interesting structure is to be measured, as explained below. Before explaining the actual results, some mathematical clarifications are given with respect to the computation of similarity matrices, the spectral decomposition procedure and the definitions of the null models.

A cultural similarity matrix is a square, $N \times N$ matrix obtained from $N$ cultural vectors, which are all defined with respect to the same set of $F$ cultural features (variables or dimensions). Each feature can take one of $q^k$ possible, discrete values, called “cultural traits”, where $k$ labels the features, according to some order that is arbitrary, but consistent across all vectors. Moreover, each feature can be either nominal, marked as $f^k_{nom}=1$, or ordinal, marked as $f^k_{nom}=0$, which affects how its similarity contribution is defined. Each entry of the similarity matrix is then computed according to:

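$$ s_{ij} = \frac{1}{F} \sum_{k=1}^{F} \left[ f^k_{nom}\, \delta(x^k_i, x^k_j) + (1 - f^k_{nom}) \left( 1 - \frac{|x^k_i - x^k_j|}{q^k - 1} \right) \right], \tag{1} $$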

encoding the similarity between vectors $i$ and $j$, where $\delta$ stands for the Kronecker delta function and $x^k_i$ and $x^k_j$ denote the traits recorded with respect to feature $k$ in vectors $i$ and $j$ respectively. For the ordinal case, it is important that $x^k_i$ and $x^k_j$ take discrete, rational values from a bounded numerical range, so that the absolute difference in Eq. (1) is well defined, while for the nominal case they only need to take symbolic values from any (feature-specific) alphabet. Note that the similarity measure in Eq. (1) is an arithmetic average of the similarity contributions of the $F$ cultural features, in agreement with Refs. Valori ; Stivala ; Babeanu_1 ; Babeanu_2 ; Babeanu_3 . Although in these studies most concepts are presented in terms of cultural distances $d_{ij}$, these have a trivial relationship to cultural similarities: $d_{ij} = 1 - s_{ij}$. For an empirical matrix, each vector corresponds to one individual in the real world and each feature to one question or item in the questionnaire used to collect the data, so that the realized trait $x^k_i$, which lies at the intersection between vector $i$ and feature $k$, corresponds to the answer/rating given by individual $i$ to question/item $k$. For a matrix generated based on a null model, the vectors are generated according to the specified random procedure, while retaining (at least) the empirical data format, namely the type and range of each feature $k$. Note that, in contrast to the empirical symbolic sequences used in Ref. Patil , cultural vectors have no axis of time, so everything is equivalent up to a reordering of the cultural features, as long as this is done consistently for all cultural vectors. This is irrelevant for any of the mathematical operations involved in the analysis here, but it is relevant for the interpretation: cultural vectors capture no time-evolution, and should be interpreted as instantaneous, multidimensional opinion profiles, rather than as dynamical, one-dimensional profiles.
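The definition in Eq. (1) can be illustrated with a minimal NumPy sketch. The function name `similarity_matrix` and the integer encoding of traits (codes $0, \dots, q^k-1$ for every feature) are assumptions made here for illustration only, not conventions taken from the study.

```python
import numpy as np

def similarity_matrix(X, q, is_nominal):
    """Person-person similarity matrix following Eq. (1).

    X          : (N, F) integer array, trait codes in 0..q[k]-1 (assumed encoding)
    q          : (F,) number of possible traits per feature, each >= 2
    is_nominal : (F,) booleans, True for nominal and False for ordinal features
    """
    X = np.asarray(X, dtype=float)
    q = np.asarray(q, dtype=float)
    is_nominal = np.asarray(is_nominal, dtype=bool)
    n_vectors, n_features = X.shape
    S = np.zeros((n_vectors, n_vectors))
    for k in range(n_features):
        col = X[:, k]
        # Absolute trait difference for every pair of vectors (i, j)
        diff = np.abs(col[:, None] - col[None, :])
        if is_nominal[k]:
            S += (diff == 0)                 # Kronecker delta contribution
        else:
            S += 1.0 - diff / (q[k] - 1.0)   # rescaled ordinal contribution
    return S / n_features                    # arithmetic average over features

# Toy state: 3 vectors, one nominal feature (q=2) and one ordinal feature (q=3)
X = np.array([[0, 2],
              [0, 0],
              [1, 1]])
S = similarity_matrix(X, q=[2, 3], is_nominal=[True, False])
```

By construction the result is symmetric with unit diagonal, and the corresponding distances follow as `1 - S`.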

From Eq. (1) it follows that such a similarity matrix is real and symmetric, from which it follows, according to the spectral theorem, that it has real eigenvalues with associated orthonormal eigenvectors with real entries. This implies that the matrix can be decomposed in the following way:

$$ s_{ij} = \sum_{l=1}^{N} \lambda_l\, v^i_l\, v^j_l, \tag{2} $$

where “$\lambda_l$” and “$v_l$” are used to denote the $l$th highest eigenvalue and, respectively, the eigenvector associated to it, while $v^i_l$ is the $i$th entry of eigenvector $v_l$. Throughout this study, special attention is paid to $\lambda_1$ and $\lambda_2$, the highest and second highest eigenvalues of various similarity matrices, also denoted as the “leading” and “subleading” eigenvalues respectively. In parallel, “$\lambda$” is used to denote any generic eigenvalue. More notation will be introduced below, as needed.
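As a hedged numerical illustration of the decomposition in Eq. (2), the sketch below diagonalizes a small similarity matrix and rebuilds it from its eigenpairs. The helper name `sorted_eigenpairs` and the synthetic, nominal-only data are assumptions introduced for this example.

```python
import numpy as np

def sorted_eigenpairs(S):
    """Eigenvalues in decreasing order, with eigenvectors as matching columns."""
    vals, vecs = np.linalg.eigh(S)       # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Synthetic state (not empirical data): 20 vectors, 50 nominal features, 3 traits each
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(20, 50))
S = (X[:, None, :] == X[None, :, :]).mean(axis=2)   # fraction of matching traits

lam, V = sorted_eigenpairs(S)
# Eq. (2): s_ij = sum_l lam_l * v_l^i * v_l^j
S_rebuilt = (V * lam) @ V.T
```

Here `lam[0]` and `lam[1]` play the roles of the leading and subleading eigenvalues, and `V[:, l]` holds the entries of the eigenvector paired with `lam[l]`.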

All similarity matrices used in this study are based on sets of cultural vectors, regardless of whether they are empirical, generated with one of the three null models introduced below or with one of the toy models introduced in Sec. III. At the same time, the number of features is always larger than , which ensures that the information in every similarity matrix is not redundant, since its number of entries is smaller than the number of entries in the set of cultural vectors .

Fig. 1(a) shows the eigenvalue spectrum of an empirical similarity matrix computed based on cultural vectors extracted from Eurobarometer (EBM) data, which records attitudes and opinions of European Union citizens on various topics concerning technology, the environment and certain policy issues EBM . The data is formatted according to the procedure described in Ref. Babeanu_1 , which determines the set of cultural features available. The vertical axis gives the number of eigenvalues occurring in each bin along the horizontal axis. The inset focuses on the higher region of the horizontal axis, where the leading eigenvalue $\lambda_1$ is located. The high value of $\lambda_1$ is expected based on purely mathematical grounds Patil , due to the overall positivity of any such similarity matrix. In most cases, all entries of the eigenvector $v_1$ associated to $\lambda_1$ have the same sign and very similar absolute values, meaning that, according to Eq. (2), the leading eigenpair captures a large, highly uniform, positive component of the matrix entries $s_{ij}$. This eigenmode thus accounts for the overall tendency towards similarity of the entire system, which is partly due to how similarity is defined and partly (see below) due to feature-level non-uniformities. For this reason, the leading mode will also be referred to as the “global mode”, a term that originates from time-series analysis MacMahon based on correlation matrices, for which a global mode may or may not be present, depending on the system. Using exactly the same format as Fig. 1(a), each of the other three panels of Fig. 1 shows the spectrum of a similarity matrix generated from one of the three null models: “uniform randomness”, “shuffling” and “restricted randomness”.

First, Fig. 1(b) shows the spectrum of a similarity matrix generated via uniform randomness (abbreviated as “u-random”). Specifically, for every vector, each trait is chosen independently at random from the traits available at the level of the respective feature, with equal probability attached to each possible trait. This means that uniform randomness retains minimal information from the empirical cultural state used for Fig. 1(a): only the number of features, the type and the number of traits of each feature. Note that the leading eigenvalue of this matrix is comparable to that of the empirical matrix. Ref. Patil showed that the analytic, limiting distribution given by the Marchenko-Pastur formula has a shape that is qualitatively similar to the bulk of the u-random spectrum. Quantitatively however, the analytic and numerical distributions become truly similar only if an important parameter controlling the former is left free and fit to the numerical results, instead of being fixed directly from the dimensions of the data, as can be done when dealing with matrices of correlations between time series with numerical entries. Moreover, the Marchenko-Pastur formula completely fails to describe the leading eigenvalue.
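A minimal sketch of uniformly random generation is given below, assuming that traits are encoded as integers $0, \dots, q^k-1$. The function name `u_random_state` and the example values are hypothetical.

```python
import numpy as np

def u_random_state(n_vectors, q, rng):
    """Draw every trait independently and uniformly from its feature's range.

    q : sequence with the number of possible traits of each feature
    Returns an (n_vectors, F) integer array of trait codes (assumed encoding).
    """
    return np.column_stack(
        [rng.integers(0, qk, size=n_vectors) for qk in q]
    )

rng = np.random.default_rng(42)
q = [2, 5, 3, 4]                      # hypothetical data format with F = 4
X_u = u_random_state(100, q, rng)
```

Only the data format enters the procedure: the number of features and the number of traits per feature. The feature type (nominal or ordinal) matters only later, when similarities are computed.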

Second, Fig. 1(c) shows the eigenvalue spectrum of a similarity matrix generated via shuffling. Specifically, with respect to every feature, the traits realized in the empirical state are randomly permuted among the vectors, such that every permutation is equally likely. This is done independently for every feature, so that all types of correlations between features are destroyed. The procedure preserves exactly the number of times each trait is empirically realized, in addition to preserving the data format of the empirical state in Fig. 1(a). Note that, by construction, the assignment of traits to vectors is not entirely independent across vectors, implying that the number of vectors resulting from shuffling has to be exactly the same as the number of empirical vectors used.
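The shuffling procedure can be sketched as an independent permutation of each column of the data array. The function name `shuffled_state` and the synthetic input are assumptions made for illustration.

```python
import numpy as np

def shuffled_state(X, rng):
    """Permute the traits of each feature among the vectors, independently per feature."""
    X = np.asarray(X)
    return np.column_stack(
        [rng.permutation(X[:, k]) for k in range(X.shape[1])]
    )

rng = np.random.default_rng(7)
# Synthetic stand-in for an empirical state: 50 vectors, 6 features, 4 traits each
X = rng.integers(0, 4, size=(50, 6))
X_s = shuffled_state(X, rng)
```

Each column of `X_s` contains exactly the same multiset of traits as the corresponding column of `X`, so trait counts are preserved exactly while feature-feature correlations are destroyed. The output necessarily has the same number of vectors as the input.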

Third, Fig. 1(d) shows the spectrum of a similarity matrix generated via restricted randomness (abbreviated as “r-random”). Specifically, for every vector, each trait is chosen independently at random from the traits available at the level of the respective feature, with different probabilities attached to the possible traits, these probabilities being directly proportional to the empirical occurrence frequencies of the respective traits. This means that, like the shuffling procedure, restricted randomness also reproduces the empirical trait frequencies, but only on average. Moreover, it also retains the independent generation specific to uniform randomness, which allows for an arbitrary number of cultural vectors to be generated, regardless of how large this number is for empirical data. The independent generation should also make the model more tractable analytically. Although neither of these two advantages is directly exploited in this study, they suggest that restricted randomness is conceptually more appropriate than either uniform randomness or shuffling, as it incorporates the desirable properties of both.
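Restricted randomness can be sketched as independent sampling from the empirical trait frequencies of each feature. The function name `r_random_state`, the integer trait encoding and the synthetic input below are assumptions made for this example.

```python
import numpy as np

def r_random_state(X, q, n_vectors, rng):
    """Sample traits independently, with probabilities equal to empirical frequencies.

    X : (N, F) integer array of observed trait codes in 0..q[k]-1 (assumed encoding)
    q : number of possible traits per feature
    """
    X = np.asarray(X)
    columns = []
    for k, qk in enumerate(q):
        counts = np.bincount(X[:, k], minlength=qk)
        probs = counts / counts.sum()        # empirical occurrence frequencies
        columns.append(rng.choice(qk, size=n_vectors, p=probs))
    return np.column_stack(columns)

rng = np.random.default_rng(3)
# Synthetic stand-in for an empirical state; trait 1 of feature 0 is never realized
X = np.column_stack([rng.choice([0, 2], size=200), rng.integers(0, 4, size=200)])
X_r = r_random_state(X, q=[3, 4], n_vectors=1000, rng=rng)
```

Unlike shuffling, the number of generated vectors (`n_vectors`) is free, and the trait frequencies of the input are matched only in expectation.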

The rough shape of the eigenvalue histogram is quite similar across the four panels of Fig. 1, which means that empirical data contains a large amount of noise, which can be described reasonably well by any of the three null models. Interesting discrepancies are present in terms of the leading eigenvalue: the empirical value is very similar to the shuffled and r-random values, while higher than the u-random value. This shows that the overall tendency towards similarity is smaller in the uniformly-random case than in the other three cases. This is understandable given that shuffling and restricted randomness reproduce the feature-level non-uniformities, which in turn are responsible for an overall level of similarity which is higher than what is expected from uniform randomness Babeanu_2 , leading to an enhanced global mode.

Very important are the empirical outliers in Fig. 1(a), which encode empirical structure that is independent of feature-level non-uniformities. The two higher outliers are larger than the bulk boundary as predicted by any of the three null models, while the other two appear compatible with the random bulk predicted by uniform randomness. This highlights the importance of choosing the appropriate null model, since this determines the position of the boundary between noise modes and structural modes along the $\lambda$ axis, which in turn decides how many empirical eigenmodes are to be regarded as structurally relevant on the higher side of this boundary. It appears that the position of this boundary is somewhat different for the three null models, but this is hard to evaluate only based on Fig. 1, due to limitations inherent in the binning.

Fig. 2(a) overcomes these limitations by showing the subleading eigenvalue distribution for the three null models, in parallel with the leading eigenvalue distributions in Fig. 2(b), where the colors associated to the three null models are the same as those in Fig. 1. For comparison, the empirical eigenvalues are shown by the vertical (red) lines in the upper bands of Fig. 2. Each $\lambda_1$ and $\lambda_2$ distribution is produced numerically by sampling sets of cultural vectors from the statistical ensemble of the respective null model. It appears that shuffling and r-random show essentially the same $\lambda_2$ distribution, while for u-random this is located at higher values. Since $\lambda_2$ sets the boundary for the random bulk, more empirical eigenmodes are to be regarded as structurally relevant with respect to a null model based on shuffling or restricted randomness, rather than on uniform randomness. Choosing between shuffling and r-random appears appropriate, since they are consistent with empirical data in terms of the leading eigenvalue, as noted before, now confirmed in a more statistically reliable way by Fig. 2(b). Such a choice is compatible with the idea of focusing on the empirical structure that is present independently of feature-level non-uniformities, which are expected to strongly depend on how the associated questions and the possible answers are formulated and much less on authentic properties of the real social system from which the data is extracted. With respect to either the shuffled or the r-random $\lambda_2$ distribution, all four empirical outliers noted in Fig. 1(a) appear statistically significant, with a departure of at least two standard deviations from the mean.
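The significance test described above can be sketched numerically: sample the null ensemble, collect the subleading eigenvalue of each sampled matrix, and express candidate eigenvalues as departures from the null mean in units of the null standard deviation. Everything in the sketch is an assumption made for illustration: the function names, the restriction to nominal features, the small ensemble size, and the synthetic two-group state that stands in for empirical data.

```python
import numpy as np

def nominal_similarity(X):
    """Fraction of matching traits for every pair of vectors (nominal features only)."""
    return (X[:, None, :] == X[None, :, :]).mean(axis=2)

def r_random_state(X, n_vectors, rng):
    """Independent sampling from the observed trait frequencies of each feature."""
    cols = []
    for k in range(X.shape[1]):
        traits, counts = np.unique(X[:, k], return_counts=True)
        cols.append(rng.choice(traits, size=n_vectors, p=counts / counts.sum()))
    return np.column_stack(cols)

def null_subleading(X, n_samples, rng):
    """Subleading eigenvalue of n_samples similarity matrices drawn from the null model."""
    n_vectors = X.shape[0]
    out = []
    for _ in range(n_samples):
        S = nominal_similarity(r_random_state(X, n_vectors, rng))
        out.append(np.linalg.eigvalsh(S)[-2])   # eigvalsh sorts in ascending order
    return np.array(out)

def z_scores(eigenvalues, null_lambda2):
    """Departure from the null mean, in units of the null standard deviation."""
    return (np.asarray(eigenvalues) - null_lambda2.mean()) / null_lambda2.std(ddof=1)

rng = np.random.default_rng(11)
# Synthetic state with two planted groups: 40 vectors, 60 binary features,
# each trait copied from the group prototype with probability 0.9
prototype = np.repeat([0, 1], 20)[:, None]
flip = rng.random((40, 60)) < 0.1
X = np.where(flip, 1 - prototype, prototype)

null_l2 = null_subleading(X, n_samples=30, rng=rng)
observed = np.linalg.eigvalsh(nominal_similarity(X))[::-1]   # decreasing order
z = z_scores(observed[1:4], null_l2)   # skip the leading (global) mode
```

Eigenvalues with `z` above a chosen threshold (two standard deviations in the text) would be flagged as deviating. A realistic analysis would use a much larger ensemble than the 30 samples drawn here.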

On the other hand, based on Fig. 2(b), the empirical leading eigenvalue also appears statistically compatible with both shuffling and restricted randomness, but closer to the mean of the former. This, however, deserves a closer inspection, due to the limitations inherent in the binning of Fig. 2(b). Fig. 3 focuses on the shuffled and r-random distributions, giving a better impression of how well either null model predicts the empirical leading eigenvalue based on partial information about trait frequencies. It appears that, due to the sharpness of the shuffled distribution, the empirical value is actually not statistically compatible with it, while it is clearly compatible with the r-random distribution. For this reason, we choose restricted randomness as the appropriate null model. Note that, for visual purposes, the bins are chosen to be much smaller for the shuffled than for the r-random distribution, although both histograms contain one entry for each random matrix sampled from the respective ensemble.

Finally, it is worth repeating the analysis on empirical cultural states constructed from two more datasets, namely the General Social Survey GSS (GSS), shown in Fig. 4(a), and Jester JS (JS), shown in Fig. 4(b). Both datasets are also formatted according to the procedure described in Ref. Babeanu_1 , which determines the number of features available for GSS and for JS. The two figures follow the format of Fig. 2(a), since this emphasizes the empirical outliers and their departure from the subleading eigenvalue distributions of the three null models. Although, at this point, the choice has already been made in favor of restricted randomness, the other two distributions are also shown for consistency. Both the GSS and JS eigenvalue spectra show outliers that are significantly larger than what is expected based on the r-random null model: three such outliers are present for GSS and four for JS. The deviating eigenvalues are, on average, larger for JS than for EBM, and larger for EBM than for GSS. Note that the axis ranges of Figs. 4(a), 4(b) and 2(a) are not the same.

Based on the results above, one can say that the empirical structure captured by matrices of cultural similarity is generally recognizable via eigenvalues that are significantly larger than what is expected based on a null hypothesis accounting for empirical trait frequencies: they are significantly higher than the subleading eigenvalue and much lower than the leading eigenvalue expected from this null hypothesis. For the rest of this study, the eigenpairs (eigenvector-value pairs) associated to these deviating eigenvalues will often be referred to as “structural modes”.

## III Two interpretations of structural modes

This section explores possible ways of interpreting the structural modes of culture described above. To begin with, certain aspects of linear algebra are emphasized, in relation to the diagonalization of similarity matrices, which justify an interpretation of structural modes as group modes, like in the context of correlation matrices. Then, two hypotheses are formulated: first, that structural modes are just the effect of correlations between cultural features, thus only retaining information about how the associated questions/items are chosen; second, that structural modes are an effect of genuine groups or grouping tendencies among the individuals, thus retaining information about the social system from which the data is extracted. This leads to probabilistic formulations of the two hypotheses in a very simplistic setting: the correlations-only scenario is realized as the “fully-connected Ising” (FCI) model in Sec. III.1, while the groups scenario is realized as the “symmetric two-groups” (S2G) model in Sec. III.2. Finally, in Sec. III.3, the mathematical properties of the two models are studied in order to check that they behave as expected and to better emphasize their differences.

It is instructive to first consider some elementary, but important mathematical properties of the eigenvalues and the associated eigenvectors satisfying Eq. (2), since they provide important hints towards how the structural modes are to be interpreted. For the sake of clarity, the following explanations make use of the term “individual” as a replacement for “cultural vector”, although most of the concepts presented are also valid, at least mathematically, for similarity matrices constructed from randomly generated cultural vectors, based on any probabilistic model.

Since the eigenvectors have only real entries and form an orthonormal basis, one can write any real vector $w$ with $N$ entries as a linear combination of the eigenvectors $v_l$:

$$w = \sum_{l=1}^{N} \alpha_l v_l, \tag{3}$$

with real coefficients $\alpha_l$. The rest of this argument is restricted to unit vectors $w$, which satisfy $\sum_{i=1}^{N} w_i^2 = 1$, a condition that translates to $\sum_{l=1}^{N} \alpha_l^2 = 1$ in terms of the eigenvectors’ coefficients. This encompasses all the eigenvectors as special cases. Moreover, let us define the following scalar quantity:

$$S = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i s_{ij} w_j, \tag{4}$$

as the double contraction of the similarity matrix with the vector $w$. By means of Eq. (4) and Eq. (3), for any vector $w$ (including the special cases when this entirely matches one of the eigenvectors $v_l$), every entry $w_i$ of $w$ becomes associated to one of the $N$ individuals based on which the similarity matrix is computed. Thus, $w$ can be seen as a (normalized) linear combination of the $N$ individuals, and $S$ can be interpreted as the self-similarity of any normalized linear combination $w$, since every pairwise similarity $s_{ij}$ is multiplied by the numbers $w_i$ and $w_j$ attached to individuals $i$ and $j$. For any normalized $w$, one can show that:

$$S = 1 + 2 \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} w_i s_{ij} w_j, \tag{5}$$

which immediately follows from the fact that $s_{ii} = 1$, which is a direct consequence of how the similarity is defined in Eq. (1). Note that $S = 1$ whenever $w$ gives a strength of $1$ to one individual and $0$ to all the others, which supports the interpretation of $S$ as a self-similarity. It is also important to note, from Eq. (5), that $S$ is larger when $w$ is such that pairs of entries with the same sign correspond to higher values of $|w_i w_j|$ and higher values of $s_{ij}$, while pairs with opposite signs correspond to lower values of $|w_i w_j|$ and lower values of $s_{ij}$.

The largest self-similarity is attained when the linear combination , among all unit vectors, takes the form of the eigenvector with the largest associated eigenvalue , corresponding to . This largest self-similarity value is actually equal to the largest eigenvalue: . This is shown by plugging Eq. (2) and Eq. (3) into (4) and using the normalization condition, leading to:

$$S = \sum_{l=1}^{N} \alpha_l^2 \lambda_l. \tag{6}$$

More generally, one can see here that each eigenvector with the th highest eigenvalue , corresponding to , is such that it gives the largest possible value of , while also being normalized and orthogonal to all eigenvectors with When corroborating this with the insights provided by Eq. (5), one realizes that any subset of individuals with strong, internal similarities is captured by one of the eigenmodes, whose eigenvalue is larger if the overall level of internal similarity is higher. Moreover, the eigenvector entries of these strongly similar elements will have the same sign and the highest absolute values.
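The claims above (a one-hot combination has self-similarity 1, and no unit vector exceeds the leading eigenvalue) can be checked numerically. The sketch below is only illustrative: it assumes a similarity equal to the fraction of matching binary traits, as a stand-in for Eq. (1), uses a small random set of vectors, and finds the leading eigenpair by plain power iteration. All function names are hypothetical.

```python
import math
import random

def similarity_matrix(vectors):
    """Fraction of matching binary traits per pair (assumed stand-in for Eq. (1))."""
    n, f = len(vectors), len(vectors[0])
    return [[sum(a == b for a, b in zip(vectors[i], vectors[j])) / f
             for j in range(n)] for i in range(n)]

def self_similarity(s, w):
    """S = sum_ij w_i s_ij w_j, the double contraction of Eq. (4)."""
    n = len(w)
    return sum(w[i] * s[i][j] * w[j] for i in range(n) for j in range(n))

def normalize(w):
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]

def leading_mode(s, iterations=500):
    """Power iteration towards the eigenvector with the largest eigenvalue."""
    n = len(s)
    v = normalize([1.0] * n)
    for _ in range(iterations):
        v = normalize([sum(s[i][j] * v[j] for j in range(n)) for i in range(n)])
    # For a unit eigenvector, the self-similarity equals its eigenvalue.
    return self_similarity(s, v), v

rng = random.Random(0)
vectors = [[rng.choice((-1, 1)) for _ in range(20)] for _ in range(12)]
s = similarity_matrix(vectors)
lambda_1, v_1 = leading_mode(s)

# Weight 1 on one individual and 0 on all the others.
one_hot = [1.0] + [0.0] * 11
# Self-similarities of random unit vectors, for comparison with lambda_1.
trials = [self_similarity(s, normalize([rng.gauss(0, 1) for _ in range(12)]))
          for _ in range(200)]
print(self_similarity(s, one_hot), lambda_1, max(trials))
```

Power iteration is used here only to keep the example free of dependencies; a full diagonalization would be needed to inspect the subleading modes.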

By combining the above with the findings of Sec. II, a more complete interpretation is obtained for structural modes: they are the normalized linear combinations of the individuals, orthogonal to each other and to the global mode, with the highest possible self-similarities, the lowest of which is still significantly higher than what is expected from restricted randomness. Each of these structural modes could indicate the presence of a group of highly similar individuals, which is why in the context of time-series analysis they are often called “group modes” MacMahon . Although it is not clear how a linear combination of individuals (or of cultural vectors) should be expressed in terms of cultural traits and features, this is not important for this study and does not affect the above arguments.

An alternative interpretation of structural modes comes from realizing that social surveys are imperfect, in the sense that one cannot guarantee the absence of overlaps or of similarities between the variables that are used. These translate to correlations between cultural features, which have been noticed in previous studies Valori ; Stivala ; Babeanu_1 and which are specific to the design of each dataset. It is conceivable that feature correlations, if strong enough, could induce artifactual structural modes themselves. For example, if a large fraction of the associated items or questions are designed such that they are mostly sensitive to the same underlying degree of freedom, the similarity between individuals responding to any of these items in a certain way will be high, since these individuals will likely respond to all the other similar items in the same way. It appears likely that this behavior would be captured by a structural mode. If this is the mechanism behind the structural modes shown in Sec. II, it means that they do not provide information about the inherent organization of real-world culture, but just about the design of the “instrument” used to “measure” culture. On the other hand, to make things more complicated, feature-feature correlations may also be a consequence of group structure.

It is thus crucial to understand the extent to which structural modes of culture are due to the details of the experimental setting and to what extent they are due to authentic groups that are recognizable in the real world regardless of such details. This study makes a first step in this direction, by translating the two scenarios as mathematical, probabilistic models capable of generating (sets of) cultural vectors that are governed either by a coupling between cultural features (Sec. III.1) or by a grouping tendency (Sec. III.2). These models are designed to work without any empirical input, in the simplest conceivable setting, consisting of binary features – it does not matter whether these features are regarded as ordinal or nominal, since the two types of similarity contributions are equivalent if there are only two traits available, as can be seen from Eq. (1). For each feature, the two traits are marked as $-1$ and $+1$ – although the former should be mapped to $0$ when computing similarities between vectors, if features are to be regarded as ordinal. Each of the two models defines a statistical ensemble (and an associated cultural space distribution, in the language of Refs. Babeanu_1 ; Babeanu_2 ), according to which cultural vectors can be drawn in a random, but non-uniform way. Both statistical ensembles are defined such that each feature-level probability distribution is uniform – the two traits have an equal probability of $1/2$ attached. Note that, although both models are probabilistic in nature, neither of them is intended as a null model, since neither makes use of information from empirical data, nor is it intended for direct, quantitative comparisons to empirical data, nor to be realistic to any extent. They are toy models, intended to prove certain principles and provide certain insights about correlations and groups in the context of cultural states. Nonetheless, they do provide an arena for studying and developing certain mathematical tools in a highly controlled setting, tools that can be later used for studying empirical data.

### III.1 The feature-feature correlations scenario

This section explains the “fully-connected Ising” (FCI) model, in the context of generating (sets of) cultural vectors in a stochastic way. The purpose of this probabilistic model is to enforce a certain level of correlation across all pairs of cultural features, controllable via one parameter, but nothing else in addition. This can be done by properly choosing the probability distribution taking as support the set of possible cultural vectors with $F$ binary features, or, in other words, the set of possible spin configurations with $F$ lattice sites. Note that the support of this distribution has $2^F$ elements, which is the number of sites/points of the “cultural space”, according to the formalism in Ref. Babeanu_1 .

One needs to choose the maximally-random (thus minimally biased) probability distribution that entails a certain level of feature-feature correlations. This is found by maximizing the Shannon entropy (Eq. (15)) subject to two constraints: one enforcing the normalization of the probability distribution (Eq. (16)), the other enforcing the overall level of pairwise coupling between cultural features (Eq. (17)). This procedure is a realization of maximum-entropy inference introduced in Ref. Jaynes , and is described in detail in Sec. A. The resulting probability distribution can be expressed as:

$$p(\mu, F, F_+) = \frac{1}{Z(\mu)} \frac{F!}{F_+!\,(F-F_+)!} \exp\left[\frac{\mu}{2}\left((2F_+ - F)^2 - F\right)\right]. \tag{7}$$

This gives the (total) probability attached to all cultural vectors with out of traits marked as “+” or “”, where is the parameter controlling the overall level of coupling between features. Moreover, is a normalization factor, namely the partition function in Eq. (22). Note that , since the expression combines the probability of different possible configurations with the same , which, due to symmetry reasons are equally likely. There are such configurations (the “density of states”) for each .

The model is mathematically equivalent to the Ising model of magnetism on a fully connected lattice Colonna-Romano , described in the canonical ensemble, with the parameter $\mu$ replacing the ratio between spin-spin coupling and temperature, which controls the overall level of alignment between spins. This parallel does not come as a surprise: for any statistical physics ensemble defined by the averages of certain, externally controlled/measured (physical) quantities, the mathematical derivation can be formulated in terms of maximum-entropy inference Jaynes , which ultimately provides a statistical, information-theoretic justification of minimum-bias as a replacement for assumptions like “ergodicity”. Due to this parallel, the nomenclature related to spins is sometimes used instead of that related to cultural features.

Based on Eq. (7), one can derive the expression for the correlation between any two features:

 (8)

based on the entire statistical ensemble. The details of this derivation are also given in Sec. A.

In Eq. (7) and Eq. (8), the coupling parameter is positive: . Physically, this corresponds to ferromagnetism, meaning that alignment between spins is favored, a tendency which is enhanced with increasing . Using Eq. (7), one can check that, for vanishing coupling , the probability of choosing a configuration with a given is directly proportional to the number of such configurations, which is specified by the binomial coefficient preceding the exponential. As is increased, more emphasis is given to configurations with unequal numbers of and traits, at the expense of configurations that are more balanced. Using Eq. (8), one can also check that the correlation increases with increasing coupling , as expected, and that for any .
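The behaviour described above can be explored with a short numerical sketch of Eq. (7). The distribution is normalized numerically rather than through the closed-form partition function of Eq. (22). The correlation function is an assumption of this sketch and not necessarily the form of Eq. (8): it uses the identity that, for a configuration with $F_+$ positive traits, the sum of $s_i s_j$ over ordered pairs $i \neq j$ equals $(2F_+ - F)^2 - F$, together with the uniform feature-level marginals, so that the correlation reduces to the ensemble average of $s_i s_j$. Function names are hypothetical.

```python
import math

def fci_distribution(mu, F):
    """Return [p(mu, F, F_plus) for F_plus in 0..F], following Eq. (7)."""
    log_w = []
    for f_plus in range(F + 1):
        log_binom = (math.lgamma(F + 1) - math.lgamma(f_plus + 1)
                     - math.lgamma(F - f_plus + 1))
        m = 2 * f_plus - F                    # net number of "+" minus "-" traits
        log_w.append(log_binom + 0.5 * mu * (m * m - F))
    shift = max(log_w)                        # subtract the maximum to avoid overflow
    weights = [math.exp(x - shift) for x in log_w]
    z = sum(weights)                          # partition function, up to exp(shift)
    return [w / z for w in weights]

def fci_correlation(mu, F):
    """Ensemble average of s_i * s_j for i != j (assumed form, see lead-in)."""
    p = fci_distribution(mu, F)
    pair_sum = sum(p[k] * ((2 * k - F) ** 2 - F) for k in range(F + 1))
    return pair_sum / (F * (F - 1))

# F = 50 and the mu values are illustrative choices, not taken from the paper.
for mu in (0.0, 0.01, 0.05):
    print(mu, fci_correlation(mu, 50))
```

Working with logarithms of the weights keeps the computation stable for larger $F$ or $\mu$, where the binomial factor and the exponential both become very large.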

### III.2 The group structure scenario

This section explains the “symmetric two-groups” (S2G) model, in the context of generating (sets of) cultural vectors in a stochastic way. This probabilistic model enforces an organization of cultural vectors in terms of two, equally sized groups, with high similarities within groups and low similarities between groups. The model defines a probability distribution taking as support the same set of possible cultural vectors as in Sec. III.1: the cultural space defined by $F$ binary features, with $2^F$ configurations. One of the groups is “centered” around the configuration with a $+1$ trait with respect to each feature, while the other group is centered around the opposite configuration, having a $-1$ trait with respect to each feature. The model is designed such that all features contribute equally to the group structure. As a consequence, this induces a certain level of correlation over all pairs of cultural features. The strength of these correlations is controlled by the same free parameter that controls the strength of the group structure.

According to the S2G model, every cultural vector that is generated is first randomly assigned to one of the two groups, with equal probabilities. These two groups are denoted as the “+” group and the “−” group. Then, at the level of every feature, the trait is randomly and independently chosen among the two possibilities, but with unequal probabilities: the trait with the same sign as the group is chosen with probability $1-2\nu$, while the trait with the opposite sign is chosen with probability $2\nu$. Here, $\nu$ is the free model parameter controlling the strength of the group structure: lower values imply stronger group structure and stronger correlations between features, as made more explicit by Eq. (10). From this procedure, it follows that, at the level of every feature, each generated trait falls under one of the following situations:

• with probability $1/2 - \nu$, it is attached to a vector belonging to group “+” and has a value of $+1$;

• with probability $\nu$, it is attached to a vector belonging to group “+” and has a value of $-1$;

• with probability $\nu$, it is attached to a vector belonging to group “−” and has a value of $+1$;

• with probability $1/2 - \nu$, it is attached to a vector belonging to group “−” and has a value of $-1$.

Note that the probabilities of the four cases add up to $1$, that the combined probability of either value is $1/2$ and that the probability of either group is also $1/2$.

For this model, the probability that a generated configuration has $F_+$ “+” traits is:

$$p(\nu, F, F_+) = \frac{1}{2} \frac{F!}{F_+!\,(F-F_+)!} (2\nu)^{F_+} (1-2\nu)^{F_+} \left[ (2\nu)^{F-2F_+} + (1-2\nu)^{F-2F_+} \right], \tag{9}$$

while the correlation between any two features is:

$$C(\nu) = 1 - 8\nu + 16\nu^2. \tag{10}$$

The mathematical derivations of Eq. (9) and Eq. (10) are given in Sec. B. Note that the correlation in Eq. (10) behaves as expected, namely: $C(0) = 1$ (when the two groups are maximally dissimilar the correlation is maximal) and $C(1/4) = 0$ (when the two groups are indistinguishable the correlation is zero). Finally, Eq. (10) can be written in the form of a quadratic equation, whose solution reads:

$$\nu(C) = \frac{1 - \sqrt{C}}{4}, \tag{11}$$

after having taken into account that $\nu \in [0, 1/4]$. Note that the alternative solution given by the quadratic formula would be valid for the $[1/4, 1/2]$ interval, which is not used here, since it is entirely equivalent (up to an inversion) with the $[0, 1/4]$ interval, while being relevant only when group “+” is allowed to be biased towards $-1$ traits instead of towards $+1$ traits, and vice versa, which is not the case here.
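The S2G generation procedure and Eqs. (9) to (11) can be sketched as follows. The sketch assumes that a trait takes the sign of its group with probability $1-2\nu$, which is the value consistent with Eq. (9) and Eq. (10). The distribution is written as an equal mixture of two binomials, which is algebraically the same as Eq. (9) but avoids negative exponents when $\nu = 0$. Function names are hypothetical.

```python
import math
import random

def s2g_sample(nu, F, rng):
    """Draw one cultural vector: pick a group, then F independent traits."""
    group = rng.choice((-1, 1))
    # Each trait takes the sign of the group with probability 1 - 2*nu (assumed).
    return [group if rng.random() < 1 - 2 * nu else -group for _ in range(F)]

def s2g_distribution(nu, F):
    """p(nu, F, F_plus) for F_plus in 0..F, an equal mixture of two binomials."""
    a, b = 2 * nu, 1 - 2 * nu
    return [0.5 * math.comb(F, k) * (b ** k * a ** (F - k) + a ** k * b ** (F - k))
            for k in range(F + 1)]

def s2g_correlation(nu):
    return 1 - 8 * nu + 16 * nu ** 2      # Eq. (10)

def s2g_nu(C):
    return (1 - math.sqrt(C)) / 4         # Eq. (11), the branch with nu in [0, 1/4]

# Illustrative check: the sampled correlation between two features should be
# close to Eq. (10); nu = 0.1 and the sample size are arbitrary choices.
rng = random.Random(0)
draws = [s2g_sample(0.1, 2, rng) for _ in range(20000)]
empirical = sum(v[0] * v[1] for v in draws) / len(draws)
print(empirical, s2g_correlation(0.1), s2g_nu(s2g_correlation(0.1)))
```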

### III.3 Mathematical comparison of the two scenarios

This section deals with the comparison between the FCI and the S2G models, in terms of properties that can be extracted directly from the equations in Sec. III.1 and Sec. III.2, without the need of randomly sampling from the two statistical ensembles. Specifically, we focus on the behavior of the feature-feature correlation (Fig. 5), the shape of the probability distribution (Fig. 6) and the symmetry breaking phase transition (Fig. 7) associated to each model.

Fig. 5 shows the behavior of the correlation $C$ between any two cultural features for the two models. First, Fig. 5(a) shows how the correlation entailed by the FCI model depends on the model parameter $\mu$ controlling the pairwise couplings between features, based on Eq. (8). Different curves correspond to different values of $F$. Note that the correlation increases from $0$ to $1$ as the coupling is increased, but it also increases as the number of features is increased. Second, Fig. 5(b) shows how the correlation entailed by the S2G model depends on the model parameter $\nu$ controlling the group strength, based on Eq. (10). Here, the correlation decreases from $1$ to $0$ as $\nu$ is increased, which is consistent with the fact that, by construction, lower values of $\nu$ correspond to a stronger group structure. Note that the behavior is independent of $F$, which is obvious from Eq. (10).

All the following comparisons are based on a matching of the two models in terms of the correlation level $C$. Specific values of $\mu$ are chosen, based on which the correlation level $C$ entailed by the FCI model is computed via Eq. (8), for a given $F$. Then, the corresponding $\nu$ of S2G entailing the same correlation is calculated based on Eq. (11). This creates a correspondence between parameter $\mu$ of FCI and parameter $\nu$ of S2G by means of the correlation $C$. Since $C$ is a number extracted from the full statistical ensemble under a specific parameterization, it can be regarded as a model parameter, namely as a replacement or remapping of $\mu$ (in the case of FCI) and of $\nu$ (in the case of S2G), which allows for a side-by-side comparison of the two models in terms of other quantities.
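The matching procedure can be sketched in a few lines. As an assumption of this sketch, the FCI correlation is computed as the ensemble average of $s_i s_j$ under Eq. (7), which may differ in form from Eq. (8); the S2G parameter then follows from Eq. (11). The value $F = 50$ and the $\mu$ values are illustrative and are not those of Table 1.

```python
import math

def fci_correlation(mu, F):
    """<s_i s_j> under Eq. (7), using sum_{i != j} s_i s_j = (2 F_plus - F)^2 - F."""
    log_w = [math.lgamma(F + 1) - math.lgamma(k + 1) - math.lgamma(F - k + 1)
             + 0.5 * mu * ((2 * k - F) ** 2 - F) for k in range(F + 1)]
    shift = max(log_w)                                # numerical stabilization
    w = [math.exp(x - shift) for x in log_w]
    pair_sum = sum(wk * ((2 * k - F) ** 2 - F) for k, wk in enumerate(w))
    return pair_sum / (sum(w) * F * (F - 1))

def matched_nu(mu, F):
    """Map the FCI coupling mu to (C, nu) via the correlation level C."""
    C = fci_correlation(mu, F)
    # Eq. (11); the clamp only guards against tiny negative rounding errors.
    return C, (1 - math.sqrt(max(C, 0.0))) / 4

for mu in (0.0, 0.01, 0.02):
    print(mu, matched_nu(mu, 50))
```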

This $\mu$-to-$C$-to-$\nu$ mapping is first exploited by Fig. 6, which shows the probability distributions associated to the FCI and S2G models, as described by Eq. (7) and Eq. (9) respectively. In either case, the distribution is shown for the same values of the correlation $C$ that are listed by the legend at the top. These values correspond to the values of the $\mu$ and $\nu$ parameters that are listed in Table 1. The calculations are based on a value of $F$ that is comparable to the values associated to the empirical cultural states used in Sec. II and Sec. V.

Note that, in the limit of vanishing correlation $C \to 0$, the distributions of both models converge to the uniform probability distribution, which assigns to every value of $F_+$ a probability that is equal to the fraction of possible configurations with that many “+” traits, $\binom{F}{F_+}/2^F$. This uniform distribution is characterized by the existence of one maximum at the center of the $F_+$ axis. As the correlation increases, the shape of the distribution becomes wider, with two equal maxima arising on either side of the axis, whose separation also increases with increasing $C$. Thus, both models exhibit a symmetry breaking phase transition.

However, a close inspection of Fig. 6 reveals that the symmetry breaking happens later (higher values of $C$) for the FCI model than for the S2G model, meaning that there is a non-vanishing $C$ interval for which FCI exhibits a unimodal behaviour, while the S2G exhibits a bimodal behaviour, interval which contains the value. This interval is of crucial interest for this study, since it corresponds to the correlation regime for which the symmetric group structure built into the S2G model is visible in the shape of the probability distribution, while the feature-feature coupling built into the FCI model is not strong enough to induce a qualitatively similar shape. Still, even for $C$ values that are high enough for the FCI distribution to also show maxima, the exact shapes of the two distributions are still different, with the S2G maxima being stronger than the FCI ones (visible for ). This is a visual confirmation that the two statistical ensembles are indeed different and that the S2G ensemble has a smaller Shannon entropy than the FCI ensemble, for any non-vanishing value of $C$, thus being more biased, more constrained and encoding more structure, which should manifest itself at the level of higher-order correlations (involving more than two spins/features).

A more complete picture of the phase transitions exhibited by the two models is provided by Fig. 7. This shows the dependence of two mathematical properties of the probability distributions in Fig. 6 on the model parameters. The first property, denoted here by $O_1$, is a normalized departure of either probability peak from the center of the (horizontal) $F_+$ axis. The second property, denoted here by $O_2$, is a normalized height of either probability peak compared to the probability at the center of the (horizontal) $F_+$ axis. Note that $\gamma$ is a placeholder for either the $\mu$ parameter or the $\nu$ parameter, depending, respectively, on whether the FCI or the S2G model is used. Both quantities are zero when symmetry breaking is not present and are positive when symmetry breaking is present, giving higher values for better defined probability peaks. They can thus be used as “order parameters” characterizing the phase transition, although they are evaluated in an a priori way, based on the expression of the probability distribution, rather than based on configurations sampled from the associated ensemble. Mathematically, the first quantity is defined as:

$$O_1(\gamma, F) = \frac{[0.5F] - F_+^{*}(\gamma, F)}{[0.5F]}, \tag{12}$$

while the second quantity is defined as:

$$O_2(\gamma, F) = \frac{p^{*}(\gamma, F) - p(\gamma, F, [0.5F])}{p^{*}(\gamma, F)}, \tag{13}$$

where the square brackets stand for the “integer part” operation. Moreover, $F_+^{*}(\gamma, F)$ is the (integer) position along the $F_+$ axis of the first (lower-$F_+$) peak and $p^{*}(\gamma, F)$ is the height of this peak. At the same time, $p(\gamma, F, [0.5F])$ is evaluated according to either Eq. (7) or Eq. (9), depending on whether the quantity is evaluated for the FCI model ($\gamma$ is replaced by $\mu$) or for the S2G model ($\gamma$ is replaced by $\nu$). The value of $F_+^{*}(\gamma, F)$ is extracted by iteratively exploring the lower half of the $F_+$ axis, while evaluating $p(\gamma, F, F_+)$ according to either Eq. (7) or Eq. (9). On the other hand, $p^{*}(\gamma, F)$ is essentially an abbreviation for $p(\gamma, F, F_+^{*}(\gamma, F))$.

The four panels of Fig. 7 show the behaviour of $O_1$ for the FCI model (Fig. 7(a)), the behaviour of $O_1$ for the S2G model (Fig. 7(b)), the behaviour of $O_2$ for the FCI model (Fig. 7(c)) and the behaviour of $O_2$ for the S2G model (Fig. 7(d)). The dependence of either quantity on the $\mu$ parameter (for FCI) and on the $\nu$ parameter (for S2G) is translated in terms of the corresponding correlation value $C$, via Eq. (8) and Eq. (10) respectively. Note that the two quantities agree in terms of the correlation value for which the transition occurs, for both the FCI (Fig. 7(a) vs Fig. 7(c)) and the S2G (Fig. 7(b) vs Fig. 7(d)), for any number of features $F$. It is clear that the transition point comes closer to $C = 0$ with increasing $F$ for both models. Finally, Fig. 7 shows that, independently of $F$, the transition point of S2G is located at lower values of $C$ than that of FCI.
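The two order parameters of Eq. (12) and Eq. (13) can be evaluated directly from a tabulated distribution. The sketch below takes the lower peak as the most probable value in the lower half of the $F_+$ axis, which is one simple reading of the iterative exploration described above. The two-group distribution is included only to make the example self-contained; it assumes a same-sign trait probability of $1-2\nu$, and $F = 50$ is an illustrative choice. Function names are hypothetical.

```python
import math

def order_parameters(p):
    """O1 and O2 of Eqs. (12)-(13) for a list p[F_plus], F_plus = 0..F."""
    F = len(p) - 1
    center = F // 2                                   # integer part of 0.5 * F
    # Position of the highest probability in the lower half of the axis.
    f_star = max(range(center + 1), key=lambda k: p[k])
    p_star = p[f_star]
    o1 = (center - f_star) / center
    o2 = (p_star - p[center]) / p_star
    return o1, o2

def two_group_distribution(nu, F):
    """Equal mixture of two binomials, algebraically the same as Eq. (9)."""
    a, b = 2 * nu, 1 - 2 * nu
    return [0.5 * math.comb(F, k) * (b ** k * a ** (F - k) + a ** k * b ** (F - k))
            for k in range(F + 1)]

# No group structure (nu = 1/4) versus strong group structure (nu = 0.05).
print(order_parameters(two_group_distribution(0.25, 50)))
print(order_parameters(two_group_distribution(0.05, 50)))
```

Both quantities vanish for the unimodal case because the lower-half maximum then sits at the center of the axis.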

## IV Discriminating between the two interpretations

This section investigates the signatures of the two structural scenarios introduced in Sec. III, from a spectral analysis and random matrix perspective, with the purpose of identifying quantities that can differentiate between the two underlying hypotheses: feature-feature correlations vs group structure. To this end, sets of cultural vectors are numerically sampled from the two ensembles and similarity matrices are computed, based on Eq. (1). Since both the FCI and S2G ensembles are such that the (marginal) feature-level probability distributions are uniform, restricted randomness (see Sec. II) is equivalent to uniform randomness as a null model (at least if the number of cultural vectors is reasonably high) with respect to which structure is to be evaluated. Thus, for simplicity, uniform randomness (u-random) is used as a null model in this section. All comparisons made here make use of matching the feature-feature coupling parameter of FCI and the group strength parameter of S2G in terms of the correlation level , as described in Sec. III.3. Moreover, the number of features and the number of cultural vectors are and for all the FCI, S2G and u-random cultural states generated and used for the figures of this section.

The most obvious quantity that could conceivably discriminate between the FCI and the S2G models is the subleading eigenvalue $\lambda_2$, or the extent to which this goes above the uncertainty range predicted by uniform randomness. Fig. 8 shows the dependence of $\lambda_2$ on the correlation level $C$ for FCI (red) and S2G (blue), while the horizontal black lines show the u-random uncertainty range (the mean value and 1 standard deviation on each side of the mean), as a compact replacement of the distributions shown in Fig. 2(a), Fig. 4(a) and Fig. 4(b) – as mentioned in the figure caption, these lines are not meant to give any information about the correlation level of the u-random null model, nor about realized correlations based on specific sets of vectors sampled from the ensemble. Surprisingly, $\lambda_2$ does not distinguish between the FCI and the S2G models, for any given correlation level $C$, since the average values clearly overlap. At the same time, $\lambda_2$ (for both models) does depart significantly from the null model expectations. This explicitly shows that empirical structural modes such as those identified in Sec. II can actually be triggered by feature-feature correlations alone, at least in certain cases (those for which the simplistic setting behind the FCI and S2G models is reasonably representative). Thus, empirical eigenvalues that significantly depart from what is expected based on the null hypothesis do not automatically indicate groups. In the light of Sec. III, Fig. 8 also implies that the subleading eigenmodes of matrices produced via FCI have, on average, the same self-similarity as those of matrices produced via S2G, for a given correlation level. This appears counter-intuitive, since the presence of symmetry breaking for S2G at low $C$ makes it much easier to identify two well separated groups, one for each side of the $F_+$ axis of Fig. 6. However, a closer inspection of the probability distributions in Fig. 6 reveals that FCI is more likely to produce, even in the absence of symmetry breaking, cultural vectors that are at one extreme or the other (almost fully populated with $-1$ traits or with $+1$ traits). These extremal configurations are much more representative, or “central”, for the configurations that are possible on the respective side of the $F_+$ axis. Also note that the values of $C$ used in Fig. 8 are the same for FCI and S2G and the same as those used in Fig. 9 and Fig. 10 described below. For each FCI and S2G point in any of these plots, explicit averaging over the sampled sets of cultural vectors is only performed with respect to the quantity associated to the vertical axes. For the correlation level $C$, associated to the horizontal axes, we simply use the analytically-computed, ensemble-level value, for the given parameterization of the model (Eq. (8) and Eq. (10)).

Sec. C shows, in a manner similar to Fig. 8, the behavior of the largest and third largest eigenvalues – $\lambda_1$ and $\lambda_3$ respectively – for the FCI and S2G models, in comparison to the u-random null model. The analysis there makes it clear that $\lambda_1$ and $\lambda_3$ are both compatible with the null hypothesis. Thus, all or most of the structural information of cultural states generated from either the FCI or the S2G model is captured by the $(\lambda_2, v_2)$ eigenpair. Since $\lambda_2$ cannot discriminate between the two scenarios, this means that all or most discriminating power is encoded in the associated eigenvector $v_2$, which is the focus of the rest of this section.

Based on Sec. III.3 and in particular on Fig. 6, one can say that, for the interesting correlation interval where FCI does not exhibit symmetry breaking while S2G does, configurations that are on one side of the $F_+$ axis and are generated with S2G have a much more equal fraction of traits of a certain sign than those generated with FCI. These S2G configurations should thus have a much more equal contribution to the structural mode than FCI configurations, so the associated $v_2$ entries should be much more equal for S2G than for FCI. Given the symmetric nature of both models, it follows that the absolute values of all the $v_2$ entries should be much more equal for S2G cultural states than for FCI ones, while, in either case, the entries associated to cultural vectors on different sides of the $F_+$ axis would (typically) have different signs. This reasoning suggests that the difference between FCI and S2G would be captured by a quantity that evaluates the overall extent of “equality” of the absolute values of the entries of the $v_2$ eigenvector, or, in other words, the eigenvector “uniformity”. Since these entries are normalized via $\sum_{i=1}^{N} |v_l^i|^2 = 1$ for any eigenvector $v_l$, the Shannon entropy is a natural quantity for evaluating the uniformity. This leads to the definition of the “eigenvector entropy” $H_l$ associated to the $l$th highest eigenvalue $\lambda_l$, as a measure of uniformity:

$$H_l = -\sum_{i=1}^{N} |v_l^i|^2 \log |v_l^i|^2, \tag{14}$$

where $v_l^i$ is the $i$th entry of the eigenvector $v_l$ associated to $\lambda_l$ – note that this quantity was also used in Ref. Patil , which cites Ref. Jones .
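As a minimal sketch, the eigenvector entropy of Eq. (14) can be computed as follows. The natural logarithm is an assumption, since the base is not specified above, and zero entries are taken to contribute nothing to the sum. The function name is hypothetical.

```python
import math

def eigenvector_entropy(v):
    """H = -sum_i |v_i|^2 log |v_i|^2 for a unit-norm eigenvector v (Eq. (14))."""
    total = 0.0
    for x in v:
        q = x * x
        if q > 0.0:                 # a zero entry contributes 0 (limit of q log q)
            total -= q * math.log(q)
    return total

n = 16
uniform = [1 / math.sqrt(n)] * n    # all individuals contribute equally
localized = [1.0] + [0.0] * (n - 1) # a single individual carries all the weight
print(eigenvector_entropy(uniform), math.log(n), eigenvector_entropy(localized))
```

The two extreme cases bracket the possible values: a perfectly uniform eigenvector reaches the logarithm of the number of entries, while a fully localized one gives zero.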

Fig. 9 shows the behavior of the eigenvector entropy $H_2$ associated to the second highest eigenvalue $\lambda_2$, in a format very similar to that of Fig. 8. This confirms that $H_2$ discriminates well between the two models, with S2G showing clearly higher values than FCI as long as the correlation level does not come arbitrarily close to . Moreover, comparing the two profiles with the u-random one-$\sigma$ band reveals that the structure of S2G becomes incompatible with the null hypothesis for much lower correlation values than the structure of FCI. However, for either model, the $H_2$ curve does not show the sudden increase that one would expect based on the phase transitions described in Sec. III.3, in the manner they are exhibited by the more theoretical $O_1$ and $O_2$ curves in Fig. 7.

The smoothness of the curves is actually related to the fact that, for the low- regime, where is highly compatible with the null hypothesis, is typically not the second highest eigenvector entropy, although it is associated to the second highest eigenvalue. This suggests a definition of as the th highest eigenvector entropy, independently of the associated eigenvalue. Fig. 10 is a modification of Fig. 9, with used as a replacement for for the vertical axis, affecting all the FCI, S2G and u-random calculations. Note that, unlike in Fig. 9, the sudden changes in Fig. 7 are now reflected in Fig. 10. Moreover, the transition points at in Fig. 7 seem to be well reproduced in Fig. 10, while the FCI and S2G shapes of the curves are quite similar to those of , which are related to the height of the probability distribution peaks. Finally for higher values, each curve in Fig. 10 is almost identical to the associated in Fig. 9, so strong structure makes it very likely that the eigenvector of the second highest eigenvalue has the second highest entropy, and is effectively equivalent to .

The considerations above strongly suggest that a significant departure of the eigenvector entropy from the null model expectation is a good indication that the eigenvector encodes information about a group or a grouping tendency. In the simplistic (binary, marginally-uniform) setting of the FCI and S2G models, one could define the presence of groups in a theoretical, a priori way via the presence of maxima (and symmetry breaking) in the probability distribution over the $F_+$ axis: when maxima are present, most vectors sampled from the distribution can be unambiguously recognized as belonging to one of the two groups, based on their $F_+$ value. Under this interpretation, within the interesting $C$ interval for which S2G exhibits groups and FCI does not, the eigenvector entropy and its departure from randomness expectations are crucial for deciding, in an a posteriori way, whether groups are present or not.

## V Revisiting the empirical data

The findings of Sec. IV point out the importance of the eigenvector entropy, in addition to the eigenvalue, for deciding whether a structural mode qualifies as an authentic group mode or not. Thus, the two quantities should be used together for a second, more detailed inspection of the empirical data in Sec. II. This is the purpose of the current section. The empirical similarity matrices are computed based on the same three sets of cultural vectors used in Sec. II, constructed from Eurobarometer (EBM), General Social Survey (GSS) and Jester (JS) data.

Fig. 11 shows a scatter of the empirical eigenpairs of the EBM matrix, where the horizontal axis is associated to the eigenvalue $\lambda_l$, while the vertical axis is associated to the eigenvector entropy $H_l$. The global mode eigenpair is highlighted by the inset. In the main plot, the vertical lines show the average and the 1-$\sigma$ band of what one may expect for the subleading eigenvalue $\lambda_2$, based on the r-random null model, which reproduces, on average, the empirical trait frequencies (see Sec. II). In the inset, the vertical lines show the same type of information for the leading eigenvalue $\lambda_1$. The horizontal lines in the main plot and the inset show the average and the 1-$\sigma$ band of what one may expect for, respectively, the subleading entropy and the leading entropy, based on the r-random null model. Note that, as anticipated in Sec. IV, the subleading entropy is usually not associated to the subleading eigenvalue, while the leading entropy appears to always be associated to the leading eigenvalue.

The four structural modes identified based on Fig. 2(a) are also visible in the main plot of Fig. 11, to the right of the vertical r-random band. Importantly, all their eigenvector entropies are below the horizontal r-random band, suggesting that none of them qualifies as a group mode. Actually, all the bulk EBM eigenpairs are also below the r-random band, and thus compatible with the null hypothesis in terms of the uniformity of eigenvector entries. Also note that the leading eigenvector entropy is significantly smaller than what the null model predicts, but this difference is much smaller than the difference between the leading eigenvector entropy and the subleading one. This means that the contributions of different cultural vectors to the global mode are less equal than expected based on randomness, but much more equal than the contributions to any of the structural modes.
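The comparison against the r-random bands can be sketched in a few lines. The snippet is only an illustration: the function names `null_band` and `classify` are hypothetical, the numbers are made up and are not results from Fig. 11, and the band is assumed to be the sample mean plus or minus one sample standard deviation over null-model realizations.

```python
import statistics

def null_band(null_samples, n_sigma=1.0):
    """Return the (lower, upper) bounds of the mean +/- n_sigma band."""
    mean = statistics.mean(null_samples)
    sigma = statistics.stdev(null_samples)  # sample standard deviation
    return mean - n_sigma * sigma, mean + n_sigma * sigma

def classify(value, null_samples, n_sigma=1.0):
    """Locate a value relative to the null band: 'above', 'below' or 'within'."""
    lower, upper = null_band(null_samples, n_sigma)
    if value > upper:
        return "above"
    if value < lower:
        return "below"
    return "within"

# Made-up numbers, for illustration only (not taken from the paper).
null_eigenvalues = [3.9, 4.1, 4.0, 4.2, 3.8]
null_entropies = [5.10, 5.12, 5.08, 5.11, 5.09]
mode_eigenvalue, mode_entropy = 6.5, 4.7

# A group-mode candidate needs both a deviating eigenvalue
# and an eigenvector entropy above the null band.
deviating_eigenvalue = classify(mode_eigenvalue, null_eigenvalues) == "above"
uniform_eigenvector = classify(mode_entropy, null_entropies) == "above"
print(deviating_eigenvalue and uniform_eigenvector)
```

With these made-up inputs the mode has a deviating eigenvalue but an entropy below the band, which mirrors the qualitative situation reported for the structural modes.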

The analysis in Fig. 11 is also applied to the other datasets and the results are presented in Fig. 12, with Fig. 12(a) showing the results for GSS data and Fig. 12(b) showing the results for JS data. In both cases, the results are similar to those of EBM data: the structural modes do not show a higher eigenvector uniformity than what is expected based on the null model, nor do any of the smaller-eigenvalue modes, while the eigenvector uniformity of the global mode is smaller than what is expected based on the null model, but much higher than what is expected or realized for the structural modes and the random modes. In the light of Sec. III and Sec. IV, these results suggest that structural modes of empirical matrices of cultural similarity are not due to authentic group structure, but to correlations between cultural features originating in arbitrary similarities between the questions or items composing the dataset. However, as discussed in Sec. VI, such a conclusion would be premature, implicitly relying on assumptions about cultural groups that might be too strong.

## VI Discussion

This was the first study where empirical matrices of cultural similarity between individuals were analyzed from a random matrix perspective, allowing for a separation of structurally irrelevant eigenmodes from the structurally relevant ones. The statistical significance of the latter, here referred to as “structural modes”, was demonstrated in Sec. II, using a detailed numerical approach of explicitly sampling configurations from three null models. Among these three, the “restricted randomness” model, first proposed here, was concluded to be the most appropriate for later use. Restricted randomness enforces, in a flexible way, the non-uniformity inherent in each cultural feature, as this is assumed to be mostly a consequence of experimental design rather than a consequence of system-specific structure. As a consequence, this null model reproduces well the leading eigenvalue of the empirical matrix, which is interpreted as the “global mode”. By using this null model, meaningful empirical structure is implicitly defined via the inhomogeneities present in the cultural space distribution Babeanu_1 ; Babeanu_2 that cannot be expressed in terms of the feature-level inhomogeneities.

A central question for the rest of the study was whether the structural modes identified in Sec. II are just signatures of correlations between cultural features or, more interestingly, signatures of cultural groups. The former hypothesis goes along with the idea that some of the items in the questionnaire are similar to each other. The latter hypothesis goes along with the idea of coexistence, within the geographical region from which the empirical data was obtained, of several types of individuals, where each type could correspond, for instance, to a certain political affiliation, assuming that each affiliation comes along with a certain set of values, opinions or beliefs. Even more interesting is the possibility that structural modes correspond to groups that form around cultural prototypes Stivala ; Babeanu_2 associated to a small number of universal “rationalities” or “ways of life” Thompson . This hypothesis has been shown to be compatible with some generic structural properties of culture, provided that prototype mixing is in place Babeanu_2 .

We approached this question by designing, in the simplest possible setting, two probabilistic toy models that implement the “correlations” scenario and the “groups” scenario (see Sec. III), named “FCI” (Sec. III.1) and “S2G” (Sec. III.2) respectively. These models and the associated scenarios are not mutually exclusive: the presence of groups does induce correlations, while correlations, if strong enough, can also induce an “impression” of groups. However, the FCI model is conceived such that only feature-feature couplings are enforced in a manner that does not introduce any unintended assumption, by means of a maximum-entropy approach Jaynes . This is meant to “simulate” an overall level of pairwise similarity between the questions of a hypothetical survey, assuming that the hypothetical system from which the answers are obtained is otherwise maximally random. Moreover, there is a non-vanishing correlation interval for which (under a certain, meaningful projection) the S2G model has a bimodal probability distribution (Sec. III.3), while the FCI model has a unimodal distribution. One can say that, for this interval, the group structure of S2G is manifested, while the feature-feature couplings of FCI do not yet create the impression of groups. The boundaries of this interval are well defined, via the symmetry breaking phase transition of S2G on the low-correlation side and the one of FCI on the high-correlation side.

This correlation region is exploited (Sec. IV) for understanding how the presence or absence of groups becomes visible via spectral analysis. In both cases, there is one eigenvalue that becomes increasingly separated from the random bulk when increasing the level of correlations between features. However, this increasing trend is, up to statistical errors arising from finite sampling, exactly the same for the FCI and S2G models, even for the above-mentioned correlation region. This suggests that the presence of deviating eigenvalues in empirical data is not a certain signature of group structure. The difference between the two scenarios becomes visible if one calculates the uniformities of the eigenvectors by means of “eigenvector entropy” (inspired by Ref. Patil , where it is called “information entropy”). There is one eigenvector uniformity that, for an increasing level of correlations, becomes increasingly separated from the random bulk. This increasing trend is significantly different for the two models, while starting in an abrupt way and replicating well, for each model, the phase transition expected on theoretical grounds. Thus, for the interesting correlation region, S2G shows a deviating eigenvector uniformity, while FCI does not. This suggests that empirical eigenmodes corresponding to authentic groups should exhibit not only an eigenvalue that is significantly higher than the null model expectation, but also an eigenvector uniformity that is significantly higher than the null model expectation.

This motivated a more detailed investigation of empirical data in Sec. V, which showed that all empirical eigenvalues that are significantly higher than what can be expected based on restricted randomness are associated to eigenvector uniformities that are not significantly higher than what can be expected based on the same null model. This suggests that empirical deviating eigenvalues are signatures of correlations and not of group structure, since such correlations are known to be present, although to different extents and differently distributed in different datasets Babeanu_1 . One may even be tempted to reject the “cultural prototypes” hypothesis previously used in Refs. Stivala ; Babeanu_2 . However, Ref. Babeanu_2 clearly showed that this hypothesis is structurally compatible with empirical data only when prototype “mixing” is enforced, which means that the cultural vectors associated to different individuals are random combinations of the prototype vectors, although each vector is most often dominated by one of the prototypes. Since the S2G model used here to simulate group structure does not incorporate mixing, it is possible that group structure resulting from a “mixed prototypes” scenario is different enough to not exhibit eigenvector uniformities which are higher than expected based on the null hypothesis.

Actually, the implementation of the “Mixed Prototype Generation” procedure of Ref. Babeanu_2 is able to generate, for many parameter choices, vectors that are arbitrarily similar to one of the prototypes, as well as vectors that are balanced combinations of the prototypes. If a modified S2G model incorporating such mixing were formulated, it would very likely induce, in the language of Fig. 6, probability distributions that are wider than those of S2G, showing weaker decays when approaching the $F_+ = 0$ and the $F_+ = F$ endpoints, while still different from those of FCI, for a given correlation level. These distributions might not even show a double-peak structure, and would likely preserve their shapes in the limit of $F \to \infty$ (assuming that the $[0, F]$ interval is mapped to another interval of a constant length when $F$ increases), while the peaks of the S2G distributions become sharper with increasing $F$, due to the central limit theorem. It appears likely that cultural states sampled from such “mixed-S2G” distributions would only exhibit a subleading eigenvector uniformity that significantly deviates from null model expectations for correlation levels that are higher than those required by FCI: below that level, the vectors composing each of the two “groups” would have highly different levels of “centrality” within the group, leading to non-equal entries in the eigenvector capturing most of the structure. Certainly, such a mixed-S2G would come with a rather different meaning of “groups” and of “group structure” than that implicit in S2G and recognizable via eigenvector uniformity. One would also need to find a new, eigenvector-dependent quantity that is sensitive to this different type of group structure and that can also be used for empirical data. Such considerations are left for future research.

The fact that this study used multidimensional sociological data, while heavily relying on eigenvalue decomposition, may raise the question of how the approach here is different from traditional social science research using principal component analysis Dunteman . Although principal component analysis is mathematically equivalent to eigenvalue decomposition, in a social science context, the former most often implies a decomposition of the matrix of covariances or correlations between the variables, while this study focuses on the matrix of similarities between individuals. This actually makes the approach here conceptually more similar to clustering methods Kaufman , which aim at identifying group structure, while providing an optimal clustering of the given set of individuals. However, these methods do not attempt to decompose the similarity matrix and remove the irrelevant eigenmodes. In fact, following the approach of Ref. MacMahon , the sum of the similarity matrix contributions associated to the structural modes identified here can be interpreted as a modified modularity matrix, which could provide a new method for clustering individuals via modularity maximization. Since this automatically eliminates the noise components and the common trend encoded in the global mode, such a method might be able to disentangle clusters that are not recognized by previous approaches. However, such a method might also be sensitive to false positive cluster splittings, due to structural modes possibly being artifacts of feature-feature correlations, as shown in this study (at this point, it is not clear whether this is also a problem for the method in Ref. MacMahon , intended for matrices of correlations between time series). These aspects are also left for future research.

## VII Conclusion

This study examined cultural structure from a new angle, relying on certain notions from random matrix theory. This provided a filtering procedure for matrices of cultural similarity between individuals, which eliminates, in a statistically rigorous way, the structurally-irrelevant components. Much effort was dedicated to the interpretation of the remaining, structurally-relevant components. Two possible interpretations were formulated and quantitatively examined. On the one hand, structural components may be a consequence of overlaps between cultural variables, mainly encoding information about the experimental setup. On the other hand, they may be a consequence of a modular organization of culture, thus encoding information about cultural groups. The analysis here favored the former scenario, but this may be a consequence of the latter scenario having been formalized in a manner that is too restrictive. More work is needed before the possibility that culture has a modular structure can be entirely rejected or accepted.

Acknowledgements:

The author acknowledges insightful discussions with Diego Garlaschelli, Assaf Almog, Marco Verweij, Maroussia Favre, Santo Fortunato and Vincent Traag. This work was supported by the Netherlands Organization for Scientific Research (NWO/OCW).

## References

• [1] John Urry. The complexity turn. Theory, Culture & Society, 22(5):1–14, 2005.
• [2] David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, Albert-László Barabási, Devon Brewer, Nicholas Christakis, Noshir Contractor, James Fowler, Myron Gutmann, Tony Jebara, Gary King, Michael Macy, Deb Roy, and Marshall Van Alstyne. Computational social science. Science, 323(5915):721–723, 2009.
• [3] Charles Kadushin. Understanding Social Networks: Theories, Concepts and Findings. Oxford University Press, 2012.
• [4] Claudio Castellano, Santo Fortunato, and Vittorio Loreto. Statistical physics of social dynamics. Rev. Mod. Phys., 81:591–646, May 2009.
• [5] Luca Valori, Francesco Picciolo, Agnes Allansdottir, and Diego Garlaschelli. Reconciling long-term cultural diversity and short-term collective social behavior. Proc. Natl. Acad. Sci., 109(4):1068–1073, 2012.
• [6] Alex Stivala, Garry Robins, Yoshihisa Kashima, and Michael Kirley. Ultrametric distribution of culture vectors in an extended Axelrod model of cultural dissemination. Sci. Rep., 4(4870), 2014.
• [7] Alexandru-Ionuţ Băbeanu, Leandros Talman, and Diego Garlaschelli. Signs of universality in the structure of culture. The European Physical Journal B, 90(12):237, Dec 2017.
• [8] Alexandru-Ionuţ Băbeanu and Diego Garlaschelli. Evidence for mixed rationalities in preference formation. Complexity, 2018, 2018. Article ID 3615476.
• [9] Alexandru-Ionuţ Băbeanu, Jorinde van de Vis, and Diego Garlaschelli. Ultrametricity increases the predictability of cultural dynamics. arXiv:1712.05959v1, 2017.
• [10] Robert Axelrod. The dissemination of culture. Journal of Conflict Resolution, 41(2):203–226, 1997.
• [11] Madan Lal Mehta. Random Matrices. Elsevier, 2012.
• [12] Alan Edelman and N. Raj Rao. Random matrix theory. Acta Numerica, 14:233–297, 2005.
• [13] Marc Potters, Jean-Philippe Bouchaud, and Laurent Laloux. Financial applications of random matrix theory: old laces and new pieces. Acta Phys. Pol. B, 36(9):2767–2784, 2005.
• [14] Mel MacMahon and Diego Garlaschelli. Community detection for correlation matrices. Phys. Rev. X, 5:021006, Apr 2015.
• [15] Assaf Almog, Ori Roethler, Renate Buijink, Stephan Michel, Johanna H Meijer, Jos H. T. Rohling, and Diego Garlaschelli. Uncovering functional brain signature via random matrix theory. arXiv:1708.07046v2, 2017.
• [16] V. A. Marchenko and L. A. Pastur. Distribution of eigenvalues for some sets of random matrices. Math. USSR-Sb., 1:457–483, 1967.
• [17] Aashay Patil and M. S. Santhanam. Random matrix approach to categorical data analysis. Phys. Rev. E, 92:032130, Sep 2015.
• [18] E. T. Jaynes. Information theory and statistical mechanics. Phys. Rev., 106:620–630, May 1957.
• [19] Louis Colonna-Romano, Harvey Gould, and W. Klein. Anomalous mean-field behavior of the fully connected Ising model. Phys. Rev. E, 90:042111, Oct 2014.
• [20] Michael Thompson, Richard J. Ellis, and Aaron Wildavsky. Cultural Theory. Westview Press, 1990.
• [21] Karlheinz Reif and Anna Melich. Euro-barometer 38.1: Consumer protection and perceptions of science and technology, November 1992.
• [22] Tom W. Smith, Peter Marsden, Michael Hout, and Jibum Kim. General social surveys, 1993 ed. http://gss.norc.org/get-the-data/spss, 1972–2012.
• [23] Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval, 4(2):133–151, 2001.
• [24] K R W Jones. Entropy of random quantum states. Journal of Physics A: Mathematical and General, 23(23):L1247, 1990.
• [25] George H Dunteman. Principal components analysis. Sage Publications, 1989.
• [26] Leonard Kaufman and Peter J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.

## Appendix A The fully-connected Ising (FCI) model

This section gives the details behind the mathematical expressions in Sec. III.1, which introduced the fully-connected Ising model. Deriving the probability distribution associated to this model follows the maximum-entropy approach introduced by Ref. [18]. This crucially relies on the Shannon entropy, which is a functional of the probability distribution:

$$H[p] = -\sum_{\vec{S}} p_{\vec{S}} \log p_{\vec{S}}, \tag{15}$$

where $\vec{S}$ denotes a generic spin configuration with $F$ spins on a fully-connected lattice, or a generic cultural vector with $F$ binary cultural features whose possible traits are marked as “$+1$” and “$-1$”. The value of the functional is maximized subject to two constraints, one related to the normalization of the probability distribution over the set of possible configurations:

$$\sum_{\vec{S}} p_{\vec{S}} = 1, \tag{16}$$

the other related to enforcing, on average, a certain amount of misalignment:

$$\sum_{a<b} \sum_{\vec{S}} p_{\vec{S}} \, S_a S_b = K, \tag{17}$$

namely the average, over configurations $\vec{S}$, of the sum of pairwise products $S_a S_b$ (the number of pairs of equal traits minus the number of pairs of opposite traits), where the first summation is over all distinct pairs of distinct features (or lattice sites). The maximization is done using the Lagrange multipliers technique for Eqs. (15), (16), (17), which implies that one should find the extrema of the following functional:

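$$L[p] = H[p] - \lambda_0 \left( \sum_{\vec{S}} p_{\vec{S}} - 1 \right) - \lambda \left( \sum_{a<b} \sum_{\vec{S}} p_{\vec{S}} \, S_a S_b - K \right), \tag{18}$$

where $\lambda_0$ and $\lambda$ are free parameters associated to the two constraints. By taking partial derivatives of Eq. (18) with respect to each $p_{\vec{S}}$ and further manipulations, one finds the following probability distribution:

$$p_{\vec{S}} = \frac{1}{Z(-\lambda)} \exp\left[ -\lambda \sum_{a<b} S_a S_b \right], \tag{19}$$

where $Z(-\lambda)$ is a normalization factor, known in statistical physics as the “partition function”:

$$Z(-\lambda) = \sum_{\vec{S}} \exp\left[ -\lambda \sum_{a<b} S_a S_b \right], \tag{20}$$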

where one can replace the coupling parameter $-\lambda$ with $\mu$ (whose positive value favors alignment as opposed to anti-alignment, which corresponds to ferromagnetism) and re-express the sum over configurations as a sequence of sums over the possible traits $S_k$ of each feature $k$, leading to:

$$Z(\mu) = \prod_{k=1}^{F} \left( \sum_{S_k = \pm 1} \right) \exp\left[ \mu \sum_{a=1}^{F-1} \sum_{b=a+1}^{F} S_a S_b \right]. \tag{21}$$
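The "further manipulations" between the Lagrangian of Eq. (18) and the distribution of Eq. (19) can be sketched as the standard maximum-entropy stationarity argument. The steps below assume that the constraint of Eq. (17) acts on the pair sum $\sum_{a<b} S_a S_b$, which is what the exponent of Eq. (21) suggests.

```latex
% Assumption: the constraint of Eq. (17) acts on the pair sum \sum_{a<b} S_a S_b.
\begin{align}
\frac{\partial L}{\partial p_{\vec{S}}}
  &= -\log p_{\vec{S}} - 1 - \lambda_0 - \lambda \sum_{a<b} S_a S_b = 0 \\
\Rightarrow \quad p_{\vec{S}}
  &= e^{-(1+\lambda_0)} \exp\Big[ -\lambda \sum_{a<b} S_a S_b \Big] \\
\sum_{\vec{S}} p_{\vec{S}} = 1 \quad \Rightarrow \quad e^{1+\lambda_0}
  &= \sum_{\vec{S}} \exp\Big[ -\lambda \sum_{a<b} S_a S_b \Big] = Z(-\lambda)
\end{align}
```

The normalization constraint thus fixes $\lambda_0$, leaving $\lambda$ (or equivalently $\mu = -\lambda$) as the only free parameter of the model.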

In the exponent of this expression, there are $F(F-1)/2$ terms, out of which $F_+(F-F_+)$ are equal to $-1$, while the other $F(F-1)/2 - F_+(F-F_+)$ are equal to $+1$, where $F_+$ is the number of $+1$ traits in the configuration. Based on this, after further manipulations and after taking advantage of symmetries, the partition function can be expressed as:

$$Z(\mu) = \sum_{F_+=0}^{F} \frac{F!}{F_+! \, (F-F_+)!} \exp\left[ \frac{\mu}{2} \left( (2F_+ - F)^2 - F \right) \right], \tag{22}$$

where the combinatorial factor (binomial coefficient) before the exponential function counts the number of configurations with the same number $F_+$ of $+1$ traits (the density of states). In a way rather analogous to the partition function, the double summation in the exponent of Eq. (19) can also be eliminated. After multiplication with the density of states, this leads to Eq. (7), which gives the probability of having a configuration with $F_+$ spins up.
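The reduction from the sum over all configurations in Eq. (21) to the sum over $F_+$ in Eq. (22) can be checked numerically for small $F$. The sketch below is a sanity check under the assumption that the exponent of Eq. (21) is $\mu \sum_{a<b} S_a S_b$; the function names are hypothetical.

```python
import itertools
import math

def partition_brute_force(mu, F):
    """Eq. (21): explicit sum over all 2^F configurations of +/-1 traits."""
    total = 0.0
    for config in itertools.product((-1, 1), repeat=F):
        pair_sum = sum(s_a * s_b for s_a, s_b in itertools.combinations(config, 2))
        total += math.exp(mu * pair_sum)
    return total

def partition_closed_form(mu, F):
    """Eq. (22): sum over the number F_plus of +1 traits."""
    return sum(
        math.comb(F, F_plus) * math.exp(0.5 * mu * ((2 * F_plus - F) ** 2 - F))
        for F_plus in range(F + 1)
    )

for F in (2, 5, 8):
    for mu in (-0.2, 0.0, 0.3):
        brute = partition_brute_force(mu, F)
        closed = partition_closed_form(mu, F)
        print(F, mu, math.isclose(brute, closed, rel_tol=1e-9))
```

The brute-force sum grows as $2^F$, so it is only usable for small $F$, while the closed form needs $F + 1$ terms.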

On the other hand, using Eq. (20), Eq. (17) can be written as:

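$$K = -\frac{\partial \log Z(-\lambda)}{\partial \lambda} = \frac{\partial \log Z(\mu)}{\partial \mu}, \tag{23}$$

while the correlation between features/spins $a$ and $b$ is:

$$C_{ab} = \frac{\langle S_a S_b \rangle - \langle S_a \rangle \langle S_b \rangle}{\sqrt{\langle S_a^2 \rangle - \langle S_a \rangle^2} \, \sqrt{\langle S_b^2 \rangle - \langle S_b \rangle^2}}, \tag{24}$$

where $\langle X \rangle$ is the expected value of quantity $X$ with respect to the statistical ensemble. However, one can easily show, using Eq. (22), that $\langle S_a \rangle = 0$ and that $\langle S_a^2 \rangle = 1$, so $C_{ab} = \langle S_a S_b \rangle$, which combined with Eq. (17) leads to $\sum_{a<b} C_{ab} = K$. But due to symmetry, the expected correlation is the same for all pairs $(a, b)$, so:

$$C_{ab} = C(\mu, F) = \frac{2}{F(F-1)} K = \frac{2}{F(F-1)} \frac{\partial \log Z(\mu)}{\partial \mu}, \tag{25}$$

for any pair $(a, b)$, which can also be written in the form shown by Eq. (8); Eq. (23) was used for the last transformation in Eq. (25).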
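The correlation of Eq. (25) can be evaluated directly from the terms of Eq. (22), since the derivative of $\log Z(\mu)$ is the ensemble average of the pair sum $((2F_+ - F)^2 - F)/2$. The sketch below assumes exactly this reading; the function name `fci_correlation` is hypothetical, and Eq. (8) itself is not reproduced here.

```python
import math

def fci_correlation(mu, F):
    """C(mu, F) = 2 / (F (F - 1)) * d log Z(mu) / d mu, with Z(mu) from Eq. (22)."""
    log_weights = []
    pair_sums = []
    for F_plus in range(F + 1):
        # Value of sum_{a<b} S_a S_b for any configuration with F_plus traits +1.
        pair_sum = ((2 * F_plus - F) ** 2 - F) / 2
        pair_sums.append(pair_sum)
        log_weights.append(math.log(math.comb(F, F_plus)) + mu * pair_sum)
    shift = max(log_weights)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp(lw - shift) for lw in log_weights]
    mean_pair_sum = sum(w * s for w, s in zip(weights, pair_sums)) / sum(weights)
    return 2.0 * mean_pair_sum / (F * (F - 1))

for mu in (0.0, 0.02, 0.05, 0.1):
    print(mu, fci_correlation(mu, 50))
```

The correlation vanishes at $\mu = 0$ and grows monotonically with $\mu$, because its derivative is proportional to the variance of the pair sum.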

One should expect that $C(0, F) = 0$ (null correlations for null coupling), which, based on Eq. (8), implies that the following identity holds:

$$\sum_{F_+=0}^{F} \frac{(F-2)! \left( (2F_+ - F)^2 - F \right)}{F_+! \, (F-F_+)!} = 0, \tag{26}$$

which, after substitution of $F$ with $N$ and of $F_+$ with $k$ and some further manipulations, leads to the following combinatorial identity:

$$\sum_{k=0}^{N} \binom{N}{k} \left( (2k - N)^2 - N \right) = 0, \tag{27}$$

which can be shown to hold using the expressions for the binomial expansion and for the first and second moments of a binomial distribution with the probability parameter set to $1/2$.
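The identity of Eq. (27) is easy to verify with exact integer arithmetic. For a binomial variable $k$ with probability parameter $1/2$, the mean is $N/2$ and the variance is $N/4$, so the average of $(2k - N)^2$ equals $N$, which is the content of the identity. The helper name `identity_lhs` is hypothetical.

```python
import math

def identity_lhs(N):
    """Left-hand side of Eq. (27), evaluated with exact integers."""
    return sum(math.comb(N, k) * ((2 * k - N) ** 2 - N) for k in range(N + 1))

# The sum vanishes exactly for every N that is tried.
print(all(identity_lhs(N) == 0 for N in range(0, 60)))
```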

## Appendix B The symmetric two-groups (S2G) model

This section provides the derivations of the main formulas related to the symmetric two-group model, introduced in Sec. III.2. The derivations are based on the model description there.

First, we prove Eq. (9). On the one hand, the probability that a cultural vector meant to be part of group $+$ is assigned to a configuration with $F_+$ traits $+1$ is:

$$p^{+}_{+}(\nu, F, F_+) = \frac{F!}{F_+! \, (F-F_+)!} (1-2\nu)^{F_+} (2\nu)^{F-F_+}, \tag{28}$$

which is a binomial distribution with probability $1-2\nu$ for the possibility $+1$ and $2\nu$ for the possibility $-1$. On the other hand, the probability that a configuration meant to be part of group $-$ has $F_+$ traits $+1$ is:

$$p^{-}_{+}(\nu, F, F_+) = \frac{F!}{F_+! \, (F-F_+)!} (2\nu)^{F_+} (1-2\nu)^{F-F_+}, \tag{29}$$

which is the same binomial distribution, but with inverted probabilities. Since the two groups are by construction equally likely, the combined probability of all configurations with $F_+$ traits $+1$ is:

$$p(\nu, F, F_+) = \frac{1}{2} p^{+}_{+}(\nu, F, F_+) + \frac{1}{2} p^{-}_{+}(\nu, F, F_+). \tag{30}$$

Inserting Eq. (28) and Eq. (29) in Eq. (30) leads to Eq. (9).
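The resulting mixture of two mirrored binomials can be sketched as follows. The snippet assumes the reading of Eqs. (28)-(30) in which group $+$ picks trait $+1$ with probability $1 - 2\nu$ and group $-$ picks it with probability $2\nu$, with $0 \le \nu \le 1/2$; the function name `s2g_probability` and the parameter values are illustrative only.

```python
import math

def s2g_probability(nu, F, F_plus):
    """Probability of all configurations with F_plus traits +1, per Eqs. (28)-(30)."""
    binom = math.comb(F, F_plus)
    plus_group = (1 - 2 * nu) ** F_plus * (2 * nu) ** (F - F_plus)   # Eq. (28) without the binomial factor
    minus_group = (2 * nu) ** F_plus * (1 - 2 * nu) ** (F - F_plus)  # Eq. (29) without the binomial factor
    return 0.5 * binom * (plus_group + minus_group)                  # Eq. (30)

F, nu = 20, 0.05  # illustrative values, not taken from the paper
distribution = [s2g_probability(nu, F, F_plus) for F_plus in range(F + 1)]

print(sum(distribution))                # normalization: close to 1
print(distribution[2], distribution[18])  # mirror symmetry under F_plus -> F - F_plus
print(distribution[10] < distribution[18])  # centre is depleted: two separate peaks
```

For these values the distribution has two peaks near $F_+ = 2$ and $F_+ = 18$, while at $\nu = 1/4$ the two binomials coincide and the group structure disappears.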

Second, we prove Eq. (10). The correlation coefficient of any two features $a$ and $b$ is given by Eq. (24), which, for symmetry reasons similar to the case of the FCI model, simplifies to:

$$C_{ab}(\nu) = \sum_{\vec{S}} S_a S_b \, p_{\vec{S}}(\nu) = C(\nu). \tag{31}$$

Moreover, the probability attached to any configuration can be written as:

$$p_{\vec{S}}(\nu) = \frac{1}{2} \left( p^{-}_{\vec{S}}(\nu) + p^{+}_{\vec{S}}(\nu) \right), \tag{32}$$

where $p^{-}_{\vec{S}}(\nu)$ and $p^{+}_{\vec{S}}(\nu)$ are the probabilities of configuration $\vec{S}$, conditional on whether it is generated for group $-$ or for group $+$ respectively. In turn, these probabilities can be factorized in terms of feature-level probabilities of traits:

$$p^{-}_{\vec{S}}(\nu) = \prod_{a=1}^{F} p^{-}_{S_a}(\nu), \qquad p^{+}_{\vec{S}}(\nu) = \prod_{a=1}^{F} p^{+}_{S_a}(\nu), \tag{33}$$

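because once the group is chosen, each trait $S_a$ (with possible values $+1$ and $-1$) is chosen independently at the level of the respective feature $a$. By inserting Eq. (33) in Eq. (32) and the result in Eq. (31), by carrying out appropriate algebraic manipulations, while making use of the fact that $\sum_{S_c = \pm 1} p^{-}_{S_c}(\nu) = 1$ and of the fact that $\sum_{S_c = \pm 1} p^{+}_{S_c}(\nu) = 1$ for every feature $c$, one obtains:

$$\begin{aligned} C(\nu) = {} & \frac{1}{2} \left[ p^{-}_{--}(\nu) - p^{-}_{-+}(\nu) - p^{-}_{+-}(\nu) + p^{-}_{++}(\nu) \right] \\ & + \frac{1}{2} \left[ p^{+}_{--}(\nu) - p^{+}_{-+}(\nu) - p^{+}_{+-}(\nu) + p^{+}_{++}(\nu) \right], \end{aligned} \tag{34}$$

where, for instance, $p^{-}_{+-}(\nu)$ is the probability that trait $+1$ is chosen for one of the features and that trait $-1$ is chosen for the other feature, conditional on the given configuration being generated for group $-$. Based on the model description in Sec. III.2, one can see that:

$$\begin{aligned} p^{-}_{--}(\nu) = p^{+}_{++}(\nu) &= (1-2\nu)^2, && (35) \\ p^{-}_{++}(\nu) = p^{+}_{--}(\nu) &= (2\nu)^2, && (36) \\ p^{-}_{-+}(\nu) = p^{+}_{+-}(\nu) &= (1-2\nu)(2\nu), && (37) \\ p^{-}_{+-}(\nu) = p^{+}_{-+}(\nu) &= (2\nu)(1-2\nu). && (38) \end{aligned}$$

By plugging these in Eq. (34), after simple algebraic manipulations, one obtains Eq. (10).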
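The "simple algebraic manipulations" can be written out explicitly. The steps below use only the pair probabilities listed in Eqs. (35)-(38), read as products of the feature-level probabilities $1 - 2\nu$ and $2\nu$. Eq. (10) is not reproduced in this appendix, so the final expression is what those inputs imply and should be checked against Eq. (10) in the main text.

```latex
% Both square brackets of Eq. (34) evaluate to the same expression,
% so the two factors of 1/2 add up to 1.
\begin{align}
C(\nu) &= \tfrac{1}{2} \left[ (1-2\nu)^2 - 2\,(2\nu)(1-2\nu) + (2\nu)^2 \right]
        + \tfrac{1}{2} \left[ (2\nu)^2 - 2\,(2\nu)(1-2\nu) + (1-2\nu)^2 \right] \\
       &= \left[ (1-2\nu) - 2\nu \right]^2 \\
       &= (1-4\nu)^2
\end{align}
```

As a consistency check, this expression gives $C = 1$ at $\nu = 0$ (all vectors of a group identical) and $C = 0$ at $\nu = 1/4$ (both groups uniformly random).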

## Appendix C The structure of the FCI and S2G models

This section shows that the structure implicit in cultural states generated with either the FCI or the S2G model is captured by only one eigenpair of the similarity matrix, so that there is at most one structural mode. Specifically, as the correlation level is increased for the FCI and the S2G models, there is only one eigenvalue, the subdominant one, that becomes separated from the random bulk, while becoming significantly larger than the upper boundary of the bulk that is expected based on uniform randomness. The behavior of the subdominant eigenvalue has already been presented in Fig. 8. The results shown here, via Fig. 13, are complementary to those shown in Fig. 8 and use the same format, focusing on the behavior of the leading eigenvalue in Fig. 13(a) and on the behavior of the third-largest eigenvalue in Fig. 13(b). Note that the leading eigenvalue, associated to the global mode, remains statistically compatible with the null model as the level of correlation is increased, for both FCI and S2G. On the other hand, the third-largest eigenvalue decreases, while becoming, for a large enough correlation level, significantly smaller than the upper boundary of the bulk predicted by uniform randomness. All this shows that the structure of FCI and S2G is mostly captured by the eigenpair of the subdominant eigenvalue, which becomes increasingly stronger as the correlation level increases. This appears to be a consequence of the fact that each model is controlled by one parameter, while all the non-uniformity of the associated probability distribution is captured by one dimension, namely the $F_+$ axis of Fig. 6.