The Eurovision Song Contest is a popular TV show, held since 1956, that takes place every year with participants the countries member of the European Broadcasting Union. The competition has undergone several various modifications through years and the number of participants has increased, together with the popularity of the show. Since its beginning, countries had to express their preferences for the competing songs through a voting system; representatives vote only for the songs that meet their tastes. Despite that, many issues of bias in the voting system have been raised during the years (Yair, 1995). In the press and the literature, it has been claimed that many votes are not the expression of preferences for the songs, but of preferences for the performing countries themselves. Therefore, it has been claimed that the exchange of votes is not random but rather it is determined by some kind of similarity measure: the more two countries are close according to the unknown proximity measure, the more they will tend to vote for each other.
The exchange of votes in the Eurovision contest can be represented by means of network theory, where the countries are the nodes and the votes are recorded as edges. More specifically, within each edition, the data may be represented in the form of an adjacency matrix , with generic element if a representative of country votes for a song by a performer from the country and 0 otherwise, and indexes countries. Network data can be represented by means of graph theory. More formally, a network is thought to be the realization of a graph , where denotes the set of nodes and the set of edges. The number of observed nodes and edges observed will be denoted, respectively, by and . Generally, the law generating the connections among units is unknown and several different models have been proposed to describe such networks. Erdős & Rényi (1959) and Erdős & Rényi (1960) modelled arch formation in a network as arising from a random process: each dyad is independent and is associated to the same probability of forming a link. This first model was then generalized by other authors, both relaxing the assumption of constant edge probability over the network and the assumption of independence of the dyads. Holland & Leinhardt (1981) with models and kept the assumption of independence among the dyads but increased the number of parameters describing edge probabilities, to take into account the attractiveness of a node (the highest the value the highest the probability for this node to be connected with others) and the mutuality (the propensity of forming symmetric relations). The independence assumption on the dyads was then relaxed via the introduction of Markov graphs by Frank & Strauss (1986), attempting to model triangular relations in a network. Later on models or ERGMs (Exponential Random Graph models) have extended the work done by Frank & Strauss (1986) by the introduction of differnet summary statistics, see for example Krivitsky et al. (2009) and Robins et al. (2007). A different approach is the so-called stochastic block model, which attempts to decompose the nodes in different sub-groups, see Holland et al. (1983), Airoldi et al. (2008). In its basic formulation, nodes within a group have the same probability of forming edges, while this probability changes among groups. Hoff et al. (2002) added an extra layer of dependency to model the complex structure of network data, via the assumption that the observed edge formation process could have been explained in terms of the nodes’ coordinates in a (low-dimensional) latent space. Two different specifications are considered, the distance model, where the latent space is euclidean, and the projection model, where it is bilinear. The model by Hoff et al. (2002) has been extended to perform clustering on the latent coordinates of the nodes by Handcock et al. (2007). Another approach that makes use of latent variables has been proposed by Snijders & Nowicki (1997). This model formulates the stochastic block model by Holland et al. (1983) introducing latent variables in the determination of the nodes’ group memberships. A more exhaustive review of models for statistical network analysis can be found in Goldenberg et al. (2010), Salter‐Townshend et al. (2012) and Murphy (2015).
The works presented above refer to models for a single network, that is, in the present context, to the modelling of one single edition of the Eurovision Song Contest or a summary of several editions. If a group of subsequent editions of the Contest is considered, many replications of the adjacency matrices, representing the preferences expressed by countries towards others, are available. Therefore, the data can be represented in the form of a multidimensional network (or multiplex), where is the realization of a collection of graphs , indexing editions. The generic graph has the same set of nodes as the others graphs in the collection (the participants to the group of editions of interest), but potentially different set of edges (the different preferences expressed in each edition). Hence, a multidimensional networks describes different (independent) realizations of a relation among the same group of nodes. Different models have been developed to deal with this kind of data. Fienberg et al. (1985) adapted a log linear model to the context of multiplex data. Greene & Cunningham (2013) proposed to summarize the information coming from all the different networks (views) aggregating them into a single one. Gollini & Murphy (2016) extended the model by Hoff et al. (2002)
to multiplex data, assuming that the edge probabilities are explained by a single latent variable. To estimate the joint latent space, they use a variational Bayesian algorithm and decompose the posterior distribution, fitting a different latent space to each network. Then, the separate estimates are employed to recover the joint latent space.Salter-Townshend & McCormick (2017) proposed a method to jointly model the structure within a network and the correlation among networks via a Multivariate Bernoulli model. Another approach developed to describe the (marginal) correlation among different networks in a multiplex has been proposed by Butts & Carley (2005). Hoff (2011) proposed to model multiplex data as multiway arrays and applied low-rank factorization to infer the underlying structure. Durante et al. (2016) proposed a Bayesian non-parametric approach to latent space modelling, where clustering is performed on the latent space dimensions in order to discriminate the most relevant ones for each view.
The present work aims at recovering the similarities among countries, modelling the exchange of votes during several editions of the Eurovision Song Contest. We adopt a framework similar to that of Gollini & Murphy (2016) and we consider the projection of the countries into a common latent space. Similarities among countries are then expressed in terms of distances in this latent space. We introduce network-specific coefficient parameters to weight the relevance of the latent space in the determination of edge probabilities in each network. We consider the editions that took place after the introduction of the televoting system and focus on the period 1998 to 2015. Further, we consider geographical and cultural covariates in the analysis.
The paper is organised as follows. Section 2 summarizes the history of Eurovision Song Contest together with the principal related works on the subject (section 2.1) and presents the analysed data (section 2.2). Latent space models for network data are introduce in section 3 and the proposed model is outlined in section 3.1. Model estimation is discussed in section 4. Further issues are discussed in section 5, such as model identifiability (section 5.1), missing data (section 5.2) and the introduction of edge-specific covariates (section 5.3). The application is presented in section 6 and the results are displayed in section 7. The simulation study is outlined in section 8, where also the main findings are reported. Section 9 presents the results of a comparison between the proposed model and the lsjm (Gollini & Murphy, 2016). We conclude discussing the model and its application on the Eurovision data in section 10.
2 Eurovision Song Contest
2.1 History of the contest and previous works on the subject
The Eurovision Song Contest, held since 1956, is a TV singing competition where the participants are countries members of the EBU (European Broadcasting Union). Despite its name, the European Broadcasting Union includes both European and non European countries. Indeed, Eurovision’s fame has spread all over the world during the last years and it has been broadcast from South America to Australia. It is the non-sportive TV program with the highest number of viewers in the world and one of the oldest ones (Lynch, 2015).
From its first edition, where only seven countries competed, there have been many changes in the number of participants, the voting system and the structure of the competition. Due to the increasing popularity of the program, many countries have been included in the contest. The current structure of the contest consists of two preliminary stages used to select the finalists; then, the selected countries compete in the final stage for the title. The voting system has been modified several times, in the voting procedure and the grading scheme. In the early years of the competition, a jury elected the winning song. Later, the system has been supported by televoting, introduced in 1998 in all the competing countries. As for the grading scheme, it is positional since 1962, but the method used to rank the countries has been modified across the different editions. From 1975 to 2015, each country had to express its top ten preferences ranking them from the most to the least favourite using the following scores: 12, 10, 8, 7, 6, 5, 4, 3, 2, 1. Each country had to vote exactly ten others, could not vote for itself and each grade could be used only once. At the end, the country receiving the highest overall score would have won the competition. A restriction has been imposed on the lyrics in the past, as the participants were required to perform a song written in their national language. However, this rule was definitively abolished after 1998.
Every year the singer and the song representing a country both change, making each edition of the Eurovision independent from the previous one. Indeed, the structure of the competition is built in such a way that the past results will not influence the future performances. Countries should vote only according to their tastes and, as musical evaluation has no objective criteria, the voting results should not depend on the countries themselves, but only on the songs. However, this claim was often doubted, especially after the introduction of televoting. Several issues have been raised on the voting system, which was said to be biased. The first paper investigating the presence of bias in the voting system is Yair (1995). This work considers voting relations among 22 of the 24 countries competing in the period 1975 to 1992 and claims that they can be clustered in three regional blocks, according to their voting preferences: Mediterranean, Western and Northern. Countries tend to vote for the others in the same block, hence following a non-objective (non-democratic) behaviour. Nevertheless, the paper does not provide an in-depth statistical evaluation of the results. The author supports the theory that the geographic location of a country influences its voting behaviour; this assumption has been further investigated by Fenn et al. (2006). In this work, the dynamic evolution of votes exchanged in the competition 1992 to 2003 has been analysed, with the aim at looking for sub-groups of countries; however, the sub-groups found are not fully explained by the geographical positions of the countries. Clerides & Stengos (2006) developed an econometric framework to analyse the data and arrive to similar conclusions, in the sense that the authors do not find any "strategic" vote exchange in the period 1981 to 2005. Saavedraa et al. (2007) investigated the structural properties of the dynamic network related to the period 1984-2003 via q-analysis and found that clustering arises mainly between countries closed to each other in a geographical sense. Spierdijk & Vellekoop (2006) applied multilevel models to look for the influence of geographic and cultural factors in the vote exchange from 1975 to 2003 and found that these do not explain the behaviour of all the competing countries. Ginsburgh & Noury (2008) claim that having a similar culture may influence the votes expressed by a country. Cultural proximity, as well as geographic proximity and migration flows in the period 1998-2012 have been investigated as sources of bias in the paper by Blangiardo & Baio (2014). The authors discovered the presence of a mild positive bias among few couples of countries but no evidence of a negative bias overall. Mantzaris et al. (2017) analysed the editions 1975 to 2005 searching for couples of countries exhibiting preferential voting. They investigate the hypothesis of random allocation in the votes. The authors found evidence that geographic proximity is influential up to some extent, however, many countries do not tend to vote according to this rule.
The aforementioned works show that there has been a growing interest in the structure underlying the exchanging of votes in the Eurovision Song Contest in the past twenty years. The authors investigate the influence of social, geographical, cultural and political factors on the mechanism forming preferences and agree that, at least to some extent, these components are relevant. However, none of the factors above is able to explain completely the votes exchanged in the competition in the last years.
The assumption that the exchange of votes is driven by similarities among countries may be a reasonable hypothesis. However, similarities among countries might not coincide with social, geographical, cultural and political factors. In fact, there could be some unobservable (latent) factors influencing such process. The aim of this work is at recovering the underlying latent similarities among countries. We focus on years subsequent to the introduction of the televoting system, 1998 to 2015. In doing so, we assume that the televoting preferences reflect the preferences of the whole population, and that they are more representative when compared to the opinions of the jury alone. We do not consider years 2016 and 2017 because the voting system was modified in that period. In fact, in the period from 1998 to 2015, votes given by the jury and by televoting were jointly considered: the final top ten for each country was determined looking at the intersection of the most voted songs by the two sources. Instead, from 2016 onwards, the final preferences expressed by each country are given by the union of the ten favourites of the jury and the ten favourites of the televoting. Hence, in the last two years of the competition, each country could elicit more than ten preferences. In the period 1998-2015, the countries were allowed to sing in any language and most of the songs were in English. The period we consider is homogeneous both with respect to the voting system and (for the large part) to the language used in the performance.
In each edition, the votes exchanged among the countries can be described by a network, with nodes being the countries. As we are interested in the exchange of votes and not in the ranking, an edge going from country to country will denote that has in its top ten. Vice versa, the edge will be from to if the latter has been voted by the . Indeed, recall that altough the grading scheme is positional, each country can express a limited amount of preferences , with in the analysed period. Rosen (1972) proved that when sampling units from a population of size 111In the present context, is the number of countries that country can potentially vote., if , the first units are independent with respect to the extraction order. That is, the ranking might not be relevant.
The resulting network is then directed, acyclic (as countries can not vote for themselves) and un-weighted. If we consider a group of editions for the contest, we will have a collection of networks, defined on the same group of countries (see paragraph 5.2), and this object is indeed a multidimensional network. The following section introduces a more general framework to analyse such data.
3 Latent space model for multidimensional networks
Latent space models have been introduced by Hoff et al. (2002) with the aim at reducing the complexity typical to the dependence structure observed in network data. This purpose is achieved via geometric projection of the nodes into a low dimensional space. The probability of observing an edge between two nodes is assumed to be dependent on some function of the unknown nodes’ coordinates in the latent space. Conditionally on the set of latent positions, the observed binary indicators are assumed to be independent. Hoff et al. (2002) distinguish between distance and projection latent space models, depending on the choice of the function summarizing the latent coordinates. The distance model assumes that the function employed to describe the dependence of the edge probabilities on the latent coordinates is indeed a distance function. Gollini & Murphy (2016) extended the distance model described by Hoff et al. (2002) to the case of multiplex data, with the introduction of the Latent Space Joint Model (lsjm in the following). The authors assumed that each network depends on a specific latent space and a network specific intercept. Moreover, the latent spaces (one for each network) are thought to be generated from a common latent space, which captures the average latent coordinates of the nodes and is behind all the networks. The variational approach used to carry out the estimation of the model is fast but suffers from computational issues when the dimension of the multiplex is relatively big, either in the number of nodes or in the number of networks .
The present work builds on the model of Gollini & Murphy (2016) with the aim at recovering the similarities among the countries participating in the Eurovision Song Contest. Notice that subsequent editions of the Eurovision are assumed not to depend on one another, as the singer and the song performed change every year without any pre-determined criteria. This assumption allows us to consider the exchange of votes in different editions of the contest as replications of the same phenomenon, which is the expression of musical appreciation between couples of countries. These different replications of preferences among countries will then be used to recover the similarities. Indeed, the basic assumption is that the more two countries are similar, the more they tend to vote for one another through the editions. As our interest lies in recovering such similarities, the choice of a distance based on a latent space model to reconstruct the network is quite appropriated. In fact, distances correspond to symmetric relations, which is a characteristic for similarity measure. Figure 1 shows an example for the latent space representation of a nodes network. Node in space has been moved in space , so that and . In our model, this correspond to a higher probability to observe a link between node and (or node and ) in space when compared to space .
The distances in the latent space are scaled by a network specific coefficient, in order to weight the influence of the latent space in the determination of the votes expressed in a given edition of the Contest. The lowest the value of this coefficient, the more the structure of edge probabilities resembles a random graph This could lead to reject the claim that the pattern observed in votes for the Eurovision contest is biased by some pre-existing preferences among countries.
The latent space is taken to have dimension , to allow graphical visualization and to compare the estimates of latent coordinates and the geographical positions (latitude and longitude) of the analysed countries. In this way we might be able to tell whether the latent configuration obtained resembles the geographical one and, if so, we may conclude that the position of a country on the map is indeed of some relevance in the competition. However, the choice of is still an open problem in the literature and for other applications there might not be a unique criterion for choice. As in Gollini & Murphy (2016), the distance function is taken to be the squared Euclidean distance function, to penalize more the probability of an edge linking nodes that are far apart in the latent space when compared to edge linking closer nodes.
The model is formulated in section 3.1 and also allows for the introduction of edge-specific covariates. In the present context, we will make use of cultural covariates, such as presence of a common language among a couple of participants, to see whether any cultural factors contribute to the formation of preferences. The multidimensional network is defined on all the countries that took part at least once in the Eurovision for the period 1998-2015 (see section 5.2 for a more detailed explanation).
The set or collection of adjacency matrices will be denoted by . The generic element of each matrix in the collection is binary, with if there is an edge from node to node in the network, else. Indexes are used to denote nodes in the network (countries) and index refers to the different networks in the multiplex.
3.1 The proposed model
Given the assumptions made in the section 3 and following Hoff et al. (2002) and Gollini & Murphy (2016), the probability of observing an edge between node and node in the network of the multiplex is given by:
where is the set of model parameters, with , . For ease of notation, we will denote .
Following Gollini & Murphy (2016), the function is taken to be the squared Euclidean distance, hence . The matrix of the distances among the nodes will be denoted by . This choice of the distance function allows to penalize more heavily the probability of an edge linking two nodes that are far apart in the latent space when compared to the one linking two closer nodes. Therefore, the latent space part of the model will push towards a non-random structure for the matrix of edge probabilities.
The parameters and , with , are network-specific. is a coefficient scaling the weight/impact of the latent space in the determination of the edge probabilities in the network. According to the assumption that the probability of observing an edge decreases with growing distances, the constraint must be imposed. As the coefficient is bounded and can not take negative values, it follows that the lowest the value of , the closer to a random graph the structure of the network will be. In fact, if , then:
and the model for the network reduces to a random graph (Erdős & Rényi, 1959) with edge probability . Thus, the coefficient induces sparsity in the graph and, when it is non-zero, edge probabilities are bounded by . To counterbalance the effect of the coefficient, the intercept parameter is defined so that the graph corresponding to is not disconnected. Indeed, if , then the graph will almost surely be connected, from the properties of random graphs (Erdős & Rényi, 1960). Taking , as , this property can be expressed in terms of as:
The last equality comes from the node set being constant across the multidimensional network. Defining a lower bound prevents from assigning large negative values to the intercept parameters. Indeed, if is too low, the effect of the latent distances would be dominated by the intercept parameter, even if this effect is relevant (). That is, large negative values of in the argument of the exponential in equation 1 would correspond to edge probabilities tending to and numerically undistinguishable. In such case, two dstinct matrices of edge probabilities having elements would lead numerically to the same likelihood.
4 Parameter estimation
4.1 Likelihood and posterior
Given the edge probability function defined in equation 1, the likelihood function for the model is a product of terms:
the corresponding log likelihood is:
As the matrices of edge probabilities are symmetric in all the analysed networks, one could equivalently consider only their upper or lower triangular part, and the number of terms to be considered in the product for the likelihood reduces to .
, we adopt a Bayesian approach to estimate the model. The latent coordinates are assumed to be independent random variables distributed according
-variate Gaussian distribution, as inGollini & Murphy (2016):
In the present context, the dimension of the multivariate Gaussian is fixed to . Indeed, the aim is to recover latent coordinates that, for each country, could be compared with the actual geographical ones. However, in other applications the selection of the dimension of the latent space could be a relevant issue. The determination of is a model selection problem and it is quite a debated topic in the literature.
The parameter space for both the intercepts and the coefficients is bounded, as described in paragraph 3.1. For this reason, the prior distributions for these parameters are described by truncated Gaussian distributions. As no a priori information is available on their relationship, they are assumed to be independent, both inside and across the networks:
The unknown are nuisance parameters. Indeed, their value is of no interest but their specification is relevant, as they determine the solutions for the parameters of interest, and . Given their relevant role in the model, these parameters will be estimated, to avoid subjective specifications of their values. To estimate such parameters, an extra layer of dependence is introduced in the model, as described in figure 2; that is, the model has a hierarchical structure. The prior distributions specified for the nuisance parameters are:
The distributions for the nuisance parameters depend on a set of hyperparameters, that have to be specified. However, their choice is not as influential as the one of the nuisance parameters for the estimation of and ; in the following, we will present some criteria for the determination of that were found to work well in practice.
The posterior distribution is therefore defined as:
The posterior distributions for the parameters and for the latent coordinates are not available in closed form. To estimate these parameters, proposal distributions have been developed and are presented in appendix A, together with the distributions of the nuisance parameters.
4.2 Algorithm for model estimation
Estimation of model parameters estimation is carried out using Markov Chain Monte Carlo. A detailed specification of the full conditional and the proposal distributions implemented can be found in the appendixA. Within each iteration of the chain, the nuisance parameters are updated from the corresponding full conditional; updated estimates for the intercepts, the coefficients and the latent coordinates are then proposed. The updates for the intercept and the coefficient in each network is jointly carried out, as it was empirically found that they may be correlated. The joint update helps improving the speed of convergence for these parameters, while the latent coordinates are updated separately and sequentially. Indeed, it could be the case that the current estimates for a subset of the latent positions have already converged, while the remaining ’s are still far from the "true" values. Updating the whole block jointly would not respond to the need to adjust just the mislocated coordinates; hence, updating them separately has been found to be a better strategy. After the set of latent coordinates has been updated at a given iteration of the algorithm, it has to be compared with the set of estimates obtained at the previous iteration. That is, since the likelihood in equation 2 considers the distances between the latent coordinates, it is invariant to rotation or translation of the latent positions . Then, it has to be ensured that the current set is not a "rigid" transformation of the previous ones, to prevent from non-optimal stationary solutions for the latent coordinates. To achieve such aim, a Procrustes (Dryden & Mardia, 1999) test is performed and the current configuration is discharged if the correlation with the previous one is above a certain threshold, fixed to be . The choice of this value, that ranges in is arbitrary and should reflect the presence of high correlation. Values of the threshold above have been found to work well in practice. Appendix E reports the pseudo-code describing the estimation procedure.
As for the initialization, the set of hyperparametersand are fixed to be , as values in the range have been tried and it has been found that the different specifications do not have substantial impact on the parameter estimates. The variance-scale hyperparameters are set to be , so that the means of the proposal distributions for and reduce to the corrected sample means (see appendix A).
Starting values for the distances are taken to be the geodesic distances between the nodes in a randomly chosen network of the multiplex. From these distances, given a fixed , starting values for the latent coordinates are computed via multidimensional scaling as in Hoff et al. (2002)
and the starting values for the distances are obtained taking the squared Euclidean distances between the starting values for the latent coordinates. These are then used to model, via logistic regression, the adjacency matrices. The estimated intercepts and coefficients are taken to be the starting values forand . If these starting values fall outside the bounds specified for the parameters, they are replaced with these bounds. The nuisance parameters and are initialized as the sample means of the initial estimates for and , respectively. In a similar fashion, and are initialised as the sample variances of and .
5 Further issues
As it can be easily noticed, the likelihood in equation (2
) is invariant to linear transformations of the coefficient parameters. Indeed, for some constant,
For this reason, the coefficient for the reference network is fixed to (index denotes the reference network). The role of the parameters is to rescale the distances and, therefore, the corresponding values are meaningful only when compared with each other and the reference. Thus, there is no loss of information in fixing ; however a further identifiability issue is:
To overcome this second issue, the intercept for the same network needs to be fixed. As the intercept term defines an upper bound for the edge probabilities in the corresponding network, the value chosen for should not underestimate this bound. We propose to fix it accordingly to the observed density in the network. More specifically, let us consider the expected value for the edge probability in network :
A naive empirical approximation to this value, which is not available in closed form, is given by
where the distances have been replaced by the constant as this value is the mean empirical distance among coordinates simulated form a Gaussian distribution. This leads to:
where with . Thus, , or any number greater than , can be taken as the value for the intercept in the reference network.
A network that is of particular interest can be taken as the reference network or, alternatively, if there is no reason to prefer a network, a randomly chosen one can be selected. In the present work, for ease of interpretation of the results, the first network of the multiplex has been set as the reference network.
5.2 The issue of absent countries
Due to the increasing popularity that Eurovision gained over the years, many countries have requested to participate in the contest and have been accepted. With the increasing number of participants, preliminary stages had to be introduced to select a smaller sub-group of countries accessing to the final, where they could compete for the title. The winner of the previous edition enters the final straightforwardly, while the remaining countries have to compete in the qualifications round. Therefore, the selection of the finalists in the edition does not depend on the results in the previous edition, apart from the specific case mentioned above. With the introduction of semi-finals, countries that do not make it to the last stage are allowed to vote for their favourite ten participants in the final; therefore, their participation is passive, as they can vote but can not receive votes. Last, several countries have abandoned the competition for years, for a series of particular reasons. The pre-selection process, together with the presence of passive countries and the drop outs imply that the set of participants in two consecutive editions may not be exactly the same; however, the phenomena above are structural and for that reason we decided not to treat them as a missing values problem.
To model absent countries, we will define the set of nodes in a more general way, as the set of countries that have voted at least once in the considered period. We can then rewrite the log-likelihood introduced in section 4.1 as:
where is an indicator variable, with if the node was present in the edition and could have voted for node ; while implies that the node was not allowed to vote for node in the edition. Let us denote as the binary matrix indicating whether or not a country was present in the edition. The rows of the matrix denote whether the corresponding countries may vote at the occasion; more formally, if , then the node was absent from that edition of the contest. Instead, the columns of refer to the possibility of being voted for the corresponding countries. That is, if the node has been voted in the edition.
Edge-specific covariates can be considered in the application. All the covariates used do not depend on the specific network of the multiplex, as they are constant over the editions; therefore, the effect on edge probabilities is assumed to be constant over the networks. Each covariate is stored in a matrix, that will be denoted by , where is the index for the set of covariates. To maintain the characterization of the intercept term, the effect of the covariates is taken to be inversely related to edge probabilities (see section 6). Therefore, the scaling coefficient applied to each matrix of covariates will be characterized in similar fashion as the coefficients , (see appendix A). The edge probability in equation (1) can be modified in presence of covariates to the following:
The proposal distribution used to update the parameters is derived in appendix A, where we briefly explain how to modify the proposal distributions for the other parameters when considering covariates in the model.
6 The Eurovision song contest data
The period we considered for this analysis covers 18 different editions of the Eurovison Song Contest. The editions took place after the introduction of televoting, from 1998 to 2015. The last two years of the show have been discarded from the analysis, as the voting system changed in 2016 (see section 2). The mechanism underlying the generation of the votes has changed, which means that the measure used to quantify the appreciation of a country for another has been modified. Nowadays countries can in fact express from ten to twenty preferences, depending on the degree of overlapping among the tastes of the jury and those of the public voting at home.
In the interval we considered two major changes have occurred in the structure of the program, due to the growing number of countries willing to participate in the show. First, in 2004, a semi-final stage was added to select participants. In 2008, after the 50th anniversary of the competition, the event was rebuilt and two semi final stages were introduced. countries that participate to the semi-finals are entitled to vote in the final, even if they have not passed the selection stage. Of course, the songs that do not go to the final can not compete for the title and can not receive any vote in the final. The voting structure in the final induced by the introduction of qualifying stages is modelled by the auxiliary variables , defined in section 5.2. During the period 1998-2015, a total of countries took part to the competition and each year. After 2004, on average countries were completely absent(not voting nor competing). A list of the countries and their ISO3 codes is given in appendix B.
Figures 4 describe some features of the 18 networks. In particular, in 4(a) we give an overview of the countries’ participation per year, distinguishing the role that each country had in a given edition: absent (A), present but can not be voted (Pa) or fully present (Pp). It is easy to see from the plot that some countries, such as UK or France, have been constantly present to the competition, as the while others had only made some sporadic appearances. Monaco for example competed from to 2004 to 2006, but never made it to the final.
Figure 4(b) reports the values for the association222 The association for the generic couple of editions is defined as:
At a second stage, covariates have been included in the analysis, similarly to what has been done, among others, by Blangiardo & Baio (2014), Spierdijk & Vellekoop (2006), in order to see whether the exchanges of votes in the period could be partly explained by "cultural" factors. The covariates we considered are listed below:
the log geographic distance between two countries. These distances were computed using the coordinates of the centroids of each country, obtained from https://developers.google.com/public-data/docs/canonical/countries_csv. The centroids have been estimated considering latitude and longitude of the main cities of each country, and the log geographic distances are stored in an matrix denoted by ;
the presence of a border common to a couple of countries. To maintain the characterization of the intercept, this information is coded as a binary variable that takes valueif there is a common border and otherwise, so that there is a negative relation between covariate and edge probabilities. This same reason is behind the definition of the other following covariates. The common border covariate is stored in an matrix denoted by ;
the fact that two countries share an official language. This information is coded as a binary variable that takes value if they share the official language and otherwise, and it is stored in an matrix denoted by ;
the fact that two countries share a major language, defined as a language spoken at least by of the population. This information is coded as a binary variable that takes value if two countries share a major language and if not, and it is stored in an matrix denoted by ;
the presence of a common past "history" shared by two countries (they were colonized by the same country, they were the same country, etc.).This information is coded as a binary variable that takes value if two countries share a common past and otherwise, and it is stored in an matrix denoted by .
The figures in 3 describe the association between the covariates and the adjacency matrices. The plot in 3(a) displays the association333The associations are computed as
Last, two sub-periods will be analysed separately, to check for large changes in the latent space position of a country according to the analysed years. That is, the analysis of the two sub-periods would give an idea on the stability of the average coordinates in the latent space recovered for the fill interval 1998-2015.
The following models have been considered in the analysis
Model 1: covariates not included;
Model 2: covariates included;
Model 3: log-geographic distance included ();
Model 4: information on shared borders included ();
Model 5: no latent space ( ) and covariate included;
Model 6: random graph model (no covariates and ).
Models 5 and 6 have been estimated to test for the need to consider an additional layer in the modelling structure (the latent space) for these data. Each model was estimated running the MCMC algorithm for 50000 iterations and with a burn of 5000 iterations. The intercept parameter in the first network was fixed to , as 444The intercept in the reference network is fixed as
where and . The deviance term is computed with the posterior estimates, while is the mean deviance of the posterior distribution. The best model is determined to be the one associated to the lowest value of the DIC; the values are reported in table 1.
|Model 1||Model 2||Model 3||Model 4||Model 5||Model 6|
Model 4 is found to be the best one, including both the latent space and the information on a shared border. The hypothesis of a random mechanism determining the exchange of votes can then be discarded in favour of a more complex solution, where similarities among countries, described by distances in a latent space, play a role in the formation process for observed preferences. Indeed, under Model 4, the coefficient estimates for the latent space distances are quite high in each edition, see table 2.
Estimated averages and standard deviations for the network parameters in the multiplex 1998-2015.
The covariate seems to be the only relevant one in the analysis. Indeed, the estimated coefficients associated with the other sets of edge covariates in model 2 where all close to , which supports the model choice obtained with the deviance information criterion. The procrustes correlations among the latent space estimated in model 1 and the one estimated in model 2 is quite high, namely . That is because the matrix of covariates is quite dense and acts like a fixed effect on the edge probabilities, with a decreasing effect each time there is no common border between two countries. Therefore, the introduction of the set of covariates seems not to have direct effect on the distances between the nodes. The mean of the posterior estimate of the effect associated with the border covariates is with a standard deviation of .
Figures 5, 6 and 7 show the estimates obtained in model 2 for the latent positions, the distances and the posterior distributions for the parameters of interest. Figure (a) reports the posterior means of the estimates for the country latent coordinates (reported with their ISO3 codes, see appendix B) together with the corresponding standard deviations.
Note that the model does not necessarily place in the center of the latent space those countries that have been most successful throughout the editions. Indeed, for example, Sweden and Denmark won the largest number of editions in the period 1998-2015, respectively and titles. However, Denmark is not in the middle of the space, but it is rather placed close to a group of countries from northern Europe. If we look at a specific country, its neighbours on the latent space are the countries that were estimated to be more similar in tastes, expressed in terms of voting patterns.
The latent space presents a number of denser sub-groups of locations, that partly resemble northern Europe, eastern Europe and North Eastern Europe. However, these subgroups are not completely faithful to the geographic locations of countries, as, for example, Spain is closer to Romania than to Portugal or France. Nevertheless, the population of Romanians in Spain defines one of the major immigrant group in the country as well as Latvians is one of the largest immigrant group in Ireland. Therefore, some of the geographical misplacements within the dense sub-groups in the latent space may be explained in terms of migration flows. What is certain is that geographical locations are not able to fully explain the observed patterns of votes.
Indeed, figures 6 and 8 show the presence of large differences between the estimated latent distances and the geographical ones. In particular, for a given country , the rows in the matrices of figure 8 represent the intersection between its nearest neighbours in the latent space (we denote this set as ) and its closer neighbours in terms of geographical distances (); we consider the values . Given the number of neighbours , the average555the average number of common neighbours is given by:
The estimated values for the network intercept parameters are quite similar for the different networks corresponding to the editions in the period 1998-2015 (figure 7). Indeed, the voting rule (in the Eurovision song contest) for that period required the participating countries to vote for exactly 10 others, which implied a fixed outdegree for each node in the corresponding networks. The observed densities are then quite similar and this is reflected in similar estimates for the parameters, which define the upper bound for the edge probabilities in a given network.
Appendix C reports the results for the analysis of the two sub-periods 1998-2007 and 2008-2015. The model considered for the sub-periods does not include any covariate, as the interest lays primarily in recovering the latent coordinates. The findings confirmed a weak correspondence between the estimated latent position of a country and its actual latitude and longitude coordinates on the globe. A direct comparison of the latent space in the period 1998-2015 and the ones for the periods 1998-2007 and 2008-2015 is not available, as the number of countries is different. Indeed, not all the participants in the period 1998-2015 were also participant in both of the sub-periods. Just to give an example, Italy rejoined the competition in 2011, after being absent in the period 1998-2007. Although some of the distances vary with respect to those in the longer period 1998-2015, the sub-groups observed in figure 5 are still present. Indeed, for example, Northern Europe countries tend always to be closer to each other, as well as Eastern Europe countries do.
8 A simulation study
Different simulations scenarios have been considered to test the proposed model. In particular, simulations have been exploited to assess the large-sample behaviour of the parameters estimates, when the dimension of the multiplex is large, and to verify the robustness in the latent coordinates estimates with respect to different underlying distributions. In all the different scenarios, the reference network was taken to be the first of the multiplex and the reference parameters have been fixed to and (the value of the intercept in the reference network in the application). The intercept and the coefficient parameters have been simulated from their prior distributions (see section 4.1), with , . Four simulation scenarios have been defined and divided in two blocks:
Block I This block has been built to verify the large-sample behaviour and the robustness of the estimates when the distribution of the latent coordinates is modified. Indeed, not all the observed data might be well described by latent Gaussian coordinates. Within each scenario in the block, we have considered types of multidimensional networks, with relatively small dimension of but increasing number of nodes:
The values for the coefficients and the intercepts are constants in scenarios I-III, conditionally on the type of multiplex considered.
: the latent coordinates have been simulated from a bivariate normal distribution (the prior distribution specified in the model).
Scenario II: the latent coordinates have been simulated from a mixture of bivariate normal distributions, where the number of components was set is
. The mean vector for each group has been simulated from a standard bivariate normal distribution and the covariance matrices are diagonal with elements randomly sampled in the interval. This scenario corresponds to the case of data representing different kind of relations among separated groups of nodes or communities. Indeed, the probability for node in group to link with node will be higher if then if ;
Scenario III: the latent coordinates have been simulated from a standard bivariate Hotelling’s T-squared distribution with degree of freedom. This scenario allows for some nodes to be located far from the center in the latent space. Thus, this case reflects the presence of inactive or semi-inactive nodes in the network, that tend to interact poorly with the rest not forming many edges;
Block II This block has been built to test the large-sample properties of the estimates when the number of networks is large. That is, the case of the application considered in the present work.
Scenario IV: the latent coordinates are simulated according to the prior distribution specified in section 4.1. Three different multidimensional networks have been simulated:
In all the considered scenarios, we have treated both the case where all the nodes are present in each network (denoted by ) and the case where some of the nodes are absent in some networks (denoted by ) (see section 5.2). In the second case, the missing data process resembles the one observed in the Eurovision data with respect to the average number of absent nodes per network and the number of absences for each node. The reference network was taken to be the first of the multiplex and the corresponding parameters have been fixed to and (the value in the application). The latent coordinates, the intercept and the coefficient parameters have been simulated from their prior distributions (see section 4.1), with , .
To estimate the model on the simulated data (Block I and Block II), we fixed , , and (see section 4.2 for details). Each model was estimated times, performing mcmc iterations and discarding the first . The parameter estimates were consistent with the simulated values in all different scenarios and the estimates of the coordinates have been found to be robust to the different specifications of the distribution for the latent positions. In appendix D we present in details the results for the different scenarios. Boxplots and tables with mean and standard deviations for the parameter estimates are presented, as well as mean and standard deviation of the procrustes correlation between the estimated and the simulated latent space coordinates. Overall, the proposed method returns reliable estimates for the true parameter values. The simulated values fall within the credible interval built on the posterior distributions, with a couple of exceptions that occur when the number of networks increases. However, in these cases, the true magnitude of the value is always recovered, as the estimates are still quite close to the actual simulated values. The simulated latent spaces are always recovered with high correlation, even in the stressed scenarios I and II. There is no substantial difference in estimates for cases and . Thus, the absence of some of the nodes in some networks of the multiplex does not impact the estimation of the latent positions.
9 Comparison with the lsjm model
The model proposed in the present work is similar to the latent space joint model (lsjm) proposed by Gollini & Murphy (2016), which considers a similar specification for the edge probability (1), with the difference that no coefficient term is associated to the distances, and these are network-specific. The model by Gollini & Murphy (2016) also considers a mean latent space, which originated the network-specific latent coordinates. To compare the two models, we have simulated different types of multiplex, all constructed fixing for , that is, according to the lsjm. The simulated multidimensional networks come from the following scenarios:
A sparse multiplex with , and ,
A multiplex with , and ,
A small but denser multiplex with , and .
Simulating from the model presented in this work (which will be referred to as lsmmn) when all the coefficients are fixed to corresponds to simulating from the latent space joint model when all the network-latent spaces are the same. Therefore, in the present setting, the two models can be compared. Both the lsjm and the lsmmn models have been estimated 10 times for each of the simulated multidimensional networks. In appendix D we report the results of such a comparison. Overall, the model we propose outperforms the lsjm both in the quality of parameter estimates and in recovering the latent coordinates.
In the present work, we have introduced a general and flexible model for the analysis of multidimensional networks. In particular, the model is defined to recover similarities among the nodes when the structure of the observed networks is complex and there is not a clear information on which external information can be used to explain the observed patterns. The model extends the latent space model by Hoff et al. (2002) and the latent space joint model by Gollini & Murphy (2016)
with the introduction of network-specific coefficient parameters to weight the relevance of the latent space in the edge-probabilities determination. When these coefficients are null, the model reduces to a random graph model for multidimensional networks. Moreover, missing data (not at random) and edge-specific covariates are considered. A hierarchical Bayesian approach is employed to define the model and its estimation is carried out via mcmc. We have defined hyperprior distributions for the hyperparameters of the model, to avoid their (subjective) specification. The latent coordinates allow for an efficient visualization of the network, a well desired feature for large multidimensional networks.
The model is applied to the votes exchanged among countries in a popular TV show, the Eurovision Song Contest, from 1998 to 2015. Cultural and geographical covariates have been included in the analysis and only the presence of a shared boarder between two countries was found to be relevant to explain the observed voting pattern. The recovered similarities among the participants in the period 1998-2015 have been found to only partially resemble the actual geographical locations of the countries. Indeed, a group structure was found in the latent space which does not completely agree to geographical criteria. These findings sustain the claim of bias in the voting structure observed in the Eurovision, which, however, can not be attributed to geographical reasons alone.
In the simulation study we have applied the proposed model on a large varieties of multidimensional networks and successfully recovered the latent coordinates and the network-specific parameters. That has proved the ability of the model to recover the (latent) association structure among the nodes in a multiplex, also when the number of networks is large.
The latent space model for multivariate networks is implemented in the R package spaceNet and it is available on CRAN.
Airoldi et al. (2008)
Airoldi, E. M., Blei, D. M., Fienberg, S. E. & Xing, E. P.
(2008), ‘Mixed-membership Stochastic
Journal of Machine Learning Research9, 1981–2014.
- Blangiardo & Baio (2014) Blangiardo, M. & Baio, G. (2014), ‘Evidence of bias in the Eurovision song contest: modelling the votes using Bayesian hierarchical models’, Journal of Applied Statistics 41(10), 2312–2322.
- Butts & Carley (2005) Butts, C. T. & Carley, K. M. (2005), ‘Some simple algorithms for structural comparison’, Computational and Mathematical Organization Theory 11(4), 291–305.
- Clerides & Stengos (2006) Clerides, S. & Stengos, T. (2006), ‘Love thy neighbour, love thy kin: Strategy and bias in the Eurovision Song Contest’, Discussion paper series of Centre for Economic Policy Research 5732, 1–28.
- Dryden & Mardia (1999) Dryden, I. L. & Mardia, K. V. (1999), Statistical shape analysis, Wiley Series in Probability and Statistics.
- Durante et al. (2016) Durante, D., Dunson, D. B. & Vogelstein, J. T. (2016), ‘Nonparametric Bayes Modeling of Populations of Networks’, Journal of the American Statistical Association In press.
- Erdős & Rényi (1959) Erdős, P. & Rényi, A. (1959), ‘On random graphs.I’, Publicationes Mathematicae 6, 290–297.
- Erdős & Rényi (1960) Erdős, P. & Rényi, A. (1960), ‘On the evolution of random graphs’, Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17–61.
- Fenn et al. (2006) Fenn, D., Sulemana, O., Efstathioub, J. & Johnson, N. F. (2006), ‘How does Europe Make Its Mind Up? Connections, cliques, and compatibility between countries in the Eurovision Song Contest’, Physica A: Statistical Mechanics and its Applications 360(2), 576–598.
- Fienberg et al. (1985) Fienberg, S. E., Meyer, M. M. & Wasserman, S. S. (1985), ‘Statistical Analysis of Multiple Sociometric Relations’, Journal of the American Statisical Association 80(389), 51–67.
- Frank & Strauss (1986) Frank, O. & Strauss, D. (1986), ‘Markov Graphs’, Journal of the American Statistical Association 81(395), 832–842.
- Ginsburgh & Noury (2008) Ginsburgh, V. & Noury, A. (2008), ‘The eurovision song contest. Is voting political or cultural?’, European Journal of Political Economy 24, 41–52.
- Goldenberg et al. (2010) Goldenberg, A., Zheng, A. X., Fienberg, S. E. & Airoldi, E. M. (2010), ‘A Survey of Statistical Network Models’, Foundations and Trends in Machine Learning 2, 129–233.
- Gollini & Murphy (2016) Gollini, I. & Murphy, T. B. (2016), ‘Joint Modeling of Multiple Network Views’, Journal of Computational and Graphical Statistics 25(1), 246–265.
- Greene & Cunningham (2013) Greene, D. & Cunningham, P. (2013), ‘ Producing a unified graph representation from multiple social network views’, Proceedings of the 5th Annual ACM Web Science Conference (Web-Sci’13) p. 118–121.
- Handcock et al. (2007) Handcock, M., Raftery, A. & Tantrum, J. (2007), ‘Model-based clustering for social networks (with discussion)’, Journal of the Royal Statistical Society: Series A 170(2), 1–22.
- Hoff (2011) Hoff, P. (2011), ‘ Hierarchical multilinear models for multiway data’, Computational Statistics and Data Analysis 55, 530–543.
- Hoff et al. (2002) Hoff, P. D., Raftery, A. E. & Handcock, M. S. (2002), ‘Latent Space Approaches to Social Network Analysis’, Journal of the American Statistical Association 97(460), 1090–1098.
- Holland et al. (1983) Holland, P. W., Laskey, K. & Leinhardt, S. (1983), ‘Stochastic Blockmodels: First Steps’, Social Networks 5, 109 – 137.
Holland & Leinhardt (1981)
Holland, P. W. & Leinhardt, S. (1981), ‘An Exponential Family of Probability Distributions for Directed Graphs’,Journal of the American Statistical Association 76, 33–50.
- Krivitsky et al. (2009) Krivitsky, P. N., Handcock, M. S., Raftery, A. E. & Hoff, P. D. (2009), ‘Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models’, Social Networks 31(3), 204–213.
Lynch, K. (2015), ‘Eurovision recognised by
Guinness World Records as the longest-running annual TV music competition
- Mantzaris et al. (2017) Mantzaris, A. V., Rein, S. R. & Hopkins, A. D. (2017), ‘Examining collusion and voting biases between countries during the Eurovision song contest since 1957’, arXiv:1705.06721 .
Murphy, T. B. (2015),
Model-Based Clustering for Network Data. In Handbook of Cluster Analysis, CRC press.
- Robins et al. (2007) Robins, G., Pattison, P., Kalish, Y. & Lusher, D. (2007), ‘An introduction to exponential random graph (p*) models for social network’, Social networks 29(2), 173–191.
- Rosen (1972) Rosen, B. (1972), ‘Asymptotic theory for successive sampling with varying probabilities without replacement, I’, The Annals of Mathematical Statistics 43(2), 373–397.
- Saavedraa et al. (2007) Saavedraa, S., Efstathioua, J. & Reed-Tsochasb, F. (2007), ‘Identifying the underlying structure and dynamic interactions in a voting network’, Physica A: Statistical Mechanics and its Applications 377(2), 672–688.
- Salter-Townshend & McCormick (2017) Salter-Townshend, M. & McCormick, T. H. (2017), ‘Latent Space Models for Multiview Network Data’, Annals of Applied Statistics 11(3), 1217–1244.
- Salter‐Townshend et al. (2012) Salter‐Townshend, M., White, A., Gollini, I. & Murphy, T. B. (2012), ‘Review of statistical network analysis: models, algorithms, and software’, Statistical Analysis and Data Mining 5(4), 243–264.
- Snijders & Nowicki (1997) Snijders, T. A. B. & Nowicki, K. (1997), ‘Estimation and Prediciton for Stochastic Blockmodels for Graphs with latent Block Structure’, Journal of Classification 14, 75–100.
- Spiegelhalter et al. (2002) Spiegelhalter, D., Best, N., Carlin, B. & van der Linde, A. (2002), ‘ Bayesian measures of model complexity and fit (with discussion’, Journal of the Royal Statistical Society- B 64(4), 583–639.
- Spierdijk & Vellekoop (2006) Spierdijk, L. & Vellekoop, M. (2006), ‘Geography, culture, and religion: Explaining the bias in Eurovision song contest voting’, Memorandum, Department of Applied Mathematics, University of Twente, Enschede 1794, 1–30.
- Yair (1995) Yair, G. (1995), ‘’Unite Unite Europe’ The political and cultural structures of Europe as reflected in the Eurovision Song Contest’, Social Networks 17(2), 147–161.
Appendix A Posterior distributions for the parameters
The posterior distribution for the model has been presented in section 4.1 in equation 4. The present appendix will present the full conditional distributions and the proposal distributions that have been derived from equation 4 and used to estimate the model, as described in section 4.2. For ease of calculation, the log-posterior distribution is considered:
Without any loss of information, the latent coordinates are assumed to be univariate. A generalization to the multivariate case is straightforward and will be considered when the proposal distribution for the latent positions is introduced.
a.1 Nuisance parameters
All the posterior distributions for the nuisance parameters have closed form. In particular, the variances follow Inverse Gamma distributions
The mean parameters are distributed as truncated normal distributions:
a.2 Latent positions
In order to derive a proposal distribution for the latent coordinates, only the terms containing the variables in the log-posterior distribution are considered:
To approximate the logarithmic term in the above equation, let us recall the definition of the LSE (Log Sum Exponential) function,
that is known to be bounded between:
The logarithmic term in the log-posterior part of interest for the latent coordinates has indeed LSE form, thus it can be bounded between:
The lower bound is met when only one of the term in the summation is non-zero, which is the case. Defining the binary variable as,
then the logarithmic term of interest can be approximated using its lower bound. The log-posterior terms in can be approximated as: