Quantitative methods to measure the participation in parliamentary debate and discourse of elected Members of Parliament (MPs) and the parties they belong to are lacking. This is an exploratory study in which we propose a new approach for the quantitative analysis of such participation. We use the New Zealand government’s digital Hansard database to construct a topic model of parliamentary speeches consisting of nearly 40 million words in the period 2003-2016. A Latent Dirichlet Allocation topic model is implemented in order to reveal the thematic structure of our set of documents. This generative statistical model enables the detection of major themes or topics that are publicly discussed in the New Zealand parliament, and permits their classification by MP. Information on topic proportions is subsequently analyzed using a combination of statistical methods. We observe patterns arising from time-series analysis of topic frequencies which can be related to specific social, economic and legislative events. We then construct a bipartite network representation, linking MPs to topics, for each of four parliamentary terms in this time frame. We build projected networks (onto the set of nodes represented by MPs) and study the dynamical changes of their topology, including community structure. By performing this longitudinal network analysis, we can observe the evolution of the New Zealand parliamentary topic network and its main parties in the period studied.
as they have proven to be useful tools for dealing with the vast amount of semantic information that is becoming available. Topic modeling is a set of machine learning techniques that take a collection of documents as input and attempt to discern the themes that pervade them. However, the methods that topic models utilize to search, summarize and understand large electronic archives have rarely been applied to political texts.
The New Zealand government has been making parliamentary transcripts (‘Hansard’) available in digital format since 2003. Suitable annotation of these transcripts allows them to be used as a corpus for the development of topic models. This comprehensive corpus of political text can then be examined through a number of lenses. Topic models allow us to monitor the ebb and flow of themes that are discussed in parliament over multiple years and associate particular themes with individual Members of Parliament (MPs). This allows the identification of trends in the topics that particular parties follow. That is, we may observe which issues are discussed repeatedly with great interest and which cease to be mentioned.
In the four parliamentary terms analyzed there was a transition of power from the Labour government (1999-2008) to the National government (2008-). The left-leaning Labour Party and right-leaning National Party have been the two parties sharing power for most of the century. In 1996, the method of electing MPs was changed to a mixed-member proportional (MMP) system and the two major parties were joined by a number of smaller parties. These smaller parties have sometimes held the balance of power, with the left-wing Green Party as the largest of these.
A number of textual analyses of political speeches are concerned with finding where on the political spectrum a speaker falls (e.g. [6, 7, 8]). Topic modeling as applied in our analyses cannot determine the sentiment of a statement or speech. Despite this fact, multiple aspects of politicians’ policy interests can be unraveled with further statistical analysis.
Here we construct bipartite networks [10, 11, 12, 9], whereby sets of MPs are linked to a set of topics, with each link representing a topic that is of clear interest to a particular MP, based on the content of their parliamentary speeches. We can then decompose such bipartite networks into their two projections: the MP-projection and the topic-projection. The former represents a network where the links between MPs indicate the existence of a mutual interest, and the latter represents a network where links represent topics that co-occur as interests of a particular MP. In this study, we make use only of MP-projections. Measuring properties such as the node degree (i.e. the number of links that connect it to other nodes), homophily [14, 13] and clustering and community structure [15, 16] of these networks provides information about their underlying topology. For instance, one can discover whether or not the typical range of interests of an MP is changing, as well as patterns in this behavior over time. Moreover, we apply community-detection methods [18, 17] in order to find clusters of politicians that share interests, and investigate the partisan composition of these communities.
This work is of an exploratory nature, in that our goal is twofold: to present a novel quantitative approach to measuring political activity and to demonstrate the benefits of performing quantitative analysis in a domain normally reserved for qualitative approaches, by using a combination of machine learning and complex networks techniques.
The remainder of this paper is organized as follows: the Methods and Data sections 2.2 and 2.3 introduce fundamental aspects of topic modeling and bipartite networks respectively and outline the preparation and organization of our data; Sections 3 and 4 present the results of our analyses alongside a discussion and our conclusions.
2 Methods and Data
The semantic data we use for our analyses are extracted from the New Zealand Hansard database [19]. Hansard presents records of what is said in the debating chamber as debates (a collection of speeches on a particular topic), speeches (individual statements by MPs) or dailies and volumes (collections of speeches over different time periods). By considering only those documents labeled in that database as a ‘speech’, we were able to find out which topics specific MPs were engaging with. This makes it possible to associate speeches, and by extension MPs, with topics of interest over time.
Once these data are obtained, we observe that many speeches are rather short and contain little topical content. An example is given below, which comes from a committee discussion on the Shop Trading Hours Amendment Bill and was published in Hansard Volume 716 in August 2016:
“CHAIRPERSON (Lindsay Tisch): Just a point: this debate concludes at quarter past and to whoever is speaking at the time, I will be stopping it at that point.”
In an attempt to remove these non-topical speeches, we have removed from our database speeches with 150 words or fewer, which constitute about 20% of the database. This cut-off is shown in Fig. 1, which presents the distribution of word counts per speech. The decision is informed by our observation that speeches below this threshold have insufficient topical content.
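The length filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (dictionaries with `mp` and `text` keys), the whitespace tokenization and the function names `word_count` and `drop_short_speeches` are hypothetical and are not taken from the Hansard database or from our processing pipeline.

```python
def word_count(text):
    """Count whitespace-separated tokens in a speech (assumed tokenization)."""
    return len(text.split())


def drop_short_speeches(speeches, min_words=150):
    """Keep only speeches with strictly more than `min_words` words."""
    return [s for s in speeches if word_count(s["text"]) > min_words]


# Toy records: one procedural remark and one longer, topical speech.
speeches = [
    {"mp": "A", "text": "Just a point: this debate concludes at quarter past."},
    {"mp": "B", "text": "housing " * 200},
]
kept = drop_short_speeches(speeches)
print(len(kept), [s["mp"] for s in kept])
```

A speech of exactly 150 words is dropped, matching the "150 words or fewer" rule.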
2.2 Topic Models
The process of topic modeling involves a set of algorithms that have been developed to uncover the underlying thematic structure of a corpus. The simplest and most commonly used topic model is Latent Dirichlet Allocation (LDA) [21]. Within the framework of LDA, each document is a mixture of corpus-wide topics and each topic can be understood as a distribution over keywords. The total number of documents comprising the corpus is denoted as $D$ and the total number of topics as $K$. Additionally, the order of the words in a document is not considered, only the frequency with which words appear.
From the perspective of LDA, documents are imagined to be the result of a generative process. This is the process by which the model assumes the documents arose given certain hidden variables. The word distributions per document are observed, while the topic structure (per-word topic assignments, per-document topic proportions and per-corpus topic distributions) is hidden. Therefore, the central computational problem for LDA is to infer the hidden structure that likely generated the observed corpus. This means computing the conditional distribution of the hidden variables given what is observed. This conditional distribution is usually referred to as the posterior and can be expressed as
$$p(\beta_{1:K}, \theta_{1:D}, z_{1:D} \mid w_{1:D}) = \frac{p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D})}{p(w_{1:D})},$$
where $\beta_{1:K}$ are all topics in the corpus, $\theta_{1:D}$ are the per-document topic proportions, $z_{1:D}$ the per-word topic assignments and $w_{1:D}$ the whole set of observed words. Unfortunately, computing the posterior exactly is computationally intractable, so it must be approximated by an inference algorithm. Topic modeling algorithms are commonly classified as sampling-based algorithms or variational algorithms.
Our analysis uses MALLET [22], whose topic model is fitted by Gibbs sampling [23], a sampling-based algorithm which constructs a sequence of random variables in a Markov chain, where each variable depends on the previous one. The algorithm assumes that the true posterior distribution is the limiting distribution of this sequence, and obtains an approximation to the posterior using these samples. For a full mathematical description of LDA and a further discussion of the methods used to estimate a posterior, see [5, 21].
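To make the sampling idea concrete, the following is a minimal sketch of collapsed Gibbs sampling for LDA, one common sampling-based scheme. It is not the MALLET implementation, which is considerably more optimized. The function name `gibbs_lda`, the symmetric hyperparameters `alpha` and `beta`, the iteration count and the toy documents are all assumptions made for illustration.

```python
import random


def gibbs_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    Returns (theta, topic_word, vocab): per-document topic proportions,
    topic-by-word count matrix, and the sorted vocabulary.
    alpha and beta are symmetric Dirichlet hyperparameters (assumed values).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # count of word v in topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k
    assignments = []                                       # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic proportions from the final state of the chain.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word, vocab


docs = [
    "housing rent housing auckland rent".split(),
    "earthquake christchurch rebuild earthquake".split(),
    "rent housing auckland housing".split(),
]
theta, topic_word, vocab = gibbs_lda(docs, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` sums to one and plays the role of the per-document topic proportions. In practice the estimate would be averaged over several samples taken after a burn-in period rather than read from the final state alone.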
LDA assumes the topics are the same for all documents, and only the topic proportions vary. Therefore, MALLET requires an input which specifies the number of topics to be discovered. Choosing this number is critical to the success of a topic model, as too few topics may merge distinct themes, while too many topics may introduce many “themes” consisting of vocabularies that appear to have nothing in common, or even start splitting topics that were identifiable at smaller input values. For our analysis it is important that the topics are easily identifiable and distinct from one another. We found that 30 topics satisfy these requirements. Identified topics and their corresponding keywords can be found in the table of topics and keywords in Appendix A. It is worth noting, however, that some topics (nine of them, corresponding to about 36% of the corpus) appeared to consist mainly of terms that were either procedural or general rhetoric, such as “proud”, “hope” or “nation”. As this language reveals little in the way of substantive interactions, such topics were omitted from our subsequent analyses after networks had been inferred. Fig. 2 shows the remaining topics with their rescaled proportions.
2.3 Bipartite networks
A bipartite network is mathematically defined as a graph $G = (U, V, E)$, where $U$ and $V$ are disjoint sets of nodes and $E$ is the set of links connecting these nodes. For our purposes, the sets $U$ and $V$ correspond to the sets of MPs and topics, and the set $E$ represents the links that emerge when an MP speaks sufficiently frequently about a topic. No connections among nodes of the same set are allowed in the bipartite network; that is, MPs are connected only to topics and not to other MPs, and vice versa. Each set of nodes can have independent properties, such as the probability distribution of its node degrees or its number of nodes (system size).
Once we find the set of topics that a particular MP speaks about ‘often’ enough (this criterion is defined below), these are represented as links between the MP and those topics. After this process is completed for all MPs, we can construct a bipartite network where nodes representing MPs are connected only to nodes representing topics, and vice versa.
Bipartite structures play an important role in the analysis of social and economic networks. They are normally used to represent conceptual relations - such as membership, affiliation, collaboration, employment, ownership and others - between two different types of entities within a system [10, 11, 12]. Often, we are more interested in one of the types of nodes (e.g. MPs) and, in order to investigate the relationships between them, we create a new network with only these nodes. This new graph is a projection of the original bipartite network.
Topic modeling results in a natural bipartite network with projections that can be easily interpreted. The projections of a bipartite network are obtained by connecting nodes which share a common neighbor. That is, if two MPs are both linked to the same topic in the bipartite network, then they are linked in the MP-projection. For a bipartite network, this process results in two completely separated components, each composed exclusively of one type of node (MPs and topics in our case). Fig. 3 shows a schematic drawing of a bipartite network and its possible projections. The edges between nodes in these projections are then weighted according to the number of neighbors the nodes share in the bipartite network. In our analyses we use a simple weighting method, whereby each edge has a weight equal to the number of neighbors the nodes share in the bipartite graph. If two MPs are linked to the same three topics in the bipartite graph, then the edge linking them in the MP-projection will have weight equal to three.
The weighting in these projections offers a way to eliminate edges that represent tenuous links. This is important, as complete subgraphs (where every node is connected to every other node in the subgraph) of MPs are generated by every topic. This means that every MP that speaks about a popular topic is connected to every other MP that speaks about that same topic. The existence of popular topics can make analyses such as community detection challenging in the absence of weighting.
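The projection and simple weighting described above can be sketched as follows. The function name `project_onto_mps`, the dictionary-of-sets input and the optional `min_weight` cut-off are hypothetical choices for illustration; in particular, no specific weight cut-off is implied for our networks.

```python
from itertools import combinations


def project_onto_mps(mp_topics, min_weight=1):
    """Project an MP-topic bipartite network onto the MPs.

    mp_topics maps each MP to the set of topics it is linked to. The weight of
    an MP-MP edge is the number of topics the two MPs share (simple weighting).
    Edges lighter than `min_weight` (a hypothetical cut-off) are dropped.
    """
    edges = {}
    for a, b in combinations(sorted(mp_topics), 2):
        shared = len(mp_topics[a] & mp_topics[b])
        if shared >= min_weight and shared > 0:
            edges[(a, b)] = shared
    return edges


bipartite = {
    "MP1": {"housing", "economy", "health"},
    "MP2": {"housing", "economy", "health"},
    "MP3": {"housing", "environment"},
    "MP4": {"education"},
}
edges = project_onto_mps(bipartite)
print(edges)
print(project_onto_mps(bipartite, min_weight=2))
```

Here MP1 and MP2 share three topics, so their edge has weight three; MP3 is tied to both only through the popular topic "housing", and those weight-one edges disappear once `min_weight=2` is applied; MP4 shares no topic with anyone and stays isolated.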
In order to build bipartite networks connecting MPs to topics, we looked at the corpus of each MP’s speeches in more detail. We considered an MP to be connected to a topic when at least 6.7% of the MP’s speeches over the course of a year were assigned to that topic by MALLET. This corresponds to an MP talking about a topic twice as much as would be expected if they talked about all 30 topics equally within a year (2/30 ≈ 6.7%). This method removes topics that MPs only touch on briefly or in passing, which does not indicate engagement with the topic. Finally, MPs who had spoken fewer than a minimum number of words in the entire term were removed from the network, as the volume of words they had spoken was too small to be significant.
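The linking rule can be written down directly. The sketch below assumes a mapping from each MP to that MP's yearly topic shares; the function names `link_threshold` and `mp_topic_links` and the toy numbers are hypothetical, and only the rule itself (twice the uniform share over 30 topics) comes from the text.

```python
def link_threshold(n_topics, factor=2.0):
    """Share of speech needed for a link: `factor` times the uniform share."""
    return factor / n_topics


def mp_topic_links(proportions, n_topics=30, factor=2.0):
    """Return {mp: set of topics whose share meets the threshold}.

    proportions maps each MP to {topic: share of that MP's speech in a year}.
    """
    cutoff = link_threshold(n_topics, factor)
    return {
        mp: {topic for topic, share in shares.items() if share >= cutoff}
        for mp, shares in proportions.items()
    }


print(round(100 * link_threshold(30), 1))  # 6.7 (percent)

proportions = {
    "MP1": {"housing": 0.40, "economy": 0.10, "health": 0.03},
    "MP2": {"housing": 0.05, "economy": 0.05, "health": 0.05},
}
links = mp_topic_links(proportions)
print(links)
```

MP1 is linked to the two topics above the cut-off, while MP2, who spreads attention thinly over many topics, ends up with no links at all.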
3.1 Words spoken
Despite having fewer MPs, opposition parties tend to have a greater total word count than the governing party. Figure 4 shows the total word count for each of the 3 largest parties (as of the 50th parliament) over the course of 4 parliaments. In each parliament, the total word count for opposition parties exceeds that of the governing party. The increase in words spoken does not appear to be driven by any particular MP or small group of MPs (see Supporting Information, Fig. A.2).
3.2 Time Series of Topic Popularity
To allow a decomposition by party, we ran the topic model on speeches concatenated by MP and year. The topic proportions obtained over the 30 topics are normalized for each year so that they are comparable across the 14-year time span. Proceeding this way, we can reproduce the evolution of topic popularity over time in Parliament and its decomposition for each of the three most represented parties. Clear trends and differences across parties are visible in Figs. 5 and 6. The evolution of the proportions of other debated topics appears in Fig. A.1, Supporting Information.
3.3 The Parliamentary Speech Network
The MP-projected networks for the 47th to 50th parliaments resulting from the process described above are shown in Fig. 7. The community structure in these networks is visible, as is the party make-up of these communities. Table 1 shows the number of MPs per party present in each of these four networks.
[Table 1: Number of MPs per party in each of the four networks.]
[Fig. 10: (a) Left: Time series of the average degree of MPs in the MP-projected network, decomposed by party. (b) Right: Time series of net homophily in the MP-projected networks for the empirical data and configuration model. It shows that if connections were random, the expected homophily would stay nearly constant and the network would be slightly disassortative. The results show average and standard deviation over 1000 runs.]
Before examining the structure of the network, it is important to note that in all networks a number of MPs are not connected to any others. This is a reflection of our methodology. The 6.7% threshold that is applied to filter topics may in fact remove all topics in the unlikely scenario that the MP in question talks about many topics in roughly equal proportion. MPs with diverse interests may not be identified as fitting into any particular community.
A striking difference between the MP networks of the first two parliaments (47th and 48th) and the last two (49th and 50th) is the party composition of the visible communities. In the first two (Labour government, 2002-2008) the communities are quite diverse in party terms, while in the last two (National government, 2008-2014) there is a close-knit community made up of National Party MPs. This is corroborated by the composition of the three largest communities for every term in our analysis, shown in Fig. 8. In the first two terms, we note heterogeneous, smaller core communities and the absence of a community that is much larger than the others. That changes for the last two terms, especially the last one, where we see one community much larger than the others emerging, dominated by MPs of the National Party. Another point worth noting is the shrinking presence of minor parties in the largest communities over time, ending with no presence whatsoever in the largest community of the 50th term network.
Also supporting this idea of a close-knit community made up of National Party MPs is Fig. 10(b), which compares homophily between MPs (based on party affiliation) for the empirical data and for configuration model networks. The latter is a random model in which we keep the degree sequence of the empirical network and rewire the links. It shows that if connections were purely random, the expected homophily would be fairly constant and the network would be slightly disassortative. In contrast, the homophily of the empirical networks increases over time, meaning that MPs preferentially share interests with other members of their own party, particularly within the National Party.
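The comparison against a degree-preserving null model can be sketched as below. Several assumptions are made loudly here: the homophily measure used in this sketch is simply the fraction of edges joining two MPs of the same party, which is a stand-in and not necessarily the net homophily measure plotted in Fig. 10(b); the null model is realized by double-edge swaps that forbid self-loops and duplicate edges, one common variant of configuration-model rewiring; edge weights are ignored; and all function names and the toy network are hypothetical.

```python
import random


def same_party_fraction(edges, party):
    """Fraction of edges joining two MPs of the same party (stand-in measure)."""
    same = sum(1 for a, b in edges if party[a] == party[b])
    return same / len(edges)


def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization by repeated double-edge swaps."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible re-pairings
        # Proposed swap: (a, b), (c, d) -> (a, d), (c, b).
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def null_same_party_fraction(edges, party, n_runs=100, seed=0):
    """Mean same-party fraction over `n_runs` rewired copies of the network."""
    rng = random.Random(seed)
    values = [
        same_party_fraction(rewire(edges, 10 * len(edges), rng), party)
        for _ in range(n_runs)
    ]
    return sum(values) / n_runs


# Toy network: two party triangles joined by a single cross-party edge.
party = {"a": "N", "b": "N", "c": "N", "d": "L", "e": "L", "f": "L"}
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
observed = same_party_fraction(edges, party)
expected = null_same_party_fraction(edges, party)
print(round(observed, 3), round(expected, 3))
```

An observed value well above the null average indicates that same-party ties are more frequent than the degree sequence alone would produce.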
The degree distributions in Fig. 9 also tell an interesting story. From Fig. 9(a) we see that topics are attracting the interest of more MPs over time. Associated with this is Fig. 9(b), which shows the distribution of the number of topics that MPs spend the most time on during each government. Most MPs speak about 2-3 topics in large proportions over these periods; however, for parliaments 47-49 we see a trend towards larger repertoires.
From Fig. 9(c), we can see that the Labour government in the 47th and 48th parliaments displays a fairly sparse network, where most MPs share interests with fewer than twenty other MPs. The 49th and 50th parliaments appear to show the formation of a compact National Party group (visible in Fig. 7) speaking about the same topics as many others within this group, such that members of this group in the parliament are connected to at least 45 other MPs, most of whom are also members of this group.
Topic models provide a way to parse human speech and extract themes from large bodies of text that are often difficult and time consuming to analyze manually. In few cases is it more important to gather and process this information than in the speeches of those people who control the legislative and political direction of a country. Topic modeling is unlikely to replace traditional media analysis of political speech; however, we have shown here that it is a useful tool for examining larger themes and trends in political discourse. We were able to use topic modeling to track changes in the content of parliamentary speeches across time, and to identify features in these time series that correspond to particular issues or events.
In the time period examined, a number of large events influenced political discourse in New Zealand, such as the 2011 Christchurch earthquake, the global economic crisis and changes to local government with the creation of the Auckland ‘Super City’ via the amalgamation of numerous smaller councils (see Fig. A.1) as well as more recently, the housing crisis. Parliamentary discussions around all of these topics were identified, alongside more conventional themes such as the economy, the budget, and social welfare.
Breaking down these topics by time and party shows the different emphases parties are putting on topics. For example, we can see that much of the discussion around the developing housing crisis has been pushed by Labour (see Fig. 5) and, to a lesser extent, the Green Party, while the governing National Party showed little additional interest. Conversely, around the time of the economic downturn the National Party spent more of its time talking about economics. In other cases, such as the discourse surrounding the Christchurch earthquake and the governance of the Canterbury region (see Fig. 5), the increase in discussion was driven by all parties. Unsurprisingly, the party that spent the most time discussing the environment was the Green Party.
Some events are sufficiently large that it would be difficult for a political party to ignore them, such as the Christchurch earthquake which influences the Canterbury topic. Exogenous events such as these force politicians to comment, driving conversation across party lines. Topics where there are major differences in trends suggest endogenous drivers, where discussion is the result of conscious decisions by political parties. The National Party’s apparent indifference to the increasingly vocal opposition parties’ discussion around housing over the period examined would suggest that National were consciously choosing not to engage with this topic, while Labour and Green were consciously choosing to engage.
A mixture of mechanisms could also drive changes in the level of discussion. The increase in the discussion of the economy by National appears to occur at the same time as the global financial crisis. It also coincides with the National Party taking power. The change in discussion could be attributed to the crisis, or it could be that the governing party simply talks proportionally more about the economy when they first enter office. Continuing this analysis past another change of government sometime in the future would allow us to identify the drivers of this type of pattern.
From a basic analysis of the topic proportions we were able to identify an increase in the number of topics under discussion per MP, alongside an increase in the total number of words spoken per year. We can also observe that parties in opposition tend to talk more than parties in power. In the 47th and 48th parliaments, the National Party has a greater total word count. In the 49th and 50th, after the National Party takes power, the Labour Party becomes the most vocal (see Fig. 4).
Topic models also produce a natural bipartite network that can be decomposed into its projections and analyzed using standard network techniques. Without the use of sophisticated or computationally expensive methodologies we have shown that the networks resulting from a topic model can display useful information such as community structure and interpretable degree distributions.
Much of the popular political analysis since the current National government took power in 2008 (the 49th parliament) has noted the factionalism within the major opposition party, Labour. The community structure we have inferred (Fig. 8) supports this more traditional analysis, with Labour MPs speaking more, and on disparate topics, while National MPs largely kept to a smaller number of topics. Extracting and examining the top three communities for each parliament, we can see the National Party coming to dominate discussion within the core communities in the 49th and 50th parliaments. We also see a gradual decline in the participation of smaller parties in the largest communities identified; in particular, they have no presence in the largest community in the 50th parliament.
Fig. 9 shows the gradual increase in the number of topics discussed by MPs over time (see Fig. 9(b)), indicating decreasing topic specialization by individual MPs in their parliamentary speeches. This has resulted in MPs becoming more highly connected over time (see Fig. 9(c)). At the same time, the average degree of National MPs has increased significantly (Fig. 10(a)). When considered in light of the communities shown in Fig. 8, this analysis suggests a widening of political discourse in New Zealand, with opposition parties talking about a greater number of topics, and the development of tight-knit communities mostly consisting of government MPs talking about a smaller range of topics.
The authors would like to thank Te Pūnaha Matatini for funding this project.
The four authors designed the study, contributed to the analysis of the results and to the writing of the manuscript.
- 1. Titov I, McDonald R. Modeling online reviews with multi-grain topic models. In Proceedings of the 17th international conference on World Wide Web 2008 Apr 21 (pp. 111-120). ACM.
- 2. Ramage D, Dumais ST, Liebling DJ. Characterizing microblogs with topic models. ICWSM. 2010 May 23;10:1-.
- 3. Zhao WX, Jiang J, Weng J, He J, Lim EP, Yan H, Li X. Comparing twitter and traditional media using topic models. In European Conference on Information Retrieval 2011 Apr 18 (pp. 338-349). Springer, Berlin, Heidelberg.
- 4. Grimmer J. A bayesian hierarchical topic model for political texts: Measuring expressed agendas in senate press releases. Political Analysis. 2009 Dec 3;18(1):1-35.
- 5. Blei DM. Probabilistic topic models. Communications of the ACM. 2012 Apr 1;55(4):77-84.
- 6. Laver M, Benoit K, Garry J. Extracting policy positions from political texts using words as data. American Political Science Review. 2003 May;97(2):311-31.
- 7. Slapin JB, Proksch SO. A scaling model for estimating time series party positions from texts. American Journal of Political Science. 2008 Jul 1;52(3):705-22.
- 8. Laver M, Garry J. Estimating policy positions from political texts. American Journal of Political Science. 2000 Jul 1:619-34.
- 9. Guillaume JL, Latapy M. Bipartite graphs as models of complex networks. Physica A: Statistical Mechanics and its Applications. 2006 Nov 15;371(2):795-813.
- 10. Koskinen J, Edling C. Modelling the evolution of a bipartite network — Peer referral in interlocking directorates. Social Networks. 2012 Jul 31;34(3):309-22.
- 11. Wasserman S, Faust K. Social network analysis: Methods and applications. Cambridge university press; 1994 Nov 25.
- 12. Breiger RL. The duality of persons and groups. Social forces. 1974 Dec 1;53(2):181-90.
- 13. Newman ME. Mixing patterns in networks. Physical Review E. 2003 Feb 27;67(2):026126.
- 14. McPherson M, Smith-Lovin L, Cook JM. Birds of a feather: Homophily in social networks. Annual review of sociology. 2001 Aug;27(1):415-44.
- 15. Girvan M, Newman ME. Community structure in social and biological networks. Proceedings of the national academy of sciences. 2002 Jun 11;99(12):7821-6.
- 16. Newman ME. The structure and function of complex networks. SIAM review. 2003;45(2):167-256.
- 17. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment. 2008 Oct 9;2008(10):P10008.
- 18. Fruchterman TM, Reingold EM. Graph drawing by force directed placement. Software: Practice and experience. 1991 Nov 1;21(11):1129-64.
- 19. New Zealand Parliamentary Debates; 2002–2016.
- 20. Campbell JC, Hindle A, Stroulia E. Latent Dirichlet allocation: extracting topics from software engineering data. The art and science of analyzing software data. 2014 Jul 16;1.
- 21. Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. Journal of machine Learning research. 2003;3(Jan):993-1022.
- 22. McCallum AK. Mallet: A machine learning for language toolkit.
- 23. Griffiths T. Gibbs sampling in the generative model of latent dirichlet allocation.
- 24. Zhou T, Ren J, Medo M, Zhang YC. Bipartite network projection and personal recommendation. Physical Review E. 2007 Oct 25;76(4):046115.
Appendix A Supporting Information
Codes and names of Members of the Parliament
Topics identified and keywords