Mapping Out Narrative Structures and Dynamics Using Networks and Textual Information

03/24/2016 ∙ by Semi Min, et al. ∙ 0

Human communication is often executed in the form of a narrative, an account of connected events composed of characters, actions, and settings. A coherent narrative structure is therefore a requisite for a well-formulated narrative -- be it fictional or nonfictional -- for informative and effective communication, opening up the possibility of a deeper understanding of a narrative by studying its structural properties. In this paper we present a network-based framework for modeling and analyzing the structure of a narrative, which is further expanded by incorporating methods from computational linguistics to utilize the narrative text. Modeling a narrative as a dynamically unfolding system, we characterize its progression via the growth patterns of the character network, and use sentiment analysis and topic modeling to represent the actual content of the narrative in the form of interaction maps between characters with associated sentiment values and keywords. This network framework goes beyond the simple occurrence-based one most often used until now, allowing one to utilize the unique characteristics of a given narrative to a high degree. Given the ubiquity and importance of narratives, such an advanced network-based representation and analysis framework may lead to a more systematic modeling and understanding of narratives for social interactions, expression of human sentiments, and communication.




I Introduction

Recent advances in quantitative methodologies for the modeling and analysis of large-scale heterogeneous data have enabled a novel understanding of various complex systems from the social, technological, and biological domains Michel et al. (2011). The field of application is also rapidly expanding, and now includes the traditional academic fields of cultural studies and the humanities. This is allowing researchers to obtain novel answers to long-standing problems by finding complex patterns that were previously hidden. Recent examples include high-throughput analyses of language and literature based on the massive digitization of books (e.g., Project Gutenberg and Google Books) and the proliferation of social media Michel et al. (2011); Dodds et al. (2015), emergent processes in cultural history Schich et al. (2014), and the scientific analysis of art Kim et al. (2014).

A theoretical data modeling and analysis framework that has attracted attention for cultural studies is networks Schich et al. (2014); Park et al. (2015). Network science attempts to understand the structure and behavior of a complex system from the connection and interaction patterns between its components Newman (2010); Albert and Barabási (2002); Easley and Kleinberg (2010); Han et al. (2011). Owing to its flexibility as a modeling framework, network science has led to a novel understanding of many systems, not only those easily recognizable as networks, such as the World Wide Web Adamic and Huberman (2000); Albert et al. (1999) and the Internet Choi et al. (2006), but also those that have been extensively studied in non-network contexts, such as biological systems or social organizations Borgatti and Foster (2003); Grimm et al. (2005).

In this paper we propose a network science-based framework for a cultural system that is ubiquitous in society and boasts a long history of study, but that we believe can still benefit from one: narratives. Narratives (or stories) are important in that they are the most common way in which we communicate and recount our experiences. The connection between networks and narratives can also be seen in the very definition of the word: The New Oxford American Dictionary, for instance, defines narrative as “a spoken or written account of connected events.” This suggests that using networks may help us understand how the various building blocks of narratives are woven into a coherent structure for effective delivery of messages and arousal of emotions. This way of thinking about narratives is closely related to an interesting recent movement in literary studies named “distant reading,” proposed by Moretti (2011, 2013). Distant reading is an approach to literature based on processing large amounts of literary data to devise and construct general “models” of narratives to understand them as a class, in contrast to reading each work very closely (hence the term “distant”) to understand it. A model constructed through reduction and abstraction, the reasoning goes, would enable us to grasp the general underlying structures and patterns of a class of complex objects called narrative, as an X-ray machine would allow us to understand the general skeletal features of the human body.

To many of us this way of thinking is familiar as the very principle of research in the natural sciences: To understand a system, one collects data and performs statistical analysis based on abstract models to gain an understanding of the general characteristics of the system. A model of a system has the following characteristics. As an abstract representation or notion of a system, a model necessarily excludes some features of the system it is representing. A random exclusion of features, of course, is unlikely to result in a useful model; it is important to make a judicious choice of which features to retain and which to exclude so that the model incorporates the important or essential features of the system. Of course, it is very difficult to know beforehand which choice of features is best. One practical starting point can be a common description of the system by people, since such a description is already a type of mental representation which can be viewed as a model, however rudimentary. The network model of a narrative, from this perspective, appears to be sensible and immediately understandable; in many instances when we recount a story, we focus prominently on the characters and their relationships. Take, for instance, the Star Wars movie franchise, the top-grossing space opera in modern times. Once the generic physical setting of “a galaxy far, far away” is presented, the story progresses via the characters’ actions, adventures, and relationships and interactions with others; that Leia Organa and Luke Skywalker are twins plays a crucial role in their fate, and the revelation via “I am your father” is perhaps the most memorable scene in the narrative. In addition to these individual relationships, group-level relationships are important for the story as well, such as the Empire versus the Rebel Alliance, the dark side versus the light side of the Force, etc. Examples abound from history: the story of Oedipus, which precedes Star Wars in terms of shocking familial revelation; and Dexter, a favorite American TV show of one of the authors, a series of episodes that portray the titular character navigating his social world of his sibling, family, and rival criminals. (Footnote 1: Since this is a mature-rated show, the titular character’s identity and the overarching plot of the drama may be too discomforting and cruel for some readers to describe here; we refer the interested reader to appropriate sources for more information.)

These examples all function as empirical bases for approaching narratives from the network modeling framework. One of the earliest models proposed for narratives in the distant reading philosophy introduced above, in fact, was networks. Moretti applied the network framework to Shakespeare’s Hamlet for detecting specific regions in the plot, and performed many experiments such as extracting specific nodes in the network of characters to observe changes and make comparisons between different networks. Other network-based studies of narrative include the study of the community structure of the character network in Victor Hugo’s Les Misérables Newman and Girvan (2004), the social networks of characters based on conversation in 19th-century British novels Elson et al. (2010), networks of mythologies and sagas Carron and Kenna (2012); Mac Carron and Kenna (2013); Kydros et al. (2015), and more recently, a technique for dialog detection in novels applied to J. K. Rowling’s Harry Potter series Waumans et al. (2015). While these serve to demonstrate the scientific community’s interest in a network-based understanding of narratives, most works are limited to the study of the static topological properties of the networks found in the said stories, even though narratives are essentially dynamically progressing entities, and the text itself is a source of much information that can be used extensively. Given the wide range of analytical and computational tools that constitute network science, we believe there is much opportunity for further studies of narratives using networks that take into account such essential aspects. This paper is intended to be one such attempt inspired by those works, hopefully laying out potential future directions in utilizing network science and computational linguistics for understanding the dynamics of narratives in a systematic manner.

II Materials and Methods

II.1 Material: Victor Hugo’s Les Misérables

We analyze Victor Hugo’s novel Les Misérables using the methods introduced in this paper. Set around the popular uprising in Paris in 1832 CE, Les Misérables is known for its vivid depiction of the conditions of the tumultuous times and its insight into the human psyche via multiple intersecting plots involving richly developed characters Welsh (1978). Its main plot follows the fugitive Jean Valjean’s trajectory, showing him transform into a force for good while being constantly haunted by his criminal past. During his journey he interacts with many characters, some helpful and friendly, and others antagonistic and hostile. The most important characters of the novel include the following:

  • Fantine: A young woman abandoned with her daughter Cosette early in the novel. She later leaves Cosette in the care of the Thenadiers, who then abuse the child. Fantine is rescued by Valjean when Javert arrests her on a charge of assaulting a man.

  • Cosette: Fantine’s daughter, later adopted by Valjean. Under Valjean’s care she grows into a beautiful woman, and falls in love with Marius.

  • Marius: A young man associated with the “Friends of the ABC (Les Amis de l’ABC in French),” a group of revolutionaries. He is critically wounded at the barricade, but is rescued by Valjean. He later marries Cosette.

  • Javert: A police inspector in a relentless pursuit of Valjean. After being rescued by Valjean at the barricades and realizing the immorality of the old French system he has served loyally, he commits suicide.

  • Thenadier: A wretched man who abuses young Cosette. A lifetime schemer of robbery, fraud, and murder, he conspires to rob Valjean until Marius stops him, and gets arrested by Javert.

II.2 Method: Interacting Timelines and Network Construction

The widely-accepted essential building blocks of a narrative are characters (also called agents or actants), events, and the causal or temporal relationships that weave them together Rimmon-Kenan (2003); Bal and Boheemen (2009). An interrelated sequence composed of those elements is called a plot which may be viewed as the backbone of a narrative. A narrative may also be broken down into formal units such as acts, scenes, chapters, etc. Abbott (2008). Historically there have been many attempts to establish a general form of narrative structure, of which a well-known example is Aristotle’s three-act plot structure theory. It states that Act One presents the central theme and questions, followed by Acts Two and Three that present major turning points and conclusion. Variant forms exist such as the four-act structure theory Field (2007); Vogler (2007).

While these have existed for a long time and been widely applied, we find it difficult to imagine that there is an a priori reason for all narratives to consist of three or four parts. Then, can we deduce the structure of a narrative from the narrative itself? It appears that the increasing availability of narrative texts in digital format and analytical methods for data analysis offer an opportunity for a new look at narrative structures, and the formulation of a flexible framework that can properly capture the complexity of a given narrative.

An interesting pair of concepts that is helpful for picturing the content of a narrative, and that serves as the basis for formalism, was given by Propp (2010) who, while trying to establish a symbolic notation-based formalism for Russian folktales, proposed that narrative content consists of two layers that he labeled the fabula and the sjuzet. The fabula refers to the entire world that contains the narrative, while the sjuzet refers to those elements of that world explicitly presented to the audience. For instance, if the narrative is depicting a man dining with his family in his home, the sjuzet comprises the man and his family (the characters), the act of dining (the event), and his home (the place), while the fabula is all of the above plus the rest of the story world such as the man’s colleagues at work, their concurrent actions and whereabouts, etc. The sjuzet therefore can be considered the part of the story world currently under observation, and the rest of the fabula the part that “operates” in the background. Each component of the fabula may or may not become sjuzet (explicitly presented to the audience) at another point in the narrative, but all of them are nevertheless indispensable for the consistency of the story world and future plot development via implicit action.

We start by representing a narrative as a set of character timelines, each being the record of a character’s appearances in the narrative. The point of appearance is marked in narrative units, which can be scenes, chapters, etc., as shown in Fig. 1. In our paper we follow the convention used in the construction of the character network from Victor Hugo’s Les Misérables in Ref. Newman and Girvan (2004): two characters are connected in the network when they appear in the same narrative unit. An interaction defined in this fashion is more general than a direct conversation, as it can include a common experience or shared space in addition to a conversation. The narrative format can also present some practical issues in defining an interaction. In a play or movie script, for instance, it would be much easier to identify a conversation between characters as an interaction, which would be more explicit and narrower in scope. An online resource named moviegalaxies provides a collection of social networks of characters in hundreds of movies built in this way, along with static network properties such as the diameters and clustering coefficients. Using this narrower definition in a novel is potentially problematic: it is difficult to detect conversations in a novel (though some advances have been made recently Waumans et al. (2015)), and more fundamentally it would miss the non-verbal interactions that exist abundantly in a novel. For this reason, it is difficult to state at this point which would be the better approach. Perhaps a comparison study could be illuminating, although it is outside the scope of this work. The rest of this paper is dedicated to exploring what the character network based on Fig. 1 can tell us about the narrative structure and how it progresses. The methodology will be demonstrated using the English translation of Victor Hugo’s Les Misérables, although it will be clear that the formalism itself is applicable to any comparable narrative.
Our choice of Hugo’s work is based on its stature as a classic known for a set of richly developed characters Welsh (1978), familiarity in network science Newman and Girvan (2004), and the free availability of the complete text on Project Gutenberg. Using the original French version would be ideal, but we point out that the network construction according to Fig. 1 is unaffected, and the wider availability of advanced computational linguistic tools for the English language does provide advantages for incorporating the text for enriched analyses that will be demonstrated in the latter part of the work.
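The co-appearance rule described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' code: the function name `build_character_network` and the toy timelines are hypothetical, and the edge weight (the number of shared narrative units) is an added convenience that the unweighted network simply ignores.

```python
from collections import defaultdict
from itertools import combinations


def build_character_network(timelines):
    """Build a co-appearance network from character timelines.

    timelines: dict mapping a character name to the set of narrative
    units (e.g. chapter numbers) in which the character appears.
    Returns a dict mapping a sorted character pair to the number of
    narrative units the two characters share.
    """
    # Invert the timelines: narrative unit -> characters appearing in it.
    cast = defaultdict(set)
    for character, units in timelines.items():
        for unit in units:
            cast[unit].add(character)

    # Every pair sharing a unit gets an edge; the weight counts shared units.
    edges = defaultdict(int)
    for characters in cast.values():
        for a, b in combinations(sorted(characters), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical toy timelines (not taken from the novel).
timelines = {
    "Valjean": {1, 2, 3, 5},
    "Cosette": {3, 4, 5},
    "Javert": {2, 5},
    "Marius": {4},
}
edges = build_character_network(timelines)
```

Switching the narrative unit from chapters to books or scenes only changes the keys stored in the timelines; the construction itself stays the same.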

Figure 1: Interacting Timeline Framework for Network Modeling of Narratives. (A) Construction of the character network from a narrative. We represent the narrative as a set of character timelines, the record of appearances of the characters in narrative units (e.g. chapters, scenes, etc.). An interaction can be defined as co-appearance in a narrative unit. (B) A narrative unit is not unique. One may use the author’s designation (i.e. the Volumes, Books, or Chapters in a novel) or define a new one such as the Sequence based on the unit-to-unit continuity of character compositions (defined in Sec. III.0.2). The narrative units in Victor Hugo’s Les Misérables are shown here, from the finest (Chapters, top) to the coarsest (Volumes, bottom).

In Fig. 1 (B) we show the narrative units in Les Misérables on several levels. From top to bottom, they are the Chapters (colored according to their Sentiment Polarity Index defined in Sec. II.3.1), Books (groups of Chapters), Sequences (groups of Books), and Volumes (even larger groups of Books). All but the Sequences, whose definition is given later in Sec. III.0.2, are by the author’s designation. It is reasonable to assume that the author intended each unit to represent a theme or subplot. The five Volumes of Les Misérables, for instance, are titled “Fantine,” “Cosette,” “Marius,” “The Idyll in the Rue Plumet and the Epic in the Rue St. Denis,” and “Jean Valjean,” indicating their central character or plot. Since the different narrative units offer varying degrees of resolution of the narrative, one may again choose the one that is most useful for one’s purposes. For our goal of studying the complexity of Les Misérables, however, the five Volumes appear too few; we therefore choose to work with the Chapters for most purposes, and the Sequences in a later analysis.

II.2.1 Network Topology and Growth Patterns

From the network of characters built based on the Interacting Timelines of Fig. 1 (A) we can measure various static network properties. But a narrative is essentially a dynamical system that unfolds in time; what interests a reader is how the story is told in time, not necessarily the final, static network of characters. We therefore need to study how the network grows over time and what we can learn about the narrative from it. The network growth is essentially coupled to the narrative flow: Starting from an empty network at the beginning of the narrative, the network grows as new characters are introduced and interact with others. In this sense, we can say that the temporal growth of the network is intimately connected to the concept of the so-called narrative stages. A common classification of narrative stages includes Exposition, Rising Action, Climax, Falling Action, Resolution, etc., named according to their role and nature Freytag (1896). For example, the Exposition stage introduces the characters and the space they inhabit. Once the motives and allegiances of the characters are presented, in the Rising Action the characters begin to struggle against each other until all conflicts are resolved through the later stages.

We study the network growth pattern on two levels. First, on the aggregate level, we measure the growth of the number of nodes and the number of edges of the network. Second, on the individual character level, we measure two values for each character: appearance (the number of chapters in which the character appears) and degree (the number of the character’s neighbors in the network).
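Both levels of measurement can be sketched as follows. This is a minimal illustration under stated assumptions: the input is a list of chapter casts in narrative order, the helper names `network_growth` and `appearance_and_degree` are hypothetical, and the five-chapter toy example is invented for demonstration rather than taken from the novel.

```python
from itertools import combinations


def network_growth(chapter_casts):
    """Return (number of nodes, number of edges) after each chapter."""
    nodes, edges, growth = set(), set(), []
    for cast in chapter_casts:
        nodes |= cast
        # Co-appearance in a chapter creates an (unweighted) edge.
        edges |= {frozenset(pair) for pair in combinations(cast, 2)}
        growth.append((len(nodes), len(edges)))
    return growth


def appearance_and_degree(chapter_casts):
    """Return per-character appearance counts and unweighted degrees."""
    appearance, neighbors = {}, {}
    for cast in chapter_casts:
        for character in cast:
            appearance[character] = appearance.get(character, 0) + 1
            neighbors.setdefault(character, set()).update(cast - {character})
    degree = {c: len(n) for c, n in neighbors.items()}
    return appearance, degree


# Hypothetical casts for five consecutive chapters (illustration only).
chapter_casts = [
    {"Valjean", "Myriel"},
    {"Valjean"},
    {"Fantine", "Cosette"},
    {"Valjean", "Fantine", "Javert"},
    {"Valjean", "Cosette", "Javert"},
]
growth = network_growth(chapter_casts)
appearance, degree = appearance_and_degree(chapter_casts)
```

In this toy example the last chapter adds edges but no new nodes, the kind of pattern one would look for when existing characters converge on a common scene.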

II.3 Method: Sentiment Analysis and Topic Modeling

Figure 2: Sentiment Analysis and Topic Modeling of Narratives. (A) The principle of sentiment analysis. Words associated with positive or negative sentiments contribute towards the Sentiment Polarity Index (SPI) of the text, which ranges from most negative to most positive. (B) The principle of topic modeling. Clusters of words detected from a set of texts that tend to appear together are identified as the topics. (C) SPIs of the chapters of Les Misérables. Vertical gray bars indicate the 21 Sequences of Les Misérables. Each Sequence is colored according to the sign of the mean SPI of its constituent chapters (blue for positive, and red for negative). We compare the SPI and content for eight chapters in the narrative: Positive chapters depict uplifting characters or events (e.g., the introduction of Myriel, a man of great character, in Chapter 12) and happy events (e.g., Fantine going on a picnic, Cosette and Marius falling in love, etc.), while negative chapters depict pain and suffering (e.g., Valjean nearly drowning, Fantine in misery, war, lovers parting, etc.).

An analysis focused solely on the network topology leaves out an essential component of a narrative, the text. This is important because a narrative is in essence much more than a record of who-meets-whom; in the form of text, a narrative contains the details that can vary significantly between interactions Vogler (2007); Propp (2010). In Les Misérables, for instance, the nature of Valjean’s relationships to different characters that is at the center of its drama (he is at the same time a savior and protector to Cosette, and a fugitive criminal to Javert) is wholly missing in the simple appearance-based network. This means that leveraging the actual text of the narrative may lead to a richer and more proper understanding of the narrative, which we pursue by using tools developed in computational linguistics. Here we utilize two. The first tool is Sentiment Analysis, which identifies the positive and negative sentimental qualities of a text and allows us to study the sentimental states of character relationships and the build-up and resolution of tension in the narrative. The second tool is Topic Modeling, which identifies the topics inside the novel and allows us to associate the characters with the topics at different points in the narrative that define the characters’ states, and to quantify the impact of events on the characters.

II.3.1 Sentiment Analysis

Sentiment Analysis, also called Mood Analysis or Opinion Mining, is a technique for determining the sentimental qualities of a given text based on the words it contains. Its origin can be traced back to an attempt in the 1990s to translate written reviews of products into numerical rating scores. To this day it is common to produce a numerical Sentiment Polarity Index (SPI) of a given text that shows its positive or negative quality. Basically, it counts the words of known positive or negative sentimental states in a text to produce the SPI. For instance, words such as “admire,” “happy,” and “love” contribute to the text’s positive sentiment, whereas “hate,” “pain,” and “sad” contribute to its negative sentiment (see Fig. 2 (A)). We note an interesting connection to the Western literary tradition of the generic division of drama into comedy and tragedy, often stylized using two masks: the laughing one that represents Thalia, the Muse of comedy in Greek and Roman mythology, and the weeping one that represents Melpomene, the Muse of tragedy, also shown in Fig. 2 (A). Here we use the LIWC (Linguistic Inquiry and Word Count) program Tausczik and Pennebaker (2010), one of several available Gonçalves et al. (2013), to determine the SPI of the chapters of Les Misérables. LIWC actually returns two separate values for the input text, one for the positive and one for the negative sentiment, which we combine, for convenience, into a single SPI variable.


The SPI is defined so that it is positive when the text is net positive, zero when neutral, and negative when net negative. We now have one SPI value for each chapter of Les Misérables.

From the chapter SPIs (written $s_c$ for chapter $c$) we can compute the SPIs of the characters and character pairs using the timeline framework in Fig. 1 (A). If a character $i$, for instance, has appeared in Chapters $a$, $b$, and $c$, we define $\{s_a, s_b, s_c\}$ to be the SPI set of $i$, from which we can calculate quantities such as the character’s average SPI, $(s_a + s_b + s_c)/3$. The SPI of a character pair is similar: if two characters $i$ and $j$ have co-appeared in Chapters $a$ and $b$, for instance, their average SPI is $(s_a + s_b)/2$.
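The averaging over timelines described above can be sketched as follows, assuming the chapter SPIs have already been computed by a sentiment tool. The helper names `character_spi` and `pair_spi` and all numerical values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def character_spi(chapter_spi, timelines):
    """Average SPI over the chapters in which each character appears."""
    return {
        character: mean(chapter_spi[c] for c in chapters)
        for character, chapters in timelines.items()
        if chapters
    }


def pair_spi(chapter_spi, timelines, a, b):
    """Average SPI over the chapters in which a and b co-appear.

    Returns None when the two characters never share a chapter.
    """
    shared = timelines[a] & timelines[b]
    return mean(chapter_spi[c] for c in shared) if shared else None


# Hypothetical chapter SPIs and timelines (illustration only).
chapter_spi = {1: 0.5, 2: -0.25, 3: 0.75, 4: 0.0}
timelines = {
    "Valjean": {1, 2, 3},
    "Cosette": {1, 3},
    "Javert": {2, 4},
}
avg = character_spi(chapter_spi, timelines)
valjean_cosette = pair_spi(chapter_spi, timelines, "Valjean", "Cosette")
valjean_javert = pair_spi(chapter_spi, timelines, "Valjean", "Javert")
```

Keeping the full SPI set per character, rather than only the mean, would also allow the range of sentiments a character experiences to be compared.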

II.3.2 Topic Modeling

Our second example of incorporating textual information for network-based narrative study is topic modeling. We will see that it allows us to determine the “topical state” of a character at any given point in the narrative and map out a detailed picture of the interaction between characters. Topic modeling is a method for extracting clusters of correlated keywords from a set of documents that can be identified as separate “topics” of the texts. The basic idea is presented in Fig. 2 (B) through a tripartite network composed of three layers of nodes: documents, words, and topics. The goal is to find the topics, essentially “bags of words” that often appear together in documents, from the text data consisting of the documents and the words. Many studies have reported the success of topic modeling in identifying word sets that match the human understanding of groups of texts, and its practical applicability to problems like word sense induction Jurgens and Stevens (2010); Van de Cruys and Apidianaki (2011); Stevens et al. (2012).

Here we employ Non-Negative Matrix Factorization (NNMF), famously used for identifying distinguishable parts in images Lee and Seung (1999, 2001); Xu et al. (2003); Zhao and Karypis (2004). It decomposes the word-document TF-IDF (Term Frequency-Inverse Document Frequency) matrix $X$ into $X = WH$, the product of two matrices $W$ and $H$ such that $W \ge 0$ and $H \ge 0$. The number of topics $k$ is an input parameter typically set to be smaller than the number of words and the number of documents. (Footnote 2: The decomposition is also approximate in practice, i.e. $X \approx WH$, with the difference between $X$ and $WH$ (called the reconstruction error) measured by the squared Frobenius norm.) We can then interpret the matrix element $W_{wt}$ as the association strength between word $w$ and topic $t$, and $H_{tc}$ as that between topic $t$ and chapter $c$. Using the framework of Fig. 1 (A) we can again define the character-topic association strength $A_{it}$ between character $i$ and topic $t$ as follows:

$$A_{it} \propto \sum_{c \in C_i} H_{tc},$$

where $C_i$ is the set of chapters in which character $i$ appears. It is the normalized sum of all topic-chapter associations from the chapters featuring character $i$. We used the scikit-learn Python machine learning package to perform NNMF.
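To make the factorization and the character-topic aggregation concrete, the following is a dependency-free sketch. It is not the scikit-learn implementation used in the paper: it substitutes the multiplicative update rules of Lee and Seung for the squared Frobenius error, and it assumes the input is already a non-negative word-by-chapter matrix (here a hypothetical toy matrix standing in for TF-IDF values). The names `nmf`, `frobenius_error`, and `character_topic`, the toy data, and the choice to normalize each character's associations so that they sum to one are all assumptions made for illustration.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate X (words x chapters) by W (words x k) times H (k x chapters)."""
    rng = random.Random(seed)
    n_words, n_chapters = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n_words)]
    H = [[rng.random() + eps for _ in range(n_chapters)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[t][c] * num[t][c] / (den[t][c] + eps)
              for c in range(n_chapters)] for t in range(k)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[w][t] * num[w][t] / (den[w][t] + eps)
              for t in range(k)] for w in range(n_words)]
    return W, H


def frobenius_error(X, W, H):
    """Squared Frobenius norm of X - WH (the reconstruction error)."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def character_topic(H, chapters_of):
    """Sum topic-chapter strengths over a character's chapters.

    Assumption: each character's vector is normalized to sum to one.
    """
    result = {}
    for character, chapters in chapters_of.items():
        sums = [sum(H[t][c] for c in chapters) for t in range(len(H))]
        total = sum(sums)
        result[character] = [s / total if total else 0.0 for s in sums]
    return result


# Hypothetical 4-word x 4-chapter matrix with two obvious word clusters.
X = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
W, H = nmf(X, k=2)
error = frobenius_error(X, W, H)
# Hypothetical characters indexed by the chapters (columns) they appear in.
association = character_topic(H, {"A": [0, 1], "B": [2, 3]})
```

For real text one would build the TF-IDF matrix and run the factorization with a library such as scikit-learn; note that scikit-learn arranges its input as documents by terms, the transpose of the word-by-chapter layout used here.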

III Results

III.0.1 Network Topology and Narrative Structure

This section is a summary of our previous work Min and Park (2016). We start by constructing the network of characters based on Fig. 1. The final network of Les Misérables contains 63 characters after very minor ones are excluded, with an edge drawn between two characters if they have appeared in a chapter together. The network is shown in Fig. 3. We characterize it by the fraction of character pairs that are connected, the mean geodesic length, the network diameter (attained between the pair of Babet and Geborand, and 17 other pairs of relatively minor characters), and the clustering coefficient. (Footnote 3: Although our network appears denser than typical social networks Wasserman and Faust (1994); Marsden (1990), this is likely due to the fact that most characters of the novel are involved in some common plot while the rest of the story world is pushed into the background.)

Figure 3: The Character Network of Les Misérables and Its Community Structure. The character network of Les Misérables. The node radius is proportional to its degree (the number of its neighbors). The network shows many common characteristics of a social network such as the small-world property and the community structure. The node color indicates the community to which it belongs (we identify seven, labeled I to VII), while the edge color indicates the sign of the cosentiments of the character pair (blue for positive, and red for negative), defined and discussed further in Sec. III.0.2.

Based on the distinction between different stages in a narrative, we can expect that the number of nodes and the number of edges would not simply increase linearly in time, but nonlinearly in accordance with the nature of the stages. In Fig. 4 we show the growth of both quantities along the narrative time measured in chapters. As expected, the growth is not linear, especially for the number of nodes. After the first batch of characters is introduced at the beginning of the narrative, there are specific points in the narrative where many new characters are introduced simultaneously (noted S1, S2, and S3 in Fig. 4), which suggests that they are Exposition stages. An inspection of the actual story confirms this:

  • Stage S1: Fantine’s friends are introduced as her happy days are depicted.

  • Stage S2: Valjean’s former fellow prison inmates testify during the trial of the fake Valjean.

  • Stage S3: “The Friends of ABC” (young progressives) are introduced, shown debating various social issues of the day.

Figure 4: Growth Patterns of the Character Networks. The number of characters and the number of edges in Les Misérables grow in a nonlinear fashion, indicating that different stages in narratives contribute differently to the network growth via character introduction or formation of new connections.

There is also a stretch of chapters (S4) where the network shows little growth. This part largely coincides with Volume 2 (“Cosette”) of the novel, composed of chapters that either contain no narrative progression (i.e. the author digresses to discuss the battle of Waterloo, religion, the vagrant children of Paris, etc.) or show no network growth, being mainly about Valjean and Cosette’s flight from the pursuit of Thenadier while avoiding people in general. Finally, near the end of the narrative at S5, it is the number of edges that leads the growth of the network while the number of nodes shows little increase. This pattern (new edges being created between existing nodes without the addition of new ones) implies a convergence of the characters into a common environment: this part in fact describes the scene at the barricade where nearly all major characters (who have been introduced before) converge.

In Fig. 5 we show the appearance and degree of the individual characters. The final histogram of appearance is shown in Fig. 5 (A). It has a skewed distribution, with many characters appearing in a handful of chapters and a few characters appearing in many chapters: for instance, Marius appears in 122 chapters, Valjean in 121, and Cosette in 97, whereas the mean and the median are nearly an order of magnitude smaller than the values for these most frequent characters. In Fig. 5 (C) we show the temporal growth of each character’s cumulative appearance. Although Marius and Valjean are similar in total appearances (122 and 121, respectively), how these values are reached is very different. Valjean first appears at the beginning of the novel, then with regularity until there is a noticeable absence (indicated by a plateau). During Valjean’s absence, Marius takes center stage in the novel and appears in almost every chapter until he overtakes Valjean in appearance. This is a direct reflection of the structure of Les Misérables: the first part is mainly about Valjean (with Marius absent), the second part is mainly about Marius (with Valjean absent), and the final part features both as major characters. The degree (Fig. 5 (B) and (D)), on the other hand, differs in interesting ways from the appearance. The three highest-degree nodes are Valjean (43), Cosette (41), and Javert (39), whereas Marius ranks lower. The degree therefore captures the nature of the social sphere around a character that appearance alone cannot tell: Valjean is a well-travelled character linking many different spheres of the story world, whereas Marius associates with a narrow pool of characters (namely his fellow young rebels) and his love interest Cosette.

Figure 5: Centralities of Characters in Les Misérables. Histograms of (A) appearance and (B) unweighted degrees of the characters of Les Misérables. The histograms are relatively skewed, with some characters having high values and many having small values. The three most frequently appearing characters are Marius (122), Valjean (121), and Cosette (97), while the three highest-degree characters are Valjean (43), Cosette (41), and Javert (39), and the three highest-weighted degree characters are Valjean (5203), Marius (4148), and Cosette (3977). The discrepancies indicate the differences in the characteristics of their social networks. (C) and (D) are the growths of these quantities for each character, showing the differing points at which the characters are actively depicted.

iii.0.2 Sentiment Analysis and Narrative Progression

The chapter sentiments are shown in Fig. 2 (C). We also study how the sentiments and content of chapters match: positive chapters tend to depict uplifting characters (e.g., Myriel, a virtuous man) and events (e.g., Fantine going on a picnic, Cosette and Marius falling in love, etc.), whereas negative chapters depict pain and suffering (e.g., Valjean nearly drowning, Fantine in misery, war, lovers parting, etc.). We also see alternating clusters of positive and negative chapters, indicating a certain pattern of emotional fluctuations. This is reminiscent of an interpretation of narrative as a metaphor for life that fluctuates between contradictory states of harmony and peace, and tension and fear McKee (1997). We also note that the average chapter SPI is net positive. We believe this is an example of the so-called “Pollyanna effect”, referring to a universal positivity bias in human language Dodds et al. (2015).

We show the sentiments for select characters and pairs in Fig. 6. In Fig. 6 (A) we show ten characters (five major, i.e., frequently appearing, and five minor, i.e., infrequently appearing) for comparison. While their average values are positive (due to the Pollyanna effect), the joyless Javert is more negative than other main characters such as Marius, Valjean, and Cosette. Nevertheless, major characters experience a wider range of SPIs than the minor ones, which we believe indicates their sentimental complexity. In the figure we see that Valjean appears frequently in both positive and negative chapters, showing his role as the carrier of varying sentimental states, in contrast to short-lived minor characters. In Fig. 6 (B) we show the SPIs of a number of character pairs. Valjean understandably shows a higher average SPI when with his adoptive daughter Cosette than with his archnemesis Javert, although the wide range of SPIs again indicates the sentimental complexity of the leading character pairs. The Pollyanna effect still holds here: in general, the average SPI of character pairs (dotted line) is positive. Therefore it is sensible to define the cosentiment of a character pair relative to this average. This quantity was already used for edge colors in Fig. 3. We can also use this to study the sentimental states within and between communities, shown in Fig. 6 (C). In the figure, the diagonal elements show the fractions of positive and negative edges inside the communities, whereas the off-diagonal elements show those between two communities. The circle radius indicates the logarithm of the number of edges. Communities II and VI are in general the most negative inside, showing the harsh and tragic nature of the common experiences of the prisoners and revolutionaries. In contrast, Communities V and VII are the most positive inside. Between communities, II and VI are the most negative, due to Javert’s presence at the tragic barricade scene with the revolutionaries.
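The comparison of SPI ranges between major and minor characters can be sketched as follows. The sketch takes the chapter SPIs as given numbers (their computation is defined elsewhere in the paper) and summarizes each character by the quartiles of the SPIs of the chapters in which the character appears, which is one way to obtain the middle-50% ranges drawn in Fig. 6 (A). All values and names below are hypothetical.

```python
from statistics import quantiles


def spi_middle_half(chapter_spi, appearances):
    """Return (Q1, median, Q3) of the SPIs of a character's chapters.

    `chapter_spi` is a list of per-chapter SPI values and `appearances`
    lists the chapter indices in which the character occurs.
    At least two chapters are needed for the quartiles to be defined.
    """
    values = [chapter_spi[index] for index in appearances]
    q1, median, q3 = quantiles(values, n=4)
    return q1, median, q3


# Hypothetical chapter SPIs (not values from the paper).
toy_spi = [0.3, -0.4, 0.5, -0.2, 0.1, 0.6, -0.5, 0.2]

major = spi_middle_half(toy_spi, appearances=range(len(toy_spi)))
minor = spi_middle_half(toy_spi, appearances=[0, 4, 7])

major_width = major[2] - major[0]
minor_width = minor[2] - minor[0]
print("major range width:", major_width)
print("minor range width:", minor_width)
```

A character present in many chapters of differing polarity yields a wider interquartile range than one confined to a few similar chapters, which is the qualitative pattern reported for the leading characters.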

Figure 6: Sentiments of Characters, Character Pairs, and Communities. (A) Sentiment Polarity Indices (SPIs) for the characters of Les Misérables. The yellow boxes indicate the SPI ranges of 50% of the chapters around each character’s median (25% below, 25% above). The leading characters (higher in the plot) feature a wider range of SPIs than the marginal ones (lower in the plot), reflecting their role in the sentimental fluctuations of the narrative. The SPIs of the chapters in which Valjean appears are shown below. (B) SPIs for character pairs. Valjean indeed shows a higher SPI when together with protégée Cosette than with pursuer Javert, although SPIs for leading characters again show a wide range. (C) The intra- and inter-community cosentiments. Communities II and VI are in general the most negative inside, due to the fact that prisoners and revolutionaries share difficult and tragic experiences (harsh prison terms and death at the barricade). Communities V and VII are the most positive inside. Between communities, II and VI are the most negative, due to Javert’s presence at the barricade with the revolutionaries.

We now study the sentimental qualities of the network and how they change along the narrative progression. These are shown in Fig. 7, where each panel corresponds to a Sequence of the novel first introduced in Fig. 1 (B). The definition and rationale for the Sequence are as follows: sometimes a plot or a storyline may span multiple consecutive narrative units, which makes it reasonable to bundle them into a larger one. To achieve this we need to determine the similarity between subsequent narrative units. One possibility, which we use here, is the character composition: consecutive units belonging to the same or highly similar plots are likely to contain similar characters. Specifically, starting from the 40 Books of Les Misérables (excluding eight that contain no characters), we bundle the consecutive ones whose characters are similar above a prescribed threshold. Using the cosine similarity (although others such as the Jaccard index may be used) and setting the threshold to be the average similarity between consecutive book pairs, we end up with the 21 Sequences shown in Fig. 7. We also show the fraction of negative and positive edges.
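The bundling procedure just described can be sketched in a few lines. The sketch assumes that each Book is represented by the set of characters it contains (a binary vector, so the cosine similarity reduces to the overlap divided by the geometric mean of the set sizes) and that "above the threshold" means strictly greater; whether the paper uses binary or count-based vectors is not stated here, so this is an assumption. The function names and toy Books are hypothetical.

```python
from math import sqrt


def cosine_similarity(cast_a, cast_b):
    """Cosine similarity of two casts treated as binary character vectors."""
    a, b = set(cast_a), set(cast_b)
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def bundle_into_sequences(books):
    """Group consecutive books whose casts are more similar than average.

    Returns a list of sequences, each a list of book indices.
    The threshold is the mean similarity over consecutive book pairs.
    """
    if not books:
        return []
    sims = [cosine_similarity(x, y) for x, y in zip(books, books[1:])]
    threshold = sum(sims) / len(sims) if sims else 0.0
    sequences = [[0]]
    for index, sim in enumerate(sims, start=1):
        if sim > threshold:
            sequences[-1].append(index)  # same storyline continues
        else:
            sequences.append([index])    # cast changes: start a new Sequence
    return sequences


# Hypothetical Books, each given as its set of characters.
toy_books = [
    {"A", "B"}, {"A", "B", "C"},  # similar casts
    {"D", "E"}, {"D", "E"},       # a different storyline
    {"A", "F"},                   # another change of cast
]
sequences = bundle_into_sequences(toy_books)
print(sequences)
```

Replacing `cosine_similarity` with a Jaccard index, as the text notes is possible, only requires changing the similarity function; the bundling loop stays the same.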

Figure 7: Network Snapshots Showing Sentiments and Narrative Flow. Snapshots of character networks in the 21 Sequences of Les Misérables. Edges are colored according to the cosentiment between the characters. The fractions of positive and negative edges are indicated in each snapshot, along with the summary of major plots in the Sequence. The sentimental fluctuations often reflect the build-up of drama, tension, and resolution.

The correlation between sentimental fluctuations and narrative flow is perhaps best understood from Fig. 7 by studying Marius and his revolutionary friends. When they are first introduced in Sequence 8, the sentiment is overwhelmingly positive, reflecting the air of optimism from their cause. Such initial positivity is not long-lived, however, as they have to struggle with their adversaries in subsequent Sequences 11, 14, and 15. After they overcome these challenges they briefly regain their positive sentiment (Sequence 16), but then are thrust into the most tragic and climactic circumstances (Sequences 17–20), which show high negativity. Finally, at the end of the novel (Sequence 21) the resolution is reached, showing a highly positive sentiment. The fluctuations between positive and negative in this fashion are known to be by design McKee (1997).

iii.0.3 Topic Modeling and Mapping Interaction Dynamics via Topical States

Figure 8: Complete List of Topics for Les Misérables. 50 Topics of Les Misérables found via Non-Negative Matrix Factorization (NNMF). Strongly associated keywords are also listed (the strongest keywords in bold). The topics are frequently about the characters (e.g. T1, T2, and T3), places (e.g. T11, T20, and T25), or events (e.g. T7, T22, and T42).

We set the number of topics to 50. The results for all 50 are given in Fig. 8. The keywords (the strongest ones are in bold) tell us that the topics are often about the characters (e.g., T1, T2, and T3), places (e.g., T11, T20, and T25), or events (e.g., T7, T22, and T42). The character–topic associations are visualized in Fig. 9 (A) for Valjean and Marius, scaled so that the strongest topic fills the space between the two circles. The five strongest topics for each character are T1, T4, T3, T2, and T7 for Valjean, and T2, T1, T3, T4, and T14 for Marius. From Fig. 8 we see that they are about themselves and related characters or actions (valjean, escape, marius, eponine, etc.). We can also use them to identify topics associated with the communities by summing up the topic associations over the chapters that contain two or more of the members of the community, which are shown in Fig. 9 (B). The topics shown are relevant to multiple members of the group, for instance, characters from inside the community (e.g., T1 and T4 for Community I) or outside (e.g., T2 and T29 for Community I), or the events or places, for instance T41 (the trial) for Community II of Javert and Valjean’s fellow prison inmates.
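The community-level aggregation described above can be sketched as follows. The sketch assumes that a chapter-by-topic weight matrix is already available (for example from a non-negative matrix factorization, which is not reproduced here) and sums the weights over the chapters containing two or more members of a community. The function names, the `min_members` parameter, and the toy numbers are hypothetical.

```python
def community_topic_strengths(chapter_topics, chapter_casts, community,
                              min_members=2):
    """Sum topic weights over chapters featuring enough community members.

    `chapter_topics[i]` is the list of topic weights of chapter i and
    `chapter_casts[i]` is the list of characters appearing in chapter i.
    """
    n_topics = len(chapter_topics[0]) if chapter_topics else 0
    totals = [0.0] * n_topics
    members = set(community)
    for weights, cast in zip(chapter_topics, chapter_casts):
        if len(members & set(cast)) >= min_members:
            for topic, weight in enumerate(weights):
                totals[topic] += weight
    return totals


def strongest_topics(strengths, k=5):
    """Indices of the k topics with the largest summed strength."""
    order = sorted(range(len(strengths)),
                   key=lambda topic: strengths[topic], reverse=True)
    return order[:k]


# Hypothetical chapter-topic weights (3 topics) and casts.
toy_topics = [
    [0.9, 0.0, 0.1],
    [0.2, 0.7, 0.0],
    [0.0, 0.1, 0.8],
    [0.5, 0.0, 0.4],
]
toy_casts = [["A", "B"], ["A"], ["B", "C"], ["A", "B", "D"]]

totals = community_topic_strengths(toy_topics, toy_casts, {"A", "B", "C"})
top_two = strongest_topics(totals, k=2)
print(top_two)
```

The second chapter contains only one community member and is therefore skipped, so a topic that is prominent only there contributes little to the community's profile.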

Figure 9: Topics Associated with Valjean, Marius, and The Communities. (A) Topics strongly associated with Valjean (left) and Marius (right). The topic-character association strengths are scaled so that the largest value fills the space between the two circles. Topic T1 is the most relevant to Valjean, while Topic T2 is the most relevant to Marius. They contain the respective character names as the strongest keywords, but also contain words closely related to each character. (B) Topics strongly associated with the communities in Fig. 2. The topics can be about the characters inside the community, or even from the outside as long as they are sufficiently associated with multiple members of the community. For instance, T2 (marius), T29 (sister, fantine), and T19 (fauchelevent) are strongly associated with Community I, although the characters belong to other communities. The topics can also be the events involving the community members, for instance T41 (the trial: attorney, jury) for Community II composed of Javert and Valjean’s fellow prison inmates.

We now introduce an interesting use of topics for representing narrative dynamics. An impactful event in a person’s life is one that brings about significant changes in the person’s state. This means that even in a narrative, if one could define a character’s state at a given point, one could measure the impact or significance of an event by comparing the states from before and after the event. We use topic modeling to do exactly this, by interpreting a character’s topic associations as the topical state of the character. The idea is straightforward: since an associated topic indicates the actions, events, interactions, etc. taking place in the character’s presence, it can be understood as telling us the situation or the state of the character. While Fig. 9 (A) shows the topical states averaged over the entire novel, we can define a character’s topical state at a given point in the narrative by obtaining the topical associations from the corresponding chapter(s). As an example we now study the impact that the interactions between Marius and Valjean have on the characters’ states. For simplicity we consider Valjean and Marius to be interacting largely two times in Les Misérables, prompting us to partition the novel into the following four phases:

  1. Phase I (Chapters 1 to 233): Before the first interaction. Valjean and Marius lead separate lives.

  2. Phase II (Chapters 234 to 266): The first interaction takes place. Marius falls in love with Cosette, causing Valjean to become anxious about losing her.

  3. Phase III (Chapters 272 to 295): Valjean is absent from the narrative, so no interaction takes place. Marius parts from Cosette, then joins the revolutionaries at the barricade.

  4. Phase IV (Chapters 296 to the end of the narrative): The second interaction takes place. Marius gets injured at the barricade, then is rescued by Valjean. Cosette and Marius marry. Valjean dies.

Our strategy now is to observe the changes in the characters’ topical states and use them to understand the details of the interaction dynamics. The changes in the topical states at the end of each phase are shown in Fig. 10 (A), obtained by subtracting the state immediately before the interactions from that immediately after. At the end of Phase I, Valjean is the most strongly associated with T1, T5, T21, T47, and T29, whereas Marius is with T2, T14, T8, T32, and T37, which represent their trajectories up to that point according to Fig. 8. They share no common topics, as expected from the lack of any interaction up to that point; in fact, the correlation between their topical states is negative. At the end of Phase II, after their first interaction, the correlation increases, showing that an interaction works to correlate the character states. At the end of Phase III (no interaction) it decreases again slightly. At the end of Phase IV, where they interact again for the final time and quite extensively, it reaches its highest value. These show that an interaction functions to assimilate the characters’ states, and an inspection of the changes provides us with more detail of this assimilation dynamics.

For simplicity, we again focus on the five topics (for each character) that gain the most in strength after each phase, shown in Fig. 10 (A). After the first interaction, we find that the five such topics for Valjean are T4, T2, T1, T7, and T25, whereas for Marius they are T1, T4, T25, T7, and T45. When we compare the strongly associated topics from before and after the interactions, we find two kinds of change. First, there are topics that we can interpret as having been transferred from one character to the other. An example is T2 (marius), the strongest one with Marius before Phase II, which gains the most for Valjean after. The same goes for T1 (valjean), this time from Valjean to Marius. Second, there are topics that have entered the characters’ states exogenously, i.e., those that were not strongly associated with either character before. They represent new common experiences or interests that occur during the interactions: T4 (cosette), T7 (revolution), and T25 (garden) are such cases. They again reflect the story accurately: Cosette becomes the focal point of both characters, as a new love interest for Marius that causes severe anxiety to Valjean. Some topics enter only one character’s state, such as T45 (mabeuf), which is about the character Mabeuf, who shares his story with Marius at the barricade but has little to do with Valjean; Valjean’s topical state indeed has a near-zero component of T45.

Next, during Phase III, T11 (rue), T24 (barricade), T46 (hucheloup), T42 (revolt), and T28 (gavroche) gain the most strength for Marius, reflecting the events and the characters he experiences during that time. Valjean is absent. Finally, during Phase IV, T3 (enjolras), T2, T28 (gavroche), T24 (barricade), and T1 gain the most strength with Valjean, whereas topics T3, T1, T28, T35 (sand), and T12 (javert) gain the most strength with Marius. Note how the directionality of T28 and T24 from Marius to Valjean reflects the actual way things happen between the characters: Gavroche (T28), a friend of Marius’, carries a letter from Marius to Valjean that motivates Valjean to join the barricade (T24) in search of Marius. Our discussion here about topic transfers and entry can be systematically visualized as in Fig. 10 (B) on top of the basic interaction timeline first introduced in Fig. 1, showing that the textual information indeed allows one to construct a much more detailed picture of an interaction than a simple occurrence-based network construction.
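The two quantities used in this analysis, the correlation between two characters' topical states and the topics that gain the most strength across a phase, can be sketched as below. The sketch assumes that a topical state is a vector of topic strengths and that the correlation is the Pearson coefficient; the text does not specify which correlation measure is used, so that choice is an assumption. The function names and the four-topic toy states are hypothetical and are not values from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def top_gaining_topics(before, after, k=5):
    """Indices of the k topics whose strength increased the most."""
    change = [new - old for old, new in zip(before, after)]
    order = sorted(range(len(change)),
                   key=lambda topic: change[topic], reverse=True)
    return order[:k]


# Hypothetical states over four topics:
# topic 0 is "about" character a, topic 1 is "about" character b,
# topic 2 is a shared new experience, topic 3 is background.
a_before = [0.8, 0.0, 0.1, 0.1]
b_before = [0.0, 0.9, 0.0, 0.1]
a_after = [0.6, 0.5, 0.4, 0.1]
b_after = [0.5, 0.7, 0.4, 0.1]

corr_before = pearson(a_before, b_before)
corr_after = pearson(a_after, b_after)
a_gains = top_gaining_topics(a_before, a_after, k=2)
print(corr_before, corr_after, a_gains)
```

In this toy case the correlation moves from negative to positive after the interaction, and the topics gaining the most for character a are the one originally associated with character b (a transfer) followed by the shared new topic (an exogenous entry).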

Figure 10: Mapping Out Interactions Diagrammatically As Dynamic Topic Exchange. Events lead to character transformation, which we quantify via the character’s topical states. With respect to the interaction between Valjean and Marius, we divide Les Misérables into four phases. (A) The net changes in the topical states of the characters at the end of each phase, quantified by the differences in topical state between the end and the start of the phase. After Phase II, T2 (the strongest topic for Marius before) shows a sharp increase for Valjean. Likewise, T1 (Valjean’s strongest topic before) shows a sharp increase for Marius. T4, T7, and T25 increase for both characters, while T45 increases only for Marius. (B) Diagrammatic representation of the changes of Marius’ and Valjean’s topical states as “topic transfers” during each phase. Topics can be exchanged between characters (e.g., T1 and T2 during Phase II) or enter either character’s topical state exogenously (e.g., T4, T7, T25, and T45). The directions can also reflect those of actual story elements: during Phase IV (Chapters 296–365), Valjean, prompted by a letter from Marius, joins the barricade. This is directly reflected in the transfer of topics T24 and T28 from Marius to Valjean.

We believe the results in this section show that the story of a narrative can be identified, quantified, analyzed, and visualized with appropriate analytical and computational tools. They demonstrate the benefits and opportunities of approaching traditional subjects such as narrative from a novel perspective, one that allows us to find new patterns and gain a richer understanding not readily available previously.

IV Discussions and Conclusions

In this paper we proposed a network-based framework for modeling a narrative by focusing on the characters and their interactions. We started by representing a narrative as a set of interacting character timelines, from which we constructed a growing character network. To legitimize our approach it was necessary to understand whether the character network topology and dynamics reflected the narrative structure correctly. We found that character centralities captured the role and the nature of the social spheres of characters in the narrative, while the temporal growth of the network showed distinct phases with differing patterns of increasing nodes or edges depending on whether the narrative was focusing on isolated characters (stagnant growth), expanding the story world by introducing new characters (growth led by number of nodes), or bringing existing characters together in the build-up to the resolution (growth led by number of edges).

An important characteristic of well-written drama is that it evokes emotion in the reader, which in the western literary tradition is conventionally represented by the generic division of drama into comedy and tragedy. This had an interesting connection to a modern computational methodology called sentiment analysis. We found that many characters, especially the central ones, showed significant fluctuations of sentiments during the narrative flow, acting as the carriers of mood and emotions of the narrative. This was true of character relationships as well, and we showed how the sentimental fluctuations correlated with the narrative progression that showed detectable patterns of dramatic tension build-up and resolution.

Finally, we used topic modeling as a way to define the state of a character via the topics (keywords) with which they are associated at various points in the narrative. This allowed us to trace quantitatively the changes in characters’ states, and quantify and map out the details of an event or an interaction between characters. We also demonstrated that the flow of topics between characters can reflect the actual story in interesting ways, providing us with a way to systematically represent the patterns of character interactions that previously resided in the text of the narrative.

We believe that our paper presents a wide range of ideas for studying narrative structures that merit further exploration using the methods of network science, data analysis, and computational linguistics. Looking further, representing a narrative as a dynamically unfolding system of character networks and interactions also sets the stage for using theories and tools for understanding dynamical systems, not only networks. Advances in this area have practical implications as well, such as improved algorithms for computer-assisted writing and storytelling, which can benefit from a more robust understanding of the patterns of character relationships and interactions. Given the ubiquity and importance of narratives, we hope that future developments based on our work will be beneficial for a wide range of fields including literature, communication, and storytelling.

The authors would like to thank Kyungyeon Moon, Wonjae Lee, and Bong Gwan Jun for helpful comments. This work was supported by the National Research Foundation of Korea (NRF-20100004910 and NRF-2013S1A3A2055285), BK21 Plus Postgraduate Organization for Content Science, and the Digital Contents Research and Development program of MSIP (R0184-15-1037, Development of Data Mining Core Technologies for Real-time Intelligent Information Recommendation in Smart Spaces).


  • Michel et al. (2011) J.-B. Michel, Y. K. Shen, A. P. Aiden, A. Veres, M. K. Gray, J. P. Pickett, D. Hoiberg, D. Clancy, P. Norvig, J. Orwant, et al., Science 331, 176 (2011).
  • (2) Project Gutenberg, URL
  • Dodds et al. (2015) P. S. Dodds, E. M. Clark, S. Desu, M. R. Frank, A. J. Reagan, J. R. Williams, L. Mitchell, K. D. Harris, I. M. Kloumann, J. P. Bagrow, et al., Proceedings of the National Academy of Sciences 112, 2389 (2015).
  • Schich et al. (2014) M. Schich, C. Song, Y.-Y. Ahn, A. Mirsky, M. Martino, A.-L. Barabási, and D. Helbing, Science 345, 558 (2014).
  • Kim et al. (2014) D. Kim, S.-W. Son, and H. Jeong, Scientific Reports 4, 7370 (2014).
  • Park et al. (2015) D. Park, A. Bae, M. Schich, and J. Park, EPJ Data Science 4, 1 (2015).
  • Newman (2010) M. Newman, Networks: An Introduction (Oxford University Press, 2010).
  • Albert and Barabási (2002) R. Albert and A.-L. Barabási, Reviews of Modern Physics 74, 47 (2002).
  • Easley and Kleinberg (2010) D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning About a Highly Connected World (Cambridge University Press, 2010).
  • Han et al. (2011) J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques (Elsevier, New York, 2011).
  • Adamic and Huberman (2000) L. A. Adamic and B. A. Huberman, Science 287, 2115 (2000).
  • Albert et al. (1999) R. Albert, H. Jeong, and A.-L. Barabási, Nature 401, 130 (1999).
  • Choi et al. (2006) J. H. Choi, G. A. Barnett, and B.-S. Chon, Global Networks 6, 81 (2006).
  • Borgatti and Foster (2003) S. P. Borgatti and P. C. Foster, Journal of Management 29, 991 (2003).
  • Grimm et al. (2005) V. Grimm, E. Revilla, U. Berger, F. Jeltsch, W. M. Mooij, S. F. Railsback, H.-H. Thulke, J. Weiner, T. Wiegand, and D. L. DeAngelis, Science 310, 987 (2005).
  • Moretti (2011) F. Moretti, New Left Review 81, 80 (2011).
  • Moretti (2013) F. Moretti, Distant Reading (Verso, New York, 2013).
  • (18) Pamphlets by Stanford Literary Lab, URL
  • (19) Box office mojo, URL
  • (20) The Numbers, URL
  • Newman and Girvan (2004) M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004).
  • Elson et al. (2010) D. K. Elson, N. Dames, and K. R. McKeown, in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (Association for Computational Linguistics, 2010), pp. 138–147.
  • Carron and Kenna (2012) P. M. Carron and R. Kenna, EPL 99, 28002 (2012).
  • Mac Carron and Kenna (2013) P. Mac Carron and R. Kenna, The European Physical Journal B 86, 1 (2013).
  • Kydros et al. (2015) D. Kydros, P. Notopoulos, and G. Exarchos, International Journal of Humanities and Arts Computing 9, 115 (2015).
  • Waumans et al. (2015) M. C. Waumans, T. Nicodème, and B. Hugues, PLoS ONE 10, e0126470 (2015).
  • Welsh (1978) A. Welsh, Nineteenth-Century Fiction 33, 8 (1978).
  • Rimmon-Kenan (2003) S. Rimmon-Kenan, Narrative Fiction: Contemporary Poetics (Routledge, London, 2003).
  • Bal and Boheemen (2009) M. Bal and C. V. Boheemen, Narratology: Introduction to the Theory of Narrative (University of Toronto Press, 2009).
  • Abbott (2008) H. P. Abbott, The Cambridge Introduction to Narrative (Cambridge University Press, 2008).
  • Field (2007) S. Field, Screenplay: The Foundations of Screenwriting (Delta, New York, 2007).
  • Vogler (2007) C. Vogler, The Writer’s Journey (Michael Wiese Productions, Seattle, 2007).
  • Propp (2010) V. Propp, Morphology of the Folktale (University of Texas Press, Austin, Texas, 2010).
  • (34) Moviegalaxies, URL
  • (35) Les Misérables, URL
  • Freytag (1896) G. Freytag, Freytag’s Technique of the Drama: An Exposition of Dramatic Composition and Art (Scholarly Press, 1896).
  • Tausczik and Pennebaker (2010) Y. R. Tausczik and J. W. Pennebaker, Journal of language and social psychology 29, 24 (2010).
  • Gonçalves et al. (2013) P. Gonçalves, M. Araújo, F. Benevenuto, and M. Cha, in Proceedings of the first ACM conference on Online social networks (ACM, 2013), pp. 27–38.
  • Jurgens and Stevens (2010) D. Jurgens and K. Stevens, in Proceedings of the ACL 2010 System Demonstrations (2010), pp. 30–35.
  • Van de Cruys and Apidianaki (2011) T. Van de Cruys and M. Apidianaki, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1 (2011), pp. 1476–1485.
  • Stevens et al. (2012) K. Stevens, P. Kegelmeyer, D. Andrzejewski, and D. Buttler, in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (Association for Computational Linguistics, 2012), pp. 952–961.
  • Lee and Seung (1999) D. D. Lee and H. S. Seung, Nature 401, 788 (1999).
  • Lee and Seung (2001) D. D. Lee and H. S. Seung, in Advances in Neural Information Processing Systems (2001), pp. 556–562.
  • Xu et al. (2003) W. Xu, X. Liu, and Y. Gong, in Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (ACM, 2003), pp. 267–273.
  • Zhao and Karypis (2004) Y. Zhao and G. Karypis, Machine Learning 55, 311 (2004).
  • McKee (1997) R. McKee, Substance, Structure, Style, and the Principles of Screenwriting (HarperCollins, New York, 1997).
  • Min and Park (2016) S. Min and J. Park, in Complex Networks VII: Studies in Computational Intelligence (2016), pp. 257–266.
  • Wasserman and Faust (1994) S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications (Cambridge University Press, 1994).
  • Marsden (1990) P. V. Marsden, Annual Review of Sociology pp. 435–463 (1990).