Understanding narratives requires the ability to interpret character intentions, desires and relationships. The importance of characters and characterization in narratives has been explored in recent works that focus on their roles and representations [4, 24, 7], as opposed to a plot-centric perspective of a narrative as primarily a sequence of events [15, 22, 8]. However, while such approaches can identify character types, they do not model relationships between characters in a narrative.
In this work, we address the problem of inferring cooperative and adversarial relationships between people in narrative summaries. Identifying character cooperation and conflict is essential for narrative comprehension. It can guide interpretation of narrative events, explain character actions and behavior and steer the reader’s expectation about the plot. As such, it can have value for applications such as machine reading, QA and document summarization.
As a motivating example, let us consider the plot summary in Figure 1 (condensed here for brevity). In this passage, the relations between the principal characters are explicated through a combination of cues. For instance, one can infer that Alex (A) and Tommy (T) have a cooperative relationship through a combination of the following observations (among others): (1) T initially ‘befriends’ A, (2) A works for T, with the connotation that A is likely to cooperate with T, (3) T aids A in fights, (4) A is a friend of T’s wife, and (5) A and T have a common adversary. In particular, we note that cues (4) and (5) cannot be extracted by looking at the relation between A and T in isolation, but depend on their relations with others. In this work, we show that such indirect structural cues can be very significant for inference of character relationships.
Our problem formulation assumes a fixed relation between a pair of characters within a narrative. This can be problematic, since relationships can transform over time, but the assumption is reasonable in a wide range of examples. Even in complex narratives, relationships remain persistent within sub-parts. From a pragmatic perspective, the approximation serves as a useful starting point for research. Our main contributions are:
We introduce the problem of characterizing cooperation in interpersonal relations in narrative texts, and provide an annotated dataset of 153 movie summaries for the task.
We design linguistic, semantic and discourse features for this task, and demonstrate their relative importance.
We formulate the problem as structured prediction, and introduce a joint model that incorporates both text-based cues and structural inferences.
We provide an extension to exploit narrative-specific regularities, enabling content-based models (e.g., predicting more hostile relations in revenge dramas than in love stories).
The layout of the paper is as follows: In Section 3, we present our models and our formulation of the problem as structured prediction. In Section 4, we describe the features used by our models in detail. We then describe our dataset, and present quantitative and qualitative evaluations of our models.
2 Related work
Existing research on characterizing relationships between people has almost exclusively focused on dialogue or social network data. Such methods have explored aspects of relations such as power [5], address formality [19] and sentiment [16] in conversations. Recently, Agarwal et al. [1] studied the problem of parsing movie screenplays for extracting social networks. However, analysis of character relationships in narrative texts has largely been limited to simplistic schemes based on counting character co-occurrences in quoted conversations [12] or social events [2]. We believe this is the first attempt to infer relation polarities in narrative texts.
In terms of approach, our use of structural triads as features is most closely related to Krishnan and Eisenstein [19], who use an unsupervised joint probabilistic model of text and structure for the task of inducing formality from address terms in dialogue, and Leskovec et al. [20], who empirically analyze signed triads in social networks from the perspective of structural theories. Such social triads have previously been studied from the perspectives of social psychology and networks [17, 6].
3 Relation classification as Structured Prediction
We formulate the problem of relation classification to allow arbitrary text-based and structural features. We consider the problem as one of structured prediction, where we jointly infer the collective assignment of relation labels for all pairs of characters in a document. Let $x$ denote a narrative document for which we want to infer the relationship structure $y$. We can think of $y$ as a graph with characters as nodes, and relationship predictions corresponding to edge labels. We assume a supervised learning setting where we have a labeled training set $T = \{(x_1, y_1), \ldots, (x_m, y_m)\}$. For each $x_i$, we have a set of allowed assignments $Y(x_i)$ (consisting of combinations of binary assignments to each edge label in $y_i$). Following standard approaches in structured classification, we consider linear classifiers of the form:

$$h_w(x) = \arg\max_{y \in Y(x)} w^\top \phi(x, y) \qquad (1)$$

Here $\phi(x, y)$ is a feature vector that can depend on both the narrative document $x$ and a relation-polarity assignment $y$, $w$ is a weight vector, and $w^\top \phi(x, y)$ denotes a linear score indicating the goodness of the assignment. Finding the best assignment corresponds to the decoding problem, i.e., finding the highest scoring assignment under a given model. On the other hand, the model parameters $w$ can be learnt using a voted structured perceptron training algorithm [9]. The structured perceptron updates can also be seen as stochastic sub-gradient descent steps minimizing the following structured hinge loss:

$$L(w) = \sum_{i=1}^{m} \Big[ \max_{y \in Y(x_i)} w^\top \phi(x_i, y) - w^\top \phi(x_i, y_i) \Big] \qquad (2)$$

For our problem, we define the feature vector as a concatenation of features based on text and structural components: $\phi(x, y) = [\phi_{text}(x, y);\ \phi_{struct}(x, y)]$. The text-based component can be defined by extending the traditional perceptron framework as $\phi_{text}(x, y) = \sum_{e \in E_x} y_e \, \phi_e(x)$. Here $E_x$ consists of the set of annotated character-pair relationships for the narrative text $x$, $\phi_e(x)$ denotes the text-based feature representation for the character pair $e$ (as described in Section 4.1), and $y_e$ is the binary assignment label ($\pm 1$) for the pair in $y$. On the other hand, our structural features $\phi_{struct}(x, y)$ focus on configurations of relationship assignments of triads of characters, and are motivated in our discussion of transitive relations in Section 4.2. We note that while this is not the case in the current work, structural features can also encode character attributes (such as age or gender) in conjunction with assignment labels.
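The decomposition of the feature vector into a text part and a structural part can be illustrated with a minimal Python sketch. It assumes a document is given as a dictionary from character pairs to text feature vectors, and it takes the structural feature function as an argument. The names `joint_features` and `decode`, and the toy structural feature (a count of cooperative edges), are hypothetical and not taken from the paper. Decoding is done by plain enumeration over all +1/-1 labelings, which is only feasible for small character graphs.

```python
from itertools import product

def joint_features(text_feats, labels, struct_fn):
    """phi(x, y): sum over pairs of label * text features, then structural features."""
    dim = len(next(iter(text_feats.values())))
    text_part = [0.0] * dim
    for edge, phi_e in text_feats.items():
        for j, value in enumerate(phi_e):
            text_part[j] += labels[edge] * value
    return text_part + list(struct_fn(labels))

def decode(w, text_feats, struct_fn):
    """Exhaustive argmax of w . phi(x, y) over all +1/-1 labelings of the pairs."""
    edges = sorted(text_feats)
    best, best_score = None, float("-inf")
    for signs in product((1, -1), repeat=len(edges)):
        labels = dict(zip(edges, signs))
        phi = joint_features(text_feats, labels, struct_fn)
        score = sum(wi * fi for wi, fi in zip(w, phi))
        if score > best_score:
            best, best_score = labels, score
    return best, best_score

# Toy document: two character pairs, two text features each (made-up values).
text_feats = {("A", "T"): [1.0, 0.0], ("A", "M"): [-2.0, 1.0]}
# Hypothetical stand-in for the structural features: number of cooperative edges.
count_cooperative = lambda labels: [sum(1 for v in labels.values() if v == 1)]
w = [1.0, 0.5, 0.25]
best_labels, best_score = decode(w, text_feats, count_cooperative)
```

In this toy setting the pair (A, T) is labeled cooperative and the pair (A, M) adversarial, since that labeling maximizes the linear score.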
Learning and Inference: Structured perceptrons have conventionally been used for simple structured inputs (sequences and trees) and locally factoring models, which are amenable to efficient dynamic programming inference algorithms. This is because updates require inference over an exponentially large space (solving the decoding problem in Equation 1), and updates from inexact search can violate convergence properties. However, Huang et al. [18] show that exact search is not needed for convergence as long as we can guarantee updates on ‘violations’ only, i.e., it suffices to find a labeling assignment with a higher score than the correct assignment. Additionally, edge labels are expected to be relatively sparse for our domain, since character graphs in most narratives are not fully connected. Hence, the inference problem decomposes for relation edges that are not part of structural triangles, and the decoding problem can be solved exactly for the vast majority of narrative texts.
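A minimal sketch of the training loop described above is given below, under stated assumptions. It uses averaged weights as a simple stand-in for the voted perceptron, enumerates candidate labelings exhaustively, and updates only on violations (a wrong labeling that scores at least as high as the gold one). The function names and the one-dimensional toy features are hypothetical.

```python
from itertools import product

def train_structured_perceptron(data, feature_fn, candidates, dim, epochs=10):
    """Averaged structured perceptron; updates are applied only on violations."""
    w = [0.0] * dim
    w_sum = [0.0] * dim
    steps = 0
    for _ in range(epochs):
        for x, y_gold in data:
            def score(y):
                return sum(wi * fi for wi, fi in zip(w, feature_fn(x, y)))
            y_hat = max(candidates(x), key=score)
            # With exact search this check always holds when y_hat != y_gold;
            # it matters when the search over candidates is inexact.
            if y_hat != y_gold and score(y_hat) >= score(y_gold):
                phi_gold = feature_fn(x, y_gold)
                phi_hat = feature_fn(x, y_hat)
                w = [wi + g - h for wi, g, h in zip(w, phi_gold, phi_hat)]
            w_sum = [a + b for a, b in zip(w_sum, w)]
            steps += 1
    return [a / steps for a in w_sum]

# Toy setup: x holds one scalar text cue per character pair, y holds +1/-1 labels.
def feature_fn(x, y):
    return [sum(yi * xi for xi, yi in zip(x, y))]

def candidates(x):
    return list(product((1, -1), repeat=len(x)))

data = [((1.0, -1.0), (1, -1)), ((2.0,), (1,))]
weights = train_structured_perceptron(data, feature_fn, candidates, dim=1)
```

On this toy data a single update is enough: the learnt weight is positive, so pairs with positive cues are labeled cooperative and pairs with negative cues adversarial.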
For inference on a new document where the edge relations are not known, decoding can proceed by initializing the narrative graph to high confidence edges from the text-based model only (character relationships firmly embedded in text), and appending single edges which complete triads. To avoid speculative inference of relations between character pairs that are ungrounded in the text, we only consider structural triads for which at least two edges are grounded in the text while decoding with the structural model.
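The restriction to triads with at least two grounded edges can be sketched as follows. The snippet only enumerates which ungrounded pairs are eligible for structural inference; how their labels are then scored is left out. The representation of edges as sorted name pairs and the function names are assumptions made for illustration.

```python
from itertools import combinations

def edge(a, b):
    """Canonical (sorted) key for an undirected character pair."""
    return tuple(sorted((a, b)))

def candidate_completions(grounded):
    """Map each ungrounded pair to the grounded edge pairs whose triad it completes."""
    chars = sorted({c for e in grounded for c in e})
    completions = {}
    for a, b, c in combinations(chars, 3):
        triad = [edge(a, b), edge(a, c), edge(b, c)]
        present = [e for e in triad if e in grounded]
        if len(present) == 2:  # exactly one edge is missing from this triad
            missing = next(e for e in triad if e not in grounded)
            completions.setdefault(missing, []).append(tuple(present))
    return completions

# High-confidence edges from a text-only model (toy labels: +1 cooperative, -1 adversarial).
grounded = {edge("A", "T"): 1, edge("A", "M"): -1, edge("T", "W"): 1}
completions = candidate_completions(grounded)
```

Here only the pairs (M, T) and (A, W) become candidates; the pair (M, W) is skipped because no triad containing it has two grounded edges.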
3.1 Accounting for narrative types
The framework described in the previous section provides a simple model to incorporate text-based and structural features for relation classification. However, a shortcoming of the approach is that the model is agnostic to narrative types. Ideally, a model could assign differential weights to features depending on the narrative type. As speculative illustrations, ‘Mexican standoffs’ might be common in ‘revenge/gangster’ narratives, or family relations might be highly indicative of cooperation in children’s stories, and a model would ideally learn and leverage such regularities in the data.
We present a clustering-based extension to our structured model, which can incorporate features descriptive of the narrative text to infer regularities, and make content-based predictions. Let us surmise that the data consists of $K$ natural clusters of narrative types, with a specific structured model for each cluster $k$ (specified by weights $w_k$). For each narrative text $x$, we associate a vector $z_x$ that represents content and determines narrative type. Examples of such representations could be keywords for a document, genre information for a movie or novel, topic proportions of words in the text from a topic model, etc. We model the membership of narrative $x$ to cluster $k$ by a softmax logistic multinomial with clustering parameters $\theta = (\theta_1, \ldots, \theta_K)$:

$$p(k \mid x) = \frac{\exp(\theta_k^\top z_x)}{\sum_{k'=1}^{K} \exp(\theta_{k'}^\top z_x)}$$

From our observation of the loss objective for the structured perceptron in Equation 2, we can define the expected loss for a narrative text $(x_i, y_i)$ under the clustering model as:

$$L_i = \sum_{k=1}^{K} p(k \mid x_i) \, \ell(x_i, y_i; w_k)$$

where $\ell(x_i, y_i; w_k)$ is the per-example structured perceptron loss (the summand of Equation 2) evaluated with the weights $w_k$. Then the overall objective loss over the training set is:

$$L = \sum_{i=1}^{m} L_i = \sum_{i=1}^{m} \sum_{k=1}^{K} p(k \mid x_i) \, \ell(x_i, y_i; w_k)$$

We jointly minimize the overall objective through a block-coordinate descent procedure. This consists of a two-step alternating minimization of the objective with respect to the prediction model weights $w_k$ and the clustering parameters $\theta$, respectively. In the first step, we optimize the prediction model weights while fixing the clustering parameters. This can be done by weighting the training examples for each cluster by their cluster membership, and invoking the structured perceptron procedure for each cluster. In the alternating step, we fix the prediction model weights, and update the clustering parameters using gradient descent with step size $\eta$:

$$\theta_k \leftarrow \theta_k - \eta \, \frac{\partial L}{\partial \theta_k}, \qquad \frac{\partial L}{\partial \theta_k} = \sum_{i=1}^{m} p(k \mid x_i) \big( \ell(x_i, y_i; w_k) - L_i \big) \, z_{x_i}$$
This can be interpreted as a bootstrapping procedure, where given cluster assignments of points, we update the prediction model weights, and given losses from the prediction model, we update the cluster parameters to reassign the most violating data points. We note that the objective is non-convex due to the softmax, and hence different initializations of the procedure can lead to different solutions. However, since each sub-procedure decreases the objective value, the overall objective decreases for small enough step sizes. The procedure is summarized in Algorithm 2. For prediction, each narrative text is assigned to the most likely cluster under the clustering model.
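The clustering step of this alternating procedure can be sketched for a single narrative as follows. The sketch assumes the per-cluster prediction losses are already computed and held fixed; the names `theta`, `z` and `losses`, and the toy values, are hypothetical. The gradient used is the standard softmax gradient of the expected loss, which moves membership away from clusters whose loss is above the current expectation.

```python
import math

def memberships(theta, z):
    """Softmax cluster memberships for a narrative with content vector z."""
    scores = [sum(t * v for t, v in zip(theta_k, z)) for theta_k in theta]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def expected_loss(theta, z, losses):
    """Membership-weighted loss, with losses[k] the loss under cluster model k."""
    return sum(p * l for p, l in zip(memberships(theta, z), losses))

def gradient_step(theta, z, losses, lr=0.1):
    """One gradient-descent step on the clustering parameters for one narrative."""
    p = memberships(theta, z)
    mean_loss = sum(pk * lk for pk, lk in zip(p, losses))
    return [[t - lr * pk * (lk - mean_loss) * v for t, v in zip(theta_k, z)]
            for theta_k, pk, lk in zip(theta, p, losses)]

theta = [[0.0, 0.0], [0.0, 0.0]]   # two clusters, two content features
z = [1.0, 0.5]                     # toy content representation
losses = [2.0, 0.0]                # cluster 0 fits this narrative poorly
before = expected_loss(theta, z, losses)
theta = gradient_step(theta, z, losses)
after = expected_loss(theta, z, losses)
```

After the step, the narrative's membership shifts towards the second cluster and its expected loss decreases.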
To use training data efficiently, we allow parameter sharing across cluster-specific prediction models, drawing from methods in multi-task learning [13]. In particular, we model the weight vector of each cluster-specific model as the sum of a shared base model and additive cluster-specific weights, $w_k = w_0 + v_k$.
Implementationally, we can do this by simply augmenting the feature representation with cluster-specific copies of the features. The relative weighting of the shared and cluster-specific models is specified by a hyper-parameter between 0 and 1. At one extreme of this range, all weight is placed on the shared model, which negates clustering and reduces the clustering model to the plain structured model. Conversely, the other extreme implies no parameter sharing across clusters.
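One possible layout of such an augmented representation is sketched below. This is an assumption made for illustration, not the paper's exact construction: the shared block is scaled by a hypothetical parameter `lam` and the active cluster's block by `1 - lam`, so in this particular convention `lam = 1` corresponds to no clustering and `lam = 0` to no sharing.

```python
def augment(phi, k, num_clusters, lam):
    """Return [lam * phi ; block_0 ; ... ; block_{K-1}], where only block k is non-zero."""
    out = [lam * v for v in phi]  # shared block, seen by every cluster
    for j in range(num_clusters):
        scale = (1.0 - lam) if j == k else 0.0
        out += [scale * v for v in phi]  # cluster-specific block
    return out

phi = [1.0, 2.0]
half_shared = augment(phi, k=1, num_clusters=2, lam=0.5)
fully_shared = augment(phi, k=1, num_clusters=2, lam=1.0)
```

A single weight vector over the augmented features then contains the shared weights in its first block and the cluster-specific weights in the remaining blocks, so an unmodified linear learner can be reused.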
4 Features
In this section, we outline the text-based and structural features used by our classification models. The text-based features make use of existing linguistic and semantic resources, whereas the structural features are based on counts of specific signed social triads, which can be enumerated for any assignment. We provide implementations for these features, as well as the complete pipeline for annotating relationship polarities for a new text document on our project webpage.
4.1 Text-based cues
These features aim to identify relationships between pairs of characters in isolation. They are based on resources such as sentiment lexicons, syntactic and semantic parsers, distributional word representations, and multi-word-expression dictionaries, and are engineered to capture:
Overall polarities of interactions between characters (from text-spans between coreferent mentions) based on lexical and phrasal-level polarity clues.
Semantic connotations of actions one agent does to the other, actions they share as agents or patients, and how often they act as a team.
Character co-occurrences in semantic frames that evoke positivity, negativity or social relationships.
Character similarity based on whether they are described by similar adjectives; and the narrative sentiment of adverbs describing their actions.
Existence of familial relations between characters.
We base our entity-tracking on the Stanford CoreNLP system, and augment the computation of all sentiment features with basic negation handling. Based on such features extracted for each character pair, relationship characterization can be treated as supervised classification (with the two classes corresponding to cooperative or adversarial relations). Our baseline unstructured approach uses only these features with a logistic regression model.
Feature details: Texts are initially processed with the Stanford CoreNLP system to identify personal named entities, and obtain dependency parses. As basic post-processing, we mark coreferent text-spans with their corresponding personal entities, using basic string-matching and pronominal resolution from the Stanford coreference system. For enumerating actions by/on two characters of interest, we identify verbs in sentences for which the characters occurred in an agent or a patient role (identified using the ‘nsubj’ or ‘agent’ dependencies, and the ‘dobj’ or ‘nsubjpass’ dependencies, respectively). We extend this to conjoined verbs, identified by the ‘conj’ relation (e.g., ‘A shot and injured B’). The dependency relation ‘neg’ is used to determine the negation status of verbs.
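The role extraction described above can be sketched over dependency triples. The triples below are written by hand in a (relation, head, dependent) form and are an assumption; actual parser output for these sentences may differ. The function name and the way arguments are shared across conjuncts are likewise illustrative choices.

```python
AGENT_RELS = {"nsubj", "agent"}
PATIENT_RELS = {"dobj", "nsubjpass"}

def verb_roles(deps):
    """Collect agents, patients and negation status for each verb."""
    roles = {}

    def entry(verb):
        return roles.setdefault(
            verb, {"agents": set(), "patients": set(), "negated": False})

    for rel, head, dep in deps:
        if rel in AGENT_RELS:
            entry(head)["agents"].add(dep)
        elif rel in PATIENT_RELS:
            entry(head)["patients"].add(dep)
        elif rel == "neg":
            entry(head)["negated"] = True

    for rel, head, dep in deps:
        if rel != "conj":
            continue
        if head in roles:
            # Conjoined verbs ('A shot and injured B') share their missing arguments.
            other = entry(dep)
            for slot in ("agents", "patients"):
                if not other[slot]:
                    other[slot] |= roles[head][slot]
                elif not roles[head][slot]:
                    roles[head][slot] |= other[slot]
        else:
            # Conjoined nouns ('Tommy and Axel') take the role of the first conjunct.
            for info in roles.values():
                for slot in ("agents", "patients"):
                    if head in info[slot]:
                        info[slot].add(dep)
    return roles

# 'A shot and injured B'
shot = verb_roles([("nsubj", "shot", "A"), ("conj", "shot", "injured"),
                   ("dobj", "injured", "B")])
# 'Malik does not provoke Tommy and Axel'
provoke = verb_roles([("nsubj", "provoke", "Malik"), ("neg", "provoke", "not"),
                      ("dobj", "provoke", "Tommy"), ("conj", "Tommy", "Axel")])
```

In the first example both verbs end up with A as agent and B as patient; in the second, both Tommy and Axel are recorded as patients of a negated verb.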
Given a pair of characters, we identify the set of sentences which contain mentions of both (identified by coreferent text-spans). For this set, we extract the arithmetic means, maximum and cumulative sums for the following sentence-level cues as text-based features (whenever meaningful):
Are Team: This models whether the two characters participate in acts together. It is a binary feature indicating if the two characters were both agents (or patients) of a verb in a sentence, e.g., ‘Malik provokes Tommy and Axel’.
Acts Together: These features count the number of actions with positive and negative connotations that either character (in an agent role) does to the other (in a patient role). There are three variants based on different word connotation resources, viz., semantic lexicons for subjectivity clues, sentiment and prior polarity of verbs. The feature does not fire for neutral verbs, e.g., ‘Malik blackmails Axel’.
Surrogate Acts Together: Coverage for the above features suffers from the limitations of NLP processing tools. For example, in ‘On being provoked by Malik, Tommy…’, Tommy is not a direct patient of the verb. These features extend coverage to verbs which have either of the characters as the agent or the patient, in sentences that do not contain any other character apart from the two of interest.
Adverb Connotations: This feature models the narrative’s overall bias in describing either character’s actions by summing the semantic connotations of adverbs that modify their joint (or surrogate) acts, e.g., ‘Tommy nobly befriends Axel’. Positive adverbs count as +1, negative as -1. It uses the same connotation resources as the Acts Together features.
Lexical sentiment: These features count the number of positive and negative sentiment words or multi-word phrases in spans between mentions of the two characters, using sentiment lexicons (similar to the Acts Together features). For multi-word phrases (identified from a list of MWEs), we use a dictionary to map these to single words if possible, and look for these words in connotation lexicons, e.g., ‘kick the bucket’ maps to ‘die’. This helps with phrases like ‘fell in love’, where ‘fell’ has a negative connotation by itself.
Relation keywords: This feature indicates presence of keywords denoting familial ties between characters (‘father’, ‘wife’, etc.) in spans between character mentions.
Frame semantic: These are based on FrameNet-style parses of the sentence from the Semafor parser [11]. We compiled lists of frames associated with: (i) positive or (ii) negative connotations (e.g., frames like cause hurt or rescue), and (iii) personal or professional relationships (e.g., forming relationships). Three binary features denote frame evocation for each of these lists.
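Two of the sentence-level cues above, and their aggregation into mean, maximum and sum, can be sketched as follows. The verb lexicons are tiny hypothetical stand-ins for the connotation resources, each sentence is assumed to be given as a mapping from verbs to their agents and patients, and flipping polarity under negation is one simple reading of 'basic negation handling' rather than the documented behaviour.

```python
POSITIVE_VERBS = {"befriends", "aids", "rescues"}       # hypothetical toy lexicon
NEGATIVE_VERBS = {"blackmails", "provokes", "attacks"}  # hypothetical toy lexicon

def are_team(sentence, a, b):
    """1 if both characters are agents, or both patients, of some verb."""
    return int(any({a, b} <= v["agents"] or {a, b} <= v["patients"]
                   for v in sentence.values()))

def acts_together(sentence, a, b):
    """Counts of (positive, negative) acts that one character does to the other."""
    pos = neg = 0
    for verb, v in sentence.items():
        directed = ((a in v["agents"] and b in v["patients"]) or
                    (b in v["agents"] and a in v["patients"]))
        if not directed:
            continue
        polarity = 1 if verb in POSITIVE_VERBS else -1 if verb in NEGATIVE_VERBS else 0
        if v.get("negated"):
            polarity = -polarity  # assumed negation handling
        if polarity > 0:
            pos += 1
        elif polarity < 0:
            neg += 1
    return pos, neg

def aggregate(values):
    """Mean, maximum and cumulative sum of a sentence-level cue."""
    return {"mean": sum(values) / len(values), "max": max(values), "sum": sum(values)}

sentences = [
    {"provokes": {"agents": {"Malik"}, "patients": {"Tommy", "Axel"}, "negated": False}},
    {"blackmails": {"agents": {"Malik"}, "patients": {"Axel"}, "negated": False}},
    {"befriends": {"agents": {"Tommy"}, "patients": {"Axel"}, "negated": False}},
]
team = aggregate([are_team(s, "Tommy", "Axel") for s in sentences])
malik_axel = [acts_together(s, "Malik", "Axel") for s in sentences]
```

For the pair (Tommy, Axel) the team cue fires only in the first sentence, while for (Malik, Axel) the first two sentences each contribute one negative act.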
4.2 Structural cues
As motivated earlier, relationships between people can also be inferred from their relationships with others in a narrative. Our thesis is that a joint inference model that incorporates both structure and text would perform better than one that considers pairwise relations in isolation. In some domains, observed relations between entities can directly imply unknown relations among others due to natural orderings. For example, temporal relations among events yield natural transitive constraints. For the current task, such constraints do not apply. While structural regularities like ‘a friend of a friend is a friend’ might be prevalent, these configurations are not logically entailed, and affinities for such structural regularities must be learnt directly from the observed data.
In Figure 2, we characterize the primary triadic structural features that we use in our models, along with our informal appellations for them. The values of the four structural features for a narrative document and relation polarity assignment are simply the number of such configurations in the assignment, and are easily computed. The empirical affinities for such configurations, as reflected in the corresponding weights, can then be learnt from the data.
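Counting triadic configurations for a given assignment can be sketched as below. The sketch groups fully labeled triads by their number of cooperative edges, which yields four sign patterns; identifying these with the four features of Figure 2 is an assumption, since the figure is not reproduced here. Edge keys are assumed to be alphabetically sorted name pairs.

```python
from itertools import combinations

def triad_counts(labels):
    """Count fully labeled triads, keyed by their number of cooperative (+1) edges."""
    counts = {3: 0, 2: 0, 1: 0, 0: 0}
    chars = sorted({c for pair in labels for c in pair})
    for a, b, c in combinations(chars, 3):
        triad = [(a, b), (a, c), (b, c)]
        if all(e in labels for e in triad):
            counts[sum(1 for e in triad if labels[e] == 1)] += 1
    return counts

# Toy assignment: A and T cooperate with each other and with W; both oppose M.
labels = {("A", "M"): -1, ("A", "T"): 1, ("M", "T"): -1,
          ("A", "W"): 1, ("T", "W"): 1}
counts = triad_counts(labels)
```

The toy graph contains one all-cooperative triad (A, T, W) and one triad in which two cooperating characters share an adversary (A, M, T).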
5 Dataset
We processed the CMU Movie Summary corpus, a collection of movie plot summaries from Wikipedia, along with aligned meta-data [3], and set up an online annotation task using BRAT [23]. We use Stanford CoreNLP annotations and basic post-processing to identify personal entities in each text.
Annotators could choose pairs of characters in the text, and characterize a directed relationship between them on an ordinal five-point scale as ‘Hostile’, ‘Adversarial’, ‘Neutral’, ‘Cooperative’ or ‘Friendly’. This resulted in a dataset of 153 movie summaries, consisting of 1044 character relationship annotations. (Most relations were annotated symmetrically. For relations with asymmetric labels, we ‘averaged’ the annotations in the two directions to get the annotation for the relation.) The dataset is made publicly available for research on our project webpage.
For evaluation, we aggregated ‘hostile’ and ‘adversarial’ edge-labels, and ‘friendly’ and ‘cooperative’ edge-labels, to obtain two classes (neutral annotations were sparse, and were ignored in the evaluation). Of these, 58% of the relations were classified as cooperative or friendly, while 42% were hostile or adversarial. The estimated annotator agreement for the collapsed classes on a small subset of the data was 0.95.
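The label processing described above can be sketched as follows. The numeric coding of the five-point scale (from -2 to 2) and the use of the sign of the averaged score to pick the class are assumptions made for illustration; the text only states that asymmetric annotations were averaged and that neutral annotations were ignored.

```python
# Assumed numeric coding of the ordinal five-point scale.
SCALE = {"Hostile": -2, "Adversarial": -1, "Neutral": 0, "Cooperative": 1, "Friendly": 2}

def collapse(directed):
    """Average directed annotations per pair and collapse them to two classes."""
    sums, counts = {}, {}
    for (source, target), label in directed.items():
        key = tuple(sorted((source, target)))
        sums[key] = sums.get(key, 0) + SCALE[label]
        counts[key] = counts.get(key, 0) + 1
    collapsed = {}
    for key, total in sums.items():
        mean = total / counts[key]
        if mean > 0:
            collapsed[key] = "cooperative"
        elif mean < 0:
            collapsed[key] = "adversarial"
        # pairs averaging to neutral are dropped
    return collapsed

annotations = {("A", "T"): "Friendly", ("T", "A"): "Cooperative",
               ("A", "M"): "Hostile", ("T", "W"): "Neutral"}
relations = collapse(annotations)
```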
6 Evaluation and Analysis
In this section, we discuss quantitative and qualitative evaluation of our methods. First, we perform an ablation study to assess the relative importance of families of text-based features. We then perform a comparative evaluation of our methods in recovering gold-standard annotations on a held-out test set of movie summaries. Finally, we qualitatively analyze the performance of the model, and briefly discuss common sources of errors.
Feature ablation: Figure 4 shows the cross-validation performance of major families of text features on the training set. We note that frame-semantic features and adverbial connotations of character actions do not add significantly to model performance. This is perhaps because both these families of features were sparse. Additionally, frame-semantic parses were observed to have frequent errors in frame evocation and frame element assignment. On the other hand, we observe that joint participation in actions (as agent or patient) is a strong indicator of cooperative relationships. In particular, incorporating these (‘Are Team’) features was seen to improve both precision and recall for the cooperative class, while not degrading recall for the non-cooperative class. Further, while ignoring sentiment and connotation features for surrogate action features results in marginal degradation in performance, the most significant features are seen to be sentiment and connotation features for actions where characters occur in SVO roles (‘Acts Together’ features), and overall sentiment characterizations for words and phrases in spans of text between character mentions (span-based ‘Lexical sentiment’ features).
Structured vs unstructured models: We now analyze the performance of our proposed models, and evaluate the significance of adding structural features to our models. In our experiments, we found the structured models to consistently outperform text-based models. We tune the values of the hyper-parameters, i.e., the number of training epochs for the structured perceptron (10), the weighting parameter for the clustering model (0.8), and the number of clusters (2), through cross-validation on the training data. (For representations of movie summaries, we use genre keywords from the movie metadata provided with the dataset, and the average of text-feature vectors for all character pairs.) Table 1 compares the performance of the models on our held-out test set of 27 movie summaries (comprising about 20% of all the annotated character relations). For the structured models, reported results are averages over 10 initializations.
We observe that the structured perceptron model for relations (SPR) outperforms the text-only model trained with logistic regression (LR) by a very large margin. These results are consistent with our cross-validation findings, and vindicate our hypothesis that structural features can significantly improve inference of character relations. Further, we observe that the narrative-specific model slightly outperforms the structured perceptron model.
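Table 1: Performance of the models on the held-out test set.
| Model | Avg P | Avg R | Avg F1 | Acc |
|---|---|---|---|---|
| Naive (majority class) | 0.269 | 0.520 | 0.355 | 0.520 |
| SPR + Narrative types | 0.806 | 0.804 | 0.805 | 0.804 |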
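As a rough sanity check on the Naive row, the sketch below computes class-frequency-weighted precision, recall and F1 for a predictor that always outputs the majority class. Both the weighting scheme and the 52/48 class split of the test set are inferences from the reported accuracy, not details stated in the text; under these assumptions the values land within about 0.002 of the reported ones.

```python
def weighted_prf(gold, pred):
    """Class-frequency-weighted precision, recall, F1, plus accuracy."""
    n = len(gold)
    precision = recall = f1 = 0.0
    for c in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        predicted = sum(1 for p in pred if p == c)
        actual = sum(1 for g in gold if g == c)
        prec_c = tp / predicted if predicted else 0.0  # no predictions: count as 0
        rec_c = tp / actual
        f1_c = 2 * prec_c * rec_c / (prec_c + rec_c) if prec_c + rec_c else 0.0
        weight = actual / n
        precision += weight * prec_c
        recall += weight * rec_c
        f1 += weight * f1_c
    accuracy = sum(1 for g, p in zip(gold, pred) if g == p) / n
    return precision, recall, f1, accuracy

# Assumed test distribution: 52% cooperative, 48% adversarial.
gold = ["cooperative"] * 52 + ["adversarial"] * 48
pred = ["cooperative"] * 100  # majority-class baseline
p, r, f, acc = weighted_prf(gold, pred)
```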
Let us consider the affinities for structural features learnt by the model, taken as the average weights of the four triadic configurations over 10 runs of SPR. From the perspective of structural balance, two of these social configurations are inherently unstable. Hence, the learnt affinity for one of them seems higher than expected. This is unsurprising, however, if we consider the domain of the data (movies), where it might be a common plot element. We also note that the ‘friend of a friend is a friend’ maxim is not supported by the feature weights (even though it is a stable configuration), and hence a model based on this as a hard transitive constraint could be expected to perform poorly.
Cluster analysis: We briefly analyze a particular run of the clustering model. In Figure 5, we plot the overall feature weights for this run (we plot the 8 most significant features from the text model, and the primary structural features). We note that the two clusters are reasonably well delineated, and thus clustering is meaningful. For this run, Cluster 1 appears correlated with higher weights of positive polarity features. Cluster 2 appears less differentiated in terms of structural features than Cluster 1 or the non-clustering structured model.
Qualitative evaluation: We observe that relation characterizations for character pairs are reasonable for most narrative texts in the test set. Figure 6 shows labels inferred by the model for two well-known movies in the test set. Further, analysis of the highest contributing evidence leading to predictions indicated that the model usually provides meaningful justifications for relationship characterization in terms of narrative acts, or implied relations from structural features.
Error analysis revealed that mismatched coreference labelings are the most common source of errors for the model. Secondly, in some cases, the text-based features mistakenly identify negative sentiments due to our coarse model of sentiment. For example, consider the following segment for the movie Smokin’ Aces 2: ‘Baker drags the wounded Redstone to the “spider trap” … used to safeguard people’. Here, the model mistakenly predicts the relation between Redstone and Baker as adversarial because of the negative connotation of ‘drag’, in spite of other structural evidence.
7 Conclusion
We have presented a framework for automatically inferring interpersonal cooperation and conflict in narrative summaries. While our testbed was movie summaries, the framework could potentially apply to other domains of texts with social narratives, such as news stories. Our clustering framework provides a natural approach for such domain adaptation. In the future, the framework could be extended to handle nuanced relation categorizations and asymmetric relationships. Conceptually, a natural extension would be to use predictions about character relations to infer subtle character attributes such as agenda, intentions and goals.
References
[1] A. Agarwal, S. Balasubramanian, J. Zheng, and S. Dash. Parsing screenplays for extracting social networks from movies. EACL 2014, pages 50–58, 2014.
[2] A. Agarwal, A. Kotalwar, and O. Rambow. Automatic extraction of social networks from literary text: A case study on Alice in Wonderland. In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP 2013), 2013.
[3] D. Bamman, B. O’Connor, and N. A. Smith. Learning latent personas of film characters. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, 4-9 August 2013, Sofia, Bulgaria, Volume 1: Long Papers, pages 352–361, 2013.
[4] D. Bamman, T. Underwood, and N. A. Smith. A Bayesian mixed effects model of literary character. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, volume 1, pages 370–379, 2014.
[5] P. Bramsen, M. Escobar-Molano, A. Patel, and R. Alonso. Extracting social power relationships from natural language. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 773–782. Association for Computational Linguistics, 2011.
[6] D. Cartwright and F. Harary. Structural balance: a generalization of Heider’s theory. Psychological Review, 63(5):277, 1956.
[7] N. Chambers. Event schema induction with a probabilistic entity-driven model. In EMNLP, volume 13, pages 1797–1807, 2013.
[8] N. Chambers and D. Jurafsky. Unsupervised learning of narrative event chains. In ACL 2008, Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, June 15-20, 2008, Columbus, Ohio, USA, pages 789–797, 2008.
[9] M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing-Volume 10, pages 1–8. Association for Computational Linguistics, 2002.
[10] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493–2537, 2011.
[11] D. Das, D. Chen, A. F. T. Martins, N. Schneider, and N. A. Smith. Frame-semantic parsing. Computational Linguistics, 40(1):9–56, 2014.
[12] D. K. Elson, N. Dames, and K. R. McKeown. Extracting social networks from literary fiction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 138–147. Association for Computational Linguistics, 2010.
[13] T. Evgeniou and M. Pontil. Regularized multi–task learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 109–117. ACM, 2004.
[14] S. Feng, J. S. Kang, P. Kuznetsova, and Y. Choi. Connotation lexicon: A dash of sentiment beneath the surface meaning. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, 4-9 August 2013, Sofia, Bulgaria, Volume 1: Long Papers, pages 1774–1784, 2013.
[15] M. A. Finlayson. Learning narrative structure from annotated folktales. PhD thesis, Massachusetts Institute of Technology, 2012.
[16] A. Hassan, A. Abu-Jbara, and D. Radev. Extracting signed social networks from text. In Workshop Proceedings of TextGraphs-7 on Graph-based Methods for Natural Language Processing, pages 6–14. Association for Computational Linguistics, 2012.
[17] F. Heider. Attitudes and cognitive organization. The Journal of Psychology, 21(1):107–112, 1946.
[18] L. Huang, S. Fayong, and Y. Guo. Structured perceptron with inexact search. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 142–151. Association for Computational Linguistics, 2012.
[19] V. Krishnan and J. Eisenstein. “You’re Mr. Lebowski, I’m the Dude”: Inducing address term formality in signed social networks. In Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, (to appear in Proceedings), 2015.
[20] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Signed networks in social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1361–1370. ACM, 2010.
[21] B. Liu, M. Hu, and J. Cheng. Opinion observer: analyzing and comparing opinions on the web. In Proceedings of the 14th International Conference on World Wide Web, WWW 2005, Chiba, Japan, May 10-14, 2005, pages 342–351, 2005.
[22] R. C. Schank and R. P. Abelson. Scripts, plans, goals and understanding: An inquiry into human knowledge structures. Hillsdale, NJ: Lawrence Erlbaum, 1977.
[23] P. Stenetorp, S. Pyysalo, G. Topić, T. Ohta, S. Ananiadou, and J. Tsujii. BRAT: a web-based tool for NLP-assisted text annotation. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 102–107. Association for Computational Linguistics, 2012.
[24] J. Valls-Vargas, J. Zhu, and S. Ontañón. Toward automatic role identification in unannotated folk tales. In Tenth Artificial Intelligence and Interactive Digital Entertainment Conference, 2014.
[25] T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In HLT/EMNLP 2005, Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 6-8 October 2005, Vancouver, British Columbia, Canada, 2005.