Text mining techniques aim to extract insights from a text and to discover patterns within it, using different kinds of information from that text. For example, this information can relate to the syntactic structure of the text, to its semantic meaning, or to its affective value, such as whether the text inspires a certain emotion when read.
This is the basis for many research areas, such as sentiment analysis. Sentiment analysis, also called opinion mining, is the field of study that analyses people’s opinions, feelings, assessments, attitudes, and emotions towards entities such as products, services, organizations, individuals, problems, events, topics and their attributes [liu2012sentiment].
Thus, this is applicable to the field of text mining, where these feelings can be inferred using input information derived from the text itself. This is the case of the inference of the General Affective Meaning (GAM) [aryani2016measuring] of the text. The GAM of a text is obtained from information contained directly in it, such as semantic information, the affective information of the individual words that compose it, the type of text and its syntactic characteristics. This initial information provides the features used to represent the text, which serve as input for a function that outputs the corresponding GAM tags. Examples of such features are valence and arousal [russell2003core, tsur1992makes, watson1985toward, wundt1874grundzuge].
These functions may rely on manually annotated GAM tags (supervised) or not (unsupervised). An example of the first case is the use of supervised Machine Learning (ML) algorithms. Here, the GAM of some texts must be known in order to train the supervised ML model on the input features, so that it can then predict the GAM of texts where it is not known.
However, those approaches are not applicable for the unsupervised scenario where there are no GAM tags available.
GAM tags are not the only type of labels that can be used to model the global meaning of a text. Any kind of label can be considered, including words related to the semantic meaning of the text, such as the relation between definitions and their associated words in a dictionary [amigo2018axiomatic].
All the different kinds of information contained within a text (semantic, syntactic, affective…) depend on the type of text considered. Because of this, the approach will be different depending on whether the text is, for example, prose or verse. It will also depend on the language of the texts used.
That said, there are not many corpora available to perform data mining tasks on poetry texts, and even fewer for the Spanish language. There are fewer options still related to GAM and poetry. It is true that there are available corpora for Spanish poetry, such as the corpus DISCO [ruiz2018disco], but the annotations included in it do not provide information that can be used directly for text modelling tasks such as obtaining the GAM mentioned before. The reason is that DISCO includes only metadata about authors, sonnet scansion, rhyme scheme and enjambment. There are, indeed, some previous works that provide a GAM modelling of Spanish poetry.
This is the reason why the present research increases the available corpora for text mining tasks with Spanish poetry by presenting a corpus of Spanish sonnets from different time periods annotated with both affective and semantic labels, in order to contribute to text mining research in both areas. Together with this paper, the article presents DISCO PAL (Diachronic Spanish Sonnet Corpus with Psychological and Affective Labels), a corpus annotated by experts in both literature and digital humanities from POSTDATA (Poetry Standardization and Linked Open Data, Ref. ERC-2015-STG-679528, a Starting Grant project of the European Research Council within the Horizon 2020 programme). According to the POSTDATA project, making “poetry available online as machine-readable data will open a great world of possibilities of linking, indexing and extracting new information”. This corpus includes binary labels for a group of concepts, depending on whether each concept appears within the text or not. The concepts used all belong to the psychological domain.
Overall, the main contributions of this article are:
Define a methodology for unsupervised GAM modelling of a corpus of Spanish sonnets, based on previous works of GAM modelling for poetry in other languages. The proposal uses as data source public corpora with the affective meaning of individual words in Spanish in order to build affective features that infer the GAM of the whole text.
Validate the unsupervised GAM proposal by using a corpus of Spanish sonnets (DISCO PAL) annotated by different domain experts. This corpus contains annotations for the same features generated by the GAM modelling. The annotation values depend on the intensity of that affection within each sonnet.
Analyse how the content influences the generated GAM. For this, the experts also annotate values for labels of psychological concepts that are expressed through each sonnet.
Provide the DISCO PAL corpus for future research, highlighting possible ways to use it for data mining of poetry, mainly through the affective and semantic modelling of texts.
The structure of this article is as follows: after the Introduction presented in this first chapter, the second chapter summarizes the state of the art (SOTA) for the areas relevant to this article. First, it covers the SOTA related to data mining of affective information from poems. Then, it covers the SOTA related to affective modelling of the Spanish language using public corpora for modelling individual words. After that, the third chapter presents the DISCO PAL corpus annotated by POSTDATA experts in digital humanities, analysing the agreement between the annotators and the reliability of the corpus. It also follows the research applied to poetry in other languages in order to build features based on the affective value of individual words (using public corpora for affective word modelling in Spanish) and to see whether those generated features can capture the GAM of the sonnets, by checking the inferred affective values against the ones labelled by the POSTDATA experts. The last chapter mentions the potential lines of research that could be carried out thanks to this corpus. It also includes a summary with the conclusions of this article.
This chapter presents a brief review of the related work relevant to this article. As mentioned in the Introduction, a text contains information related to different areas such as semantics, syntax or affections. This is applicable to any kind of text, including poetic ones. Since this article presents research on affective modelling for Spanish poetry, the main area covered in this chapter is data mining of affective information in poetry, followed by a section describing some public corpora for the affective modelling of individual Spanish words.
Data mining of affective information in poetry
As previously indicated, texts in general, and particularly poetic ones, contain affective information that can be extracted using different techniques, like for instance aggregating the affective values of the individual words present in the text. It is important to quantify this affective contribution of poetic texts in order to be able to work with them computationally. Thus, the task consists in detecting which poetic elements are especially relevant in order to calculate through them the affective contribution of the whole poem. The articles shown below analyse precisely different ways of extracting and quantifying affective aspects from poetic texts.
In order to model the GAM of a poetic text, the poem needs to be expressed through a set of relevant features that are linked to the GAM through a relationship expressed as a mathematical function. From here, there are two possibilities. First, if the GAM value of some poems is known, the purpose of the function may be to generalize the relationship between those values and the features extracted from the poems, in order to infer the GAM of poems where it is not known. This case is approached in [sreeja2018emotion] with the use of supervised ML models. Here, the authors provide a corpus of 736 English poems annotated with 9 affective labels (including love, anger, hate, sadness, joy and surprise), and use it to train an ensemble of supervised ML models. They begin by extracting a set of relevant features from the poems related to semantic, linguistic and orthographic aspects, as well as some statistical features (term frequency and inverse document frequency). They also use poetic features extracted with rule-based methods, which include information related to similes and metaphors. Those features are used together with the annotated affective labels to train ML models that can predict the GAM label for new poems. It is also worth mentioning that the authors state that this article is “the first attempt to identify emotions from English poems”.
A similar approach was considered before for Arabic poetry in [alsharif2013emotion]. Here, the authors first built a corpus of Arabic poetry, annotating the poems with different emotions. Then, they extracted a set of relevant features from the poems based on the occurrences of different words (unigrams) in them. With these feature vectors, they trained different supervised ML models (Support Vector Machines, Naïve Bayes, Voting Features Intervals and Hyperpipes) to predict the emotion labels.
Beyond these supervised proposals, other authors have tackled the problem in an unsupervised manner. In [barros2013automatic], the authors obtain the GAM by counting how many instances of words such as fear or joy appear within a set of Quevedo’s poems. Therefore, no prior annotated values are used to infer the GAM of a poem. The final extracted GAM values are used to automatically annotate that corpus, and the authors provide it with the paper.
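The word-counting idea described above can be illustrated with a minimal sketch. The keyword sets below are hypothetical placeholders chosen for this example; they are not the word lists used in [barros2013automatic], and the rule for picking the dominant emotion is an assumption.

```python
import re
from collections import Counter

# Hypothetical keyword sets, for illustration only. The actual word lists
# used in the cited work are not reproduced here.
EMOTION_KEYWORDS = {
    "fear": {"miedo", "temor"},
    "joy": {"alegría", "gozo"},
}


def count_emotion_words(poem: str) -> Counter:
    """Count how many tokens of the poem fall in each emotion keyword set."""
    tokens = re.findall(r"\w+", poem.lower())
    counts = Counter()
    for emotion, keywords in EMOTION_KEYWORDS.items():
        counts[emotion] = sum(1 for token in tokens if token in keywords)
    return counts


def dominant_emotion(poem: str):
    """Return the emotion with most keyword hits, or None if there are no hits."""
    emotion, hits = count_emotion_words(poem).most_common(1)[0]
    return emotion if hits > 0 else None
```

No annotated poems are needed at any point, which is what makes this kind of approach unsupervised.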
This last paper, in fact, deals with GAM extraction from Spanish poems. However, to the best of our knowledge, there are no other corpora beyond this one. In fact, more recent studies on data mining with poetry, such as [kaur2017punjabi], list only that corpus of Quevedo’s poems, annotated with sentiment labels according to the presence of certain words, as a Spanish corpus source for data mining and GAM modelling.
Features can be obtained by modelling the whole text or by modelling the individual stanzas of the poem. Affective features can be inferred using as input the individual affective values of the words that appear in the text, as long as there is an available corpus that contains those individual affective values, such as BAWL in [ullrich2017relation]. This article performs data mining of affective content in poetic texts in German. It explores how the features of a poetic text (at the sub-lexical, lexical and inter-lexical levels) influence the GAM that is perceived. Thus, this article serves as an example of which affective features are relevant to a text, based on how related they are to the GAM, as well as of how to calculate them. To calculate those features, the authors use the BAWL database, which contains affective values for individual German words, and they aggregate these individual values into a global value that models the whole poem.
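As a minimal sketch of this aggregation step, the example below averages word-level valence values over a poem. The lexicon values and the stopword list are made up for illustration; a real run would load them from a resource such as BAWL or one of the Spanish norms, and the tokenisation is a simplifying assumption.

```python
import re
from statistics import mean

# Toy lexicon with made-up valence values, for illustration only.
TOY_VALENCE = {"amor": 8.0, "muerte": 2.0, "luz": 7.0}
# Tiny hypothetical stopword list.
STOPWORDS = {"el", "la", "y", "de"}


def poem_mean_valence(poem, lexicon=TOY_VALENCE, stopwords=STOPWORDS):
    """Average the valence of the words that are in the lexicon.

    Stopwords and words missing from the lexicon are skipped. Returns None
    when no word of the poem has a known valence.
    """
    tokens = re.findall(r"\w+", poem.lower())
    values = [lexicon[t] for t in tokens if t not in stopwords and t in lexicon]
    return mean(values) if values else None
```

Words that are absent from the lexicon are simply ignored here; how to handle such coverage gaps is a design choice that this sketch does not settle.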
The texts used are a corpus of 57 poems by the German author H.M. Enzensberger. These poems are annotated by a group of readers with the following features:
Score on a 7-point scale for the valence (level of positive or negative affect of the text), where -3 is very negative, 0 neutral and 3 very positive.
Score on a 5-point scale for the arousal (level of excitement of the text of the poem, which ranges from texts that inspire peacefulness or calm to others that seek to motivate or are more exciting), where 1 is very quiet and 5 very exciting.
Score on a scale of 1 to 5 for the level of friendliness, where 1 indicates that the text is not friendly and 5 that it is very friendly.
Score on a scale of 1 to 5 for the level of sadness, where 1 indicates that the text is not sad at all and 5 that it is very sad.
Score from 1 to 5 for the level of malevolence, with 5 being the highest level.
Score from 1 to 5 indicating how much they liked the poem (5 meaning a lot).
Score from 1 to 5 for the level of poeticity, where 5 indicates that the poem is very poetic and 1 that it is barely poetic.
Score from 1 to 5 for the level of onomatopoeia (quantifying the use of this literary resource), where 5 indicates a strong presence.
These global annotations by readers are used to analyse their correlation with different features derived from the individual values of the words that appear within the text, excluding stopwords. The purpose of this study is to check whether the features could serve to predict the GAM of the poem. As mentioned before, the features belong to three different levels: sub-lexical, lexical and inter-lexical. The lexical level captures the average valence and arousal values of the words present in the text, the inter-lexical level quantifies peaks, ranges and changes within the lexical affective content, and the sub-lexical level considers sources such as the phonological information of the poems. All these specifications are used to define 55 affective features across the 3 levels described above. Approximately 50 percent of the explained variance is reached using only the lexical features, and together with the inter-lexical ones the explained variance reaches 75 percent. This indicates that the best predictors are the ones related to these two levels, particularly the average valence and the average arousal derived from the individual words.
Of course, considering only these two features would imply that the order of the words in the text is irrelevant for the affective impact, and that is not the case; the order matters, and experiencing affective crescendos or decrescendos is fundamental, so the span of the arousal level is another key aspect to consider. Together with that, the article also considers how the valence and arousal levels evolve during the poem. This is important because, for example, poems are generally perceived as sadder when the valence of the words decreases progressively (becomes more negative) and when the arousal at the end is lower, and poems are perceived as friendlier when the valence of the last words of the text is more positive. For this reason, it is important to consider the correlation coefficient between the vector of affective values (arousal or valence) of the individual words and the vector of their positions in the text.
We find this article particularly relevant for our study since it presents a thorough methodology for GAM extraction that leads to good results.
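The correlation between affective values and word positions mentioned above can be sketched as a rank correlation. The implementation below is a plain-Python Spearman coefficient (Pearson correlation of average ranks); choosing Spearman here, and the function names, are assumptions made for this illustration rather than a description of the cited article's exact procedure.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; None when either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5


def position_correlation(word_values):
    """Spearman correlation between word-level values and word positions."""
    if len(word_values) < 2:
        return None
    positions = list(range(1, len(word_values) + 1))
    return pearson(average_ranks(word_values), average_ranks(positions))
```

A strongly negative coefficient for valence would correspond to the progressively more negative wording that, according to the discussion above, readers tend to perceive as sadder.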
It is important to remember that poetry is a huge genre with many different styles, which influence the affective modelling and the GAM extracted. This is indicated in the work of [obermeier2013aesthetic], which studies how poetry influences affections through aesthetic and emotional elements such as the meter of the poem and its rhyme. The starting hypothesis is that meter and rhyme have an impact on aesthetic perception, emotional involvement and valence. This would imply that the GAM of a poem differs depending on whether the poem’s style includes meter and rhyme or not. To verify this, the authors analyse the influence of meter and rhyme on the aesthetic and emotional perception of poetry, as well as their interaction with the lexicon, using the stanzas of the poems as references for the study. For that, they work with a group of 60 adults who listened to audio recordings of German poems (100 poems from the 19th and 20th centuries). The poems had stanzas of 4 verses, and there were sets of poems with lexical differences (for instance, real words vs pseudowords; pseudowords were original words modified by keeping the vowels and changing some consonants while ensuring that they were still pronounceable). The poems were also divided depending on whether they had rhyme or not, and on whether they had accent or not. The participants then scored four measures for the poems that they were listening to: liking (aesthetic appreciation), intensity (power of emotional response), perceived emotion (emotion that was expressed within the stanza) and felt emotion (emotion experienced by the participants).
The results are as follows:
Liking: aesthetic ratings were higher for poetry with meter, as well as for stanzas with rhyme, compared to those without.
Intensity rating: for all kinds of poems, ratings were higher for stanzas containing real words than for those with pseudowords.
Perceived emotion: influence of lexicon, meter and rhyme (especially the last two); stanzas with pseudowords score best when they have no meter, compared to those that do. This last difference does not appear for poems with only real words.
Felt emotion: the main influence is rhyme. There is also a three-way interaction between lexicon, meter and rhyme. When there is rhyme, the emotion felt is stronger.
Thus, meter and rhyme reinforce the perceived emotion of a poem, which is expressed through the GAM. This serves as a basis to consider sonnets as good candidates for our study of GAM extraction, since they are structured poems with rhyme and meter. Due to this, we will focus our analyses not only on Spanish poetry, following some of the steps of [ullrich2017relation], but particularly on sonnets, as they always guarantee the metrical structure that enhances the GAM of the text.
As a last comment, however, the literature indicates some caveats and difficulties regarding the affective modelling of poetry. These appear in [eastman2015making], where the authors propose a solution for affective computing in relation to poetry. This article addresses two relevant issues. On the one hand, it points out that poetry makes wide use of metaphors and figurative language (words open to many meanings and interpretations). This makes the extraction of affective information less straightforward than simply assigning to each word a value contained in a repository and then combining all the individual values. Metaphors are also interpreted largely through the subjectivity and personal experience of the reader, so incorporating all the possible information is not trivial. On the other hand, it also mentions that the words of a poetic text should not be understood based only on the text itself: a poem by a given author can be understood in greater depth if it is compared with other poems by that author or with poems by other authors. Therefore, the understanding of a text, and hence the context for its individual words, is best achieved when the words are considered not only within the context of a specific poem or a specific author but within a bigger context that includes poems from other authors. This is important in any text comprehension task, but it is even more critical for poems, where the language is often full of metaphors and other stylistic figures that are not easily understood. Proper comprehension of the text is important for both the semantic and the affective modelling of the poem, which means that the GAM extraction will be influenced by whether or not that bigger context is considered.
These previous works show how GAM extraction for poetry is tackled with both supervised and unsupervised approaches, covering poems from many different languages. However, there are few studies regarding Spanish poetry, with no references to sonnets in particular. While there are works that both analyse and provide an annotated poetry corpus with GAM values for German, Arabic and English texts, there is no equivalent, to the best of our knowledge, for Spanish. Therefore, we find a research need regarding both the GAM extraction process for Spanish poetry and the provision of an annotated corpus for future research. Due to this, we will focus our analyses on Spanish poetry, using sonnets in particular because of their stable structure and the presence of meter and rhyme. We will follow the steps of [ullrich2017relation], since they reach good results in the GAM extraction process while also referring to an annotated corpus. We will extract the GAM for sonnets in an unsupervised manner, and check the quality of those GAM values by comparing the results against their counterpart values annotated by different experts.
Corpora for affective word modelling in Spanish
Just as BAWL, as mentioned in [ullrich2017relation], is a corpus used as source information for the affective modelling of individual words, there are similar corpora for Spanish vocabulary. Some of these corpora are described below. [ferre2017moved] provides 2267 Spanish words (along with their English translation) with the following fields (all the fields have ranges from 1 (minimum) to 5 (maximum)):
Spanish_Word: word in Spanish.
English_Translation: translation of that word into English.
Hap_Mean: average value associated with this feeling (happiness) across the set of raters.
Hap_SD: standard deviation associated with this feeling (happiness) across the set of raters.
Ang_Mean: idem for this feeling (anger).
Ang_SD: idem for this feeling (anger).
Sad_Mean: idem for this feeling (sadness).
Sad_SD: idem for this feeling (sadness).
Fear_Mean: idem for this feeling (fear).
Fear_SD: idem for this feeling (fear).
Disg_Mean: idem for this feeling (disgust).
Disg_SD: idem for this feeling (disgust).
N: number of subjects used in the sample.
[guasch2016spanish] provides 1400 Spanish words with the following fields:
ID: mere auto incremental field
Word: word in Spanish
English Trans.: translation of the words into English
POS: Part of Speech tag for that word
VAL_M: average valence value across the subjects
VAL_SD: standard deviation of the valence across the subjects
VAL_N: number of subjects used to obtain valence values
ARO_M: idem for excitation level
ARO_SD: idem for excitation level
ARO_N: idem for excitation level
CON_M: idem for concreteness
CON_SD: idem for concreteness
CON_N: idem for concreteness
IMA_M: idem for imageability
IMA_SD: idem for imageability
IMA_N: idem for imageability
AVA_M: idem for context availability
AVA_SD: idem for context availability
AVA_N: idem for context availability
FAM_M: idem for familiarity
FAM_SD: idem for familiarity
FAM_N: idem for familiarity
Regarding the concepts used, Concreteness is defined as the degree of specificity of the word, with 1 meaning that the word is very abstract and 7 that it is very concrete. Words like ‘object’ are more abstract than others like ‘table’.
Imageability is defined as the ease or difficulty of constructing a mental image associated with that word, with 1 meaning that the word is very difficult to imagine and 7 that it is very easy. It is easier to imagine something with words like ‘flag’ than with others like ‘charity’.
Context availability is defined as the ease or difficulty of associating that word with a context in which it could appear, with 1 meaning that the word is very difficult to associate with a context and 7 that it is very easy. It is easier to construct sentences or search for examples of usage for words like ‘table’ than for others like ‘citizenship’.
Familiarity is defined as the degree to which the word is familiar, with 1 meaning that the word is not very familiar and 7 that it is very familiar. A word like ‘fish’ is more familiar than another like ‘quark’.
In [stadthagen2017norms] the following fields are collected for 14031 words (all the fields have ranges between 1 (minimum) and 9 (maximum)):
Word: dictionary word.
ValenceMean: average value of the valence given by the different subjects of the analysis.
ArousalMean: average value of the level of excitation inferred from the different subjects of the analysis.
ValenceSD: standard deviation of the valence values given by the different subjects.
ArousalSD: standard deviation of the excitation level values given by the different subjects.
% ValenceRaters: percentage of total subjects that have given a value to the valence.
% ArousalRaters: percentage of the total of subjects that has given a value to the level of excitation.
Finally [alonso2015subjective] describes for 7040 words other characteristics such as the average age at which a word is usually learned (averageAoA), the minimum (Min) and maximum (Max) age and the deviation in these age data (SD), as well as the literary frequency with which it is usually found.
These are, to the best of our knowledge, the main corpora with affective values for Spanish words that can serve as an equivalent to BAWL. We also consider these corpora because the affective values associated with the words were obtained from a general public of different ages, as opposed to more recent corpora like [sabater2020spanish], where the people involved were children and adolescents.
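To show how such word norms could be turned into a lookup table, the sketch below reads the Word, ValenceMean and ArousalMean columns listed above for [stadthagen2017norms] into a dictionary. The comma-separated layout and the sample rows are assumptions made for this example; the distributed file may use a different format, and the numeric values shown are invented.

```python
import csv
import io

# Invented sample rows in an assumed comma-separated layout.
SAMPLE = "Word,ValenceMean,ArousalMean\namor,8.1,6.2\nmuerte,1.9,6.0\n"


def load_lexicon(handle):
    """Map each lower-cased word to its (valence, arousal) mean values."""
    lexicon = {}
    for row in csv.DictReader(handle):
        lexicon[row["Word"].lower()] = (
            float(row["ValenceMean"]),
            float(row["ArousalMean"]),
        )
    return lexicon


lexicon = load_lexicon(io.StringIO(SAMPLE))
```

With a real file, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle.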
Poetry and Psychology
As we mentioned before, poetry contains an affective dimension that may evoke different sentiments, which can be quantified by inferring its GAM. But the affective dimension is not the only one present in a poem. Poems are also a way to express the psychological state of their author, as indicated in [czernianin2016poetry]. This article shows how poetry is used as a way to discharge the mood of its authors. In fact, it analyses several poems to see how some of their content reflects psychological states such as suffering, happiness or hedonism.
Following this, the psychological state of the author is reflected in the poem, and that also evokes a particular psychological state in the reader, as mentioned in [kao2012computational]. Here, the authors mention both how poetry is used as a way to explore and express emotions, and how it causes psychological states such as catharsis in the readers. In fact, [parastoo2016effect] conduct a study that analyses how reading poetry can be used as a therapy to treat psychotic patients. Thus, poetry can influence the reader’s state to the point that it can even be used as a therapy to change or mitigate a particular pernicious psychological state. Complementing this, [shapiro2003can] show how including poetry within a medical student program enhances dimensions such as empathy, altruism, compassion, and caring toward patients.
Therefore, the psychological state of the author and the psychological state evoked in the reader converge in the content of the poem. This content both captures that initial state and serves as a source to evoke it later in the reader.
Thus, it is interesting to know not only what affections and sentiments a poem evokes (captured in the GAM), but also what psychological state the poem evokes, in order to contribute to its usage within all the aforementioned contexts. However, to the best of our knowledge there are no corpora that identify different groups of poems according to the psychological states that they reflect. Due to this, we find a research need in providing an annotated corpus of poems that identifies different subsets according to some psychological states, identified by tags.
Also, since poetry evokes affections and psychological states that are intertwined, it is important to quantify how the GAM changes according to the psychological state represented in its content.
The proposed methodology consists of inferring the GAM of a sonnet based on the individual contribution of its words, and then validating it against the expert annotations. Thus, we define an unsupervised approach to build the GAM and then we use domain knowledge to check it.
This Chapter first introduces the corpus included in this paper, Diachronic Spanish Sonnet Corpus with Psychological and Affective Labels, DISCO PAL. We begin by presenting the participants that annotated the corpus, and after that we will describe the corpus itself. We conclude introducing the methodology used, which includes the input data sources and the features built around them.
As mentioned before, the labels were annotated by three experts in digital humanities, literature and linguistics, belonging to the POSTDATA project.
DISCO PAL is a subset of a larger corpus, DISCO [ruiz2018disco], which consists of 4085 sonnets in Spanish from the 15th to the 19th century. From that corpus, in order to create DISCO PAL, the experts of POSTDATA have annotated a subset of 230 sonnets, with 167 belonging to the 19th century, 9 belonging to the 18th century and the other 72 belonging to the interval from the 15th to the 17th century. This is a relevant fact to consider because some sonnets are written in old Spanish, something that can significantly affect all the text mining analyses applied to the poems. Also, the number of authors is 47. Additionally, since the number of sonnets is 230, much bigger than 30, there are enough sonnets to support a statistically meaningful analysis. With that, the corpus provided is very rich, with many different authors belonging to different centuries, in line with the proposals of the scientific literature [eastman2015making].
There are two types of annotated features: affective and psychological. Affective features are detailed in Table 1 and have a range of 1 to 4, with 1 being the minimum value (the sonnet does not inspire that affection very much) and 4 the maximum (the sonnet does inspire that affection very much). The scale only uses integer values. Psychological features are binary values and indicate whether the sonnet is related to that concept (1) or not (0). These features are described in Table 9.
|Anger||Concreteness (same as ‘Concreteness’ defined in the SOTA)|
|Sadness||Imageability (same as ‘Imageability’ defined in the SOTA)|
|Fear||Context availability (same as ‘Context availability’ defined in the SOTA)|
The features mentioned before were annotated by three different domain experts belonging to the POSTDATA project. Each of those experts individually annotated the same 230 sonnets for all those features.
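Since the three experts annotate the same sonnets, their labels can be compared pair by pair. The sketch below computes a simple pairwise percent agreement on toy data; this measure, the annotator names and the values are assumptions chosen for illustration, and they do not describe the agreement statistic actually used for DISCO PAL.

```python
from itertools import combinations


def pairwise_agreement(annotations):
    """Fraction of identical labels for every pair of annotators.

    `annotations` maps an annotator name to a list of labels, where all
    lists follow the same sonnet order.
    """
    results = {}
    for a, b in combinations(sorted(annotations), 2):
        labels_a, labels_b = annotations[a], annotations[b]
        if len(labels_a) != len(labels_b):
            raise ValueError("annotators must label the same sonnets")
        matches = sum(1 for x, y in zip(labels_a, labels_b) if x == y)
        results[(a, b)] = matches / len(labels_a)
    return results


# Toy labels on the 1 to 4 scale for four sonnets (invented values).
toy = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 1], "C": [1, 2, 2, 1]}
agreement = pairwise_agreement(toy)
```

Percent agreement does not correct for chance, so a chance-corrected coefficient may be preferable for ordinal scales like this one.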
Regarding the psychological features, they were chosen considering their relevance in the literature [francotmf2016owl]. All these annotations make it possible to calculate different metrics (such as precision) in the retrieval of poems. The experts annotated the sonnets independently (without knowing the annotations from the other experts) and following the same sonnet order. The experts knew neither the author nor the time period of the different sonnets; they only had access to the text itself. This was done in order to mitigate bias in their judgement. They used a csv file with rows containing the sonnet texts and columns with the different variables. Each of them assigned a value within the available range in the corresponding cells.
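As an illustration of the retrieval metrics mentioned above, the following sketch computes precision for a set of retrieved sonnets against binary gold labels for one psychological concept. The sonnet identifiers and labels are invented for this example.

```python
def precision(retrieved_ids, gold_labels):
    """Share of retrieved sonnets whose gold label for the concept is 1.

    `gold_labels` maps a sonnet identifier to 1 (concept present) or 0.
    Returns None when nothing was retrieved.
    """
    if not retrieved_ids:
        return None
    hits = sum(1 for sonnet_id in retrieved_ids if gold_labels.get(sonnet_id, 0) == 1)
    return hits / len(retrieved_ids)


# Invented gold labels for a single concept.
gold = {"s1": 1, "s2": 0, "s3": 1, "s4": 0}
```

For instance, retrieving `["s1", "s2", "s3"]` with these labels gives two relevant sonnets out of three.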
The methodology used is divided in two parts. First, we build the GAM values from the individual words of the sonnets.
The corpora used to assign affective values to the individual Spanish words were some of the ones already introduced in the Related Work chapter. We use as input corpora [ferre2017moved], [guasch2016spanish] and [stadthagen2017norms].
We then use those feature values at a word level to build different GAM features for the whole sonnet, aggregating those individual values. Thus, we have:
ValenceMean: Mean of the ValenceMean values for the individual words.
ValenceSD: Standard deviation of the ValenceSD values for the individual words.
ArousalMean: Mean of the ArousalMean values for the individual words.
ArousalSD: Standard deviation of the ArousalSD values for the individual words.
Hap_Mean: Mean of the Hap_Mean values for the individual words.
Hap_SD: Standard deviation of the Hap_SD values for the individual words.
Ang_Mean: Mean of the Ang_Mean values for the individual words.
Ang_SD: Standard deviation of the Ang_SD values for the individual words.
Sad_Mean: Mean of the Sad_Mean values for the individual words.
Sad_SD: Standard deviation of the Sad_SD values for the individual words.
Fear_Mean: Mean of the Fear_Mean values for the individual words.
Fear_SD: Standard deviation of the Fear_SD values for the individual words.
Disg_Mean: Mean of the Disg_Mean values for the individual words.
Disg_SD: Standard deviation of the Disg_SD values for the individual words.
VAL_M: Mean of the VAL_M values for the individual words.
VAL_SD: Standard deviation of the VAL_SD values for the individual words.
ARO_M: Mean of the ARO_M values for the individual words.
ARO_SD: Standard deviation of the ARO_SD values for the individual words.
CON_M: Mean of the CON_M values for the individual words.
CON_SD: Standard deviation of the CON_SD values for the individual words.
IMA_M: Mean of the IMA_M values for the individual words.
IMA_SD: Standard deviation of the IMA_SD values for the individual words.
AVA_M: Mean of the AVA_M values for the individual words.
AVA_SD: Standard deviation of the AVA_SD values for the individual words.
MaxAro: Maximum value of arousal.
MinAro: Minimum value of arousal.
MaxVal: Maximum value of valence.
MinVal: Minimum value of valence.
ValenceSpan: Difference between MaxVal and MinVal.
ArousalSpan: Difference between MaxAro and MinAro.
CorAro: Spearman correlation between the arousal values of the words and their position in the sonnet.
CorVal: Spearman correlation between the valence values of the words and their position in the sonnet.
SigmaAro: with N the number of words in the sonnet.
SigmaVal: with N the number of words in the sonnet.
Here, Ang, Sad, Disg, ARO, VAL, CON, IMA and AVA correspond to anger, sadness, disgust, arousal, valence, concreteness, imageability and context availability, respectively. There are two possible valence and arousal features since they can be inferred from two of the available corpora. We will use both and focus the analysis on the one that yields the best results.
Then, we define the features to be annotated in the DISCO PAL sonnet corpus in order to analyse later on the quality of the GAM features. These features are the ones described in the Materials section. Thus, we will compare every feature associated with anger, sadness, disgust, arousal, valence, concreteness, and imageability to the value annotated by the experts. We will also analyse these comparisons considering the psychological tags. For that, we will consider separately the sonnets that belong to a particular psychological tag and analyse the aggregated value of the word features against the annotated values of the affective features for that subset of sonnets only.
The evaluation steps are the following ones:
Study the reliability of the DISCO PAL corpus annotated by the POSTDATA experts. Here, we check the level of agreement between the annotators in order to see whether the discrepancies between them are significant. If the level of agreement is sufficient, proceed to the next point.
Analyse the relationship between the values annotated by the experts and the ones obtained through the GAM inference methodology shown before. This analysis is carried out at three levels.
First, we analyse the bivariate correlation between the inferred feature values and the annotated ones for those same features, checking whether they are above a minimum threshold. The literature [schober2018correlation] gives as a basic reference [0.1-0.39] for a weak correlation, [0.4-0.69] for a moderate correlation, [0.7-0.89] for a strong correlation, and >0.9 for a very strong correlation.
Then, we analyse the partial correlation between those same inferred features and the annotated ones. This is done by building a regression model with the GAM features as independent variables and one label at a time as the dependent variable, and then analysing the significance of the p-value for the inferred feature, the r-squared value, and the feature coefficient.
Each of these analyses is done considering the whole annotated DISCO PAL corpus, as well as separately for each psychological tag, in order to see if the results differ significantly. We will use the median values between the results of every psychologically tagged subset as a reference for the comparisons.
Finally, in order to analyse differences in the GAM depending on the psychological tag, we will perform a One-way ANOVA hypothesis contrast. Here, we check if there are significant differences in the mean value of each GAM between the subset of sonnets with a psychological label equal to 1, and the ones with that label at 0. There will be differences if the p-value is less than 0.05.
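As an illustration of the last step, the following sketch computes the F statistic of a one-way ANOVA for a set of groups, here the GAM values of sonnets with a tag at 1 versus those with the tag at 0. The function name is hypothetical and the sketch stops at the F statistic and its degrees of freedom; obtaining the p-value requires the F distribution, for which a statistics library (for instance `scipy.stats.f_oneway`) would be the usual choice.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of each value around its own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variation")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy values only, not taken from DISCO PAL.
tag_present = [1, 2, 3]
tag_absent = [2, 3, 4]
print(one_way_anova_f([tag_present, tag_absent]))
```

With only two groups, as in the comparison between label 1 and label 0, this test is equivalent to a two-sample t-test with pooled variance.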
The materials included in this article are three csv files with the annotations made by the experts, as well as a csv file with metadata information about the annotated sonnets. This metadata csv is included in order to allow the reference between the DISCO PAL and the original source DISCO. The fields included in the metadata csv are:
author: author of the sonnet.
year: year or century of publication.
title: title of the sonnet.
id_sonnet: unique id used by DISCO for that sonnet.
file_path: file name path to that sonnet in the per-sonnet folder in DISCO.
All data provided is located at [barbado2019pal].
Reliability and validity of DISCO PAL corpus
A first approach to study the reliability and validity of DISCO PAL is to analyse the agreement between the three annotators. This is accomplished by computing Krippendorff's alpha [krippendorff2011computing], or k-alpha, over the annotations made by the three experts for each of the variables.
K-alpha is a metric that generalizes other metrics for quantifying the reliability between annotators (inter-rater reliability). It works for both ordinal and nominal annotations, as well as with any number of annotators. K-alpha takes a value of at most 1, where 1 represents full agreement and 0 corresponds to the agreement expected by chance; negative values indicate disagreement beyond chance. However, there are different criteria regarding when to consider that there is agreement between scorers. Sometimes strict criteria are used, in which expert annotations are accepted as truly valid only if there is a k-alpha of at least 0.8 [carletta1996assessing]. Other laxer criteria set the minimum at 0.21, defining the following thresholds [landis1977measurement]:
: Very low
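To make the metric concrete, here is a minimal sketch of k-alpha for nominal labels, built from the coincidence matrix of the annotations. It is an illustration under stated assumptions rather than the computation used for the tables: the function name is hypothetical, the nominal distance (labels either match or not) is assumed, which suits the binary psychological tags, and the ordinal variant that would fit the 1 to 4 affective scale is not covered.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Compute k-alpha for nominal data.

    `units` holds one list of labels per annotated item (one label per
    annotator); `None` marks a missing annotation.
    """
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # an item rated only once carries no agreement information
        # Every ordered pair of ratings from different annotators counts 1/(m-1).
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("k-alpha is undefined when only one label is used")
    return 1 - (n - 1) * observed / expected
```

For example, three annotators who always coincide give a value of 1, while two annotators who systematically contradict each other on two items give a negative value.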
The k-alpha results considering the three annotators together are shown in Table 2. This shows the following information.
There is acceptable agreement (k-alpha of at least 0.21) for the following variables considering the annotations of the 3 experts: ’Anxiety’, ’Aversion’, ’Depression’, ’Disappointment’, ’Dramatización’, ’Illusion’, ’Helplessness’, ’Inestability’, ’Insecurity’, ’Anger’, ’Obsesión’, ’Pride’, ’Prejudice’, ’Temor’, ’Vulnerability’, ’concreteness’, ’context availability’. This corresponds to 54.84% of the total existing variables. It should be mentioned that all the terms that pass this check correspond to the set of psychological terms.
There is only slight agreement (k-alpha below 0.21) for the other variables considering the annotations of the 3 experts. In some cases k-alpha is close to 0.21 although somewhat lower, something that occurs especially for the psychological terms. The greatest discrepancies are seen in the variables arousal (0.0912), happiness (0.0279) and valence (0.0051). In addition to the joint analysis of the scorers, Tables 3, 4 and 5 include comparisons between scorers (one versus one) to check which ones gave the greatest discrepancy.
Tables 3, 4 and 5 help to see how k-alpha values change significantly according to which pair of annotators are compared. Again, the validation of variables is done for those with k-alpha greater than 0.21. In particular, the following is checked:
For annotators 1 and 2 (Table 4) only the following variables are validated, representing 25.81% of the total variables: ’Anxiety’, ’Aversion’, ’Dramatización’, ’Inestability’, ’Insecurity’, ’Obsesión’, ’Temor’, ’Vulnerability’.
For annotators 1 and 3 (Table 5), only one variable is validated, which represents 3.23% of the total variables: ’Illusion’.
For annotators 2 and 3 the following variables are validated, representing 77.42% of the total variables (that includes all psychological variables): ’Anxiety’, ’Aversion’, ’Compulsion’, ’Depression ’, ’Disappointment’, ’Dramatización’, ’Daydream’, ’Grandeur’, ’Idealization’, ’Illusion’, ’Helplessness’, ’Inestability’, ’Insecurity’, ’Anger’, ’Irritability’, ’Obsesión’, ’Pride’, ’Prejudice’, ’Solitude’, ’Temor’, ’Vulnerability’, ’anger’, ’disgust’, ’fear’.
For annotators 2 and 3 the k-alpha is higher for most of the variables than in the previous analysis (including all the psychological terms). However, it is especially low (including one negative value) for the following variables: ‘happiness’, ‘valence’.
The biggest discrepancies involve annotator 1, especially between annotators 1 and 3, where only one variable has a k-alpha greater than 0.21: ‘Illusion’.
In order to conduct further analyses, the three annotation sets should be combined into a single label vector. We propose using the median of the labels given by the three experts. In that way, if two annotators agree and the third one differs, the final value is the one given by the majority.
This median value will act as a proxy "annotator" that agrees with the three experts. Indeed, as shown in Table 12, its agreement with each annotator is very high.
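The median combination can be sketched in a few lines. The function name and the toy labels are hypothetical; the only assumption is that the three annotation vectors follow the same sonnet order, as described in the Materials section.

```python
from statistics import median


def combine_by_median(annotator_1, annotator_2, annotator_3):
    """Combine three aligned label vectors into one using the per-item median."""
    if not (len(annotator_1) == len(annotator_2) == len(annotator_3)):
        raise ValueError("the three annotation vectors must have the same length")
    return [median(triple) for triple in zip(annotator_1, annotator_2, annotator_3)]


# Toy labels only: when two annotators agree, their value is kept.
print(combine_by_median([1, 4, 2], [1, 3, 2], [0, 1, 4]))
```

With three annotators the median is always one of the submitted labels, so the combined vector stays on the original 0/1 or 1 to 4 scales.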
Analysis of DISCO PAL corpus for individual affective word modelling
As mentioned previously, the corpus consists of 4085 sonnets in the Castilian language from the 15th to the 19th century, collected from the DISCO corpus from POSTDATA (UNED), which have been annotated with specific affective features inspired by the literature, in particular [ullrich2017relation].
That article describes work with the BAWL corpus, which contains 6000 German words, and how the words available in the poems are used to associate feature values with individual words. To increase the number of words that match the entries in these tables, the words of the poems are lemmatized and stopwords are removed. In this way, it is possible to find the affective value for 90% of the words that appear in the poems; the remaining 10% do not appear in these tables mainly because they are proper names. The next point to consider is therefore how many words of the available Spanish sonnets appear in the tables used.
In the case of the DISCO PAL corpus, we analyse what percentage of the corpus words are present in each of the source corpora proposed to obtain the affective features.
Tables 6, 7 and 11 show the words of the DISCO PAL corpus that match the ones in the different source corpora. Several scenarios are proposed, in which we show the results for the original words of both the DISCO PAL corpus and the source corpora, as well as for the words after applying lemmatization and stemming (with the SnowBall stemming algorithm [porter2001snowball]). It can be seen that using lemmatization or stemming increases, as expected, the number of words from the sonnets present in the source corpora. Since there is not a large difference between the lemmatization and stemming percentages, the analyses carried out in this paper use lemmatized words. The lemmatization and stemming scenarios also include the removal of stopwords.
|categories||n words||n words lem||n words stem|
However, it is worth mentioning that the percentages are not high for many of the source corpora, quite different from the 90% matching rate reported in [ullrich2017relation]. Table 8 shows the top 9 most common words from the DISCO PAL corpus that are missing in all of the source corpora.
|Words||Number of occurrences|
As shown in Table 8, most of the common missing words are archaic verbs that are not frequently used (e.g. ’Airar’, ’Osar’, ’Porfiar’…). They also include some proper nouns (e.g. ’Apolo’).
This scenario will probably hinder the GAM results in comparison to [ullrich2017relation], since there are more absent words in the source corpora, even after removing stopwords and performing lemmatization.
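The coverage comparison discussed above can be sketched as follows. This is only an illustration: the function name is hypothetical, the toy lemma map stands in for a real Spanish lemmatizer or stemmer, and the sketch assumes coverage is measured over distinct words rather than over word occurrences, which the text does not specify.

```python
def coverage(sonnet_words, lexicon_words, normalise=lambda w: w, stopwords=frozenset()):
    """Percentage of distinct sonnet words that also appear in a source corpus.

    `normalise` is applied to both vocabularies (identity, lemmatizer or
    stemmer); stopwords are dropped from the sonnet side before matching.
    """
    sonnet_vocab = {
        normalise(w.lower()) for w in sonnet_words if w.lower() not in stopwords
    }
    lexicon_vocab = {normalise(w.lower()) for w in lexicon_words}
    if not sonnet_vocab:
        return 0.0
    return 100 * len(sonnet_vocab & lexicon_vocab) / len(sonnet_vocab)


# Toy data only: a hypothetical lemma map standing in for a real lemmatizer.
toy_lemmas = {"amores": "amor"}
words = ["amor", "amores", "osar", "el"]
lexicon = ["amor", "cielo"]

raw = coverage(words, lexicon, stopwords={"el"})
lemmatized = coverage(
    words, lexicon, normalise=lambda w: toy_lemmas.get(w, w), stopwords={"el"}
)
print(round(raw, 2), round(lemmatized, 2))
```

In this toy case the archaic verb "osar" stays unmatched in both scenarios, mirroring the kind of missing words listed in Table 8.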
Following a similar approach to [ullrich2017relation], the source corpora mentioned are going to be used as an input source in order to infer the GAM value of the sonnets. The results are going to be validated against the labels annotated by the POSTDATA experts.
As mentioned in the previous section, the evaluation is carried out against the median value derived from the three annotators. The analyses consist of aggregating, for each sonnet, the individual values that its words have in the source corpora. The words of those sonnets are lemmatized and stopwords are removed. Since several words in the source corpora can share the same lemma (e.g. "bees" and "bee" become the same word after lemmatization), the final value assigned to a lemma is the average over all the words with that lemma.
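The averaging of source-corpus entries that share a lemma can be sketched as below. The function name, the toy values and the dictionary-based lemmatizer are hypothetical; in practice `lemmatize` would be whatever Spanish lemmatizer is applied to the sonnets.

```python
from collections import defaultdict
from statistics import mean


def average_by_lemma(word_values, lemmatize):
    """Collapse a {word: value} lexicon into {lemma: mean value}."""
    grouped = defaultdict(list)
    for word, value in word_values.items():
        grouped[lemmatize(word)].append(value)
    return {lemma: mean(values) for lemma, values in grouped.items()}


# Toy values only, reusing the "bee"/"bees" example from the text.
toy_lemmas = {"bees": "bee"}
lexicon = {"bee": 6.0, "bees": 5.0, "hive": 4.0}
print(average_by_lemma(lexicon, lambda w: toy_lemmas.get(w, w)))
```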
In order to analyse the GAM value inferred from individual words, this paper first studies the results over the whole DISCO PAL corpus. Then, we obtain the GAM for the subset of sonnets that belong to each psychological tag to check how it affects the GAM obtained. However, those subsets need to have at least 30 sonnets for the results to be statistically meaningful. Table 10 shows the number of sonnets from the DISCO PAL corpus that are tagged with each psychological category. Since "Orgullo" and "Prejuicio" have too few sonnets annotated, we will not use them in the analyses.
With that, Table 13 shows the correlation between the aggregated features and their annotated counterparts, considering only features that have at least one case of significant bivariate correlation (a correlation is considered significant if it is above 0.3 or below -0.3). To compute the correlations we use the Spearman correlation [croux2010influence].
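The two criteria used for the bivariate correlations, the 0.3 cut-off and the reference bands from [schober2018correlation] quoted in the evaluation steps, can be written as a small helper. The function names are hypothetical. The label "negligible" for values below 0.1 and the treatment of values that fall exactly on a band boundary are assumptions, since the quoted ranges leave both unspecified.

```python
def correlation_strength(rho):
    """Map a correlation coefficient to the reference bands quoted in the text."""
    size = abs(rho)
    if size >= 0.9:
        return "very strong"
    if size >= 0.7:
        return "strong"
    if size >= 0.4:
        return "moderate"
    if size >= 0.1:
        return "weak"
    return "negligible"  # assumption: label for values below the weak band


def passes_threshold(rho, threshold=0.3):
    """True when the correlation is above the threshold or below its negative."""
    return abs(rho) > threshold


# The 0.65 and 0.54 values are the ones reported for [ullrich2017relation].
for rho in (0.65, 0.54, -0.31, 0.05):
    print(rho, correlation_strength(rho), passes_threshold(rho))
```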
The bivariate correlations results for the remaining features mentioned previously are included in Figures 1 and 3, comparing their correlation with the different source corpora that can be used to build them. They did not provide strong results, and because of that, they are not considered in the partial dependence analysis. Those Figures, together with 1 contain the full correlation matrices for all the features corresponding to each one of the three source corpora used.
The partial dependence results show the relationship between each GAM feature and its counterpart annotated by the experts. As mentioned before, a linear regression model is trained over all sonnets, using all GAM features as independent variables and one of the annotated features as the dependent variable. Then, we take the p-value of the corresponding GAM feature and check whether it is below a threshold of 0.1 (meaning it is significant). We also check that the coefficient of that feature is positive (a negative coefficient would mean that, even if the model is fitted properly, the relationship between both features is not coherent). Finally, we check the adjusted r-squared value in order to see if the model is well fitted.
Regarding all sonnets, the features arousal, fear, happiness, imageability, sadness and valence have significant results, all of them with a p-value below 0.1 and a coefficient higher than 0. The model itself also has a good adjusted r-squared for every one of those features. However, for the analysis of the features per psychological tag, the relationships are poorer. The only tags with relevant features are Aversión (sadness, valence), Compulsión (imageability), Depresión (happiness), Dramatización (imageability, sadness), Grandiosidad (fear), Idealización (anger, arousal), Ira (valence), Temor (anger, sadness) and Vulnerabilidad (sadness).
This yields two conclusions. First, there are some features whose GAM values seem sufficiently related to their annotated counterparts when considering the whole DISCO PAL corpus, mainly Fear, Happiness, Sadness and Arousal. Second, the same holds for some subsets of psychological tags. In particular, for Aversión, both Sadness and Valence have good bivariate correlations and also pass the partial dependence check. The same applies to Imageability and Sadness for Dramatización, Fear for Grandiosidad, Anger for Idealización, Anger and Sadness for Temor, and Sadness for Vulnerabilidad. This shows that Sadness is a particularly robust feature.
Combining these insights with the information about the level of agreement in Table 2, Fear and Sadness are the features with the highest agreement (though it is not very high). Regarding the results for each psychological tag, Aversión, Vulnerabilidad and Dramatización have an agreement level higher than 0.21. Taking this together with the other k-alpha values, we can consider the results for Fear, Anger and Sadness acceptable, whereas for Valence the level of discrepancy was too high (the k-alpha for the three annotators is almost 0).
If we compare the results of our GAM extraction process against the ones in [ullrich2017relation], we need to focus on the subset of features that appear in both papers. Those features are Valence and Arousal. Thus, we can compare the annotated GAM value for those features against their inferred counterparts, as well as against other features related to them, like CorAro or ValenceSpan. For Valence, the bivariate correlation of the inferred value in [ullrich2017relation] is 0.65. For Arousal it is 0.54. In both cases the partial correlation analysis shows statistical significance when using those inferred features as predictors for the annotated one. However, as we can see in 2 and 3 for the inferred Valence and Arousal depending on the source corpus used, in neither case do we reach good enough bivariate correlation values. Thus, since we do not reach significant results for these main features, we will not include a comparison for the remaining features derived from Valence and Arousal (such as CorAro or ValenceSpan).
Finally, we analyse whether there are significant differences in the GAM (using the annotated value) between subsets depending on whether they refer to a specific psychological label or not. We perform a One-way ANOVA hypothesis contrast for each combination of GAM feature and psychological label. The results for the combinations that had p-values below 0.05 are included in Table 17. That table also includes the mean value of the GAM for the sonnets annotated with that psychological tag, M (=1), and for the other ones, M (=0). As we can see, among the 190 possible combinations, 96 yielded significant differences in the GAM depending on the subset considered.
Limitations of our Approach
The main limitation of our proposal is that the analysis is applied only to a group of sonnets in Castilian from the 15th to the 19th century. Those sonnets contain many archaic words, which limits their presence in the corpora used to assign values to individual words. In fact, the ratio of sonnet words found in those corpora, as already mentioned, is lower than in other analyses in the literature, influenced in part by this aspect.
Also, though there is an acceptable agreement between the annotators for most of the features, that agreement is not extremely high. This is something that also influences the results obtained.
Finally, there could possibly be a bias due to the fact that the expert annotators have a profile specialised in digital humanities. If the annotators were experts in psychology, for instance, the results may differ.
Conclusion and Future Work
This chapter concludes with a final reflection based on the results of the analyses carried out as well as indicating possible lines of research that can be pursued.
This article presents a methodology to infer GAM feature values for Spanish poetry, using available corpora that contain feature values for individual words. This GAM methodology is unsupervised, needing no prior information about the sonnets themselves.
The proposal is evaluated using a subset of sonnets annotated by domain experts. This article includes a corpus of 230 sonnets with annotated features. The sonnets are from Spanish authors from different time periods (from the 15th to the 19th century). These sonnets are annotated using both affective features, which indicate the intensity level of that affection within the sonnet, and concepts that belong to the psychological domain, indicating whether the content of a sonnet is related to that concept or not. The features were annotated by three domain experts who belong to the POSTDATA project (UNED). Those experts annotated the sonnets independently (without knowing the annotations from the other experts) and following the same sonnet order. The experts knew neither the author nor the time period of the sonnets; they only had access to the text itself. This article also analyses the level of agreement of the features annotated by the three experts. The result is that around 54% of the features have an adequate agreement, all of them belonging to the psychological labels. This analysis was made comparing the three annotators together. The article proposes using the median value of the three annotators, since two of them are similar and it is the third one who introduces most of the disagreement. This median vector reaches an acceptable agreement for almost all the variables and all the annotators.
Using the median vector, we validate that it is feasible to build GAM features for the whole sonnet from its individual words, since some of them do have a significant bivariate correlation and partial dependence with their annotated counterparts, according to their Spearman correlation and their p-value contrast respectively. Here, the results are especially good for the arousal, fear, happiness, imageability, sadness and valence features.
Finally, after considering results for all the sonnets together, we have also analysed the results for different subsets according to their psychological tag. This was done by performing a One-Way ANOVA hypothesis contrast for the feature values of the subgroup of sonnets with a particular tag against the values of those same features belonging to the remaining sonnets. Thanks to this, we saw how depending on whether the sonnet evokes a specific psychological state, the GAM values differ significantly.
This subsection details the possible lines of research that can be pursued following the results presented in this article. There are two main groups of research lines considered at this point. One is related to improving the quality of the data involved in the GAM methodology, and the other is related to the applications of the DISCO PAL corpus.
Related to the data quality research areas, there are two fields of improvement. First, all the source corpora used for the feature values of the individual words lack many archaic words that are present in the sonnets. It would be useful to enrich those corpora with these missing words in order to check if there is an improvement over the results shown in this paper. Second, as shown in the agreement analysis between annotators, there are some discrepancies in the values assigned for the features, something that potentially affected the results obtained in this paper. Though we proposed using the median value and this yielded robust results for some features, it would be interesting to see other proposals to combine those annotations and mitigate the differences.
Regarding the usage of the DISCO PAL corpus itself, there are two possible approaches. First, there are research lines that can be pursued related to the psychological tags provided. As we mentioned before, to the best of our knowledge there are no poetry corpora that include annotations regarding psychological states evoked by the poems. This article therefore provides a curated corpus (DISCO PAL) that may help research on the use of poetry for therapeutic purposes.
The other line is related to the affective modelling of poetry. DISCO PAL includes 10 affective labels that can be used to study how to infer the GAM of a Spanish sonnet. This could be accomplished by using ML models that predict the GAM labels based on the semantic vector of the sonnet, or it could also be done by using the corpora for the individual affective value of words, described previously, trying to map the values of all the individual words of a poem to the global GAM annotated.
Author note: This work was possible thanks to the POSTDATA project, and particularly because of Salvador Ros Muñoz, Laura Alises, Marie Olivier and Aroa Rabdán.
|Soledad (Solitude)||Ansiedad (Anxiety)|
|Ilusión (Illusion)||Ira (Anger/Wrath)|
|Ensoñación (Daydream)||Inestabilidad (Inestability)|
|Grandiosidad (Grandeur)||Idealización (Idealization)|
|Orgullo (Pride)||Depresión (Depression)|
|Irritabilidad (Irritability)||Desilusión (Disappointment)|
|Irritabilidad (Irritability)||Prejuicio (Prejudice)|
|Aversión (Aversion/Loathing)||Inseguridad (Insecurity)|
|Impotencia (Helplessness)||Vulnerabilidad (Vulnerability)|
|Temor (Fear)||Obsesión (Obsession)|
|Category||Number of sonnets|
|categories||[ferre2017moved] lem||[stadthagen2017norms] lem||[guasch2016spanish] lem||[ferre2017moved] stem||[stadthagen2017norms] stem||[guasch2016spanish] stem|
|Feature||k-alpha 1||k-alpha 2||k-alpha 3|
|all||context availability||0.86||0.76||-0.01||no||Compulsión||context availability||0.98||1||0||no|
|Ansiedad||context availability||0.91||0.5||0.05||no||Depresión||context availability||1||0.26||-0.18||no|
|Aversión||context availability||0.9||0.21||0.08||no||Desilusión||context availability||0.98||0.04||-0.15||no|
|Dramatización||context availability||0.9||0.96||0||no||Idealización||context availability||0.86||0.72||0.02||no|
|Ensoñación||context availability||0.94||0.69||-0.09||no||Ilusión||context availability||0.95||0.66||0.03||no|
|Grandiosidad||context availability||0.88||0.27||0.08||no||Impotencia||context availability||0.94||0.81||-0.01||no|
|Inestabilidad||context availability||0.97||0.41||0.07||no||Irritabilidad||context availability||1||0.67||-0.11||no|
|Inseguridad||context availability||0.98||0.13||0.27||no||Soledad||context availability||0.96||0.81||-0.01||no|
|GAM||Psycho. Tag||M (=0)||M (=1)||p||GAM||Psycho. Tag||M (=0)||M (=1)||p|