Automatic text-based personality recognition, an important topic in computational psycholinguistics, aims to determine one's personality traits from text. The Big Five model is commonly used to measure personality along five binary traits: agreeableness (AGR), conscientiousness (CON), extraversion (EXT), openness (OPN), and neuroticism (NEU). Recently, Majumder et al. (2017) used Convolutional Neural Networks (CNN) with static word embeddings and outperformed previous feature-based systems on the Essays dataset. However, prior work has neither explored dialogue data nor used attentive networks and contextual embeddings such as BERT and RoBERTa for this task.
To address these issues, we create the first dialogue dataset, FriendsPersona, for automatic personality recognition with a novel and scalable dialogue extraction algorithm, MainSpeakerFinder (MSF). In addition, we introduce both attentive networks and contextual embeddings (BERT and RoBERTa) to the task. We not only outperform previous models on the benchmark Essays dataset but also achieve strong baseline results on our new FriendsPersona dataset.
We focus on the Essays and FriendsPersona datasets. The Essays dataset is the benchmark dataset for text-based personality recognition, with 2,468 self-report essays. Our new FriendsPersona dataset (https://github.com/emorynlp/personality-detection) is developed upon the public Friends TV Show Dataset and contains 711 extracted conversations. Each essay or conversation in the two datasets is annotated with the binary Big Five personality traits.
MSF Extraction Algorithm
To build our dataset, we develop a novel dialogue extraction algorithm, MainSpeakerFinder (MSF), which extracts sub-scenes from full scenes and marks each sub-scene with a main speaker for three annotators to annotate. First, we slide a window of size 5 across the full dialogue and track the utterance count per speaker at each step, yielding a smoothed utterance-count curve for each speaker. Then, we find the peaks in each speaker's curve. Finally, we extract the conversations around the peaks as sub-scenes, in which the owner of the curve is the main speaker. This extraction step is necessary for two reasons. First, it allows annotators to focus on a short dialogue text. Second, the algorithm reuses full scenes to generate many short sub-scenes, which helps build a comparatively large dataset for training.
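The extraction procedure can be sketched as follows. The window size of 5 comes from our description above; the peak criterion (here a simple per-speaker argmax) and the exact sub-scene span are simplifying assumptions, not the exact MSF implementation.

```python
def find_sub_scenes(utterances, window=5):
    """Sketch of the MainSpeakerFinder (MSF) idea.

    utterances: list of (speaker, text) pairs in dialogue order.
    Returns a list of (main_speaker, sub_scene) pairs, one per speaker.
    """
    speakers = sorted({s for s, _ in utterances})
    sub_scenes = []
    for main in speakers:
        # Smoothed utterance-count curve: how often `main` speaks
        # inside each sliding-window position.
        curve = [sum(1 for s, _ in utterances[i:i + window] if s == main)
                 for i in range(len(utterances) - window + 1)]
        if not curve or max(curve) == 0:
            continue
        # Simplification: take the global argmax of the curve as the peak.
        peak = max(range(len(curve)), key=curve.__getitem__)
        sub_scenes.append((main, utterances[peak:peak + window]))
    return sub_scenes
```

Because every speaker contributes a curve, one full scene typically yields several overlapping sub-scenes, each attributed to a different main speaker.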
Due to limited funding, we annotated 711 sub-scenes from the first four seasons of the Friends TV Show on Amazon Mechanical Turk. Each sub-scene is annotated by three annotators for the Big Five personality traits with scores of -1, 0, or 1. We sum the scores from the three annotators and convert them to a binary class with a median split.
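The label conversion can be sketched as follows; how totals that fall exactly on the median are assigned is an assumption (here they go to class 0), not a detail we specify above.

```python
import statistics

def median_split_labels(annotations):
    """Binarise annotator scores with a median split.

    annotations: one tuple of three scores in {-1, 0, 1} per sample.
    Each sample's scores are summed, then compared to the dataset-wide
    median of the sums.
    """
    totals = [sum(scores) for scores in annotations]
    med = statistics.median(totals)
    # Assumption: totals above the median -> 1, all others -> 0.
    return [1 if total > med else 0 for total in totals]
```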
In terms of inter-annotator agreement, we achieve an average pairwise kappa of 54.92% between two annotators and a Fleiss' kappa of 20.54% among three annotators across the five personality traits. The low Fleiss' kappa is expected because text-based personality recognition is highly subjective: annotators often make different yet equally plausible trait judgments about the same utterance. It may also reflect a limitation of our data; higher agreement might be achieved if a multimodal dataset (e.g., text, image, audio) were provided.
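For reference, Fleiss' kappa for a fixed number of raters is computed from a subjects-by-categories count table; the sketch below is the standard formula, independent of our annotation setup.

```python
def fleiss_kappa(table):
    """Fleiss' kappa. table[i][j] is the number of raters who assigned
    category j to subject i; every row sums to the same rater count n."""
    n = sum(table[0])   # raters per subject
    N = len(table)      # number of subjects
    k = len(table[0])   # number of categories
    # Proportion of all assignments that fall in each category.
    p = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    # Per-subject agreement, then observed and chance agreement.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    P_bar = sum(P_i) / N
    P_e = sum(x * x for x in p)
    return (P_bar - P_e) / (1 - P_e)
```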
To be consistent with previous work, both the Essays and FriendsPersona datasets are evaluated with accuracy under 10-fold cross-validation with a constant seed for sampling. In FriendsPersona, we replace speaker names with tokens such as 'speaker0' and 'speaker1'. Both datasets have binary class labels for each personality trait.
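The speaker anonymization can be sketched as below; assigning IDs in order of first appearance is an assumption, and this sketch only rewrites the speaker field, not name mentions inside the utterance text.

```python
def anonymize_speakers(utterances):
    """Replace speaker names with 'speaker0', 'speaker1', ... in order
    of first appearance. utterances: list of (speaker, text) pairs."""
    ids = {}
    anonymized = []
    for speaker, text in utterances:
        # Assign the next ID the first time a speaker is seen.
        ids.setdefault(speaker, f"speaker{len(ids)}")
        anonymized.append((ids[speaker], text))
    return anonymized
```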
Experiment on Essays Dataset
First, we evaluate baseline models on the Essays dataset with FastText embeddings. In Table 1, Majority denotes the percentage of the dominant class; LIWC (2016) denotes the performance of the best LIWC-feature-based model; HCNN is the Hierarchical CNN model; ABCNN and ABLSTM denote CNN and bidirectional LSTM models with an attention mechanism; HAN is the Hierarchical Attention Network. In addition, we fine-tune pre-trained base BERT and RoBERTa models for the task. Overall, ABCNN achieves the best score on CON, whereas RoBERTa performs best on the other four traits. We improve the results by 2.49% on average across the five traits (AGR by 2.22%, CON by 2.83%, EXT by 2.53%, OPN by 3.18%, NEU by 1.69%).
Adaptation to FriendsPersona Dataset
We also evaluate these models on FriendsPersona. We experiment with three ways of feeding the dialogue text to the classifiers: (1) single (S): only the concatenation of the target speaker's utterances; (2) single + context (S+C): S followed by the concatenation of the other speakers' utterances; (3) full (F): the full dialogue text in its natural order.
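The three input formats can be sketched as follows; joining utterances with a single space is an assumption about our preprocessing.

```python
def build_inputs(utterances, target):
    """Build the S, S+C, and F input formats for one target speaker.

    utterances: list of (speaker, text) pairs in dialogue order.
    """
    target_text = " ".join(t for s, t in utterances if s == target)
    context_text = " ".join(t for s, t in utterances if s != target)
    return {
        "S": target_text,                         # target speaker only
        "S+C": f"{target_text} {context_text}",   # target first, then others
        "F": " ".join(t for _, t in utterances),  # full dialogue, natural order
    }
```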
First, the models perform best on S among the three formats for all five traits (Table 2). This makes sense because our models are originally designed to classify simple monologue text rather than multi-party dialogue text, and S reduces the dialogue to the target speaker's monologue. Second, BERT and RoBERTa together achieve the most best results (10 out of 15 cases), although neither beats the other models on CON on either dataset. Finally, HAN achieves 3 best results out of 15 cases on FriendsPersona, improving on its performance on the Essays dataset (0 cases). This is because HAN encodes dialogue at both the utterance and token levels, which allows it to attend to the main speaker's utterances. In the future, a customized model is needed to leverage the dialogue information between speakers.
In this paper, we make two major contributions. We create a new dialogue corpus, FriendsPersona, for automatic personality recognition with a novel and scalable dialogue extraction algorithm, MSF. In addition, we introduce both attentive neural networks and contextual embeddings to the task. We significantly outperform the state-of-the-art results on the monologue Essays dataset and establish a solid benchmark on FriendsPersona. In the future, we plan to design a BERT-based attention network to model utterances in dialogue and improve the performance on our dataset. We also plan to assign more annotators to improve annotation quality and to expand the corpus size.
-  (2016) Character identification on multiparty conversation: identifying mentions of characters in TV shows. In SIGDIAL Conference, pp. 90–100. Cited by: Dataset.
-  (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: Introduction.
-  (2019) RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692. Cited by: Introduction.
-  (2007) Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of Artificial Intelligence Research 30, pp. 457–500. Cited by: Introduction.
-  (2017) Deep learning-based document modeling for personality detection from text. IEEE Intelligent Systems 32 (2), pp. 74. Cited by: Data Preparation, Experiment on Essays Dataset.
-  (1999) Linguistic styles: language use as an individual difference. Journal of Personality and Social Psychology 77 (6), pp. 1296. Cited by: Introduction, Dataset.
-  Personality trait classification of essays with the application of feature reduction. In SAAIP@IJCAI, pp. 22–28. Cited by: Introduction, Experiment on Essays Dataset.