The state-of-the-art in text-based automatic personality prediction

Personality detection is an old topic in psychology and Automatic Personality Prediction (or Perception) (APP) is the automated (computationally) forecasting of the personality on different types of human generated/exchanged contents (such as text, speech, image, video). The principal objective of this study is to offer a shallow (overall) review of natural language processing approaches on APP since 2010. With the advent of deep learning and following it transfer-learning and pre-trained model in NLP, APP research area has been a hot topic, so in this review, methods are categorized into three; pre-trained independent, pre-trained model based, multimodal approaches. Also, to achieve a comprehensive comparison, reported results are informed by datasets.



There are no comments yet.


page 3

page 25

page 26


Automatic Personality Prediction; an Enhanced Method Using Ensemble Modeling

Human personality is significantly represented by those words which he/s...

Deep Learning Based HPV Status Prediction for Oropharyngeal Cancer Patients

We investigated the ability of deep learning models for imaging based HP...

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Transfer learning, where a model is first pre-trained on a data-rich tas...

Machine Reading of Hypotheses for Organizational Research Reviews and Pre-trained Models via R Shiny App for Non-Programmers

The volume of scientific publications in organizational research becomes...

BNLP: Natural language processing toolkit for Bengali language

BNLP is an open source language processing toolkit for Bengali language ...

HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization

To capture the semantic graph structure from raw text, most existing sum...

Unsupervised Pronoun Resolution via Masked Noun-Phrase Prediction

In this work, we propose Masked Noun-Phrase Prediction (MNPP), a pre-tra...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The development of language is part of the development of the personality, for words are the natural means of expressing thoughts and establishing understanding between people.”, Maria Montessori.

The above quotation becomes the basis of what is present in this article, studying natural language processing in individual personality. Personality is defined as the characteristic set of behaviours, cognitions, and emotional patterns corr2009cambridge as well as thinking patterns kazdin2000encyclopedia, and its external appearance can be seen in writing, speech, decision and other aspects of the social and personal lives of people. Language is the most prominent and the most available aspects of individuals’ personality. Meanwhile, written text is one of the most utilized appearance of language. Developing the Internet based infrastructure such as social media, e-mails, and different texting contexts, have made the language appearance of people more available. Consequently, considering the increasing of internet based communications, it would be so exciting to became aware of individuals’ personality, inspite of their absence. Therefore, the involvement of computers in determining the personality of people seems necessary and turned into a study field in computer science.

Automatic Personality Prediction (or Perception) (APP) is the automatic prediction of the personality of individuals and usually done by computers. With the increasing variety of data types available for analysing the personality of people, aspects of view to APP increases likewise. In this point of view to the assortment of APP, data types can be named as: speech Jothilakshmi2017; Su2018; Gilpin2018; Mohammadi2012, image Sang2016; Allen2016; Chaudhari2019; Lokhande2017, video Kindiroglu2017; Aslan2019, text Ramezani2020; Han2020; Xue2021, social media activities Zhu2020; tadesse_2018; Lima2014, touch screen interaction kuster2018; roy_roy_sinha_2018, and so on. Also, each of these has subsets and divisions of text-based APP which can be mentioned are email Shen2013, SMS Yakoub2015, and tweets & posts on social media Arnoux2017. Thereby, the key standpoint of this study is analysing APP methods through text data type and NLP.

Personality should be measured and classified to make it more comparative, and this goes back to psychology. Psychologist put forward many personality trait models; such as Allport’s trait theory

Allport1937, Cattell’s 16 Factor Model Cattell1970 (Table 14 shows characteristics of traits), Eysenck Personality Questionnaire(EPQ) Eysenck1975, Myers-Briggs Type Indicator (MBTI) Briggs1976, and Big Five P.John1999. Among these, two models; MBTI and Big Five are popular and widely used models, especially in APPs. MBTI has four main dimensions, Introversion versus Extraversion (I-E), Sensing versus iNtuiting (S-N), Thinking versus Feeling (T-F), and Judging versus Perceiving (J-P), that each people categorized in two dimensions. Figure 1 defines each MBTI dimension characteristics. The second popular personality model is Big Five. This model consists of five traits, and people may get in one or more trait. Also, two different approaches are taken thus binary modelling (0 and 1 for each trait) or continuous modelling (each trait get a value in range 0 to 1); those two approaches are being used in APP datasets. Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism are traits of Big Five which are called OCEAN in abbreviative. Table 1 illustrates characteristics in each OCEAN trait.

Figure 1: An overview of MBTI personality traits model Amirhosseini2020.

max width=
Description of LOW values -
Personality trait + Description of HIGH values
Openness (O)

• Dislikes changes
• Does not enjoy new things • Conventional • Resists new ideas • Prefers familiarity • Not very imaginative • Has trouble with abstract or theoretical concepts • Skeptical • Traditional in thinking • Consistent and cautious
• Very creative • Clever, insightful, daring, and varied interests • Embraces trying new things or visiting new places • Unconventional • Focused on tackling new challenges • Intellectually curious • Inventive • Happy to think about abstract concepts • Enjoys the art • Eager to meet new people

Conscientiousness (C)

• Easy going and careless • Messy and less detailed-oriented • Dislikes structure and schedule • Fails to return things or put them back, where they belong • Procrastinates on important tasks and rarely completes them on time • Fails to stick to a schedule • Is always late when meeting others

• Competent and efficient • Goal- and detail-oriented • Well organized, self-discipline and dutiful • Spends time preparing • Predictable and deliberate • Finishes important tasks on time • Does not give in to impulses • Enjoys adhering to a schedule • Is on time when meeting others • Works hardly • Reliable and resourceful • Persevered

Extroversion (E)

• Introspective • Solitary and reserved • Dislikes being at the center of attentions • Feels exhausted when having to socialize a lot • Finds it difficult to start conversations • Dislikes making small talks • Carefully thinks things before speaking • Thoughtful

• Outgoing and energetic • Assertive and talkative • Able to be articulate • Enjoys being the center of attentions • Likes to start conversations • Enjoys being with others and meeting new people • Tendency to be affectionate • Finds it easy to make new friends • Has a wide social circle of friends and acquaintances • Says things before thinking about them • Feels organized when around other people • Social confidence

Agreeableness (A)

• Challenging and detached • Takes little interest in others • Can be seen as insulting or dismissive of others • Does not care about other people’s feelings or problems • Can be manipulative • Prefers to be competitive and stubborn • Insults and belittles others

• Friendly and compassionate toward others • Altruist and unselfish • Loyal and patient • Has a great deal of interest in and wants to help others • Feels empathy and concern for other people • Prefers to cooperate and be helpful • Polite and trustworthy • Cheerful and considerate • Modest

Neuroticism (N)

• Emotionally stable • Deals well with stress • Rarely feels sad or depressed • Does not worry much and is very relax • Confident and secure • Optimist

• Anxious of many different things and nervous
• Experiences a lot of stress • Irritable • Impulsive and moody • Jealous • Lack of confidence • Self-criticism • Oversensitive • Instable and insecure • Timid • Pessimist

Table 1: An overview of Big Five personality traits model Ramezani2020.

In recent years, the NLP field is faced with a revolution that APP does not get its benefits. In this study, APP articles from 2010, which involve textual inputs, reviewed in three categories: classical text representation and feature extraction methods, articles assisted novel pre-trained word representations, and methods with multimodal approaches (besides text, other data types included).

The rest of this study is organized in the following manner: in section 2, materials and methods which are used as the baselines of APP studies introduced briefly. Section 3 are the overview of methods and consist of three sub-sections. The results of studies are structured based on datasets and are shown in section 4; the datasets are also explained in this section. Finally, some concluding remarks are given in section 5.

2 NLP materials in personality prediction

The approach of this study is to overview researches in APP, which conducted on texts. To this end, material and methods of text analysis are given in this section briefly. Linguistic Inquiry and Word Count (LIWC) is one the most used and developed tools of APP and in section 2.1

has been reported. The next material based on NLP is called MRC and is a dictionary psycholinguistic in English. The last one is about Embedding techniques that represent words for text analysis, typically in the form of a real-valued vector that encodes the meaning of the word such that the words that are closer in the vector space are expected to be similar in meaning.

2.1 Linguistic Inquiry and Word Count (LIWC)

LIWC (Linguistic Inquiry and Word Count) is introduced by Pennebaker1996 and developed in years Pennebaker1999; Tausczik2010 as NLP tools for the psychological purpose. LIWC is a text analysis tool that provides statistical reports that are very useful in determining texts to aim for emotional and cognitive analysis of people. Since 2001, two updated versions of LIWC are introduced in 2007 and 2015. In each version some features were added, and table 2 show all features reported by LIWC and the deferences between the last two versions. LIWC reports consist of 91 features in 15 categories. This test could be done online in

LIWC Dimension Output Label LIWC2015 Mean LIWC2007 Mean LIWC 2015/2007 Correlation
Word Count WC 11,921.82 11,852.99 1.00
Summary Variable
Analytical Thinking Analytic 56.34
Clout Clout 57.95
Authentic Authentic 49.17
Emotional Tone Tone 54.22
Language Metrics
Words per sentence WPS 17.40 25.07 0.74
Words 6 letters Sixltr 15.60 15.89 0.98
Dictionary words Dic 85.18 83.95 0.94
Function Words
Function Words function 51.87 54.29 0.95
  Total pronouns pronoun 15.22 14.99 0.99
    Personal pronouns ppron 9.95 9.83 0.99
      1st pers singular i 4.99 4.97 1.00
      1st pers plural we 0.72 0.72 1.00
      2nd person you 1.70 1.61 0.98
      3rd pers singular shehe 1.88 1.87 1.00
      3rd pers plural they 0.66 0.66 0.99
    Impersonal pronouns ipron 5.26 5.17 0.99
Articles article 6.51 6.53 0.99
Prepositions prep 12.93 12.59 0.96
Auxiliary verbs auxverb 8.53 8.82 0.96
Common adverbs adverb 5.27 4.83 0.97
Conjunctions conj 5.90 5.87 0.99
Negations negate 1.66 1.72 0.96
Grammar Other
Regular verbs verb 16.44 15.26 0.72
Adjectives adj 4.49
Comparatives compare 2.23
Interrogatives interrog 1.61
Numbers number 2.12 1.98 0.98
Quantifiers quant 2.02 2.48 0.88
Affect Words
Affect Words affect 5.57 5.63 0.96
   Positive emotion posemo 3.67 3.75 0.96
   Negative emotion negemo 1.84 1.83 0.96
     Anxiety anx 0.31 0.33 0.94
     Anger anger 0.54 0.60 0.97
     Sadness sad 0.41 0.39 0.92
Social Words
Social Words social 9.74 9.36 0.96
   Family family 0.44 0.38 0.94
   Friends friend 0.36 0.23 0.78
   Female referents female 0.98
   Male referents male 1.65
Cognitive Processes
Cognitive Processes cogproc 10.61 14.99 0.84
   Insight insight 2.16 2.13 0.98
   Cause cause 1.40 1.41 0.97
   Discrepancies discrep 1.44 1.45 0.99
   Tentativeness tentat 2.52 2.42 0.98
   Certainty certain 1.35 1.27 0.92
   Differentiation differ 2.99 2.48 0.85
Perpetual Processes
Perpetual Processes percept 2.70 2.36 0.92
   Seeing see 1.08 0.87 0.88
   Hearing hear 0.83 0.73 0.94
   Feeling feel 0.64 0.62 0.92
Biological Processes
Biological Processes bio 2.03 1.88 0.94
   Body body 0.69 0.68 0.96
   Health/illness health 0.59 0.53 0.87
   Sexuality sexual 0.13 0.28 0.76
   Ingesting ingest 0.57 0.46 0.94
Core Drives and Needs
Core Drives and Needs drives 6.93
   Affiliation affiliation 2.05
   Achievement achieve 1.30 1.56 0.93
   Power power 2.35
   Reward focus reward 1.46
   Risk/prevention focus risk 0.47
Time Orientation
Past focus focuspast 4.64 4.14 0.97
Present focus focuspresent 9.96 8.10 0.92
Future focus focusfuture 1.42 1.00 0.63
Relativity relativ 14.26 13.87 0.98
   Motion motion 2.15 2.06 0.93
   Space space 6.89 6.17 0.96
   Time time 5.46 5.79 0.94
Personal Concerns
Work work 2.56 2.27 0.97
Leisure leisure 1.35 1.37 0.95
Home home 0.55 0.56 0.99
Money money 0.68 0.70 0.97
Religion relig 0.28 0.32 0.96
Death death 0.16 0.16 0.96
Informal Speech
Informal Speech informal 2.52
   Swear words swear 0.21 0.17 0.89
   Netspeak netspeak 0.97
   Assent assent 0.95 1.11 0.68
   Nonfluencies nonfl 0.54 0.30 0.84
   Fillers filler 0.11 0.40 0.29
All Punctuation
Periods Period 7.46 7.91 0.98
   Commas Comma 4.73 4.81 0.98
   Colons Colon 0.63 0.63 1.00
   Semicolons SemiC 0.30 0.24 0.98
   Question marks QMark 0.58 0.95 1.00
   Exclamation marks Exclam 1.0 0.91 1.00
   Dashes Dash 1.19 1.38 0.98
   Quotation marks Quote 1.19 1.38 0.76
   Apostrophes Apostro 2.13 2.83 0.76
   Parentheses (pairs) Parenth 0.52 0.25 0.90
   Other punctuation OtherP 0.72 1.38 0.98
Table 2: Comparison of LIWC 2015 and LIWC 2007 LIWC_Comparison.

Mairesse2007 developed a method based on LIWC on the Essays dataset. The authors fed the Essays dataset to LIWC, and the outputs contain LIWC features with a personality trait label in the Big Five dimension. Since the LIWC queries are not free, Mairesse dataset deploys as LIWC features in most research. Mairesse developed a framework and is available in, but it should be noted that it is a while that the framework did not support.

2.2 MRC Psycholinguistic Database

MRC is a publicly available machine usable dictionary that includes different (up to 26) linguistic and psycholinguistic attributes for 150,837 English words. Different semantic, syntactic, phonological, and orthographic details about the words have made it suitable for miscellaneous purposes of researches in psychology, linguistics and artificial intelligence. Word association data are also included in the database. The first version was introduced in 1981

Coltheart1981 and the second and last version wilson_1988 is now available in with the details and statistics.

2.3 Embedding techniques

Any input should model to understand by computer and writings are no exception, so embeddings duty is that. The smallest meaningful segment of writing is words, and that is why fundamental is word embedding in this task. Typically, each word or token represent by a vector. So basic embedding is one-hot thus a dictionary of words generate then each word represented by a vector that only one cell’s value is one and other are zero, and vector size equals to dictionary size. In this representation, vectors are orthogonal, so there is no semantics and relation between words. Moreover, in large corpuses vector size get large and make need storage to save and handle it.

The problems and disabilities of one-hot vector generate a felt need for new embeddings. Word2Vec Mikolov2013; Mikolov2013a was the first word embedding able to map words to vectors considering to semantic. Word2Vec became the cornerstone of other embedding techniques with their facilities. FastText Bojanowski2017 and GloVe pennington2014glove

are examples in evolution word embedding techniques. All of the embeddings should be trained, so there are some pre-trained language models (PLMs) with differences in trained database and attitude of training (CBOW and Skip grams).

Transformers vaswani2017attention 111More information on minaee2020deep introduced in 2018, made a revolution in embedding techniques thus make more parallelism possible than other architectures (such as CNNs and RNNs). Hence, computers get able to train larger models, since large-scale Transformer-based PLMs appeared. The most well-known transformer-based PLM could be named BERT Devlin2018. Based on BBERT, numerous models arise by a different point of view; namely, RoBERTa Liu2019 (robust and larger), AlBERT lan2020albert (high-speed training and lower memory), DistilBERT sanh2020distilbert (40% smaller and 60% faster).

3 Methods (Overview)

In natural language processing, representation of input is the most important component, and the smallest part of a text is words. In recent years novel representations called pre-trained word embeddings have been got trends and make revolutions in text mining. APP is not unaffected, and novel methods based on word embedding techniques have appeared. In reviewing APPs, at first methods without pre-trained word embedding techniques describes, secondly pre-trained word embedding based methods details, and at the end methods with more than text input introduces.

3.1 PLM free APPs

Poria2013 proposed a combinational algorithm for detecting personality using LIWC and MRC on textual features. 81 LIWC features plus 26 MRC features extracted from the Essays dataset. The proposed EmoSenticSpace was a novel representation method on a graph of EmoSenticNet havasi2007conceptnet and fed to a blending algorithm. The output is a 100-dimensional vector and by ”bag of concept”, averaging done to vectors to represent a text. The authors trained five Sequential Minimal Optimization (SMO) classifier, one for each of the big five traits.

Verhoeven2013, proposed an ensemble model for recognizing personality from Facebook. The authors trained three SVM classifiers, the first one trained on Facebook. The second is trained on the Essays dataset, and as the meta classifier, the output of two classifiers is fed to the third SVM classifier.

Part of speech (POS) is a basic feature of texts. In Wright2014

, authors used POS & POS n-grams of texts beside bag of words, word sentiment, negations, and vocabulary size features to predicting personality. As a classifier, SVM is deployed in two formats; 2-class and 3-class. 2588 university students essays collected on the Big Five personality scale to evaluate the method.

LIWC consists of 19 features, and Tighe2016

tried to reduce feature size for the purpose of achieving better results on the Essays. In this way, Information Gain and Principal Component Analysis (PCA) feature reduction techniques have been examined. It is proved that some LIWC features do not have adequate information on Essays for APP.

The early word embedding models, e.g. Word2Vec, have some common practical problems that make using them hard and somehow restricted. The first one is unseen words; if a word does not see in the learning stage, make the model in trouble. The second problem occurs once you want to train a model, and there are too many parameters. Due to these problems, FLiu2016

proposed an embedding model to embed personality texts. C2W2S4PT (Character to Word to Sentence for Personality Traits) is a three-stage Bi-RNN based model; firstly, characters modelling; secondly, word modelling based on the first stage; third, sentence embedding using words representation using feedforward neural network. In figure

4 the proposed architecture is illustrated. In Liu2017 the proposed C2W2S4PT evaluated on English, Spanish and Italian languages to proving language independency of C2W2S4PT.

In Zheng2019 research, to take advantage of huge unlabeled data, Pseudo Multi-view Co-training (PMC) chen2011automatic

is adopted, an effective Semi-supervised learning algorithm, to build a personality prediction model. To extract adequate linguistic features, both LIWC and n-gram, along with the Word2Vec word embedding technique, is trained on mypersonality dataset to predict personality through textual data. Figure

8 illustrates the overall framework of method.

Personality2Vec Guan2020 is the name of a user-personality embedding technique through a user has generated texts. Semantic and linguistic features of texts constructs graph and a biased walk strategy has been proposed to divide users into groups with maximum similar personality users in the same group. As shown in fig 10, in the first linguistic and semantic features extract in order to pass into the learning part. Linguistic features are 103-dimension of LIWC and 10-dimension of special linguistic features, which have been proposed by the authors.

Paper Sun2020, deployed network representation learning (NRL) as the novelty and for the first time in APP. AdaWalk generates the graph of documents on personality datasets in two approaches: classification and regression. NRL presented node(words or token) by using SkipGram.

Another graph-based approach called personality graph convolutional networks (personality GCN) introduced by Wang2020_Encoding. The authors aimed to create a graph to model users, documents, and words by the core of the co-occurrence of words in a document. The weight of edges is calculated by TF-IDF for a document-word edge and pointwise mutual information (PMI) for a word-word edge. The classification layer is the last one, five classifier for Big Five traits. Figure 9 illustrate an overview of the three layers of GCN. It is worthwhile to say that words, users, and documents represented by a one-hot vector.

One of the most recent research in this area proposed five new APP methods: term frequency vector-based, ontology-based, enriched ontology-based, latent semantic analysis (LSA)-based, and deep learning-based (BiLSTM), by a contribution of enhancing accuracy Ramezani2020. These five introduced models used as the base to hierarchical attention network (HAN) ensemble model. The authors evaluated the methods on the Essays dataset and achieved enhancing accuracy on the ensemble method. The architecture of the proposed HAN stacking model is shown in figure 2.

Figure 2: The architecture of a Hierarchical Attention Network (HAN) combined with base methods’ predictions to carry out stacking in document level classification proposed by Ramezani2020.

3.2 PLM-based APPs

The first research that deployed a PLM to APP is done by Majumder2017 based on Word2Vec. This research deployed two approaches, Document-level using Mairesse features and Word-level using Word2vec and CNN model to achieve 300 dimension representation of words to model sentences and documents, respectively, by n-gram sigh of view. Both approaches’ representations concatenated to fed to a fully connected layer for classification (Figure 3 illustrate the architecture of the proposed method). Five models, one for each of five traits, trained on the Essays dataset.

2CLSTM is the name of the model proposed by Sun2018 that tries to learn structural features based on latent sentence group (LSG). At the first step, each word embedded into a 100-d dimension vector through GloVe pre-trained model. 2LSTM encodes the vectors into sentences that pass to the next section. The following sections have to model relationships of sentences, and LSG does this. Latent Sentence Group (LSG) is defined as a synthesis that consists of a number of sentence vectors which are closely connected in some coordinates. To do this, CNN networks deployed to learn 1,2,3-grams. Figure 7

illustrates the layers and schema of 2CLSTM. A dense layer and max-pooling layer follow immediately to generate the final vector to pass into a classifier which there was softmax.

In Darliansyah2019, a personality prediction methods by using sentiment of short texts is introduced by naming SENTIPEDE. The aim was to determine personality in Big Five by using textual features with sentiment features of short texts which the base was Twitter. The sentiment of text compute by Valence Aware Dictionary and Sentiment Reasoner (VADER) hutto2014vader, a rule-based framework for sentiment classification in English, to gain a label from positive, negative, or nonpartisan. Then, GloVe word embedding is deployed to vectorise words to pass to a CNN-LSTM model along with the sentiment labels. The case study of the paper was Uber and related tweets.

In introducing a new dataset, some base methods and evaluation of them by a standard dataset are necessary. FriendPersona is the name of a new dataset introduced by Jiang2020. In Jiang2020, five models have been developed, which names are: ABCNN (CNN with attention mechanism), ABLSTM (Bidirectional LSTM attention mechanism), HAN (Hierarchical Attention Network), BERT, RoBERTa. BERT and RoBERTa as a PLM are fine-tuned on both datasets, Essays and FriendPersona.

In paper Mehta2020, two approaches have been studied; personality prediction based on psycholinguistic features and language model features. In both approaches, features extracted from texts are fed to a classifier, SVM and MLP, to classify texts into personality traits. Mairesse, SenticNet cambria2018senticnet

, NRC Emotion Lexicon

mohammad2013crowdsourcing, VAD Lexicon mohammad-2018-obtaining, and Readability222

A number of calculated readability mea- sures based on simple surface characteristics of the text. These measures are basically linear regressions based on the number of words, syllables, and sentences.

are the extracted psychological features. On the other hand, BERT PLM is deployed as a language model features extractor.

Kazameini2020 is segmented documents into 250 tokens length sub-documents in order to feed to BERT(base) PLM. Based on the essays dataset used in this paper, documents length is 650 tokens on average, so four layers of [CLS] concatenated with 84 Mairesse features as features of a document. SVM is the classifier of the proposed method, which is trained on sub-documents in parallel mode like a bagged classifier, and at last, the final trait is predicted by majority voting. Figure 11 shows the authors proposed method.

One of the recent APP methods combined semantic and emotional features in order to determine personality trait from multi-text Ren2021. On the semantic side, BERT is deployed to vectorize texts and using a self-attention mechanism, sentence-level representation is generated. On the other side, SenticNet5 AAAI1816839 has extracted the sentiment of the sentences to map to a vector. Both vectors are concatenated and fed to a classification network. CNN, GRU and LSTM are different neural networks that trained to label personality trait.

SEPRNN (semantic-enhanced personality recognition neural network) Xue2021

is proposed with the goal of avoiding dependency to feature selection in APP and modelling semantic from word-level representations. GloVe plm is deployed to vectorize words, then a BiGRU model learned to extract left and right context of words, but since semantics did not consider. To capture the higher level of semantic representations from contexted textual data, vectors are fed to a fully connected network to text modelling of documents. In the end, a fully connected network with sigmoid activation function is adopted to learn a two-dimensional vector for binary classification of personality.An illustration of the proposed method is displayed in figure


Transformer-MD (multi-document Transformer) is the name of the method proposed by Yang2021. The core of the proposed method is to put together information in multiple posts to represent an overall personality for each user. Authors tried to solve two problems: post-orders bias into posts on personality and individually posts processing of a person to personality detection. In order to this, encoding each post by Transformer-MD allows access to information in the other posts of a user through Transformer-XL’s memory tokens that share the position embedding. For the cases of multi-trait personality detection, a dimension attention mechanism on top of Transformer-MD was set. The overview of the proposed methods is shown in figure 13.

3.3 Multimodal APPs

One of the first deep learning used methods in multimodal approaches deployed text in couple with authors information to achieve the writer’s personality Yu2017. According to the limitations and obstacles of pre-trained word embedding methods in 2017, word2vec, the authors trained their word embedding model based on skip-gram using the mypersonality dataset. In the trained model, modelling words position was not taken into account, so applying N-grams seemed the best approach. In order to apply this approach, two approaches were taken, CNN-based and Bi-RNN-based. Figure 5 shows the architecture of Yu2017. After the word modelling part, the author’s information is 7-D vector concatenate and goes on. Each of the five personality traits has been gotten trained in a neural network.


explored the use of machine learning techniques for inferring a user’s personality traits from their Facebook status updates. four kinds of numeric features:

  • LIWC features

  • Social Network features: 7 features related to the social network of the user: (1) network size, (2) betweenness, (3) nbetweenness, (4) density, (5) brokerage, (6) nbroker- age, and (7) transitivity.

  • Time-related features: 6 features related to the time of the status updates (we assume that all the times are based on one time zone): (1) frequency of status updates per day, (2) number of statuses posted between 6-11 am, (3) num- ber of statuses posted between 11-16, (4) number of sta- tuses posted between 16-21, (5) number of statuses posted between 21-00, and (6) number of statuses posted be- tween 00-6 am.

  • Other features: 6 features not yet included in the categories above: (1) total number of statuses per user, (2) number of capitalized words, (3) number of capital let- ters, (4) number of words that are used more than once, (5) number of urls, and (6) number of occurrences of the string PROPNAME— a string used in the data to replace proper names of persons for anonymisation purposes.

Different concatenation of features is analysed with 3 classification algorithms named SVM, KNN, and Naive Bayes.


proposed a comprehensive view to APP multimodal approach that accompanied texts, avatars, and emojis. Pearson correlation, Text-CNN, and Bag-of-Word (BOW) clusters are the textual-based features extracted from the Weibo tweets collected in the research. Pearson correlation computed between words and the personality traits to selecting top 2000 words are strongly correlated to personality. In order to LIWC limited capability in representing users linguistic patterns in short and informal texts, 1,500 Chinese words and all the punctuations in the bag-of-words format clustered using the k-means algorithm and then the count the number of items within each cluster used instead of LIWC. As the last feature, a convolutional architecture called Text-CNN trained to models words in vector form in reference to

kim-2014-convolutional model. The structure of the proposed algorithm is shown in figure 6

. As seen, each type of inputs lasts for a classifier ((Logistic Regression) to specify the trait. As the final step, a stacked ensemble algorithm, generalization-based ensemble method, attached to the classifiers as the final result classifier.

It is common to utilize acoustically features simultaneously with transcripts of speech to achieve the personality of people. In An2018

, four features demonstrated in order to predict personality, namely, acoustic-prosodic low-level descriptor features (LLD), LIWC, Dictionary of Affect in Language (DAL), and word embeddings. DAL features extracted using Whissell’s Dictionary of Affect in Language (DAL), and 19 features extracted in this research. In the word embedding part, Google’s pre-trained skip-gram vectors and Stanford’s pre-trained GloVe has been used. Two approaches have been adopted for modelling documents based on embedding vectors, averaging and LSTM neural network. One strange thing in the proposed method is that both of PLMs vectors are fed to the model and concatenated with three other features. Moreover, two strategies proposed by the authors, first, concatenate features and then five fully-connected layers end with five neurons as the classifier; second, each of five features fed to a three fully-connected layers block before concatenating and then fed to five neurons for classification similarly.

4 Evaluating methods

Every proposed method should be evaluated to prove its performance. In APP evaluations consist of two parts, dataset is the first and metric of assessment is the second part. Five datasets are available in text-based APP, and metrics vary on the concept of evaluation. This section consists of two parts; part one details the datasets and part two defines results of methods in datasets and metrics.

4.1 Datasets

Each method should evaluate and compare it with other methods, and this requires a fair condition. To achieving fair compare, ground truths should be same including metrics and datasets. In this part, five datasets, Essays, MyPersonality, YouTube, FriendPersona, and Kaggle MBTI are benchmark in text-based APPs are introduced.


Essays (also called stream-of-consciousness essays) is the first and most cited text dataset in Automatic personality prediction. The dataset introduced by Pennebaker1999 that consist of 2468 anonymous essays in English annotated in Big Five scale. The dataset annotated in two modes; classification and regression. Thus each essay has two Big Five values; first, each trait has a binary value; second, each trait is real value in the aim of regression. The Essay mainly deployed in classification purpose and table 3 shows the number of essays in each trait. In should be notated that one row of data was an error and dismissed in the table’s values. Moreover, the distribution of essay’s components reported in table 4.

Label Opn Con Ext Agr Neu
Yes 1271 1253 1276 1310 1233
No 1196 1214 1191 1157 1234
Table 3: Distribution of the Essays dataset labels.
Sentences Characters Spaces Words Punctuations Stop words
mean 46.6250 3296.9059 702.8232 668.6643 53.8293 318.8873
std 26.7925 1287.5402 286.0818 262.4843 31.6477 134.2972
min 1 159 34 35 0 0
25% 28 2407.5 506 482.5 32 227
50% 43 3165 676 646 49 302
75% 61 4081 867.5 828 69 396
max 307 12855 3859 2631 348 1340
Table 4: The elements count of Essay dataset.


myPersonality333 is a collection of 250 anonymous Facebook users profile updates scored in Big Five by questioning users to answer them. Since 2018 creators, Stillwell and Kosinski stopped sharing and developing the dataset. There are some versions available on the internet that do not match on records but approximately it contained 9900 records. The myPersonality annotated in two forms, classification and regression, and table 5 & 6 illustrate the distribution of status updates of myPersonality in each form.

Label Opn Con Ext Agr Neu
Yes 176 130 96 134 99
No 74 120 154 116 151
Table 5: Distribution of labels in myPersonality classificational mode. Wang2020_Encoding
Opn Con Ext Agr Emo
Max 5 5 5 5 4.75
Min 2.25 1.45 1.33 1.65 1.25
Avg 4.0786 3.5229 3.2921 3.6003 2.6272
Std 0.5751 0.7402 0.8614 0.5708 0.7768
Table 6: The distribution of myPersonality dataset labels in regresional mode. Darliansyah2019


Youtube is the most popular video-sharing platform and has attracted many vloggers. YouTube dataset is a collection of 404 YouTube vloggers personality scores in big five. The dataset is recorded talking in front of webcam about a variety of topics and annotated using Amazon Mechanical Turk and the Ten-Item Personality Inventory (TIPI). As mentioned, YouTube dataset is a multimodal, video, speech, and transcript to text Biel2013; Biel2013text. Based on the aim of the article, speech transcripts to text Biel2013text is analyzed in this section. As the routine, distribution of traits has been shown in table 7 and textual elements details are shown in table 8.

Opn Con Ext Agr Emo
Max 6.3 6.2 6.6 6.5 6.5
Min 2.4 1.9 2 2 2.2
Avg 4.6642 4.4971 4.6248 4.6825 4.7662
Std 0.7154 0.7701 0.9771 0.8793 0.7790
Table 7: The distribution of labels in YouTube dataset.
sentences characters spaces words duplicates punctuations stop words
mean 41.3490 3040.1905 595.0767 615.6014 95.5792 149.0519 261.5272
std 28.3491 2133.3875 415.7767 429.1693 51.3635 107.4581 189.7467
min 2 155 24 25 5 5 5
25% 21 1472 287.75 299.75 55 71.25 120
50% 34 2351.5 466 483.5 85 117.5 201.5
75% 55 3936.25 767.50 794 123 192.25 345
max 158 10814 2031 2099 254 573 961
Table 8: The elements count of Texts in YouTube dataset.


The newest dataset introduced in the context of text-based APP is FriendPersona. This is developed on Friends TV Show Dataset444 Chen2016 and 711 conversation extracted from. FriendPersona annotated by three experts and to make it binary, split from the median. The dataset could be found in github555

Kaggle MBTI

Kaggle MBTI666Available on has gathered 50 last posts through the PersonalityCafe forum in MBTI personality model. There are 8675 rows of data that each row represents a person. The dataset is started on cognitive functions by Carl Jung and finally personality tags done by Jungian Typology in MBTI.

4.2 Results

In this part, firstly evaluation metrics used in APP introduced and then reported results of methods appeared in the division of datasets. Each following tables survey a dataset presented in the previous part.

4.3 Evaluation metrics

Precision, recall, accuracy, and F-measure are well-known Classification evaluation metrics are using in scientific reports. For calculating these measures, four concepts should be defined. TP, TN, FP, and FN are the notions and denotes to true positive, true negative, false positive, and false negative, respectively. These concepts make sense based on ground truth in confronting with the output of the system. There are other measurements for evaluating classification (Regression) methods such as RMSE (Root Mean Square Error), MAE (Mean Absolute Error), and Coefficient of Determination (). Still, most articles prefered the binary classification measures and reported in four first measures. The following equations are the calculation of the metrics.


F-1 score (Eq.4

) translates to harmonic mean of precision (eq.

1) and recall (eq.2

) and that is why most studies report F1 score to make readers understand easier instead of reporting precision and recall.


As respect of lack of text-based personality prediction studies, researchers reported the results on their aspect without much comparison, and this makes variety in results measurement units. RMSE (eq. 5) and MAE (eq. 6) has been used in Carducci2018, Xue2018, however can not be compared cause of non comparable studies in same dataset.

4.4 Discussion

For evaluating and making results comparative and comprehensive, the results of each dataset are given in separate tables. The Essays is the most popular dataset in APP, and the number of results is more than the others. According to the classification essence of Essays and MyPersonality datasets, the evaluations are done by F-measure and Accuracy metrics. Augmentation of APP methods have been accelerated more rapidly with the advent of deep learning, and the increase in the quality of results is evident in Essays datasets more obviously. As it is shown in table 9, the results of each method that powered the newer PLMs and novel deep learning method are achieved higher values compared with the previous method. The point that has to be considered is that some methods reported results insufficiently; thus, solely one evaluation metric is listed by personal opinion. Among the methods that do not use word embeddings and evaluated on the Essays dataset, Ramezani2020 is got better results. Since the MyPersonality dataset is not available by the creators anymore, it is not used in recent researches. However, Wang2020_Encoding achieved the best results, as it is shown in table 10, which does not apply PLMs for embedding.

Method F-measure Accuracy
Opn Con Ext Agr Neu Avg Opn Con Ext Agr Neu Avg
Xue2021 67.84 63.46 71.50 71.92 62.36 67.416 63.16 57.49 58.91 57.49 59.51 59.312
Ren2021 80.35 80.23 79.94 80.30 80.14 80.192
Ramezani2020 57.37 59.74 65.80 61.62 60.69 61.04 56.30 59.18 64.25 60.31 61.14 60.24
Wang2020_Encoding 67 68 67 69 69 68 64.80 59.10 60 57.70 63 60.92
Jiang2020 65.86 58.55 60.62 59.72 61.04 61.158
Mehta2020 64.6 59.2 60 58.8 60.5 60.62
Kazameini2020 62.09 57.84 59.30 56.52 59.39 59.028
Majumder2017 62.68 57.30 58.09 56.71 59.38 58.832
Tighe2016 61.9 56 55.6 55.7 58.3 57.7 61.95 56.04 55.75 57.54 58.31 57.918
Verhoeven2013 56 54 53 50 54 53.4
Poria2013 66.1 63.3 63.4 61.5 63.7 63.6
Mohammad2013 60.57 56.46 56.28 53.9 58.15 57.072
Table 9: Experimental results on Essay dataset. The boldface values indicate the best values in each trait.
Method F-measure Accuracy
Opn Con Ext Agr Neu Avg Opn Con Ext Agr Neu Avg
Wang2020_Encoding 76.8 75 85 70 79 77.16 80 76 80 68 79 76.6
Zheng2019 65 62 71 68 70 67.2
Hassanein2019 65 64
Farnadi2013 61 54 56 45 49 53
Table 10: Experimental results on MyPersonality dataset. The boldface values indicate the best values in each trait.
Method F-measure RMSE
Opn Con Ext Agr Emo Opn Con Ext Agr Emo
Xue2021 78.57 77.20 82.35 83.08 79.55
Sun2020 0.6813 0.6898 0.8897 0.7743 0.6872
Table 11: Experimental results on YouTube dataset.

As mentioned in Sec 4.1, FriedndPersona is the most recent APP dataset that was only introduced for the proposed method on Jiang2020 and are shown in table 12. But in comparison with reported results on Essays (table 9) the evaluations are acceptable because the proposed model achieved in range of 60% (63.01% on FriendPersona and 61.158% on Essays).

Method F-measure Accuracy
Opn Con Ext Agr Neu Opn Con Ext Agr Neu
Jiang2020 68.47 56.78 64.21 65.58 60.06
Table 12: Experimental results on FriendPersona dataset.

The evaluations on the Kaggle dataset are shown in table 13 and done using f-measure and accuracy metrics. The best-performed method is Khan2020

that deployed XGboost ensemble algorithm to gain high values on both metrics. To the rest methods, three reported accuracies and one reported f-measure only that does not have any subscription, so methods with the same reported metrics should be compared.

Method F-measure Accuracy
Khan2020 98.56 99.75 94.72 96.19 99.37 99.92 94.55 95.53
Yang2021 66.08 69.10 79.19 67.50
Amirhosseini2020 78.17 86.06 71.78 65.70
Mehta2020 78.8 86.3 76.1 67.2
Table 13: Experimental results on Kaggle MBTI dataset. The boldface values indicate the best values in each trait.

In the end, as it can be concluded from the tables, deployment of PLMs for words and documents embeddings in couple with deep neural network models makes a significant improvement in textual-based APPs. However, somehow, it seems that the hybrid and ensemble models (especially PLM-free) are achieving better results in their own class and could be a progressive area of research. Also, the APP research area suffers from a lack of golden standard datasets and evaluations. As the final recommendation for future work, introducing novel datasets labelled on more than one personality trait model, with more samples and different lengths of documents, can make the APP research area more attractive and competitive.

5 Conclusion

Automatic personality prediction (perception) (APP) system provides an opportunity to predict personality traits based on human behaviour manifestations, especially texts in this review. This paper is reviewed text-based APP methods since 2010 and reported results by five well-known benchmark datasets. Also, the framework of overviewed methods has been collected. The aim of this review is to give a general overview of the steps of getting meliorate of APPs to researchers in this field.

This project is supported by a research grant of the University of Tabriz (number S/806).



This project is supported by a research grant of the University of Tabriz (number S/806).

Conflict of interest

The authors declare that they have no conflict of interest.

5.1 Data availability

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.


Appendix A Appendix

max width=
Description of LOW values -
Personality trait + Description of HIGH values
Warmth (A)
• Impersonal • Distant • Cool • Reserved • Detached • Formal • Aloof
• Warm • Outgoing • Attentive to others • Kindly • Easygoing • Participating • Likes people

Reasoning (B)

• Concrete-thinking • less intelligent • lower general mental capacity • unable to handle abstract problems

• Abstract-thinking • more intelligent • higher general mental capacity • bright • fast-learner

Emotional Stability (C)

• Reactive emotionally • changeable • affected by feelings • emotionally less stable • easily upset

• Emotionally stable • adaptive • faces reality calmly • mature

Dominance (E)

• Deferential • cooperative • avoids conflict • submissive • humble • obedient • easily led • docile • accommodating

• Dominant • forceful • assertive • aggressive • competitive • stubborn • bossy

Liveliness (F)

• Serious • restrained • prudent • taciturn • introspective • silent

• Lively • animated • spontaneous • enthusiastic • happy-go-lucky • cheerful • expressive • impulsive

Rule-Consciousness (G)

• Expedient • nonconforming • disregards rules • self-indulgent

• Rule-conscious • dutiful • conscientious • conforming • moralistic • staid • rule-bound

Social Boldness (H)

• Shy • threat-sensitive • timid • hesitant • intimidated

• Socially bold • venturesome • thick-skinned • uninhibited

Sensitivity (I)

• Utilitarian • objective • unsentimental • tough-minded • self-reliant • no-nonsense • rough

• Sensitive • aesthetic • sentimental • tender-minded • intuitive • refined

Vigilance (L)

• Trusting • unsuspecting • accepting • unconditional • easy

• Vigilant • suspicious • skeptical • distrustful • oppositional

Abstractedness (M)

• Grounded • practical • prosaic • solution oriented • steady • conventional

• Abstract • imaginative • absentminded • impractical • absorbed in ideas

Privateness (N)

• Forthright • genuine • artless • open • guileless • naive • unpretentious • involved

• Private • discreet • nondisclosing • shrewd • polished • worldly • astute • diplomatic

Apprehension (O)

• Self-assured • unworried • complacent • secure • free of guilt • confident • self-satisfied

• Apprehensive • self-doubting • worried • guilt-prone • insecure • worrying • self-blaming

Openness to Change (Q1)

• Traditional • attached to familiar • conservative • respecting traditional ideas

• Open to change • experimental • liberal • analytical • critical • freethinking • flexibility

Self-Reliance (Q2)

• Group-oriented • affiliative • a joiner and follower dependent

• Self-reliant • solitary • resourceful • individualistic • self-sufficient

Perfectionism (Q3)

• Tolerates disorder • unexacting • flexible • undisciplined • lax • self-conflict • impulsive • careless of social rules • uncontrolled

• Perfectionistic • organized • compulsive • self-disciplined • socially precise • control • exacting will power • self-sentimental

Tension (Q4)

• Relaxed • placid • tranquil • torpid • patient • composed low drive

• Tense • high-energy • impatient • driven • frustrated • over-wrought • time-driven

Table 14: An overview of Cattell’s 16 cattel16-Wikipedia.
Figure 3: The architecture of method proposed in Majumder2017. The input layer corresponds to the sequence of input sentences (only two are shown). The next two layers include three parts, corresponding to trigrams, bigrams, and unigrams. The dotted lines delimit the area in a previous layer to which a neuron of the next layer is connected—for example, the bottom-right rectangle shows the area comprising three-word vectors connected with a trigram neuron.
Figure 4: The architecture of proposed model for representation of texts called C2W2S4PT (Character to Word to Sentence for Personality Traits). Dotted boxes indicate concatenation. FLiu2016, Liu2017
Figure 5: The proposed architecture inYu2017
(a) Framework of Heterogeneous Information Ensemble (HIE)
(b) The structure of Text-CNN.
Figure 6: The structure of proposed algorithm called HIE and text representation model. Wei2017
Figure 7: The architecture of proposed method in Sun2018 called 2CLSTM.
Figure 8: Pseudo Multi-view Co-training (PMC) based framework for persinality prediction Zheng2019.
Figure 9: Three layer architecture overview of proposed personality GCN Wang2020_Encoding.
Figure 10: Overall framework of Personality2Vec Guan2020.
Figure 11: The architecture of Bagged SVM over BERT Word Embedding technique Kazameini2020.
Figure 12: Illustration of SEPRNN Xue2021.
(a) Overview of Transformer-MD.
(b) Overview of dimension attention module.
Figure 13: The schema of proposed method named Transformer-MD Yang2021.