"The development of language is part of the development of the personality, for words are the natural means of expressing thoughts and establishing understanding between people." (Maria Montessori)
The above quotation is the basis of this article, which studies natural language processing (NLP) in the context of individual personality. Personality is defined as the characteristic set of behaviours, cognitions, and emotional patterns corr2009cambridge, as well as thinking patterns kazdin2000encyclopedia; its external appearance can be seen in writing, speech, decisions, and other aspects of people's social and personal lives. Language is among the most prominent and most accessible manifestations of an individual's personality, and written text is one of the most widely used forms of language. The growth of Internet-based infrastructure such as social media, e-mail, and various texting contexts has made people's language output more available than ever. Consequently, given the increase in Internet-based communication, it is appealing to become aware of individuals' personalities even in their absence. The involvement of computers in determining people's personality therefore seems necessary and has become a field of study in computer science.
Automatic Personality Prediction (or Perception) (APP) is the automatic prediction of an individual's personality, usually performed by computers. As the variety of data types available for analysing personality grows, so do the perspectives on APP. From this standpoint, APP data types include: speech Jothilakshmi2017; Su2018; Gilpin2018; Mohammadi2012, image Sang2016; Allen2016; Chaudhari2019; Lokhande2017, video Kindiroglu2017; Aslan2019, text Ramezani2020; Han2020; Xue2021, social media activities Zhu2020; tadesse_2018; Lima2014, touch-screen interaction kuster2018; roy_roy_sinha_2018, and so on. Each of these also has subsets; divisions of text-based APP that can be mentioned are e-mail Shen2013, SMS Yakoub2015, and tweets and posts on social media Arnoux2017. Thereby, the key standpoint of this study is the analysis of APP methods that operate on the text data type through NLP.
Personality must be measured and classified to make it comparable, and this goes back to psychology. Psychologists have put forward many personality trait models, such as Allport's trait theory Allport1937, Cattell's 16 Factor Model Cattell1970 (Table 14 shows the characteristics of its traits), the Eysenck Personality Questionnaire (EPQ) Eysenck1975, the Myers-Briggs Type Indicator (MBTI) Briggs1976, and the Big Five P.John1999. Among these, two models, MBTI and the Big Five, are the most popular and widely used, especially in APP. MBTI has four main dimensions, Introversion versus Extraversion (I-E), Sensing versus iNtuiting (S-N), Thinking versus Feeling (T-F), and Judging versus Perceiving (J-P), and each person is assigned one pole of each dimension. Figure 1 defines the characteristics of each MBTI dimension. The second popular personality model is the Big Five. This model consists of five traits, and a person may score on one or more traits. Two different modelling approaches are used in APP datasets: binary modelling (0 or 1 for each trait) and continuous modelling (each trait takes a value in the range 0 to 1). Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism are the Big Five traits, abbreviated as OCEAN. Table 1 illustrates the characteristics of each OCEAN trait.
In recent years, the NLP field has experienced a revolution from which APP has not been left out. In this study, APP articles since 2010 that involve textual inputs are reviewed in three categories: classical text representation and feature extraction methods, methods assisted by novel pre-trained word representations, and multimodal approaches (which include other data types besides text).
The rest of this study is organized as follows: Section 2 briefly introduces the materials and methods that serve as the baselines of APP studies. Section 3 is an overview of the methods and consists of three sub-sections. The results of the studies, structured by dataset, are shown in Section 4; the datasets are also explained in that section. Finally, some concluding remarks are given in Section 5.
2 NLP materials in personality prediction
The approach of this study is to overview APP research conducted on texts. To this end, the materials and methods of text analysis are briefly introduced in this section. Linguistic Inquiry and Word Count (LIWC) is one of the most widely used and developed tools in APP and is reported in Section 2.1. The next NLP resource, called MRC, is a psycholinguistic dictionary of English. The last one concerns embedding techniques, which represent words for text analysis, typically as real-valued vectors that encode the meaning of a word such that words that are closer in the vector space are expected to be similar in meaning.
2.1 Linguistic Inquiry and Word Count (LIWC)
LIWC (Linguistic Inquiry and Word Count) was introduced by Pennebaker1996 and developed over the years Pennebaker1999; Tausczik2010 as an NLP tool for psychological purposes. LIWC is a text analysis tool that provides statistical reports which are very useful for the emotional and cognitive analysis of people through their texts. Since 2001, two updated versions of LIWC have been released, in 2007 and 2015. Features were added in each version; Table 2 shows all features reported by LIWC and the differences between the last two versions. A LIWC report consists of 91 features in 15 categories. The test can be taken online at https://liwc.wpengine.com/.
| LIWC Dimension | Output Label | LIWC2015 Mean | LIWC2007 Mean | LIWC 2015/2007 Correlation |
|---|---|---|---|---|
| Words per sentence | WPS | 17.40 | 25.07 | 0.74 |
| Words > 6 letters | Sixltr | 15.60 | 15.89 | 0.98 |
| 1st pers singular | i | 4.99 | 4.97 | 1.00 |
| 1st pers plural | we | 0.72 | 0.72 | 1.00 |
| 3rd pers singular | shehe | 1.88 | 1.87 | 1.00 |
| 3rd pers plural | they | 0.66 | 0.66 | 0.99 |
| Core Drives and Needs | drives | 6.93 | | |
| Informal Speech | informal | 2.52 | | |
Mairesse2007 developed a method based on LIWC on the Essays dataset. The authors fed the Essays dataset to LIWC; the outputs contain the LIWC features together with a personality trait label in the Big Five dimensions. Since LIWC queries are not free, the Mairesse dataset is deployed as the LIWC features in most research. Mairesse also developed a framework, available at http://farm2.user.srcf.net/research/personality/recognizer, but it should be noted that the framework has not been maintained for some time.
2.2 MRC Psycholinguistic Database
MRC is a publicly available machine-usable dictionary that includes up to 26 linguistic and psycholinguistic attributes for 150,837 English words. Its semantic, syntactic, phonological, and orthographic details about words make it suitable for miscellaneous research purposes in psychology, linguistics, and artificial intelligence. Word association data are also included in the database. The first version was introduced in 1981 Coltheart1981, and the second and latest version wilson_1988 is now available at https://websites.psychology.uwa.edu.au/school/mrcdatabase/mrc2.html with details and statistics.
2.3 Embedding techniques
Any input must be modelled before a computer can understand it, and writing is no exception; this is the duty of embeddings. The smallest meaningful segment of writing is the word, which is why word embedding is fundamental to this task. Typically, each word or token is represented by a vector. The most basic embedding is one-hot: a dictionary of words is generated, and each word is represented by a vector in which only one cell's value is one and the others are zero, with the vector size equal to the dictionary size. In this representation, the vectors are orthogonal, so there are no semantics or relations between words. Moreover, on large corpora the vector size becomes large, requiring considerable storage to save and handle.
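The one-hot scheme described above can be sketched in a few lines; the vocabulary and sentence below are made-up illustrations, not from any APP dataset:

```python
# Minimal one-hot embedding sketch: each word maps to a vector with a single 1.
def one_hot_vocab(tokens):
    """Map each distinct token to an index in a fixed vocabulary."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}

def one_hot(token, vocab):
    """Return a list with a single 1 at the token's index, 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec

corpus = "words are the natural means of expressing thoughts".split()
vocab = one_hot_vocab(corpus)
v = one_hot("words", vocab)   # vector size equals the dictionary size
```

Note that the dot product of the vectors for any two different words is zero, which is exactly the orthogonality (and hence lack of semantics) mentioned above.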
The problems and limitations of one-hot vectors created a felt need for new embeddings. Word2Vec Mikolov2013; Mikolov2013a was the first word embedding able to map words to vectors while considering semantics, and it became the cornerstone of subsequent embedding techniques. FastText Bojanowski2017 and GloVe pennington2014glove are examples of the evolution of word embedding techniques. All of these embeddings must be trained, so pre-trained language models (PLMs) are distributed, differing in the training corpus and the training attitude (CBOW and skip-gram).
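The difference between the two training attitudes comes down to which side of a context window is predicted. A hedged sketch of how skip-gram training pairs are generated (the sentence and window size are illustrative assumptions):

```python
# Skip-gram predicts each context word from the center word; CBOW would
# instead predict the center word from all of its context words at once.
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) training pairs within a sliding window."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sent = "words are the natural means".split()
pairs = skipgram_pairs(sent, window=1)
# e.g. ("are", "words") and ("are", "the") are both generated
```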
Transformers vaswani2017attention (more information in minaee2020deep), introduced in 2017, revolutionized embedding techniques by making more parallelism possible than other architectures (such as CNNs and RNNs). Hence, computers became able to train larger models, and large-scale Transformer-based PLMs appeared. The most well-known Transformer-based PLM is BERT Devlin2018. Based on BERT, numerous models have arisen with different points of view, namely RoBERTa Liu2019 (robust and larger), ALBERT lan2020albert (faster training and lower memory), and DistilBERT sanh2020distilbert (40% smaller and 60% faster).
3 Methods (Overview)
In natural language processing, the representation of the input is the most important component, and the smallest parts of a text are its words. In recent years, novel representations called pre-trained word embeddings have become a trend and revolutionized text mining. APP is not unaffected, and novel methods based on word embedding techniques have appeared. In this review of APP, methods without pre-trained word embeddings are described first, then pre-trained word embedding based methods are detailed, and at the end methods with more than one input type are introduced.
3.1 PLM free APPs
Poria2013 proposed a combinational algorithm for detecting personality using LIWC and MRC textual features: 81 LIWC features plus 26 MRC features were extracted from the Essays dataset. The proposed EmoSenticSpace is a novel representation method built on a graph of EmoSenticNet havasi2007conceptnet and fed to a blending algorithm. The output is a 100-dimensional vector per concept, and through a "bag of concepts" the vectors are averaged to represent a text. The authors trained five Sequential Minimal Optimization (SMO) classifiers, one for each of the Big Five traits.
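The "bag of concepts" averaging step can be sketched as follows; the vectors below are toy 3-dimensional stand-ins for the 100-dimensional EmoSenticSpace outputs, not actual values from the method:

```python
# Represent a text as the element-wise mean of its concept vectors.
def average_vectors(vectors):
    """Element-wise mean of equal-length concept vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

concept_vecs = [[1.0, 0.0, 3.0],
                [3.0, 2.0, 1.0]]          # two concepts found in a text
doc_vec = average_vectors(concept_vecs)   # one fixed-size text representation
```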
Verhoeven2013 proposed an ensemble model for recognizing personality from Facebook. The authors trained three SVM classifiers: the first on Facebook data, the second on the Essays dataset, and, as a meta-classifier, a third SVM fed with the outputs of the first two.
Part of speech (POS) is a basic feature of texts. In Wright2014, the authors used POS and POS n-grams of texts, together with bag-of-words, word sentiment, negation, and vocabulary size features, to predict personality. As a classifier, SVM was deployed in two formats, 2-class and 3-class. Essays from 2588 university students, collected on the Big Five personality scale, were used to evaluate the method.
LIWC produces a large number of features, and Tighe2016 tried to reduce the feature size to achieve better results on Essays. To this end, Information Gain and Principal Component Analysis (PCA) feature reduction techniques were examined. It was shown that some LIWC features do not carry adequate information for APP on Essays.
Early word embedding models, e.g. Word2Vec, have some common practical problems that make them hard and somewhat restricted to use. The first is unseen words: if a word was not seen during the learning stage, the model is in trouble. The second problem arises when training a model with too many parameters. Due to these problems, FLiu2016 proposed an embedding model for personality texts. C2W2S4PT (Character to Word to Sentence for Personality Traits) is a three-stage Bi-RNN based model: first, characters are modelled; second, words are modelled based on the first stage; third, sentences are embedded from the word representations using a feedforward neural network. The proposed architecture is illustrated in Figure 4. In Liu2017, C2W2S4PT was evaluated on English, Spanish, and Italian to prove its language independence.
In the Zheng2019 research, to take advantage of huge amounts of unlabeled data, Pseudo Multi-view Co-training (PMC) chen2011automatic, an effective semi-supervised learning algorithm, was adopted to build a personality prediction model. To extract adequate linguistic features, both LIWC and n-grams, along with the Word2Vec word embedding technique, were trained on the myPersonality dataset to predict personality from textual data. Figure 8 illustrates the overall framework of the method.
Personality2Vec Guan2020 is a user-personality embedding technique over a user's generated texts. Semantic and linguistic features of the texts construct a graph, and a biased walk strategy was proposed to divide users into groups such that users with maximally similar personalities fall in the same group. As shown in Figure 10, linguistic and semantic features are first extracted and then passed to the learning part. The linguistic features are the 103 LIWC dimensions plus 10 special linguistic dimensions proposed by the authors.
Paper Sun2020 deployed network representation learning (NRL) as its novelty, for the first time in APP. AdaWalk generates a graph of the documents in personality datasets under two approaches, classification and regression. NRL represents each node (word or token) using SkipGram.
Another graph-based approach, called personality graph convolutional networks (personality GCN), was introduced by Wang2020_Encoding. The authors create a graph modelling users, documents, and words, with the co-occurrence of words in a document at its core. Edge weights are calculated by TF-IDF for document-word edges and by pointwise mutual information (PMI) for word-word edges. The last layer is the classification layer, with five classifiers for the Big Five traits. Figure 9 illustrates an overview of the three GCN layers. It is worth noting that words, users, and documents are represented by one-hot vectors.
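The PMI weight for a word-word edge can be sketched from co-occurrence counts over sliding windows; the corpus, window size, and positive-PMI cutoff below are illustrative assumptions, not details taken from Wang2020_Encoding:

```python
# PMI(i, j) = log( p(i, j) / (p(i) * p(j)) ), estimated over text windows;
# only word pairs with positive PMI become edges in the graph.
import math
from collections import Counter
from itertools import combinations

def pmi_weights(docs, window=3):
    """Return {frozenset({w1, w2}): pmi} for pairs with positive PMI."""
    windows = []
    for doc in docs:
        toks = doc.split()
        for k in range(max(1, len(toks) - window + 1)):
            windows.append(set(toks[k:k + window]))
    n = len(windows)
    word_count = Counter(w for win in windows for w in win)
    pair_count = Counter(frozenset(p) for win in windows
                         for p in combinations(sorted(win), 2))
    weights = {}
    for pair, c in pair_count.items():
        a, b = tuple(pair)
        pmi = math.log(c * n / (word_count[a] * word_count[b]))
        if pmi > 0:
            weights[pair] = pmi
    return weights

docs = ["i feel happy today", "i feel sad today", "happy music"]
w = pmi_weights(docs, window=3)
```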
One of the most recent works in this area proposed five new APP methods (term frequency vector-based, ontology-based, enriched ontology-based, latent semantic analysis (LSA)-based, and deep learning-based with BiLSTM), with the contribution of enhancing accuracy Ramezani2020. These five models were used as the base learners of a hierarchical attention network (HAN) ensemble model. The authors evaluated the methods on the Essays dataset and achieved improved accuracy with the ensemble. The architecture of the proposed HAN stacking model is shown in Figure 2.
3.2 PLM-based APPs
The first research that deployed a PLM for APP was done by Majumder2017, based on Word2Vec. It takes two approaches: document-level, using Mairesse features, and word-level, using Word2Vec and a CNN model to obtain 300-dimensional word representations for modelling sentences and documents, respectively, from an n-gram point of view. The representations of both approaches are concatenated and fed to a fully connected layer for classification (Figure 3 illustrates the architecture of the proposed method). Five models, one for each of the five traits, were trained on the Essays dataset.
2CLSTM is the model proposed by Sun2018, which tries to learn structural features based on latent sentence groups (LSG). In the first step, each word is embedded into a 100-dimensional vector through the GloVe pre-trained model. An LSTM encodes the vectors into sentence representations, which are passed to the next section. The following sections model the relationships between sentences, and the LSG does this: a Latent Sentence Group is defined as a synthesis of a number of sentence vectors that are closely connected in some coordinates. To this end, CNN networks are deployed to learn 1-, 2-, and 3-grams. Figure 7 illustrates the layers and schema of 2CLSTM. A dense layer and a max-pooling layer follow to generate the final vector, which is passed to a classifier, here a softmax.
In Darliansyah2019, a personality prediction method using the sentiment of short texts, named SENTIPEDE, is introduced. The aim was to determine Big Five personality using textual features together with sentiment features of short texts, with Twitter as the base. The sentiment of a text is computed by the Valence Aware Dictionary and Sentiment Reasoner (VADER) hutto2014vader, a rule-based framework for sentiment classification in English, to obtain a label of positive, negative, or neutral. Then the GloVe word embedding is deployed to vectorise the words, which are passed to a CNN-LSTM model along with the sentiment labels. The case study of the paper was Uber and related tweets.
When introducing a new dataset, baseline methods and their evaluation on a standard dataset are necessary. FriendPersona is a new dataset introduced by Jiang2020. In Jiang2020, five models were developed: ABCNN (CNN with attention mechanism), ABLSTM (bidirectional LSTM with attention mechanism), HAN (Hierarchical Attention Network), BERT, and RoBERTa. BERT and RoBERTa, as PLMs, are fine-tuned on both datasets, Essays and FriendPersona.
In Mehta2020, two approaches were studied: personality prediction based on psycholinguistic features and on language model features. In both approaches, the features extracted from the texts are fed to a classifier, SVM or MLP, to classify texts into personality traits. Mairesse, SenticNet cambria2018senticnet, the NRC Emotion Lexicon mohammad2013crowdsourcing, the VAD Lexicon mohammad-2018-obtaining, and readability measures (a number of readability measures computed from simple surface characteristics of the text, basically linear regressions on the numbers of words, syllables, and sentences) are the extracted psycholinguistic features. On the other hand, the BERT PLM is deployed as a language model feature extractor.
In Kazameini2020, documents are segmented into 250-token sub-documents in order to be fed to the BERT(base) PLM. In the Essays dataset used in this paper, documents are 650 tokens long on average. Four layers of the [CLS] token are concatenated with 84 Mairesse features as the features of a document. SVM is the classifier of the proposed method; it is trained on sub-documents in parallel like a bagged classifier, and the final trait is predicted by majority voting. Figure 11 shows the authors' proposed method.
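The final majority-voting step over the bagged sub-document classifiers can be sketched as follows; the prediction values below are made-up stand-ins for the SVM outputs, not data from the paper:

```python
# Aggregate per-sub-document predictions for one binary Big Five trait.
from collections import Counter

def majority_vote(sub_doc_predictions):
    """Return the label predicted by most sub-document classifiers."""
    return Counter(sub_doc_predictions).most_common(1)[0][0]

# Four 250-token sub-documents of one essay, three predicting "trait present":
preds = [1, 0, 1, 1]
label = majority_vote(preds)   # the essay-level prediction
```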
One of the recent APP methods combines semantic and emotional features to determine personality traits from multiple texts Ren2021. On the semantic side, BERT is deployed to vectorize the texts, and a self-attention mechanism generates a sentence-level representation. On the other side, SenticNet5 AAAI1816839 extracts the sentiment of the sentences and maps it to a vector. Both vectors are concatenated and fed to a classification network; CNN, GRU, and LSTM are the different neural networks trained to label the personality traits.
SEPRNN (semantic-enhanced personality recognition neural network) Xue2021 was proposed with the goal of avoiding the dependency on feature selection in APP and of modelling semantics from word-level representations. The GloVe PLM is deployed to vectorize the words, then a BiGRU model learns the left and right contexts of each word. To capture a higher level of semantic representation from the contextualized textual data, the vectors are fed to a fully connected network for document-level text modelling. In the end, a fully connected network with a sigmoid activation function learns a two-dimensional vector for the binary classification of personality. The proposed method is illustrated in Figure 12.
Transformer-MD (multi-document Transformer) is the method proposed by Yang2021. Its core idea is to put together information from multiple posts to represent an overall personality for each user. The authors tried to solve two problems: the bias of post order on personality, and the individual processing of a person's posts for personality detection. To this end, when encoding each post, Transformer-MD allows access to information in the user's other posts through Transformer-XL's memory tokens, which share the position embedding. For multi-trait personality detection, a dimension attention mechanism is set on top of Transformer-MD. An overview of the proposed method is shown in Figure 13.
3.3 Multimodal APPs
One of the first deep learning multimodal approaches deployed text coupled with author information to obtain the writer's personality Yu2017. Owing to the limitations of the pre-trained word embedding methods available in 2017 (Word2Vec), the authors trained their own skip-gram word embedding model on the myPersonality dataset. Since the trained model does not account for word position, applying n-grams seemed the best approach, and two variants were implemented, CNN-based and Bi-RNN-based. Figure 5 shows the architecture of Yu2017. After the word modelling part, the author's information, a 7-dimensional vector, is concatenated and passed on. A separate neural network was trained for each of the five personality traits.
Another study explored the use of machine learning techniques for inferring a user's personality traits from their Facebook status updates, using four kinds of numeric features:
Social network features: 7 features related to the social network of the user: (1) network size, (2) betweenness, (3) nbetweenness, (4) density, (5) brokerage, (6) nbrokerage, and (7) transitivity.
Time-related features: 6 features related to the time of the status updates (assuming all times are in one time zone): (1) frequency of status updates per day, (2) number of statuses posted between 6-11 am, (3) number of statuses posted between 11-16, (4) number of statuses posted between 16-21, (5) number of statuses posted between 21-00, and (6) number of statuses posted between 00-6 am.
Other features: 6 features not included in the categories above: (1) total number of statuses per user, (2) number of capitalized words, (3) number of capital letters, (4) number of words used more than once, (5) number of URLs, and (6) number of occurrences of the string PROPNAME, a string used in the data to replace proper names of persons for anonymisation purposes.
Another work proposed a comprehensive multimodal APP approach accompanying texts, avatars, and emojis. Pearson correlation, Text-CNN, and bag-of-words (BOW) clusters are the text-based features extracted from the Weibo tweets collected in the research. Pearson correlation was computed between words and the personality traits to select the top 2000 words most strongly correlated with personality. Owing to LIWC's limited capability to represent users' linguistic patterns in short and informal texts, 1,500 Chinese words and all the punctuation marks in bag-of-words format were clustered using the k-means algorithm, and the count of items within each cluster was used instead of LIWC. As the last feature, a convolutional architecture called Text-CNN was trained to model words in vector form, following the kim-2014-convolutional model. The structure of the proposed algorithm is shown in Figure 6. As seen, each type of input ends in a classifier (Logistic Regression) to specify the trait. As the final step, a stacked ensemble algorithm, a generalization-based ensemble method, is attached to the classifiers to produce the final result.
It is common to utilize acoustic features simultaneously with speech transcripts to determine people's personality. In An2018, four feature sets were used to predict personality: acoustic-prosodic low-level descriptor (LLD) features, LIWC, the Dictionary of Affect in Language (DAL), and word embeddings. The DAL features, 19 in this research, were extracted using Whissell's Dictionary of Affect in Language. In the word embedding part, Google's pre-trained skip-gram vectors and Stanford's pre-trained GloVe were used. Two approaches were adopted for modelling documents based on the embedding vectors: averaging and an LSTM neural network. Notably, the vectors of both PLMs are fed to the model and concatenated with the three other feature sets. Moreover, the authors proposed two strategies: first, concatenating the features and then applying five fully connected layers ending with five neurons as the classifier; second, feeding each of the five feature sets to a block of three fully connected layers before concatenation, and then similarly to five neurons for classification.
4 Evaluating methods
Every proposed method should be evaluated to prove its performance. In APP, evaluations consist of two parts: the dataset and the assessment metric. Five datasets are available for text-based APP, and metrics vary with the concept of evaluation. This section consists of two parts: the first details the datasets, and the second presents the results of the methods by dataset and metric.
Each method should be evaluated and compared with other methods, which requires fair conditions. To achieve a fair comparison, the ground truths, including metrics and datasets, should be the same. In this part, the five datasets that are benchmarks in text-based APP, Essays, myPersonality, YouTube, FriendPersona, and Kaggle MBTI, are introduced.
Essays (also called stream-of-consciousness essays) is the first and most cited text dataset in automatic personality prediction. The dataset, introduced by Pennebaker1999, consists of 2468 anonymous English essays annotated on the Big Five scale. The dataset is annotated in two modes, classification and regression; thus each essay has two sets of Big Five values: in the first, each trait has a binary value, and in the second, each trait has a real value for regression purposes. Essays is mainly deployed for classification, and Table 3 shows the number of essays in each trait. It should be noted that one row of the data was erroneous and was dismissed from the table's values. Moreover, the distribution of essay components is reported in Table 4.
myPersonality (https://sites.google.com/michalkosinski.com/mypersonality) is a collection of profile status updates from 250 anonymous Facebook users, scored on the Big Five by asking the users to answer questionnaires. Since 2018, its creators, Stillwell and Kosinski, have stopped sharing and developing the dataset. Some versions available on the Internet do not match in their records, but it contained approximately 9900 records. myPersonality is annotated in two forms, classification and regression, and Tables 5 and 6 illustrate the distribution of its status updates in each form.
YouTube is the most popular video-sharing platform and has attracted many vloggers. The YouTube dataset is a collection of the Big Five personality scores of 404 YouTube vloggers. The dataset consists of recordings of vloggers talking in front of a webcam about a variety of topics, annotated using Amazon Mechanical Turk and the Ten-Item Personality Inventory (TIPI). As mentioned, the YouTube dataset is multimodal: video, speech, and speech transcribed to text Biel2013; Biel2013text. In line with the aim of this article, the speech transcripts Biel2013text are analysed in this section. As usual, the distribution of traits is shown in Table 7, and details of the textual elements are shown in Table 8.
The newest dataset introduced in the context of text-based APP is FriendPersona. It was developed on the Friends TV Show Dataset (https://github.com/emorynlp/character-mining) Chen2016, from which 711 conversations were extracted. FriendPersona was annotated by three experts and, to make it binary, split at the median. The dataset can be found on GitHub (https://github.com/emorynlp/personality-detection).
Kaggle MBTI (available at https://www.kaggle.com/datasnaek/mbti-type/) gathers the last 50 posts of users of the PersonalityCafe forum, labelled with the MBTI personality model. There are 8675 rows of data, each representing a person. The dataset is grounded in Carl Jung's cognitive functions, and the personality tags follow Jungian typology in MBTI.
In this part, the evaluation metrics used in APP are introduced first, and then the reported results of the methods appear, divided by dataset. Each of the following tables surveys one dataset presented in the previous part.
4.3 Evaluation metrics
Precision, recall, accuracy, and F-measure are well-known classification evaluation metrics used in scientific reports. To calculate these measures, four concepts must be defined: TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. These concepts make sense with respect to the ground truth confronted with the output of the system. There are other measurements for evaluating regression methods, such as RMSE (Root Mean Square Error), MAE (Mean Absolute Error), and the Coefficient of Determination (R²). Still, most articles prefer the binary classification measures and report the first four. The following equations give the calculations of the metrics.
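The equations referenced in the surrounding text were lost in extraction; they are restored here in their standard form (the numbering follows the in-text references, and Eq. 3 is taken to be accuracy):

```latex
\begin{align}
\text{Precision} &= \frac{TP}{TP + FP} \tag{1}\\
\text{Recall} &= \frac{TP}{TP + FN} \tag{2}\\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \tag{3}\\
F_1 &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}\\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{5}\\
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{6}
\end{align}
```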
The F1 score (Eq. 4) is the harmonic mean of precision (Eq. 1) and recall (Eq. 2), which is why most studies report the F1 score instead of precision and recall separately, making the results easier for readers to interpret.
Given the scarcity of text-based personality prediction studies, researchers have reported results from their own perspectives without much comparison, which creates variety in the units of measurement. RMSE (Eq. 5) and MAE (Eq. 6) were used in Carducci2018 and Xue2018, but these cannot be compared due to the lack of comparable studies on the same dataset.
To make the results comparative and comprehensive, the results for each dataset are given in separate tables. Essays is the most popular dataset in APP, and the number of results on it exceeds the others. According to the classification essence of the Essays and myPersonality datasets, the evaluations are done with the F-measure and accuracy metrics. The growth of APP methods accelerated with the advent of deep learning, and the increase in the quality of results is most evident on the Essays dataset. As shown in Table 9, methods powered by newer PLMs and novel deep learning achieve higher values than the previous methods. It has to be considered that some methods reported results insufficiently; thus, only one evaluation metric is listed, at the authors' discretion. Among the methods that do not use word embeddings and are evaluated on the Essays dataset, Ramezani2020 obtains the best results. Since the myPersonality dataset is no longer available from its creators, it is not used in recent research; however, Wang2020_Encoding achieved the best results on it, as shown in Table 10, without applying PLMs for embedding.
As mentioned in Section 4.1, FriendPersona is the most recent APP dataset; it was only introduced together with the proposed method in Jiang2020, whose results are shown in Table 12. In comparison with the results reported on Essays (Table 9), the evaluations are acceptable, as the proposed model achieves values in the 60% range (63.01% on FriendPersona and 61.158% on Essays).
The evaluations on the Kaggle dataset are shown in Table 13 and are done using the F-measure and accuracy metrics. The best-performing method is Khan2020, which deployed the XGBoost ensemble algorithm to attain high values on both metrics. Of the remaining methods, three reported only accuracy and one reported only the F-measure, with no overlap between them, so only methods reporting the same metrics can be compared.
In the end, as can be concluded from the tables, deploying PLMs for word and document embeddings coupled with deep neural network models brings a significant improvement in text-based APP. However, hybrid and ensemble models (especially PLM-free ones) seem to achieve the better results in their own class and could be a progressive area of research. The APP research area also suffers from a lack of gold-standard datasets and evaluations. As a final recommendation for future work, introducing novel datasets labelled with more than one personality trait model, with more samples and varying document lengths, could make the APP research area more attractive and competitive.
Automatic personality prediction (perception) (APP) systems provide an opportunity to predict personality traits from manifestations of human behaviour, especially texts in this review. This paper reviewed text-based APP methods since 2010 and reported their results on five well-known benchmark datasets. The frameworks of the overviewed methods have also been collected. The aim of this review is to give researchers in this field a general overview of the steps toward improving APP.
Acknowledgements. This project is supported by a research grant of the University of Tabriz (number S/806).
Conflict of interest
The authors declare that they have no conflict of interest.
5.1 Data availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.