Crowd analysis is of great interest in a large number of applications. Surveillance, entertainment and the social sciences are examples of fields that can benefit from the development of this area of study. The literature has addressed different applications of crowd analysis, for example counting people in crowds [Chan2009, cai2014], group and crowd movement and formation [Solmaz2012, Zhou2014, Ricky:15, jo2013review] and detection of social groups in crowds [solera_2013, Shao2014, Feng2015, Chandran2015]. Normally, these approaches are based on people tracking or optical flow algorithms, and use speed, direction and distance over time as features. Recently, some studies have investigated cultural differences in videos from different countries. Chattaraj et al. [CHATTARAJ2009] suggested that cultural and population differences could produce deviations in the speed, density and flow of the crowd. Favaretto et al. [Favaretto:2016] discussed cultural dimensions according to Hofstede's analysis [Hofstede:2011] and presented a methodology to map data from video sequences to the dimensions of Hofstede's cultural dimensions theory.
In this paper, we propose to detect cultural aspects of crowds based on the Big-Five personality model (or OCEAN) [costa07] (Brazilian version), as measured by the NEO PI-R [costa92], using individuals' behaviors automatically detected in video sequences. For this, we used the NEO PI-R [costa07], which is the standard questionnaire measure of the Big-Five factor model. The questionnaire provides a detailed personality description that can be a valuable resource for a variety of professionals. We first selected NEO PI-R items related to individual-level crowd characteristics, together with the corresponding factor, as described later in this paper. For example, the item “Like being part of crowd at sporting events” corresponds to the factor “Extroversion”. More details about personality models are discussed in Section II.
After selecting the NEO PI-R items related to crowd characteristics, we propose a way to map data extracted from video sequences to Big-Five parameters, as described in Section III.
Since each of the Big-Five factors is distributed differently in different countries [costa07], we hypothesize that it is possible to detect cultural differences by processing videos of crowd behavior from different countries. This discussion is addressed in Section IV. Conclusions and future work are presented in Section V.
II Related Work
This section discusses work on personality and on its use in crowd simulation.
Personality may be described as a deep psychological, individual-level trait [cattell50]. A trait is an inference drawn from observed behaviors that seeks to explain their regularity [hall98]. In general, researchers agree that there are five robust orthogonal traits that effectively capture personality attributes [digman90], known as the Big Five: Openness to experience (“the active seeking and appreciation of new experiences”); Conscientiousness (“degree of organization, persistence, control and motivation in goal directed behavior”); Extraversion (“quantity and intensity of energy directed outwards in the social world”); Agreeableness (“the kinds of interaction an individual prefers from compassion to tough mindedness”); Neuroticism (how prone the individual is to psychological distress) [lordw07]. The development of the Big Five personality model has its roots in the work of Allport and Odbert (1936), who tried to identify individual differences by extracting relevant words from Webster’s Unabridged Dictionary. They worked with the hypothesis that the most important individual differences would be encoded in language, because there would be an evolutionary need to communicate them. Although Allport and Odbert (1936) found 4,500 words that referred to generalized and stable personality traits, their technique could not yield a small set of personality traits that explained most of the variance in behavior.
Raymond Cattell is commonly credited with developing the methodology that permitted the objective grouping of hundreds of trait descriptors into a set of higher-level factors [digman90]. Cattell [cattell48] developed a taxonomy of individual differences that consisted of 16 primary factors and 8 second-order factors. Nevertheless, attempts to replicate his work were unsuccessful [fiske48], and researchers agreed that only the five-factor model matched his data, giving rise to the Big Five personality model.
The NEO PI-R [costa92] is one of the most widely used instruments based on the Big Five personality theory. It assesses the normal adult personality and is internationally recognized as a gold standard for personality assessment. One of its advantages is that it further specifies six facets within each personality trait and has data from several countries, which easily allows cross-cultural comparisons [McCrae2002, McCrae05]. Although the current empirical evidence linking individual-level traits, such as personality, to crowd behavior is not strong (one of the few examples is [Barry97]), the Big-Five personality model is widely used in computational crowd simulation [kaup06, Durupinar08, Guy11]. The model makes it possible to simulate a crowd with individual-level parameters based on the expected behaviors of the agents.
Recently, research has shown that digital records can be an effective tool for predicting personality traits. Facebook likes, for example, can predict the actual score of the Big-Five personality model, especially the Openness trait [Kosinski:2013], providing roughly as much information as the self-reported personality test score itself. This makes room for the use of computational methods to predict an individual’s personality as effectively as through the analysis of self-reported scores. A computational method to assess personality scores can also be useful because traditional self-report techniques have known issues: 1) individuals may deceive themselves and unintentionally distort their ratings of socially desirable traits in a positive direction [paulhus:1992]; 2) individuals can fake their responses to personality measures, especially in contexts in which the test is used as a selection criterion, such as job interviews; 3) individuals can distort their answers to different degrees and in different ways, making it harder to apply a general statistical correction that serves everyone equally [oh:2011].
One effective alternative to the self-report method is observer ratings of personality (i.e., ratings by acquaintances, friends or colleagues). A meta-analysis has shown that observer ratings provide substantial incremental validity over self-reports of personality [oh:2011]. One possible reason is that self-reports assess the internal dynamics of an individual, whereas observer ratings analyze behavioral performance. As behavior is a better predictor of future performance than the inner dynamics of an individual [JASP1355], this might explain the better predictive validity. Therefore, we propose that it is possible to predict facets of the personality traits of individuals through computer vision analysis of crowd behavior as effectively as through the self-report method and through ratings by observers such as a colleague or a friend. The rationale behind this proposal is that, since observer ratings might be as valid as self-reports, computer vision might be effective as well, because the behavioral component is being analyzed and not solely the inner dynamics of an individual. One example is the way players’ personality scores can be successfully predicted from the behavioural cues of their avatars in virtual worlds and games [Yee:2011, Yee:2011-2].
Concerning cultural simulation, Lala et al. [Lala2012] introduced a virtual environment that allows the creation of different types of cultural crowds. The crowd parameterization is based on the cultural dimensions presented by Hofstede [Hofstede:1991]. The work proposed by Kaminka [Kaminka:2011] presents data that aim to differentiate populations with regard to how they move in crowds. Cultural parameters are proposed and analyzed in videos from different countries, for later comparison. Some of the analyzed parameters are speed, personal space, number of collisions and population flow.
In this paper, the idea is to map parameters from individual behaviors (automatically detected in video sequences from different countries) to a Big-Five personality model score (OCEAN) [costa07] for each individual. In this sense, our contribution is a model based on a set of equations that take the individual parameters related to crowd behaviors obtained from videos and map them to crowd-related Big-Five personality traits, generating a profile for each individual and each analyzed video. Since personality differences between countries in the Big-Five model are established in the literature [McCrae2002, McCrae05], one can compare each specific result extracted from a video with the score of the related country or culture.
III The Proposed Approach
Our model consists of two main steps: video data extraction and cultural analysis. The first step aims to obtain the individual trajectories of observed pedestrians in real videos. From these trajectories, we extract data that are useful for the second step, which is responsible for the personality and cultural analysis.
III-A Individuals Data Extraction
Initially, the information about people in real videos is obtained using a tracker [Bins2013] to recover people's trajectories. The features are described below. We first compute the geometric information for each person at each timestep: i) 2D position (meters); ii) speed (meters/frame); iii) angular variation (degrees) w.r.t. a reference vector. In addition, three other features are also computed: iv) collectivity, v) socialization and vi) isolation levels. These features were chosen for two reasons. First, they are strongly related to the questions concerning group activities in the NEO PI-R survey [costa07]. Second, the theory behind socialization/isolation can easily be represented through geometric data (positions and distances), and collectivity has already been explored in the context of crowd behavior detection [Zhou2014].
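As an illustration of the geometric features above, the following minimal sketch derives per-step speed and an angle relative to a reference vector from a tracked trajectory. It is not the implementation used in this work: the function name `speed_and_angle`, the default reference vector (1, 0) and the choice to report the absolute angle are assumptions made only for this example.

```python
import math

def speed_and_angle(track, reference=(1.0, 0.0)):
    """Return a list of (speed, angle) pairs, one per consecutive frame pair.

    track: list of (x, y) positions in meters, one per frame.
    reference: reference vector for the angle (assumed here, not from the paper).
    Speed is in meters/frame; the angle is in degrees, in [0, 180].
    """
    ref_angle = math.atan2(reference[1], reference[0])
    features = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        speed = math.hypot(dx, dy)
        if speed == 0.0:
            angle = 0.0  # no displacement, so no meaningful direction
        else:
            diff = math.atan2(dy, dx) - ref_angle
            # Wrap the difference to (-pi, pi] before taking its magnitude.
            diff = math.atan2(math.sin(diff), math.cos(diff))
            angle = abs(math.degrees(diff))
        features.append((speed, angle))
    return features

# Hypothetical three-frame trajectory.
steps = speed_and_angle([(0.0, 0.0), (3.0, 4.0), (3.0, 5.0)])
```

Here the first step has speed 5 meters/frame and the second step moves perpendicular to the reference vector.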
To compute the collectivity affecting one individual from all individuals in his/her social space (as presented in [Favaretto:Sib:2016]), we used Equation 1:
where the collectivity between two individuals is calculated as a decay function of a combined difference term. This term considers the speed difference and the orientation difference between the two people, each weighted by a constant that regulates the offset in meters and radians, respectively. With the constants we used, the combined difference lies in a bounded interval. The maximum collectivity value is reached when the combined difference is zero, and the decay constant was defined empirically. Hence, the collectivity is also a value in a bounded interval.
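To make the idea of a decaying pairwise collectivity concrete, the sketch below assumes an exponential decay of the squared, weighted sum of the speed and orientation differences. The functional form, the name `pairwise_collectivity` and every constant value (`w_speed`, `w_angle`, `gamma`, `beta`) are assumptions of this example and should not be read as the definition in Equation 1.

```python
import math

def pairwise_collectivity(speed_i, angle_i, speed_j, angle_j,
                          w_speed=1.0, w_angle=1.0, gamma=1.0, beta=0.3):
    """Collectivity between two people from their speeds and headings (radians).

    All default constants are placeholders chosen for illustration only.
    """
    speed_diff = abs(speed_i - speed_j)
    # Smallest absolute difference between the two headings, in [0, pi].
    raw = angle_i - angle_j
    angle_diff = abs(math.atan2(math.sin(raw), math.cos(raw)))
    combined = w_speed * speed_diff + w_angle * angle_diff
    # Assumed decay: maximum value gamma when the combined difference is zero.
    return gamma * math.exp(-beta * combined ** 2)

same_motion = pairwise_collectivity(1.0, 0.0, 1.0, 0.0)
different_motion = pairwise_collectivity(1.0, 0.0, 1.5, 0.2)
```

Under these assumptions, two people with identical speed and heading reach the maximum value `gamma`, and the value decreases as their motion diverges.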
To compute the socialization level, we use an artificial neural network (ANN) trained with a classical supervised learning algorithm proposed by Moller [Moller93]. The ANN (illustrated in Figure 1) uses the Scaled Conjugate Gradient (SCG) algorithm in the training process to calculate the socialization level of each individual.
As described in Figure 1, the ANN has 3 inputs: the collectivity of the person, the mean Euclidean distance from the person to the others, and the number of people in the social space around the person (a distance in meters defined according to Hall’s proxemics [hall98]). In addition, the network has 10 hidden layers and 2 outputs (the probability of socialization and the probability of non-socialization). The final accuracy of the training process was 96%. We used 16,000 samples (70% for training and 30% for validation). These samples were obtained from the first 25 frames of each video in our dataset. The remaining frames were used to test the ANN, as described in Section IV.
The ground truth (GT) was generated as follows. First, we determine whether a person has a high socialization level based on Hall’s proxemics, calculated according to Equation 2:
where the two quantities are the number of individuals in the social space around the person and the number of individuals in the analyzed frame. A person whose value satisfies the threshold condition is considered a “social” person; otherwise the person is considered “not social” in the training process. Second, we performed a visual inspection, manually correcting false positives and false negatives according to our own judgment. Using this GT and the neural network, we evaluate the socialization level of each individual at each frame, for each video in the test group.
Once we have the socialization level, we compute the isolation level, which corresponds to its inverse.
Finally, for each individual in each frame of a given video, we have a feature vector. Then, averaging over all frames of the video, we obtain one feature vector for each person.
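The per-person averaging described above can be sketched as follows. The data layout (a flat sequence of `(person_id, features)` observations, one per person per frame) and the function name `mean_features_per_person` are assumptions made for this example.

```python
from collections import defaultdict

def mean_features_per_person(observations):
    """Average each person's feature vector over all frames in which they appear.

    observations: iterable of (person_id, features), where features is a
    sequence of numbers with the same length for every observation.
    Returns a dict mapping person_id to the list of per-feature means.
    """
    sums = {}
    counts = defaultdict(int)
    for person_id, features in observations:
        if person_id not in sums:
            sums[person_id] = [0.0] * len(features)
        for k, value in enumerate(features):
            sums[person_id][k] += value
        counts[person_id] += 1
    return {pid: [s / counts[pid] for s in total] for pid, total in sums.items()}

# Hypothetical observations with two features per person per frame.
obs = [("a", (1.0, 2.0)), ("a", (3.0, 4.0)), ("b", (5.0, 6.0))]
averaged = mean_features_per_person(obs)
```

A person who appears in only some frames is averaged over those frames only, which is one possible reading of the text.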
In this paper, we are interested in mapping the feature vector of each individual in a specific video to the OCEAN dimensions, as detailed in the next section.
III-B Mapping crowd features in Cultural Dimensions
Our goal is to map the feature vector of each individual to a second vector, related to the Big-Five dimensions (or OCEAN), for each individual in a given video.
Therefore, in our method, OCEAN is computed based on the NEO PI-R. For human beings, OCEAN is calculated from their answers to the full version of the NEO PI-R, with 240 items. Our goal is to find NEO PI-R “answers” for each individual in the video sequence, based on their features. To this end, we propose a series of empirically defined equations that map individual and crowd characteristics (in video sequences) to the OCEAN cultural dimensions.
As stated before, the complete version of the NEO PI-R has 240 items. First, we selected 25 items from the NEO PI-R inventory that had a direct relationship with crowd behavior. Of the 25 items selected, 18 (72%) are from Extroversion, 3 (12%) are from Neuroticism, 2 (8%) are from Agreeableness, 1 (4%) is from Openness and 1 (4%) is from Conscientiousness. One example of an item in the NEO PI-R is “1 - Have clear goals, work to them in orderly way”; possible answers are in the interval [0;4], whose values respectively represent Strongly Disagree, Disagree, Neutral, Agree and Strongly Agree.
Our proposal is to answer these 25 items (see Table I) for each individual at each frame of the video through the equations on the right in Table I. For example, to represent the item “1 - Have clear goals, work to them in orderly way”, we consider that the individual should have a high velocity and a low angular variation for the answer to be compatible with 4, and the equation for this item is built from these two quantities. In this way, we empirically defined equations for all 25 items, as presented in Table I.
Once all 25 questions have been answered for all individuals, we have the answers for each frame. We then compute the average values to obtain one vector per video.
As already mentioned, NEO PI-R item answers vary from 0 to 4. We converted the values obtained from the equations into one of the 5 possible scores (0, 1, 2, 3 and 4) by simply normalizing the answers into 5 uniformly distributed levels, since we know the maximum level for each item in each video. We refer to the result as the normalized vector. In the NEO PI-R definitions, some questions should have their values inverted, because an item score of 4 (Strongly Agree) can represent either a high or a low value of Extroversion, depending on the question. For example, consider questions 4 and 16. A score of 4 on both of them represents completely opposite answers in terms of sociability. So, to get the correct values, we applied an inversion factor to the questions whose scores should be inverted.
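The conversion to the five answer levels, followed by the inversion of reverse-keyed items, can be sketched as below. Two points are assumptions of this example rather than statements about the method: the five levels are taken to be equal-width bins over [0, maximum], and the inversion is taken to be the usual reverse-keying `4 - score`. The function names are hypothetical.

```python
def to_level(value, max_value):
    """Map a raw item value in [0, max_value] to one of the scores 0..4.

    Assumes five equal-width bins; a value equal to max_value maps to 4.
    """
    if max_value <= 0:
        return 0
    level = int(value / max_value * 5)
    return min(max(level, 0), 4)

def invert_score(score):
    """Reverse-key a 0..4 score (assumed form: 4 - score)."""
    return 4 - score

def score_items(raw_values, max_values, inverted_items):
    """Quantize each item and invert those listed in inverted_items.

    raw_values and max_values are dicts keyed by item number.
    """
    scores = {}
    for item, value in raw_values.items():
        score = to_level(value, max_values[item])
        scores[item] = invert_score(score) if item in inverted_items else score
    return scores

# Hypothetical raw values for items 4 and 16, with item 16 reverse-keyed.
example = score_items({4: 9.0, 16: 9.0}, {4: 10.0, 16: 10.0}, inverted_items={16})
```

In this made-up example the same raw value yields opposite scores for the two items, which mirrors the opposition between questions 4 and 16 described above.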
In addition, in the NEO PI-R definition, each question is associated with one of the Big Five dimensions, as shown in the next equations:
where each dimension's term represents the percentage of questions, out of the total, belonging to that dimension (O, C, E, A and N): respectively 4%, 4%, 72%, 8% and 12%, as explained previously.
Once we have the OCEAN values of each person, we calculate the OCEAN of the video as the mean of the people’s OCEAN values. Similarly, the OCEAN of a country is the mean over the videos from that country. In the next section we present some results obtained with our method.
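The two-level averaging (people to video, videos to country) can be sketched as follows. The dictionary layout with keys `O`, `C`, `E`, `A`, `N` and the function name `mean_ocean` are assumptions of this example, and the numbers are made up.

```python
from statistics import fmean

DIMENSIONS = ("O", "C", "E", "A", "N")

def mean_ocean(profiles):
    """Average a non-empty list of OCEAN profiles, dimension by dimension."""
    return {d: fmean([p[d] for p in profiles]) for d in DIMENSIONS}

# Hypothetical per-person OCEAN values for two videos from one country.
video_1_people = [
    {"O": 1.0, "C": 2.0, "E": 3.0, "A": 4.0, "N": 0.0},
    {"O": 3.0, "C": 2.0, "E": 1.0, "A": 0.0, "N": 4.0},
]
video_2_people = [
    {"O": 4.0, "C": 4.0, "E": 4.0, "A": 4.0, "N": 4.0},
]

video_1 = mean_ocean(video_1_people)        # OCEAN of the first video
video_2 = mean_ocean(video_2_people)        # OCEAN of the second video
country = mean_ocean([video_1, video_2])    # mean of video means
```

Note that the country value is the mean of the per-video means, so a video with few people weighs as much as a video with many; pooling all people would give a different result.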
IV Experimental Results
In this section we discuss some results obtained with our approach. We evaluated our method on a set of videos from four countries (Brazil, China, Austria and Japan). These videos, with durations varying between 100 and 900 frames, were collected from different public databases available on the Internet, such as [Zhou2014, Caviar:2016, Rodriguez:2011]. First, we obtain the OCEAN of each individual in the scene (Figure 2 shows some examples). In Figure 2 (a) we can observe the highest E, found for an individual who is part of a group of people, while the opposite happens in (b), where the lowest E was computed for an individual who is alone and far from the others.
The same kind of analysis can be done for images (c) and (d) with respect to their collectivity (higher and lower, respectively), as described in Equation 1. Although the dimensions O, C and N are more difficult to inspect visually, we also present qualitative results for them. For example, in Figure 2 (e) the highlighted individual has a lower angular variation than all the others (higher O value), while in (f) the highlighted individual has the highest angular variation and consequently the lowest value of O. In addition, in Figure 2 (b) we obtained the highest value of N, since it depends on the inverse of collectivity and socialization. Once the individual OCEAN values are computed, we obtain the mean OCEAN value for each video. The country’s OCEAN, in turn, is calculated as the average of the OCEANs of that country’s videos.
Figure 4 shows the results obtained for Brazil in all OCEAN dimensions, in comparison with the literature [costa07], which is considered the ground truth in our approach. It is interesting to highlight that the results achieved for this country showed the highest accuracy when compared with the other countries (see Figure 3). This was also the country with the most videos available to be processed by our method (9 videos).
In addition, we computed the percentage error obtained by accumulating each dimension over all videos and comparing it with the literature for those countries. Figure 6 shows these errors and also indicates that dimension E has the lowest error. This is an interesting observation, since this was the dimension with the most questions to be analyzed, as shown in Equations 5, 6 and 7.
In terms of the cultural aspects of individuals in the videos, Table II shows the countries that obtained the highest and lowest values in each dimension, according to our approach. For example, Brazil is the most extroverted country, while Japan is the least neurotic.
Another classical set of cultural dimensions is the one proposed by Hofstede [Hofstede:2011]. In a recent paper, Favaretto et al. [Favaretto:2016] presented the cultural aspects of people in video using Hofstede’s cultural dimensions theory (Figure 5). We compared our error using the Big-Five (Figure 6) with the error of that method, which uses Hofstede’s dimensions.
The accuracy of each approach (OCEAN and Hofstede) is measured as the mean percentage difference from the results in the literature, considering all dimensions over all videos. With an average error of 30% with respect to the results presented in the literature, the OCEAN method proved to be more promising than the Hofstede method (with an average error of 53%) for cultural mapping.
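A mean percentage difference of this kind can be computed as in the sketch below. The exact error formula is not spelled out in the text, so the relative form `|estimated - reference| / reference` averaged over dimensions is an assumption of this example, as are the function name and the numbers, which are made up and unrelated to the reported results.

```python
def mean_percentage_error(estimated, reference):
    """Mean over dimensions of |estimated - reference| / reference, in percent.

    estimated and reference are dicts keyed by dimension; reference values
    must be non-zero.
    """
    errors = [
        abs(estimated[dim] - reference[dim]) / reference[dim] * 100.0
        for dim in reference
    ]
    return sum(errors) / len(errors)

# Hypothetical values for two dimensions.
estimated = {"E": 50.0, "N": 30.0}
reference = {"E": 40.0, "N": 60.0}
error = mean_percentage_error(estimated, reference)
```

With these made-up numbers the per-dimension errors are 25% and 50%, so the mean is 37.5%.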
It is important to note that the mapping to OCEAN dimensions was empirically defined through equations using data extracted by computer vision. The NEO PI-R measures these dimensions by considering a different type of information (subjective responses of individuals collected through questionnaires).
In this sense, it is possible to affirm that, even with the few videos used, the results obtained with the proposed approach are coherent with the NEO PI-R results and more accurate than those obtained with Hofstede's dimensions. The factor Extroversion (E) is the one that seems to be most predictable with our model, probably because this factor comprises the majority of the items related to crowd behaviors.
In this paper we described a set of equations to compute individual-level traits from video sequences, based on features of individuals and groups. Our model computed OCEAN personality traits from video sequences and compared them with data from different countries available in the literature. In addition, we compared our results with previous work that computed Hofstede's dimensions using a similar approach. We believe the results are promising and that video sequences can be used to detect cultural aspects of crowds.
In future work we intend to validate our model by asking participants how much they agree with the score our model assigned to each item of the Big-Five questionnaire, for individuals with high scores in selected videos. By doing this we can compare human scores with computer-generated scores for the same videos.
We also intend to increase our set of video data. Both aspects, the number of countries and the number of videos from each of them, should be considered. One of the major difficulties of this work was finding a suitable set of videos to perform the experiments.
In addition, we intend to make video recordings of group situations in which each individual present in the video has previously evaluated OCEAN scores. For this, one plausible option is to evaluate our method with the SALSA dataset [Salsa:2016], which provides Big-Five personality traits for a group of people in video sequences.
We may thus have further evidence of the validity of the presented model. In addition, we plan to create a new model comprising different psychological domains related to crowd characteristics that have documented cultural differences: extraversion from the Big-Five model [costa07], Hofstede’s collectivism [Hofstede:2011], Hall's personal space [Hall:1990], the fundamental diagram [CHATTARAJ2009] and the subjective pace of time [Levine:1999].