Recently we launched a new line of inquiry into the potential of machine learning for statistically inferring the social perceptions that a face image induces, such as personality traits and demeanors, in non-acquaintances belonging to a particular social group delimited by race, gender, age, political belief, social value, etc. At the outset of this research we made it very clear, as stated in [25], that our interest is in finding machine-learnable correlations, if any, between facial appearances and social attributes of the observed and the observer, not in interpreting our findings in the realm of social sciences. It would be remarkable, and even somewhat surprising, if machine learning were found capable of acquiring human-like first impressions of total strangers, because even humans have difficulty rationalizing their first impressions of others’ social attributes and precisely characterizing the facial features that induce those perceptions.
We started our investigation with what we thought was a relatively easy case: automated face-induced statistical inference on the propensity for law breaking [25]. Law-breakers and law-abiding people constitute the two populations that are arguably furthest apart in social attributes and behavioral predisposition. In a multitude of social dimensions, such as self-control, trustworthiness, dominance and innocence, law-breakers, particularly the violent type, tend to populate the periphery of the multidimensional distribution. If a correlation exists between facial appearance and innate personal traits, then it should not be surprising that law-breakers and law-abiding people are statistically separable by automatic face classifiers built by supervised machine learning, as turned out to be the case in our earlier study. In this work, a sequel to [25], we take face-induced automatic inference of social attributes to what appears to be a more elusive task: classifying different styles of female attractiveness, which are either approved or not approved by certain disjoint subsets of observers.
Female attractiveness has been studied by researchers in psychology [24, 3, 8, 9, 18, 14, 16, 4, 19, 5, 11, 12]. In all of these previous experiments, however, the opinions of human judges were solicited. The prevailing view is that the perception of female beauty is primarily derived from cues of health and reproductive potential. Most papers on the facial attractiveness of females focus on its physical and physiological characteristics [24, 23, 20, 11, 8, 9].
However, female attractiveness is also a social perception, as reflected by the cliché “beauty is in the eye of the beholder”. A non-acquaintance female face can be judged unanimously as being physically beautiful, and yet different observers may associate this face with approving or disapproving connotations, which may vary across cultural and/or historical contexts, using labels (or stereotypes) like pure, sweet, endearing, innocent, cute, … on one hand, or indifferent, pretentious, pompous, arrogant, frivolous, coquettish, … on the other. As these labels are loosely a binary quantization of the social attributes of trustworthiness, dominance, innocence, and extroversion, here is another case for the old, cross-cultural belief that facial appearances are symptoms of innate traits and behavioral propensity.
Building upon our progress in automated statistical inference on the face-induced perception of the propensity for law breaking, controversial as it is, we take the same approach of supervised machine learning and examine whether a trained data-driven classifier can distinguish the different styles of female facial attractiveness discussed above. This classification problem, to humans, is not any easier than separating law-breakers from law-abiding persons. The perception of facial attractiveness is more complex than the already knotty matter of subjective taste, for it is also a compound proxy for the personalities and social values of the observed and the observer.
The remainder of this paper is structured as follows. In Section II, we detail a semi-automatic process of collecting and preparing sample data, which is necessary for setting up the experiments, with control variables of race, attractiveness, age and nationality, to assess the potential of supervised machine learning in statistical inference on sociopsychological perceptions of female faces. In Section III, we train a CNN classifier, using almost 4000 attractive face images of young Chinese females, each of which is attached to one or more sociopsychological label(s), to predict young Chinese men’s perception of a young woman in terms of her personality traits and demeanors. The experimental results turn out to be quite encouraging, and we offer and invite discussions and interpretations. Section IV concludes the paper.
II Data Preparation
In the interests of the availability of a large sample set for training and testing, and also of having our experiments controlled for attractiveness, race, gender and age, our research subjects are restricted to young Chinese females that are considered by mainstream Chinese to have attractive faces. Controlling the variable of attractiveness is crucial for the validity of our study because it strongly influences a non-acquaintance’s first impression of the female.
We run the Baidu image search engine with key words beautiful/pretty/attractive girls/young women, to select the sample images. The Baidu image search engine relies mainly on captions, labels and blogs associated with the images to produce the query results. The selected face images are divided into two subsets, denoted by and , corresponding to the following two different types of demeanors and/or attitudes.
The set contains the images returned by the Baidu image search engine when it is requested to find attractive girls but with qualifiers:
sweet, endearing, elegant, tender, caring, cute.
In contrast, the set contains the images returned by the Baidu image search engine when it is requested to find attractive girls but with qualifiers:
pretentious, pompous, indifferent, coquettish.
Note that the results of a qualifier-based query to the Baidu image search engine cannot be used directly for our purpose, because the key words attached to the images cannot be taken at face value. For instance, an image associated with the comment “a girl beautiful but not sweet” can match a query with the combined key words “beautiful sweet girl”, which erroneously places an instance in subset into subset . To correct this problem, we use a simple text pattern matching program to detect such cases, in which the adjective follows a negation prefix. Finally, all images passing the software selection and screening process are examined manually by male Chinese graduate students to detect selection errors due to more difficult semantics of the image-associated text, such as sarcastic remarks. Even though we have made every effort possible within our resources to ensure the data quality, a small amount of label noise may remain.
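The negation screening described above can be sketched as follows. This is only an illustration under stated assumptions: the captions in our data are Chinese, whereas the sketch uses English tokens, and the negation cue list, the two-word window, and the function names `is_negated` and `keep_caption` are hypothetical choices, not the program actually used.

```python
import re

# Hypothetical negation cues; the cues used in the actual screening are not listed in the paper.
NEGATIONS = ("not", "no", "never", "hardly")


def tokenize(caption):
    """Lower-case the caption and split it into word tokens."""
    return re.findall(r"[a-z']+", caption.lower())


def is_negated(caption, qualifier, window=2):
    """Return True if `qualifier` occurs within `window` words after a negation cue."""
    words = tokenize(caption)
    for i, word in enumerate(words):
        if word == qualifier.lower():
            preceding = words[max(0, i - window):i]
            if any(neg in preceding for neg in NEGATIONS):
                return True
    return False


def keep_caption(caption, qualifiers):
    """Keep a caption only if it contains a query qualifier and none of them is negated."""
    words = set(tokenize(caption))
    present = [q for q in qualifiers if q.lower() in words]
    return bool(present) and not any(is_negated(caption, q) for q in present)
```

With these assumptions, the caption “a girl beautiful but not sweet” is rejected for the qualifier “sweet”, while “a sweet and cute girl” is kept. Sarcasm and other indirect semantics are not caught by such a filter, which is why the manual pass remains necessary.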
There is also a caveat for anyone who wants to interpret our results down the road. Not all labels of social perceptions attached to the sample images are given by non-acquaintances. A small percentage of the labels are apparently the result of observing the person in question for a period of time; hence they may carry more information than the others.
Through the above semi-automatic data gathering and screening process, we collect 3954 photographs of attractive young Chinese females, among which 2000 belong to subset and 1954 belong to subset . The selected photographs are processed by the cloud service Face++ [1], which identifies and extracts the face(s) in each photograph. The resulting face images are scaled to resolution to be ready for supervised machine learning. Figures 1 and 2 show some sample face images in both subsets and .
To the best of our knowledge, the internet users who attach the captions, labels and comments to the photographs of attractive Chinese females posted online are predominantly young Chinese males. As such, the two classes of female face images in and reflect the esthetic preferences and value judgments that prevail among young males in contemporary China. Therefore, if supervised machine learning can discriminate between and from face images with sufficiently high probability, it demonstrates once more its potential for drawing statistical inference on social attributes and behavioral propensity from face images. Indeed, the significance of this study lies in collecting empirical evidence for the capability of machine learning, or lack of it, to perform sociopsychological cognition tasks, which appear to be of a higher and subtler degree than the determination of biometric attributes (e.g., gender, race, age and unique facial signatures for identification) [6, 15, 13, 10].
In the resulting sets of sample images, each of the key words used in the image queries can be considered a vector quantization of the three-dimensional social attributes (dominance, trustworthiness, innocence). Specifically, we quantize each query key word in the attribute space of (dominance, trustworthiness, innocence) as in Table I.
Figure 3 plots the sample distributions of subsets and in the above three-dimensional space of social perceptions. The immediate observation is that the two subsets of attractive young Chinese females have a strong negative correlation in the vector of social perceptions; that is, they lie at opposite ends in terms of perceived personalities, demeanors and attitudes. Such polarized social perceptions are apparently caused by facial appearances, because the face is the main object in all selected images.
III Experiments and Results
In this section, we examine whether supervised machine learning can acquire the personal taste and social values of the mainstream Chinese male internet users who label/caption the sample images of attractive Chinese females. If so, this adds a hint of evidence for the capability of machine learning to perform social cognition tasks.
Although the face-induced sociopsychological impressions that attractive young Chinese females leave on mainstream young Chinese males appear to be quite consensual within the observer group, these young men have difficulty pinpointing exactly which facial features and head poses contribute to their perceptions. We show our sample images in both subsets and to 22 male Chinese graduate students; their perceptions of these face images agree with the labels (the mainstream stereotypes) given by the internet users. But when pressed to explain their judgments with specifics, they all give vague answers like “I just feel this way”. In this case, the most appropriate machine learning tool for automatic inference on sociopsychological impressions of faces is the deep Convolutional Neural Network (CNN), which does not need to operate on explicit features as SVM, KNN and other machine learning methods do.
We choose a well-known CNN architecture, AlexNet [7], and train it to identify which style of female attractiveness, represented by or , a face image most likely belongs to; that is, to predict whether an attractive Chinese female will be perceived favorably by those young Chinese men who prefer a traditional type of personality traits, demeanor, and attitude. Figure 4 illustrates the architecture of AlexNet as tailored to our problem. AlexNet contains eight layers with weights; the first five are convolutional and the remaining three are fully connected. We borrow only the architecture of AlexNet and train all the parameters of every layer for the binary classification task of discriminating between and .
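To make the eight-layer structure concrete, the following sketch traces feature-map sizes through an AlexNet-style stack. The kernel sizes, strides and channel counts follow the commonly cited single-stream AlexNet configuration of [7]; the 227×227 input size and the two-unit output layer are assumptions for illustration, and the exact tailoring shown in Figure 4 may differ.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kind, out_channels, kernel, stride, pad); pooling layers keep the channel count.
LAYERS = [
    ("conv1", "conv", 96, 11, 4, 0),
    ("pool1", "pool", None, 3, 2, 0),
    ("conv2", "conv", 256, 5, 1, 2),
    ("pool2", "pool", None, 3, 2, 0),
    ("conv3", "conv", 384, 3, 1, 1),
    ("conv4", "conv", 384, 3, 1, 1),
    ("conv5", "conv", 256, 3, 1, 1),
    ("pool5", "pool", None, 3, 2, 0),
]
# Three fully connected layers; the last has 2 units for the binary task (assumed).
FC_UNITS = [4096, 4096, 2]


def trace_shapes(input_size=227, channels=3):
    """Return per-layer output shapes and the flattened feature length fed to fc6."""
    shapes = []
    size = input_size
    for name, kind, out_ch, kernel, stride, pad in LAYERS:
        size = conv_out(size, kernel, stride, pad)
        if kind == "conv":
            channels = out_ch
        shapes.append((name, (channels, size, size)))
    features = channels * size * size
    for i, units in enumerate(FC_UNITS, start=6):
        shapes.append(("fc%d" % i, (units,)))
    return shapes, features


if __name__ == "__main__":
    shapes, features = trace_shapes()
    for name, shape in shapes:
        print(name, shape)
    print("flattened features into fc6:", features)
```

Only the five convolutional and three fully connected layers carry weights, which gives the eight weighted layers mentioned above; the pooling layers have no trainable parameters.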
In our experiments, 80% of the face images in our data set are used for training, 10% for validation, and the remaining 10% for testing. This training process is repeated ten times with different random seeds. Table II tabulates the accuracy, false positive rate and false negative rate of our neural network for statistical inference on styles of female attractiveness. The ROC curve of our data-driven face classifier is presented in Figure 5.
|Accuracy||False Positive Rate||False Negative Rate|
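The evaluation protocol can be sketched as below: a seeded 80/10/10 split and the three rates reported in the table. The helper names `split_indices` and `binary_rates` are hypothetical, and treating label 1 as the positive class is an assumption; the sketch does not reproduce any reported number.

```python
import random


def split_indices(n, seed, train_frac=0.8, val_frac=0.1):
    """Shuffle sample indices with a seed and split them into train/validation/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def binary_rates(y_true, y_pred):
    """Return (accuracy, false positive rate, false negative rate) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # negatives wrongly accepted
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # positives wrongly rejected
    return accuracy, fpr, fnr


if __name__ == "__main__":
    # Ten repetitions with different seeds, as in the protocol described above.
    for seed in range(10):
        train, val, test = split_indices(3954, seed)
        print(seed, len(train), len(val), len(test))
```

Averaging the three rates over the ten seeds gives figures less sensitive to any single random partition of the data.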
As indicated by Table II and Figure 5, the CNN face classifier performs well in inference on the sociopsychological perception of attractive young Chinese females. This is quite remarkable given that even human observers (male Chinese graduate students) have difficulty rationalizing their sociopsychological perceptions of the tested female face images. Unfortunately, the CNN face classifier, effective as it is, does not offer any explanation for its success either.
One comment frequently made by the Chinese male graduate students to justify their approval of some female face images is that they “look natural”. This leads us to suspect that the CNN face classifier uses heavy facial makeup as an important cue in its decision. To test this hunch, we remove colors and use only grey-scale face images to train and test the CNN classifier. Removing color should reduce the effect of facial makeup, because the use of color is critical in female cosmetics. Our experiments with grey-scale face images turn out to be inconclusive: the accuracy of the CNN face classifier decreases by only 6% without color information. Table III tabulates the accuracy, false positive rate and false negative rate of our neural network for grey-scale face images. The ROC curve for grey-scale face images is presented in Figure 6.
|Accuracy||False Positive Rate||False Negative Rate|
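For illustration, color removal can be sketched as a per-pixel luma conversion. The ITU-R BT.601 weights used here are an assumption, since the conversion actually applied is not specified, and `to_grayscale` is a hypothetical helper name.

```python
def to_grayscale(image):
    """Convert an image given as rows of (r, g, b) tuples (0..255) to rows of grey levels.

    Uses the ITU-R BT.601 luma weights, an assumed choice for this sketch.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


if __name__ == "__main__":
    # A vivid red and a dark grey of similar luma become nearly indistinguishable.
    tiny = [[(255, 0, 0), (76, 76, 76)], [(255, 255, 255), (0, 0, 0)]]
    print(to_grayscale(tiny))
```

The example hints at why grey-scale input weakens makeup cues: a saturated lipstick red and a neutral grey of the same luma map to the same grey level, so only brightness contrast survives.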
Other signs of heavy makeup include the overuse of color-saturated lipstick, mascara, eyebrow pencil, etc. Therefore, we compare the distributions of color contrast and saturation for the face images in and . Figure 7 presents the histograms of color contrast and saturation for and after normalizing the measurements to range . The mean and standard deviation of color contrast and saturation for and are tabulated in Table IV. As expected, the color contrast of is on average 13.84% smaller than that of , and the color saturation of is on average 4.85% smaller than that of . Interestingly, the saturation histogram for has two distinctive spikes at the two extreme ends, indicating that women in subset tend to use thick black (saturation = 0) and/or saturated (vivid) colors. We also note that the variations of color contrast and saturation of the face images in are appreciably greater than those of . This apparently reflects the traditional esthetic preference for, and social value of, naturalness in Chinese culture. Such subtle sociopsychological cues seem to be picked up by our CNN method; otherwise it would be difficult to explain the good performance of the proposed face classifier.
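The two image statistics compared above can be sketched as follows. The definitions are assumptions made for illustration: saturation is taken as the HSV saturation of each pixel (so that black has saturation 0, consistent with the remark above), and color contrast is approximated by the RMS contrast of the luma channel; the paper's exact measures may differ. The helper names are hypothetical.

```python
import colorsys
from statistics import mean, pstdev


def pixel_saturation(r, g, b):
    """HSV saturation of one pixel with 0..255 channels; black and greys give 0."""
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1]


def image_stats(pixels):
    """Return (rms_contrast, mean_saturation) for a flat list of (r, g, b) pixels.

    RMS contrast is the standard deviation of luma scaled to [0, 1] (assumed definition).
    """
    luma = [(0.299 * r + 0.587 * g + 0.114 * b) / 255 for r, g, b in pixels]
    sats = [pixel_saturation(r, g, b) for r, g, b in pixels]
    return pstdev(luma), mean(sats)


def histogram(values, bins=10):
    """Count values in [0, 1] into equal-width bins; 1.0 falls into the last bin."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts
```

Applying `image_stats` to every face image of a subset and passing the per-image values to `histogram` yields distributions of the kind plotted in Figure 7; spikes in the first and last saturation bins would correspond to the black and vivid extremes noted above.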
Finally, to safeguard against the possible risk of data overfitting by our neural network for statistical inference on sociopsychological perceptions of attractive female faces, we conduct the fault-finding experiment proposed in [25], in search of counterexamples. We randomly label the faces in our sample set as positive and negative instances with equal probability, and redo the above classification experiments after retraining the CNN with the artificially labeled samples. The experimental results show that the randomly generated positive and negative instances cannot be distinguished at all by the CNN method; the average classification accuracy, false positive rate and false negative rate are all about 50%. Based on these findings, we are confident that the good accuracy of the proposed CNN face classifier is not due to data overfitting; otherwise, given the same sample size, the trained classifier would also be able to statistically separate the randomly labeled positive instances from the negative ones, rather than guessing at random as found in the above experiment.
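The logic of this control can be sketched on synthetic data. The sketch below is only an illustration: it replaces the CNN with a nearest-centroid classifier and the face images with random Gaussian feature vectors, both of which are stand-ins chosen so that the example runs without any image data; the function names are hypothetical.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def control_experiment(n=2000, dim=8, train_frac=0.8, seed=0):
    """Train a stand-in classifier on randomly assigned labels and return held-out accuracy."""
    rng = random.Random(seed)
    # Synthetic features standing in for face images (assumption of this sketch).
    features = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    # Labels drawn with equal probability, independent of the features.
    labels = [rng.randint(0, 1) for _ in range(n)]
    n_train = int(n * train_frac)
    train = list(zip(features[:n_train], labels[:n_train]))
    c0 = centroid([f for f, y in train if y == 0])
    c1 = centroid([f for f, y in train if y == 1])
    test = list(zip(features[n_train:], labels[n_train:]))
    correct = sum(
        1 for f, y in test if (sq_dist(f, c1) < sq_dist(f, c0)) == (y == 1)
    )
    return correct / len(test)


if __name__ == "__main__":
    print("held-out accuracy with random labels:", control_experiment())
```

Because the labels carry no information about the inputs, held-out accuracy should stay near chance level; a classifier that scored well on held-out data here would point to a flaw in the evaluation rather than to genuine learning.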
IV Conclusion
This work is a sequel to our earlier paper [25]. We push research on face processing, analysis and recognition beyond the tasks of biometric-based identification, and try to extend it in the direction of automatic statistical inference on sociopsychological perceptions, such as personality traits and behavioral propensity. Through the reported case study on the facial attractiveness of young Chinese females, we demonstrate once again the potential of supervised machine learning in face-induced social computing and cognition.
References
-  Face++. http://www.faceplusplus.com.
-  M. S. Bartlett. Face image analysis by unsupervised learning and redundancy reduction. PhD thesis, Citeseer, 1998.
-  M. R. Cunningham. Measuring the physical in physical attractiveness: Quasi-experiments on the sociobiology of female facial beauty. Journal of personality and social psychology, 50(5):925, 1986.
-  M. R. Cunningham, A. R. Roberts, A. P. Barbee, P. B. Druen, and C.-H. Wu. “Their ideas of beauty are, on the whole, the same as ours”: Consistency and variability in the cross-cultural perception of female physical attractiveness. Journal of Personality and Social Psychology, 68(2):261, 1995.
-  B. Fink, K. Grammer, and P. J. Matts. Visible skin color distribution plays a role in the perception of age, attractiveness, and health in female faces. Evolution and Human Behavior, 27(6):433–442, 2006.
-  X. Geng, Z.-H. Zhou, and K. Smith-Miles. Automatic age estimation based on facial aging patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12):2234–2240, 2007.
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
-  J. H. Langlois and L. A. Roggman. Attractive faces are only average. Psychological science, 1(2):115–121, 1990.
-  J. H. Langlois, L. A. Roggman, and L. Musselman. What is average and what is not average about attractive faces? Psychological science, 5(4):214–220, 1994.
-  G. Levi and T. Hassner. Age and gender classification using convolutional neural networks. In , pages 34–42, 2015.
-  H. C. Lie, G. Rhodes, and L. W. Simmons. Genetic diversity revealed in human faces. Evolution, 62(10):2473–2486, 2008.
-  A. C. Little, B. C. Jones, and L. M. DeBruine. Facial attractiveness: evolutionary based research. Philosophical Transactions of the Royal Society B: Biological Sciences, 366(1571):1638–1659, 2011.
-  E. Makinen and R. Raisamo. Evaluation of gender classification methods with automatically detected and aligned faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(3):541–547, 2008.
-  J. Manning, D. Scutt, G. Whitehouse, and S. Leinster. Breast asymmetry and phenotypic quality in women. Evolution and Human behavior, 18(4):223–236, 1997.
-  B. Moghaddam and M.-H. Yang. Gender classification with support vector machines. In Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on, pages 306–311. IEEE, 2000.
-  A. P. Møller, M. Soler, and R. Thornhill. Breast asymmetry, sexual selection, and human reproductive success. Ethology and Sociobiology, 16(3):207–219, 1995.
-  J. Park and I. W. Sandberg. Universal approximation using radial-basis-function networks. Neural computation, 3(2):246–257, 1991.
-  D. Perrett. Facial shape and judgements. Nature, 368:17, 1994.
-  G. Rhodes, C. Hickford, and L. Jeffery. Sex-typicality and attractiveness: Are supermale and superfemale faces super-attractive? British Journal of Psychology, 91(1):125–140, 2000.
-  S. C. Roberts, A. C. Little, L. M. Gosling, D. I. Perrett, V. Carter, B. C. Jones, I. Penton-Voak, and M. Petrie. Mhc-heterozygosity and human facial attractiveness. Evolution and Human Behavior, 26(3):213–226, 2005.
-  K. Sobottka and I. Pitas. A novel method for automatic face segmentation, facial feature extraction and tracking. Signal processing: Image communication, 12(3):263–281, 1998.
-  C. E. Thomaz and G. A. Giraldi. A new ranking method for principal components analysis and its application to face image analysis. Image and Vision Computing, 28(6):902–913, 2010.
-  R. Thornhill and S. W. Gangestad. Human facial beauty. Human nature, 4(3):237–269, 1993.
-  L. Van Valen. A study of fluctuating asymmetry. Evolution, pages 125–142, 1962.
-  X. Wu and X. Zhang. Automated inference on criminality using face images. arXiv preprint arXiv:1611.04135, 2016.