Automated detection of facial expressions has gained significant importance in recent years, especially with regard to the design of real-time security surveillance systems, internet-based social networking applications [Al-modwahi et al.2012] and human-computer interaction systems [Lajevardi and Wu2012]. The primary challenges for automated facial expression detection include variations introduced by pose, lighting, distortions, expression and occlusions. While image filtering techniques aid equalization of lighting and distortions, Eigen-value decomposition of faces (Eigen-faces) followed by Isomap clustering is well known to cluster variations in pose [Li et al.2005]. Additionally, several supervised classification algorithms and publicly available databases [Chavan and Kulkarni2013] have shown significant success in classifying facial features, skin texture and basic expressions such as fear, sadness, happiness, anger, disgust and surprise. Most of the existing texture-based expression detection algorithms [Chavan and Kulkarni2013] rely heavily on facial feature extraction and classifier training, and thereby incur significant computational complexity in the training phase.
In this work, we propose a novel network-based clustering algorithm that is capable of separating the marginally classifiable expression faces from the easily classifiable ones. This method has a two-fold advantage. First, it can reduce the overall computation time for facial expression detection in a particular test database of faces by subjecting only the faces with marginally classifiable expressions to complex feature-based classification. Second, the network-based metrics can be used to detect the most significant faces in the training data that are vital for feature-based expression classification tasks. Such network-based identification of the most significant training image set has not been done in existing works so far. Identification of the most significant training faces can improve the existing accuracies in facial expression classification on facial test data sets.
The existing facial expression detection algorithms can be broadly categorized into two categories: holistic methods [Lonare and Jain2013] that focus on features of the full face, and geometric methods that depend on important parts of the face such as eye lids, eye brows, lips, nose etc. for expression detection [Ekman and Rosenberg1997] [Ekman and Friesen1971]. The first category of methods focuses on pre-defined template matching (active shape models) [Hemalatha and Sumathi2014] or extraction of Eigen-face descriptors followed by clustering using neural networks [Turk and Pentland1991] [Agarwal et al.2010] [Punitha and Geetha2013] [Hemalatha and Sumathi2014]. The second category of methods relies on the extraction of gray-scale and color features corresponding to facial features (group of edges) [Hemalatha and Sumathi2014], texture, and changes in eye-lids, eye-brows, nose, lips, wrinkles and bulges using local binary patterns (LBP) [Ojala et al.2002], optical flow [Dehkordi and Haddadnia2010] and pyramid extension of the histogram of gradient (PHOG) descriptors [Bosch et al.2007] [Dalal and Triggs2005]. For various classifiers and training data sets, existing expression classification accuracies typically range between 50-95% [Hemalatha and Sumathi2014] [Lajevardi and Wu2012], while computation time can range from 4.5 seconds to a few hours. The proposed method aims at significantly reducing the computation time and improving expression classification accuracies in databases with a large number of faces. In this work, two classification tasks are performed: classification of images with facial occlusions and classification of faces with a happy emotion. Unsupervised classification requires only 2 training images for cluster identification, with a run-time of less than 1 second per image on a laptop with a 2.6 GHz processor and 2 GB RAM.
2 Proposed Method
In this work, we focus on two specific binary facial expression classification tasks. The first task separates faces that have occlusions in the eye region, such as glasses, from faces without occlusions. The second task separates smiling faces from non-smiling faces. Both tasks are challenging due to variations in pose and lighting angles. Some examples of the two binary classification tasks are shown in Figure 1.
The data set used to analyze the expression classification performance is taken from the AT&T Cambridge Laboratories database [Laboratories2002], which contains 400 facial images of dimension [112x92] pixels each, with varying expression, pose and lighting angles, from 40 subjects with 10 images per subject. For our analysis we manually annotate the expressions in 80 facial images corresponding to the first and tenth image per subject for the 40 subjects. For classification Task 1, faces with glasses are assigned class label 1 and faces without glasses are assigned class label 0. For classification Task 2, smiling faces are assigned class label 1 and faces without a smile are assigned class label 0. To reduce the computational complexity, each facial image ‘’ is resized to [90x90] pixels.
First, face patches corresponding to the eye region (), indicative of occlusions due to glasses, and the mouth region (), indicative of a smile, are created. Next, Eigen-faces corresponding to the patched faces are extracted and a signature matrix of all the patched faces is created. For a set of n faces, the Eigen-face signature matrix has dimensionality [nxn]. Isomaps are then used to realize a 2-dimensional network from the facial signature matrix. The two nodes/faces in the network that have the maximum Euclidean distance between them are selected as cluster identifiers, followed by minimum-distance clustering of all remaining faces. The steps in the proposed algorithm are shown in Figure 2. Each step is described below.
2.1 Facial Patch Creation
To generate guided patches for smiling expression and facial occlusion identification, each facial image is thresholded to generate a foreground mask, followed by high-pass filtering to extract several regions of interest (). Three features are evaluated for each of the regions in ‘’, namely major axis length (), minor axis length () and angular orientation (). The eye region (), indicative of occlusions due to glasses, contains high-pass filtered regions that are partially elliptical () and almost horizontal () as shown in (1). The mouth region (), on the other hand, is elongated and narrow () and almost horizontal () as shown in (2).
Finally, two face patches are created starting from the centroid of the region in ‘’ and ‘’, respectively, and extending 15 pixels above and below the centroid with length similar to that of the original image. Applying these patches to image ‘’ results in patched images of the eye () and mouth (), respectively, as shown in Figure 3.
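The band-shaped patch described above can be sketched as follows. This is a minimal illustration, assuming the patch is applied as a mask that zeroes every row outside a band of 15 pixels above and below the centroid row. The function name `band_patch`, its parameters and the synthetic image are hypothetical, and the region-selection step that locates the centroid is not shown.

```python
import numpy as np

def band_patch(image, centroid_row, half_height=15):
    """Keep a full-width horizontal band around centroid_row and zero the rest.

    Assumption: the patch acts as a mask on the resized face image.
    """
    rows = image.shape[0]
    top = max(0, centroid_row - half_height)
    bottom = min(rows, centroid_row + half_height + 1)
    patched = np.zeros_like(image)
    patched[top:bottom, :] = image[top:bottom, :]
    return patched

# Synthetic stand-in for a face resized to [90x90] pixels.
face = np.arange(90 * 90, dtype=float).reshape(90, 90)
eye_patch = band_patch(face, centroid_row=30)  # hypothetical eye-region centroid
```

Whether the original pipeline masks or crops the band is not stated in the text; masking keeps all patched images at the same size, which simplifies the later Eigen-face step.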
2.2 Eigen-Face Generation
For each patched image ‘’, the Karhunen-Loeve expansion [Kirby and Sirovich1990] is applied to find vectors that best represent the distribution of face images . The steps of Eigen-face generation for the patched eye images () in Task 1 are shown below. Similar steps are followed for the patched mouth images () in Task 2. The mean facial image is computed as the Eigen-vector using (3). The differences of each face from the average face are then computed using (4). It is noteworthy that the dimensionality of each resized face vector is [1x], where ‘x’.
Each difference image is then subjected to principal component analysis (PCA) to find a set of ‘’, which best describe the distribution of the data set as shown in (5).
Now, the real symmetric covariance matrix has dimensions [x], and determination of ‘’ Eigen-vectors is an intractable operation for large image sizes. Thus, the computationally feasible solution for Eigen-vector determination in (5) is to correlate the Eigen-vectors of with dimensionality [nxn] to that of as shown in (7).
From this analysis, we construct matrix ‘’ of dimension [nxn], where . The ‘n’ Eigen-vectors ‘’ of matrix ‘’ determine the linear contributions of the ‘n’ faces to form Eigen-faces . The impact of Eigen-face generation is shown in Figure 4, where for the whole facial images ‘’, the Eigen-vector followed by the top 15 Eigen-vectors for an image are shown. The matrix represents the signature of each face in terms of an ‘n’-dimensional vector. Next, Isomaps are generated using this matrix ‘’ for lower-dimensional embedding by multi-dimensional scaling [Yang2002].
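The small-matrix route to Eigen-faces described in this section can be sketched in NumPy as follows. This is an illustration of the standard Eigen-face procedure [Turk and Pentland1991], not the authors' code: the function name, the variable names and the choice to define the signature of a face as its projection onto the Eigen-faces are assumptions.

```python
import numpy as np

def eigenface_signatures(faces):
    """faces: (n, m) array with one flattened image per row.

    Returns the mean face, the (n, m) Eigen-faces and an (n, n) signature matrix.
    """
    faces = np.asarray(faces, dtype=float)
    mean_face = faces.mean(axis=0)
    diffs = faces - mean_face                      # difference images, shape (n, m)
    small = diffs @ diffs.T                        # (n, n) instead of the (m, m) covariance
    eigvals, eigvecs = np.linalg.eigh(small)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # reorder to descending
    eigvecs = eigvecs[:, order]
    # Each Eigen-face is a linear combination of the n difference images.
    eigenfaces = eigvecs.T @ diffs                 # shape (n, m)
    norms = np.linalg.norm(eigenfaces, axis=1, keepdims=True)
    eigenfaces = eigenfaces / np.where(norms > 1e-12, norms, 1.0)
    # Assumed signature: projection of every face onto every Eigen-face.
    signatures = diffs @ eigenfaces.T              # shape (n, n)
    return mean_face, eigenfaces, signatures

rng = np.random.default_rng(0)
faces = rng.random((6, 50))                        # 6 synthetic "faces" of 50 pixels
mean_face, eigenfaces, signatures = eigenface_signatures(faces)
```

Because the mean is subtracted, at most n-1 Eigen-faces carry information; the last row corresponds to a near-zero eigenvalue and can be ignored.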
2.3 Facial Network Clustering: Isomaps
Isomaps have been used to find lower-dimensional manifolds from high-dimensional input data points for clustering faces based on imaging angles and lighting effects [Yang2002][Tenenbaum et al.2000]. In this work, we apply the same principle to analyze the impact of low dimension embedding on facial expression clustering.
Using the [nxn] Eigen-face signature matrix ‘’, Isomaps are used to realize an unweighted network ‘’, where each facial image ‘’, where , is connected to its ‘k’ nearest Euclidean neighbors. In the network ‘’, represents the n-dimensional signature of each Eigen-face as a node, and ‘’ represents the connectivity matrix. It is noteworthy that if ‘k’ is very large, too many connections destroy the clustering pattern, while a small ‘k’ creates a sparse network that lacks clustering properties. The Euclidean distance between nodes and is represented as .
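A compact sketch of the Isomap construction follows: a k-nearest-neighbor graph, shortest-path (geodesic) distances, and classical multi-dimensional scaling to 2 dimensions. It follows the standard Isomap recipe [Tenenbaum et al.2000] rather than the authors' exact implementation. The function name and defaults are assumptions, and the geodesic distances here use Euclidean edge lengths, while the returned adjacency matrix is the unweighted connectivity.

```python
import numpy as np

def isomap_embedding(signatures, k=3, dims=2):
    """Return (boolean k-NN adjacency, low-dimensional coordinates)."""
    X = np.asarray(signatures, dtype=float)
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # Connect every node to its k nearest Euclidean neighbors (excluding itself).
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)
    neighbors = np.argsort(masked, axis=1)[:, :k]
    adjacency = np.zeros((n, n), dtype=bool)
    for i in range(n):
        adjacency[i, neighbors[i]] = True
    adjacency |= adjacency.T                       # make the graph undirected

    # Geodesic distances by Floyd-Warshall over the graph edges.
    geo = np.where(adjacency, dist, np.inf)
    np.fill_diagonal(geo, 0.0)
    for via in range(n):
        geo = np.minimum(geo, geo[:, [via]] + geo[[via], :])
    if np.isinf(geo).any():
        raise ValueError("k-NN graph is disconnected; increase k")

    # Classical multi-dimensional scaling on the geodesic distances.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (geo ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dims]
    coords = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
    return adjacency, coords

# Five points on a line: the embedding should preserve their spacing.
line = np.arange(5, dtype=float).reshape(5, 1)
adjacency, coords = isomap_embedding(line, k=2, dims=2)
```

The explicit check for a disconnected graph reflects the remark above that a very small ‘k’ yields a network too sparse to cluster.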
For the two binary expression clustering tasks at hand, two faces are identified as representatives of cluster 0 () and cluster 1 (), respectively. The two faces (nodes) that have the largest Euclidean distance between them are selected as the respective cluster representatives using (8). The manually annotated expression class labels of faces () and () corresponding to the classification tasks are then read. These two faces become the training data. For every remaining face, the Euclidean distance from and is computed, and the face is assigned the class label of the nearer representative as shown in (9). The assigned class labels are 0 or 1, since both classification tasks are binary.
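The representative selection and minimum-distance assignment described above can be sketched in a few lines. This is an illustrative version operating on 2-dimensional embedding coordinates; the function name and the toy coordinates are hypothetical, and only the manual labels of the two representatives are read.

```python
import math
from itertools import combinations

def cluster_by_farthest_pair(coords, manual_labels):
    """Pick the two farthest points as representatives and label the rest.

    coords: list of (x, y) tuples; manual_labels: list of 0/1 annotations.
    Returns the representative indices and the assigned label per face.
    """
    rep_a, rep_b = max(
        combinations(range(len(coords)), 2),
        key=lambda pair: math.dist(coords[pair[0]], coords[pair[1]]),
    )
    # Only the two representatives' annotations serve as training data.
    rep_label = {rep_a: manual_labels[rep_a], rep_b: manual_labels[rep_b]}
    assigned = []
    for point in coords:
        nearest = min((rep_a, rep_b), key=lambda r: math.dist(point, coords[r]))
        assigned.append(rep_label[nearest])
    return (rep_a, rep_b), assigned

coords = [(0, 0), (0.5, 0.2), (1, 0.1), (9, 9), (10, 10), (9.5, 10)]
manual_labels = [0, 0, 0, 1, 1, 1]
representatives, assigned = cluster_by_farthest_pair(coords, manual_labels)
```

If both representatives happen to carry the same manual label, every face receives that label; the text does not say how this case is handled.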
3 Experiments and Results
The performance of the proposed patched Eigen-face based Isomap clustering for facial occlusion and happiness expression classification is evaluated using two experiments. In the first experiment, the best classification metrics obtained using patched Eigen-faces and full Eigen-faces are comparatively analyzed. For both classification tasks, faces with manually annotated expression class label 1 that are correctly classified are true positives (TP), and faces with manual class label 0 that are correctly classified are true negatives (TN). Faces that are manually annotated as class label 1 but misclassified as class 0 are false negatives (FN), and faces with manual class label 0 but misclassified as class 1 are false positives (FP). The classification performance metrics sensitivity (SEN), also known as recall, specificity (SPEC) and accuracy (ACC) are computed using (10).
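For reference, the three metrics follow the standard definitions SEN = TP/(TP+FN), SPEC = TN/(TN+FP) and ACC = (TP+TN)/(TP+TN+FP+FN). A minimal sketch, with a hypothetical function name and made-up labels:

```python
def classification_metrics(actual, predicted):
    """Compute sensitivity, specificity and accuracy for binary labels (0/1)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    return {
        "SEN": tp / (tp + fn) if (tp + fn) else float("nan"),
        "SPEC": tn / (tn + fp) if (tn + fp) else float("nan"),
        "ACC": (tp + tn) / len(pairs) if pairs else float("nan"),
    }

# Made-up annotations and predictions for eight faces.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = classification_metrics(actual, predicted)
```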
In the second experiment, the 2-d patched face network is analyzed to detect the faces that have most discriminating characteristics for expression classification.
3.1 Performance of Expression Classification
In Table 1, we observe that for low values of the nearest neighbor parameter (), the Isomaps demonstrate sparsely connected clustering patterns that have better classification performance than for higher values of ‘k’. Also, we observe that patched Eigen-face networks have comparable classification performance for occlusions when compared to full Eigen-face networks. However, in Table 1 the proposed patched Eigen-face based networks are shown to improve classification SEN, ACC and area under the Receiver Operating Characteristic curve (AUC) over full Eigen-face based networks for both classification tasks.
3.2 Expression Network Analysis
After all faces have been clustered using (8)-(9), the most significant faces that are central to the two expression clusters are detected using network centrality measures. High betweenness centrality () locates nodes in a network that serve as bridge nodes connecting two dense clusters [Roychowdhury2010]. Thus, the two faces/nodes in the Isomap network that have the top two betweenness centralities (, ) represent the faces that lie at the edge of the expression clusters, i.e., the marginally classifiable faces.
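The bridge-node behaviour of betweenness centrality can be illustrated with a small pure-Python implementation of Brandes' algorithm for unweighted, undirected graphs. The function name and the toy graph (two triangles joined by one link) are hypothetical and only meant to show how the top-two nodes end up on the boundary between clusters.

```python
from collections import deque

def betweenness_centrality(adj):
    """adj: dict mapping each node to a list of neighbors (undirected graph)."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                                  # nodes in order of discovery
        preds = {v: [] for v in adj}                # shortest-path predecessors
        sigma = {v: 0 for v in adj}                 # number of shortest paths
        dist = {v: -1 for v in adj}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                                # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):                   # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}

# Two triangles (0,1,2) and (3,4,5) joined by the link 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
centrality = betweenness_centrality(graph)
top_two = sorted(centrality, key=centrality.get, reverse=True)[:2]
```

In this toy graph the two endpoints of the joining link are the only nodes with non-zero betweenness, mirroring the role of the marginally classifiable faces.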
Eigen-centrality () is another measure; nodes with high Eigen-centrality are the most centrally located in the network. Thus, the faces with the top two Eigen-centralities (, ) represent the faces that are central to the two decision clusters. For Task 1 and Task 2, the decision clusters and the faces/nodes with the top two centrality metrics for the full face and proposed patched face networks are shown in Figure 5.
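Eigen-centrality (eigenvector centrality) can be approximated by power iteration on the adjacency structure. The sketch below is a generic illustration with a hypothetical function name and toy graph; it iterates with the adjacency matrix plus the identity, which leaves the leading eigenvector unchanged but avoids oscillation on bipartite graphs.

```python
import math

def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """adj: dict mapping each node to a list of neighbors (undirected graph)."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # One multiplication by (A + I): own score plus the neighbors' scores.
        updated = {v: score[v] + sum(score[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(x * x for x in updated.values()))
        updated = {v: x / norm for v, x in updated.items()}
        converged = max(abs(updated[v] - score[v]) for v in adj) < tol
        score = updated
        if converged:
            break
    return score

# Node 0 is a hub; node 3 additionally links to node 4.
graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3]}
centrality = eigenvector_centrality(graph)
top_two = sorted(centrality, key=centrality.get, reverse=True)[:2]
```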
In Figure 5 we observe that for full Eigen-face networks, the nodes/faces with high are different from the faces with high . For the patched face networks, however, certain faces concurrently have high and . In Task 1 and Task 2, the most central patched faces are shown in Figure 6. From this observation we infer that patched faces with high centralities are central to the expression clusters and also serve as bridge nodes to the other cluster. Thus, identifying these faces/nodes with high centrality measures and using them as training data can further improve feature-based expression classification performance.
Additionally, we identify other significant patched faces for training purposes by applying a repetitive max-flow-min-cut strategy to separate the links with high information flow through them from the links with low flow. The nodes/faces at either end of the link with maximum information flow through it are indicative of the patched faces with the most information. In Figure 7(a) we observe that for Task 1, one instance of maximum information flow through the Isomap network occurs between a non-occluded female eye image and an occluded male eye image. In Figure 7(b), for Task 2, another instance of maximum information flow through the Isomap network occurs between a non-smiling and a partially smiling facial image. Thus, several additional instances of max-flow-min-cut can identify the faces that are more significant than the others for feature-based expression classification tasks.
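The text does not specify how the repetitive max-flow-min-cut strategy is configured, so the following is only a sketch of a single max-flow-min-cut computation on an unweighted network, using the Edmonds-Karp algorithm. The unit link capacities, the choice of source and sink, the function name and the toy graph are all assumptions.

```python
from collections import deque

def max_flow_min_cut(edges, source, sink):
    """edges: list of undirected (u, v) links, each with assumed unit capacity.

    Returns the maximum flow value and the links forming a minimum cut.
    """
    capacity, adj = {}, {}
    for u, v in edges:
        capacity[(u, v)] = capacity.get((u, v), 0) + 1
        capacity[(v, u)] = capacity.get((v, u), 0) + 1
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and capacity[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break                                   # no augmenting path remains
        # Find the bottleneck along the path, then push flow through it.
        bottleneck, node = float("inf"), sink
        while parent[node] is not None:
            bottleneck = min(bottleneck, capacity[(parent[node], node)])
            node = parent[node]
        node = sink
        while parent[node] is not None:
            capacity[(parent[node], node)] -= bottleneck
            capacity[(node, parent[node])] += bottleneck
            node = parent[node]
        flow += bottleneck
    # Nodes still reachable from the source define one side of the minimum cut.
    reachable = set(parent)
    cut = sorted((u, v) for u, v in edges if (u in reachable) != (v in reachable))
    return flow, cut

# Two triangles joined by the single link 2-3.
links = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
flow_value, cut_links = max_flow_min_cut(links, source=0, sink=5)
```

Repeating the computation after removing the cut links, or for other source and sink pairs, would be one way to realize the repetitive variant; that extension is not shown here.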
In this work we propose an automated facial patch creation method to isolate certain regions of a face regardless of the pose and lighting conditions, followed by Eigen-face decomposition and Isomap clustering for classification of facial expressions. We observe that patched Eigen-face Isomap networks created with low neighborhood parameter (k) values have higher sensitivity than full Eigen-face networks for facial occlusion and smile classification tasks. Additionally, network-based centrality and information flow in networks can be used as measures to detect the most significant subset of faces for feature-based expression classification tasks. The proposed method requires an average of 0.25 seconds per image for generating automated facial masks, 0.75 seconds for Eigen-face decomposition and Isomap network creation, and 0.1 seconds for clustering the faces/nodes for each classification task.
Future efforts will be directed towards using the proposed method as a first pass for expression classification tasks, followed by feature-based expression classification on the marginally classifiable faces, using the most significant faces detected by the proposed method as the training data set. Future work can also investigate the performance of the proposed method on other expression classification tasks such as fear, disgust and anger in images with multiple faces.
- [Agarwal et al.2010] Mayank Agarwal, Nikunj Jain, Mr Manish Kumar, and Himanshu Agrawal. Face recognition using eigen faces and artificial neural network. International Journal of Computer Theory and Engineering, 2(4):1793–8201, 2010.
- [Al-modwahi et al.2012] Ashraf Abbas M Al-modwahi, Onkemetse Sebetela, Lefoko Nehemiah Batleng, Behrang Parhizkar, and Arash Habibi Lashkari. Facial expression recognition intelligent security system for real time surveillance. In Proc. of World Congress in Computer Science, Computer Engineering, and Applied Computing, 2012.
- [Bosch et al.2007] Anna Bosch, Andrew Zisserman, and Xavier Munoz. Representing shape with a spatial pyramid kernel. In Proceedings of the 6th ACM international conference on Image and video retrieval, pages 401–408. ACM, 2007.
- [Chavan and Kulkarni2013] Umesh Balkrishna Chavan and Dinesh B Kulkarni. Facial expression recognition-review. International Journal of Latest Trends in Engineering and Technology (IJLTET), 3(1):237–243, 2013.
- [Dalal and Triggs2005] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893. IEEE, 2005.
- [Dehkordi and Haddadnia2010] Behnam Kabirian Dehkordi and Javad Haddadnia. Facial expression recognition in video sequence images by using optical flow. In Signal Processing Systems (ICSPS), 2010 2nd International Conference on, volume 1, pages V1–727. IEEE, 2010.
- [Ekman and Friesen1971] Paul Ekman and Wallace V Friesen. Constants across cultures in the face and emotion. Journal of personality and social psychology, 17(2):124, 1971.
- [Ekman and Rosenberg1997] Paul Ekman and Erika L Rosenberg. What the face reveals: Basic and applied studies of spontaneous expression using the Facial Action Coding System (FACS). Oxford University Press, 1997.
- [Hemalatha and Sumathi2014] G Hemalatha and CP Sumathi. A study of techniques for facial detection and expression classification. International Journal of Computer Science & Engineering Survey (IJCSES) Vol, 5, 2014.
- [Kirby and Sirovich1990] Michael Kirby and Lawrence Sirovich. Application of the karhunen-loeve procedure for the characterization of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1):103–108, 1990.
- [Laboratories2002] AT&T Cambridge Laboratories. The database of faces. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html, 2002.
- [Lajevardi and Wu2012] Seyed Mehdi Lajevardi and Hong Ren Wu. Facial expression recognition in perceptual color space. Image Processing, IEEE Transactions on, 21(8):3721–3733, 2012.
- [Li et al.2005] Rui-Fan Li, Hong-Wei Hao, Xu yan Tu, and Cong Wang. Face recognition using kfd-isomap. In Proceedings of 2005 International Conference on Machine Learning and Cybernetics, volume 7, pages 4544–4548, Aug 2005.
- [Lonare and Jain2013] Ashish Lonare and Shweta V Jain. A survey on facial expression analysis for emotion recognition. International Journal of Advanced Research in Computer and Communication Engineering, 2(12), 2013.
- [Ojala et al.2002] Timo Ojala, Matti Pietikäinen, and Topi Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(7):971–987, 2002.
- [Punitha and Geetha2013] A Punitha and M Kalaiselvi Geetha. Texture based emotion recognition from facial expressions using support vector machine. algorithms (eg Hidden Markov Models (HMMs), 1:6, 2013.
- [Roychowdhury2010] Sohini Roychowdhury. Mathematical models for prediction and optimal mitigation of epidemics. PhD thesis, Kansas State University, 2010.
- [Tenenbaum et al.2000] Joshua B Tenenbaum, Vin De Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
- [Turk and Pentland1991] Matthew Turk and Alex Pentland. Eigenfaces for recognition. Journal of cognitive neuroscience, 3(1):71–86, 1991.
- [Yang2002] Ming-Hsuan Yang. Extended isomap for pattern classification. In AAAI/IAAI, pages 224–229, 2002.