It's LeVAsa not LevioSA! Latent Encodings for Valence-Arousal Structure Alignment

07/20/2020 ∙ by Surabhi S Nath, et al. ∙ IIIT Delhi

In recent years, great strides have been made in the field of affective computing. Several models have been developed to represent and quantify emotions. Two popular ones are (i) categorical models, which represent emotions as discrete labels, and (ii) dimensional models, which represent emotions in a Valence-Arousal (VA) circumplex domain. However, there is no standard for annotation mapping between the two labelling methods. We build a novel algorithm for mapping categorical and dimensional model labels using annotation transfer across affective facial image datasets. Further, we utilize the transferred annotations to learn rich and interpretable data representations using a variational autoencoder (VAE). We present "LeVAsa", a VAE model that learns implicit structure by aligning the latent space with the VA space. We evaluate the efficacy of LeVAsa by comparing performance with the Vanilla VAE using quantitative and qualitative analysis on two benchmark affective image datasets. Our results reveal that LeVAsa achieves high latent-circumplex alignment, which leads to improved downstream categorical emotion prediction. The work also demonstrates the trade-off between the degree of alignment and the quality of reconstructions.


1. Introduction

Emotions are intrinsic characteristics of most living species, and are particularly overt in human behaviour (Darwin and Prodger, 1998; Panksepp, 2004; Izard, 2013). Intelligent systems must employ means to incorporate emotions for a more natural interaction (Picard, 2000). This drive for “emotional intelligence” has evolved into the field of affective computing, which by definition encompasses the creation of and interaction with machines that can sense, recognize, respond to, and influence emotions (Picard and Klein, 2002). Several models of emotion have been developed over the years and are considered the backbone of affective computing (Gratch et al., 2009; Marsella et al., 2010; Tracy and Randles, 2011; Hamann, 2012). Among these models, a popular choice is the Categorical Model, which describes six basic discrete emotions, namely happiness, anger, disgust, sadness, fear, and surprise (Ekman and Friesen, 1971). However, this model failed to capture relations between the discrete emotions. Moreover, there is a lack of consistency in the choice of these fundamental emotions (Ekman and Cordaro, 2011). As a result, Russell and Mehrabian (Russell and Mehrabian, 1977) developed the Dimensional Model, which suggests that each emotional state can be defined in terms of Valence (the pleasure of an emotion), Arousal (the energy of an emotion) and Dominance (the controlling nature of an emotion). The Dominance dimension is commonly ignored since the valence-arousal (VA) dimensional model was shown to possess adequate reliability, convergent validity, and discriminant validity (Russell et al., 1989). This led to the conceptualization of the Circumplex Model, which represents affective states along a circle in a 2D bipolar VA space (Russell, 1980). The VA variables are typically considered independent (Feldman Barrett and Russell, 1998).

The existence of different models of emotion results in a range of possible annotation strategies for affective data (Fabian Benitez-Quiroz et al., 2016; Nicolaou et al., 2010, 2011; Lucey et al., 2010; Dhall et al., 2011). This poses two challenges: (i) building deep learning models on affective data, and (ii) drawing collective insights from multiple datasets having potentially different formats of annotations (De Bruyne et al., 2019). In this paper, we present a novel algorithm for mapping annotations of the Categorical Model to those of the Dimensional Model through annotation transfer across affective facial image datasets.

The task following annotation mapping is to obtain meaningful data representations. With the increased use of deep neural networks and generative models, there have been significant advances in emotion modelling and affective computing (Han et al., 2019; Rouast et al., 2019; Jolly et al., 2019). Variational Autoencoders (VAEs) (Kingma and Welling, 2013) are known to yield disentangled latent representations and generate new data samples (Hu et al., 2018; Shukla et al., 2019; Higgins et al., 2017). They have been used extensively in affective computing to represent text, audio, image and electroencephalography (EEG) data (Wu et al., 2019; Latif et al., 2017). Applying VAEs on affective facial images to obtain disentangled image representations can (i) provide high-quality feature representations for downstream tasks (Bengio et al., 2013; Peters et al., 2017), and (ii) serve applications like facial editing and data augmentation (Lindt et al., 2019). In our study, we obtain interpretable features by aligning the latent space of a VAE with the VA space. This enables improved affect classification and regression, as demonstrated on two benchmark affective image datasets using a series of evaluation tasks.
Our major contributions are as follows:

  1. an annotation transfer algorithm for mapping labels between the Categorical and Dimensional models of emotion

  2. a regularised VAE model “LeVAsa” (Latent Encodings for Valence-Arousal Structure Alignment) that yields an interpretable latent space with an implicit structure aligned with the VA space

The rest of the paper is organized as follows. Section 2 presents our annotation transfer algorithm, the VAE model architectures and the datasets used in our experiments. Section 3 outlines the evaluation tasks conducted along with the obtained results. Section 4 concludes the paper and motivates future work.

2. Methods

In this section, we present our annotation transfer algorithm and describe our VAE model architectures. Our code and models are publicly available at https://github.com/vishaal27/LeVAsa.

2.1. Annotation Transfer Algorithm

For the task of annotation transfer between the Categorical and Dimensional emotion models, we use an external reference dataset R containing both discrete categorical emotion labels e ∈ {e_1, …, e_n}, where e_1, …, e_n are the n discrete emotion labels, and valence, arousal values v, a ∈ [l, u], where l and u are the lower and upper limits for valence and arousal values respectively. Each data sample in R thus has an emotion label e, a valence value v and an arousal value a. R serves as the standard based on which continuous or discrete VA values can be sampled for data points in a working dataset W_E with only emotion labels, or conversely, the most likely emotion labels can be obtained for data points in a working dataset W_VA with only VA tuples (Figure 2). Algorithm 1 is detailed as follows.

Input : reference dataset R with discrete categorical emotion labels {e_1, …, e_n} and VA values in [l, u]; working dataset W_E with discrete emotion labels only; working dataset W_VA with VA tuples only
Output : VA values for the working dataset W_E; discrete emotion labels for the working dataset W_VA
1 Partition the samples of R into n groups G_1, …, G_n based on their discrete emotion labels
2 For each group G_i, i = 1, …, n, obtain the mean valence μ_v^i, standard deviation of valence σ_v^i, mean arousal μ_a^i and standard deviation of arousal σ_a^i
3 Generate ellipses E_i, i = 1, …, n, where E_i represents emotion e_i with centre (μ_v^i, μ_a^i), semi-axis σ_v^i along the valence dimension and semi-axis σ_a^i along the arousal dimension
4 To obtain VA values for a data point in W_E with label e_i, sample from ellipse E_i as: v = μ_v^i + r σ_v^i cos θ, a = μ_a^i + r σ_a^i sin θ, where r is drawn uniformly from [0, 1] and θ from [0, 2π]
5 Convert (v, a) to discrete values by scaling and rounding off, if desired
6 To obtain the emotion label for a sample in W_VA with VA tuple (v, a), find the ellipse E_i whose centre is at the least Euclidean distance from (v, a), and assign e_i as the most likely emotion
Algorithm 1 Annotation transfer algorithm
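
To make the transfer concrete, the following is a minimal NumPy sketch of the two directions of Algorithm 1. The function names (fit_ellipses, va_from_label, label_from_va) and the uniform draws of r and θ are our own illustrative choices, not the authors' released code.

```python
import numpy as np

def fit_ellipses(va_values, labels):
    """Fit one ellipse per emotion from the reference dataset (steps 1-3).

    va_values: (N, 2) array of (valence, arousal) pairs; labels: (N,) emotion ids.
    Returns {emotion: (centre, semi_axes)} with centre = (mu_v, mu_a), semi_axes = (sigma_v, sigma_a).
    """
    ellipses = {}
    for e in np.unique(labels):
        group = va_values[labels == e]
        ellipses[e] = (group.mean(axis=0), group.std(axis=0))
    return ellipses

def va_from_label(emotion, ellipses, rng=np.random.default_rng()):
    """Sample a (valence, arousal) pair for an emotion label (step 4)."""
    (mu_v, mu_a), (sig_v, sig_a) = ellipses[emotion]
    r, theta = rng.uniform(0, 1), rng.uniform(0, 2 * np.pi)  # assumed uniform draws
    return mu_v + r * sig_v * np.cos(theta), mu_a + r * sig_a * np.sin(theta)

def label_from_va(v, a, ellipses):
    """Assign the most likely emotion for a (valence, arousal) pair (step 6):
    the emotion whose ellipse centre is closest in Euclidean distance."""
    centres = {e: c for e, (c, _) in ellipses.items()}
    return min(centres, key=lambda e: np.hypot(v - centres[e][0], a - centres[e][1]))

# Example: reference annotations -> transferred labels for a VA-only sample
ref_va = np.array([[0.8, 0.6], [0.7, 0.5], [-0.6, 0.4], [-0.7, 0.6]])
ref_labels = np.array([0, 0, 1, 1])           # e.g. 0 = Happiness, 1 = Anger
ellipses = fit_ellipses(ref_va, ref_labels)
print(va_from_label(0, ellipses))             # VA sample for "Happiness"
print(label_from_va(-0.5, 0.5, ellipses))     # -> 1
```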

2.2. VAE model architectures

Figure 1. Model Architecture

Given a raw distribution of affective face images, we train a generative model whose latent space has an interpretable, implicit structure. We employ variational autoencoder based models because of their simple training protocols and structured inductive priors. We compare two VAE models, the Vanilla VAE and LeVAsa. The latent space of both models was constructed to comprise three chunks. Figure 1 depicts our model architectures.

For the Vanilla VAE, no explicit alignment was imposed on the latent space, whereas for LeVAsa, we take inspiration from recent work (Jha et al., 2018; Bhagat et al., 2020) and model the latent space as follows:

  • z_v – subspace consisting of valence attributes that learn to encode the valence features of image samples

  • z_a – subspace consisting of arousal attributes that learn to encode the arousal features of image samples

  • z_g – subspace consisting of other miscellaneous generative attributes that are required for high-fidelity reconstruction of the input data distribution.

Given a dataset of affective images X = {x_1, …, x_N}, our VAE backbone consists of an encoder q_φ and a decoder p_θ given by:

    z = (z_v, z_a, z_g) ~ q_φ(z | x),    x̂ ~ p_θ(x | z)

We train the Vanilla VAE with a simple reconstruction loss along with a modified Kullback-Leibler (KL) loss (Eq. 1). We induce a standard normal prior N(0, I) on all three attributes z_v, z_a and z_g:

    L_VAE = E_{q_φ(z|x)} [ ‖x − x̂‖² ] + Σ_{i ∈ {v,a,g}} D_KL( q_φ(z_i | x) ‖ N(0, I) )    (1)

We employ the same backbone Vanilla VAE architecture for the LeVAsa model with two major modifications:

  1. Projection Heads: We make use of two non-linear projection heads h_v and h_a which map the encoded valence and arousal representations z_v and z_a to the valence and arousal label space (giving label representations ŷ_v and ŷ_a), where the VA-regularisation loss is applied. The projections obtained are represented as follows:

     ŷ_v = h_v(z_v),    ŷ_a = h_a(z_a)

  2. VA-regularization loss: To impose an explicit alignment of the z_v and z_a attributes with the VA ground-truth factors, we introduce a VA-regularization loss as follows:

     L_VA = λ_v · d(ŷ_v, v) + λ_a · d(ŷ_a, a)    (2)

     where d takes the form of MSE for continuous and BCE for discrete annotation types.

The overall optimization objective for the LeVAsa model is:

    L_LeVAsa = L_VAE + L_VA    (3)

where λ_v and λ_a are hyperparameters.
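
To make the objective concrete, below is a minimal PyTorch sketch of a LeVAsa-style forward pass and loss corresponding to Eqs. (1)–(3). The fully connected encoder/decoder, chunk dimensions and projection-head sizes are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeVAsa(nn.Module):
    """Sketch of a VAE whose latent space is split into (z_v, z_a, z_g) chunks."""
    def __init__(self, x_dim=64 * 64 * 3, h_dim=512, v_dim=8, a_dim=8, g_dim=16):
        super().__init__()
        z_dim = v_dim + a_dim + g_dim
        self.chunks = (v_dim, a_dim, g_dim)
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu, self.logvar = nn.Linear(h_dim, z_dim), nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))
        # Non-linear projection heads mapping z_v, z_a to the valence/arousal label space.
        self.h_v = nn.Sequential(nn.Linear(v_dim, 16), nn.ReLU(), nn.Linear(16, 1))
        self.h_a = nn.Sequential(nn.Linear(a_dim, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterisation
        z_v, z_a, z_g = torch.split(z, self.chunks, dim=1)
        return self.dec(z), mu, logvar, self.h_v(z_v), self.h_a(z_a)

def levasa_loss(model, x, v, a, lam_v=1.0, lam_a=1.0, continuous=True):
    x_hat, mu, logvar, y_v, y_a = model(x)
    recon = F.mse_loss(x_hat, x, reduction="sum") / x.size(0)                  # reconstruction term
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))  # KL to N(0, I), Eq. (1)
    d = F.mse_loss if continuous else F.binary_cross_entropy_with_logits      # MSE (continuous) / BCE (discrete)
    va_reg = lam_v * d(y_v.squeeze(1), v) + lam_a * d(y_a.squeeze(1), a)       # VA regularization, Eq. (2)
    return recon + kl + va_reg                                                 # overall objective, Eq. (3)

# Toy usage with random tensors standing in for affective face images
model = LeVAsa()
x = torch.rand(4, 64 * 64 * 3)
v, a = torch.rand(4), torch.rand(4)   # continuous VA targets
loss = levasa_loss(model, x, v, a)
loss.backward()
```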

2.3. Datasets

We use the following datasets in our experiments.

Annotation Transfer: AffectNet

  • AffectNet (Mollahosseini et al., 2017) is the largest facial expression dataset, with over 420,000 annotated images, and contains both continuous VA annotations in [-1, 1] and discrete emotion labels from {Neutral, Anger, Happiness, Sadness, Surprise, Fear, Disgust, Contempt, None, Uncertain, Non-face}. The dataset also incorporates a wide diversity in gender, age and ethnicity, and is hence an ideal choice for the reference dataset in the annotation transfer algorithm (Algorithm 1). The generated ellipses are shown in Figure 2.

Model Training: IMFDB, AFEW

  • IMFDB (Setty et al., 2013) contains around 34,000 annotated zoomed-in facial images of 100 Indian actors, with only the emotion labels {Neutral, Anger, Happiness, Sadness, Surprise, Fear, Disgust} and no VA supervision. Continuous and discrete VA supervision for IMFDB is obtained from annotation transfer using AffectNet. This is particularly well suited due to the similar nature of the images in the IMFDB and AffectNet datasets.

  • AFEW (Dhall et al., 2011), on the other hand, contains around 24,000 annotated images from videos of real-world scenes of approximately 600 actors, with only discrete VA values in {-10, -9, …, 9, 10} and no discrete emotion labels.

The different nature of the IMFDB and AFEW datasets allows us to analyse and compare model performance based on different factors, including image type (zoomed-in faces vs. video scenes) and annotation type (discrete vs. continuous VA supervision).

Figure 2. Ellipses from AffectNet for annotation transfer.

Figure 4. VAE reconstructions from the five models: (a) IMFDB Vanilla VAE, (b) IMFDB LeVAsa Continuous, (c) IMFDB LeVAsa Discrete, (d) AFEW Vanilla VAE, (e) AFEW LeVAsa.

3. Experiments

We perform our analyses and evaluations through a series of qualitative and quantitative experiments. This enables comparisons based on three aspects: (i) architecture (Vanilla VAE vs LeVAsa), (ii) dataset (IMFDB vs AFEW), and (iii) nature of annotations (Continuous VA vs Discrete VA). Altogether, we train five models: (i) Vanilla VAE on IMFDB, (ii) LeVAsa on IMFDB with continuous VA annotations, (iii) LeVAsa on IMFDB with discrete VA annotations, (iv) Vanilla VAE on AFEW, and (v) LeVAsa on AFEW with discrete VA annotations.

3.1. Latent-Circumplex Alignment

We measure the alignment of LeVAsa's latent space with the VA ground truths using normalized Euclidean and Manhattan distance metrics for continuous annotations, and a Cross-Entropy measure for discrete annotations. This helps quantify the degree of latent-circumplex alignment. For the Vanilla VAE, we determine the z_v and z_a chunks heuristically by considering the two latent chunks that aligned best with the corresponding valence and arousal ground truths. Further, we reduce the dimensionality of the z_v and z_a latent chunks and plot them alongside the ground truth to replicate the circumplex representation.
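
As a rough illustration, the snippet below computes such alignment scores between the VA values predicted from the latent chunks and the ground truth. Per-dimension min-max normalization and averaging over samples are our assumptions; the paper does not spell out the exact computation.

```python
import numpy as np

def minmax(x):
    """Per-column min-max normalization to [0, 1] (assumed normalization scheme)."""
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0) + 1e-8)

def continuous_alignment(pred_va, true_va):
    """Normalized Euclidean (MSE-style) and Manhattan (MAE-style) alignment scores."""
    p, t = minmax(pred_va), minmax(true_va)
    mse = np.mean(np.sum((p - t) ** 2, axis=1))
    mae = np.mean(np.sum(np.abs(p - t), axis=1))
    return mse, mae

def discrete_alignment(pred_probs, true_onehot, eps=1e-8):
    """Cross-entropy between predicted VA bin probabilities and discrete ground truth."""
    return -np.mean(np.sum(true_onehot * np.log(pred_probs + eps), axis=1))

# Toy usage: 5 samples of predicted vs. ground-truth (valence, arousal)
pred, true = np.random.rand(5, 2), np.random.rand(5, 2)
print(continuous_alignment(pred, true))
```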

It is found that LeVAsa outperforms the Vanilla VAE for both continuous and discrete annotations (Table 1). This clearly exhibits the superior latent-circumplex alignment achieved by LeVAsa. For discrete annotations, the difference between the cross-entropy measures of the Vanilla VAE and LeVAsa is greater in the case of AFEW than in the case of IMFDB. This could be attributed to the different image types in the two datasets. The circumplex plots for LeVAsa (Figure 3) reveal reduced variance and increased alignment with the true labels. This validates the quantitative results in Table 1.

(a) Continuous (IMFDB)
              Valence         Arousal         Combined
Model         MSE     MAE     MSE     MAE     MSE     MAE
Vanilla VAE   1.83    0.29    1.49    0.26    3.31    0.55
LeVAsa        0.14    0.14    0.06    0.09    0.20    0.23

(b) Discrete (Cross Entropy)
Model         IMFDB   AFEW
Vanilla VAE   8.9     8.9
LeVAsa        6.63    2.54

Table 1. Alignment: (a) continuous annotations on IMFDB, (b) discrete annotations on IMFDB and AFEW.

Figure 3. Circumplex representation: (a) IMFDB Continuous, (b) IMFDB Discrete, (c) AFEW.

To gain further insights, we assess the regressive power of the latent chunks z_v and z_a by their ability to predict the corresponding VA ground truths, using Multi-Layer Perceptron (MLP) regression. Since this analysis applies only to continuous annotations, it was conducted solely on the LeVAsa and Vanilla VAE models trained on the IMFDB dataset with continuous VA values.
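
A sketch of this evaluation with scikit-learn is given below; the regressor size and train/test split are illustrative choices rather than the paper's exact configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, explained_variance_score, r2_score

def va_regressive_power(z_chunk, target, seed=0):
    """Fit an MLP regressor from a latent chunk (e.g. z_v) to its VA ground truth
    and report MSE, MAE, explained variance and R^2, as in Table 2."""
    z_tr, z_te, y_tr, y_te = train_test_split(z_chunk, target, test_size=0.2, random_state=seed)
    reg = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed).fit(z_tr, y_tr)
    pred = reg.predict(z_te)
    return (mean_squared_error(y_te, pred), mean_absolute_error(y_te, pred),
            explained_variance_score(y_te, pred), r2_score(y_te, pred))

# Toy usage: 200 samples with an 8-dimensional valence chunk
z_v, valence = np.random.rand(200, 8), np.random.rand(200)
print(va_regressive_power(z_v, valence))
```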

Axis      Model         MSE     MAE     EV       R²
Valence   Vanilla VAE   0.256   0.420   -0.011   -0.012
          LeVAsa        0.251   0.414    0.016    0.015
Arousal   Vanilla VAE   0.092   0.242   -0.022   -0.048
          LeVAsa        0.074   0.224    0.095    0.086
Table 2. VA Regressive Power

It is observed that the MSE and MAE values computed for LeVAsa were lower by 2.25% and 1.42% respectively as compared to the Vanilla VAE for valence, and lower by 19.13% and 7.18% for arousal (Table 2). Furthermore, the goodness-of-fit metrics (explained variance and R²) showed better performance in the case of LeVAsa. These results further strengthen our hypothesis.

3.2. Categorical Emotion Predictive Power

We predict the discrete emotion labels using different combinations of latent representations obtained from the Vanilla VAE and LeVAsa (Table 3). Due to the lack of discrete emotion labels in the AFEW dataset, it was excluded from this analysis. We randomized the data splits across the Continuous and Discrete experiments to ensure an unbiased setup. Model performance is evaluated using classification accuracy. We utilize a simple one-layer MLP to ensure that the accuracy is a direct measure of representation quality and not influenced by the complexity of the classifier.
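
For concreteness, a sketch of this classification probe with scikit-learn follows; the single hidden-layer size and split ratio are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def emotion_accuracy(latent_chunks, labels, seed=0):
    """Probe categorical-emotion predictive power of a chunk combination:
    concatenate the chosen latent chunks, fit a one-hidden-layer MLP and report accuracy."""
    features = np.concatenate(latent_chunks, axis=1)
    x_tr, x_te, y_tr, y_te = train_test_split(features, labels, test_size=0.2, random_state=seed)
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=seed).fit(x_tr, y_tr)
    return accuracy_score(y_te, clf.predict(x_te))

# Toy usage: valence + arousal chunks vs. 7 emotion classes
z_v, z_a = np.random.rand(300, 8), np.random.rand(300, 8)
labels = np.random.randint(0, 7, size=300)
print(emotion_accuracy([z_v, z_a], labels))
```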

Annotation Type   Chunk Combination   Vanilla VAE (V)   LeVAsa (L)   Difference L − V (in %)
Continuous                            0.29              0.36         7
                                      0.32              0.35         3
                                      0.32              0.36         4
                                      0.32              0.38         6
                                      0.29              0.33         4
Discrete                              0.30              0.35         5
                                      0.27              0.30         3
                                      0.24              0.30         6
                                      0.25              0.33         8
                                      0.26              0.30         4
Table 3. Categorical emotion predictive power using the Vanilla VAE and LeVAsa models. All reported scores are accuracies; chunk combinations are formed by vector concatenation of the latent chunks z_v, z_a and z_g.

It is seen that LeVAsa has significantly better predictive power than the Vanilla VAE. Moreover, for LeVAsa, the VA chunks alone are more informative for emotion prediction than all the chunks taken together. Also, the improvement in classification accuracy from employing LeVAsa in place of the Vanilla VAE can be compared under the continuous and discrete settings. This reveals that the LeVAsa representations from the model trained with discrete annotations and the BCE loss (Eq. 2) prove to be better at classifying emotion labels. This is due to the discrete nature of the emotion labels, which correlate well with the model representations.

3.3. Reconstruction Quality

VAE models are prone to posterior collapse and can produce unreliable reconstructions (He et al., 2019; Rybkin et al., 2020). Thus, along with analyses of the latent representations, we also study the quality of the reconstructed faces (Figure 4).

It is observed that the quality of the reconstructed faces is slightly compromised in the case of LeVAsa as compared to Vanilla VAE. This can be attributed to the slightly higher variance of the learnt LeVAsa decoding distribution (Higgins et al., 2017; Alemi et al., 2018). By Shannon’s rate-distortion theory (Berger, 2003), there is a trade-off between the distortion (reconstruction quality) and rate (representation quality). Since we are imposing an explicit compression bottleneck on the latent representations, it is expected that the reconstruction quality is slightly compromised in order to achieve better interpretability of latent representations.

4. Conclusion

In this paper, we have developed an annotation-transfer algorithm for mapping between Categorical and Dimensional emotion model annotations. Using the transferred annotations, we generated interpretable image features with a VA-regularized VAE model called LeVAsa. We conducted a series of evaluation tasks to verify and validate our experiments and to compare performance based on three factors: (i) architecture (Vanilla VAE vs LeVAsa), (ii) dataset (IMFDB vs AFEW), and (iii) nature of annotations (Continuous VA vs Discrete VA). The results show that the LeVAsa model obtains robust and interpretable representations, enabling improved performance on downstream affective tasks. In the future, we hope to (i) extend the annotation-transfer algorithm to action-unit annotations, and (ii) perform latent traversals for data augmentation and facial editing.

Acknowledgements

This work was supported by the Infosys Center for Artificial Intelligence at IIIT Delhi, India.

References

  • Alemi et al. (2018) Alexander Alemi, Ben Poole, Ian Fischer, Joshua Dillon, Rif A Saurous, and Kevin Murphy. 2018. Fixing a broken ELBO. In International Conference on Machine Learning. 159–168.
  • Bengio et al. (2013) Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35, 8 (2013), 1798–1828.
  • Berger (2003) Toby Berger. 2003. Rate-distortion theory. Wiley Encyclopedia of Telecommunications (2003).
  • Bhagat et al. (2020) Sarthak Bhagat, Vishaal Udandarao, and Shagun Uppal. 2020. DisCont: Self-Supervised Visual Attribute Disentanglement using Context Vectors. arXiv preprint arXiv:2006.05895 (2020).
  • Darwin and Prodger (1998) Charles Darwin and Phillip Prodger. 1998. The expression of the emotions in man and animals. Oxford University Press.
  • De Bruyne et al. (2019) Luna De Bruyne, Pepa Atanasova, and Isabelle Augenstein. 2019. Joint Emotion Label Space Modelling for Affect Lexica. arXiv preprint arXiv:1911.08782 (2019).
  • Dhall et al. (2011) Abhinav Dhall, Roland Goecke, Simon Lucey, and Tom Gedeon. 2011. Acted facial expressions in the wild database. Australian National University, Canberra, Australia, Technical Report TR-CS-11 2 (2011), 1–13.
  • Ekman and Cordaro (2011) Paul Ekman and Daniel Cordaro. 2011. What is meant by calling emotions basic. Emotion review 3, 4 (2011), 364–370.
  • Ekman and Friesen (1971) Paul Ekman and Wallace V Friesen. 1971. Constants across cultures in the face and emotion. Journal of personality and social psychology 17, 2 (1971), 124–129.
  • Fabian Benitez-Quiroz et al. (2016) C Fabian Benitez-Quiroz, Ramprakash Srinivasan, and Aleix M Martinez. 2016. Emotionet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition. 5562–5570.
  • Feldman Barrett and Russell (1998) Lisa Feldman Barrett and James A Russell. 1998. Independence and bipolarity in the structure of current affect. Journal of personality and social psychology 74, 4 (1998), 967–984.
  • Gratch et al. (2009) Jonathan Gratch, Stacy Marsella, Ning Wang, and Brooke Stankovic. 2009. Assessing the validity of appraisal-based models of emotion. In 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops. IEEE, 1–8.
  • Hamann (2012) Stephan Hamann. 2012. Mapping discrete and dimensional emotions onto the brain: controversies and consensus. Trends in cognitive sciences 16, 9 (2012), 458–466.
  • Han et al. (2019) Jing Han, Zixing Zhang, and Bjorn Schuller. 2019. Adversarial training in affective computing and sentiment analysis: Recent advances and perspectives. IEEE Computational Intelligence Magazine 14, 2 (2019), 68–81.
  • He et al. (2019) Junxian He, Daniel Spokoyny, Graham Neubig, and Taylor Berg-Kirkpatrick. 2019. Lagging inference networks and posterior collapse in variational autoencoders. arXiv preprint arXiv:1901.05534 (2019).
  • Higgins et al. (2017) Irina Higgins, Loïc Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew M Botvinick, Shakir Mohamed, and Alexander Lerchner. 2017. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. In ICLR. 1–22.
  • Hu et al. (2018) Qiyang Hu, Attila Szabó, Tiziano Portenier, Paolo Favaro, and Matthias Zwicker. 2018. Disentangling factors of variation by mixing them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3399–3407.
  • Izard (2013) Carroll E Izard. 2013. Human emotions. Springer Science & Business Media.
  • Jha et al. (2018) Ananya Harsh Jha, Saket Anand, Maneesh Singh, and VSR Veeravasarapu. 2018. Disentangling factors of variation with cycle-consistent variational auto-encoders. In European Conference on Computer Vision. Springer, 829–845.
  • Jolly et al. (2019) Baani Leen Kaur Jolly, Palash Aggrawal, Surabhi S Nath, Viresh Gupta, Manraj Singh Grover, and Rajiv Ratn Shah. 2019. Universal EEG Encoder for Learning Diverse Intelligent Tasks. In 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM). IEEE, 213–218.
  • Kingma and Welling (2013) Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013).
  • Latif et al. (2017) Siddique Latif, Rajib Rana, Junaid Qadir, and Julien Epps. 2017. Variational autoencoders for learning latent representations of speech emotion: A preliminary study. arXiv preprint arXiv:1712.08708 (2017).
  • Lindt et al. (2019) Alexandra Lindt, Pablo Barros, Henrique Siqueira, and Stefan Wermter. 2019. Facial expression editing with continuous emotion labels. In 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019). IEEE, 1–8.
  • Lucey et al. (2010) Patrick Lucey, Jeffrey F Cohn, Takeo Kanade, Jason Saragih, Zara Ambadar, and Iain Matthews. 2010. The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression. In 2010 ieee computer society conference on computer vision and pattern recognition-workshops. IEEE, 94–101.
  • Marsella et al. (2010) Stacy Marsella, Jonathan Gratch, Paolo Petta, et al. 2010. Computational models of emotion. A Blueprint for Affective Computing-A sourcebook and manual 11, 1 (2010), 21–46.
  • Mollahosseini et al. (2017) Ali Mollahosseini, Behzad Hasani, and Mohammad H Mahoor. 2017. Affectnet: A database for facial expression, valence, and arousal computing in the wild. IEEE Transactions on Affective Computing 10, 1 (2017), 18–31.
  • Nicolaou et al. (2010) Mihalis A Nicolaou, Hatice Gunes, and Maja Pantic. 2010. Audio-visual classification and fusion of spontaneous affective data in likelihood space. In 2010 20th International Conference on Pattern Recognition. IEEE, 3695–3699.
  • Nicolaou et al. (2011) Mihalis A Nicolaou, Hatice Gunes, and Maja Pantic. 2011. Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space. IEEE Transactions on Affective Computing 2, 2 (2011), 92–105.
  • Panksepp (2004) Jaak Panksepp. 2004. Affective neuroscience: The foundations of human and animal emotions. Oxford University Press.
  • Peters et al. (2017) Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2017. Elements of causal inference. The MIT Press.
  • Picard (2000) Rosalind W Picard. 2000. Affective computing. The MIT Press.
  • Picard and Klein (2002) Rosalind W Picard and Jonathan Klein. 2002. Computers that recognise and respond to user emotion: theoretical and practical implications. Interacting with computers 14, 2 (2002), 141–169.
  • Rouast et al. (2019) Philipp V Rouast, Marc Adam, and Raymond Chiong. 2019. Deep learning for human affect recognition: insights and new developments. IEEE Transactions on Affective Computing (2019), 1–20.
  • Russell (1980) James A Russell. 1980. A circumplex model of affect. Journal of personality and social psychology 39, 6 (1980), 1161–1178.
  • Russell and Mehrabian (1977) James A Russell and Albert Mehrabian. 1977. Evidence for a three-factor theory of emotions. Journal of research in Personality 11, 3 (1977), 273–294.
  • Russell et al. (1989) James A Russell, Anna Weiss, and Gerald A Mendelsohn. 1989. Affect grid: a single-item scale of pleasure and arousal. Journal of personality and social psychology 57, 3 (1989), 493–502.
  • Rybkin et al. (2020) Oleh Rybkin, Kostas Daniilidis, and Sergey Levine. 2020. Simple and Effective VAE Training with Calibrated Decoders. arXiv preprint arXiv:2006.13202 (2020).
  • Setty et al. (2013) Shankar Setty, Moula Husain, Parisa Beham, Jyothi Gudavalli, Menaka Kandasamy, Radhesyam Vaddi, Vidyagouri Hemadri, JC Karure, Raja Raju, B Rajan, et al. 2013. Indian movie face database: a benchmark for face recognition under wide variations. In 2013 fourth national conference on computer vision, pattern recognition, image processing and graphics (NCVPRIPG). IEEE, 1–5.
  • Shukla et al. (2019) Ankita Shukla, Sarthak Bhagat, Shagun Uppal, Saket Anand, and Pavan K. Turaga. 2019. Product of Orthogonal Spheres Parameterization for Disentangled Representation Learning. In BMVC. 1–13.
  • Tracy and Randles (2011) Jessica L Tracy and Daniel Randles. 2011. Four models of basic emotions: a review of Ekman and Cordaro, Izard, Levenson, and Panksepp and Watt. Emotion Review 3, 4 (2011), 397–405.
  • Wu et al. (2019) Chuhan Wu, Fangzhao Wu, Sixing Wu, Zhigang Yuan, Junxin Liu, and Yongfeng Huang. 2019. Semi-supervised dimensional sentiment analysis with variational autoencoder. Knowledge-Based Systems 165 (2019), 30–39.