Affective computing using speech and eye gaze: a review and bimodal system proposal for continuous affect prediction
Speech has been a widely used modality in the field of affective computing. Recently, however, there has been growing interest in multi-modal affective computing systems that incorporate both verbal and non-verbal features. Such multi-modal systems are advantageous for the emotion assessment of individuals in audio-video communication environments such as teleconferencing, healthcare, and education. A review of the literature shows that eye gaze extracted from video is a modality that has remained largely unexploited for continuous affect prediction. This work presents a review of the literature within the emotion classification and continuous affect prediction sub-fields of affective computing for both the speech and eye gaze modalities. In addition, continuous affect prediction experiments using the speech and eye gaze modalities are presented. A baseline system built with open source software is proposed, and its performance is assessed on a publicly available audio-visual corpus. System performance is further assessed in a cross-corpus, cross-lingual experiment. The experimental results suggest that eye gaze is an effective supportive modality for speech when used in a bimodal continuous affect prediction system. The addition of eye gaze to speech in a simple feature fusion framework yields a prediction improvement of 6.13 and 1.62
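To make the "simple feature fusion framework" concrete, the following is a minimal sketch of feature-level (early) fusion for continuous affect prediction: per-frame speech and eye gaze feature vectors are concatenated and passed to a single regressor, with agreement measured by the concordance correlation coefficient, a metric commonly used for continuous affect. The synthetic data, feature dimensions, and choice of a linear support vector regressor are illustrative assumptions, not the authors' exact pipeline.

# Illustrative sketch: early (feature-level) fusion of speech and eye gaze
# features for continuous affect prediction. Data and model choices are
# assumptions for demonstration only.
import numpy as np
from sklearn.svm import LinearSVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_frames = 1000
speech_feats = rng.normal(size=(n_frames, 88))   # e.g. acoustic functionals per frame
gaze_feats = rng.normal(size=(n_frames, 12))     # e.g. gaze direction/fixation statistics
arousal = rng.uniform(-1, 1, size=n_frames)      # continuous affect labels in [-1, 1]

# Feature-level fusion: concatenate the two modalities frame by frame.
fused = np.hstack([speech_feats, gaze_feats])

# Train on the first 800 frames, predict on the remaining held-out frames.
model = make_pipeline(StandardScaler(), LinearSVR(C=0.01, max_iter=10000))
model.fit(fused[:800], arousal[:800])
pred = model.predict(fused[800:])

def ccc(x, y):
    # Concordance correlation coefficient between gold standard and prediction.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)

print("CCC on held-out frames:", ccc(arousal[800:], pred))

In practice, the speech-only and fused (speech plus eye gaze) feature sets would be trained and evaluated identically, and the relative change in the evaluation metric reported as the improvement attributable to the added gaze modality.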