Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking

07/21/2017
by Rahul Sharma, et al.

Public speaking is an important aspect of human communication and interaction. Most computational work on public speaking concentrates on analyzing the spoken content and the verbal behavior of the speakers. While the success of a talk depends largely on its content and on verbal behavior, non-verbal (visual) cues, such as gestures and physical appearance, also play a significant role. This paper investigates the importance of visual cues by estimating their contribution towards predicting the popularity of a public lecture. For this purpose, we constructed a large database of more than 1800 TED talk videos. As a measure of the popularity of the TED talks, we use the corresponding (online) viewers' ratings from YouTube. Visual cues related to facial and physical appearance, facial expressions, and pose variations are extracted from the video frames using convolutional neural network (CNN) models. An attention-based long short-term memory (LSTM) network is then proposed to predict the video popularity from the sequence of visual features. The proposed network achieves state-of-the-art prediction accuracy, indicating that visual cues alone contain highly predictive information about the popularity of a talk. Furthermore, our network learns a human-like attention mechanism, which is particularly useful for interpretability: it shows how attention varies over time and across different visual cues, indicating their relative importance.
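The idea of attention over time and across visual cues can be illustrated with a minimal sketch in plain Python. This is not the authors' architecture: the LSTM is omitted, and the function names (`attention_pool`, `multichannel_pool`), the dot-product scoring vectors, and the fixed channel logits are all assumptions made for illustration. In a trained model these parameters would be learned rather than set by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, w):
    """Temporal soft attention over one channel (hypothetical helper).

    frames: list of T per-frame feature vectors, each of length D.
    w: scoring vector of length D (assumed; learned in a real model).
    Returns the attention-weighted average vector and the T frame weights.
    """
    scores = [sum(f * v for f, v in zip(frame, w)) for frame in frames]
    alpha = softmax(scores)
    dim = len(frames[0])
    pooled = [
        sum(alpha[t] * frames[t][d] for t in range(len(frames)))
        for d in range(dim)
    ]
    return pooled, alpha


def multichannel_pool(channels, scorers, channel_logits):
    """Pool each channel over time, then weight the channels against each other.

    channels: dict mapping a channel name to its list of frame vectors.
    scorers: dict mapping a channel name to its scoring vector.
    channel_logits: dict mapping a channel name to an unnormalized importance.
    Returns the fused vector, per-channel frame weights, and channel weights.
    """
    names = list(channels)
    beta = softmax([channel_logits[name] for name in names])
    fused = []
    frame_weights = {}
    for name, b in zip(names, beta):
        pooled, alpha = attention_pool(channels[name], scorers[name])
        fused.extend(b * x for x in pooled)  # scale channel by its weight
        frame_weights[name] = alpha
    return fused, frame_weights, dict(zip(names, beta))


if __name__ == "__main__":
    # Toy features for two hypothetical channels over three frames.
    channels = {
        "expression": [[0.1, 0.9], [0.8, 0.2], [0.4, 0.4]],
        "pose": [[1.0], [0.0], [0.5]],
    }
    scorers = {"expression": [1.0, -1.0], "pose": [2.0]}
    channel_logits = {"expression": 0.5, "pose": 0.0}
    fused, frame_weights, channel_weights = multichannel_pool(
        channels, scorers, channel_logits
    )
    print("channel weights:", channel_weights)
    print("frame weights:", frame_weights)
```

The two sets of weights are what make such a model inspectable: the frame weights show which moments of a talk a channel attends to, and the channel weights show the relative importance of each visual cue.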


research
01/17/2019

Enhance the Motion Cues for Face Anti-Spoofing using CNN-LSTM Architecture

Spatio-temporal information is very important to capture the discriminat...
research
05/04/2017

A Deep Learning Perspective on the Origin of Facial Expressions

Facial expressions play a significant role in human communication and be...
research
07/21/2017

Shallow reading with Deep Learning: Predicting popularity of online content using only its title

With the ever decreasing attention span of contemporary Internet users, ...
research
12/11/2020

Fairness in Rating Prediction by Awareness of Verbal and Gesture Quality of Public Speeches

The role of verbal and non-verbal cues towards great public speaking has...
research
10/31/2019

Harnessing the richness of the linguistic signal in predicting pragmatic inferences

The strength of pragmatic inferences systematically depends on linguisti...
research
09/28/2021

The VVAD-LRS3 Dataset for Visual Voice Activity Detection

Robots are becoming everyday devices, increasing their interaction with ...
research
04/19/2023

The Face of Populism: Examining Differences in Facial Emotional Expressions of Political Leaders Using Machine Learning

Online media has revolutionized the way political information is dissemi...
