BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis

03/10/2022
by Haiyang Liu, et al.

Achieving realistic, vivid, and human-like synthesized conversational gestures conditioned on multi-modal data is still an unsolved problem due to the lack of available datasets, models, and standard evaluation metrics. To address this, we build the Body-Expression-Audio-Text (BEAT) dataset, which has i) 76 hours of high-quality, multi-modal data captured from 30 speakers talking with eight different emotions and in four different languages, and ii) 32 million frame-level emotion and semantic relevance annotations. Our statistical analysis on BEAT demonstrates the correlation of conversational gestures with facial expressions, emotions, and semantics, in addition to the known correlation with audio, text, and speaker identity. Qualitative and quantitative experiments demonstrate the validity of the metrics, the quality of the ground-truth data, and the state-of-the-art performance of the baseline. To the best of our knowledge, BEAT is the largest motion-capture dataset for investigating human gestures, and it may contribute to a number of different research fields, including controllable gesture synthesis, cross-modality analysis, and emotional gesture recognition. The data, code, and models will be released for research.
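The abstract describes BEAT as frame-aligned body, facial-expression, audio, and text streams with frame-level emotion and semantic-relevance annotations. Purely as a rough sketch of how one such multi-modal clip might be organized in code, the example below groups those streams in a single record; the class name, field names, and per-frame dimensions (e.g. 75 pose values, 52 facial parameters) are illustrative assumptions, not the released BEAT format or API.

    from dataclasses import dataclass, field
    from typing import List


    @dataclass
    class BeatClip:
        """Illustrative container for one frame-aligned, multi-modal BEAT-style clip."""
        speaker_id: int                      # one of the 30 speakers (assumed integer id)
        language: str                        # one of the four recording languages
        transcript: str                      # text aligned with the audio
        audio: List[float]                   # raw waveform samples
        body_pose: List[List[float]]         # per-frame body joint parameters
        face_expression: List[List[float]]   # per-frame facial expression parameters
        emotion: List[str] = field(default_factory=list)               # frame-level emotion labels
        semantic_relevance: List[float] = field(default_factory=list)  # frame-level relevance scores

        def frames_aligned(self) -> bool:
            """All frame-level streams should contain the same number of frames."""
            n = len(self.body_pose)
            return all(len(stream) == n for stream in
                       (self.face_expression, self.emotion, self.semantic_relevance))


    # Toy usage: 30 frames of zero-valued pose/expression data for one clip.
    clip = BeatClip(
        speaker_id=1,
        language="english",
        transcript="hello there",
        audio=[0.0] * 16000,
        body_pose=[[0.0] * 75 for _ in range(30)],
        face_expression=[[0.0] * 52 for _ in range(30)],
        emotion=["neutral"] * 30,
        semantic_relevance=[0.0] * 30,
    )
    assert clip.frames_aligned()

A grouping like this makes the frame-level alignment check explicit, which is the property the paper's statistical analysis relies on when correlating gestures with expressions, emotions, and semantics.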

