Co-Speech Gesture Detection through Multi-phase Sequence Labeling

08/21/2023
by Esam Ghaleb et al.

Gestures are integral components of face-to-face communication. They unfold over time, often following predictable movement phases of preparation, stroke, and retraction. Yet, the prevalent approach to automatic gesture detection treats the problem as binary classification, classifying a segment as either containing a gesture or not, thus failing to capture its inherently sequential and contextual nature. To address this, we introduce a novel framework that reframes the task as a multi-phase sequence labeling problem rather than binary classification. Our model processes sequences of skeletal movements over time windows, uses Transformer encoders to learn contextual embeddings, and leverages Conditional Random Fields to perform sequence labeling. We evaluate our proposal on a large dataset of diverse co-speech gestures in task-oriented face-to-face dialogues. The results consistently demonstrate that our method significantly outperforms strong baseline models in detecting gesture strokes. Furthermore, applying Transformer encoders to learn contextual embeddings from movement sequences substantially improves gesture unit detection. These results highlight our framework's capacity to capture the fine-grained dynamics of co-speech gesture phases, paving the way for more nuanced and accurate gesture detection and analysis.
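The abstract describes a two-stage design: a Transformer encoder produces per-frame contextual embeddings from skeletal movement windows, and a Conditional Random Field labels each frame with a gesture phase. A minimal sketch of the CRF decoding stage is below, assuming per-frame label scores (emissions) have already been computed by an encoder; the label set, score values, and transition matrix here are illustrative, not the paper's actual parameters.

```python
import numpy as np

# Illustrative gesture-phase labels (the paper models preparation,
# stroke, and retraction phases plus non-gesture frames).
LABELS = ["neutral", "preparation", "stroke", "retraction"]

def viterbi(emissions, transitions):
    """Most likely label sequence under a linear-chain CRF.

    emissions:   (T, K) per-frame label scores, e.g. from a
                 Transformer encoder over skeletal features.
    transitions: (K, K) score of moving from label i to label j.
    Returns a list of T label indices.
    """
    T, K = emissions.shape
    score = emissions[0].copy()           # best score ending in each label
    back = np.zeros((T, K), dtype=int)    # backpointers for the trace-back
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]          # trace back the best path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

K = len(LABELS)
trans = np.full((K, K), -1.0)
np.fill_diagonal(trans, 0.0)                                  # staying in a phase is cheap
trans[0, 1] = trans[1, 2] = trans[2, 3] = trans[3, 0] = 0.0   # canonical phase order
trans[0, 2] = -5.0                # jumping from rest straight into a stroke is penalized

emissions = np.array([
    [2.0, 0.1, 0.1, 0.1],   # clearly neutral
    [0.1, 2.0, 0.1, 0.1],   # preparation
    [0.1, 0.5, 0.6, 0.1],   # ambiguous frame...
    [0.1, 0.1, 2.0, 0.1],   # ...followed by a clear stroke
    [0.1, 0.1, 0.1, 2.0],   # retraction
])
print([LABELS[i] for i in viterbi(emissions, trans)])
# → ['neutral', 'preparation', 'stroke', 'stroke', 'retraction']
```

This illustrates why sequence labeling beats per-frame classification: the transition scores let a near-ambiguous frame be resolved by its neighbors, recovering the canonical preparation-stroke-retraction ordering.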

Related research

08/17/2020
Sequence-to-Sequence Predictive Model: From Prosody To Communicative Gestures
Communicative gestures and speech prosody are tightly linked. Our object...

10/20/2015
What's the point? Frame-wise Pointing Gesture Recognition with Latent-Dynamic Conditional Random Fields
We use Latent-Dynamic Conditional Random Fields to perform skeleton-base...

03/04/2021
It's A Match! Gesture Generation Using Expressive Parameter Matching
Automatic gesture generation from speech generally relies on implicit mo...

11/05/2002
Prosody Based Co-analysis for Continuous Recognition of Coverbal Gestures
Although speech and gesture recognition has been studied extensively, al...

05/02/2023
AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis
The generation of realistic and contextually relevant co-speech gestures...

07/01/2021
iMiGUE: An Identity-free Video Dataset for Micro-Gesture Understanding and Emotion Analysis
We introduce a new dataset for the emotional artificial intelligence res...

04/01/2017
Complexity-Aware Assignment of Latent Values in Discriminative Models for Accurate Gesture Recognition
Many of the state-of-the-art algorithms for gesture recognition are base...
