Prosody Based Co-analysis for Continuous Recognition of Coverbal Gestures

by Sanshzar Kettebekov, et al.

Although speech and gesture recognition have been studied extensively, all successful attempts to combine them in a unified framework were semantically motivated (e.g., keyword-gesture co-occurrence). Such formulations inherited the complexity of natural language processing. This paper presents a Bayesian formulation that exploits the joint articulation of gesture and speech to improve the accuracy of automatic recognition of continuous coverbal gestures. Prosodic features from the speech signal were co-analyzed with the visual signal to learn the prior probability that prominent spoken segments co-occur with particular kinematic phases of gestures. This co-analysis was found to help detect and disambiguate visually small gestures, which in turn improves the continuous gesture recognition rate. The efficacy of the proposed approach was demonstrated on a large database collected from weather channel broadcasts. This formulation opens new avenues for bottom-up frameworks of multimodal integration.
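The abstract does not spell out the model's equations, so the following is only a minimal sketch of the general idea under simplifying assumptions. It works per frame rather than over continuous sequences. Each frame carries a hypothetical gesture-phase label (preparation, stroke, retraction) and a binary "prosodically prominent" flag. The visual observation is assumed conditionally independent of prosody given the phase, so P(phase | visual, prosody) ∝ P(visual | phase) · P(phase | prosody). The phase names, function names, and toy counts are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

# Hypothetical kinematic phase labels (an assumption for illustration only).
PHASES = ("preparation", "stroke", "retraction")


def learn_cooccurrence(frames):
    """Estimate P(phase | prominence) from annotated (phase, is_prominent) pairs.

    Uses add-one (Laplace) smoothing so that unseen phases keep nonzero mass.
    """
    counts = {True: Counter(), False: Counter()}
    for phase, prominent in frames:
        counts[bool(prominent)][phase] += 1
    priors = {}
    for prominent, c in counts.items():
        total = sum(c.values()) + len(PHASES)
        priors[prominent] = {p: (c[p] + 1) / total for p in PHASES}
    return priors


def posterior(visual_likelihood, prominent, priors):
    """Combine P(visual | phase) with the prosody-conditioned prior P(phase | prominence)."""
    unnorm = {p: visual_likelihood[p] * priors[bool(prominent)][p] for p in PHASES}
    z = sum(unnorm.values())
    return {p: v / z for p, v in unnorm.items()}


if __name__ == "__main__":
    # Toy training data (invented): strokes mostly co-occur with prominent speech.
    train = ([("stroke", True)] * 8 + [("preparation", True)] + [("retraction", True)]
             + [("stroke", False)] * 2 + [("preparation", False)] * 4
             + [("retraction", False)] * 4)
    priors = learn_cooccurrence(train)

    # A visually small, ambiguous movement: vision alone slightly favors "retraction".
    visual = {"preparation": 0.15, "stroke": 0.40, "retraction": 0.45}
    for prominent in (True, False):
        post = posterior(visual, prominent, priors)
        print(prominent, max(post, key=post.get), round(post["stroke"], 3))
```

In this toy run, the same ambiguous visual evidence resolves to "stroke" when the frame coincides with a prominent spoken segment (posterior 0.75) and to "retraction" otherwise. This mirrors, in simplified form, how a prosodic prior can disambiguate visually small gestures.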




Toward Natural Gesture/Speech Control of a Large Display

In recent years because of the advances in computer vision research, fre...

Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech

We propose a new framework for gesture generation, aiming to allow data-...

It's A Match! Gesture Generation Using Expressive Parameter Matching

Automatic gesture generation from speech generally relies on implicit mo...

GestureMap: Supporting Visual Analytics and Quantitative Analysis of Motion Elicitation Data by Learning 2D Embeddings

This paper presents GestureMap, a visual analytics tool for gesture elic...

Continuous and Simultaneous Gesture and Posture Recognition for Commanding a Robotic Wheelchair; Towards Spotting the Signal Patterns

Spotting signal patterns with varying lengths has been still an open pro...

What's the point? Frame-wise Pointing Gesture Recognition with Latent-Dynamic Conditional Random Fields

We use Latent-Dynamic Conditional Random Fields to perform skeleton-base...

Labeling the Phrase Set of the Conversation Agent, Rinna

Mapping spoken text to gestures is an important research area for robots...
