Speaker Extraction with Co-Speech Gestures Cue

03/31/2022
by Zexu Pan, et al.

Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker speech mixture. Previous studies have used a pre-recorded speech sample or a face image of the target speaker as the speaker cue. In human communication, co-speech gestures, which are naturally timed with speech, also contribute to speech perception. In this work, we explore the use of co-speech gesture sequences, e.g. hand and body movements, as the speaker cue for speaker extraction. Such sequences can be obtained easily from low-resolution video recordings and are thus more widely available than face recordings. We propose two networks that use the co-speech gestures cue to perform attentive listening on the target speaker: the first implicitly fuses the cue into the speaker extraction process, while the second performs speech separation first and then explicitly uses the cue to associate one of the separated speech streams with the target speaker. The experimental results show that the co-speech gestures cue is informative for associating the target speaker, and that the quality of the extracted speech improves significantly over the unprocessed mixture speech.
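The "separate first, then associate" idea behind the second network can be illustrated with a toy sketch. This is not the method of the paper: the abstract does not say how the association is computed, so the sketch assumes a simple stand-in, namely picking the separated stream whose per-frame loudness envelope correlates best with a per-frame gesture motion signal. The names `pearson`, `associate_target`, `separated_envelopes`, and `gesture_motion` are hypothetical and introduced only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two sequences, truncated to the shorter length."""
    n = min(len(x), len(y))
    if n == 0:
        return 0.0
    x, y = x[:n], y[:n]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no timing information
    return cov / math.sqrt(var_x * var_y)


def associate_target(separated_envelopes, gesture_motion):
    """Return (index, scores) for the stream that best matches the gesture cue.

    ASSUMPTION: correlation between loudness and motion stands in for the
    learned association described in the abstract; it is only a placeholder.
    """
    scores = [pearson(env, gesture_motion) for env in separated_envelopes]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Toy data: the second stream is loud when the gestures are active.
    gesture = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
    stream_a = [0.9, 0.1, 0.8, 0.2, 0.9, 0.1]
    stream_b = [0.1, 0.9, 0.2, 0.8, 0.1, 0.9]
    index, scores = associate_target([stream_a, stream_b], gesture)
    print(index, scores)
```

In a learned system the correlation score would presumably be replaced by a trained model operating on gesture and speech embeddings; the sketch only shows where such a score plugs into the selection step.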


