Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

06/15/2023
by Shivam Mehta et al.

With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech. However, human spontaneous face-to-face conversation has both spoken and non-verbal aspects (here, co-speech gestures). Only recently has research begun to explore the benefits of jointly synthesising these two modalities in a single system. The previous state of the art used non-probabilistic methods, which fail to capture the variability of human speech and motion, and risk producing oversmoothing artefacts and sub-optimal synthesis quality. We present the first diffusion-based probabilistic model, called Diff-TTSG, that jointly learns to synthesise speech and gestures together. Our method can be trained on small datasets from scratch. Furthermore, we describe a set of careful uni- and multi-modal subjective tests for evaluating integrated speech and gesture synthesis systems, and use them to validate our proposed approach. For synthesised examples please see https://shivammehta25.github.io/Diff-TTSG
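Although the abstract does not detail the model, the core idea of a denoising diffusion probabilistic model trained over jointly stacked speech and gesture features can be illustrated with a small sketch. Everything below (module names, feature dimensions, the linear noise schedule, and the choice of a single shared denoiser) is an assumption made for illustration only, not Diff-TTSG's actual architecture.

```python
# Minimal, illustrative sketch of a joint denoising-diffusion training step for
# speech (e.g. mel-spectrogram frames) and gesture (e.g. pose features).
# All names, shapes, and the shared denoiser are assumptions, not Diff-TTSG itself.
import torch
import torch.nn as nn

N_MELS, N_POSE, T_STEPS = 80, 45, 1000  # assumed feature sizes / diffusion steps

class JointDenoiser(nn.Module):
    """Predicts the noise added to concatenated speech + gesture features."""
    def __init__(self, d_model=256):
        super().__init__()
        d_in = N_MELS + N_POSE
        self.net = nn.Sequential(
            nn.Linear(d_in + 1, d_model), nn.SiLU(),   # +1 for the diffusion step
            nn.Linear(d_model, d_model), nn.SiLU(),
            nn.Linear(d_model, d_in),
        )

    def forward(self, x_noisy, t):
        # Broadcast the normalised diffusion step onto every frame.
        t_feat = t.float().view(-1, 1, 1).expand(-1, x_noisy.size(1), 1) / T_STEPS
        return self.net(torch.cat([x_noisy, t_feat], dim=-1))

# Standard DDPM-style linear noise schedule.
betas = torch.linspace(1e-4, 0.02, T_STEPS)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

def training_step(model, mel, pose):
    """One denoising step on the two modalities stacked along the feature axis."""
    x0 = torch.cat([mel, pose], dim=-1)                    # (batch, frames, N_MELS+N_POSE)
    t = torch.randint(0, T_STEPS, (x0.size(0),))
    noise = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise  # forward diffusion
    pred = model(x_t, t)
    return nn.functional.mse_loss(pred, noise)              # epsilon-prediction loss

# Usage with random stand-in data:
model = JointDenoiser()
loss = training_step(model, torch.randn(2, 100, N_MELS), torch.randn(2, 100, N_POSE))
loss.backward()
```

At synthesis time the learned denoiser would be applied in reverse, starting from Gaussian noise and iteratively refining both modalities together, which is what allows a single probabilistic model to capture the joint variability of speech and co-speech motion.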


Related research

01/24/2023
DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model
Speech-driven gesture synthesis is a field of growing interest in virtua...

08/25/2021
Integrated Speech and Gesture Synthesis
Text-to-speech and co-speech gesture synthesis have until now been treat...

06/11/2020
Let's face it: Probabilistic multi-modal interlocutor-aware generation of facial gestures in dyadic settings
To enable more natural face-to-face interactions, conversational agents ...

09/11/2023
Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation
This paper describes a system developed for the GENEA (Generation and Ev...

11/17/2022
Listen, denoise, action! Audio-driven motion synthesis with diffusion models
Diffusion models have experienced a surge of interest as highly expressi...

04/20/2022
Exploration strategies for articulatory synthesis of complex syllable onsets
High-quality articulatory speech synthesis has many potential applicatio...

06/20/2023
EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model
Although previous co-speech gesture generation methods are able to synth...
