Visual-Aware Text-to-Speech

by Mohan Zhou, et al.

Dynamically synthesizing speech that actively responds to a listening head is critical in face-to-face interaction. For example, a speaker can use the listener's facial expressions to adjust tone, stressed syllables, or pauses. In this work, we present a new visual-aware text-to-speech (VA-TTS) task: synthesizing speech conditioned on both textual input and the sequential visual feedback (e.g., nods, smiles) of the listener in face-to-face communication. Unlike traditional text-to-speech, VA-TTS highlights the impact of the visual modality. For this new task, we devise a baseline model that fuses phoneme linguistic information with listener visual signals for speech synthesis. Extensive experiments on the multimodal conversation dataset ViCo-X verify that our proposal generates more natural audio with scenario-appropriate rhythm and prosody.




AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person

Automatically generating videos in which synthesized speech is synchroni...

Textual Paralanguage and its Implications for Marketing Communications

Both face-to-face communication and communication in online environments...

DBATES: DataBase of Audio features, Text, and visual Expressions in competitive debate Speeches

In this work, we present a database of multimodal communication features...

Responsive Listening Head Generation: A Benchmark Dataset and Baseline

Responsive listening during face-to-face conversations is a critical ele...

Text/Speech-Driven Full-Body Animation

Due to the increasing demand in films and games, synthesizing 3D avatar ...

On the Linguistic and Computational Requirements for Creating Face-to-Face Multimodal Human-Machine Interaction

In this study, conversations between humans and avatars are linguistical...

Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos

The recent state of the art on monocular 3D face reconstruction from ima...
