Synthesizing Personalized Non-speech Vocalization from Discrete Speech Representations

06/25/2022
by Chin-Cheng Hsu, et al.

We formulated non-speech vocalization (NSV) modeling as a text-to-speech task and verified its viability. Specifically, we evaluated the phonetic expressivity of HuBERT speech units on NSVs and verified our model's ability to control speaker timbre, even though only a few training utterances were available per speaker. In addition, we substantiated that heterogeneity in recording conditions is the major obstacle to NSV modeling. Finally, we discussed five improvements to our method for future research. Audio samples of synthesized NSVs are available on our demo page: https://resemble-ai.github.io/reLaugh.
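Treating NSV synthesis as text-to-speech rests on quantizing HuBERT's frame-level features into discrete "speech units" that play the role of phoneme-like input tokens. A minimal sketch of that quantization step, assuming a learned codebook of cluster centroids (here replaced by random stand-ins, with random tensors in place of real HuBERT hidden states):

```python
# Sketch of discrete speech-unit extraction for unit-based TTS.
# The feature dimension (768), codebook size (100), and random tensors
# are illustrative assumptions, not the paper's exact configuration.
import torch

torch.manual_seed(0)

# Stand-in for HuBERT hidden states: one 768-dim vector per audio frame.
features = torch.randn(200, 768)

# Stand-in for k-means centroids learned over HuBERT features.
codebook = torch.randn(100, 768)

# Assign each frame to its nearest centroid: one discrete unit per frame.
units = torch.cdist(features, codebook).argmin(dim=1).tolist()

# Collapse consecutive duplicates into a compact unit sequence,
# analogous to the phoneme string a TTS front end would consume.
deduped = [units[0]] + [u for i, u in enumerate(units[1:], start=1) if u != units[i - 1]]
print(len(units), len(deduped))
```

In a real pipeline the features would come from a pretrained HuBERT encoder (e.g. on 16 kHz audio) and the codebook from k-means fitted on a large feature corpus; the nearest-centroid assignment and run-length collapsing shown here are the standard discretization steps.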

Related research

- 01/05/2023: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. "We introduce a language modeling approach for text to speech synthesis (..."
- 05/10/2020: From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint. "High-fidelity speech can be synthesized by end-to-end text-to-speech mod..."
- 11/24/2022: Prosody-controllable spontaneous TTS with neural HMMs. "Spontaneous speech has many affective and pragmatic functions that are i..."
- 11/07/2021: Speaker Generation. "This work explores the task of synthesizing speech in nonexistent human-..."
- 03/27/2019: CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages. "We describe our development of CSS10, a collection of single speaker spe..."
- 07/01/2017: Modeling and Analyzing the Vocal Tract under Normal and Stressful Talking Conditions. "In this research, we model and analyze the vocal tract under normal and ..."
- 10/22/2020: The NTU-AISG Text-to-speech System for Blizzard Challenge 2020. "We report our NTU-AISG Text-to-speech (TTS) entry systems for the Blizza..."
