Exploring Transfer Learning for Low Resource Emotional TTS

01/14/2019
by Noé Tits, et al.

During the last few years, spoken language technologies have improved greatly thanks to Deep Learning. However, Deep Learning-based algorithms require large amounts of data that are often difficult and costly to gather. In particular, modeling the variability in speech across different speakers, styles, or emotions with little data remains challenging. In this paper, we investigate how to leverage fine-tuning of a pre-trained Deep Learning-based TTS model to synthesize speech from a small dataset of another speaker. We then investigate the possibility of adapting this model for emotional TTS by fine-tuning the neutral TTS model on a small emotional dataset.
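
The abstract relies on transfer learning by fine-tuning: a model trained on a large dataset becomes the starting point for training on a small one, instead of training from random or zero weights. The paper's model is a neural TTS system; the sketch below is only a deliberately tiny, hypothetical stand-in that shows the mechanism. A 1-D linear regression trained by gradient descent plays the role of the network, a large synthetic "source" set plays the role of the neutral corpus, and five samples from a related but shifted task play the role of the small speaker- or emotion-specific set. All function names, datasets, and hyperparameters here are illustrative assumptions, not taken from the paper.

```python
import random


def make_data(w, b, n, noise, rng):
    """Sample n (x, y) pairs from y = w*x + b plus Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, w * x + b + rng.gauss(0.0, noise)))
    return data


def mse(params, data):
    """Mean squared error of the linear model (w, b) on data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params, data, lr, steps):
    """Full-batch gradient descent on MSE, starting from the given params."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


rng = random.Random(0)
# Hypothetical stand-ins: a large "source" task (think: neutral corpus) and a
# related "target" task with only 5 training samples (think: new speaker/emotion).
source = make_data(2.0, 1.0, 1000, 0.1, rng)
target_train = make_data(2.2, 1.3, 5, 0.1, rng)
target_test = make_data(2.2, 1.3, 200, 0.1, rng)

# 1) Pre-train from scratch on the large source dataset.
pretrained = train((0.0, 0.0), source, lr=0.1, steps=500)

# 2) Fine-tune: small budget on the target set, starting from pre-trained weights.
finetuned = train(pretrained, target_train, lr=0.05, steps=20)

# Baseline: identical budget on the target set, starting from scratch.
scratch = train((0.0, 0.0), target_train, lr=0.05, steps=20)

print("fine-tuned test MSE:", round(mse(finetuned, target_test), 4))
print("from-scratch test MSE:", round(mse(scratch, target_test), 4))
```

With the same small target budget, the fine-tuned model should land much closer to the target task than the from-scratch baseline, because it starts near a related solution. A real TTS adaptation would swap the linear model for the pre-trained network and would also have to decide which layers to update and at what learning rate; the abstract does not specify those choices.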
