EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion

07/04/2021
by Daxin Tan, et al.

This paper presents the design, implementation and evaluation of EditSpeech, a speech editing system that allows a user to delete, insert and replace words in a given speech utterance without causing audible degradation in speech quality or naturalness. EditSpeech is built upon a neural text-to-speech (NTTS) synthesis framework. Partial inference and bidirectional fusion are proposed to incorporate the contextual information around the edited region and to achieve smooth transitions at both the left and right boundaries. This alleviates the distortion introduced to the unmodified parts of the utterance. The system is developed and evaluated on English and Chinese in multi-speaker scenarios. Objective and subjective evaluations demonstrate that EditSpeech outperforms a few baseline systems, with lower spectral distortion and speech quality that listeners prefer. Audio samples are available online for demonstration: https://daxintan-cuhk.github.io/EditSpeech/
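
The abstract names bidirectional fusion but does not spell out how it works. The sketch below shows one plausible reading, and is an assumption rather than the authors' implementation: a left-to-right pass and a right-to-left pass each produce a sequence of spectral frames for the edited region, and the two are joined at the frame where they agree most closely. The function name `fuse_bidirectional`, the use of Euclidean distance, and the frame representation (plain lists of floats) are all hypothetical choices made for illustration.

```python
import math


def fuse_bidirectional(forward_frames, backward_frames):
    """Join two frame sequences at the index where they differ least.

    Assumption (not stated in the abstract): both sequences cover the same
    edited region, are time-aligned, and have the same number of frames.
    """
    if not forward_frames or len(forward_frames) != len(backward_frames):
        raise ValueError("sequences must be non-empty and of equal length")

    # Per-frame Euclidean distance between the two passes.
    distances = [math.dist(f, b) for f, b in zip(forward_frames, backward_frames)]

    # Hypothetical fusion rule: switch passes where the distance is smallest.
    fusion_index = min(range(len(distances)), key=distances.__getitem__)

    # Frames before the fusion point come from the forward pass (anchored on
    # the left context); the rest come from the backward pass (anchored on
    # the right context).
    fused = forward_frames[:fusion_index] + backward_frames[fusion_index:]
    return fused, fusion_index


if __name__ == "__main__":
    fwd = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    bwd = [[5.0, 5.0], [1.1, 1.0], [2.5, 2.0]]
    fused, idx = fuse_bidirectional(fwd, bwd)
    print(idx, fused)
```

Switching at the point of closest agreement keeps the join itself small, while each side of the join stays consistent with the boundary it was generated from. Whether EditSpeech uses this exact criterion should be checked against the full paper.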

research
09/21/2023

FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency

Text-based speech editing (TSE) techniques are designed to enable users ...
research
02/21/2022

CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing

The text-based speech editor allows the editing of speech through intuit...
research
10/08/2021

Environment Aware Text-to-Speech Synthesis

This study aims at designing an environment-aware text-to-speech (TTS) s...
research
04/12/2022

CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction

This study extends our previous work on text-based speech editing to dev...
research
08/31/2023

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) ...
research
05/23/2023

FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models

Stutter removal is an essential scenario in the field of speech editing....
research
10/07/2015

Hierarchical Representation of Prosody for Statistical Speech Synthesis

Prominences and boundaries are the essential constituents of prosodic st...
