State-of-the-art text-to-speech (TTS) systems have utilized pretrained
l...
We present a scalable method to produce high-quality emphasis for
text-t...
We present eCat, a novel end-to-end multispeaker model capable of: a)
ge...
The research community has long studied computer-assisted pronunciation
...
Generating expressive and contextually appropriate prosody remains a
cha...
Duration modelling has become an important research problem once more wi...
In this paper, we present CopyCat2 (CC2), a novel model capable of: a)
s...
This paper presents a novel data augmentation technique for text-to-spee...
We propose a novel Multi-Scale Spectrogram (MSS) modelling approach to
s...
Voice Conversion (VC) is a technique that aims to transform the
non-ling...
Many factors influence speech yielding different renditions of a given
s...
We propose a weakly-supervised model for word-level mispronunciation
det...
A common approach to the automatic detection of mispronunciation in lang...
Emotional voice conversion models adapt the emotion in speech without
ch...
This paper describes two novel complementary techniques that improve the...
In this paper, we introduce Kathaka, a model trained with a novel two-st...
Various parametric representations have been proposed to model the speec...
Linear prediction (LP) is a ubiquitous analysis method in speech proces...
This paper focuses on the analysis and synthesis of hypo- and hyperarticu...
Detecting the correct speech polarity is a necessary step prior to sever...
Maximum Voiced Frequency (MVF) is used in various speech models as the
s...
This paper investigates the temporal excitation patterns of creaky voice...
This paper addresses the problem of estimating the voice source directly...
An inversion of the speech polarity may have a dramatic detrimental effe...
In a previous work, we showed that the glottal source can be estimated f...
In the framework of assessing the pathology severity in chronic cough
di...
It was recently shown that complex cepstrum can be effectively used for
...
Prosody Transfer (PT) is a technique that aims to use the prosody from a...
This paper addresses the problem of automatic detection of voice patholo...
In most current approaches to speech processing, information is extracte...
This paper investigates the differences occurring in the excitation for
d...
Statistical parametric speech synthesizers have recently shown their abi...
This paper addresses the issue of cough detection using only audio
recor...
This paper addresses the problem of pitch modification, as an important
...
This paper proposes a method to improve the quality delivered by statist...
The development of a system for the automatic, objective and reliable
de...
Complex cepstrum is known in the literature for linearly separating caus...
The problem of pitch tracking has been extensively studied in the speech...
The great majority of current voice technology applications relies on
ac...
Homomorphic analysis is a well-known method for the separation of
non-li...
The modeling of speech production often relies on a source-filter approa...
Speech generated by parametric synthesizers generally suffers from a typ...
Source-tract decomposition (or glottal flow estimation) is one of the ba...
This paper proposes a new procedure to detect Glottal Closure and Openin...
The pseudo-periodicity of voiced speech can be exploited in several spee...
This paper focuses on the problem of pitch tracking in noisy conditions....
We present a novel system for singing synthesis, based on attention. Sta...
We present an approach to synthesize whisper by applying a handcrafted s...
Recent advances in Text-to-Speech (TTS) have improved quality and natura...
This paper proposes a novel approach for the detection and reconstructio...