Thomas Drugman

research

∙ 09/04/2023

A Comparative Analysis of Pretrained Language Models for Text-to-Speech

State-of-the-art text-to-speech (TTS) systems have utilized pretrained l...

0 Marcel Granero Moya, et al. ∙

research

∙ 07/13/2023

Controllable Emphasis with zero data for text-to-speech

We present a scalable method to produce high quality emphasis for text-t...

0 Arnaud Joly, et al. ∙

research

∙ 06/20/2023

eCat: An End-to-End Model for Multi-Speaker TTS Many-to-Many Fine-Grained Prosody Transfer

We present eCat, a novel end-to-end multispeaker model capable of: a) ge...

0 Ammar Abbas, et al. ∙

research

∙ 07/02/2022

Computer-assisted Pronunciation Training – Speech synthesis is almost all you need

The research community has long studied computer-assisted pronunciation ...

0 Daniel Korzekwa, et al. ∙

research

∙ 06/29/2022

Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody

Generating expressive and contextually appropriate prosody remains a cha...

0 Peter Makarov, et al. ∙

research

∙ 06/28/2022

Expressive, Variable, and Controllable Duration Modelling in TTS

Duration modelling has become an important research problem once more wi...

0 Ammar Abbas, et al. ∙

research

∙ 06/27/2022

CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer

In this paper, we present CopyCat2 (CC2), a novel model capable of: a) s...

0 Sri Karlapati, et al. ∙

research

∙ 02/13/2022

Distribution augmentation for low-resource expressive text-to-speech

This paper presents a novel data augmentation technique for text-to-spee...

0 Mateusz Łajszczak, et al. ∙

research

∙ 06/29/2021

Multi-Scale Spectrogram Modelling for Neural Text-to-Speech

We propose a novel Multi-Scale Spectrogram (MSS) modelling approach to s...

0 Ammar Abbas, et al. ∙

research

∙ 06/16/2021

Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments

Voice Conversion (VC) is a technique that aims to transform the non-ling...

0 Alejandro Mottini, et al. ∙

research

∙ 06/14/2021

A learned conditional prior for the VAE acoustic space of a TTS system

Many factors influence speech yielding different renditions of a given s...

0 Penny Karanasou, et al. ∙

research

∙ 06/07/2021

Weakly-supervised word-level pronunciation error detection in non-native English speech

We propose a weakly-supervised model for word-level mispronunciation det...

0 Daniel Korzekwa, et al. ∙

research

∙ 01/16/2021

Mispronunciation Detection in Non-native (L2) English with Uncertainty Modeling

A common approach to the automatic detection of mispronunciation in lang...

0 Daniel Korzekwa, et al. ∙

research

∙ 01/14/2021

EmoCat: Language-agnostic Emotional Voice Conversion

Emotional voice conversion models adapt the emotion in speech without ch...

0 Bastian Schnell, et al. ∙

research

∙ 12/29/2020

Detection of Lexical Stress Errors in Non-native (L2) English with Data Augmentation and Attention

This paper describes two novel complementary techniques that improve the...

0 Daniel Korzekwa, et al. ∙

research

∙ 11/04/2020

Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech

In this paper, we introduce Kathaka, a model trained with a novel two-st...

0 Sri Karlapati, et al. ∙

research

∙ 06/07/2020

Parametric Representation for Singing Voice Synthesis: a Comparative Evaluation

Various parametric representations have been proposed to model the speec...

0 Onur Babacan, et al. ∙

research

∙ 06/07/2020

Maximum Phase Modeling for Sparse Linear Prediction of Speech

Linear prediction (LP) is a ubiquitous analysis method in speech proces...

0 Thomas Drugman, et al. ∙

research

∙ 06/07/2020

Analysis and Synthesis of Hypo and Hyperarticulated Speech

This paper focuses on the analysis and synthesis of hypo and hyperarticu...

0 Benjamin Picart, et al. ∙

research

∙ 05/31/2020

Residual Excitation Skewness for Automatic Speech Polarity Detection

Detecting the correct speech polarity is a necessary step prior to sever...

0 Thomas Drugman, et al. ∙

research

∙ 05/31/2020

Maximum Voiced Frequency Estimation: Exploiting Amplitude and Phase Spectra

Maximum Voiced Frequency (MVF) is used in various speech models as the s...

0 Thomas Drugman, et al. ∙

research

∙ 05/31/2020

Data-driven Detection and Analysis of the Patterns of Creaky Voice

This paper investigates the temporal excitation patterns of creaky voice...

0 Thomas Drugman, et al. ∙

research

∙ 05/24/2020

Glottal source estimation robustness: A comparison of sensitivity of voice source estimation techniques

This paper addresses the problem of estimating the voice source directly...

0 Thomas Drugman, et al. ∙

research

∙ 05/16/2020

Oscillating Statistical Moments for Speech Polarity Detection

An inversion of the speech polarity may have a dramatic detrimental effe...

0 Thomas Drugman, et al. ∙

research

∙ 05/16/2020

Glottal Source Estimation using an Automatic Chirp Decomposition

In a previous work, we showed that the glottal source can be estimated f...

0 Thomas Drugman, et al. ∙

research

∙ 05/10/2020

Audio and Contact Microphones for Cough Detection

In the framework of assessing the pathology severity in chronic cough di...

0 Thomas Drugman, et al. ∙

research

∙ 05/10/2020

Chirp Complex Cepstrum-based Decomposition for Asynchronous Glottal Analysis

It was recently shown that complex cepstrum can be effectively used for ...

0 Thomas Drugman, et al. ∙

research

∙ 04/30/2020

CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech

Prosody Transfer (PT) is a technique that aims to use the prosody from a...

0 Sri Karlapati, et al. ∙

research

∙ 01/02/2020

On the Mutual Information between Source and Filter Contributions for Voice Pathology Detection

This paper addresses the problem of automatic detection of voice patholo...

0 Thomas Drugman, et al. ∙

research

∙ 01/02/2020

Phase-based Information for Voice Pathology Detection

In most current approaches of speech processing, information is extracte...

0 Thomas Drugman, et al. ∙

research

∙ 01/02/2020

Excitation-based Voice Quality Analysis and Modification

This paper investigates the differences occurring in the excitation for d...

0 Thomas Drugman, et al. ∙

research

∙ 01/02/2020

Eigenresiduals for improved Parametric Speech Synthesis

Statistical parametric speech synthesizers have recently shown their abi...

0 Thomas Drugman, et al. ∙

research

∙ 01/02/2020

Assessment of Audio Features for Automatic Cough Detection

This paper addresses the issue of cough detection using only audio recor...

0 Thomas Drugman, et al. ∙

research

∙ 01/02/2020

A Comparative Evaluation of Pitch Modification Techniques

This paper addresses the problem of pitch modification, as an important ...

0 Thomas Drugman, et al. ∙

research

∙ 12/30/2019

Using a Pitch-Synchronous Residual Codebook for Hybrid HMM/Frame Selection Speech Synthesis

This paper proposes a method to improve the quality delivered by statist...

0 Thomas Drugman, et al. ∙

research

∙ 12/30/2019

Objective Study of Sensor Relevance for Automatic Cough Detection

The development of a system for the automatic, objective and reliable de...

0 Thomas Drugman, et al. ∙

research

∙ 12/30/2019

Causal-Anticausal Decomposition of Speech using Complex Cepstrum for Glottal Source Estimation

Complex cepstrum is known in the literature for linearly separating caus...

0 Thomas Drugman, et al. ∙

research

∙ 12/29/2019

A Comparative Study of Pitch Extraction Algorithms on a Large Variety of Singing Sounds

The problem of pitch tracking has been extensively studied in the speech...

0 Onur Babacan, et al. ∙

research

∙ 12/29/2019

Glottal Source Processing: from Analysis to Applications

The great majority of current voice technology applications relies on ac...

0 Thomas Drugman, et al. ∙

research

∙ 12/29/2019

Complex Cepstrum-based Decomposition of Speech for Glottal Source Estimation

Homomorphic analysis is a well-known method for the separation of non-li...

0 Thomas Drugman, et al. ∙

research

∙ 12/29/2019

The Deterministic plus Stochastic Model of the Residual Signal and its Applications

The modeling of speech production often relies on a source-filter approa...

0 Thomas Drugman, et al. ∙

research

∙ 12/29/2019

A Deterministic plus Stochastic Model of the Residual Signal for Improved Parametric Speech Synthesis

Speech generated by parametric synthesizers generally suffers from a typ...

0 Thomas Drugman, et al. ∙

research

∙ 12/28/2019

A Comparative Study of Glottal Source Estimation Techniques

Source-tract decomposition (or glottal flow estimation) is one of the ba...

0 Thomas Drugman, et al. ∙

research

∙ 12/28/2019

Glottal Closure and Opening Instant Detection from Speech Signals

This paper proposes a new procedure to detect Glottal Closure and Openin...

0 Thomas Drugman, et al. ∙

research

∙ 12/28/2019

Detection of Glottal Closure Instants from Speech Signals: a Quantitative Review

The pseudo-periodicity of voiced speech can be exploited in several spee...

0 Thomas Drugman, et al. ∙

research

∙ 12/28/2019

Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics

This paper focuses on the problem of pitch tracking in noisy conditions....

0 Thomas Drugman, et al. ∙

research

∙ 12/12/2019

Singing Synthesis: with a little help from my attention

We present a novel system for singing synthesis, based on attention. Sta...

0 Orazio Angelini, et al. ∙

research

∙ 12/11/2019

Voice Conversion for Whispered Speech Synthesis

We present an approach to synthesize whisper by applying a handcrafted s...

0 Marius Cotescu, et al. ∙

research

∙ 12/02/2019

Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection

Recent advances in Text-to-Speech (TTS) have improved quality and natura...

0 Shubhi Tyagi, et al. ∙

research

∙ 07/10/2019

Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

This paper proposed a novel approach for the detection and reconstructio...

0 Daniel Korzekwa, et al. ∙
