Berrak Sisman

research

∙ 06/29/2023

High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units

The goal of Automatic Voice Over (AVO) is to generate speech in sync wit...

0 Junchen Lu, et al. ∙

research

∙ 06/01/2023

SlothSpeech: Denial-of-service Attack Against Speech Recognition Models

Deep Learning (DL) models have been popular nowadays to execute differen...

0 Mirazul Haque, et al. ∙

research

∙ 05/23/2023

Improving Speech Emotion Recognition Performance using Differentiable Architecture Search

Speech Emotion Recognition (SER) is a critical enabler of emotion-aware ...

0 Thejan Rajapakshe, et al. ∙

research

∙ 05/12/2023

Versatile Audio-Visual Learning for Handling Single and Multi Modalities in Emotion Regression and Classification Tasks

Most current audio-visual emotion recognition models lack the flexibilit...

0 Lucas Goncalves, et al. ∙

research

∙ 11/14/2022

SNIPER Training: Variable Sparsity Rate Training For Text-To-Speech

Text-to-speech (TTS) models have achieved remarkable naturalness in rece...

0 Perry Lam, et al. ∙

research

∙ 11/07/2022

Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder

Accent plays a significant role in speech communication, influencing und...

0 Jan Melechovsky, et al. ∙

research

∙ 10/25/2022

Mixed Emotion Modelling for Emotional Voice Conversion

Emotional voice conversion (EVC) aims to convert the emotional state of ...

0 Kun Zhou, et al. ∙

research

∙ 09/22/2022

EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models

Neural models are known to be over-parameterized, and recent work has sh...

0 Perry Lam, et al. ∙

research

∙ 09/22/2022

Controllable Accented Text-to-Speech Synthesis

Accented text-to-speech (TTS) synthesis seeks to generate speech with an...

0 Rui Liu, et al. ∙

research

∙ 08/11/2022

Speech Synthesis with Mixed Emotions

Emotional speech synthesis aims to synthesize human voices with various ...

0 Kun Zhou, et al. ∙

research

∙ 06/15/2022

Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning

Emotion classification of speech and assessment of the emotion strength ...

0 Rui Liu, et al. ∙

research

∙ 01/10/2022

Emotion Intensity and its Control for Emotional Voice Conversion

Emotional voice conversion (EVC) seeks to convert the emotional state of...

6 Kun Zhou, et al. ∙

research

∙ 10/20/2021

Identity Conversion for Emotional Speakers: A Study for Disentanglement of Emotion Style and Speaker Identity

Expressive voice conversion performs identity conversion for emotional s...

0 Zongyang Du, et al. ∙

research

∙ 10/13/2021

DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding

Conventional vocoders are commonly used as analysis tools to provide int...

0 Sergey Nikonorov, et al. ∙

research

∙ 10/07/2021

VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over

In this paper, we formulate a novel task to synthesize speech in sync wi...

0 Junchen Lu, et al. ∙

research

∙ 10/07/2021

StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis

Recently, emotional speech synthesis has achieved remarkable performance...

0 Rui Liu, et al. ∙

research

∙ 07/08/2021

Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer

Traditional voice conversion (VC) has been focused on speaker identity co...

0 Zongyang Du, et al. ∙

research

∙ 05/31/2021

Emotional Voice Conversion: Theory, Databases and ESD

In this paper, we first provide a review of the state-of-the-art emotion...

0 Kun Zhou, et al. ∙

research

∙ 04/03/2021

Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability

Emotional text-to-speech synthesis (ETTS) has seen much progress in rece...

0 Rui Liu, et al. ∙

research

∙ 03/31/2021

Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training

Emotional voice conversion (EVC) aims to change the emotional state of a...

0 Kun Zhou, et al. ∙

research

∙ 11/03/2020

VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech

Emotional voice conversion (EVC) aims to convert the emotion of speech f...

0 Kun Zhou, et al. ∙

research

∙ 10/28/2020

Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset

Emotional voice conversion aims to transform emotional prosody in speech...

0 Kun Zhou, et al. ∙

research

∙ 10/23/2020

GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis

Attention-based end-to-end text-to-speech synthesis (TTS) is superior to...

0 Rui Liu, et al. ∙

research

∙ 08/11/2020

Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS

Tacotron-based end-to-end speech synthesis has shown remarkable voice qu...

0 Rui Liu, et al. ∙

research

∙ 08/11/2020

Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN

Cross-lingual voice conversion aims to change source speaker's voice to ...

0 Zongyang Du, et al. ∙

research

∙ 08/10/2020

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data

Singing voice conversion aims to convert singer's voice from source to t...

0 Junchen Lu, et al. ∙

research

∙ 08/09/2020

An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning

Speaker identity is one of the important characteristics of human speech...

0 Berrak Sisman, et al. ∙

research

∙ 08/04/2020

Expressive TTS Training with Frame and Style Reconstruction Loss

We propose a novel training strategy for Tacotron-based text-to-speech (...

0 Rui Liu, et al. ∙

research

∙ 05/13/2020

Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion

Emotional voice conversion aims to convert the emotion of the speech fro...

29 Kun Zhou, et al. ∙

research

∙ 02/02/2020

WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss

Tacotron-based text-to-speech (TTS) systems directly synthesize speech f...

2 Rui Liu, et al. ∙

research

∙ 02/01/2020

Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data

Emotional voice conversion is to convert the spectrum and prosody to cha...

0 Kun Zhou, et al. ∙

research

∙ 11/07/2019

Teacher-Student Training for Robust Tacotron-based TTS

While neural end-to-end text-to-speech (TTS) is superior to conventional...

0 Rui Liu, et al. ∙

research

∙ 05/27/2019

VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019

We describe our submitted system for the ZeroSpeech Challenge 2019. The ...

0 Andros Tjandra, et al. ∙

