Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion

06/11/2023
by   Yung-Lun Chien, et al.

The electrolarynx is a commonly used assistive device that helps patients whose vocal cords have been removed regain the ability to speak. Although the electrolarynx can generate excitation signals as the vocal cords do, electrolaryngeal (EL) speech differs markedly from natural (NL) speech in naturalness and intelligibility. Many deep-learning-based models have been applied to electrolaryngeal speech voice conversion (ELVC), which converts EL speech to NL speech. In this study, we propose a multimodal voice conversion (VC) model that integrates acoustic and visual information into a unified network. We compared different pre-trained models as visual feature extractors and evaluated the effectiveness of their features in the ELVC task. The experimental results demonstrate that the proposed multimodal VC model outperforms single-modal models in both objective and subjective metrics, suggesting that integrating visual information can significantly improve the quality of ELVC.
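
To make the idea of combining acoustic and visual information concrete, the following is a minimal sketch of one common fusion strategy: resampling per-video-frame visual embeddings to the acoustic frame rate and concatenating them with the acoustic features. It is not the architecture proposed in the paper, which the abstract does not specify. The function names, the feature dimensions (80 and 512), the frame counts, and the nearest-neighbour alignment are all assumptions chosen for illustration.

```python
import numpy as np


def align_visual_to_acoustic(visual, n_acoustic_frames):
    """Nearest-neighbour resample visual features (T_v, D_v) to T_a frames.

    Hypothetical helper: video usually has fewer frames per second than
    acoustic features, so each video frame is repeated as needed.
    """
    t_v = visual.shape[0]
    idx = np.floor(np.arange(n_acoustic_frames) * t_v / n_acoustic_frames)
    return visual[idx.astype(int)]


def fuse_features(acoustic, visual):
    """Concatenate acoustic (T_a, D_a) and aligned visual features per frame."""
    aligned = align_visual_to_acoustic(visual, acoustic.shape[0])
    return np.concatenate([acoustic, aligned], axis=1)


# Random stand-ins for real features (assumed shapes, not from the paper).
rng = np.random.default_rng(0)
acoustic = rng.standard_normal((100, 80))  # 100 acoustic frames, 80-dim
visual = rng.standard_normal((25, 512))    # 25 video frames, 512-dim embeddings

fused = fuse_features(acoustic, visual)
print(fused.shape)  # (100, 592)
```

In a setup like this, the fused matrix would be the input to a conversion network, and swapping the visual feature extractor only changes the visual embedding dimension.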

research
10/29/2020

The IQIYI System for Voice Conversion Challenge 2020

This paper presents the IQIYI voice conversion system (T24) for Voice Co...
research
06/11/2023

Mandarin Electrolaryngeal Speech Voice Conversion using Cross-domain Features

Patients who have had their entire larynx removed, including the vocal f...
research
03/18/2022

Improve few-shot voice cloning using multi-modal learning

Recently, few-shot voice cloning has achieved a significant improvement....
research
04/23/2021

Deep Learning Based Assessment of Synthetic Speech Naturalness

In this paper, we present a new objective prediction model for synthetic...
research
09/08/2021

Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion

Voice conversion (VC) is an effective approach to electrolaryngeal (EL) ...
research
06/28/2023

Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion

Deep speech classification has achieved tremendous success and greatly p...
research
02/07/2021

EMA2S: An End-to-End Multimodal Articulatory-to-Speech System

Synthesized speech from articulatory movements can have real-world use f...
