EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning

06/16/2022
by   Li-Chin Chen, et al.

Speech generation and enhancement based on articulatory movements can facilitate communication when verbal communication is not possible, e.g., for patients who have lost the ability to speak. Although various techniques have been proposed to this end, electropalatography (EPG), a monitoring technique that records contact between the tongue and the hard palate during speech, has not been adequately explored. Herein, we propose a novel multimodal EPG-to-speech (EPG2S) system that uses EPG and speech signals for speech generation and enhancement. We examine different fusion strategies based on multiple combinations of EPG and noisy speech signals and investigate the viability of the proposed method. Experimental results indicate that EPG2S achieves desirable speech generation outcomes based solely on EPG signals, and that adding noisy speech signals improves quality and intelligibility. Likewise, EPG2S achieves high-quality speech enhancement based solely on audio signals, and adding EPG signals further improves performance. The late fusion strategy is found to be the most effective approach for simultaneous speech generation and enhancement.
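The abstract contrasts fusion strategies without describing them. As a rough illustration only, the sketch below shows the general difference between early fusion (concatenating raw per-frame features before a shared encoder) and late fusion (encoding each modality separately and concatenating the embeddings). It is not the EPG2S architecture: the feature dimensions, the single-layer encoders, the random weights, and all function names are assumptions made for this example.

```python
import random

def linear(x, weights):
    """Apply a bias-free linear map: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def relu(x):
    """Element-wise rectifier."""
    return [max(0.0, v) for v in x]

def make_weights(n_out, n_in, rng):
    """Random placeholder weights; a real system would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def early_fusion(epg_frame, audio_frame, w_joint):
    """Concatenate the raw per-frame features, then encode them jointly."""
    return relu(linear(epg_frame + audio_frame, w_joint))

def late_fusion(epg_frame, audio_frame, w_epg, w_audio):
    """Encode each modality separately, then concatenate the embeddings."""
    return relu(linear(epg_frame, w_epg)) + relu(linear(audio_frame, w_audio))

rng = random.Random(0)

# Assumed sizes, chosen only for illustration (not taken from the paper).
EPG_DIM, AUDIO_DIM, HIDDEN = 62, 257, 16

# One synthetic frame per modality: binary contact pattern and noisy spectrum.
epg_frame = [float(rng.random() > 0.5) for _ in range(EPG_DIM)]
audio_frame = [rng.random() for _ in range(AUDIO_DIM)]

early = early_fusion(
    epg_frame, audio_frame, make_weights(HIDDEN, EPG_DIM + AUDIO_DIM, rng)
)
late = late_fusion(
    epg_frame,
    audio_frame,
    make_weights(HIDDEN, EPG_DIM, rng),
    make_weights(HIDDEN, AUDIO_DIM, rng),
)

print(len(early), len(late))
```

In this sketch the early-fusion embedding has `HIDDEN` values and the late-fusion embedding has `2 * HIDDEN`, since each modality keeps its own encoder; in a full system a decoder would map either embedding to a speech representation.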

research
08/30/2020

Improved Lite Audio-Visual Speech Enhancement

Numerous studies have investigated the effectiveness of audio-visual mul...
research
09/04/2020

SEANet: A Multi-modal Speech Enhancement Network

We explore the possibility of leveraging accelerometer data to perform s...
research
10/07/2022

Model-based estimation of in-car-communication feedback applied to speech zone detection

Modern cars provide versatile tools to enhance speech communication. Whi...
research
09/07/2022

Multimodal Speech Enhancement Using Burst Propagation

This paper proposes the MBURST, a novel multimodal solution for audio-vi...
research
08/10/2018

A simplified convolutional sparse filter for impulsive signature enhancement and its application to the prognostic of rotating machinery

Impulsive signature enhancement (ISE) is an important topic in the monit...
research
11/22/2019

Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

Integrating modalities, such as video signals with speech, has been show...
research
02/07/2021

EMA2S: An End-to-End Multimodal Articulatory-to-Speech System

Synthesized speech from articulatory movements can have real-world use f...
