Improved Lite Audio-Visual Speech Enhancement

08/30/2020
by   Shang-Yi Chuang, et al.

Numerous studies have investigated audio-visual multimodal learning for speech enhancement (AVSE), in which visual data serve as an auxiliary, complementary input for reducing the noise in noisy speech signals. Recently, we proposed a lite audio-visual speech enhancement (LAVSE) algorithm. Compared to conventional AVSE systems, LAVSE requires less online computation and partially alleviates user privacy concerns about facial data. In this study, we extend LAVSE to address three practical issues often encountered when implementing AVSE systems: the requirement for additional visual data, audio-visual asynchronization, and low-quality visual data. The proposed system, termed improved LAVSE (iLAVSE), uses a convolutional recurrent neural network architecture as the core AVSE model. We evaluate iLAVSE on the Taiwan Mandarin speech with video dataset. Experimental results confirm that, compared to conventional AVSE systems, iLAVSE effectively overcomes these three issues and improves enhancement performance. The results also confirm that iLAVSE is suitable for real-world scenarios, where high-quality audio-visual sensors may not always be available.
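To make two of the practical issues named above concrete, the following is a minimal sketch of how audio-visual asynchronization and low-quality visual data might be simulated on a sequence of per-frame visual features. It is an illustration only, not the iLAVSE method: the function names `shift_visual_frames` and `drop_visual_frames`, the edge-padding behavior, and the choice to zero out degraded frames are all assumptions introduced here.

```python
from typing import Iterable, List, Sequence

Frame = List[float]  # one visual feature vector per video frame (hypothetical layout)


def shift_visual_frames(frames: Sequence[Frame], offset: int) -> List[Frame]:
    """Simulate audio-visual asynchronization (illustrative assumption).

    A positive offset delays the visual stream by `offset` frames and a
    negative offset advances it. The edge frame is repeated as padding so
    the visual stream keeps the same length as the audio stream.
    """
    n = len(frames)
    if n == 0:
        return []
    shifted = []
    for t in range(n):
        src = min(max(t - offset, 0), n - 1)  # clamp the source index to [0, n-1]
        shifted.append(list(frames[src]))
    return shifted


def drop_visual_frames(frames: Sequence[Frame], bad_indices: Iterable[int]) -> List[Frame]:
    """Simulate low-quality visual data by zeroing selected frames.

    Zeroing is an assumption made for this sketch; other degradations
    (blur, occlusion, reduced resolution) could be substituted.
    """
    bad = set(bad_indices)
    return [[0.0] * len(f) if i in bad else list(f) for i, f in enumerate(frames)]


# Toy visual stream: four frames with two features each.
visual = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
delayed = shift_visual_frames(visual, 1)    # visual lags the audio by one frame
degraded = drop_visual_frames(visual, [2])  # third frame treated as unusable
```

Feeding such perturbed visual streams alongside clean audio features is one way to probe how robust an AVSE model is to imperfect visual input.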


research
05/24/2020

Lite Audio-Visual Speech Enhancement

Previous studies have confirmed the effectiveness of incorporating visua...
research
06/16/2022

EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning

Speech generation and enhancement based on articulatory movements facili...
research
11/15/2018

Effects of Lombard Reflex on the Performance of Deep-Learning-Based Audio-Visual Speech Enhancement Systems

Humans tend to change their way of speaking when they are immersed in a ...
research
09/04/2020

SEANet: A Multi-modal Speech Enhancement Network

We explore the possibility of leveraging accelerometer data to perform s...
research
06/27/2022

ClearBuds: Wireless Binaural Earbuds for Learning-Based Speech Enhancement

We present ClearBuds, the first hardware and software system that utiliz...
research
12/21/2022

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement

Prior works on improving speech quality with visual input typically stud...
research
09/13/2018

Real-Time Lightweight Chaotic Encryption for 5G IoT Enabled Lip-Reading Driven Secure Hearing-Aid

Existing audio-only hearing-aids are known to perform poorly in noisy si...
