Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model

05/31/2023
by   Héctor Martel, et al.

We propose the Audio-Visual Lightweight ITerative model (AVLIT), an effective and lightweight neural network that uses Progressive Learning (PL) to perform audio-visual speech separation in noisy environments. To this end, we adopt the Asynchronous Fully Recurrent Convolutional Neural Network (A-FRCNN), which has shown successful results in audio-only speech separation. Our architecture consists of an audio branch and a video branch, with iterative A-FRCNN blocks sharing weights for each modality. We evaluate our model in a controlled environment using the NTCD-TIMIT dataset and in the wild using a synthetic dataset that combines LRS3 and WHAM!. The experiments show that our model outperforms various audio-only and audio-visual baselines in both settings. Furthermore, the reduced footprint of our model makes it suitable for low-resource applications.
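
The weight sharing mentioned in the abstract can be illustrated with a small sketch. This is not the AVLIT implementation: `SharedBlock` is a hypothetical stand-in for an A-FRCNN block (a single linear map followed by tanh), and re-injecting the input features at every iteration is an assumption made for illustration. The point it shows is that applying the same block several times adds computation but no parameters.

```python
import math
import random


class SharedBlock:
    """Hypothetical stand-in for one A-FRCNN block (NOT the real architecture)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # One weight matrix, created once and reused at every iteration.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)
        ]

    def num_parameters(self):
        return sum(len(row) for row in self.weights)

    def __call__(self, x):
        # Linear map followed by tanh, so outputs stay in [-1, 1].
        return [
            math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.weights
        ]


def iterate_shared(block, features, num_iterations):
    """Apply the same block repeatedly (weights are shared across iterations)."""
    state = list(features)
    for _ in range(num_iterations):
        # Assumption: the input features are added back before each pass.
        state = block([s + f for s, f in zip(state, features)])
    return state


# One shared block per modality, as in the abstract's description.
audio_block = SharedBlock(dim=4, seed=0)
video_block = SharedBlock(dim=4, seed=1)

audio_out = iterate_shared(audio_block, [0.1, 0.2, 0.3, 0.4], num_iterations=3)
video_out = iterate_shared(video_block, [0.5, 0.1, 0.0, 0.2], num_iterations=2)
```

Because each modality reuses a single block, the parameter count reported by `num_parameters()` stays the same whether the block runs for one iteration or many, which is the kind of property that keeps an iterative model's footprint small.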

research
04/07/2019

Time Domain Audio Visual Speech Separation

Audio-visual multi-modal modeling has been demonstrated to be effective ...
research
03/07/2023

A Multi-Stage Triple-Path Method for Speech Separation in Noisy and Reverberant Environments

In noisy and reverberant environments, the performance of deep learning-...
research
07/21/2020

SLNSpeech: solving extended speech separation problem by the help of sign language

A speech separation task can be roughly divided into audio-only separati...
research
04/05/2022

Audio-visual multi-channel speech separation, dereverberation and recognition

Despite the rapid advance of automatic speech recognition (ASR) technolo...
research
12/04/2021

Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Recent advances in the design of neural network architectures, in partic...
research
12/21/2022

An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits

Audio-visual approaches involving visual inputs have laid the foundation...
research
03/29/2022

Disentangling speech from surroundings in a neural audio codec

We present a method to separate speech signals from noisy environments i...
