An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits

12/21/2022
by   Kai Li, et al.

Audio-visual approaches, which exploit visual inputs alongside audio, have laid the foundation for recent progress in speech separation. However, how best to use auditory and visual inputs concurrently remains an active research area. Inspired by the cortico-thalamo-cortical circuit, in which the sensory processing mechanisms of different modalities modulate one another via the non-lemniscal sensory thalamus, we propose a novel cortico-thalamo-cortical neural network (CTCNet) for audio-visual speech separation (AVSS). First, CTCNet learns hierarchical auditory and visual representations in a bottom-up manner in separate auditory and visual subnetworks, mimicking the functions of the auditory and visual cortical areas. Then, inspired by the large number of connections between cortical regions and the thalamus, the model fuses the auditory and visual information in a thalamic subnetwork through top-down connections. Finally, the model transmits this fused information back to the auditory and visual subnetworks, and the above process is repeated several times. Experiments on three speech separation benchmark datasets show that CTCNet remarkably outperforms existing AVSS methods with considerably fewer parameters. These results suggest that mimicking the anatomical connectome of the mammalian brain has great potential for advancing the development of deep neural networks. The project repository is available at https://github.com/JusperLee/CTCNet.
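The fusion loop described in the abstract (bottom-up encoding in each modality, fusion in a thalamic subnetwork, top-down feedback, repeated for several cycles) can be sketched schematically. This is a minimal illustration of the control flow only; the function names, the averaging-based fusion, and the toy list-based "features" are assumptions for illustration, not the paper's actual modules.

```python
# Schematic sketch of a CTCNet-style cortico-thalamo-cortical loop.
# All operations below are toy stand-ins for the real learned subnetworks.

def auditory_block(a):
    # Stand-in for the auditory subnetwork ("auditory cortex"), bottom-up pass.
    return [x * 0.9 + 0.1 for x in a]

def visual_block(v):
    # Stand-in for the visual subnetwork ("visual cortex"), bottom-up pass.
    return [x * 0.8 + 0.2 for x in v]

def thalamic_fusion(a, v):
    # Stand-in for the thalamic subnetwork: merge the two modality streams.
    return [(ai + vi) / 2 for ai, vi in zip(a, v)]

def ctc_loop(audio_feats, visual_feats, cycles=3):
    """Run the bottom-up / fuse / top-down cycle several times, as in the
    abstract, and return the final fused representation."""
    a, v = audio_feats, visual_feats
    fused = []
    for _ in range(cycles):
        a = auditory_block(a)               # bottom-up auditory processing
        v = visual_block(v)                 # bottom-up visual processing
        fused = thalamic_fusion(a, v)       # cross-modal fusion in the "thalamus"
        # Top-down feedback: the fused signal modulates both streams
        # before the next cycle.
        a = [ai + fi for ai, fi in zip(a, fused)]
        v = [vi + fi for vi, fi in zip(v, fused)]
    return fused

fused = ctc_loop([0.5] * 4, [0.25] * 4)
print(len(fused))  # fused representation keeps the input feature length
```

In the actual model each block would be a learned neural subnetwork operating on time-aligned audio and lip-video feature maps, but the repeated fuse-and-feed-back structure is the part this sketch illustrates.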


