M3-AUDIODEC: Multi-channel multi-speaker multi-spatial audio codec

09/14/2023
by   Anton Ratnarajah, et al.
0

We introduce M3-AUDIODEC, an innovative neural spatial audio codec designed for efficient compression of multi-channel (binaural) speech in both single and multi-speaker scenarios, while retaining the spatial location information of each speaker. This model boasts versatility, allowing configuration and training tailored to a predetermined set of multi-channel, multi-speaker, and multi-spatial overlapping speech conditions. Key contributions are as follows: 1) Previous neural codecs are extended from single to multi-channel audios. 2) The ability of our proposed model to compress and decode for overlapping speech. 3) A groundbreaking architecture that compresses speech content and spatial cues separately, ensuring the preservation of each speaker's spatial context after decoding. 4) M3-AUDIODEC's proficiency in reducing the bandwidth for compressing two-channel speech by 48 channel compression. Impressively, at a 12.6 kbps operation, it outperforms Opus at 24 kbps and AUDIODEC at 24 kbps by 37 assessment, we employed speech enhancement and room acoustic metrics to ascertain the accuracy of clean speech and spatial cue estimates from M3-AUDIODEC. Audio demonstrations and source code are available online https://github.com/anton-jeran/MULTI-AUDIODEC .

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/17/2023

BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions

Time-domain single-channel speech enhancement (SE) still remains challen...
research
09/14/2023

SpatialCodec: Neural Spatial Speech Coding

In this work, we address the challenge of encoding speech captured by a ...
research
03/31/2022

Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis

Since facial actions such as lip movements contain significant informati...
research
03/28/2022

Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions

In our previous work, we proposed a language-independent speaker anonymi...
research
09/09/2021

BeamTransformer: Microphone Array-based Overlapping Speech Detection

We propose BeamTransformer, an efficient architecture to leverage beamfo...
research
09/24/2022

Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting

This paper describes a spatial-aware speaker diarization system for the ...
research
11/11/2021

MultiSV: Dataset for Far-Field Multi-Channel Speaker Verification

Motivated by unconsolidated data situation and the lack of a standard be...

Please sign up or login with your details

Forgot password? Click here to reset