SpatialCodec: Neural Spatial Speech Coding

09/14/2023
by Zhongweiyang Xu, et al.

In this work, we address the challenge of encoding speech captured by a microphone array using deep learning techniques, with the aim of preserving and accurately reconstructing the crucial spatial cues embedded in multi-channel recordings. We propose a neural spatial audio coding framework that achieves a high compression ratio by combining a single-channel neural sub-band codec with SpatialCodec. Our approach has two parts: (i) a neural sub-band codec that encodes the reference channel at low bit rates, and (ii) a SpatialCodec that captures relative spatial information for accurate multi-channel reconstruction at the decoder end. We also propose novel evaluation metrics to assess spatial cue preservation: (i) spatial similarity, which computes cosine similarity on a spatially intuitive beamspace, and (ii) beamformed audio quality. Our system shows superior spatial performance compared with high-bitrate baselines and a black-box neural architecture. Demos are available at https://xzwy.github.io/SpatialCodecDemo. Code and models are available at https://github.com/XZWY/SpatialCodec.
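To make the spatial similarity idea concrete, the following is a minimal sketch of a cosine similarity computed in a beamspace. It is not the paper's exact metric: the uniform linear array, the far-field delay-and-sum steering vectors, the single frequency bin, and the function names `steering_vectors`, `beamspace_power`, and `spatial_similarity` are all assumptions made for illustration.

```python
import numpy as np

def steering_vectors(mic_positions, angles, freq, c=343.0):
    """Far-field steering vectors for a linear array (assumed geometry).

    Returns an array of shape (n_angles, n_mics).
    """
    # Relative delay at each microphone for a plane wave from each angle.
    delays = np.outer(np.cos(angles), mic_positions) / c
    return np.exp(-2j * np.pi * freq * delays)

def beamspace_power(x, steer):
    """Project one multi-channel STFT bin x (n_mics,) onto every beam.

    Returns the beam power per look direction, shape (n_angles,).
    """
    return np.abs(steer.conj() @ x) ** 2

def spatial_similarity(x_ref, x_est, steer, eps=1e-12):
    """Cosine similarity between two beamspace power patterns."""
    p_ref = beamspace_power(x_ref, steer)
    p_est = beamspace_power(x_est, steer)
    denom = np.linalg.norm(p_ref) * np.linalg.norm(p_est) + eps
    return float(p_ref @ p_est / denom)

# Hypothetical setup: 4 microphones spaced 5 cm apart, 37 look directions.
mics = np.arange(4) * 0.05
angles = np.linspace(0.0, np.pi, 37)
freq = 1000.0
steer = steering_vectors(mics, angles, freq)

# A source at 60 degrees, and a rescaled, phase-rotated copy of it.
x_ref = steering_vectors(mics, np.array([np.pi / 3]), freq)[0]
x_scaled = 0.5 * np.exp(0.3j) * x_ref
# A source at a different direction (120 degrees).
x_other = steering_vectors(mics, np.array([2 * np.pi / 3]), freq)[0]

sim_same = spatial_similarity(x_ref, x_scaled, steer)
sim_other = spatial_similarity(x_ref, x_other, steer)
```

In this sketch the score ignores a common gain or phase applied to all channels, so `sim_same` stays close to 1, while a change in source direction alters the beam pattern and lowers the score. A practical metric would aggregate such scores over time frames and frequency bins.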


