Disentangling speech from surroundings in a neural audio codec

03/29/2022
by Ahmed Omran, et al.

We present a method to separate speech signals from noisy environments in the compressed domain of a neural audio codec. We introduce a new training procedure that allows our model to produce structured encodings of audio waveforms given by embedding vectors, where one part of the embedding vector represents the speech signal, and the rest represents the environment. We achieve this by partitioning the embeddings of different input waveforms and training the model to faithfully reconstruct audio from mixed partitions, thereby ensuring each partition encodes a separate audio attribute. As use cases, we demonstrate the separation of speech from background noise or from reverberation characteristics. Our method also allows for targeted adjustments of the audio output characteristics.
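
The partition mixing described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each embedding is a sequence of per-frame vectors, that the first `split` dimensions hold the speech part and the remainder the environment part, and that both inputs have the same number of frames. The name `swap_partitions` and the toy values are hypothetical, and the encoder, decoder, and reconstruction loss are omitted.

```python
from typing import List, Tuple

# Assumed layout: a list of frames, each frame a list of embedding dimensions.
Embedding = List[List[float]]


def swap_partitions(
    z_a: Embedding, z_b: Embedding, split: int
) -> Tuple[Embedding, Embedding]:
    """Exchange the trailing partition of two embeddings, frame by frame.

    Dimensions [0, split) are kept from the original embedding and
    dimensions [split, end) are taken from the other one.
    """
    if len(z_a) != len(z_b):
        raise ValueError("both embeddings must have the same number of frames")
    mixed_a = [fa[:split] + fb[split:] for fa, fb in zip(z_a, z_b)]
    mixed_b = [fb[:split] + fa[split:] for fa, fb in zip(z_a, z_b)]
    return mixed_a, mixed_b


# Toy embeddings: 2 frames, 4 dimensions each (hypothetical values).
z_a = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
z_b = [[-1.0, -2.0, -3.0, -4.0], [-5.0, -6.0, -7.0, -8.0]]

mixed_a, mixed_b = swap_partitions(z_a, z_b, split=2)
```

In a training setup along these lines, a decoder would be asked to reconstruct audio from `mixed_a` and `mixed_b`, so that a faithful reconstruction is only possible if each partition carries its intended attribute.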


