SEANet: A Multi-modal Speech Enhancement Network

09/04/2020
by   Marco Tagliasacchi, et al.

We explore the possibility of leveraging accelerometer data to perform speech enhancement in very noisy conditions. Although the user's speech can be only partially reconstructed from the accelerometer alone, the accelerometer provides a strong conditioning signal that is not influenced by noise sources in the environment. Based on this observation, we feed a multi-modal input to SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model that combines feature losses and adversarial losses to reconstruct an enhanced version of the user's speech. We trained our model on data collected by sensors mounted on an earbud and synthetically corrupted by adding different kinds of noise sources to the audio signal. Our experimental results demonstrate that it is possible to achieve very high quality results, even in the case of interfering speech at the same level of loudness. A sample of the output produced by our model is available at https://google-research.github.io/seanet/multimodal/speech.
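
As an illustration of the data preparation described in the abstract, the following is a minimal sketch of how a clean recording might be corrupted with additive noise at a chosen signal-to-noise ratio and then paired with an accelerometer signal to form a multi-modal input. It is not the authors' pipeline. The function names `mix_at_snr` and `make_multimodal_input`, the 16 kHz sampling rate, the single-axis accelerometer already resampled to the audio rate, and the channel-stacking layout are all assumptions made for this example. An SNR of 0 dB corresponds to an interferer at the same power level as the target speech.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        raise ValueError("noise signal is silent; cannot scale it to a target SNR")
    # Gain that brings the noise to the power required by the target SNR.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def make_multimodal_input(noisy_audio, accel):
    """Stack audio and accelerometer as channels; result has shape (2, num_samples).

    Assumption: both signals are time-aligned and share one sampling rate.
    """
    return np.stack([noisy_audio, accel], axis=0)


# Synthetic stand-ins for real recordings (assumed 16 kHz, one second long).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
speech = np.sin(2.0 * np.pi * 220.0 * t)              # placeholder for clean speech
noise = np.random.default_rng(0).standard_normal(t.shape)  # placeholder interferer
accel = 0.1 * speech                                  # placeholder noise-free accelerometer

noisy = mix_at_snr(speech, noise, snr_db=0.0)         # interferer as loud as the speech
model_input = make_multimodal_input(noisy, accel)     # shape (2, 16000)
```

In this sketch only the audio channel is corrupted, matching the abstract's statement that noise is added to the audio signal while the accelerometer is not influenced by environmental noise.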


research
06/16/2022

EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning

Speech generation and enhancement based on articulatory movements facili...
research
10/21/2020

Real-time Speech Frequency Bandwidth Extension

In this paper we propose a lightweight model for frequency bandwidth ext...
research
08/30/2020

Improved Lite Audio-Visual Speech Enhancement

Numerous studies have investigated the effectiveness of audio-visual mul...
research
02/19/2021

Speech enhancement with weakly labelled data from AudioSet

Speech enhancement is a task to improve the intelligibility and perceptu...
research
11/22/2019

Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

Integrating modalities, such as video signals with speech, has been show...
research
09/30/2019

AV Speech Enhancement Challenge using a Real Noisy Corpus

This paper presents, a first of its kind, audio-visual (AV) speech enhac...
research
07/20/2018

A Fully Convolutional Neural Network Approach to End-to-End Speech Enhancement

This paper will describe a novel approach to the cocktail party problem ...
