Improving Voice Separation by Incorporating End-to-end Speech Recognition

11/29/2019
by Naoya Takahashi, et al.

Despite recent advances in voice separation methods, many challenges remain in realistic scenarios, such as noisy recordings and limited available data. In this work, we propose to explicitly incorporate the phonetic and linguistic nature of speech by taking a transfer learning approach using an end-to-end automatic speech recognition (E2EASR) system. The voice separation is conditioned on deep features extracted from the E2EASR system to capture the long-term dependence of phonetic aspects. Experimental results on speech separation and enhancement tasks on the AVSpeech dataset show that the proposed method significantly improves the signal-to-distortion ratio over the baseline model and even outperforms an audio-visual model that utilizes visual information of lip movements.
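The abstract reports its gains as signal-to-distortion ratio (SDR). As a rough illustration of what that metric measures, the sketch below computes a simple energy-ratio SDR in decibels, plus the improvement of a separated estimate over the unprocessed mixture. This is a minimal sketch under stated assumptions, not the paper's evaluation code: the function names `sdr_db` and `sdr_improvement_db` are hypothetical, the signals are plain Python lists of samples, and the abstract does not say which SDR variant was used (published results often rely on a BSS Eval style decomposition, which this simple ratio does not reproduce).

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Simple SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: plain energy-ratio definition, not the BSS Eval variant.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the clean target signal.
    signal_energy = sum(s * s for s in reference)
    # Energy of the residual between target and estimate.
    distortion_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    # eps guards against division by zero and log of zero.
    return 10.0 * math.log10((signal_energy + eps) / (distortion_energy + eps))


def sdr_improvement_db(reference, mixture, estimate):
    """SDR gain of the separated estimate over the unprocessed mixture."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)


if __name__ == "__main__":
    # Toy signals for illustration only; not data from the paper.
    clean = [1.0, -1.0, 1.0, -1.0]
    mixture = [1.5, -0.5, 1.5, -0.5]    # clean target plus a constant interferer
    separated = [0.9, -0.9, 0.9, -0.9]  # hypothetical output of a separator
    print(f"SDR of estimate: {sdr_db(clean, separated):.2f} dB")
    print(f"SDR improvement: {sdr_improvement_db(clean, mixture, separated):.2f} dB")
```

A higher value means less residual distortion relative to the target. Comparing the estimate against the mixture, rather than reporting the raw SDR alone, separates what the model contributes from how hard the input was to begin with.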

research
08/13/2019

End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning

This paper presents our latest investigation on end-to-end automatic spe...
research
11/27/2018

Improved Speech Enhancement with the Wave-U-Net

We study the use of the Wave-U-Net architecture for speech enhancement, ...
research
04/05/2022

VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices

In this paper, we address the problem of lip-voice synchronisation in vi...
research
03/21/2023

End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations

Recent works show that speech separation guided diarization (SSGD) is an...
research
06/02/2023

Improved DeepFake Detection Using Whisper Features

With a recent influx of voice generation methods, the threat introduced ...
research
07/07/2016

Single-Channel Multi-Speaker Separation using Deep Clustering

Deep clustering is a recently introduced deep learning architecture that...
research
11/03/2019

Onssen: an open-source speech separation and enhancement library

Speech separation is an essential task for multi-talker speech recogniti...
