Mixture Encoder for Joint Speech Separation and Recognition

06/21/2023
by Simon Berger, et al.

Multi-speaker automatic speech recognition (ASR) is crucial for many real-world applications, but it requires dedicated modeling techniques. Existing approaches can be divided into modular and end-to-end methods. Modular approaches separate speakers and recognize each of them with a single-speaker ASR system. End-to-end models process overlapped speech directly in a single, powerful neural network. This work proposes a middle-ground approach that leverages explicit speech separation similarly to the modular approach but also incorporates mixture speech information directly into the ASR module in order to mitigate the propagation of errors made by the speech separator. We also explore a way to exchange cross-speaker context information through a layer that combines information of the individual speakers. Our system is optimized through separate and joint training stages and achieves a relative improvement of 7
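The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the mixture information is injected or how the cross-speaker layer is built, so the concatenation-based fusion, the averaging over the other speakers, the random weights, and every name below (`affine_tanh`, `mixture_aware_encode`, `w_mix`, `w_sep`, `w_fuse`, `w_cross`) are assumptions made for illustration only.

```python
import math
import random


def affine_tanh(vec, weights):
    """Apply a bias-free linear layer followed by tanh to one feature vector."""
    return [math.tanh(sum(v * w for v, w in zip(vec, row))) for row in weights]


def rand_matrix(rng, rows, cols):
    """Small random weight matrix (stand-in for trained parameters)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def mixture_aware_encode(mixture, separated, w_mix, w_sep, w_fuse, w_cross):
    """Toy encoder: mixture is [T][F], separated is [S][T][F]; returns [S][T][D]."""
    # Encode the overlapped mixture once; it is shared by all speakers.
    mix_enc = [affine_tanh(frame, w_mix) for frame in mixture]

    # Encode each separated signal and fuse it with the mixture encoding
    # (assumption: fusion by concatenation followed by a linear layer).
    fused = []
    for signal in separated:
        fused.append([
            affine_tanh(affine_tanh(frame, w_sep) + mix, w_fuse)
            for frame, mix in zip(signal, mix_enc)
        ])

    # Cross-speaker layer (assumption): each speaker sees the average of the
    # other speakers' fused encodings at the same frame.
    num_speakers = len(fused)
    output = []
    for s in range(num_speakers):
        frames = []
        for t in range(len(mixture)):
            others = [fused[o][t] for o in range(num_speakers) if o != s]
            if others:
                context = [sum(col) / len(others) for col in zip(*others)]
            else:
                context = [0.0] * len(fused[s][t])
            frames.append(affine_tanh(fused[s][t] + context, w_cross))
        output.append(frames)
    return output


# Hypothetical sizes: 5 frames, 4 input features, 3 hidden units, 2 speakers.
rng = random.Random(0)
T, F, D, S = 5, 4, 3, 2
mixture = rand_matrix(rng, T, F)
separated = [rand_matrix(rng, T, F) for _ in range(S)]
encoded = mixture_aware_encode(
    mixture,
    separated,
    w_mix=rand_matrix(rng, D, F),
    w_sep=rand_matrix(rng, D, F),
    w_fuse=rand_matrix(rng, D, 2 * D),
    w_cross=rand_matrix(rng, D, 2 * D),
)
print(len(encoded), len(encoded[0]), len(encoded[0][0]))  # prints: 2 5 3
```

The point of the sketch is only the wiring: each speaker's representation depends on the separator output, on the unseparated mixture, and on the other speaker's stream, which is how the approach aims to limit the effect of separation errors on recognition.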


research
06/04/2023

End-to-End Joint Target and Non-Target Speakers ASR

This paper proposes a novel automatic speech recognition (ASR) system th...
research
09/15/2023

Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition

Many real-life applications of automatic speech recognition (ASR) requir...
research
11/22/2021

Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature

Automatic speech recognition (ASR) of multi-channel multi-speaker overla...
research
06/20/2022

An Empirical Analysis on the Vulnerabilities of End-to-End Speech Segregation Models

End-to-end learning models have demonstrated a remarkable capability in ...
research
04/01/2022

End-to-End Multi-speaker ASR with Independent Vector Analysis

We develop an end-to-end system for multi-channel, multi-speaker automat...
research
11/23/2020

Streaming Multi-speaker ASR with RNN-T

Recent research shows end-to-end ASR systems can recognize overlapped sp...
research
05/15/2018

A Purely End-to-end System for Multi-speaker Speech Recognition

Recently, there has been growing interest in multi-speaker speech recogn...
