A Purely End-to-end System for Multi-speaker Speech Recognition

05/15/2018
by   Hiroshi Seki, et al.
0

Recently, there has been growing interest in multi-speaker speech recognition, where the utterances of multiple speakers are recognized from their mixture. Promising techniques have been proposed for this task, but earlier works have required additional training data such as isolated source signals or senone alignments for effective learning. In this paper, we propose a new sequence-to-sequence framework to directly decode multiple label sequences from a single speech sequence by unifying source separation and speech recognition functions in an end-to-end manner. We further propose a new objective function to improve the contrast between the hidden vectors to avoid generating similar hypotheses. Experimental results show that the model is directly able to learn a mapping from a speech mixture to multiple label sequences, achieving 83.1 without the proposed objective. Interestingly, the results are comparable to those produced by previous end-to-end works featuring explicit separation and recognition modules.

READ FULL TEXT
research
11/05/2018

End-to-End Monaural Multi-speaker ASR System without Pretraining

Recently, end-to-end models have become a popular approach as an alterna...
research
10/15/2019

MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition

Recently, the end-to-end approach has proven its efficacy in monaural mu...
research
03/01/2022

Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR

Graph-based temporal classification (GTC), a generalized form of the con...
research
06/21/2023

Mixture Encoder for Joint Speech Separation and Recognition

Multi-speaker automatic speech recognition (ASR) is crucial for many rea...
research
06/25/2020

Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals

Neural sequence-to-sequence models are well established for applications...
research
10/28/2022

Dysfluencies Seldom Come Alone – Detection as a Multi-Label Problem

Specially adapted speech recognition models are necessary to handle stut...
research
11/25/2020

SAR-Net: A End-to-End Deep Speech Accent Recognition Network

This paper proposes a end-to-end deep network to recognize kinds of acce...

Please sign up or login with your details

Forgot password? Click here to reset