Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR

05/29/2019
by Naoyuki Kanda, et al.

In this paper, we present Hitachi and Paderborn University's joint effort for automatic speech recognition (ASR) in a dinner party scenario. The main challenges for ASR systems on dinner party recordings obtained by multiple microphone arrays are (1) heavy speech overlaps, (2) severe noise and reverberation, (3) very natural conversational content, and possibly (4) insufficient training data. As an example of a dinner party scenario, we have chosen the data presented during the CHiME-5 speech recognition challenge, where the baseline ASR had a 73.3% word error rate (WER) and the best-performing system at the CHiME-5 challenge had a 46.1% WER. We investigated a combination of the guided source separation-based speech enhancement technique and a previously proposed strong ASR backend, and found that a tight combination of these techniques provided substantial accuracy improvements. Our final system achieved WERs of 39.94% and [...] for the development and evaluation data, respectively, both of which are the best published results for the dataset. We also experimented with adding training data to the small official training set of the CHiME-5 corpus to assess the intrinsic difficulty of this ASR task.
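The results above are reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that metric only. It is not the scoring tool used by the authors or by the challenge; the function name `word_error_rate` and the example sentences are hypothetical, and the whitespace tokenization is an assumption (real scoring pipelines also normalize the text first).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Illustrative sketch: words are split on whitespace, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word and one substituted word against five reference words.
    print(word_error_rate("pass me the salt please", "pass the salt peas"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis contains many spurious words, which is one reason heavily overlapped speech is so costly for this metric.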

research
03/28/2018

The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines

The CHiME challenge series aims to advance robust automatic speech recog...
research
06/23/2023

The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios

The CHiME challenges have played a significant role in the development a...
research
08/22/2023

Convoifilter: A case study of doing cocktail party speech recognition

This paper presents an end-to-end model designed to improve automatic sp...
research
09/26/2019

An Investigation into the Effectiveness of Enhancement in ASR Training and Test for CHiME-5 Dinner Party Transcription

Despite the strong modeling power of neural network acoustic models, spe...
research
06/14/2020

The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge

This paper summarizes the JHU team's efforts in tracks 1 and 2 of the CH...
research
12/12/2019

On Neural Phone Recognition of Mixed-Source ECoG Signals

The emerging field of neural speech recognition (NSR) using electrocorti...
research
10/22/2018

Investigation of Independent Monaural Front-End Processing for Robust ASR without Retraining and Joint-Training

In recent years, monaural speech separation has been formulated as a sup...
