BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge

01/29/2021
by Martin Kocour et al.

This paper describes the joint effort of BUT and Telefónica Research on the development of Automatic Speech Recognition systems for the Albayzin 2020 Challenge. We compare approaches based on either hybrid or end-to-end models. In hybrid modelling, we explore the impact of a SpecAugment layer on performance. For end-to-end modelling, we use a convolutional neural network with gated linear units (GLUs). The performance of this model is also evaluated with an additional n-gram language model to improve word error rates. We further inspect source separation methods to extract speech from noisy environments (e.g., TV shows). More precisely, we assess the effect of using a neural music separator named Demucs. A fusion of our best systems achieved a 23.33% word error rate (WER) in the official Albayzin 2020 evaluations. Aside from the techniques used in our final submitted systems, we also describe our efforts to retrieve high-quality transcripts for training.
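The gated linear unit named above can be illustrated with a minimal, framework-free sketch. It assumes the standard GLU definition, in which the input is split into two equal halves a and b and the output is a multiplied element-wise by sigmoid(b). The function names `sigmoid` and `glu` are hypothetical, and the abstract does not specify the authors' layer sizes or how the gate is placed inside their convolutional blocks.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def glu(x: list[float]) -> list[float]:
    """Gated linear unit over one feature vector (hypothetical helper).

    The input is split into two equal halves a and b (in a convolutional
    model these would be two groups of output channels), and the result is
    a * sigmoid(b), element by element. The output has half the input width.
    """
    if len(x) % 2 != 0:
        raise ValueError("GLU input width must be even")
    half = len(x) // 2
    a, b = x[:half], x[half:]
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


# A large positive gate lets its value through; a large negative gate blocks it.
out = glu([2.0, 3.0, 50.0, -50.0])
print([round(v, 4) for v in out])  # prints [2.0, 0.0]
```

The sigmoid gate acts as a learned, per-channel soft switch on the linear path a, which is why GLU-based convolutional models need twice as many channels before each gate as they emit after it.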
