A Comparison of Hybrid and End-to-End Models for Syllable Recognition

09/19/2019
by Sebastian P. Bayerl, et al.
This paper compares a traditional hybrid speech recognition system (Kaldi using WFST and TDNN with lattice-free MMI) with a lexicon-free end-to-end model (a TensorFlow implementation of a multi-layer LSTM with CTC training) for German syllable recognition on the Verbmobil corpus. The results show that explicitly modeling prior knowledge is still valuable in building recognition systems. With a strong language model (LM) based on syllables, the structured approach significantly outperforms the end-to-end model. The best word error rate (WER) regarding syllables was achieved using Kaldi with a 4-gram LM, modeling all syllables observed in the training set. It achieved a WER of 10.0%, whereas the best WER of the end-to-end approach was 27.53%. These findings are relevant for future recognition systems that operate independently of a large vocabulary, as typically used in tasks such as recognition of syllabic or agglutinative languages, out-of-vocabulary techniques, keyword search indexing and medical speech processing.
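To make the metric concrete, the following is a minimal sketch of how a WER over syllable tokens can be computed, assuming the usual definition (Levenshtein distance between reference and hypothesis token sequences, divided by the reference length). The abstract does not say which scoring tool the authors used, so the function names `edit_distance` and `syllable_error_rate` and the example syllabification are illustrative assumptions, not the paper's implementation.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences.

    Counts the minimum number of substitutions, deletions and
    insertions needed to turn `ref` into `hyp`.
    """
    # prev[j] holds the distance between the first i-1 ref tokens
    # and the first j hyp tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def syllable_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """WER where each token is a syllable instead of a word."""
    if not ref:
        raise ValueError("reference must contain at least one syllable")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical syllabification of "guten Morgen", for illustration only.
reference = ["gu", "ten", "mor", "gen"]
hypothesis = ["gu", "ten", "mor", "den"]  # one substituted syllable
print(syllable_error_rate(reference, hypothesis))
```

Because the tokens are syllables rather than words, a single misrecognized word can contribute more than one error if several of its syllables are wrong, so syllable-level WER is not directly comparable to word-level WER on the same data.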

research
12/05/2022

Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models

Neural transducer is now the most popular end-to-end model for speech re...
research
01/29/2021

BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge

This paper describes joint effort of BUT and Telefónica Research on deve...
research
09/11/2016

Wav2Letter: an End-to-End ConvNet-based Speech Recognition System

This paper presents a simple end-to-end model for speech recognition, co...
research
09/21/2015

Noise Robust IOA/CAS Speech Separation and Recognition System For The Third 'CHIME' Challenge

This paper presents the contribution to the third 'CHiME' speech separat...
research
01/28/2022

Neural-FST Class Language Model for End-to-End Speech Recognition

We propose Neural-FST Class Language Model (NFCLM) for end-to-end speech...
research
07/28/2020

Multimodal Integration for Large-Vocabulary Audio-Visual Speech Recognition

For many small- and medium-vocabulary tasks, audio-visual speech recogni...
research
05/28/2020

Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search

In spoken Keyword Search, the query may contain out-of-vocabulary (OOV) ...