Exploring End-to-End Techniques for Low-Resource Speech Recognition

07/02/2018
by   Vladimir Bataev, et al.
0

In this work we present simple grapheme-based system for low-resource speech recognition using Babel data for Turkish spontaneous speech (80 hours). We have investigated different neural network architectures performance, including fully-convolutional, recurrent and ResNet with GRU. Different features and normalization techniques are compared as well. We also proposed CTC-loss modification using segmentation during training, which leads to improvement while decoding with small beam size. Our best model achieved word error rate of 45.8 data for this task, according to our knowledge.

READ FULL TEXT
research
05/13/2021

Exploring CTC Based End-to-End Techniques for Myanmar Speech Recognition

In this work, we explore a Connectionist Temporal Classification (CTC) b...
research
02/17/2022

Curriculum optimization for low-resource speech recognition

Modern end-to-end speech recognition models show astonishing results in ...
research
10/30/2020

Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition

To join the advantages of classical and end-to-end approaches for speech...
research
04/08/2022

Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition

Low resource speech recognition has been long-suffering from insufficien...
research
02/02/2023

Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition

Homophone characters are common in tonal syllable-based languages, such ...
research
06/19/2018

Recurrent DNNs and its Ensembles on the TIMIT Phone Recognition Task

In this paper, we have investigated recurrent deep neural networks (DNNs...
research
06/15/2018

Deep Lip Reading: a comparison of models and an online application

The goal of this paper is to develop state-of-the-art models for lip rea...

Please sign up or login with your details

Forgot password? Click here to reset