Automatic Speech Recognition for Humanitarian Applications in Somali

07/23/2018
by Raghav Menon, et al.

We present our first efforts in building an automatic speech recognition system for Somali, an under-resourced language, using 1.57 hrs of annotated speech for acoustic model training. The system is part of an ongoing effort by the United Nations (UN) to implement keyword spotting systems supporting humanitarian relief programmes in parts of Africa where languages are severely under-resourced. We evaluate several types of acoustic model, including recent neural architectures. We also consider language model data augmentation using a combination of recurrent neural networks (RNNs) and long short-term memory neural networks (LSTMs), as well as the perturbation of acoustic data. We find that both types of data augmentation are beneficial to performance, with our best system using a combination of convolutional neural networks (CNNs), time-delay neural networks (TDNNs) and bi-directional long short-term memory networks (BLSTMs) to achieve a word error rate of 53.75%.
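For readers unfamiliar with the metric reported above, word error rate is conventionally the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recogniser output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' scoring pipeline; the function name `word_error_rate` and the example strings are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only (hypothetical name); real evaluations usually
    apply text normalisation before scoring, which is omitted here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not Somali data: one substitution and one deletion
# against a four-word reference.
example = word_error_rate("a b c d", "a x c")
```

Because insertions are counted, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.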


research
11/06/2018

Bidirectional Quaternion Long-Short Term Memory Recurrent Neural Networks for Speech Recognition

Recurrent neural networks (RNN) are at the core of modern automatic spee...
research
06/24/2019

SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech

Automatic syllable count estimation (SCE) is used in a variety of applic...
research
04/07/2015

Deep Recurrent Neural Networks for Acoustic Modelling

We present a novel deep Recurrent Neural Network (RNN) model for acousti...
research
06/17/2019

Adversarial Training for Multilingual Acoustic Modeling

Multilingual training has been shown to improve acoustic modeling perfor...
research
05/17/2020

Multi-modal Automated Speech Scoring using Attention Fusion

In this study, we propose a novel multi-modal end-to-end neural approach...
research
06/12/2023

Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition

Convolutional frontends are a typical choice for Transformer-based autom...
research
04/04/2016

Recurrent Neural Networks for Polyphonic Sound Event Detection in Real Life Recordings

In this paper we present an approach to polyphonic sound event detection...
