MASRI-HEADSET: A Maltese Corpus for Speech Recognition

08/13/2020
by Carlos Mena, et al.

Maltese, the national language of Malta, is spoken by approximately 500,000 people. Speech processing for Maltese is still in its early stages of development. In this paper, we present the first spoken Maltese corpus designed specifically for Automatic Speech Recognition (ASR). The MASRI-HEADSET corpus was developed by the MASRI project at the University of Malta. It consists of 8 hours of speech paired with text, recorded in a laboratory environment using short text snippets. The speakers were recruited from locations across the Maltese islands and were roughly evenly distributed by gender. This paper also presents initial results from baseline experiments in Maltese ASR using Sphinx and Kaldi. The MASRI-HEADSET corpus is publicly available for research and academic purposes.

research
06/20/2022

The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition

Building a usable radio monitoring automatic speech recognition (ASR) sy...
research
04/20/2020

ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers

Automatic speech recognition (ASR) via call is essential for various app...
research
03/24/2022

Lahjoita puhetta – a large-scale corpus of spoken Finnish with some benchmarks

The Donate Speech campaign has so far succeeded in gathering approximate...
research
02/15/2021

Jira: a Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon

In this paper, we introduce the first large vocabulary speech recognitio...
research
01/30/2017

Structural Analysis of Hindi Phonetics and A Method for Extraction of Phonetically Rich Sentences from a Very Large Hindi Text Corpus

Automatic speech recognition (ASR) and Text to speech (TTS) are two prom...
research
04/17/2023

Political corpus creation through automatic speech recognition on EU debates

In this paper, we present a transcribed corpus of the LIBE committee of ...
research
01/29/2018

A Corpus for Modeling Word Importance in Spoken Dialogue Transcripts

Motivated by a project to create a system for people who are deaf or har...
