Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text

04/03/2018
by Iroro Orife, et al.

Yorùbá is a widely spoken West African language with a writing system rich in tonal and orthographic diacritics. With very few exceptions, diacritics are omitted from electronic texts due to limited device and application support. Diacritics provide morphological information, are crucial for lexical disambiguation and pronunciation, and are vital for any Yorùbá text-to-speech (TTS), automatic speech recognition (ASR) and natural language processing (NLP) task. Reframing Automatic Diacritic Restoration (ADR) as a machine translation task, we experiment with two different attentive Sequence-to-Sequence neural models to process undiacritized text. On our evaluation dataset, this approach produces diacritization error rates of less than 5%. We have released pre-trained models, datasets and source-code as an open-source project to advance efforts on Yorùbá language technology.
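To make the machine translation framing concrete, the sketch below shows one way parallel training pairs could be derived from diacritized text: the undiacritized string serves as the source sequence and the original as the target. This is a minimal illustration, not the authors' released code; the function names `strip_diacritics` and `make_pair` are hypothetical, and the sketch assumes that every Unicode combining mark counts as a diacritic.

```python
import unicodedata
from typing import Tuple


def strip_diacritics(text: str) -> str:
    """Remove all combining marks (tone marks, under-dots) from text.

    Assumption: every Unicode combining mark is treated as a diacritic.
    """
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters whose canonical combining class is zero.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def make_pair(diacritized: str) -> Tuple[str, str]:
    """Build a hypothetical (source, target) pair for seq2seq training."""
    target = unicodedata.normalize("NFC", diacritized)
    return strip_diacritics(target), target


if __name__ == "__main__":
    source, target = make_pair("Yorùbá")
    print(source, "->", target)
```

A model trained on such pairs learns the reverse mapping, from the undiacritized source to the diacritized target. Normalizing both sides to NFC keeps the same letter from appearing under two different byte encodings in the vocabulary.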

research
03/23/2020

Improving Yorùbá Diacritic Restoration

Yorùbá is a widely spoken West African language with a writing system ri...
research
10/18/2021

ViraPart: A Text Refinement Framework for ASR and NLP Tasks in Persian

The Persian language is an inflectional SOV language. This fact makes Pe...
research
07/24/2023

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

Punctuation restoration is an important task in automatic speech recogni...
research
03/31/2022

indic-punct: An automatic punctuation restoration and inverse text normalization framework for Indic languages

Automatic Speech Recognition (ASR) generates text which is most of the t...
research
02/19/2022

Punctuation Restoration

Given the increasing number of livestreaming videos, automatic speech re...
research
10/25/2018

Tackling Sequence to Sequence Mapping Problems with Neural Networks

In Natural Language Processing (NLP), it is important to detect the rela...
research
05/28/2023

RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition

Modern public ASR tools usually provide rich support for training variou...
