PidginUNMT: Unsupervised Neural Machine Translation from West African Pidgin to English

12/07/2019
by Kelechi Ogueji, et al.

Over 800 languages are spoken across West Africa. Despite the diversity among the people who speak these languages, one language unifies them: West African Pidgin English, which has at least 80 million speakers. However, there is no known natural language processing (NLP) work on this language. In this work, we present the first NLP work on the most popular variant of the language and make three major contributions. First, we provide a Pidgin corpus of over 56,000 sentences, the largest we know of. Second, we train the first cross-lingual embedding between Pidgin and English; this aligned embedding should help with various downstream tasks between English and Pidgin. Third, we train an Unsupervised Neural Machine Translation model between Pidgin and English, which achieves BLEU scores of 7.93 from Pidgin to English and 5.18 from English to Pidgin. Overall, this work greatly lowers the barrier to entry for future NLP work on West African Pidgin English.
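
For readers unfamiliar with the metric, the BLEU scores quoted in the abstract are on a 0 to 100 scale, where higher means more n-gram overlap with a reference translation. The following is a minimal sketch of corpus-level BLEU, assuming whitespace tokenization, a single reference per sentence, and no smoothing. The function names `ngrams` and `corpus_bleu` are hypothetical, the sentences are toy English examples, and this is not the evaluation script used by the authors, which the abstract does not specify.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis (an assumption)."""
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()  # whitespace tokenization
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp_tok, n), ngrams(ref_tok, n)
            # Counter intersection clips each count by its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any zero precision gives a BLEU of 0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Toy usage: one hypothesis scored against one reference.
score = corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"])
print(f"BLEU = {score:.2f}")
```

Published scores are usually computed with a standard tool and a fixed tokenization, so numbers from a sketch like this are not directly comparable to the 7.93 and 5.18 reported above.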


