Despite the progress of multilingual machine translation (MMT) and the many efforts towards improving its performance for low-resource languages, African languages suffer from under-representation. For example, of the known African languages Eberhard et al. (2020), only a fraction are available in the FLORES 101 Large-Scale Multilingual Translation Task as of the time of this research.
Furthermore, most research into transfer learning of multilingual models from high-resource to low-resource languages rarely works with ALs in the low-resource scenario. While the consensus is that the outcomes of research on low-resource non-African languages should scale to African languages, this cross-lingual generalization is not guaranteed Orife et al. (2020), and the extent to which it actually works remains largely under-studied. Transfer learning between African languages sharing the same language sub-class has been shown to give better translation quality than transfer from high-resource Anglo-centric languages Nyoni and Bassett (2021), calling for the need to investigate AL↔AL multilingual translation.
We take a step towards addressing the under-representation of African languages in MMT and improving experimentation by participating in the 2021 WMT Shared Task: Large-Scale Multilingual Machine Translation, focusing solely on ALs. We worked with 6 African languages and 2 non-African languages (English and French). Table 1 gives an overview of our focus African languages in terms of their language family, number of speakers, and the regions in Africa where they are spoken Adelani et al. (2021b). We chose these languages in an effort to create some language diversity: the 6 African languages span the most widely and least spoken languages in Africa.
Our main contributions are summarized below:
MMTAfrica – a many-to-many AL↔AL multilingual model for African languages.
We further created a unique, highly representative test set – the MMTAfrica Test Set – and reported benchmark results and insights using MMTAfrica.
Table 1 (excerpt): |Swahili||swa||Niger-Congo-Bantu||98M||Southern, Central & East|
2 Related Work
2.1 Multilingual Machine Translation (MMT)
The current state of multilingual NMT, in which a single NMT model is optimized for the translation of multiple language pairs Firat et al. (2016); Johnson et al. (2017); Lu et al. (2018); Aharoni et al. (2019); Arivazhagan et al. (2019b), has become very appealing for a number of reasons. It is scalable and easy to deploy or maintain (a single model that effectively handles all translation directions, if properly trained and designed, surpasses the scalability of individually trained models under the traditional bilingual framework). Multilingual NMT can encourage knowledge transfer among related language pairs Lakew et al. (2018); Tan et al. (2019) as well as positive transfer from higher-resource languages Zoph et al. (2016); Neubig and Hu (2018); Arivazhagan et al. (2019a); Aharoni et al. (2019); Johnson et al. (2017) thanks to its shared representations, improve low-resource translation Ha et al. (2016); Johnson et al. (2017); Arivazhagan et al. (2019b); Xue et al. (2021), and enable zero-shot translation (i.e. direct translation between a language pair never seen during training) Firat et al. (2016); Johnson et al. (2017).
Despite its many advantages, multilingual NMT suffers from certain disadvantages. Firstly, the output vocabulary size is typically fixed regardless of the number of languages in the corpus, and increasing the vocabulary size is costly in terms of computational resources because training and inference time scale linearly with the size of the decoder's output layer.
Another pitfall of massively multilingual NMT is its poor zero-shot performance Firat et al. (2016); Arivazhagan et al. (2019a); Johnson et al. (2017); Aharoni et al. (2019), particularly compared to pivot-based models (two bilingual models that translate from the source to the target language through an intermediate language); the spurious correlation issue Gu et al. (2019); and off-target translation Johnson et al. (2017), where the model ignores the given target information and translates into the wrong language.
Our work is inspired by some research to improve the performance (including zero-shot translation) of multilingual models via back-translation and leveraging monolingual data. Zhang et al. (2020) proposed random online backtranslation to enhance multilingual translation of unseen training language pairs. Siddhant et al. (2020) leveraged monolingual data in a semi-supervised fashion and reported three major results:
Using monolingual data significantly boosts the translation quality of low-resource languages in multilingual models.
Self-supervision improves zero-shot translation quality in multilingual models.
Leveraging monolingual data with self-supervision provides a viable path towards adding new languages to multilingual models.
3 Data Methodology
Table 2 presents the size of the gathered and cleaned parallel sentences for each language direction.
We devised preprocessing guidelines for each of our focus languages, taking their linguistic properties into consideration. We used a maximum sequence length of (due to computational resources) and a minimum of . In the following sections we describe the data sources for the parallel and monolingual corpora.
As NMT models rely heavily on parallel data, we sought to gather more parallel sentences for each language direction in an effort to increase the size and domain coverage of each direction. To this end, our first source was JW300 Agić and Vulić (2019), a parallel corpus covering over 300 languages with around 100 thousand biblical-domain parallel sentences per language pair on average. Using OpusTools Aulamo et al. (2020) we retained only very trustworthy translations by setting ( is a threshold indicating the confidence of the translations). We collected more parallel sentences from Tatoeba (https://opus.nlpl.eu/Tatoeba.php), kde4 (https://huggingface.co/datasets/kde4) Tiedemann, and some English-based bilingual samples from MultiParaCrawl (https://www.paracrawl.eu/).
Finally, following pointers from native speakers of these focus languages in the Masakhane community et al. (2020) to existing research on machine translation for African languages that open-sourced its parallel data, we assembled more parallel sentences, mostly in the →AL direction.
Despite our efforts to gather parallel data from various domains, we faced two problems: 1) there was a huge imbalance in parallel samples across the language directions – in Table 2 we see that the fon direction has the fewest parallel sentences while swa or yor has relatively more; 2) the parallel sentences, particularly for AL↔AL, span a very small domain (mostly biblical or internet).
We therefore set out to gather monolingual data from diverse sources. As our focus is on African languages, we collated monolingual data in only these languages.
The monolingual sources and volume are summarized in Table 3.
|Xhosa (xho)||The CC100-Xhosa Dataset created by Conneau et al. (2019), and OpenSLR van Niekerk et al. (2017)|
|Yoruba (yor)||Yoruba Embeddings Corpus Alabi et al. (2020) and MENYO20k Adelani et al. (2021a)|
|Fon/Fongbe (fon)||FFR Dataset Dossou and Emezue (2020), and Fon French Daily Dialogues Parallel Data Dossou and Emezue (2021)|
|Swahili/Kiswahili (swa)||Shikali and Refuoe (2019)|
|Kinyarwanda (kin)||KINNEWS-and-KIRNEWS Niyongabo et al. (2020)|
|Igbo (ibo)||Ezeani et al. (2020)|
3.1 Data Set Types in our Work
Here we elaborate on the different categories of data sets that we generated and used in our work for training and evaluation.
FLORES Test Set: This refers to the dev test set of parallel sentences in all language directions provided by Goyal et al. (2021) (https://dl.fbaipublicfiles.com/flores101/dataset/flores101_dataset.tar.gz). We performed evaluation on this test set for all language directions except fon and kin.
MMTAfrica Test Set: This is a test set we created by taking out a small but equal number of sentences from each parallel source domain. As a result, we have a set from a wide range of domains, encompassing samples from many existing test sets from previous research. Although this set is too small to be fully considered a test set, we open-source it because it contains sentences from many domains (making it useful for evaluation) and we hope that it can be built upon, perhaps by merging it with other benchmark test sets Abate et al. (2018); Abbott and Martinus (2019); Reid et al. (2021).
Baseline Train/Test Set: We first conducted baseline experiments with Fon, Igbo, English and French as explained in section 5.0.1. For this we created a special data set by carefully selecting a small subset of the FFR Dataset (which already contained parallel sentences in French and Fon), first automatically translating the sentences to English and Igbo using the Google Translate API (https://cloud.google.com/translate), and finally re-translating with the help of Igbo (7) and English (7) native speakers (we recognized that it was easier for native speakers to edit/tweak an existing translation than to write the whole translation from scratch). In so doing, we created a data set of translations in all language directions. We split the data set into a training part (Baseline Train Set), a dev part, and a test part (Baseline Test Set).
4 Model and Training Setup
For each language direction s→t we have its set of parallel sentences {(x_i, y_i)}, where x_i is the i-th source sentence in language s and y_i is its translation in the target language t.
Following the approach of Johnson et al. (2017) and Xue et al. (2021), we model translation in a text-to-text format. More specifically, we create the input for the model by prepending the target-language tag to the source sentence. Therefore, for each source sentence x_i with translation y_i in target language t, the input to the model is <t> x_i and the target is y_i. Taking a real example, say we wish to translate the Igbo sentence Daalụ maka ikwu eziokwu nke Chineke to English. The input to the model becomes <eng> Daalụ maka ikwu eziokwu nke Chineke.
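This tagging scheme can be sketched in a few lines (an illustrative helper of our own; the released code may implement it differently):

```python
def add_target_tag(source_sentence: str, target_lang: str) -> str:
    """Prepend the target-language tag to the source sentence,
    as in the text-to-text input format described above."""
    return f"<{target_lang}> {source_sentence}"

# The Igbo -> English example from the text:
model_input = add_target_tag("Daalụ maka ikwu eziokwu nke Chineke", "eng")
# model_input == "<eng> Daalụ maka ikwu eziokwu nke Chineke"
```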
4.1 Model Setup
For all our experiments, we used the mT5 model Xue et al. (2021), a multilingual variant of the encoder-decoder, transformer-based Vaswani et al. (2017) “Text-to-Text Transfer Transformer” (T5) model Raffel et al. (2019). In T5 pre-training, the NLP tasks (including machine translation) were cast into a “text-to-text” format – that is, a task where the model is fed some text prefix for context or conditioning and is then asked to produce some output text. This framework makes it straightforward to design a number of NLP tasks like machine translation, summarization, text classification, etc. Also, it provides a consistent training objective both for pre-training and finetuning. The mT5 model was pre-trained with a maximum likelihood objective using “teacher forcing” Williams and Zipser (1989). The mT5 model was also pretrained with a modification of the masked language modelling objective Devlin et al. (2018).
We finetuned the mt5-base model on our many-to-many machine translation task. While Xue et al. (2021) suggest that larger versions of the mT5 model (Large, XL or XXL) give better performance on downstream multilingual translation tasks, we were constrained by computational resources to mt5-base.
4.2 Training Setup
We have a set of language tags for the languages in our multilingual many-to-many translation: four languages in our baseline setup (section 5.0.1) and eight in our final experiment (section 5.0.2). We carried out many-to-many translation using all possible directions except eng↔fra. We skipped eng↔fra for this fundamental reason:
our main focus is on African↔African or eng/fra↔African translation. Due to the high-resource nature of English and French, adding the eng↔fra training set would overshadow the learning of the other language directions and greatly impede our analyses. Our intuition draws from the observation of Xue et al. (2021) on the reason for off-target translation in the mT5 model: as English-based finetuning proceeds, the model's assigned likelihood of non-English tokens presumably decreases. Therefore, since the mt5-base training set contained predominantly English (and, after that, other European-language) tokens, and our research is about AL↔AL translation, removing the eng↔fra direction was our way of ensuring the model assigned more likelihood to AL tokens.
4.2.1 Our Contributions
In addition to the parallel data between the African languages, we leveraged monolingual data to improve translation quality in two ways:
our backtranslation (BT): We designed a modified form of random online backtranslation Zhang et al. (2020) where, instead of randomly selecting a subset of languages to backtranslate, we selected for each language a number of sentences at random from the monolingual data set. This means the model backtranslates different (monolingual) sentences at every backtranslation step and, in so doing, we believe, improves its domain adaptation because it learns from varied samples across the whole monolingual data set. We initially tested different sample sizes to find a compromise between backtranslation computation time and translation quality. Following research that has shown the effectiveness of random beam-search over greedy decoding when generating backtranslations Lample et al. (2017); Edunov et al. (2018); Hoang et al. (2018); Zhang et al. (2020), we generated several prediction sentences from the model and randomly selected (with equal probability) one as the backtranslated sentence. Naturally, the number of predictions further affects the computation time (because the model has to produce that many output sentences for each input sentence), so we finally settled on a fixed number.
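A minimal sketch of this sampling scheme, assuming a stand-in `generate` function for the model's decoding; the helper name and the parameters `k` and `n_candidates` are ours, standing in for values elided above:

```python
import random

def backtranslation_round(monolingual, k, n_candidates, generate, rng=random):
    """One round of the modified random online backtranslation:
    for each language, sample k fresh monolingual sentences, generate
    n_candidates translations for each, and keep one chosen uniformly
    at random (standing in for random beam-search selection)."""
    synthetic_pairs = []
    for lang, sentences in monolingual.items():
        for target_side in rng.sample(sentences, min(k, len(sentences))):
            candidates = generate(target_side, lang, n_candidates)
            source_side = rng.choice(candidates)
            # (backtranslated source, original monolingual target)
            synthetic_pairs.append((source_side, target_side))
    return synthetic_pairs
```

Because the k sentences are re-sampled each round, the model eventually sees backtranslations drawn from the whole monolingual set rather than a fixed subset.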
our reconstruction: Given a monolingual sentence x from language l, we applied random swapping (a fixed number of times) and deletion (with a fixed probability) to get a noisy version x′. Taking inspiration from Raffel et al. (2019), we integrated the reconstruction objective into our model finetuning by prepending the language tag <l> to x′ and setting its target output to x.
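The noising step can be sketched as follows (the swap count and deletion probability below are placeholders, since the paper's exact values are not shown above):

```python
import random

def noisy_copy(sentence, n_swaps=2, p_del=0.1, rng=random):
    """Produce the noisy version x' of a monolingual sentence x for the
    reconstruction objective: swap two random token positions n_swaps
    times, then drop each token independently with probability p_del."""
    tokens = sentence.split()
    for _ in range(n_swaps):
        if len(tokens) > 1:
            i, j = rng.sample(range(len(tokens)), 2)
            tokens[i], tokens[j] = tokens[j], tokens[i]
    kept = [t for t in tokens if rng.random() >= p_del]
    return " ".join(kept if kept else tokens[:1])

# Training pair for reconstruction: input = "<lang> " + noisy_copy(x), target = x.
```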
In all our experiments we initialized the pretrained mT5-base model using Hugging Face's AutoModelForSeq2SeqLM and tracked the training process with Weights & Biases Biewald (2020). We used the AdamW optimizer Loshchilov and Hutter (2017) with a learning rate (lr) and transformers' linear scheduler with warmup (the learning rate increases linearly from 0 to the initial lr set in the optimizer over a warmup period and then decreases linearly back to 0).
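The schedule's shape can be written out explicitly; this mirrors the behaviour of transformers' get_linear_schedule_with_warmup (the step counts and base lr below are illustrative, not the paper's values):

```python
def linear_warmup_lr(step, warmup_steps, total_steps, base_lr):
    """Linear schedule with warmup: the lr rises linearly from 0 to
    base_lr over warmup_steps, then decays linearly back to 0 at
    total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# e.g. with 10 warmup steps out of 100:
# step 0 -> 0.0, step 10 -> base_lr, step 100 -> 0.0
```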
The goal of our baseline was to understand the effect of jointly finetuning with backtranslation and reconstruction on African↔African translation quality in two scenarios: when the AL was included in the model's pretraining, and when it was not. Using Fon (which was not included in the pretraining) and Igbo (which was included) as the African languages for our baseline training, we finetuned our model on many-to-many translation in all directions of the four languages. We used the Baseline Train Set for training and the Baseline Test Set for evaluation. We trained the model for only 3 epochs in three settings:
BASE: in this setup we finetune the model on only the many-to-many translation task, with neither backtranslation nor reconstruction.
BT: refers to finetuning with our backtranslation objective described in section 4. For our baseline, where we backtranslate using monolingual data in Fon and Igbo, we set the number of backtranslation sentences to a small value. For our final experiments, we first tried a larger value but finally reduced it due to the great deal of computation required. For our baseline experiment, we ran one epoch normally and the remaining two with backtranslation. For our final experiments, we first finetuned the model for some epochs before continuing with backtranslation.
BT&REC: refers to joint backtranslation and reconstruction (explained in section 4) while finetuning. Two important questions had to be addressed: 1) the ratio, backtranslation : reconstruction, of monolingual sentences to use, and 2) whether to use the same or different sentences for backtranslation and reconstruction. Bearing computation time in mind, we settled on a fixed ratio for our baseline and another for our final experiments, and leave ablation studies on the effect of this ratio on translation quality to future work. For the second question, we decided to randomly sample (with replacement) different sentences each for backtranslation and reconstruction.
For our baseline, we used a learning rate of , a batch size of 32 sentences with gradient accumulation up to a batch of 256 sentences, and an early-stopping patience of 100 evaluation steps. To further analyse the performance of our baseline setups we ran compare-mt (https://github.com/neulab/compare-mt) Neubig et al. (2019) on the model's predictions.
MMTAfrica refers to our final experimental setup, where we finetuned our model on all language directions involving all eight languages except eng↔fra. Taking inspiration from our baseline results, we ran our experiment with our proposed BT&REC setting and made some adjustments along the way.
The long computation time for backtranslating (with just 100 sentences per language the model was required to generate around translations at every backtranslation step) was a drawback. To mitigate the issue we parallelized the process using the multiprocessing package in Python (https://docs.python.org/3/library/multiprocessing.html). We further gradually reduced the number of sentences for backtranslation (to , and finally ).
Gradient descent in large multilingual models has been shown to be more stable when updates are performed over large batch sizes Xue et al. (2021). To cope with our computational resources, we used gradient accumulation to increase updates from an initial batch size of sentences up to a batch gradient computation size of sentences. We further utilized PyTorch's DataParallel package (https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html) to parallelize the training across the GPUs. We used a learning rate (lr) of .
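The accumulation logic can be sketched independently of any framework (a toy scaffold of our own in which summing numbers stands in for loss.backward() and apply_update stands in for optimizer.step()):

```python
def train_with_accumulation(micro_batches, accum_steps, apply_update):
    """Accumulate 'gradients' over accum_steps micro-batches so that each
    optimizer update sees an effective batch of
    accum_steps * micro_batch_size sentences (e.g. 32 -> 256 sentences
    with accum_steps = 8). Returns the number of updates performed."""
    accumulated = 0.0
    n_updates = 0
    for i, batch in enumerate(micro_batches, start=1):
        accumulated += sum(batch)  # stands in for loss.backward()
        if i % accum_steps == 0:
            apply_update(accumulated)  # stands in for optimizer.step() + zero_grad()
            accumulated = 0.0
            n_updates += 1
    return n_updates
```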
6 Results and Insights
All evaluations were made using spBLEU (sentencepiece Kudo and Richardson (2018) + sacreBLEU Post (2018)) as described in Goyal et al. (2021). We further evaluated on the chrF Popović (2015) and TER metrics.
6.1 Baseline Results and Insights
Figure 1 compares the spBLEU scores for the three setups used in our baseline experiments. As a reminder, we use a placeholder symbol to refer to any language in the baseline set.
BT gives a strong improvement over BASE (except in eng→ibo, where it is about the same, and fra→ibo, where it performs worse).
When the target language is fon, we observe a considerable boost in the spBLEU of the BT setting, which significantly outperformed BASE and BT&REC. BT&REC contributed very little compared with BT and sometimes even performed poorly (e.g. in eng→fon). We attribute this poor performance of the reconstruction objective to the fact that the mt5-base model was not originally pretrained on Fon. Therefore, with only 3 epochs of finetuning (and 1 epoch before introducing the reconstruction and backtranslation objectives), the model was not able to meaningfully utilize both objectives.
Conversely, when the target language is ibo, BT&REC gives the best results – even in scenarios where BT underperforms BASE (as is the case for fra→ibo and eng→ibo). We believe that the decoder of the model, having originally been pretrained on corpora containing Igbo, was better able to use our reconstruction objective to improve translation quality in the →ibo direction.
Drawing insights from fon and ibo, we offer the following propositions concerning AL↔AL multilingual translation:
our backtranslation (section 4) from monolingual data improves the cross-lingual mapping of the model for low-resource African languages. While it is computationally expensive, our parallelization and decay of the number of backtranslated sentences are potential solutions towards effectively adopting backtranslation with monolingual data.
Denoising objectives have typically been known to improve machine translation quality Zhang and Zong (2016); Cheng et al. (2016); Gu et al. (2019); Zhang et al. (2020); Xue et al. (2021) because they imbue the model with more generalizable knowledge (about that language), which the decoder uses to predict better token likelihoods for that language during translation. This is a reasonable explanation for the improved quality of BT&REC over BT in the →ibo direction. As we learned from fon, using reconstruction could perform unsatisfactorily if not handled well. Some methods we propose are:
For African languages that were included in the original model pretraining (as was the case of Igbo, Swahili, Xhosa, and Yorùbá in the mT5 model), using the BT&REC setting for finetuning produces the best results. While we did not perform ablation studies on the data-size ratio for backtranslation and reconstruction, we believe that the ratio we used in our final experiments gives the best compromise on both computation time and translation quality.
For African languages that were not included in the original model pretraining (as was the case of Kinyarwanda and Fon in the mT5 model), reconstruction together with backtranslation (especially at an early stage) only introduces more noise, which could harm cross-lingual learning. For these languages we propose:
first finetuning the model on only our reconstruction objective (described in section 4) for fairly long training steps before using BT&REC. This way, the initial reconstruction helps the model learn the language's representation space and increases the likelihood of its tokens.
6.2 MMTAfrica Results and Insights
In Table 4, we compare MMTAfrica with the M2M MMT Fan et al. (2020) benchmark results of Goyal et al. (2021) using the same test set they used – the FLORES Test Set. On all language pairs except swa→eng (which has a comparable spBLEU difference), we report an improvement from MMTAfrica (spBLEU gains ranging from +0.58 in swa→fra to +19.46 in fra→xho). The lower score of swa→eng presents an intriguing anomaly, especially given the large availability of parallel corpora in our training set for this pair. We plan to investigate this in future work.
|Source||Target||spBLEU (FLORES)||spBLEU (Ours*)||spCHRF (Ours*)|
|ibo||swa||4.38||21.84 — 11.63||37.38 — 35.66|
|ibo||xho||2.44||13.97 — 7.65||31.95 — 29.47|
|ibo||yor||1.54||10.72 — 7.72||26.55 — 16.84|
|ibo||eng||7.37||13.62 — 15.44||38.90 — 37.99|
|ibo||fra||6.02||16.46 — 12.89||35.10 — 31.71|
|swa||ibo||1.97||19.80 — 16.73||33.95 — 28.07|
|swa||xho||2.71||21.71 — 11.74||39.86 — 35.67|
|swa||yor||1.29||11.68 — 8.28||27.44 — 17.18|
|swa||eng||30.43||27.67 — 28.41||56.12 — 53.65|
|swa||fra||26.69||27.27 — 19.85||46.20 — 41.41|
|xho||ibo||3.80||17.02 — 15.28||31.30 — 26.67|
|xho||swa||6.14||29.47 — 15.73||44.68 — 40.78|
|xho||yor||1.92||10.42 — 7.82||26.77 — 17.10|
|xho||eng||10.86||20.77 — 21.75||48.69 — 46.34|
|xho||fra||8.28||21.48 — 15.97||40.65 — 36.28|
|yor||ibo||1.85||11.45 — 11.44||25.26 — 21.70|
|yor||swa||1.93||14.99 — 6.61||30.49 — 28.21|
|yor||xho||1.94||9.31 — 4.99||26.34 — 24.27|
|yor||eng||4.18||8.15 — 9.02||30.65 — 28.85|
|yor||fra||3.57||10.59 — 7.91||27.60 — 23.93|
|eng||ibo||3.53||21.49 — 19.52||37.24 — 32.46|
|eng||swa||26.95||40.11 — 27.06||53.13 — 51.90|
|eng||xho||4.47||27.15 — 14.85||44.93 — 39.88|
|eng||yor||2.17||12.09 — 9.43||28.34 — 18.39|
|fra||ibo||1.69||19.48 — 17.25||34.47 — 29.49|
|fra||swa||17.17||34.21 — 19.49||48.95 — 45.44|
|fra||xho||2.27||21.73 — 11.37||40.06 — 35.41|
|fra||yor||1.16||11.42 — 8.54||27.67 — 17.53|
In Table 5 we present benchmark results of MMTAfrica on the MMTAfrica Test Set with and without BT&REC. We also report the test-set size for each language pair. The spBLEU scores demonstrate the efficacy of our new objective, as it led to improvements in the majority of the tasks.
Interesting analysis about Fon (fon) and Yorùbá (yor):
For each language, the lowest spBLEU scores in both tables come from the →yor direction, except fon→yor (in Table 5), which interestingly has the highest spBLEU score compared to the other fon→ directions. We do not know the reason for the very low performance in the →yor direction, but we offer below a plausible explanation for fon→yor.
The oral linguistic history of Fon ties it to the ancient Yorùbá kingdom Barnes (1997). Furthermore, in present-day Benin, where Fon is largely spoken as a native language, Yorùbá is one of the indigenous languages commonly spoken (https://en.wikipedia.org/wiki/Benin, last accessed 30.08.2021). Therefore Fon and Yorùbá share some linguistic characteristics, and we believe this is one reason behind fon→yor surpassing the other fon→ directions.
This explanation could inspire transfer learning from Yorùbá, which has received comparably more research and has more resources for machine translation, to Fon. We leave this for future work.
7 Conclusion and Future Work
In this paper, we introduced MMTAfrica, a multilingual machine translation model for 6 African languages. Our results and analyses, including a new reconstruction objective, give insights on MMT for African languages for future research.
|Source||Target||Test size||spBLEU||spCHRF||spTER|
|ibo||swa||60||34.89 (12.27)||47.38 (36.65)||68.28 (124.01)|
|ibo||xho||30||36.69 (21.92)||50.66 (41.40)||59.65 (76.36)|
|ibo||yor||30||11.77 (10.19)||29.54 (22.10)||129.84 (130.39)|
|ibo||kin||30||33.92 (16.07)||46.53 (36.95)||67.73 (96.5)|
|ibo||fon||30||35.96 (11.47)||43.14 (21.75)||63.21 (91.91)|
|ibo||eng||90||37.28 (11.70)||60.42 (38.11)||62.05 (110.67)|
|ibo||fra||60||30.86 (6.02)||44.09 (28.13)||69.53 (121.43)|
|swa||ibo||60||33.71 (23.12)||43.02 (33.91)||60.01 (85.18)|
|swa||xho||30||37.28 (20.55)||52.53 (40.84)||55.86 (72.71)|
|swa||yor||30||14.09 (15.49)||27.50 (23.50)||113.63 (106.22)|
|swa||kin||30||23.86 (13.53)||42.59 (36.88)||94.67 (118.0)|
|swa||fon||30||23.29 (8.94)||33.52 (16.97)||65.11 (84.12)|
|swa||eng||60||35.55 (43.11)||60.47 (66.52)||47.32 (40.0)|
|swa||fra||60||30.11 (21.99)||48.33 (43.84)||63.38 (71.17)|
|xho||ibo||30||33.25 (24.33)||45.36 (36.42)||62.83 (70.63)|
|xho||swa||30||39.26 (23.42)||53.75 (46.22)||53.72 (67.13)|
|xho||yor||30||22.00 (16.86)||38.06 (27.20)||70.45 (74.36)|
|xho||kin||30||30.66 (14.09)||46.19 (37.26)||74.70 (112.4)|
|xho||fon||30||25.80 (10.73)||34.87 (18.70)||65.96 (85.51)|
|xho||eng||90||30.25 (21.36)||55.12 (48.96)||62.11 (69.61)|
|xho||fra||30||29.45 (16.25)||45.72 (35.99)||61.03 (70.91)|
|yor||ibo||30||25.11 (15.00)||34.19 (26.75)||74.80 (97.44)|
|yor||swa||30||17.62 (4.81)||34.71 (28.23)||85.18 (130.22)|
|yor||xho||30||29.31 (14.43)||43.13 (32.34)||66.82 (87.2)|
|yor||kin||30||25.16 (14.65)||38.02 (32.22)||72.67 (86.99)|
|yor||fon||30||31.81 (10.28)||37.45 (17.52)||63.39 (88.57)|
|yor||eng||90||17.81 (2.11)||41.73 (22.90)||93.00 (93.49)|
|yor||fra||30||15.44 (5.62)||30.97 (22.81)||90.57 (136.96)|
|kin||ibo||30||31.25 (27.30)||42.36 (37.44)||66.73 (76.1)|
|kin||swa||30||33.65 (13.00)||46.34 (38.62)||72.70 (100.84)|
|kin||xho||30||20.40 (9.96)||39.71 (33.27)||89.97 (108.05)|
|kin||yor||30||18.34 (17.64)||33.53 (27.48)||70.43 (72.4)|
|kin||fon||30||22.43 (10.84)||32.49 (18.40)||67.26 (82.65)|
|kin||eng||60||15.82 (9.28)||43.10 (35.88)||96.55 (102.82)|
|kin||fra||30||16.23 (12.24)||33.51 (29.41)||91.82 (100.75)|
|fon||ibo||30||32.36 (16.24)||46.44 (31.18)||61.82 (83.54)|
|fon||swa||30||29.84 (17.08)||42.96 (35.26)||72.28 (88.37)|
|fon||xho||30||28.82 (13.59)||43.74 (31.80)||66.98 (93.27)|
|fon||yor||30||30.45 (22.17)||42.63 (30.52)||60.72 (70.91)|
|fon||kin||30||23.88 (10.08)||39.59 (28.96)||78.06 (91.81)|
|fon||eng||30||16.63 (13.67)||41.63 (30.36)||69.03 (83.57)|
|fon||fra||60||24.79 (17.31)||43.39 (33.39)||82.15 (82.97)|
|eng||ibo||90||44.24 (25.18)||54.89 (35.84)||63.92 (87.47)|
|eng||swa||60||49.94 (33.53)||61.45 (55.58)||47.83 (65.22)|
|eng||xho||120||31.97 (22.57)||49.74 (46.01)||72.89 (68.47)|
|eng||yor||90||23.93 (11.01)||36.19 (19.25)||84.05 (90.29)|
|eng||kin||90||40.98 (11.47)||56.00 (30.05)||76.37 (101.42)|
|eng||fon||30||27.19 (6.40)||36.86 (14.91)||62.54 (91.08)|
|fra||ibo||60||36.47 (18.26)||46.93 (28.72)||59.91 (86.05)|
|fra||swa||60||36.53 (20.72)||51.42 (46.35)||55.94 (66.86)|
|fra||xho||30||34.35 (21.49)||49.39 (43.36)||60.30 (72.39)|
|fra||yor||30||7.26 (7.88)||25.54 (19.59)||124.53 (121.17)|
|fra||kin||30||31.07 (17.24)||42.26 (37.63)||81.06 (95.45)|
|fra||fon||60||31.07 (10.82)||38.72 (21.10)||75.74 (93.33)|
- Participatory research for low-resourced machine translation: a case study in African languages. Findings of EMNLP.
- Parallel corpora for bi-lingual English-Ethiopian languages statistical machine translation. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, pp. 3102–3111.
- Benchmarking neural machine translation for Southern African languages. In Proceedings of the 2019 Workshop on Widening NLP, Florence, Italy, pp. 98–101.
- The effect of domain and diacritics in Yorùbá-English neural machine translation.
- MasakhaNER: named entity recognition for African languages. CoRR abs/2103.11811.
- JW300: a wide-coverage parallel corpus for low-resource languages. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 3204–3210.
- Massively multilingual neural machine translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 3874–3884.
- Massive vs. curated embeddings for low-resourced languages: the case of Yorùbá and Twi. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, pp. 2754–2762.
- The missing ingredient in zero-shot neural machine translation. CoRR abs/1903.07091.
- Massively multilingual neural machine translation in the wild: findings and challenges. CoRR abs/1907.05019.
- OpusTools and parallel corpus diagnostics. In Proceedings of the 12th Language Resources and Evaluation Conference, pp. 3782–3789.
- Africa's Ogun: old world and new. African Systems of Thought, Indiana University Press.
- Experiment tracking with Weights and Biases. Software available from wandb.com.
- Semi-supervised learning for neural machine translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 1965–1974.
- Unsupervised cross-lingual representation learning at scale.
- BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805.
- FFR v1.0: Fon-French neural machine translation. CoRR abs/2003.12111.
- Crowdsourced phrase-based tokenization for low-resourced neural machine translation: the case of Fon language. CoRR abs/2103.08052.
- Ethnologue: Languages of the World. Twenty-third edition. SIL International.
- Understanding back-translation at scale. CoRR abs/1808.09381.
- Igbo-English machine translation: an evaluation benchmark. CoRR abs/2004.00648.
- Beyond English-centric multilingual machine translation. CoRR abs/2010.11125.
- Multi-way, multilingual neural machine translation with a shared attention mechanism. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, pp. 866–875.
- Zero-resource translation with multi-lingual neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, pp. 268–277.
- The FLORES-101 evaluation benchmark for low-resource and multilingual machine translation.
- Improved zero-shot neural machine translation via ignoring spurious correlations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 1258–1268.
- Toward multilingual neural machine translation with universal encoder and decoder. CoRR abs/1611.04798.
- Iterative back-translation for neural machine translation. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, Melbourne, Australia, pp. 18–24.
- Google's multilingual neural machine translation system: enabling zero-shot translation. Transactions of the Association for Computational Linguistics 5, pp. 339–351.
- SentencePiece: a simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Brussels, Belgium, pp. 66–71.
- A comparison of transformer and recurrent neural networks on multilingual neural machine translation. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, pp. 641–652.
- Unsupervised machine translation using monolingual corpora only. CoRR abs/1711.00043. External Links: Cited by: item 1.
- Fixing weight decay regularization in adam. CoRR abs/1711.05101. External Links: Cited by: §5.
- A neural interlingua for multilingual machine translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, Brussels, Belgium, pp. 84–92. External Links: Cited by: §2.1.
- Compare-mt: A tool for holistic comparison of language generation systems. CoRR abs/1903.07926. External Links: Cited by: §5.0.1.
- Rapid adaptation of neural machine translation to new languages. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 875–880. External Links: Cited by: §2.1.
- KINNEWS and KIRNEWS: benchmarking cross-lingual text classification for Kinyarwanda and Kirundi. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain (Online), pp. 5507–5521. External Links: Cited by: Table 3.
- Low-resource neural machine translation for southern african languages. CoRR abs/2104.00366. External Links: Cited by: §1.
- Masakhane - machine translation for africa. CoRR abs/2003.11529. External Links: Cited by: §1.
- . In Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal, pp. 392–395. External Links: Cited by: §6.
- A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, Brussels, Belgium, pp. 186–191. External Links: Cited by: §6.
- Exploring the limits of transfer learning with a unified text-to-text transformer. CoRR abs/1910.10683. External Links: Cited by: item 2, §4.1.
- AfroMT: pretraining strategies and reproducible benchmarks for translation of 8 african languages. External Links: Cited by: 2nd item.
- Language modeling data for swahili. Zenodo. External Links: Cited by: Table 3.
- Leveraging monolingual data with self-supervision for multilingual neural machine translation. CoRR abs/2005.04816. External Links: Cited by: §2.1.
- Multilingual neural machine translation with language clustering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 963–973. External Links: Cited by: §2.1.
- Parallel data, tools and interfaces in OPUS. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, J. Odijk, and S. Piperidis (Eds.), Istanbul, Turkey.
- Rapid development of TTS corpora for four South African languages. In Proceedings of Interspeech 2017, Stockholm, Sweden, pp. 2178–2182.
- Attention is all you need. CoRR abs/1706.03762.
- A learning algorithm for continually running fully recurrent neural networks. Neural Computation 1(2), pp. 270–280.
- mT5: a massively multilingual pre-trained text-to-text transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, pp. 483–498.
- Improving massively multilingual neural machine translation and zero-shot translation. CoRR abs/2004.11867.
- Exploiting source-side monolingual data in neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, pp. 1535–1545.
- Transfer learning for low-resource neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, pp. 1568–1575.