Afro-MNIST: Synthetic generation of MNIST-style datasets for low-resource languages

09/28/2020
by Daniel J Wu, et al.

We present Afro-MNIST, a set of synthetic MNIST-style datasets for four orthographies used in Afro-Asiatic and Niger-Congo languages: Ge'ez (Ethiopic), Vai, Osmanya, and N'Ko. These datasets serve as "drop-in" replacements for MNIST. We also describe and open-source a method for generating synthetic MNIST-style datasets from a single example of each digit. The datasets are available at https://github.com/Daniel-Wu/AfroMNIST. We hope that MNIST-style datasets will be developed for other numeral systems, and that these datasets will vitalize machine learning education in nations that are underrepresented in the research community.
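
To make the idea of generating a dataset from a single example per digit concrete, here is a minimal sketch. It is not the authors' method, which the abstract does not detail; it assumes a simple perturbation scheme (random pixel shifts plus Gaussian noise) purely for illustration. The function names make_variants and build_dataset, and the parameters max_shift and noise_std, are hypothetical. The output mirrors MNIST's layout (28x28 uint8 grayscale images with integer labels), which is what a "drop-in" replacement would need.

```python
import numpy as np


def make_variants(exemplar, n, max_shift=3, noise_std=0.05, seed=0):
    """Return n perturbed copies of one 28x28 exemplar with values in [0, 1].

    Assumption: perturbation = random translation + additive Gaussian noise.
    This is an illustrative stand-in, not the published generation method.
    """
    rng = np.random.default_rng(seed)
    h, w = exemplar.shape
    out = np.zeros((n, h, w), dtype=np.float32)
    for i in range(n):
        dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
        shifted = np.roll(exemplar, shift=(dy, dx), axis=(0, 1))
        # np.roll wraps pixels around the border; blank the wrapped strips.
        if dy > 0:
            shifted[:dy, :] = 0
        elif dy < 0:
            shifted[dy:, :] = 0
        if dx > 0:
            shifted[:, :dx] = 0
        elif dx < 0:
            shifted[:, dx:] = 0
        noisy = shifted + rng.normal(0.0, noise_std, size=(h, w))
        out[i] = np.clip(noisy, 0.0, 1.0)
    return out


def build_dataset(exemplars, per_class, seed=0):
    """Stack variants of each exemplar into MNIST-shaped arrays (images, labels)."""
    images, labels = [], []
    for label, exemplar in enumerate(exemplars):
        images.append(make_variants(exemplar, per_class, seed=seed + label))
        labels.append(np.full(per_class, label, dtype=np.uint8))
    x = np.concatenate(images)
    y = np.concatenate(labels)
    # Convert to 8-bit grayscale, matching MNIST's pixel format.
    return (x * 255).round().astype(np.uint8), y
```

Given ten exemplar arrays, one per digit, build_dataset(exemplars, per_class=6000) would return 60,000 images, the size of the MNIST training split. A realistic generator would likely use richer distortions than shifts and noise.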

research
02/12/2022

Typography-MNIST (TMNIST): an MNIST-Style Image Dataset to Categorize Glyphs and Font-Styles

We present Typography-MNIST (TMNIST), a dataset comprising of 565,292 MN...
research
08/25/2017

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

We present Fashion-MNIST, a new dataset comprising of 28x28 grayscale im...
research
08/03/2019

Kannada-MNIST: A new handwritten digits dataset for the Kannada language

In this paper, we disseminate a new handwritten digits-dataset, termed K...
research
01/17/2022

OmniPrint: A Configurable Printed Character Synthesizer

We introduce OmniPrint, a synthetic data generator of isolated printed c...
research
06/29/2021

SDL: New data generation tools for full-level annotated document layout

We present a novel data generation tool for document processing. The too...
research
12/03/2018

Deep Learning for Classical Japanese Literature

Much of machine learning research focuses on producing models which perf...
research
05/25/2019

Cold Case: The Lost MNIST Digits

Although the popular MNIST dataset [LeCun et al., 1994] is derived from ...
