Kannada-MNIST: A new handwritten digits dataset for the Kannada language

08/03/2019
by Vinay Uday Prabhu, et al.

In this paper, we disseminate a new handwritten digits dataset, termed Kannada-MNIST, for the Kannada script, which can potentially serve as a direct drop-in replacement for the original MNIST dataset. In addition, we disseminate a second real-world handwritten dataset (with 10k images), which we term the Dig-MNIST dataset, that can serve as an out-of-domain test dataset. We also open source all the code as well as the raw scanned images along with the scanner settings, so that researchers who want to try out different signal processing pipelines can perform end-to-end comparisons. We provide high-level morphological comparisons with the MNIST dataset and baseline accuracies for the datasets disseminated. The initial baselines obtained using an oft-used CNN architecture (96.8% for the main test set and 76.1% for the Dig-MNIST test set) indicate that these datasets pose a sterner challenge with regard to generalizability than the MNIST or KMNIST datasets. We also hope this dissemination will spur the creation of similar datasets for all the languages that use different symbols for the numeral digits.


research
04/08/2020

MNIST-MIX: A Multi-language Handwritten Digit Recognition Dataset

In this letter, we contribute a multi-language handwritten digit recogni...
research
05/25/2019

Cold Case: The Lost MNIST Digits

Although the popular MNIST dataset [LeCun et al., 1994] is derived from ...
research
09/28/2020

Afro-MNIST: Synthetic generation of MNIST-style datasets for low-resource languages

We present Afro-MNIST, a set of synthetic MNIST-style datasets for four ...
research
04/27/2022

An Improved Nearest Neighbour Classifier

A windowed version of the Nearest Neighbour (WNN) classifier for images ...
research
03/30/2010

Recognition of handwritten Roman Numerals using Tesseract open source OCR engine

The objective of the paper is to recognize handwritten samples of Roman ...
research
11/02/2021

Graph Tree Deductive Networks

In this paper, we introduce Graph Tree Deductive Networks, a network tha...
research
11/15/2021

Tensor network to learn the wavefunction of data

How many different ways are there to handwrite digit 3? To quantify this...
