Image Pre-processing on NumtaDB for Bengali Handwritten Digit Recognition

by Ovi Paul, et al.

NumtaDB is by far the largest dataset of handwritten Bengali digits. It is a diverse dataset containing more than 85,000 images, but this diversity also makes it very difficult to work with. The goal of this paper is to find a benchmark for pre-processed images that gives good accuracy with any machine learning model. The motivation is that, unlike English digits with MNIST, no pre-processed data is available for Bengali digit recognition.




