Image Pre-processing on NumtaDB for Bengali Handwritten Digit Recognition

08/18/2020
by Ovi Paul, et al.

NumtaDB is by far the largest dataset of handwritten Bengali digits. It is a diverse collection containing more than 85,000 images, but this diversity also makes it very difficult to work with. The goal of this paper is to establish a benchmark of pre-processed images that gives good accuracy with any machine learning model. The motivation is that, unlike MNIST for English digits, no pre-processed data is available for Bengali digit recognition.
