Multi-label Classification of Common Bengali Handwritten Graphemes: Dataset and Challenge

10/01/2020
by Samiul Alam, et al.

Latin has historically led the state of the art in handwritten optical character recognition (OCR) research. Adapting existing systems from Latin to alpha-syllabary languages is particularly challenging because of the sharp contrast between their orthographies. Segmenting the graphical constituents that correspond to characters becomes significantly harder because of the cursive writing system and the frequent use of diacritics in the alpha-syllabary family of languages. We propose a labeling scheme based on graphemes (linguistic segments of word formation) that makes segmentation inside alpha-syllabary words linear, and we present the first dataset of Bengali handwritten graphemes that are commonly used in an everyday context. The dataset is open-sourced as part of the Bengali.AI Handwritten Grapheme Classification Challenge on Kaggle to benchmark vision algorithms for multi-label grapheme classification. From the competition proceedings, we see that deep learning methods can generalize to a large span of uncommon graphemes even when those graphemes are absent during training.
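The following is a minimal sketch, not the authors' code, of why a multi-label scheme can in principle cover graphemes that never appear in training. It assumes each grapheme is labeled by three integer components (a root, a vowel diacritic, and a consonant diacritic); the abstract does not spell out this decomposition, and the class name, helper functions, and toy label values below are hypothetical.

```python
from typing import List, NamedTuple


class GraphemeLabel(NamedTuple):
    """Assumed multi-label target: one class index per component."""
    root: int
    vowel_diacritic: int
    consonant_diacritic: int


def unseen_combinations(train: List[GraphemeLabel],
                        test: List[GraphemeLabel]) -> List[GraphemeLabel]:
    """Return test graphemes whose full combination never occurs in train."""
    seen = set(train)
    return [g for g in test if g not in seen]


def components_covered(train: List[GraphemeLabel],
                       label: GraphemeLabel) -> bool:
    """True if every component of `label` occurs somewhere in train,
    even if the combination as a whole does not."""
    roots = {g.root for g in train}
    vowels = {g.vowel_diacritic for g in train}
    consonants = {g.consonant_diacritic for g in train}
    return (label.root in roots
            and label.vowel_diacritic in vowels
            and label.consonant_diacritic in consonants)


# Toy labels (hypothetical indices, not taken from the dataset).
train = [GraphemeLabel(0, 1, 0), GraphemeLabel(1, 0, 0), GraphemeLabel(1, 2, 1)]
test = [GraphemeLabel(0, 2, 1), GraphemeLabel(1, 0, 0)]

novel = unseen_combinations(train, test)
for g in novel:
    print(g, "components seen in training:", components_covered(train, g))
```

Under this assumption, a classifier with one output head per component can predict a combination it was never shown, provided each component was seen individually. A single-label classifier over whole graphemes has no class for such a combination.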

research
02/10/2014

Handwritten Character Recognition In Malayalam Scripts- A Review

Handwritten character recognition is one of the most challenging and ong...
research
02/14/2012

Segmentation of Offline Handwritten Bengali Script

Character segmentation has long been one of the most critical areas of o...
research
11/16/2021

Bengali Handwritten Grapheme Classification: Deep Learning Approach

Despite being one of the most spoken languages in the world (6^th based ...
research
02/22/2017

BanglaLekha-Isolated: A Comprehensive Bangla Handwritten Character Dataset

Bangla handwriting recognition is becoming a very important issue nowada...
research
03/21/2019

Semantic Comparison of State-of-the-Art Deep Learning Methods for Image Multi-Label Classification

Image understanding relies heavily on accurate multi-label classificatio...
research
11/15/2020

BanglaWriting: A multi-purpose offline Bangla handwriting dataset

This article presents a Bangla handwriting dataset named BanglaWriting t...
research
10/10/2018

AI Learns to Recognize Bengali Handwritten Digits: Bengali.AI Computer Vision Challenge 2018

Solving problems with Artificial intelligence in a competitive manner ha...
