DDI-100: Dataset for Text Detection and Recognition

12/25/2019
by Ilia Zharikov, et al.

Document analysis and recognition remain challenging tasks, yet only a few datasets designed for text detection (TD) and optical character recognition (OCR) exist. In this paper we present the Distorted Document Images dataset (DDI-100) and demonstrate its usefulness in a wide range of document analysis problems. DDI-100 is a synthetic dataset based on 7,000 real unique document pages and consists of more than 100,000 augmented images. The ground truth comprises text and stamp masks, as well as text and character bounding boxes with relevant annotations. DDI-100 was validated using several TD and OCR models that show high-quality performance on real data.

research
01/28/2022

Self-paced learning to improve text row detection in historical documents with missing labels

An important preliminary step of optical character recognition systems i...
research
09/11/2015

OCR accuracy improvement on document images through a novel pre-processing approach

Digital camera and mobile document image acquisition are new trends aris...
research
07/11/2016

Benchmark for License Plate Character Segmentation

Automatic License Plate Recognition (ALPR) has been the focus of many re...
research
07/02/2019

Brno Mobile OCR Dataset

We introduce the Brno Mobile OCR Dataset (B-MOD) for document Optical Ch...
research
03/15/2020

Multistage Curvilinear Coordinate Transform Based Document Image Dewarping using a Novel Quality Estimator

The present work demonstrates a fast and improved technique for dewarpin...
research
11/15/2020

BanglaWriting: A multi-purpose offline Bangla handwriting dataset

This article presents a Bangla handwriting dataset named BanglaWriting t...
research
03/16/2023

ShabbyPages: A Reproducible Document Denoising and Binarization Dataset

Document denoising and binarization are fundamental problems in the docu...
