OCR Synthetic Benchmark Dataset for Indic Languages

05/05/2022
by Naresh Saini, et al.

We present the largest publicly available synthetic OCR benchmark dataset for Indic languages. The collection contains a total of 90k images and their ground truth for 23 Indic languages. Validating OCR models for Indic languages requires a large amount of diverse data in order to build a robust and reliable model. Collecting that much data would otherwise be difficult, but synthetic generation makes it far easier. This is of great importance to fields such as computer vision and image processing, where model creation becomes easier once an initial synthetic dataset has been developed. Generating synthetic data also offers the flexibility to adjust its nature and environment as and when required to improve model performance. Obtaining accurate labels for real-world data is sometimes quite expensive, whereas accurate labels for synthetic data are easy to obtain.

