OmniPrint: A Configurable Printed Character Synthesizer

01/17/2022
by Haozhe Sun, et al.

We introduce OmniPrint, a synthetic data generator of isolated printed characters, geared toward machine learning research. It draws inspiration from well-known datasets such as MNIST, SVHN and Omniglot, but can generate a wide variety of printed characters from various languages, fonts and styles, with customized distortions. We include 935 fonts from 27 scripts and many types of distortions. As a proof of concept, we show various use cases, including an example of a meta-learning dataset designed for the upcoming MetaDL NeurIPS 2021 competition. OmniPrint is available at https://github.com/SunHaozhe/OmniPrint.


