Oracle-MNIST: a Realistic Image Dataset for Benchmarking Machine Learning Algorithms

05/19/2022
by Mei Wang, et al.

We introduce the Oracle-MNIST dataset, comprising 28×28 grayscale images of 30,222 ancient characters from 10 categories, for benchmarking pattern classification, with particular challenges in image noise and distortion. The training set consists of 27,222 images, and the test set contains 300 images per class. Oracle-MNIST shares the same data format as the original MNIST dataset, allowing direct compatibility with all existing classifiers and systems, but it constitutes a more challenging classification task than MNIST. The images of ancient characters suffer from 1) severe and unique noise caused by three thousand years of burial and aging and 2) dramatically varying writing styles among ancient Chinese writers, both of which make them realistic for machine learning research. The dataset is freely available at https://github.com/wm-bupt/oracle-mnist.
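Because the abstract states that Oracle-MNIST uses the same data format as the original MNIST, a reader for the MNIST IDX layout should in principle apply. The following is a minimal sketch under that assumption, not the authors' loader: the function names are illustrative, the path passed to `load_gz` is hypothetical, and gzip compression is assumed only because the original MNIST files are distributed that way. The repository's actual file names and layout are not given here.

```python
import gzip
import struct

# Magic numbers defined by the original MNIST IDX format.
IDX_IMAGE_MAGIC = 2051  # 0x00000803: unsigned bytes, 3 dimensions
IDX_LABEL_MAGIC = 2049  # 0x00000801: unsigned bytes, 1 dimension


def parse_idx_images(raw: bytes):
    """Parse an MNIST-style IDX image buffer into (count, rows, cols, pixels)."""
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != IDX_IMAGE_MAGIC:
        raise ValueError(f"unexpected image magic number: {magic}")
    pixels = raw[16:]
    if len(pixels) != count * rows * cols:
        raise ValueError("pixel payload does not match header dimensions")
    return count, rows, cols, pixels


def parse_idx_labels(raw: bytes):
    """Parse an MNIST-style IDX label buffer into a list of integer labels."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != IDX_LABEL_MAGIC:
        raise ValueError(f"unexpected label magic number: {magic}")
    labels = list(raw[8:])
    if len(labels) != count:
        raise ValueError("label payload does not match header count")
    return labels


def load_gz(path: str) -> bytes:
    """Read a gzip-compressed file; the path is supplied by the caller (hypothetical)."""
    with gzip.open(path, "rb") as handle:
        return handle.read()


# Split sizes as stated in the abstract: 27,222 training images plus
# 300 test images for each of the 10 classes.
TRAIN_SIZE = 27_222
TEST_SIZE = 10 * 300
assert TRAIN_SIZE + TEST_SIZE == 30_222
```

The final lines check the split arithmetic from the abstract: 27,222 training images and 10 × 300 = 3,000 test images account for all 30,222 characters.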


research
08/25/2017

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

We present Fashion-MNIST, a new dataset comprising of 28x28 grayscale im...
research
02/12/2022

Typography-MNIST (TMNIST): an MNIST-Style Image Dataset to Categorize Glyphs and Font-Styles

We present Typography-MNIST (TMNIST), a dataset comprising of 565,292 MN...
research
01/17/2022

OmniPrint: A Configurable Printed Character Synthesizer

We introduce OmniPrint, a synthetic data generator of isolated printed c...
research
06/22/2022

The ArtBench Dataset: Benchmarking Generative Models with Artworks

We introduce ArtBench-10, the first class-balanced, high-quality, cleanl...
research
01/14/2021

OrigamiSet1.0: Two New Datasets for Origami Classification and Difficulty Estimation

Origami is becoming more and more relevant to research. However, there i...
research
05/13/2022

Unsupervised Structure-Texture Separation Network for Oracle Character Recognition

Oracle bone script is the earliest-known Chinese writing system of the S...
research
07/21/2020

Enhancement of damaged-image prediction through Cahn-Hilliard Image Inpainting

We assess the benefit of including an image inpainting filter before pas...
