What is MNIST?
The MNIST (Modified National Institute of Standards and Technology) database, a modified subset of the larger NIST database, is a low-complexity collection of handwritten digits used to train and test supervised machine learning algorithms. The database contains 70,000 grayscale images of the digits zero through nine, each 28x28 pixels. The images are split into two subsets: 60,000 belong to the training set and 10,000 belong to the testing set. Keeping the testing images separate means a trained model is evaluated on digits it never saw during training, which shows how accurately it classifies new, previously unseen images.
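The 60,000/10,000 split described above can be sketched in a few lines of Python. This is a minimal sketch under a stated assumption: the 70,000 labelled images arrive as one sequence with the 60,000 training examples first and the 10,000 testing examples last. The function `split_mnist` and its constants are hypothetical names for illustration, not part of any library. Each 28x28 image flattens to 28 × 28 = 784 pixel values.

```python
from typing import Sequence, Tuple

IMAGE_SIDE = 28                     # every MNIST image is 28x28 pixels
PIXELS_PER_IMAGE = IMAGE_SIDE ** 2  # 784 grayscale values per flattened image
N_TRAIN = 60_000                    # size of the training set
N_TEST = 10_000                     # size of the testing set

Split = Tuple[Sequence[Sequence[int]], Sequence[int]]


def split_mnist(images: Sequence[Sequence[int]], labels: Sequence[int]) -> Tuple[Split, Split]:
    """Divide the 70,000 examples into the training and testing subsets.

    Assumption (hypothetical layout): the 60,000 training examples come
    first, followed by the 10,000 testing examples.
    """
    if len(images) != N_TRAIN + N_TEST or len(labels) != len(images):
        raise ValueError("expected 70,000 images with one label each")
    if any(len(image) != PIXELS_PER_IMAGE for image in images):
        raise ValueError("each image should hold 28 * 28 = 784 pixel values")
    train = (images[:N_TRAIN], labels[:N_TRAIN])
    test = (images[N_TRAIN:], labels[N_TRAIN:])  # held out from training
    return train, test
```

A model would be fit only on the `train` pair, while the `test` pair is used once at the end to estimate accuracy on unseen digits.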
As can be seen from the image above, the handwritten digits vary in style and complexity. For example, the first column contains three 3s, each with distinct defining characteristics, and these differ again from the 3s in column five. This variety helps an appropriately trained model become robust to different handwriting styles, which is reflected in its accuracy on the testing data, reported as high as 99.65%.
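Test accuracy is the fraction of the 10,000 testing images whose predicted label matches the true label, so 99.65% accuracy corresponds to 10,000 × (1 − 0.9965) = 35 misclassified digits. The Python sketch below makes that arithmetic explicit; the helper names `accuracy` and `misclassified_count` are hypothetical, and the predictions in the usage lines are made up for illustration rather than produced by a real model.

```python
from typing import Sequence

N_TEST = 10_000  # number of images in the MNIST testing set


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of examples whose predicted digit equals the true digit."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def misclassified_count(acc: float, n_test: int = N_TEST) -> int:
    """Number of wrongly labelled test images implied by an accuracy value."""
    return round(n_test * (1 - acc))


# Illustrative usage with made-up predictions: exactly 35 digits are wrong.
actual = [i % 10 for i in range(N_TEST)]
predicted = list(actual)
for i in range(35):
    predicted[i] = (predicted[i] + 1) % 10  # force a wrong label

print(accuracy(predicted, actual))  # 0.9965
print(misclassified_count(0.9965))  # 35
```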
Purpose of the Database and Its Applications
In simple terms, MNIST can be thought of as the “Hello, World!” of machine learning. It is primarily used to experiment with different machine learning algorithms and to compare their relative strengths. Yann LeCun, one of the three researchers behind the creation of MNIST, has devoted part of his research to using MNIST to experiment with cutting-edge algorithms, as can be seen on his personal website, yann.lecun.com. Researchers, hobbyists, and students alike continue to use MNIST, alongside other popular datasets, both to solidify their understanding of fundamental machine learning concepts and to compare new algorithms against existing cutting-edge research.