COCO Dataset

What is the COCO Dataset?

The Common Objects in Context (COCO) dataset is one of the most popular open-source object recognition datasets used to train deep learning models. It includes hundreds of thousands of images with millions of pre-labeled objects for training.

Arguably the most important element of supervised machine learning is access to a large, well-documented dataset to learn from. Sponsored by Microsoft, COCO segments images into categories and objects, while also providing machine-readable captions and tags that describe each image's context. All of this drastically cuts down on the basic training time for any AI that needs to process images.
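
To make the idea of machine-readable labels concrete, here is a minimal sketch of how object annotations in the COCO detection layout might be read. It assumes the commonly documented JSON structure with "images", "categories", and "annotations" lists, where each bounding box is stored as [x, y, width, height]. The sample record, file name, ids, and the helper name objects_by_image are made up for illustration and are not taken from the real dataset files.

```python
import json
from collections import defaultdict

# A tiny, made-up annotation document in the COCO object-detection layout.
# The file name, ids, and box values are illustrative only.
SAMPLE = """
{
  "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}],
  "annotations": [
    {"id": 101, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 200]},
    {"id": 102, "image_id": 1, "category_id": 18, "bbox": [300, 250, 80, 60]}
  ]
}
"""


def objects_by_image(coco):
    """Group labeled objects by image file name (hypothetical helper)."""
    # Lookup tables from numeric ids to human-readable names.
    category_names = {c["id"]: c["name"] for c in coco["categories"]}
    file_names = {i["id"]: i["file_name"] for i in coco["images"]}

    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        # Assumed box layout: [x, y, width, height]; convert to corner form.
        x, y, w, h = ann["bbox"]
        corners = (x, y, x + w, y + h)
        grouped[file_names[ann["image_id"]]].append(
            (category_names[ann["category_id"]], corners)
        )
    return dict(grouped)


result = objects_by_image(json.loads(SAMPLE))
for file_name, objects in result.items():
    print(file_name, objects)
```

Because the labels already exist in this structured form, a training pipeline can map each image to its objects with a few dictionary lookups instead of requiring manual annotation.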


Other Popular Machine Learning Datasets:

  • MNIST - MNIST is a dataset of handwritten digits, with a training set of 60,000 examples and a test set of 10,000 examples.
  • ImageNet - ImageNet is a dataset of images organized according to the WordNet hierarchy. WordNet contains more than 100,000 synsets (concepts described by words or phrases), and ImageNet aims to provide about 1,000 images on average to illustrate each one.
  • Open Images Dataset - Open Images is a dataset of almost 9 million image URLs annotated with labels and bounding boxes spanning thousands of classes.
  • VisualQA - VQA (Visual Question Answering) is a dataset containing open-ended questions about images.