1. Overview

In this article, we’ll present some popular datasets in the field of computer vision. First, we’ll discuss the importance of having large-scale open-source datasets in computer vision. Then, we’ll talk about three popular datasets that are ImageNet, MS COCO, and Google Open images. Finally, we’ll briefly mention the Kaggle datasets.

2. Datasets: Importance and Problems

Computer vision is considered one of the main fields where deep learning achieved excellent results and revolutionalized the way scientists develop algorithms. However, deep learning models need more and more data in order to work properly and achieve these results. As a result, the development of large-scale open-source datasets has become a necessity in computer vision in recent years.

Hopefully, more and more computer vision datasets are created every day for a variety of visual tasks like object detection, segmentation, classification, captioning, pose estimation, etc. In this article, we’ll present three of them that are widely used by many companies, researchers, and individuals.

3. ImageNet

ImageNet is one of the most important datasets in computer vision since it is the first open-source large-scale dataset that was widely used for deep learning algorithms. It was developed by a group of researchers at Stanford, Princeton, and UNC Chapel Hill.

The initial idea of ImageNet was to create a WordNet (that is a large lexical database of English) for images. For each concept of WordNet, the researchers collected around one thousand images by querying search engines. The quality of these images was then evaluated by Amazon Mechanical Turk.

The large success of ImageNet came up when deep learning for computer vision gained popularity, and a large dataset like ImageNet was a necessity. Specifically, ImageNet Large Scale Visual Recognition Challenge is a subset of the ImageNet dataset that is considered the most popular benchmark for image classification algorithms. Every classification algorithm that is proposed should be evaluated on this benchmark before being published. It consists of around 1 million training, half a million validation, and 100 thousand test images for 1000 object classes.

In the image below, we can some example images from ImageNet in increasing semantic specificity of WordNet:

imagenet

4. MS COCO

Another very popular dataset for computer vision is MS COCO which stands for Common Objects in Context and was developed by Microsoft. It is a large-scale dataset that can be used for object detection, segmentation, and captioning.

It is considered the main benchmark for evaluating object detection algorithms. Also, its large popularity lies in the fact that it contains images that are annotated for many different tasks. Specifically, a COCO image contains the following annotations:

  • Coordinates of bounding boxes and segmentation masks for object detection.
  • Natural language descriptions for captioning.
  • Keypoints of humans that are present in the image.
  • full scene segmentation with 80 categories that correspond to things (like bicycles, cats, etc) and 91 categories that have to do with stuff (like the sky, the road, etc).
  • Dense pose of humans.

Also, an important characteristic of COCO is that the objects in the images are in a natural context, as the name suggests, which means that they appear in real-life scenes where sometimes objects are occluded. So, it is a challenging dataset that is appropriate for evaluating a deep learning algorithm that should work in real-life scenarios.

Below, we can see some example images of the COCO dataset for each type of annotation:
coco

5. Google Open Images

The third dataset that we will discuss in this article is Google Open Images which was created by Google. It consists of around 9 million images that are annotated with more than 6000 classes. Google Open Images gained a lot of popularity due to its large variety of classes, contrary to ImageNet, which contains 1000 classes.

Another characteristic of Google Open Images is the fact that each image has about 8 different labels assigned. For example, we can see that the two below images have various labels since they contain more than one object:
open images

6. Kaggle Datasets

The three above datasets are very popular and cover a variety of tasks. However, the number of different visual tasks that exist is much greater. So, where can someone search for a dataset that fits his specific needs?

Kaggle dataset is a place where we can search and explore over 100 thousand datasets. Also, there is a specific computer vision section where 1,758 visual datasets exist. A very useful feature of these datasets is the fact that for every dataset, there are notebooks of people that worked on it and trained ML models. This acts as a very good starting point when someone wants to experiment with a new dataset.

7. Conclusion

In this tutorial, we talked about three popular datasets in computer vision, namely ImageNet, MS COCO, and Google Open Images. Then, we also mentioned Kaggle datasets for those who want to experiment with other datasets.