In 2016, we introduced Open Images, a collaborative release of ~9 million images annotated with labels spanning thousands of object categories. Since its initial release, we've been hard at work updating and refining the dataset, in order to provide a useful resource for the computer vision community to develop new models.
Today, we are happy to announce Open Images V4, containing 15.4M bounding boxes for 600 categories on 1.9M images, making it the largest existing dataset with object location annotations. The boxes have been largely manually drawn by professional annotators to ensure accuracy and consistency. The images are very diverse and often contain complex scenes with several objects (8 per image on average; see the dataset visualizer).
Annotated images from the Open Images dataset. Left: Mark Paul Gosselaar plays the guitar by Rhys A. Right: Civilization by Paul Downey. Both images used under the CC BY 2.0 license.
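The "8 per image on average" figure follows directly from the headline counts (15.4M boxes over 1.9M images). As a minimal sketch of how such a statistic could be checked against a box annotation file, the snippet below counts boxes per image in a tiny, made-up CSV. The column names (`ImageID`, `LabelName`, `XMin`, ...), the image IDs, and the label strings are illustrative assumptions, not a description of the released file format.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature annotation table. Column names and values are
# assumptions made for illustration; the released files may differ.
SAMPLE_CSV = """ImageID,LabelName,XMin,XMax,YMin,YMax
img_a,guitar,0.10,0.55,0.30,0.90
img_a,person,0.05,0.80,0.02,0.98
img_a,microphone,0.60,0.70,0.20,0.40
img_b,snowman,0.25,0.75,0.10,0.95
"""


def boxes_per_image(csv_text):
    """Return a Counter mapping each image ID to its number of boxes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["ImageID"] for row in reader)


def mean_boxes_per_image(counts):
    """Average number of boxes over all images that have at least one box."""
    return sum(counts.values()) / len(counts)


counts = boxes_per_image(SAMPLE_CSV)
print(dict(counts))                  # per-image box counts for the sample
print(mean_boxes_per_image(counts))  # average for the sample

# The same ratio computed from the headline numbers in the announcement.
print(round(15.4e6 / 1.9e6, 1))
```

Note that this average only covers images with at least one box, which matches how the headline ratio is formed (boxes divided by annotated images).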
The accompanying Open Images Challenge is unique in several ways:
- 12.2M bounding-box annotations for 500 categories on 1.7M training images.
- A broader range of categories than previous detection challenges, including new objects such as “fedora” and “snowman”.
- In addition to the main object detection track, the challenge includes a Visual Relationship Detection track, which focuses on detecting pairs of objects in particular relations, e.g., “woman playing guitar”.
In addition to the above, Open Images V4 also contains 30.1M human-verified image-level labels for 19,794 categories, which are not part of the Challenge. The dataset includes 5.5M image-level labels generated by tens of thousands of users from all over the world at crowdsource.google.com.