The Effect of Improving Annotation Quality on Object Detection Datasets: A Preliminary Study

In this study, we partially reannotate conventional benchmark datasets for object detection and examine whether detection performance improves or drops relative to the original annotations.

Recent studies on the annotation quality of ImageNet for image classification revealed issues with accurately assigning a single label to each image. Object detection, on the other hand, poses additional nontrivial challenges: a single image contains multiple objects, and maintaining consistency among bounding boxes is difficult.

We formed a team of professional annotators for the MS COCO and Google Open Images datasets. To achieve highly consistent annotations, we prepared carefully designed guidelines for each category and appointed quality inspectors who checked each annotator's work. Finally, we applied conventional object detection methods to the reannotated parts of each dataset. The results were mixed: whether performance dropped or improved depended on the category and dataset.
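Consistency between an original and a reannotated set of bounding boxes can be quantified with intersection-over-union (IoU), the standard overlap measure in object detection evaluation. The sketch below is illustrative only, not the pipeline used in the study; the `[x, y, width, height]` box format (as in MS COCO) and the 0.5 matching threshold are assumptions.

```python
def iou(box_a, box_b):
    """IoU of two boxes in [x, y, width, height] (COCO-style) format."""
    ax1, ay1 = box_a[0], box_a[1]
    ax2, ay2 = ax1 + box_a[2], ay1 + box_a[3]
    bx1, by1 = box_b[0], box_b[1]
    bx2, by2 = bx1 + box_b[2], by1 + box_b[3]
    # Intersection rectangle; width/height clamp to zero when boxes do not overlap.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def match_rate(original, reannotated, thresh=0.5):
    """Fraction of original boxes that overlap some reannotated box at IoU >= thresh."""
    if not original:
        return 0.0
    matched = sum(
        1 for o in original
        if any(iou(o, r) >= thresh for r in reannotated)
    )
    return matched / len(original)
```

For example, if the original image had two boxes but reviewers kept only one (slightly adjusted), `match_rate([[10, 10, 50, 50], [100, 100, 30, 30]], [[12, 11, 48, 50]])` returns 0.5, signalling that half of the original annotations survived the quality check.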


If you would like to download the dataset, please fill out the form below with your contact information.


  • Image annotation

    We have been providing image annotation services since 2015, including bounding boxes, polygons, semantic segmentation, and keypoints for images, as well as video annotation. We have also undertaken many multimodal projects, such as captioning videos and still images. In addition to annotation, we train and evaluate models and provide the Baobab AutoML Vision Report, an assessment report that helps you improve your data.

  • Audio transcription/annotation

    Transcription and tagging of audio using ELAN and other tools.
    We provide this service in multiple languages, including Japanese, English, and Chinese.

  • Text annotation

    Tagging, classification, pronoun extraction, etc.

  • Construction of training data for machine translation systems

    Since Baobab's inception, we have worked closely with research institutions and universities that develop machine translation systems. For these partners, we have completed many multi-million-character projects, creating bilingual training data faster and at more reasonable prices than anywhere else.

  • Bilingual scenario creation in multiple languages

    We create audio data of native speakers reading scripted conversations based on your required settings or predetermined scenarios, or holding simulated free conversations between two speakers, and deliver written transcriptions in your desired format.

  • Image collection/sound collection

    Our partners around the world, working remotely, collect images, multilingual speech, and other sounds using the Moringa mobile apps developed by Baobab.