Baobab is not a crowdsourcing company.
Our ability to deliver the highest-quality deliverables in the world on tight schedules is owed entirely to our partners (Baoparts), who constantly strive for the best quality.
We make sure Baoparts receive the training they need for each project they undertake.
Baobab's greatest strength is the professionalism of this unique community, which is imbued with a culture of mutual assistance and respect for differences, and whose members are thorough and polite in their communication.

Services provided

LLM development/data set creation for fine-tuning/RLHF/model evaluation

A comprehensive service covering everything from the development of large language models (LLMs) by experts with extensive knowledge and experience in natural language AI development, to the creation of data sets for fine-tuning, which is essential for improving model accuracy, reinforcement learning from human feedback (RLHF) to maximise model performance, and evaluation of the resulting model.

Image annotation

Since 2015 we have provided image annotation services, including bounding box, polygon, semantic segmentation and keypoint annotation, as well as video annotation. We have also undertaken many multimodal projects, such as captioning videos and still images. In addition to annotation, we train and evaluate models, and provide the Baobab AutoML Vision Report, an assessment report to help improve your data.
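
As a rough illustration of the annotation types listed above, a single annotated object might be represented as in the sketch below. The field layout borrows the publicly documented COCO conventions (box as [x, y, width, height], keypoints as [x, y, visibility] triples); the class and field names are assumptions made for this example and do not describe Baobab's actual delivery format. Semantic segmentation, which is usually delivered as a per-pixel mask, is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObjectAnnotation:
    """One annotated object in an image (hypothetical, COCO-style layout)."""

    image_id: int
    label: str
    # Bounding box as [x, y, width, height] in pixels, origin at top-left.
    bbox: Optional[List[float]] = None
    # Polygon outline as a flat list [x1, y1, x2, y2, ...].
    polygon: Optional[List[float]] = None
    # Keypoints as flat triples [x, y, visibility, ...];
    # visibility: 0 = not labelled, 1 = labelled but hidden, 2 = visible.
    keypoints: List[float] = field(default_factory=list)

    def num_keypoints(self) -> int:
        """Count keypoints that were actually labelled (visibility > 0)."""
        return sum(1 for v in self.keypoints[2::3] if v > 0)

    def bbox_area(self) -> float:
        """Area of the bounding box in square pixels (0.0 if there is no box)."""
        if self.bbox is None:
            return 0.0
        _, _, width, height = self.bbox
        return width * height


example = ObjectAnnotation(
    image_id=1,
    label="person",
    bbox=[10.0, 20.0, 50.0, 100.0],
    keypoints=[30.0, 40.0, 2, 35.0, 60.0, 1, 0.0, 0.0, 0],
)
print(example.num_keypoints(), example.bbox_area())
```

Keeping the box, polygon and keypoints on one record makes it straightforward to deliver the same object in whichever format a project's guidelines call for.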

Audio transcription/annotation

Transcription and tagging of audio using ELAN and other tools.
We provide this service in multiple languages including Japanese, English and Chinese.

Text annotation

Tagging, classification, pronoun extraction, etc.

Construction of training data for machine translation systems

Since Baobab's inception, we have worked closely with research institutions and universities developing machine translation systems. For them we have handled many multi-million-character projects, creating bilingual training data faster and at a more reasonable price than anywhere else.

Bilingual scenario creation in multiple languages

We record native speakers reading scripted conversations based on your required settings or predetermined scenarios, or holding simulated free conversations between two speakers, and deliver written transcriptions in your desired format.

Image collection/sound collection

Working remotely, our partners around the world collect images, multilingual speech and other sounds using the Moringa mobile apps developed by Baobab.

Why Baobab?

Baobab's annotation work has been praised by clients around the world for the following reasons:

1. Construction of high-quality data

  • We have established sophisticated systems, organisations and workflows to ensure high-quality deliverables. For each project, the project manager (Baocaptain) writes exact project specifications and clear guidelines for partners (Baoparts) and maintains polite and effective communication with both clients and Baoparts, while the quality assurance manager performs daily progress and data checks and carries out a thorough quality check before delivery.

2. Speedy delivery

  • Our proprietary annotation tools are constantly updated in response to feedback from our partners (Baoparts) to facilitate the speedy creation of accurate data. We work together with Baoparts every day to improve productivity. Our system is developed in-house and is also secure.

3. Small-lot test orders

  • If you would like a preview with a small amount of data before ordering a large data set, we have you covered. We gladly accept small-lot test orders. Similarly, if you wish to run a rapid plan-do-check-act (PDCA) cycle to build a prototype, we can respond to your needs as necessary.

4. The diversity of working partners

  • We have multinational partners (Baoparts) located globally, providing multilingual voice data collection and translation services. This means we are able to respond immediately to large-scale, worldwide projects, as there is no need to create a new team.
    Baopart locations include Japan, Vietnam, Thailand, China, Taiwan, English-speaking locations and elsewhere.

The Secret to Quality

The key to being able to speedily deliver high-quality data lies in Baobab's company structure, which is as follows:

  • 1. A unique training program aimed at quality

    At Baobab, all staff involved in any project (project captains, checkers, leaders, partners) learn the importance of annotation work and receive training to acquire the skills required to produce quality work.
    At the end of training, staff are assessed, and only those who pass are allowed to participate in a project.

  • 2. Assigning and training the right people

    Annotation guidelines and elements to be annotated vary widely from project to project. For instance, even when the elements are the same, different guidelines can change the work entirely.
    Whenever we assign work, we first take partners' (Baoparts) skill levels into account. Each Baopart is then required to undergo project-specific training before participating, with participation in the actual project dependent on the successful completion of assessment tests.

  • 3. Internal communication to ensure high quality

    The leader and checkers of each project provide partners (Baoparts) with full support throughout the entire process, from training to delivery. They are there to provide an environment that allows Baoparts to work with confidence: responding to questions promptly (within 24 hours) and revising guidelines when necessary.
    Our culture is such that when issues do arise, we address the problem without blame, and offer praise where due, which allows for an atmosphere where Baoparts stay highly motivated in their work.

Baobab's Proprietary Tools for High-quality Annotations

We update the tools daily in-house to facilitate efficient and precise work.

  • Baobab Pose Annotation

    A web-based tool for bounding objects (polygon/rectangle), adding key points and tagging them

  • Semantic Segmentation

    A web-based tool for carrying out semantic segmentation

  • Baobab-Caption

    A web-based tool for captioning videos and still images

  • Moringa-i

    A smartphone app for collecting images and adding tags and captions to collected images

Project Management Structure

1) Project consultation

We hold a consultation to gather details of your request and assess whether we can take on the project.

2) Definition of requirements

Once the project is accepted, specific requirements are established and guidelines are set for partners.

3) Worker training

To ensure the highest quality, we carry out orientation, training and tests in accordance with the set guidelines, after which we select partners and form a project team.

4) Annotation work

Partners carry out annotation work.

5) In-house quality checks

Two-stage data checking is carried out by leaders and checkers to ensure quality.

6) Delivery
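
To make step 5 more concrete, the sketch below shows one way a two-stage check could be expressed in code: an item is accepted only if it passes both stages, and anything else is sent back for rework. The function names, the item shape and the two example checks are hypothetical and chosen only for illustration; which role performs which stage, and what each stage verifies, depends on the project guidelines.

```python
from typing import Any, Callable, Dict, List, Tuple

Item = Dict[str, Any]           # one annotated item (hypothetical shape)
Check = Callable[[Item], bool]  # returns True when the item passes


def two_stage_check(
    items: List[Item], first_check: Check, second_check: Check
) -> Tuple[List[Item], List[Item]]:
    """Split items into (accepted, needs_rework).

    An item is accepted only if it passes both stages; the second
    stage is skipped for items that already failed the first.
    """
    accepted: List[Item] = []
    needs_rework: List[Item] = []
    for item in items:
        if first_check(item) and second_check(item):
            accepted.append(item)
        else:
            needs_rework.append(item)
    return accepted, needs_rework


# Hypothetical checks, for illustration only.
def has_label(item: Item) -> bool:
    return bool(item.get("label"))


def box_has_positive_size(item: Item) -> bool:
    _, _, width, height = item["bbox"]
    return width > 0 and height > 0


batch = [
    {"id": 1, "label": "car", "bbox": [0, 0, 40, 20]},
    {"id": 2, "label": "", "bbox": [5, 5, 10, 10]},
    {"id": 3, "label": "person", "bbox": [8, 8, 0, 30]},
]
accepted, needs_rework = two_stage_check(batch, has_label, box_has_positive_size)
print([item["id"] for item in accepted], [item["id"] for item in needs_rework])
```

Passing the checks in as functions keeps the flow the same from project to project while the rules themselves follow each project's guidelines.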

Key Results in Image Annotation

  • Image captioning

    Adding explanatory captions to images

    400,000 captions
    Time taken: 90 days

  • MS COCO dataset re-annotation work

    Baobab carried out re-annotation for 5 tags within the MS COCO dataset
    (Research paper accepted at CVPR WS 2022)

    637,717 objects
    Time taken: 26 days

  • Annotating traffic images

    Placing bounding boxes and tags on people, vehicles, street fixtures, etc., in photos taken on public roads

    48,875 objects
    Time taken: 9 days

  • Fork pallet annotation work

    Placing bounding boxes and 4 key points on specified points of fork pallets

    6,951 objects
    Time taken: 5 days

  • Fruit annotation

    Placing bounding boxes on fruits, and tagging according to ripeness

    15,750 objects
    Time taken: 12 days

  • Annotation of road damage

    Placing bounding boxes and tags on cracks in the road

    12,236 objects
    Time taken: 7 days

  • Tagging occlusion levels

    Checking the occlusion level of target objects and assigning one of 3 types of tag

    100,841 objects
    Time taken: 11 days

  • Building annotation

    Enclosing outlines of buildings with polygons and tagging

    2,290 objects
    Time taken: 8 days

  • Annotation of satellite images

    Enclosing specified terrain in satellite images with polygons and tagging

    1,783 objects
    Time taken: 5 days

  • Face annotation

    Marking 68 specified parts of the face with keypoints

    839 objects
    Time taken: 12 days

  • Annotation of monkeys

    Enclosing the outlines of monkeys with polygons and marking 17 specified parts of the body with keypoints, such as joints and facial features

    10,000 objects
    Time taken: 9 days

  • Annotating marine debris

    Performing semantic segmentation by type for marine debris washed ashore photographed by drone

    5,650 segments
    Time taken: 9 days

  • Terrain annotation

    Performing semantic segmentation for specified terrain

    21,346 segments
    Time taken: 28 days
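
The volumes and durations above can be turned into an approximate daily throughput with simple division. The sketch below does this for four of the projects, using only figures copied from the list. The resulting numbers are for the whole project team, not per annotator, and the source does not say whether "days" means calendar days or working days, so they should be read as rough indicators only.

```python
# (project, volume, unit, days) copied from the results list above.
RESULTS = [
    ("Image captioning", 400_000, "captions", 90),
    ("MS COCO dataset re-annotation", 637_717, "objects", 26),
    ("Annotating traffic images", 48_875, "objects", 9),
    ("Fork pallet annotation", 6_951, "objects", 5),
]


def per_day(volume: int, days: int) -> int:
    """Average items completed per day, rounded to the nearest whole item."""
    return round(volume / days)


for name, volume, unit, days in RESULTS:
    print(f"{name}: about {per_day(volume, days):,} {unit} per day")
```

The remaining projects can be added to `RESULTS` in the same way.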

Key Results in Other Works (Text and NLP-related Training Data)

  • Dialogue transcription and labelling

    Transcribing dialogues and labelling specific parts of speech

    200 dialogues
    (approx. 30 hours' worth)

  • Dialogue data annotation

    Segmenting speech and tagging each segment by speech intent

    340 dialogues
    (approx. 110 hours' worth)

  • Annotating names

    Extracting names from dialogue text and tagging the attribute category and relationship

    500 dialogues

  • Quality assessment of Japanese text

    Evaluating the fluency and appropriateness of Japanese text

    20,000 texts

  • Japanese article summary corpus creation

    Extracting keywords to create article summaries
    Creating article summaries using keywords extracted in advance

    1,700 texts

  • Creation of reading comprehension data

    Creating questions that can be answered by reading specific texts, and marking where the answers appear in the text

    3,200 sets

  • Annotating relationships between labels

    Annotating the relationship between 2 labels

    230,000 pairs

  • Labelling tweets

    Labelling Japanese tweets

    10,000 tweets

  • Creation of a bilingual corpus of dialogue data

    Translation work to create a Japanese-English corpus of dialogue data

    460 dialogues

  • Collection of English text data

    Describing the difference between 2 audio files in English

    40,000 sentences

  • Creation of a corpus of dialogues in a business setting

    Creating Japanese-English dialogue scenarios for use in machine translation

    97,000 examples

Client testimonials

Chief Operating Officer, Preferred Networks, Inc.

Daisuke Okanohara

"Baobab creates high quality image annotation data sets for us according to various different requirements. Furthermore, since the annotators are individually managed, we also entrust the company with the annotation of highly sensitive data."

Associate Professor at the Carnegie Mellon University Language Technology Institute (CMU-LTI)

Graham Neubig

"I have asked Baobab to create data for research many times, and I really appreciate their willingness and flexibility in responding to even slightly unusual requests. I thoroughly recommend them."