Services
Baobab is not a crowdsourcing company.
Our ability to deliver world-class, high-quality work on tight deadlines is entirely thanks to our partners (Baoparts), who constantly strive for the best quality.
We make sure Baoparts receive the training they need for each project they undertake.
Baobab's greatest strength is the professionalism of this unique community, which has a culture of respecting differences and helping one another, and whose members are thorough and polite in their communication.
Services we provide
LLM development/data set creation for fine tuning/RLHF/model evaluation
A comprehensive service covering every stage: development of large language models (LLMs) by experts with extensive knowledge of and experience in natural language AI, creation of the fine-tuning datasets essential for improving model accuracy, reinforcement learning from human feedback (RLHF) to maximise the AI's performance, and evaluation of the resulting model.
Image annotation
Since 2015, we have provided image annotation services such as bounding box, polygon, semantic segmentation and keypoint annotation for images, as well as video annotation. We have also undertaken many multimodal projects, such as captioning videos and still images. In addition to annotation, we train and evaluate models and provide the Baobab AutoML Vision Report, an assessment report that helps you improve your data.
Audio transcription/annotation
Transcription and tagging of audio using ELAN and other tools.
We provide this service in multiple languages including Japanese, English and Chinese.
Text annotation
Tagging, classification, pronoun extraction, etc.
Construction of training data for machine translation systems
Since Baobab's inception, we have worked closely with research institutions and universities that develop machine translation systems. For them, we have completed many projects of several million characters each, creating bilingual training data faster and more affordably than anywhere else.
Bilingual scenario creation in multiple languages
We record native speakers either reading scripted conversations based on your specified settings or predetermined conversation scenarios, or holding free-form simulated conversations in pairs, and deliver written transcriptions in your desired format.
Image collection/sound collection
Our partners, working remotely around the world, collect images, multilingual speech and other sounds using the Moringa mobile apps developed by Baobab.
Why Baobab?
Baobab's annotation work has been praised by clients around the world for the following reasons:
1. Construction of high-quality data
We have established sophisticated systems, organisations and workflows to ensure high-quality deliverables. For each project, the project manager (Baocaptain) writes exact project specifications and clear guidelines for our partners (Baoparts) and maintains polite, effective communication with both clients and Baoparts, while the quality assurance manager checks progress and data daily and carries out a thorough quality check before delivery.
2. Speedy delivery
Our proprietary annotation tools are constantly updated in response to feedback from our partners (Baoparts) so that accurate data can be created quickly. We work with Baoparts every day to improve productivity. Because our system is developed in-house, it is also secure.
3. Small lot test orders
If you would like a preview with a small amount of data before ordering a large volume, we have you covered: we gladly accept small lot test orders. Similarly, if you want to run a rapid PDCA (Plan-Do-Check-Act) cycle to build a prototype, we can accommodate your needs.
4. The diversity of working partners
Our multinational partners (Baoparts), located around the globe, provide multilingual voice data collection and translation services. This means we can respond immediately to large-scale, worldwide projects without needing to form a new team.
Baopart locations include Japan, Vietnam, Thailand, China, Taiwan, English-speaking locations and elsewhere.
The Secret to Quality
The key to being able to speedily deliver high-quality data lies in Baobab's company structure, which is as follows:
1. A unique training program aimed at quality
At Baobab, all staff involved in any project (project captains, checkers, leaders and partners) learn the importance of annotation work and receive training to acquire the skills needed to produce quality work.
At the end of training, staff are assessed, and only those who pass are allowed to participate in a project.
2. Assigning and training the right people
Annotation guidelines and the elements to be annotated vary widely from project to project. Even when the elements are the same, different guidelines can change the work entirely.
When assigning work, we first take our partners' (Baoparts') skill levels into account. Each Baopart must then complete project-specific training, and may join the actual project only after passing the assessment tests.
3. Internal communication to ensure high quality
The leader and checkers of each project give our partners (Baoparts) full support throughout the entire process, from training to delivery. They create an environment in which Baoparts can work with confidence, responding to questions promptly (within 24 hours) and revising guidelines when necessary.
When issues arise, our culture is to address the problem without assigning blame and to offer praise where it is due. This keeps Baoparts highly motivated in their work.
Baobab's Proprietary Tools for High-quality Annotations
We update these tools in-house every day to support efficient and precise work.
Baobab Pose Annotation
A web-based tool for enclosing objects with polygons or rectangles, adding keypoints and tagging them
Semantic Segmentation
A web-based tool for carrying out semantic segmentation
Baobab-Caption
A web-based tool for captioning videos and still images
Moringa-i
A smartphone app for collecting images and adding tags and captions to collected images
Project Management Structure
1) Project consultation
We hold a consultation to gather the details of your request and assess whether we can take on the project.
2) Definition of requirements
Once the project is accepted, we establish specific requirements and set guidelines for our partners.
3) Worker training
To ensure the highest quality, we carry out orientation, training and tests in accordance with the set guidelines, after which we select partners and form a project team.
4) Annotation work
Partners carry out annotation work.
5) In-house quality checks
Two-stage data checking is carried out by leaders and checkers to ensure quality.
6) Delivery
Key Results in Image Annotation
Image captioning
Adding explanatory captions to images
- Volume: 400,000 captions
- Time taken: 90 days
MS COCO dataset re-annotation work
Baobab re-annotated 5 tags within the MS COCO dataset (research paper accepted at CVPR WS 2022)
- Volume: 637,717 objects
- Time taken: 26 days
Annotating traffic images
Placing bounding boxes and tags on people, vehicles, street fixtures, etc., in photos taken on public roads
- Volume: 48,875 objects
- Time taken: 9 days
Fork pallet annotation work
Placing bounding boxes and 4 keypoints on specified points of fork pallets
- Volume: 6,951 objects
- Time taken: 5 days
Fruit annotation
Placing bounding boxes on fruit and tagging it according to ripeness
- Volume: 15,750 objects
- Time taken: 12 days
Annotation of road damage
Placing bounding boxes and tags on cracks in the road
- Volume: 12,236 objects
- Time taken: 7 days
Tagging occlusion levels
Checking the occlusion level of target objects and assigning one of 3 types of tag
- Volume: 100,841 objects
- Time taken: 11 days
Building annotation
Enclosing the outlines of buildings with polygons and tagging them
- Volume: 2,290 objects
- Time taken: 8 days
Annotation of satellite images
Enclosing specified terrain in satellite images with polygons and tagging it
- Volume: 1,783 objects
- Time taken: 5 days
Face annotation
Marking 68 specified parts of the face with keypoints
- Volume: 839 objects
- Time taken: 12 days
Annotation of monkeys
Enclosing the outlines of monkeys with polygons and marking 17 specified parts of the body, such as joints and facial features, with keypoints
- Volume: 10,000 objects
- Time taken: 9 days
Annotating marine debris
Performing semantic segmentation, by debris type, on drone photographs of marine debris washed ashore
- Volume: 5,650 segments
- Time taken: 9 days
Terrain annotation
Performing semantic segmentation of specified terrain
- Volume: 21,346 segments
- Time taken: 28 days
Key Results in Other Works (Text and NLP-related Training Data)
Dialogue transcription and labelling
Transcribing dialogues and labelling specific parts of speech
- Volume: 200 dialogues (approx. 30 hours' worth)
Dialogue data annotation
Segmenting dialogues into speech segments and tagging each by speech intent
- Volume: 340 dialogues (approx. 110 hours' worth)
Annotating names
Extracting names from dialogue text and tagging their attribute category and relationship
- Volume: 500 dialogues
Quality assessment of Japanese text
Evaluating the fluency and appropriateness of Japanese text
- Volume: 20,000 texts
Japanese article summary corpus creation
Extracting keywords, and creating article summaries using keywords extracted in advance
- Volume: 1,700 texts
Creation of reading comprehension data
Creating questions that can be answered by reading specific texts, and marking where the answers appear in the text
- Volume: 3,200 sets
Annotating relationships between labels
Annotating the relationship between 2 labels
- Volume: 230,000 pairs
Labelling tweets
Labelling Japanese tweets
- Volume: 10,000 tweets
Creation of a bilingual corpus of dialogue data
Translation work to create a Japanese-English corpus of dialogue data
- Volume: 460 dialogues
Collection of English text data
Describing the difference between 2 audio files in English
- Volume: 40,000 sentences
Creation of a corpus of dialogues in a business setting
Creating Japanese-English dialogue scenarios for use in machine translation
- Volume: 97,000 examples
Client testimonials
Daisuke Okanohara
"Baobab creates high quality image annotation data sets for us according to various different requirements. Furthermore, since the annotators are individually managed, we also entrust the company with the annotation of highly sensitive data."
Graham Neubig
"I have asked Baobab to create data for research many times, and I really appreciate their willingness and flexibility in responding to even slightly unusual requests. I thoroughly recommend them."