Thinking Like an Annotator: Generation of Dataset Labeling Instructions

06/24/2023
by Nadine Chang, et al.

Large-scale datasets are essential to modern-day deep learning. Advocates argue that understanding these methods requires dataset transparency (e.g. "dataset curation, motivation, composition, collection process, etc."). However, almost no one has suggested releasing the detailed definitions and visual category examples provided to annotators, information that is critical to understanding the structure of the annotations present in each dataset. These labels are at the heart of public datasets, yet few datasets include the instructions that were used to generate them. We introduce a new task, Labeling Instruction Generation, to address the lack of publicly available labeling instructions. In Labeling Instruction Generation, we take a reasonably annotated dataset and: 1) generate a set of examples that are visually representative of each category in the dataset; 2) provide a text label that corresponds to each of the examples. We introduce a framework that requires no model training to solve this task and includes a newly created rapid retrieval system that leverages a large, pre-trained vision and language model. This framework acts as a proxy for human annotators and can help both to generate a final labeling instruction set and to evaluate its quality. Our framework generates multiple diverse visual and text representations of dataset categories. The optimized instruction set outperforms our strongest baseline across 5 folds by 7.06 mAP for NuImages and 12.9 mAP for COCO.
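
The two steps of the task (pick visually representative examples per category, then attach a text label to each) can be illustrated with a toy sketch. This is not the paper's framework: the function name `representative_examples`, the hand-written 2-D embeddings, and the choice of ranking by cosine similarity to the category mean are all assumptions made for illustration. In practice the embeddings would come from a pre-trained vision and language model.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def representative_examples(items, k=2):
    """Hypothetical helper: items is an iterable of
    (example_id, category, embedding) triples.

    Returns {category: [(example_id, text_label), ...]} holding the k
    examples closest to the mean embedding of their category.
    """
    by_cat = defaultdict(list)
    for ex_id, cat, emb in items:
        by_cat[cat].append((ex_id, emb))

    result = {}
    for cat, members in by_cat.items():
        dim = len(members[0][1])
        # Mean embedding of the category (assumed stand-in for "typical").
        mean = [sum(emb[i] for _, emb in members) / len(members)
                for i in range(dim)]
        ranked = sorted(members, key=lambda m: cosine(m[1], mean),
                        reverse=True)
        # Step 2 of the task: pair each chosen example with a text label.
        result[cat] = [(ex_id, cat) for ex_id, _ in ranked[:k]]
    return result


# Toy data; the embeddings are made up, not model outputs.
toy = [
    ("img_a", "car", [1.0, 0.0]),
    ("img_b", "car", [0.9, 0.1]),
    ("img_c", "car", [0.0, 1.0]),
    ("img_d", "person", [0.0, 1.0]),
]
instructions = representative_examples(toy, k=1)
```

Ranking by closeness to the mean favors a single "typical" example per category; a system aiming for multiple diverse representations, as the abstract describes, would need a different selection rule.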


research
04/17/2023

LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction

Instruction tuning enables language models to generalize more effectivel...
research
06/07/2023

M^3IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning

Instruction tuning has significantly advanced large language models (LLM...
research
06/26/2023

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Despite the promising progress in multi-modal tasks, current large multi...
research
04/17/2019

CraftAssist Instruction Parsing: Semantic Parsing for a Minecraft Assistant

We propose a large scale semantic parsing dataset focused on instruction...
research
05/25/2023

RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

Large Language Models (LLMs) have demonstrated impressive zero-shot capa...
research
10/27/2022

Bridging the visual gap in VLN via semantically richer instructions

The Visual-and-Language Navigation (VLN) task requires understanding a t...
research
06/30/2023

Ticket-BERT: Labeling Incident Management Tickets with Language Models

An essential aspect of prioritizing incident tickets for resolution is e...
