Column Type Annotation using ChatGPT

06/01/2023
by Keti Korini, et al.

Column type annotation is the task of annotating the columns of a relational table with the semantic type of the values contained in each column. It is a crucial pre-processing step for data search and integration in the context of data lakes. State-of-the-art column type annotation methods either rely on matching table columns to properties of a knowledge graph or fine-tune pre-trained language models such as BERT for the column type annotation task. In this work, we take a different approach and explore using ChatGPT for column type annotation. We evaluate different prompt designs in zero- and few-shot settings and experiment with providing task definitions and detailed instructions to the model. We further implement a two-step table annotation pipeline which first determines the class of the entities described in the table and then, depending on this class, asks ChatGPT to annotate columns using only the relevant subset of the overall vocabulary. Using instructions as well as the two-step pipeline, ChatGPT reaches F1 scores of over 85 [...] model needs to be fine-tuned with 300 examples. This comparison shows that ChatGPT is able to deliver competitive results for the column type annotation task given no or only a minimal amount of task-specific demonstrations.
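The two-step pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary, the prompt wording, the `||` table serialization, and the `ask` callable (standing in for whatever client sends a prompt to the model and returns its text reply) are all assumptions made for illustration, and `fake_llm` is a stub so the example runs without network access.

```python
from typing import Callable, Dict, List

Table = List[List[str]]
AskFn = Callable[[str], str]  # hypothetical: takes a prompt, returns the model's reply

# Hypothetical vocabulary: table class -> column labels relevant for that class.
VOCAB_BY_CLASS: Dict[str, List[str]] = {
    "Restaurant": ["name", "telephone", "addressLocality"],
    "Hotel": ["name", "checkinTime", "priceRange"],
}


def serialize(table: Table) -> str:
    """Render each row as 'cell || cell' (an assumed serialization format)."""
    return "\n".join(" || ".join(row) for row in table)


def classify_table(table: Table, ask: AskFn) -> str:
    """Step 1: ask for the class of the entities described in the table."""
    prompt = (
        "Step 1. Classify the entities described in this table as one of: "
        + ", ".join(VOCAB_BY_CLASS) + ".\n" + serialize(table) + "\nClass:"
    )
    answer = ask(prompt).strip()
    # Return an empty string when the reply is not a known class.
    return answer if answer in VOCAB_BY_CLASS else ""


def annotate_columns(table: Table, ask: AskFn) -> List[str]:
    """Step 2: annotate columns using only the labels relevant to the class."""
    table_class = classify_table(table, ask)
    all_labels = sorted({l for labels in VOCAB_BY_CLASS.values() for l in labels})
    # Fall back to the full vocabulary if step 1 gave no usable class.
    labels = VOCAB_BY_CLASS.get(table_class, all_labels)
    prompt = (
        "Step 2. Annotate each column with one of: " + ", ".join(labels)
        + ". Answer with a comma-separated list, one label per column.\n"
        + serialize(table) + "\nLabels:"
    )
    predicted = [p.strip() for p in ask(prompt).split(",")]
    n_cols = len(table[0]) if table else 0
    # Replies outside the allowed subset are mapped to "unknown".
    cleaned = [p if p in labels else "unknown" for p in predicted]
    # Pad or truncate so there is exactly one label per column.
    return (cleaned + ["unknown"] * n_cols)[:n_cols]


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real model call."""
    return "Restaurant" if prompt.startswith("Step 1") else "name, telephone, rating"
```

Calling `annotate_columns([["Luigi's", "555-0100", "4.5"]], fake_llm)` runs both steps against the stub. Restricting the second prompt to the class-specific labels keeps the label list short, which is the motivation the abstract gives for the two-step design.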


