Semantic Annotation for Tabular Data

by Udayan Khurana, et al.

Detecting the semantic concept of columns in tabular data is of particular interest to many applications, ranging from data integration, cleaning, and search to feature engineering and model building in machine learning. Recently, several works have proposed supervised learning-based or heuristic pattern-based approaches to semantic type annotation. Both have shortcomings that prevent them from generalizing over a large number of concepts or examples. Many neural network-based methods also present scalability issues. Additionally, none of the known methods works well for numerical data. We propose C^2, a column-to-concept mapper based on a maximum likelihood estimation approach through ensembles. It effectively utilizes vast, albeit somewhat noisy, openly available table corpora, in addition to two popular knowledge graphs, to perform effective and efficient concept prediction for structured data. We demonstrate the effectiveness of C^2 over available techniques on nine datasets, the most comprehensive comparison on this topic so far.
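To make the idea of maximum likelihood column-to-concept mapping concrete, here is a minimal toy sketch in Python. It is not the C^2 implementation: the function names (`build_counts`, `predict_concept`), the tiny corpus, the add-one smoothing, and the assumption that cell values are independent given the concept are all illustrative choices. The ensemble step and the knowledge graph lookups mentioned in the abstract are not modeled.

```python
import math
from collections import Counter, defaultdict


def build_counts(labeled_columns):
    """Count how often each (normalized) cell value appears under each concept.

    labeled_columns: iterable of (concept, list_of_values) pairs, standing in
    for a table corpus whose columns carry (possibly noisy) concept labels.
    """
    counts = defaultdict(Counter)
    for concept, values in labeled_columns:
        counts[concept].update(v.strip().lower() for v in values)
    return counts


def predict_concept(values, counts, alpha=1.0):
    """Return the concept that maximizes the smoothed log-likelihood of the column.

    Assumes (for illustration only) that cell values are independent given the
    concept; alpha is a hypothetical additive-smoothing constant.
    """
    vocab = {v for counter in counts.values() for v in counter}
    vocab_size = len(vocab) + 1  # one extra slot for unseen values
    scores = {}
    for concept, counter in counts.items():
        total = sum(counter.values())
        scores[concept] = sum(
            math.log((counter[v.strip().lower()] + alpha) / (total + alpha * vocab_size))
            for v in values
        )
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy corpus; "Paris" appears under both concepts as label noise.
corpus = [
    ("city", ["Paris", "London", "Berlin"]),
    ("country", ["France", "Germany", "Paris"]),
]
counts = build_counts(corpus)
best, scores = predict_concept(["paris", "Berlin"], counts)
```

In this sketch, a column containing "paris" and "Berlin" scores higher under "city" than under "country", because both values were observed under "city" in the toy corpus while only one was observed under "country".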




