Semantic Annotation for Tabular Data

12/15/2020
by Udayan Khurana, et al.

Detecting the semantic concepts of columns in tabular data is of particular interest to many applications, ranging from data integration, cleaning, and search to feature engineering and model building in machine learning. Recently, several works have proposed supervised learning-based or heuristic pattern-based approaches to semantic type annotation. Both have shortcomings that prevent them from generalizing over a large number of concepts or examples. Many neural network-based methods also present scalability issues. Additionally, none of the known methods works well for numerical data. We propose C^2, a column-to-concept mapper that is based on a maximum likelihood estimation approach through ensembles. It is able to utilize vast, albeit somewhat noisy, openly available table corpora, in addition to two popular knowledge graphs, to perform effective and efficient concept prediction for structured data. We demonstrate the effectiveness of C^2 over available techniques on 9 datasets, the most comprehensive comparison on this topic so far.


research
06/01/2023

Column Type Annotation using ChatGPT

Column type annotation is the task of annotating the columns of a relati...
research
11/14/2019

Sato: Contextual Semantic Type Detection in Tables

Detecting the semantic types of data columns in relational tables is imp...
research
05/25/2019

Sherlock: A Deep Learning Approach to Semantic Data Type Detection

Correctly detecting the semantic type of data columns is crucial for dat...
research
11/04/2018

ColNet: Embedding the Semantics of Web Tables for Column Type Prediction

Automatically annotating column types with knowledge base (KB) concepts ...
research
04/26/2011

A Machine Learning Based Analytical Framework for Semantic Annotation Requirements

The Semantic Web is an extension of the current web in which information...
research
06/24/2021

DCoM: A Deep Column Mapper for Semantic Data Type Detection

Detection of semantic data types is a very crucial task in data science ...
research
07/10/2021

Prediction of concept lengths for fast concept learning in description logics

Concept learning approaches based on refinement operators explore partia...
