Semantic Labeling Using a Deep Contextualized Language Model

10/30/2020
by Mohamed Trabelsi, et al.

Automatically generating schema labels for the column values of data tables has many data science applications, such as schema matching, data discovery, and data linking. For example, the missing headers of automatically extracted tables can be filled in with predicted schema labels, which significantly reduces human effort. Furthermore, the predicted labels can reduce the impact of inconsistent names across multiple data tables. Understanding the connection between column values and contextual information is an important yet neglected aspect, as previously proposed methods treat each column independently. In this paper, we propose a context-aware semantic labeling method that uses both the column values and their context. Our method is based on a new setting for semantic labeling, in which we sequentially predict labels for an input table with missing headers. We incorporate both the values and the context of each data column using the pre-trained contextualized language model BERT, which has achieved significant improvements in multiple natural language processing tasks. To our knowledge, we are the first to successfully apply BERT to the semantic labeling task. We evaluate our approach on two real-world datasets from different domains and demonstrate substantial improvements in evaluation metrics over state-of-the-art feature-based methods.
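
The sequential setting described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the "context" of a column is taken to be the labels already predicted for earlier columns, the input is laid out as a BERT-style text pair, and `toy_scorer` is a hypothetical stand-in for a fine-tuned BERT classifier. The names `build_input`, `label_table`, and `toy_scorer` are invented here.

```python
from typing import Callable, List, Sequence

CLS, SEP = "[CLS]", "[SEP]"


def build_input(values: Sequence[str], context: Sequence[str], max_values: int = 5) -> str:
    """Lay out context and column values as one BERT-style text pair.

    Assumption: the context is the list of labels predicted so far.
    """
    context_text = " ".join(context) if context else "none"
    values_text = " ".join(str(v) for v in values[:max_values])
    return f"{CLS} {context_text} {SEP} {values_text} {SEP}"


def label_table(
    columns: Sequence[Sequence[str]],
    scorer: Callable[[str, Sequence[str]], List[float]],
    labels: Sequence[str],
) -> List[str]:
    """Predict one label per column, left to right.

    Each prediction is appended to the context seen by later columns.
    """
    predicted: List[str] = []
    for values in columns:
        text = build_input(values, predicted)
        scores = scorer(text, labels)
        best = max(range(len(labels)), key=lambda i: scores[i])
        predicted.append(labels[best])
    return predicted


def toy_scorer(text: str, labels: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for a fine-tuned BERT classifier.

    It ignores the context segment and scores "year" highly only when
    every value token is numeric.
    """
    value_part = text.split(SEP)[1]
    is_numeric = all(tok.replace(".", "", 1).isdigit() for tok in value_part.split())
    return [1.0 if (label == "year") == is_numeric else 0.0 for label in labels]


columns = [["Paris", "Rome"], ["1999", "2004"]]
print(label_table(columns, toy_scorer, ["city", "year"]))
```

In a real system the scorer would be a model that reads the whole sequence, so that labels predicted for earlier columns can influence later ones; the toy scorer only demonstrates the control flow.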

research
03/27/2022

StruBERT: Structure-aware BERT for Table Search and Matching

A large amount of information is stored in data tables. Users can search...
research
11/14/2019

Sato: Contextual Semantic Type Detection in Tables

Detecting the semantic types of data columns in relational tables is imp...
research
07/09/2021

Can Deep Neural Networks Predict Data Correlations from Column Names?

For humans, it is often possible to predict data correlations from colum...
research
09/20/2019

Automatic Table completion using Knowledge Base

Table is a popular data format to organize and present relational inform...
research
06/24/2021

DCoM: A Deep Column Mapper for Semantic Data Type Detection

Detection of semantic data types is a very crucial task in data science ...
research
12/15/2022

DeepJoin: Joinable Table Discovery with Pre-trained Language Models

Due to the usefulness in data enrichment for data analysis tasks, joinab...
research
11/08/2022

Active Learning with Tabular Language Models

Despite recent advancements in tabular language model research, real-wor...
