StruBERT: Structure-aware BERT for Table Search and Matching

03/27/2022
by   Mohamed Trabelsi, et al.

A large amount of information is stored in data tables, and users can search for these tables using keyword-based queries. A table is composed primarily of data values organized in rows and columns, which provide implicit structural information. It is usually accompanied by secondary information, such as the caption and page title, that forms its textual information. Understanding the connection between the textual and structural information is an important yet neglected aspect of table retrieval, as previous methods treat each source of information independently. In addition, users can search for data tables that are similar to an existing table, a setting that can be seen as content-based table retrieval. In this paper, we propose StruBERT, a structure-aware BERT model that fuses the textual and structural information of a data table to produce context-aware representations of both its textual and tabular content. StruBERT features are integrated into a new end-to-end neural ranking model to solve three table-related downstream tasks: keyword-based table retrieval, content-based table retrieval, and table similarity. We evaluate our approach on three datasets and demonstrate substantial improvements in retrieval and classification metrics over state-of-the-art methods.

research
05/13/2021

Semantic Table Retrieval using Keyword and Table Queries

Tables on the Web contain a vast amount of knowledge in a structured for...
research
10/30/2020

Semantic Labeling Using a Deep Contextualized Language Model

Generating schema labels automatically for column values of data tables ...
research
09/07/2020

A Lightweight Algorithm to Uncover Deep Relationships in Data Tables

Many data we collect today are in tabular form, with rows as records and...
research
11/13/2018

Text Assisted Insight Ranking Using Context-Aware Memory Network

Extracting valuable facts or informative summaries from multi-dimensiona...
research
02/16/2018

Ad Hoc Table Retrieval using Semantic Similarity

We introduce and address the problem of ad hoc table retrieval: answerin...
research
10/16/2019

Content Enhanced BERT-based Text-to-SQL Generation

We present a simple method to leverage the table content for the BERT-b...
research
08/25/2020

TabSim: A Siamese Neural Network for Accurate Estimation of Table Similarity

Tables are a popular and efficient means of presenting structured inform...
