Structure-aware Pre-training for Table Understanding with Tree-based Transformers

10/21/2020
by Zhiruo Wang, et al.

Tables are widely used with various structures to organize and present data. Recent attempts at table understanding mainly focus on relational tables, yet overlook other common table structures. In this paper, we propose TUTA, a unified pre-training architecture for understanding generally structured tables. Since understanding a table requires leveraging its spatial, hierarchical, and semantic information, we adapt the self-attention strategy with several key structure-aware mechanisms. First, we propose a novel tree-based structure, the bi-dimensional coordinate tree, to describe both the spatial and the hierarchical information in tables. Building on this, we extend the pre-training architecture with two core mechanisms: tree-based attention and tree-based position embedding. Moreover, to capture table information progressively, we devise three pre-training objectives that enable representations at the token, cell, and table levels. TUTA pre-trains on a wide range of unlabeled tables and fine-tunes on a critical task in table structure understanding, i.e., cell type classification. Experimental results show that TUTA is highly effective, achieving state-of-the-art results on four well-annotated cell type classification datasets.
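The bi-dimensional coordinate tree can be made concrete with a short sketch: each cell carries one coordinate on the column-header (top) tree and one on the row-header (left) tree, and pairwise tree distances can then drive structure-aware attention. All names below (TreeNode, path_distance, tree_distance) are illustrative assumptions for exposition, not the paper's actual implementation.

# A minimal sketch of the bi-dimensional coordinate tree described above.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TreeNode:
    """A cell's position on both header trees: a path on the top
    (column-header) tree and a path on the left (row-header) tree."""
    top_coord: Tuple[int, ...]   # path from the root of the column-header tree
    left_coord: Tuple[int, ...]  # path from the root of the row-header tree

def path_distance(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Steps between two paths in one tree: walk up from each node to
    their lowest common ancestor and count the hops."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)

def tree_distance(u: TreeNode, v: TreeNode) -> int:
    """Bi-dimensional distance: the sum of the per-tree path distances.
    A structure-aware attention mask could, for instance, only let cell
    pairs within a distance threshold attend to each other."""
    return (path_distance(u.top_coord, v.top_coord)
            + path_distance(u.left_coord, v.left_coord))

# Two data cells under sibling column headers, in the same row subtree:
c1 = TreeNode(top_coord=(0, 0), left_coord=(1,))
c2 = TreeNode(top_coord=(0, 1), left_coord=(1,))
print(tree_distance(c1, c2))  # 2: one step up and one down in the column tree

In a tree-based attention variant, such distances would gate which cell pairs may attend to each other, while the coordinate paths themselves can be embedded and added to token embeddings as tree-based position embeddings.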
