HYTREL: Hypergraph-enhanced Tabular Data Representation Learning

07/14/2023
by Pei Chen, et al.

Language models pretrained on large collections of tabular data have proven effective on several downstream tasks. However, many of these models ignore structural properties of tabular data such as row/column permutation invariance and hierarchical structure. To address these limitations, we propose HYTREL, a tabular language model that captures permutation invariance and three more structural properties of tabular data by using hypergraphs: the table cells form the nodes, and the cells that occur together in each row, in each column, and in the entire table form three different types of hyperedges. We show that HYTREL is maximally invariant under certain conditions for tabular data, i.e., two tables obtain the same representations via HYTREL if and only if the two tables are identical up to permutations. Our empirical results demonstrate that HYTREL consistently outperforms other competitive baselines on four downstream tasks with minimal pretraining, illustrating the advantages of incorporating the inductive biases associated with tabular data into the representations. Finally, our qualitative analyses show that HYTREL can assimilate the table structure to generate robust representations for the cells, rows, columns, and the entire table.
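The hypergraph construction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `build_hypergraph` and `structure_signature` are hypothetical, cells are identified by their (row, column) position, and the signature below only demonstrates that the row, column, and table hyperedges are unaffected by reordering rows or columns. It says nothing about the learned representations or the maximal-invariance result.

```python
from typing import Dict, FrozenSet, List, Tuple

Cell = Tuple[int, int]  # (row index, column index); an assumed way to name nodes


def build_hypergraph(table: List[List[str]]) -> Dict[str, List[FrozenSet[Cell]]]:
    """Cells are nodes; each row, each column, and the whole table is a hyperedge."""
    n_rows = len(table)
    n_cols = len(table[0]) if table else 0
    rows = [frozenset((r, c) for c in range(n_cols)) for r in range(n_rows)]
    cols = [frozenset((r, c) for r in range(n_rows)) for c in range(n_cols)]
    whole = [frozenset((r, c) for r in range(n_rows) for c in range(n_cols))]
    return {"row": rows, "column": cols, "table": whole}


def structure_signature(table: List[List[str]]) -> Dict[str, List[Tuple[str, ...]]]:
    """Order-free summary: every hyperedge becomes a sorted tuple of its cell
    values, and the hyperedges of each type are themselves sorted."""
    hypergraph = build_hypergraph(table)
    return {
        kind: sorted(tuple(sorted(table[r][c] for r, c in edge)) for edge in edges)
        for kind, edges in hypergraph.items()
    }


# Toy tables (made-up values, for illustration only).
original = [["a", "b"], ["c", "d"]]
rows_swapped = [["c", "d"], ["a", "b"]]
cols_swapped = [["b", "a"], ["d", "c"]]
transposed = [["a", "c"], ["b", "d"]]

hypergraph = build_hypergraph(original)
same_under_row_swap = structure_signature(original) == structure_signature(rows_swapped)
same_under_col_swap = structure_signature(original) == structure_signature(cols_swapped)
same_under_transpose = structure_signature(original) == structure_signature(transposed)
```

Because hyperedges are sets, swapping rows or columns leaves the signature unchanged, while transposing the toy table exchanges the row and column hyperedges and therefore changes it.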

research
05/06/2021

TABBIE: Pretrained Representations of Tabular Data

Existing work on tabular representation learning jointly models tables a...
research
03/01/2022

TableFormer: Robust Transformer Modeling for Table-Text Encoding

Understanding tables is an important aspect of natural language understa...
research
05/31/2019

Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval

Tables contain valuable knowledge in a structured form. We employ neural...
research
11/13/2021

Visual Understanding of Complex Table Structures from Document Images

Table structure recognition is necessary for a comprehensive understandi...
research
03/28/2021

StreamTable: An Area Proportional Visualization for Tables with Flowing Streams

Let M be an r × c table with each cell weighted by a nonzero positive num...
research
05/19/2022

TransTab: Learning Transferable Tabular Transformers Across Tables

Tabular data (or tables) are the most widely used data format in machine...
research
07/18/2023

UniTabE: Pretraining a Unified Tabular Encoder for Heterogeneous Tabular Data

Recent advancements in Natural Language Processing (NLP) have witnessed ...
