TabText: a Systematic Approach to Aggregate Knowledge Across Tabular Data Structures

06/21/2022
by Dimitris Bertsimas, et al.

Processing and analyzing tabular data productively and efficiently is essential for building successful machine learning applications in fields such as healthcare. However, the lack of a unified framework for representing and standardizing tabular information poses a significant challenge to researchers and professionals alike. In this work, we present TabText, a methodology that leverages the unstructured data format of language to encode tabular data from different table structures and time periods efficiently and accurately. Using two healthcare datasets and four prediction tasks, we show that features extracted via TabText outperform those extracted with traditional processing methods by 2-5%. We also evaluate the framework against different choices for sentence representations of missing values, meta information and language descriptiveness, and provide insights into winning strategies that improve performance.
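To make the idea of encoding a table row as language concrete, here is a minimal sketch. It is not the authors' implementation: the function `row_to_text`, its parameters, the sentence template, and the sample `vitals` row are all assumptions chosen for illustration. It shows two of the design choices the abstract mentions, namely how missing values are represented and whether meta information (here, a table name) is included.

```python
from typing import Mapping, Optional


def row_to_text(
    row: Mapping[str, Optional[object]],
    table_name: str = "",
    missing: str = "skip",
) -> str:
    """Turn one table row into a sentence-like string (hypothetical helper).

    missing="skip" drops empty cells; missing="say" states that they are missing.
    table_name, if given, is prepended as meta information.
    """
    parts = []
    for column, value in row.items():
        if value is None:
            if missing == "skip":
                continue  # leave the empty cell out of the sentence
            parts.append(f"{column} is missing")
        else:
            parts.append(f"{column} is {value}")
    body = ", ".join(parts)
    prefix = f"In table {table_name}: " if table_name else ""
    return f"{prefix}{body}."


# Illustrative row; the column names and values are made up.
vitals = {"heart rate": 88, "temperature": None, "oxygen saturation": 97}

print(row_to_text(vitals, table_name="vitals"))
# In table vitals: heart rate is 88, oxygen saturation is 97.
print(row_to_text(vitals, missing="say"))
# heart rate is 88, temperature is missing, oxygen saturation is 97.
```

Because every row becomes a plain string, rows from tables with different columns can be passed to the same text encoder without first aligning their schemas. The strings could then be fed to a language model to obtain feature vectors, a step this sketch leaves out.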


