Selecting Sub-tables for Data Exploration

03/05/2022
by Kathy Razmadze, et al.

We present a framework for creating small, informative sub-tables of large data tables to facilitate the first step of data science: data exploration. Given a large data table T, the goal is to create a sub-table of small, fixed dimensions by selecting a subset of T's rows and projecting them onto a subset of T's columns. The question is which rows and columns to select so that the resulting sub-table is informative. We formalize "informativeness" using two complementary metrics: cell coverage, which measures how well the sub-table captures prominent association rules in T, and diversity. Because computing optimal sub-tables under these metrics is shown to be infeasible, we give an efficient algorithm that accounts for association rules indirectly, using table embedding. The resulting framework can be used to visualize the complete sub-table and to display the results of queries over the sub-table, so that the user can quickly understand the results and decide on subsequent queries. Experimental results show that we can efficiently compute high-quality sub-tables, as measured both by our metrics and by feedback from user studies.
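To make the problem statement concrete, here is a minimal Python sketch of selecting rows and projecting them onto columns, with an exhaustive search for the best small sub-table. It is not the paper's algorithm or its metric definitions: `toy_diversity` is a hypothetical stand-in score (the mean fraction of differing cells over row pairs), cell coverage is omitted entirely, and the function names and sample table are assumptions made for illustration.

```python
from itertools import combinations

def project(table, row_ids, columns):
    """Return the sub-table: the chosen rows restricted to the chosen columns."""
    return [{c: table[r][c] for c in columns} for r in row_ids]

def toy_diversity(sub_table):
    """Hypothetical score: mean fraction of differing cells over all row pairs."""
    pairs = list(combinations(sub_table, 2))
    if not pairs:
        return 0.0

    def diff(a, b):
        return sum(a[c] != b[c] for c in a) / len(a)

    return sum(diff(a, b) for a, b in pairs) / len(pairs)

def best_sub_table(table, n_rows, n_cols):
    """Exhaustive search over row and column subsets; only feasible for tiny tables."""
    all_cols = list(table[0].keys())
    best = None
    for cols in combinations(all_cols, n_cols):
        for rows in combinations(range(len(table)), n_rows):
            score = toy_diversity(project(table, rows, cols))
            # Strict comparison keeps the first sub-table found at the top score.
            if best is None or score > best[0]:
                best = (score, rows, cols)
    return best

# Made-up sample table, purely illustrative.
T = [
    {"city": "Paris", "type": "hotel", "stars": 3},
    {"city": "Paris", "type": "hotel", "stars": 3},
    {"city": "Rome", "type": "hostel", "stars": 1},
    {"city": "Oslo", "type": "hotel", "stars": 5},
]

score, rows, cols = best_sub_table(T, n_rows=2, n_cols=2)
sub = project(T, rows, cols)
```

The number of candidate sub-tables grows combinatorially with the table's dimensions, so this brute-force search is usable only on toy inputs, which is in line with the abstract's motivation for an efficient algorithm rather than direct optimization.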

research
08/06/2018

The Bases of Association Rules of High Confidence

We develop a new approach for distributed computing of the association r...
research
09/08/2018

Typed Table Transformations

Spreadsheet tables are often labeled, and these labels effectively const...
research
09/07/2020

A Lightweight Algorithm to Uncover Deep Relationships in Data Tables

Many data we collect today are in tabular form, with rows as records and...
research
02/05/2019

TableNet: An Approach for Determining Fine-grained Relations for Wikipedia Tables

Wikipedia tables represent an important resource, where information is o...
research
05/23/2018

Predicting football tables by a maximally parsimonious model

This paper presents some useful mathematical results involved in footbal...
research
04/19/2018

VeriTable: Fast Equivalence Verification of Multiple Large Forwarding Tables

Due to network practices such as traffic engineering and multi-homing, t...
research
11/19/2021

Types for Tables: A Language Design Benchmark

Context: Tables are ubiquitous formats for data. Therefore, techniques f...
