Guided Exploration of Data Summaries

05/27/2022
by Brit Youngmann, et al.

Data summarization is the process of producing interpretable and representative subsets of an input dataset. It is usually performed as a one-shot process whose purpose is to find the best summary. A useful summary contains k sets that are individually uniform and collectively diverse, so that the summary is representative. Uniformity addresses interpretability and diversity addresses representativity. Finding such a summary is a difficult task when the data is large and highly diverse. We examine the applicability of Exploratory Data Analysis (EDA) to data summarization and formalize Eda4Sum, the problem of guided exploration of data summaries, which seeks to sequentially produce connected summaries with the goal of maximizing their cumulative utility. Eda4Sum generalizes one-shot summarization. We propose to solve it with one of two approaches: (i) Top1Sum, which chooses the most useful summary at each step; (ii) RLSum, which trains a policy with Deep Reinforcement Learning that rewards an agent for finding a diverse and new collection of uniform sets at each step. We compare these approaches with one-shot summarization and top-performing EDA solutions. We run extensive experiments on three large datasets. Our results demonstrate the superiority of our approaches for summarizing very large data, and the need to provide guidance to domain experts.
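To make the idea of a stepwise, utility-driven exploration concrete, the following is a minimal Python sketch of a greedy strategy in the spirit of Top1Sum: at each step it picks the candidate summary with the highest utility and accumulates that utility. The abstract does not give the utility formulas, so everything here is an illustrative assumption rather than the authors' definition: the functions `uniformity`, `signature`, `diversity`, `utility`, and `greedy_exploration` are hypothetical names, rows are modeled as tuples of categorical values, and the utility is simply assumed to be the sum of uniformity, diversity, and novelty.

```python
from collections import Counter
from itertools import combinations


def uniformity(group):
    """Assumed measure: mean share of the most common value per attribute.

    Returns 1.0 when all rows in the group are identical.
    """
    n_attrs = len(group[0])
    shares = []
    for a in range(n_attrs):
        counts = Counter(row[a] for row in group)
        shares.append(counts.most_common(1)[0][1] / len(group))
    return sum(shares) / n_attrs


def signature(group):
    """Most common value of each attribute; used to compare groups."""
    return tuple(
        Counter(row[a] for row in group).most_common(1)[0][0]
        for a in range(len(group[0]))
    )


def diversity(summary):
    """Assumed measure: mean fraction of attributes on which two groups'
    signatures differ, averaged over all pairs of groups."""
    sigs = [signature(g) for g in summary]
    pairs = list(combinations(sigs, 2))
    if not pairs:
        return 0.0
    return sum(
        sum(x != y for x, y in zip(s, t)) / len(s) for s, t in pairs
    ) / len(pairs)


def utility(summary, seen):
    """Hypothetical utility: mean uniformity + diversity + novelty, where
    novelty is the share of groups whose signature was not shown before."""
    unif = sum(uniformity(g) for g in summary) / len(summary)
    novelty = sum(signature(g) not in seen for g in summary) / len(summary)
    return unif + diversity(summary) + novelty


def greedy_exploration(candidates_per_step):
    """At each step, choose the candidate summary with the highest utility
    and add its utility to the cumulative total."""
    seen, path, total = set(), [], 0.0
    for candidates in candidates_per_step:
        best = max(candidates, key=lambda s: utility(s, seen))
        total += utility(best, seen)
        seen.update(signature(g) for g in best)
        path.append(best)
    return path, total
```

Under these assumptions, a one-shot summarizer corresponds to calling `greedy_exploration` with a single step, which is one way to read the claim that the sequential problem generalizes one-shot summarization. How candidate summaries are generated at each step is left outside this sketch.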


research
12/29/2017

Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward

Video summarization aims to facilitate large-scale video browsing by pro...
research
08/28/2022

Podcast Summary Assessment: A Resource for Evaluating Summary Assessment Methods

Automatic summary assessment is useful for both machine-generated and hu...
research
10/15/2020

GSum: A General Framework for Guided Neural Abstractive Summarization

Neural abstractive summarization models are flexible and can produce coh...
research
03/18/2021

Optimally Summarizing Data by Small Fact Sets for Concise Answers to Voice Queries

Our goal is to find combinations of facts that optimally summarize data ...
research
03/15/2021

DeepOPG: Improving Orthopantomogram Finding Summarization with Weak Supervision

Finding summaries from an orthopantomogram, or a dental panoramic radiog...
research
10/08/2021

HydraSum – Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models

Existing abstractive summarization models lack explicit control mechanis...
research
11/27/2017

One-Shot Coresets: The Case of k-Clustering

Scaling clustering algorithms to massive data sets is a challenging task...
