METAM: Goal-Oriented Data Discovery

04/18/2023
by Sainyam Galhotra, et al.

Data is a central component of machine learning and causal inference tasks. The availability of large amounts of data from sources such as open data repositories, data lakes, and data marketplaces creates an opportunity to augment data and boost those tasks' performance. However, augmentation techniques rely on a user manually discovering and shortlisting useful candidate augmentations. Existing solutions do not leverage the synergy between discovery and augmentation, and therefore underexploit the available data. In this paper, we introduce METAM, a novel goal-oriented framework that queries the downstream task with a candidate dataset, forming a feedback loop that automatically steers the discovery and augmentation process. To select candidates efficiently, METAM leverages properties of (i) the data, (ii) the utility function, and (iii) the solution set size. We show METAM's theoretical guarantees and demonstrate them empirically on a broad set of tasks. All in all, we demonstrate the promise of goal-oriented data discovery for modern data science applications.
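
The feedback loop described in the abstract can be illustrated with a minimal sketch. This is not METAM's actual algorithm, which the abstract does not specify; it is a plain greedy selection written under stated assumptions. The function name `greedy_goal_oriented_discovery`, the candidate names, and the toy additive `utility` are all hypothetical. The sketch only shows the general idea of querying a downstream task with candidate augmentations and keeping those that move the task toward a goal, subject to a cap on the solution set size.

```python
from typing import Callable, FrozenSet, List, Tuple


def greedy_goal_oriented_discovery(
    candidates: List[str],
    utility: Callable[[FrozenSet[str]], float],
    target: float,
    max_size: int,
) -> Tuple[FrozenSet[str], float]:
    """Greedily add the candidate augmentation that most improves utility.

    Illustrative stand-in only; not the selection strategy from the paper.
    """
    selected: FrozenSet[str] = frozenset()
    best = utility(selected)  # utility of the un-augmented input data
    while best < target and len(selected) < max_size:
        # Query the downstream task once per remaining candidate.
        scored = [
            (utility(selected | {c}), c)
            for c in candidates
            if c not in selected
        ]
        if not scored:
            break  # every candidate is already selected
        score, choice = max(scored)
        if score <= best:
            break  # no remaining candidate improves the task
        selected, best = selected | {choice}, score
    return selected, best


# Hypothetical downstream task: a base score plus a made-up gain per augmentation.
GAINS = {"weather": 0.10, "census": 0.05, "noise": 0.0}


def utility(augmentations: FrozenSet[str]) -> float:
    return 0.6 + sum(GAINS[a] for a in augmentations)


chosen, score = greedy_goal_oriented_discovery(
    list(GAINS), utility, target=0.74, max_size=3
)
print(sorted(chosen), round(score, 2))
```

In this toy run the loop picks the two augmentations with positive gain and stops once the target is reached, so the uninformative candidate is never added. Each iteration queries the task once per remaining candidate, which is the kind of cost a more careful candidate selection strategy would aim to reduce.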


research
04/28/2018

Data science is science's second chance to get causal inference right: A classification of data science tasks

Causal inference from observational data is the goal of many health and ...
research
05/31/2021

Federated Estimation of Causal Effects from Observational Data

Many modern applications collect data that comes in federated spirit, wi...
research
09/10/2023

A compendium of data sources for data science, machine learning, and artificial intelligence

Recent advances in data science, machine learning, and artificial intell...
research
11/25/2019

Machine-learned metrics for predicting the likelihood of success in materials discovery

Materials discovery is often compared to the challenge of finding a need...
research
06/01/2023

Cross Modal Data Discovery over Structured and Unstructured Data Lakes

Organizations are collecting increasingly large amounts of data for data...
research
05/06/2023

Actively Discovering New Slots for Task-oriented Conversation

Existing task-oriented conversational search systems heavily rely on dom...
