Toward a System Building Agenda for Data Integration

09/29/2017
by AnHai Doan, et al.

In this paper we argue that the data management community should devote far more effort to building data integration (DI) systems in order to truly advance the field. Toward this goal, we make three contributions. First, we draw on our recent industrial experience to discuss the limitations of current DI systems. Second, we propose an agenda for building a new kind of DI system that addresses these limitations. Such systems guide users through the DI workflow step by step and provide tools that address the "pain points" of each step; these tools are built on top of the Python data science and Big Data ecosystem (PyData). We discuss how to foster an ecosystem of such tools within PyData and then use it to build DI systems for collaborative, cloud, crowd, and lay-user settings. Finally, we discuss ongoing work at Wisconsin, which suggests that these DI systems are highly promising and that building them raises many interesting research challenges.
