Graph integration of structured, semistructured and unstructured data for data journalism

07/23/2020
by Oana Balalau, et al.

Nowadays, journalism is facilitated by the existence of a large number of digital data sources, including many Open Data ones. Such data sources are extremely heterogeneous, ranging from highly structured (relational databases) through semi-structured (JSON, XML, HTML) and graphs (e.g., RDF) to text. Journalists (and other classes of users lacking advanced IT expertise, such as most non-governmental organizations or small public administrations) need to be able to make sense of such heterogeneous corpora, even if they lack the ability to define and deploy custom extract-transform-load workflows. These are difficult to set up not only because the inputs are arbitrary and heterogeneous, but also because users may want to add (or remove) datasets to (from) the corpus. We describe a complete approach for integrating dynamic sets of heterogeneous data sources into a graph, along the lines described above: the challenges we faced in making such graphs useful and in scaling their integration, and the solutions we proposed for these problems. Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments.
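To make the idea of graph integration concrete, here is a minimal Python sketch of how a relational row and a JSON document could be loaded into one graph and linked where their values coincide. It is an illustration under stated assumptions, not the ConnectionLens implementation: the `Graph` class, the `load_row` and `load_json` helpers, the `sameAs` edge label, the case-insensitive matching rule, and the sample data are all hypothetical.

```python
import json
from collections import defaultdict


class Graph:
    """Toy integration graph (hypothetical; not the ConnectionLens data model)."""

    def __init__(self):
        self.nodes = {}                      # node id -> (label, source name)
        self.edges = []                      # (source id, edge label, target id)
        self._values = defaultdict(list)     # normalized value -> value-node ids

    def add_node(self, label, source, is_value=False):
        node_id = len(self.nodes)
        self.nodes[node_id] = (label, source)
        if is_value:
            # Assumed matching rule: equal after trimming and lowercasing.
            key = str(label).strip().lower()
            for other in self._values[key]:
                # Only link nodes that come from different data sources.
                if self.nodes[other][1] != source:
                    self.edges.append((other, "sameAs", node_id))
            self._values[key].append(node_id)
        return node_id

    def add_edge(self, src, label, dst):
        self.edges.append((src, label, dst))


def load_row(graph, source, table, row):
    """Relational tuple: one node for the row, one value node per attribute."""
    row_id = graph.add_node(table, source)
    for column, value in row.items():
        graph.add_edge(row_id, column, graph.add_node(value, source, is_value=True))
    return row_id


def load_json(graph, source, name, value):
    """JSON: objects and arrays become internal nodes, scalars become value nodes."""
    if isinstance(value, dict):
        node = graph.add_node(name, source)
        for key, child in value.items():
            graph.add_edge(node, key, load_json(graph, source, key, child))
        return node
    if isinstance(value, list):
        node = graph.add_node(name, source)
        for child in value:
            graph.add_edge(node, "item", load_json(graph, source, name, child))
        return node
    return graph.add_node(value, source, is_value=True)


# Usage with made-up data: two sources mention the same person.
g = Graph()
load_row(g, "officials.db", "official", {"name": "Jane Doe", "role": "mayor"})
load_json(g, "companies.json", "company",
          json.loads('{"owner": "jane doe", "city": "Lyon"}'))
links = [edge for edge in g.edges if edge[1] == "sameAs"]
print(links)
```

Because each source is loaded independently and links are derived from values rather than from a hand-written mapping, a new dataset can be added without defining an extract-transform-load workflow first. A real system would need a far more careful matching step than the string comparison assumed here.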
