Empowering Investigative Journalism with Graph-based Heterogeneous Data Management

Investigative Journalism (IJ, for short) is a staple of modern, democratic societies. IJ often requires working with large, dynamic sets of heterogeneous, schema-less data sources, which can be structured, semi-structured, or textual; this limits the applicability of classical data integration approaches. In prior work, we developed ConnectionLens, a system capable of integrating such sources into a single heterogeneous graph, leveraging Information Extraction (IE) techniques; users can then query the graph by means of keywords, and explore query results and their neighborhood using an interactive GUI. Our keyword search problem is complicated by the heterogeneity of the graph, and by the lack of a result score function that would allow pruning part of the search space. In this work, we describe an actual IJ application studying conflicts of interest in the biomedical domain, and we show how ConnectionLens supports it. Then, we present novel techniques addressing the scalability challenges raised by this application: one reduces the significant IE costs incurred while building the graph, while the other is a novel, parallel, in-memory keyword search engine, which achieves orders-of-magnitude speed-up over our previous engine. Our experimental study on the real-world IJ application data confirms the benefits of our contributions.
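To make the idea of keyword search over an integrated graph concrete, here is a minimal sketch. It is not the ConnectionLens engine: it swaps in a plain breadth-first search from each keyword's matching nodes and returns a small tree connecting one match per keyword. The function names (`build_graph`, `keyword_search`), the node identifiers, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_from(adj, sources):
    """Multi-source BFS; returns {node: parent} for every reachable node."""
    parent = {s: None for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def path_to_source(parent, node):
    """Follow parent pointers from node back to the source it was reached from."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def keyword_search(adj, labels, keywords):
    """Return (root, edges) of a tree linking one match per keyword, or None."""
    searches = []
    for kw in keywords:
        matches = sorted(n for n in adj if kw.lower() in labels.get(n, "").lower())
        if not matches:
            return None  # a keyword with no matching node yields no result
        searches.append(bfs_from(adj, matches))
    # Candidate roots are the nodes reachable from every keyword's matches.
    common = set(adj)
    for parent in searches:
        common &= parent.keys()
    if not common:
        return None
    # Pick the root with the smallest total distance to all keywords.
    def cost(n):
        return sum(len(path_to_source(p, n)) - 1 for p in searches)
    root = min(sorted(common), key=cost)
    edges = set()
    for p in searches:
        path = path_to_source(p, root)
        edges.update(frozenset(e) for e in zip(path, path[1:]))
    return root, edges

# Hypothetical toy data: a text document, a table row, and two extracted entities.
labels = {
    "doc1": "Article on a clinical trial",
    "p:alice": "Alice Martin",
    "org:acme": "Acme Pharma",
    "row7": "funding record",
}
adj = build_graph([("doc1", "p:alice"), ("row7", "p:alice"), ("row7", "org:acme")])
result = keyword_search(adj, labels, ["trial", "acme"])
```

In this toy graph, the extracted person entity is what links the text document to the table row, so the answer to the two-keyword query is the path running through it. A real engine would also need to rank or bound results, which this sketch does not attempt.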


