Graph-based keyword search in heterogeneous data sources

09/09/2020
by Mhd Yamen Haddad, et al.

Data journalism is the branch of investigative journalism that treats digital data as first-class citizens. As human activity leaves ever stronger digital traces, data journalism is becoming increasingly important. However, as the number and diversity of data sources grow, query answering must cope with heterogeneous data models that differ in structure or have no structure at all. Inspired by our collaboration with Le Monde, a leading French newspaper, we designed a novel query algorithm for exploiting such heterogeneous corpora through keyword search. We model the underlying data as graphs and, given a set of search terms, our algorithm finds links between them within and across the heterogeneous datasets included in the graph. We draw inspiration from prior work on keyword search in structured and unstructured data, and extend it with the dimension of data heterogeneity, which makes the keyword search problem computationally harder. We implement our algorithm and evaluate its performance on synthetic and real-world datasets.
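
To make the idea of "finding links between search terms in a graph" concrete, here is a minimal sketch. It is not the algorithm of the paper, which is not described in this abstract. It assumes an undirected graph whose nodes carry text labels, matches each keyword by substring, and uses plain breadth-first search to return one shortest connecting path. All node identifiers, labels, and function names (`build_graph`, `matching_nodes`, `shortest_link`) are hypothetical and chosen for illustration only.

```python
from collections import deque

def build_graph(edges):
    """Return an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def matching_nodes(labels, keyword):
    """Nodes whose label contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return {n for n, text in labels.items() if kw in text.lower()}

def shortest_link(adj, sources, targets):
    """BFS from all source nodes; return one shortest path to a target, or None."""
    parent = {s: None for s in sources}
    queue = deque(sorted(sources))
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical toy corpus: one node each from a CSV row, a text paragraph,
# and two JSON records, all integrated into a single graph.
labels = {
    "csv:row1": "Alice Martin, mayor",
    "txt:para3": "The mayor signed a contract with Acme",
    "json:co7": "Acme Corp",
    "json:co8": "Globex",
}
edges = [
    ("csv:row1", "txt:para3"),
    ("txt:para3", "json:co7"),
    ("json:co7", "json:co8"),
]
adj = build_graph(edges)
path = shortest_link(
    adj,
    matching_nodes(labels, "alice"),
    matching_nodes(labels, "globex"),
)
print(path)
```

In this toy example the returned path crosses nodes that originate from three differently structured sources, which is the kind of cross-dataset link the abstract refers to. Handling more than two keywords requires connecting trees rather than single paths, which this sketch does not attempt.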

research
02/08/2021

Empowering Investigative Journalism with Graph-based Heterogeneous Data Management

Investigative Journalism (IJ, in short) is staple of modern, democratic ...
research
08/09/2022

Integrating connection search in graph queries

Graph data management and querying has many practical applications. When...
research
11/02/2019

Do Chinese Internet Users Exist Heterogeneity in Search Behavior?

Investor attention is an important concept in behavioral finance. Many a...
research
07/23/2020

Graph integration of structured, semistructured and unstructured data for data journalism

Nowadays, journalism is facilitated by the existence of large amounts of...
research
03/21/2021

Structural Textile Pattern Recognition and Processing Based on Hypergraphs

The humanities, like many other areas of society, are currently undergoi...
research
11/01/2021

Heterogeneous Graph Neural Networks for Large-Scale Bid Keyword Matching

Digital advertising is a critical part of many e-commerce platforms such...
research
09/03/2023

DKWS: A Distributed System for Keyword Search on Massive Graphs (Complete Version)

Due to the unstructuredness and the lack of schemas of graphs, such as k...
