Enhancing OBDA Query Translation over Tabular Data with Morph-CSV
Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets (e.g., relational databases, CSV, JSON), either by materializing integrated data into RDF or by performing on-the-fly integration via SPARQL-to-SQL query translation. In the specific case of tabular datasets composed of several CSV or Excel files, query translation approaches have been applied taking as input a lightweight schema with table and column names, and considering each source as a single table that can be loaded into a relational database system (RDB). This naïve approach does not consider the implicit constraints in this type of data, e.g., referential integrity among data sources, datatypes, or data integrity. We propose Morph-CSV, a framework that enforces these constraints and can be used together with any SPARQL-to-SQL OBDA engine. Morph-CSV relies on both a Constraints component and a set of operators that apply each type of constraint to the input, with the aim of enhancing query completeness and performance. We evaluate Morph-CSV on a set of real-world open tabular datasets in the public transport domain, comparing it with existing approaches in terms of query result completeness and performance.
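To make the contrast concrete, the following is a minimal sketch of the difference between the naïve load and a constraint-aware load of two CSV sources into an RDB. It is not Morph-CSV's implementation or API: the CSV contents, the table and column names, and the helper functions (`load_naive`, `load_with_constraints`, `column_types`) are all hypothetical, and SQLite stands in for whatever relational backend an OBDA engine would use.

```python
import csv
import io
import sqlite3

# Hypothetical transport-style CSV sources (illustrative data only).
ROUTES_CSV = "route_id,route_short_name\nR1,10\nR2,20\n"
TRIPS_CSV = "trip_id,route_id,wheelchair_accessible\nT1,R1,1\nT2,R1,0\nT3,R2,1\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_naive(conn):
    """Naive load: one table per file, every column TEXT, no keys."""
    for name, text in (("routes", ROUTES_CSV), ("trips", TRIPS_CSV)):
        rows = read_rows(text)
        cols = list(rows[0].keys())
        col_defs = ", ".join(f"{c} TEXT" for c in cols)
        placeholders = ", ".join("?" for _ in cols)
        conn.execute(f"CREATE TABLE {name} ({col_defs})")
        conn.executemany(
            f"INSERT INTO {name} VALUES ({placeholders})",
            [tuple(r[c] for c in cols) for r in rows],
        )


def load_with_constraints(conn):
    """Constraint-aware load: datatypes, keys, and an index are declared."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute(
        "CREATE TABLE routes ("
        "route_id TEXT PRIMARY KEY, "
        "route_short_name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE trips ("
        "trip_id TEXT PRIMARY KEY, "
        "route_id TEXT NOT NULL REFERENCES routes(route_id), "
        "wheelchair_accessible INTEGER NOT NULL "
        "CHECK (wheelchair_accessible IN (0, 1)))"
    )
    # Index on the join column, so translated SQL joins can avoid full scans.
    conn.execute("CREATE INDEX idx_trips_route ON trips(route_id)")
    conn.executemany(
        "INSERT INTO routes VALUES (?, ?)",
        [(r["route_id"], r["route_short_name"]) for r in read_rows(ROUTES_CSV)],
    )
    conn.executemany(
        "INSERT INTO trips VALUES (?, ?, ?)",
        [
            (r["trip_id"], r["route_id"], int(r["wheelchair_accessible"]))
            for r in read_rows(TRIPS_CSV)
        ],
    )


def column_types(conn, table):
    """Return {column name: declared type} for a table."""
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}
```

Calling `load_naive` and `load_with_constraints` on two separate `sqlite3.connect(":memory:")` connections yields the same rows, but only the second schema rejects a trip that references a missing route and stores `wheelchair_accessible` as an integer. Declared keys and datatypes of this kind are the sort of information a SPARQL-to-SQL engine can exploit when translating queries.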