A Tree Pattern Matching Algorithm for XML Queries with Structural Preferences

In the XML community, exact queries allow users to specify exactly what they want to check and/or retrieve in an XML document. When they are applied to a semi-structured document or to a document with an overly complex model, the lack or the ignorance of the explicit document model (DTD-Document Type Definition, Schema, etc.) increases the risk of ob-taining an empty result set when the query is too specific, or, too large result set when it is too vague (e.g. it contains wildcards such as "*"). The reason is that in both cases, users write queries according to the document model they have in mind; this can be very far from the one that can actually be extracted from the document. Opposed to exact queries, preference queries are more flexible and can be relaxed to expand the search space during their evalua-tions. Indeed, during their evaluation, certain constraints (the preferences they contain) can be relaxed if necessary to avoid precisely empty results; moreover, the returned answers can be filtered to retain only the best ones. This paper presents an algorithm for evaluating such queries inspired by the TreeMatch algorithm proposed by Yao et al. for exact queries. In the pro-posed algorithm, the best answers are obtained by using an adaptation of the Skyline operator (defined in relational databases) in the context of documents (trees) to incrementally filter into the partial solutions set, those which satisfy the maximum of preferential constraints. The only restriction imposed on documents is No-Self-Containment.

READ FULL TEXT
research
06/07/2019

Holistic evaluation of XML queries with structural preferences on an annotated strong dataguide

With the emergence of XML as de facto format for storing and exchanging ...
research
01/20/2022

JEDI: These aren't the JSON documents you're looking for... (Extended Version*)

The JavaScript Object Notation (JSON) is a popular data format used in d...
research
07/28/2019

TopicSifter: Interactive Search Space Reduction Through Targeted Topic Modeling

Topic modeling is commonly used to analyze and understand large document...
research
05/19/2023

QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations

Formulating selective information needs results in queries that implicit...
research
11/02/2021

Explaining Documents' Relevance to Search Queries

We present GenEx, a generative model to explain search results to users ...
research
03/16/2020

Supporting Hard Queries over Probabilistic Preferences

Preference analysis is widely applied in various domains such as social ...
research
09/02/2020

Tree Automata for Extracting Consensus from Partial Replicas of a Structured Document

In an asynchronous cooperative editing workflow of a structured document...

Please sign up or login with your details

Forgot password? Click here to reset