Progressive Evaluation of Queries over Untagged Data

by Dhrubajyoti Ghosh, et al.

Modern information systems often collect raw data in the form of text, images, video, and sensor readings. Such data must be interpreted, or enriched, before it can be analyzed. Enrichment is typically the result of automated machine learning and/or signal processing techniques that associate appropriate but uncertain tags with the data. Traditionally, with the notable exception of a few systems, enrichment has been treated as a separate pre-processing step performed independently of, and prior to, data analysis. This approach is becoming increasingly infeasible: modern data capture technologies enable the creation of very large data collections for which deriving all tags in advance is computationally difficult or impossible, and ultimately not beneficial. Hence, approaches that perform tagging at query or analysis time, on only the data of interest, need to be considered. This paper explores the problem of joint tagging and query processing. In particular, it considers a scenario where tagging can be performed using several techniques that differ in cost and accuracy, and it develops a progressive approach to answering queries (SPJ queries with a restricted version of join) that enriches the right data to the right degree so as to maximize the quality of the query results. The experimental results show that the proposed approach performs significantly better than the baseline approaches.
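To make the idea of enriching "the right data to the right degree" concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes a single selection predicate, a fixed cost budget, and a simple greedy rule that repeatedly runs the (object, tagger) pair with the highest uncertainty reduction per unit cost. All names (`Tagger`, `ObjectState`, `progressive_select`, the `weight` field, and the blending rule used to update beliefs) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Tagger:
    """A hypothetical tagging function with a cost/accuracy trade-off."""
    name: str
    cost: float                      # cost of running it on one object
    weight: float                    # trust placed in its output, in (0, 1]
    predict: Callable[[str], float]  # returns P(tag holds) for an object


@dataclass
class ObjectState:
    obj: str
    prob: float = 0.5                # current belief that the tag holds
    used: Set[str] = field(default_factory=set)


def uncertainty(p: float) -> float:
    """Highest at p = 0.5, zero at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def progressive_select(
    objects: List[str],
    taggers: List[Tagger],
    budget: float,
    threshold: float = 0.5,
) -> Tuple[List[str], float]:
    """Greedily spend `budget` on the most useful enrichment steps.

    Assumption: the benefit of running a tagger on an object is
    approximated by (current uncertainty * tagger weight) / cost.
    """
    states = [ObjectState(o) for o in objects]
    spent = 0.0
    while True:
        best = None
        for s in states:
            for t in taggers:
                if t.name in s.used or spent + t.cost > budget:
                    continue
                score = uncertainty(s.prob) * t.weight / t.cost
                if best is None or score > best[0]:
                    best = (score, s, t)
        if best is None or best[0] <= 0.0:
            break  # nothing affordable or nothing left to learn
        _, s, t = best
        # Blend the old belief with the tagger output (illustrative rule).
        s.prob = (1.0 - t.weight) * s.prob + t.weight * t.predict(s.obj)
        s.used.add(t.name)
        spent += t.cost
    answer = [s.obj for s in states if s.prob > threshold]
    return answer, spent
```

Under these assumptions, the answer set can be read off at any budget: a small budget yields a rough answer built from cheap taggers only, and a larger budget refines it by running costlier, more accurate taggers on the objects that are still uncertain.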




