DQSOps: Data Quality Scoring Operations Framework for Data-Driven Applications

03/27/2023
by   Firas Bayram, et al.

Data quality assessment has become a prominent component in the successful execution of complex data-driven artificial intelligence (AI) software systems. In practice, real-world applications generate huge volumes of data at high speed. These data streams require analysis and preprocessing before being permanently stored or used in a learning task. Therefore, significant attention has been paid to the systematic management and construction of high-quality datasets. Nevertheless, managing voluminous and high-velocity data streams is usually performed manually (i.e., offline), which is impractical in production environments. To address this challenge, DataOps has emerged to achieve life-cycle automation of data processes using DevOps principles. However, determining data quality on a fitness scale remains a complex task within the DataOps framework. This paper presents a novel Data Quality Scoring Operations (DQSOps) framework that yields a quality score for production data in DataOps workflows. The framework incorporates two scoring approaches: an ML prediction-based approach that predicts the data quality score, and a standard-based approach that periodically produces the ground-truth scores by assessing several data quality dimensions. We deploy the DQSOps framework in a real-world industrial use case. The results show that DQSOps achieves significant computational speedup compared to the conventional approach of data quality scoring while maintaining high prediction performance.
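The interplay of the two scoring approaches can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the quality dimensions, their weighting, the refresh schedule, or the ML model. The two dimensions used here (completeness and uniqueness), the equal weighting, the fixed `refresh_every` interval, and all function names are hypothetical, and `predictor` stands in for whatever trained model would supply the predicted score.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A data window is a list of records; None marks a missing value.
Record = Dict[str, Optional[float]]
Window = List[Record]


def completeness(window: Window) -> float:
    """Fraction of cells that are not missing (assumed dimension)."""
    cells = [value for row in window for value in row.values()]
    if not cells:
        return 1.0
    return sum(value is not None for value in cells) / len(cells)


def uniqueness(window: Window) -> float:
    """Fraction of records that are distinct (assumed dimension)."""
    if not window:
        return 1.0
    distinct = {tuple(sorted(row.items())) for row in window}
    return len(distinct) / len(window)


def standard_score(window: Window) -> float:
    """Standard-based score: equal-weight mean of the dimension scores.

    Equal weighting is an assumption made for this sketch.
    """
    dimensions = [completeness(window), uniqueness(window)]
    return sum(dimensions) / len(dimensions)


def score_stream(
    windows: List[Window],
    predictor: Callable[[Window], float],
    refresh_every: int,
) -> List[Tuple[str, float]]:
    """Score each window, using the standard-based scorer periodically.

    Every `refresh_every`-th window gets a ground-truth score from
    `standard_score`; all other windows get the cheaper predicted score
    from `predictor` (a hypothetical stand-in for a trained ML model).
    """
    if refresh_every < 1:
        raise ValueError("refresh_every must be at least 1")
    results: List[Tuple[str, float]] = []
    for index, window in enumerate(windows):
        if index % refresh_every == 0:
            results.append(("standard", standard_score(window)))
        else:
            results.append(("predicted", predictor(window)))
    return results
```

In this sketch the speedup would come from calling the cheap `predictor` on most windows and reserving the full dimension-by-dimension assessment for the periodic ground-truth windows. In a real deployment those ground-truth scores would presumably also be used to check or retrain the predictor, a step omitted here.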
