Measuring uncertainty when pooling interval-censored data sets with different precision

10/25/2022
by Krasymyr Tretiak, et al.

Data quality is an important consideration in many engineering applications and projects. Data collection procedures do not always involve careful utilization of the most precise instruments and strictest protocols. As a consequence, data are invariably affected by imprecision, and sometimes by sharply varying levels of quality. Different mathematical representations of imprecision have been suggested, including a classical approach to censored data, which is considered optimal when the proposed error model is correct, and a weaker approach called interval statistics, based on partial identification, which makes fewer assumptions. Maximizing the quality of statistical results is often crucial to the success of many engineering projects, and a natural question arises: should data of differing qualities be pooled together, or should we include only precise measurements and disregard imprecise data? Some worry that combining precise and imprecise measurements can depreciate the overall quality of the pooled data. Others fear that excluding data of lesser precision can increase the overall uncertainty about results, because a lower sample size implies more sampling uncertainty. This paper explores these concerns and describes simulation results that show when it is advisable to combine fairly precise data with rather imprecise data, by comparing analyses using different mathematical representations of imprecision. Pooling data sets is preferred when the low-quality data set does not exceed a certain level of uncertainty. However, so long as the data are random, it may be legitimate to reject the low-quality data if its reduction of sampling uncertainty does not counterbalance the effect of its imprecision on the overall uncertainty.
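The trade-off described above can be illustrated with a minimal sketch (not the authors' code; the data values and function name are invented for illustration). Under interval statistics, each imprecise measurement is an interval known to contain the true value, a precise measurement is a degenerate interval, and the sample mean of interval data is itself an interval whose width reflects the imprecision carried into the pooled result.

```python
def interval_mean(data):
    """Mean of interval data given as (lo, hi) pairs.

    Returns the interval (mean of lower bounds, mean of upper bounds),
    which contains every mean attainable from points inside the intervals.
    """
    n = len(data)
    lo = sum(l for l, _ in data) / n
    hi = sum(h for _, h in data) / n
    return lo, hi

# Hypothetical example data. Precise measurements are degenerate
# intervals [x, x]; imprecise ones come from a coarser instrument.
precise = [(9.8, 9.8), (10.1, 10.1), (10.0, 10.0), (9.9, 9.9)]
imprecise = [(9.0, 11.0), (8.5, 11.5), (9.5, 10.5), (9.0, 10.8)]

m_precise = interval_mean(precise)            # zero-width interval
m_pooled = interval_mean(precise + imprecise)  # wider, but larger sample

# Pooling doubles the sample size (less sampling uncertainty) at the
# cost of a nonzero-width mean (more imprecision); whether that trade
# is worthwhile is the question the paper's simulations address.
width_precise = m_precise[1] - m_precise[0]
width_pooled = m_pooled[1] - m_pooled[0]
```

Here the precise-only mean has zero width, while the pooled mean is an interval roughly one unit wide; in a full analysis this imprecision would be weighed against the narrower sampling (confidence) bounds that the larger pooled sample affords.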
