The Value of Information in Retrospect

06/05/2018
by Jacob Parsons, et al.

In the course of any statistical analysis, it is necessary to consider issues of data quality and model appropriateness. Value of information (VOI) methods were first put forward in the middle of the twentieth century to quantify how important a portion of the data is to decision making. Since their genesis, however, VOI methods have been largely neglected by statisticians. In this paper we review and extend existing VOI methods and recommend three quantities for identifying influential and outlying data: an influence measure previously suggested by Kempthorne (1986); a related quantity, the expected value of sample information, which gauges how much influence we would expect the data to have; and the ratio of the two, which compares observed influence with expected influence. We study the theoretical properties of these quantities and apply the proposed approach to two datasets. A dataset of employment rates and economic factors in the U.S. (Longley, 1967) serves as an example of linear regression. It was also used by Cook (1977) to introduce Cook's distance, a common frequentist measure of influence. HIV surveillance data have been the main data source for monitoring HIV epidemics in low- and middle-income countries. The Swaziland HIV prevalence data contain the number of HIV+ patients observed at multiple clinics over several years and serve as an example of a generalized linear mixed model.
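For readers unfamiliar with Cook's distance, the frequentist influence measure mentioned above, the following is a minimal sketch for simple linear regression with one predictor. It is not the paper's VOI approach and does not use the Longley data: the function name `cooks_distance` and the small synthetic dataset are assumptions made purely for illustration. It uses the standard closed form D_i = e_i^2 / (p s^2) * h_ii / (1 - h_ii)^2, where e_i is the residual, h_ii the leverage, p the number of coefficients, and s^2 the residual variance estimate.

```python
def cooks_distance(x, y):
    """Cook's distance for each point in a simple linear regression y ~ a + b*x.

    Illustrative helper (hypothetical name), pure Python, one predictor only.
    """
    n = len(x)
    p = 2  # number of fitted coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    # Ordinary least squares fit.
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    # Residuals and the unbiased residual variance estimate.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)

    # Leverage (diagonal of the hat matrix) for a single predictor.
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    return [e * e / (p * s2) * h / (1.0 - h) ** 2
            for e, h in zip(resid, leverage)]


# Synthetic data (not from the paper): the last point is deliberately outlying.
x_demo = [1, 2, 3, 4, 5, 6, 7, 8]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 25.0]

distances = cooks_distance(x_demo, y_demo)
most_influential = max(range(len(distances)), key=distances.__getitem__)
print("most influential index:", most_influential)
```

In this synthetic example the final observation combines high leverage with a large residual, so it receives the largest distance. The quantities the abstract recommends differ in that they are decision-theoretic and compare observed influence with the influence expected before seeing the data.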


research
02/27/2013

Value of Evidence on Influence Diagrams

In this paper, we introduce evidence propagation operations on influence...
research
04/09/2021

Assessment of the influence of features on a classification problem: an application to COVID-19 patients

This paper deals with an important subject in classification problems ad...
research
11/30/2020

Data Fusion for Joining Income and Consumption Information Using Different Donor-Recipient Distance Metrics

Data fusion describes the method of combining data from (at least) two i...
research
09/14/2023

System Effects in Identifying Risk-Optimal Data Requirements for Digital Twins of Structures

Structural Health Monitoring (SHM) technologies offer much promise to th...
research
01/17/2021

Estimating informativeness of samples with Smooth Unique Information

We define a notion of information that an individual sample provides to ...
research
03/10/2023

Optimal Design of Validation Experiments for the Prediction of Quantities of Interest

Numerical predictions of quantities of interest measured within physical...
