The Value of Information in Retrospect
Any statistical analysis must consider data quality and model appropriateness. Value of information (VOI) methods were introduced in the middle of the twentieth century to quantify how much a portion of the data matters to a decision. Since then, however, VOI methods have been largely neglected by statisticians. In this paper we review and extend existing VOI methods and recommend three quantities for identifying influential and outlying data: an influence measure previously suggested by Kempthorne (1986); a related quantity, the expected value of sample information, which gauges how much influence we would expect the data to have; and the ratio of the two, which compares observed influence with expected influence. We study the theoretical properties of these quantities and apply our proposed approach to two datasets. The first, a dataset of employment rates and economic factors in the U.S. (Longley, 1967), serves as an example of linear regression; it was also used by Cook (1977) to introduce Cook's distance, a common frequentist measure of influence. The second is the Swaziland HIV prevalence data, which contain the number of HIV+ patients observed at multiple clinics across years and serve as an example of generalized linear mixed models. HIV surveillance data have been the main data source for monitoring HIV epidemics in low- and middle-income countries.
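For readers unfamiliar with the frequentist benchmark named above, the following is a minimal sketch of Cook's distance, not the VOI quantities proposed in the paper. It assumes a simple linear regression with a single predictor. The data are synthetic values invented for illustration (they are not the Longley data), and the function names `fit_line` and `cooks_distance` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def cooks_distance(x, y):
    """Cook's distance of each observation in a simple linear regression."""
    n, p = len(x), 2  # p counts the intercept and the slope
    a, b = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    out = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx  # leverage of this point
        out.append(e * e / (p * s2) * h / (1.0 - h) ** 2)
    return out


# Synthetic illustration data (an assumption, not the Longley data):
# the last point has high leverage and lies off the trend of the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]
distances = cooks_distance(x, y)
most_influential = max(range(len(x)), key=lambda i: distances[i])
print(most_influential, [round(d, 3) for d in distances])
```

The closed form used here combines each residual with its leverage, so no model is refitted; it agrees with the definition based on deleting one observation at a time and comparing fitted values.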