
A Test for Differential Ascertainment in Case-Control Studies with Application to Child Maltreatment

by Matteo Sordello, et al.

We propose a method to test for the presence of differential ascertainment in case-control studies, when data are collected by multiple sources. We show that, when differential ascertainment is present, the use of only the observed cases leads to severe bias in the computation of the odds ratio. The use of the estimates that our method of testing for differential ascertainment naturally provides can alleviate the effect of such bias. We apply it to a dataset obtained from the National Violent Death Reporting System, with the goal of checking for the presence of differential ascertainment by race in the count of deaths caused by child maltreatment.





1 The Problem of Differential Ascertainment in Case-Control Studies

In case-control studies, one wants to understand the association between some risk factor and a rare outcome. The odds of the risk factor are compared among cases (subjects with the outcome) vs. controls (subjects without the outcome). When it is possible to get a complete count of the cases or to obtain a random sample of the cases, then estimating the prevalence of the risk factor among the cases is straightforward. But in some settings, it is not possible to get a complete count or take a random sample of the cases. This problem does not arise strictly in case-control studies, but in any situation in which one wants to count the number of individuals that belong to a certain group. In such circumstances we say that we are in the presence of imperfect ascertainment. It can happen in a multitude of different contexts, for example when counting the number of people addicted to certain drugs (Domingo-Salvany et al., 1995), of civilians that died during a war (Ball et al., 2002) or of people that suffer from diabetes (International Working Group for Disease Monitoring and Forecasting, 1995). A further complication might arise if the group of people under consideration is divided into two subpopulations according to some variable. In such cases, the ascertainment can be not only imperfect, but possibly easier in one of these two subpopulations and harder in the other. One can think, for example, that there is a threshold on the variable age under which it is easier to ascertain whether a person is a drug addict, or that gender is a relevant factor in the ascertainment of civilian deaths and of individuals suffering from diabetes. In practical terms, this means that the ratio of the observed counts in the two subpopulations differs from the real underlying ratio. Here we say that we are in the presence of differential ascertainment, a situation that is harder to detect than simple imperfect ascertainment.

In this work, we focus on a specific situation where we suspect that differential ascertainment could be involved, and develop the machinery to be able to perform a rigorous statistical analysis. The question we want to answer concerns deaths caused by maltreatment of children aged 10 or less. Notice that here we are in the presence of imperfect ascertainment, since it is not always possible to assess whether or not a death has been caused by child maltreatment (see for example Herman-Giddens et al. (1999)). We are interested in knowing whether the deaths caused by child maltreatment are differentially ascertained according to the variable race. There are two main reasons we are interested in this question. The first one is that most of the research done in this direction has the goal of providing guidelines that can be used to develop and evaluate intervention strategies, ultimately to reduce the number of deaths caused by child maltreatment. If the real proportions in the population are different from the ones observed, then the resources put forward to solve the problem may be poorly allocated and less effective than they should be. The second reason is that differential ascertainment can lead case-control studies to provide false conclusions. To explain this, we introduce some notation for case-control studies (for a thorough discussion of case-control studies see Keogh and Cox (2014)). For a case-control study, let $X \in \{0, 1\}$ be the binary exposure, where $X = 1$ stands for exposed and $X = 0$ for unexposed, and let $Y \in \{0, 1\}$ be the outcome of interest, where $Y = 1$ is the indicator of the cases and $Y = 0$ of the controls. We want to infer the distribution of $Y$ given $X$. Unfortunately, given that it is structured retrospectively, a case-control study can only provide information on the distribution of $X$ given $Y$. The following equality comes to our aid, creating a link between these distributions:

$$\frac{P(Y=1 \mid X=1) \,/\, P(Y=0 \mid X=1)}{P(Y=1 \mid X=0) \,/\, P(Y=0 \mid X=0)} \;=\; \frac{P(X=1 \mid Y=1) \,/\, P(X=0 \mid Y=1)}{P(X=1 \mid Y=0) \,/\, P(X=0 \mid Y=0)}.$$
The left hand side is called the odds ratio (OR), and it is a fundamental quantity to assess the association between exposure and outcome. Let $N_1$ be the total number of exposed individuals and $N_0$ the total number of unexposed individuals in the whole population of interest (so counting both cases and controls). We assume that $N_1$ and $N_0$ are known. Let now $n_1$ be the number of exposed cases and $n_0$ the number of unexposed cases. The odds ratio is then

$$\mathrm{OR} = \frac{n_1 / (N_1 - n_1)}{n_0 / (N_0 - n_0)}.$$

When the cases are very rare, compared to the size of the whole population, then one can approximate $N_1 - n_1 \approx N_1$ and $N_0 - n_0 \approx N_0$ and get

$$\mathrm{OR} \approx \frac{n_1 / N_1}{n_0 / N_0} = \frac{n_1 N_0}{n_0 N_1}.$$

If there is imperfect ascertainment of the cases, the real values of $n_1$ and $n_0$ are unknown. However, if one makes the assumption that the real proportion of exposed and unexposed cases is the same as the observed one, then it is still possible to compute the OR. If there is differential ascertainment, on the contrary, we know by definition that the use of the observed ratio instead of the real $n_1 / n_0$ will lead to some bias in the computation of the OR, which can be severe, as we will show in Section 2.2.
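To make this bias concrete, here is a small numeric sketch (all counts and ascertainment rates are made up for illustration): if exposed cases are ascertained at a higher rate than unexposed ones, the observed rare-outcome odds ratio is inflated by exactly the ratio of the two ascertainment rates.

```python
# Hypothetical population: known totals N1, N0 and true case counts n1, n0.
N1, N0 = 100_000, 100_000   # exposed / unexposed population sizes
n1, n0 = 500, 1000          # true exposed / unexposed case counts

true_or = (n1 / (N1 - n1)) / (n0 / (N0 - n0))
approx_or = (n1 * N0) / (n0 * N1)          # rare-outcome approximation

# Differential ascertainment (made-up rates): 90% of exposed cases are
# observed, but only 60% of unexposed cases.
obs_n1, obs_n0 = int(0.9 * n1), int(0.6 * n0)
biased_or = (obs_n1 * N0) / (obs_n0 * N1)  # 0.75 instead of ~0.5
```

Here the observed OR is 0.75 while the true one is about 0.5: the bias factor is 0.9/0.6 = 1.5, regardless of the case counts themselves.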

In order to create a test for the presence of differential ascertainment, we introduce a simple modification of the original Rasch model (Rasch, 1960) in which we consider $n_1$ and $n_0$ as realizations of Poisson random variables with means $\lambda_1$ and $\lambda_0$ respectively. We estimate those means, together with a parameter $\gamma$ representing the amount of differential ascertainment in the data. We use $\hat\lambda_1$ and $\hat\lambda_0$ in the computation of the OR, and we use $\hat\gamma$ to test whether the differential ascertainment is significant or not, and in which direction. If it is not significant, then the observed counts of exposed and unexposed cases can also be used instead of the estimated means.

It is important to note that if there is only one source of information that counts the total number of cases, then it is impossible to know if we are in the presence of differential ascertainment. However, when numerous sources of information are available, we can detect it and mitigate its effect.

In the next section, we discuss a dataset for studying factors related to death due to child maltreatment that contains multiple sources of information.

1.1 The National Violent Death Reporting System

To address our question about differential ascertainment of death due to child maltreatment by race, we used a dataset obtained from the National Violent Death Reporting System (NVDRS). Here we provide a brief description of the NVDRS, but a more detailed explanation can be found in Blair et al. (2016).

The NVDRS is a surveillance system created in 2002 from a pilot program called the National Violent Injury Statistics System. In 2010, 19 states were participating in the data collection, and in 2014 an additional 14 states joined the system. The goal of the system is to monitor the occurrence of homicides, suicides, unintentional firearm deaths, deaths of undetermined intent, and deaths from legal intervention (excluding legal executions) in the US. The system uses different data sources and merges them together to obtain a more complete description of the event. The primary data sources used for the system are death certificates, coroner/medical examiner reports (including toxicology reports) and law enforcement reports. Coroner/medical examiner reports and law enforcement reports contain narratives that can provide detailed information regarding the circumstances and characteristics of incidents. The manner of death is captured by International Classification of Diseases, Tenth Revision (ICD-10) codes. Other relevant information collected by the system includes demographics (for victims and suspects), method of injury (e.g., sharp instrument), place of injury, information on the victim's life stressors, the victim-suspect relationship (e.g., spouse or partner), the presence of intimate partner violence (IPV), toxicology (for victims), weapon information and whether other crimes occurred that are related to the incident (e.g., robbery followed by homicide).

As already mentioned, we focus our attention on the deaths of children aged 10 or less, and consider as cases the deaths that were caused by child maltreatment. The timeframe of interest is 2010 to 2015. We follow Leeb et al. (2008) in considering child maltreatment to be "any act or series of acts of commission or omission by a parent or other caregiver that results in harm, potential for harm, or threat of harm to a child".

We are interested in determining whether race played a role in the ascertainment of the cases. To do so, we consider the three main sources of information used in the NVDRS, and provide a method to test for differential ascertainment, explained in Section 3.

2 The Model

As anticipated, we consider a binary exposure $X$ and a total number of cases $N_x$ in each exposure group $x \in \{0, 1\}$. We first introduce the classical version of the Rasch model (Rasch, 1960), which relies on two important assumptions.

Assumption 2.1 (Individual Independence).

The ascertainment of each individual in any list is independent of the ascertainment of any other individual.

Assumption 2.2 (List Independence).

For any individual, ascertainment in one list happens independently of ascertainment in any other list.

Assumption 2.2 is not completely realistic, since we can imagine that being ascertained in one or more lists makes an individual more or less prone to be ascertained again. This mechanism is explained in Bishop et al. (2007) under the name of social visibility/invisibility or trap fascination/avoidance, depending on the context. In Section 2.1 we will relax this assumption and explain how to include dependence among the lists. We will sometimes refer to the concept of ascertainment as capture, given the similarity of our framework with the capture-recapture one. This connection is made explicit in Appendix A.

The number of lists is denoted by $J$. Conditionally on the value $N_x$, the ascertainment matrix has dimension $N_x \times J$ and, for $i = 1, \dots, N_x$ and $j = 1, \dots, J$, each entry $y_{ij}$ takes values in $\{0, 1\}$. These Bernoulli random variables take value $1$ with probability

$$P(y_{ij} = 1) = \frac{e^{\theta_x + \beta_j}}{1 + e^{\theta_x + \beta_j}}. \tag{1}$$

The parameters $\theta_x$ represent the individual propensity to be captured for an individual with exposure $x$, and the $\beta_j$ are the capture strengths of each list, which are assumed to be independent of the exposure. This model naturally creates, for each exposure $x$, a contingency table with $2^J$ entries, each entry being the number of individuals that are ascertained in a specific combination of lists. We denote by $n^x_{i_1 \dots i_J}$, where $i_j \in \{0, 1\}$, the count of individuals that are ascertained in all the lists whose corresponding index takes value $1$, and not ascertained in the lists with index $0$. For example, when $J = 3$ (the setting we will use from now on, since it is what we will be dealing with in the NVDRS dataset), $n^x_{110}$ is the count of individuals that appear in list 1 and list 2, but not in list 3. In Table 1 we show a contingency table for the three-list setting.
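Under Assumptions 2.1 and 2.2 this table is easy to simulate. The sketch below assumes the logistic capture form of (1) and illustrative parameter values; it draws independent Bernoulli captures for each individual and tabulates the resulting $2^J$ capture profiles:

```python
import math
import random
from collections import Counter

def capture_profile(theta, betas, rng):
    """One individual's ascertainment across J lists: list j captures
    with probability exp(theta + beta_j) / (1 + exp(theta + beta_j))."""
    return tuple(
        int(rng.random() < 1 / (1 + math.exp(-(theta + b)))) for b in betas
    )

rng = random.Random(0)
betas = [0.5, 0.5, 0.5]   # illustrative list capture strengths
gamma = 1.0               # propensity for exposed individuals (theta = gamma)

# 2^3 contingency table for 1000 exposed individuals
table = Counter(capture_profile(gamma, betas, rng) for _ in range(1000))
missed = table[(0, 0, 0)]  # the bottom-right cell, unobservable in practice
```

The cell keyed `(0, 0, 0)` is exactly the count that is missing from any real dataset, which is what makes the likelihood in the next part of the section nontrivial.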

                 List 2: Yes              List 2: No
                 List 3: Y   List 3: N    List 3: Y   List 3: N
List 1: Yes      $n_{111}$   $n_{110}$    $n_{101}$   $n_{100}$
List 1: No       $n_{011}$   $n_{010}$    $n_{001}$   $n_{000}$
Table 1: Complete contingency table for the count of individuals with exposure $x$ in the three-list setting (the superscript $x$ is omitted from the cell counts for readability).

Since we are only concerned with the difference between exposed and unexposed cases, we assume that the values of the $\beta_j$ are constant across the two groups. For this reason we set $\theta_0 = 0$ for any unexposed individual, and $\theta_1 = \gamma$ for any exposed one. The parameter $\gamma$ is going to be the main object of our study.

A richer model is considered in Appendix B, where we assume that, instead of being fixed, the individual propensities to be captured come from a normal distribution. We use $\theta_i \sim N(0, \sigma^2)$ for unexposed cases and $\theta_i \sim N(\gamma, \sigma^2)$ for the exposed. We will see, however, that we don't really need to introduce variability in these parameters, since our estimate of $\sigma$ is going to be extremely small and all the other parameters very close to the ones estimated with the initial setting.

Using Assumption 2.2, we compute the probability for each exposed individual to appear in the specific cell identified by subscripts $i_1 \dots i_J$, which is just the product of the single lists' probabilities

$$p^1_{i_1 \dots i_J} = \prod_{j=1}^{J} \left( \frac{e^{\gamma + \beta_j}}{1 + e^{\gamma + \beta_j}} \right)^{i_j} \left( \frac{1}{1 + e^{\gamma + \beta_j}} \right)^{1 - i_j}, \tag{2}$$

while for unexposed cases we have

$$p^0_{i_1 \dots i_J} = \prod_{j=1}^{J} \left( \frac{e^{\beta_j}}{1 + e^{\beta_j}} \right)^{i_j} \left( \frac{1}{1 + e^{\beta_j}} \right)^{1 - i_j}. \tag{3}$$
Thanks to Assumption 2.1, the likelihood of a complete table $\mathbf{n}^x$ with total count $N_x$ is given by a multinomial distribution

$$L(\mathbf{n}^x \mid N_x) = \frac{N_x!}{\prod_{i_1, \dots, i_J} n^x_{i_1 \dots i_J}!} \prod_{i_1, \dots, i_J} \left( p^x_{i_1 \dots i_J} \right)^{n^x_{i_1 \dots i_J}}, \tag{4}$$

where the probabilities are given by (2) or (3). A consequence of Assumption 2.1 is that the contingency tables associated with different exposures are independent. Then the total likelihood of the two complete contingency tables we are considering is

$$L(\mathbf{n}^1, \mathbf{n}^0 \mid N_1, N_0) = L(\mathbf{n}^1 \mid N_1) \, L(\mathbf{n}^0 \mid N_0). \tag{5}$$
Notice that up to now we were able to condition on the total number of cases $N_x$, since we were dealing with contingency tables where no cell was missing. In real applications, however, the contingency tables are not complete, and the cells $n^x_{0 \dots 0}$ are missing, since they represent the elements that are not ascertained in any list. This means that $N_x$ is now unknown, and the definition of the likelihood is not as immediate as before. We then add the assumption that, for any $x$, the count $N_x \sim \mathrm{Poisson}(\lambda_x)$. If we let $\mathbf{n}^x_{\mathrm{obs}}$ be the contingency table without the unobserved cell and $n^x_{\mathrm{obs}}$ the corresponding count of all the observed cases, then the likelihood of the observed table is

$$L(\mathbf{n}^x_{\mathrm{obs}}) = \sum_{k=0}^{\infty} L(\mathbf{n}^x_k \mid N_x = n^x_{\mathrm{obs}} + k) \, P(N_x = n^x_{\mathrm{obs}} + k), \tag{6}$$

where $\mathbf{n}^x_k$ is a complete contingency table with $k$ observations in the bottom-right cell, hence distributed according to a multinomial distribution with probabilities given in (2) or (3), and $N_x$ is Poisson with mean $\lambda_x$. The likelihood of the two incomplete contingency tables is, again using Assumption 2.1,

$$L(\mathbf{n}^1_{\mathrm{obs}}, \mathbf{n}^0_{\mathrm{obs}}) = L(\mathbf{n}^1_{\mathrm{obs}}) \, L(\mathbf{n}^0_{\mathrm{obs}}). \tag{7}$$
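For computation it helps to note that the Poisson assumption on $N_x$ makes the infinite sum unnecessary: by Poisson thinning, the cell counts are independent Poisson variables with means $\lambda_x p^x_{i_1 \dots i_J}$, and the unobserved all-zero cell simply marginalizes out of the product. A minimal sketch of the resulting negative log-likelihood, with the cell probabilities supplied by the caller:

```python
import math

def obs_neg_log_lik(counts, probs, lam):
    """Negative log-likelihood of an incomplete contingency table.

    counts, probs: dicts keyed by the observed capture profiles (the
    all-zero cell is excluded); lam: Poisson mean of the total N_x.
    With N_x ~ Poisson(lam), each observed cell count is an independent
    Poisson(lam * p_cell) variable; the missing cell drops out.
    """
    nll = 0.0
    for cell, n in counts.items():
        mean = lam * probs[cell]
        nll -= n * math.log(mean) - mean - math.lgamma(n + 1)
    return nll
```

Minimizing this over the parameters that determine the cell probabilities and $\lambda_x$ gives the maximum likelihood estimates used throughout the paper.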
2.1 Dependence among lists

While the independence among the individuals is an important assumption, we can extend this model to include dependence among the lists. We consider the dynamic Rasch model (Verhelst and Glas, 1993), which eliminates the independence assumption among the lists with a simple modification of the probabilities in (1). Instead of having them fixed, we generate them sequentially based on the previous realizations of the ascertainment variables. These probabilities are the same for each individual, and are defined, for those that are exposed, as

$$P(y_{i1} = 1) = \frac{e^{\gamma + \beta_1}}{1 + e^{\gamma + \beta_1}}, \qquad P(y_{i2} = 1 \mid y_{i1}) = \frac{e^{\gamma + \beta_2 + \delta_{12} y_{i1}}}{1 + e^{\gamma + \beta_2 + \delta_{12} y_{i1}}}, \qquad P(y_{i3} = 1 \mid y_{i1}, y_{i2}) = \frac{e^{\gamma + \beta_3 + \delta_{13} y_{i1} + \delta_{23} y_{i2}}}{1 + e^{\gamma + \beta_3 + \delta_{13} y_{i1} + \delta_{23} y_{i2}}}, \tag{8}$$

and if the number of lists is greater we can keep generating $y_{i4}, y_{i5}, \dots$ based on the previous values of $y_{ij}$. Here we decided to consider only the two-list interactions, given by the parameters $\delta_{12}$, $\delta_{13}$ and $\delta_{23}$. It is also possible to use, as the number of lists grows, higher-order interactions. For unexposed individuals the procedure is analogous, just setting as before $\theta_0 = 0$ in place of $\gamma$. Then we have, for exposed individuals,

$$p^1_{i_1 i_2 i_3} = P(y_{i1} = i_1) \, P(y_{i2} = i_2 \mid y_{i1} = i_1) \, P(y_{i3} = i_3 \mid y_{i1} = i_1, y_{i2} = i_2), \tag{9}$$

and for the unexposed ones the same product evaluated with $\gamma = 0$:

$$p^0_{i_1 i_2 i_3} = P(y_{i1} = i_1) \, P(y_{i2} = i_2 \mid y_{i1} = i_1) \, P(y_{i3} = i_3 \mid y_{i1} = i_1, y_{i2} = i_2) \Big|_{\gamma = 0}. \tag{10}$$
The model for two incomplete and independent contingency tables is then the same as in (6) and (7), only with (9) and (10) instead of (2) and (3) in the multinomial probabilities.
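The sequential generation just described can be sketched for an arbitrary number of lists; the dictionary `delta` holds the pairwise interaction parameters, and all names and values below are illustrative:

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def dynamic_capture(theta, betas, delta, rng):
    """Sequential (dynamic Rasch) captures for one individual: the
    log-odds for list j gain delta[(k, j)] for every earlier list k
    that already captured the individual (two-list interactions only).
    Parameter names are illustrative, not the paper's exact notation."""
    y = []
    for j, beta in enumerate(betas):
        z = theta + beta + sum(delta.get((k, j), 0.0) * y[k] for k in range(j))
        y.append(int(rng.random() < sigmoid(z)))
    return tuple(y)

rng = random.Random(1)
delta = {(0, 1): -0.2, (0, 2): -0.2, (1, 2): -0.2}  # illustrative values
profile = dynamic_capture(1.0, [0.5, 0.5, 0.5], delta, rng)
```

Negative interaction values model trap avoidance (being captured once lowers later capture odds); positive values model social visibility.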

Remark 2.1.

Why is $\gamma$ responsible for differential ascertainment? Let's consider $\lambda_1$ and $\lambda_0$ fixed. We want to compute the ratio $\lambda_1 / \lambda_0$ in order to estimate the odds ratio, but since the cells $n^1_{000}$ and $n^0_{000}$ are unobserved we cannot have a precise count of cases. However, from (9) and (10) we know that the expected observed counts are $\lambda_1 (1 - p^1_{000})$ and $\lambda_0 (1 - p^0_{000})$. This tells us that the ratio of the expected observed counts equals $\lambda_1 / \lambda_0$ if and only if $p^1_{000} = p^0_{000}$, which is true only if $\gamma = 0$, thanks to the monotonicity of $p^1_{000}$ with respect to $\gamma$. The larger $|\gamma|$ is, the larger the difference will be between the ratio of the observed counts and the real ratio $\lambda_1 / \lambda_0$.
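The remark can be checked numerically. For simplicity the sketch below uses the independent-lists model (no interaction terms): the expected observed ratio is $\lambda_1 (1 - p^1_{000}) / \lambda_0 (1 - p^0_{000})$, and it equals $\lambda_1 / \lambda_0$ only at $\gamma = 0$.

```python
import math

def p_missed(theta, betas):
    """Probability of escaping every list (independent-lists model)."""
    return math.prod(1 / (1 + math.exp(theta + b)) for b in betas)

lam1, lam0 = 500.0, 1000.0   # true Poisson means, as in the simulation
betas = [0.5, 0.5, 0.5]

ratios = {}
for gamma in (-1.0, 0.0, 1.0):
    ratios[gamma] = (lam1 * (1 - p_missed(gamma, betas))) / (
        lam0 * (1 - p_missed(0.0, betas))
    )
# ratios[0.0] is exactly lam1 / lam0 = 0.5; the other two deviate from it,
# in opposite directions, mirroring the pattern in Table 2.
```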

2.2 The Disastrous Effect of Differential Ascertainment in Estimating OR

We present here a simulation that shows that the use of only the observed counts can lead to severe bias in the estimate of the odds ratio. We set $\lambda_1 = 500$ and $\lambda_0 = 1000$, and consider five possible values for the differential ascertainment parameter, $\gamma \in \{-1, -0.5, 0, 0.5, 1\}$. For each of these values we generate 200 times the two counts $n_1$ and $n_0$ from Poisson distributions with means $\lambda_1$ and $\lambda_0$ respectively. At every iteration, the ascertainment of the cases happens with the probabilities defined in Section 2.1, obviously setting $\gamma = 0$ for the unexposed individuals.

When we consider only one list, we let $\beta_1 = 0.5$. When there are three lists, we use $\beta_j = 0.5$ for the list capture strengths and $\delta = -0.2$ for the two-list interactions. When there are five lists, we again use only two-list interactions with parameter $\delta = -0.2$ together with capture strengths $\beta_j = 0.5$, and set to zero every higher-order interaction. The observed ratio of exposed over unexposed individuals is recorded in Table 2, together with a confidence interval.

The first immediate realization when looking at the table is that, when we created differential ascertainment by setting $\gamma \neq 0$, the observed ratio can be extremely biased. It is also clear that the bias increases with $|\gamma|$ in each direction, and that it is smaller the more lists we use. This is because, when $J$ grows, even if the individual ascertainment probabilities for each list are small, the chance that at least one list will capture an individual grows, so the count of individuals that are missed by all the lists gets very low.

This analysis tells us that, in a situation where we don’t have a large number of different sources of information available, and we are unsure about the presence of differential ascertainment, it can be very risky to just trust the observed ratio. When only one source of information is available, there is unfortunately not much we can do. But when we are dealing with multiple lists, we can use the model described before to estimate the mean counts and test for the presence of differential ascertainment, as explained in Section 3.

γ        -1                      -0.5              0                 0.5               1
J = 1    0.733 (0.642, 0.829)    (0.521, 0.660)    (0.437, 0.559)    (0.394, 0.507)    (0.361, 0.466)
J = 3    0.625 (0.558, 0.702)    (0.484, 0.606)    (0.449, 0.559)    (0.434, 0.537)    (0.424, 0.531)
J = 5    0.548 (0.489, 0.611)    (0.461, 0.570)    (0.446, 0.558)    (0.444, 0.551)    (0.444, 0.551)
Table 2: The point estimate and confidence interval for the observed ratio of exposed over unexposed cases when using different numbers of lists $J$. We created differential ascertainment through the parameter $\gamma$.

2.2.1 How the model performs in this situation

When only one list is available, there is not much we can do to test whether we are in the presence of differential ascertainment, and when we have five lists we just saw that the estimated ratios are not very far from the truth. Here we want to apply our model to data generated with three lists, with the same parameters as in Section 2.2, with $\gamma$ taking values in $\{-1, 1\}$. In Table 3 we see the average estimates for all the parameters involved out of 200 simulations, together with a confidence interval.

Notice that, since the confidence intervals for $\gamma$ do not cover $0$, this intuitively tells us that the differential ascertainment is significant (and we know it is, since we created it). However, these estimates come from 200 observations of different populations generated with the same parameters, and when dealing with real data we don't have access to that. In Section 3 we formalize a testing procedure based on the three-sided test (Goeman et al., 2010) to test the significance of $\gamma$ and its sign. It is also important to note that the estimates for $\lambda_1$ and $\lambda_0$ are very close to the true values, confirming that the method recovers the true underlying number of cases in both exposed and unexposed groups even when strong differential ascertainment is present.

Estimates       γ = -1                   γ = 1
β1              0.502 (0.36, 0.64)       0.500 (0.35, 0.61)
β2              0.507 (0.26, 0.72)       0.506 (0.23, 0.74)
β3              0.522 (0.21, 0.85)       0.495 (0.20, 0.79)
δ12             -0.207 (-0.48, 0.11)     -0.208 (-0.47, 0.09)
δ13             -0.229 (-0.53, 0.08)     -0.194 (-0.44, 0.04)
δ23             -0.212 (-0.45, -0.02)    -0.212 (-0.46, 0.05)
λ1              499 (436, 554)           501 (458, 548)
λ0              1000 (934, 1075)         1004 (942, 1063)
γ               -0.996 (-1.21, -0.82)    1.009 (0.84, 1.15)
Table 3: The point estimate and confidence interval for each parameter of the model, when the differential ascertainment parameter is set to $\gamma = -1$ or $\gamma = 1$. We used $\beta_j = 0.5$ for any $j$ and $\delta = -0.2$. Each population is created 200 times and ascertained with 3 lists.

3 Three-sided test

We want to test whether there is evidence that $\gamma$ is greater than, smaller than or equal to $0$. We could simply test $H_0: \gamma = 0$, but this null hypothesis, if not rejected, would not be enough to provide evidence of equivalence. Instead, we use a three-sided test (Goeman et al., 2010), which considers an equivalence region $[-\epsilon, \epsilon]$ for the parameter and the three different hypotheses

$$H_{-1}: \gamma < -\epsilon, \qquad H_0: -\epsilon \le \gamma \le \epsilon, \qquad H_{+1}: \gamma > \epsilon.$$

By the partitioning principle, thanks to the non-overlapping of the hypotheses, all three tests can be performed separately at the desired level $\alpha$, while maintaining the family-wise error rate at level $\alpha$. Let's define by $q_\alpha$ the level-$\alpha$ quantile of the empirical distribution of $\hat\gamma$ under the null hypothesis $\gamma = 0$, which means that $P(\hat\gamma \le q_\alpha \mid \gamma = 0) = \alpha$. The three-sided testing procedure is defined as:

  • Reject $H_{-1}$ if $\hat\gamma \ge q_{1-\alpha} - \epsilon$.

  • Reject $H_{+1}$ if $\hat\gamma \le q_\alpha + \epsilon$.

  • Reject $H_0$ if either $\hat\gamma < q_\alpha - \epsilon$ or $\hat\gamma > q_{1-\alpha} + \epsilon$.

Notice that if $H_0$ is rejected then we also necessarily reject one of $H_{-1}$ and $H_{+1}$. Another possibility is that $H_0$ is accepted, in which case $H_{-1}$ and $H_{+1}$ might both be rejected, only one rejected or neither rejected; the third scenario happens when $q_\alpha + \epsilon < \hat\gamma < q_{1-\alpha} - \epsilon$, which is only possible if $2\epsilon < q_{1-\alpha} - q_\alpha$.
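The three decision rules can be packaged into a small function. The inequalities below are our reconstruction of the quantile-shift logic (shifting the null distribution of $\hat\gamma$ by $\pm\epsilon$), so treat this as a sketch rather than the exact published procedure:

```python
def three_sided(gamma_hat, q_lo, q_hi, eps):
    """Three-sided decisions from the alpha and (1 - alpha) quantiles
    (q_lo, q_hi) of gamma_hat's empirical null (gamma = 0) distribution.
    Each key records whether the corresponding hypothesis is rejected."""
    return {
        "inferiority": gamma_hat >= q_hi - eps,   # reject H_-1: gamma < -eps
        "superiority": gamma_hat <= q_lo + eps,   # reject H_+1: gamma >  eps
        "equivalence": gamma_hat < q_lo - eps or gamma_hat > q_hi + eps,
    }
```

Note the partitioning-principle consistency: whenever equivalence is rejected, one of the two one-sided hypotheses is necessarily rejected as well.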

To obtain the empirical distribution of $\hat\gamma$ under the null hypothesis $\gamma = 0$, we use the following procedure.

Procedure to find the empirical distribution of $\hat\gamma$ under the null $\gamma = 0$:

  • Estimate the parameters $\hat\lambda_x$, $\hat\beta_j$ and $\hat\delta$ by fitting model (7) to the real data, using the probabilities in (10) for both exposed and unexposed cases (i.e., imposing $\gamma = 0$).

  • Generate populations according to the estimated parameters, i.e., for any $x$ draw $N_x \sim \mathrm{Poisson}(\hat\lambda_x)$ and allocate individuals to the cells with the probabilities given in (10), which are used also for the exposed cases.

  • Create the incomplete tables by eliminating from each complete table the cell corresponding to the unobserved individuals.

  • Fit (7) on the two incomplete tables, this time using probabilities (9) and (10) for exposed and unexposed cases respectively, and store the resulting $\hat\gamma$.

This empirical distribution is used to compute the relevant quantiles at the desired level $\alpha$. Fitting model (7) to the real data, this time with the probabilities in (9) and (10), gives us a point estimate of $\gamma$ that is used in the three-sided test with the chosen tolerance $\epsilon$.

4 Description of the NVDRS Dataset

We applied the method just described to a dataset obtained from the National Violent Death Reporting System covering child maltreatment deaths that happened from 2010 to 2015. The dataset consists of 1983 deaths of children from ages 0 to 10 years old and 62 variables that describe the deaths. Among these variables, there is one that describes the police report, one for the coroner report and others linked to the death certificate, for example the code for the cause of death, the manner of death and some descriptive variables. From these variables, we constructed 3 separate lists, each of them trying to count the total number of deaths caused by child maltreatment. For the police report (LE) and coroner report (CME) there are in the dataset two variables, called LE_DeathAbuse and CME_DeathAbuse, that define which deaths were considered to be caused by child maltreatment according to the police report and coroner's report respectively. Unfortunately, these variables don't capture all of the cases, so further analysis is needed. Below are the details of how the cases were defined in the three lists.

  • Law enforcement (LE) list. We started by considering as cases the individuals for which LE_DeathAbuse = 1. Moreover, we also read the narratives of the victims that were initially coded as controls, to account for the incompleteness of the variable LE_DeathAbuse. Here we followed the definition of abuse contained in Leeb et al. (2008), and in particular we focused on the following aspects:

    • The perpetrator has to be a caregiver for the victim, i.e., a person that at the time of the maltreatment is in a permanent or temporary custodial role.

    • When considering acts of commission (Child Abuse), the action should be intentional, but the harm to the child need not be the intended consequence. Acts of omission (Child Neglect) are rarer because the NVDRS tends not to capture them, but in this case too the harm to the child need not be the intended consequence.

    Initially there were 495 cases captured by the variable LE_DeathAbuse regarding individuals of race black or white. We manually added another 189, bringing the total number to 684. Of these, 370 are white and 314 are black.

  • Coroner/medical examiner (CME) list. We used the same rule as for the police narrative, initially using the variable CME_DeathAbuse and then reading the remaining narratives.

    Here we initially had 534 cases captured by the variable CME_DeathAbuse, and we manually added another 210. Of the 744 total cases, 411 are white and 333 are black.

  • Death certificate (DC) list. For creating this list, we didn't have access to a full narrative, but only to a group of variables. Following a procedure similar to the one described in Klevens and Leeb (2010), we coded a death as being caused by child maltreatment if (1) the ICD-10 code was Y06 or Y07 (where Y06 refers to neglect and abandonment and Y07 to other maltreatment syndromes); (2) the ICD-10 code was one of assault, i.e., X85–X99, Y00–Y04, Y08 or Y09, and the variable DeathCause contained one of the following words: abandon, head trauma, shaken, abusive or abuse; or (3) the ICD-10 code was one of assault and the suspect was the caregiver.

    This list captured 298 white cases and 237 black cases, for a total of 535.
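The three rules for the death-certificate list can be written as a predicate. The field names icd10, death_cause and suspect_is_caregiver are hypothetical stand-ins for the actual NVDRS variables:

```python
# ICD-10 assault codes: X85-X99, Y00-Y04, Y08, Y09
ASSAULT = (
    {f"X{c}" for c in range(85, 100)}
    | {f"Y0{c}" for c in range(5)}
    | {"Y08", "Y09"}
)
KEYWORDS = ("abandon", "head trauma", "shaken", "abusive", "abuse")

def dc_maltreatment(icd10, death_cause, suspect_is_caregiver):
    """Sketch of the three coding rules for the DC list."""
    if icd10 in {"Y06", "Y07"}:          # (1) neglect/abandonment, other maltreatment
        return True
    if icd10 in ASSAULT:
        text = (death_cause or "").lower()
        if any(word in text for word in KEYWORDS):   # (2) assault + keyword
            return True
        if suspect_is_caregiver:                     # (3) assault + caregiver suspect
            return True
    return False
```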

5 Analysis of the NVDRS Data

The total number of cases observed is 921. As anticipated, we define the exposure based on the variable race, and consider exposed the cases where the victim is white and unexposed the cases where the victim is black. We want to test whether there is any evidence of differential ascertainment, i.e., whether the parameter $\gamma$ is significantly different from $0$, or whether we can say with confidence that $|\gamma| \le \epsilon$, where $\epsilon$ is the tolerance. If $\epsilon$ is small, and if we reject $H_{-1}$ and $H_{+1}$, then the use of the observed ratio instead of the estimated one will not have a big impact on our analysis. Table 4 is the contingency table for the observed cases, divided into whites and blacks.

Table 4: Contingency table of exposed (white) and unexposed (black) cases. The cell at the bottom right is missing because it corresponds to the count of individuals that are not ascertained in any of the three lists.

We fit the incomplete model (7) to these data using the probabilities in (9) and (10). The order of the lists is the one presented in Table 1, so LE represents list 1, CME list 2 and DC list 3. With this notation, we get the parameter estimates in Table 5.

β1 = -0.099   β2 = 0.059   β3 = -0.961   δ12 = 0.830   δ13 = 1.062   δ23 = 2.224   λ1 = 626   λ0 = 506   γ = -0.020
Table 5: MLE estimates of the parameters in the model, when fit on the incomplete contingency tables in Table 4 generated with the NVDRS data.

The estimated values $\hat\lambda_1 = 626$ and $\hat\lambda_0 = 506$ are the estimated mean total numbers of white and black cases, observed and unobserved; the average counts in the missing cell are the differences between these values and the observed totals. Moreover, our estimate of the true underlying ratio of exposed and unexposed cases is $\hat\lambda_1 / \hat\lambda_0 = 626 / 506 \approx 1.24$, which is extremely close to the observed one. This is consistent with the fact that $\hat\gamma = -0.020$ is very close to $0$ compared to the other estimated parameters. This observation alone is not sufficient, but we can perform a three-sided test to check the significance of this result.

5.1 Three-Sided Test

We first want to find the empirical distribution of $\hat\gamma$ under the null hypothesis $\gamma = 0$. To do so, we need to fit the model again on the NVDRS data, this time using the probabilities in (10) for both exposed and unexposed cases. The estimates are reported in Table 6.

β1 = -0.110   β2 = 0.048   β3 = -0.972   δ12 = 0.830   δ13 = 1.061   δ23 = 2.225   λ1 = 625   λ0 = 508
Table 6: MLE estimates of the parameters in the model when we impose $\gamma = 0$. These estimates are very similar to the ones in Table 5.

Using those values, we generate new populations, each time removing the count corresponding to the cells $n^x_{000}$. Then we fit the incomplete model and obtain the empirical distribution of $\hat\gamma$, which is reported in Figure 1, together with some of the quantiles and the observed value $\hat\gamma = -0.020$.

Figure 1: Histogram of $\hat\gamma$ under the null $\gamma = 0$. The black line is the observed value $\hat\gamma = -0.020$ and the other vertical lines are the relevant quantiles.

Setting the level $\alpha$, the relevant quantiles of the empirical distribution are $q_\alpha$ and $q_{1-\alpha}$. We immediately notice that, since $q_\alpha < \hat\gamma < q_{1-\alpha}$, there is no value of $\epsilon$ for which we will reject $H_0$. Moreover, we define $\epsilon_1 = \min(\hat\gamma - q_\alpha, \, q_{1-\alpha} - \hat\gamma)$ and $\epsilon_2 = \max(\hat\gamma - q_\alpha, \, q_{1-\alpha} - \hat\gamma)$ and notice that we have three possible choices:

  • If we set $\epsilon < \epsilon_1$, then no hypothesis is rejected.

  • If $\epsilon_1 \le \epsilon < \epsilon_2$, then we can reject exactly one of $H_{-1}$ and $H_{+1}$ but not the other. This means that we are confident that $\gamma$ does not lie beyond the equivalence region on that side.

  • If $\epsilon \ge \epsilon_2$, then we reject both $H_{-1}$ and $H_{+1}$, so we are confident that only $H_0$ holds and that the real and observed ratios are not further apart than the amount implied by $\epsilon$.
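Under the same reconstructed rejection rules, the two thresholds $\epsilon_1$ and $\epsilon_2$ follow directly from $\hat\gamma$ and the null quantiles; a sketch (the quantile values below are placeholders, not the actual NVDRS results):

```python
def eps_thresholds(gamma_hat, q_lo, q_hi):
    """Smallest tolerances at which superiority / inferiority are rejected:
    gamma > eps is rejected when gamma_hat <= q_lo + eps, and
    gamma < -eps is rejected when gamma_hat >= q_hi - eps."""
    need_sup = gamma_hat - q_lo   # eps needed to reject gamma >  eps
    need_inf = q_hi - gamma_hat   # eps needed to reject gamma < -eps
    eps1, eps2 = sorted((max(need_sup, 0.0), max(need_inf, 0.0)))
    return eps1, eps2

# e.g., with gamma_hat = -0.020 and placeholder quantiles (-0.25, 0.25):
eps1, eps2 = eps_thresholds(-0.020, -0.25, 0.25)
```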

        γ = $\hat\gamma$    γ = $-\epsilon_2$    γ = $\epsilon_2$
        0.301               0.252                0.365
        0.030               0.030                0.029
        0.072               0.067                0.076
        0.209               0.198                0.218
        0.067               0.075                0.056
        0.060               0.069                0.050
        0.071               0.073                0.066
        0.189               0.235                0.140
Table 7: The probabilities for the exposed cases of falling in each cell of the contingency table, when $\gamma$ is either the estimated value $\hat\gamma$ or $\pm\epsilon_2$, where $\epsilon_2$ is the smallest value of $\epsilon$ for which we reject both superiority and inferiority.

The main takeaway of this analysis is that, if we are fine with having an indifference region $[-\epsilon_2, \epsilon_2]$, then we are confident that the real value of $\gamma$ lies in such indifference region. Moreover, there is no value of $\epsilon$ that will make us reject the null hypothesis that $|\gamma| \le \epsilon$.

Interpreting the magnitude of $\gamma$ is not intuitive. Is a value around $\epsilon_2$ too big, or is it acceptable in terms of differential ascertainment? In Table 7 we observe the probabilities for the exposed cases of falling in each cell of the contingency table, when $\gamma$ takes three possible values: the estimated $\hat\gamma$ and the extreme values $\pm\epsilon_2$.

6 Conclusion

In this work, we developed a method to test for the presence of differential ascertainment when data are collected in multiple lists. We showed through a simulation the dangerous effect that differential ascertainment can have on estimating the ratio of exposed over unexposed cases in observational studies, and used the proposed method to detect when it is present and correct its effect.

We applied it to a dataset obtained from the National Violent Death Reporting System, where the cases of child maltreatment that led to the death of children from 0 to 10 years old are recorded. When considering race as the exposure, we showed that the estimated ratio of white to black cases is very close to the observed one, and used three-sided testing to provide a guarantee on the significance of this similarity.

The proposed method can also be applied to datasets with different sources that try to estimate the total number of casualties during a war. Since the total number of victims is usually impossible to get, one can use capture-recapture to estimate the total number. However, the presence of differential ascertainment can create complications in obtaining a precise estimate, and it is therefore advisable to first check if there is significant evidence of differential ascertainment.


The authors are grateful to Dr. Joanne Klevens for useful discussions and suggestions on the NVDRS data.


  • Ball et al. (2002) Ball, P., Betts, W., Scheuren, F., Dudukovich, J., and Asher, J. (2002). Killings and refugee flow in Kosovo, March–June 1999. Report to the International Criminal Tribunal for the former Yugoslavia.
  • Bishop et al. (2007) Bishop, Y. M., Fienberg, S. E., and Holland, P. W. (2007). Discrete multivariate analysis: theory and practice. Springer Science & Business Media.
  • Blair et al. (2016) Blair, J. M., Fowler, K. A., Jack, S. P., and Crosby, A. E. (2016). The National Violent Death Reporting System: overview and future directions. Injury Prevention, 22(Suppl 1):i6–i11.
  • Domingo-Salvany et al. (1995) Domingo-Salvany, A., Hartnoll, R. L., Maguire, A., Suelves, J. M., and Anto, J. (1995). Use of capture-recapture to estimate the prevalence of opiate addiction in Barcelona, Spain, 1989. American Journal of Epidemiology, 141(6):567–574.
  • Goeman et al. (2010) Goeman, J. J., Solari, A., and Stijnen, T. (2010). Three-sided hypothesis testing: Simultaneous testing of superiority, equivalence and inferiority. Statistics in Medicine, 29(20):2117–2125.
  • Herman-Giddens et al. (1999) Herman-Giddens, M. E., Brown, G., Verbiest, S., Carlson, P. J., Hooten, E. G., Howell, E., and Butts, J. D. (1999). Underascertainment of child abuse mortality in the United States. JAMA, 282(5):463–467.
  • International Working Group for Disease Monitoring and Forecasting (1995) International Working Group for Disease Monitoring and Forecasting (1995). Capture-recapture and multiple-record systems estimation i: History and theoretical development. American Journal of Epidemiology, 142(10):1047–1058.
  • Keogh and Cox (2014) Keogh, R. H. and Cox, D. R. (2014). Case-control studies, volume 4. Cambridge University Press.
  • Klevens and Leeb (2010) Klevens, J. and Leeb, R. T. (2010). Child maltreatment fatalities in children under 5: Findings from the National Violent Death Reporting System. Child Abuse & Neglect, 34(4):262–266.
  • Leeb et al. (2008) Leeb, R., Paulozzi, L., Melanson, C., Simon, T., and Arias, I. (2008). Child maltreatment surveillance: Uniform definitions for public health and recommended data elements. Technical report, Centers for Disease Control and Prevention (CDC).
  • Rasch (1960) Rasch, G. (1960). Studies in mathematical psychology: I. Probabilistic models for some intelligence and attainment tests.
  • Verhelst and Glas (1993) Verhelst, N. D. and Glas, C. A. (1993). A dynamic generalization of the rasch model. Psychometrika, 58(3):395–415.
  • White (1982) White, G. C. (1982). Capture-recapture and removal methods for sampling closed populations. Los Alamos National Laboratory.

Appendix A Capture-Recapture

Capture-recapture is a method commonly used to estimate the size of an animal population in a given region, when a direct count is impossible. For a full discussion, see White (1982). First, a certain number of animals is captured, marked and released. After some time, when it is reasonable to assume that the marked animals are well mixed with the others, another portion of the population is captured and the number of marked animals in it is counted. From this simple procedure, one obtains an estimate of the total population with the so-called Lincoln-Petersen estimator

$\hat{N} = (m + a)(m + b)/m.$

Here $m$ is the number of animals marked first and then recaptured, while $a$ and $b$ are respectively the numbers of animals captured only the first and only the second time. This idea can be extended to more than two phases, but for our purposes we drop the temporal interpretation and instead use different lists that all attempt to count the total number of elements in the population. As in Bishop et al. (2007), we consider a log-linear model to extend the two-list case. When we are dealing with three lists, for example, we can write

$\log m_{ijk} = u + u_1(i) + u_2(j) + u_3(k) + u_{12}(ij) + u_{13}(ik) + u_{23}(jk),$

where $m_{ijk}$ is the value in a cell of the contingency Table 1. This model is called saturated, since all possible interaction parameters are included (note that the three-list interaction term is excluded to guarantee identifiability, since only 7 values are observed). If any of the coefficients of the model is set in advance to zero, we speak of a reduced model. The estimate of the total population, in any dimension , is where is the number of observed individuals and

We define to be the product of all the maximum likelihood estimates with equal to an odd number, and analogously for .
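As a concrete illustration of the two-list case described at the start of this appendix, the Lincoln-Petersen estimator can be computed as follows; the counts are hypothetical and chosen only for the example:

```python
def lincoln_petersen(m, a, b):
    """Lincoln-Petersen estimate of the total population size.
    m: animals captured on both occasions (marked, then recaptured);
    a: animals captured only on the first occasion;
    b: animals captured only on the second occasion."""
    n1 = m + a  # total first-occasion captures
    n2 = m + b  # total second-occasion captures
    return n1 * n2 / m

# hypothetical counts: 20 recaptured, 80 only first, 60 only second
print(lincoln_petersen(20, 80, 60))  # → 400.0
```

The estimate grows when the overlap $m$ between the two captures is small relative to the list sizes, reflecting that a small overlap suggests a large unobserved population.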

In this framework, one can perform model selection in various ways, for example using Akaike's Information Criterion (AIC), the Bayesian Information Criterion (BIC), or the Pearson chi-square statistic

$X^2 = \sum_{i=1}^{k} (O_i - E_i)^2 / E_i,$

where, with notation similar to but not exactly the same as before, $O_i$ and $E_i$ are respectively the observed and estimated counts in cell $i$, and $k$ is the total number of observed cells in the three-list case. We decided to use the method proposed in Ball et al. (2002), where an upper and a lower bound are set on the p-values of the observed Pearson chi-square statistic. Since we have only a few models to consider, our upper bound simply consists of excluding the saturated model, which fits the data perfectly and has a p-value of . With a lower bound of , the only available model for both exposed and unexposed cases is the one that sets to zero the coefficient of the interaction between the lists LE and DC. Using the values of estimated from this model, the number of unobserved exposed cases is estimated to be , while that of unobserved unexposed cases is . We can then take the total counts of cases to be exposed and unexposed, which will be considered fixed. The only source of randomness comes from the multinomial distributions of these cases in the two contingency tables. Here, our estimate of the true underlying ratio of exposed to unexposed cases is , again very close to, though slightly larger than, the observed ratio and the one estimated in Section 5. The estimates that we get from this model are reported in Table 8.

0.023 0.315 -0.667 0.584 0.874 2.012 -0.037
Table 8: MLE estimates of the parameters in the model. We used capture-recapture with model selection to estimate the count of the missing cells.
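The model-selection screen described above (excluding the saturated model and requiring the chi-square goodness-of-fit p-value to exceed a lower bound) can be sketched as follows; the cell counts and fitted values are hypothetical:

```python
import numpy as np
from scipy.stats import chi2

def pearson_x2(observed, fitted):
    """Pearson chi-square statistic over the observed cells."""
    observed = np.asarray(observed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    return float(np.sum((observed - fitted) ** 2 / fitted))

def passes_screen(observed, fitted, df, lower=0.05):
    """Lower-bound screen in the spirit of Ball et al. (2002): keep a
    reduced model only if its goodness-of-fit p-value exceeds `lower`
    (the saturated model, which fits perfectly, is excluded upstream)."""
    p = chi2.sf(pearson_x2(observed, fitted), df)
    return bool(p > lower)

# hypothetical counts for the 7 observed cells of a three-list table
obs = [30, 12, 8, 25, 10, 9, 6]
fit = [28.5, 13.2, 7.1, 26.0, 9.8, 9.4, 6.0]
print(passes_screen(obs, fit, df=1))  # → True
```

A reduced model that fails the screen would be discarded, and the search would move to the next candidate model.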

We notice that, as before, the estimated value of is negative and small in magnitude. We now need to find its empirical distribution under to perform a three-sided test.

If we had decided to consider only the saturated model, without any model selection, we would have obtained an estimated count of 126 for the unobserved exposed cases and of 84 for the unobserved unexposed ones. The numbers in the three-sided test would be slightly different, but not the overall results.

A.1 Three-Sided Test

We start by fitting the model with on the complete tables, with the numbers and in the bottom-right cells of the exposed and unexposed contingency tables respectively. The estimates we obtained are reported in Table 9.

0.002 0.294 -0.688 0.584 0.873 2.013
Table 9: MLE estimates of the parameters in the model when we imposed . We used capture-recapture with model selection to estimate the count of the missing cells.

Using them, we generate new populations from two multinomial distributions with probabilities given by (9) or (10). We then fit the complete model on these populations and obtain the empirical distribution of , which is plotted in Figure 2 together with the relevant quantiles. For , these are and . We set and . Similarly to before, we again have three options:

  • If we set , then no hypothesis is rejected.

  • If , then but , so we can reject but not . This means that we are confident that .

  • If , then we reject both and , so we are confident that only holds and that the real and observed ratios are no further apart than .
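The three-option decision rule above can be sketched as follows, under the simplifying assumption that the simulated null draws of the statistic shift by the indifference margin delta at the edges of the indifference region; the draws and thresholds here are illustrative:

```python
import numpy as np

def three_sided(theta_obs, null_draws, delta, alpha=0.05):
    """Three-sided decision rule in the spirit of Goeman et al. (2010).
    null_draws are simulated values of the statistic under the point
    null; shifting them by +/- delta approximates its law at the edges
    of the indifference region (-delta, delta)."""
    q_lo = np.quantile(null_draws, alpha)
    q_hi = np.quantile(null_draws, 1.0 - alpha)
    reject_superiority = theta_obs < delta + q_lo    # evidence: theta <  delta
    reject_inferiority = theta_obs > -delta + q_hi   # evidence: theta > -delta
    if reject_superiority and reject_inferiority:
        return "equivalence: real and observed ratio within delta"
    if reject_inferiority:
        return "reject inferiority only"
    if reject_superiority:
        return "reject superiority only"
    return "no hypothesis rejected"

# toy demo: statistic simulated under the null
rng = np.random.default_rng(0)
draws = rng.normal(0.0, 1.0, size=10_000)
print(three_sided(0.0, draws, delta=3.0))   # large delta: equivalence
print(three_sided(0.0, draws, delta=0.1))   # tiny delta: nothing rejected
```

As delta grows, both one-sided rejections become easier, which is exactly the progression through the three bullet points above.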

Figure 2: Histogram of when using capture-recapture with model selection. The black line is the observed value , the other vertical lines are the relevant quantiles for .

This time the value of is a bit smaller than before. We still do not have a natural intuition for quantifying the differential ascertainment it can cause, so in Table 10 we compute the capture probabilities for each cell for exposed and unexposed cases, where can take the value or .

0.313 0.278 0.372
0.035 0.036 0.034
0.080 0.077 0.083
0.225 0.220 0.229
0.067 0.074 0.057
0.061 0.067 0.051
0.072 0.075 0.065
0.145 0.172 0.108
Table 10: On the left, the probabilities for the exposed cases when is either the estimated value or , where is the smallest value for which we reject superiority and inferiority. On the right, the probabilities for the unexposed cases. We are using the estimates obtained by fitting the model on the complete tables, after using capture-recapture with the best model selected.

Appendix B An Extension of the Model

As anticipated in Section 2, we can consider an extension of the model that takes into account individual differences in the propensity to be captured. Instead of simply setting and for any , , we allow these values to vary and assume they are normally distributed, and . Each value is generated independently, and the parameter now represents the amount of differential ascertainment.

We start by examining the situation in which the two contingency tables are complete. In our main example on NVDRS data, this means that we apply the capture-recapture method described in Appendix A, and add the values 85 and 63 in the empty cells of the exposed and unexposed contingency tables respectively. We can consider the total counts as fixed, and the likelihood of each contingency table is then:


Here is the multinomial likelihood where the total count is and the probabilities are given in (9) or (10). The density is the normal density with mean for the exposed, and for the unexposed cases. When fitting this likelihood to the NVDRS data, again using the independence Assumption 2.1, we get the estimates reported in Table 11.

0.023 0.315 -0.667 0.585 0.874 2.012 -0.038 0.0010
Table 11: MLE estimates of the parameters in the extended model. We used capture-recapture with model selection to estimate the count of the missing cells.
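The likelihood above, a multinomial likelihood averaged over normally distributed individual capture strengths, can be approximated by plain Monte Carlo. In the sketch below the cell-probability function is a hypothetical stand-in for the model's actual probabilities (equations (9) and (10)), and all counts are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def cell_probs(delta):
    # hypothetical stand-in for the cell probabilities of (9)/(10):
    # a softmax over base scores, shifted in alternating cells by delta
    base = np.array([0.3, -0.2, 0.1, 0.4, -0.1, 0.0, 0.2, -0.3])
    shift = np.array([1, 0, 1, 0, 1, 0, 1, 0]) * delta
    e = np.exp(base + shift)
    return e / e.sum()

def marginal_loglik(counts, mu, sigma, n_mc=2000):
    """Monte Carlo approximation of the marginal log-likelihood: the
    multinomial log-likelihood averaged over capture strengths
    delta ~ N(mu, sigma^2). The multinomial coefficient is omitted
    since it is constant in the parameters."""
    deltas = rng.normal(mu, sigma, size=n_mc)
    logliks = np.array([np.sum(counts * np.log(cell_probs(d)))
                        for d in deltas])
    m = logliks.max()  # log-mean-exp for numerical stability
    return float(m + np.log(np.mean(np.exp(logliks - m))))

counts = np.array([30, 4, 8, 22, 7, 6, 7, 16])  # hypothetical complete table
ll = marginal_loglik(counts, mu=-0.04, sigma=0.03)
```

Maximizing this approximation over the parameters would yield estimates of the kind reported in Table 11; with a very small fitted sigma, the marginal likelihood essentially collapses back to the fixed-strength model.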

We are reassured to see that the values in Table 11 are extremely similar to those in Table 8, and that is very small. This suggests that the original model, with and fixed, is appropriate for the data we are studying. To confirm this, we consider the natural extension of our original model, in which the contingency tables are incomplete and the individual capture strengths are normally distributed. We assume as before that . Starting from (2), and treating the complete contingency table as in (11), we have


Fitting this likelihood to our data, we get the estimates in Table 12.

-0.099 0.059 -0.961 0.830 1.062 2.224 626 506 -0.020 0.0004
Table 12: MLE estimates of the parameters in the extended model when using the Poisson assumption on the total count of cases.

Again, it appears that allowing variability in the individual capture strengths is not necessary, since the estimates are nearly identical to those in Table 5 and is very small.