Inference for a test-negative case-control study with added controls

05/14/2020 ∙ by Bikram Karmakar, et al. ∙ University of Florida

Test-negative designs with added controls have recently been proposed to study COVID-19. An individual who took a test for a disease is test-positive or test-negative according to whether they tested positive or negative. Adding a control group to a comparison of test-positives vs test-negatives is useful since the additional comparison of test-positives vs controls can have potential biases different from those of the first comparison. A Bonferroni correction ensures the necessary type-I error control when these two comparisons are done simultaneously. We propose two new methods for inference which have better interpretability and higher statistical power for these designs. These methods add a third comparison that is essentially independent of the first comparison, but our proposed second method often pays much less for these three comparisons than what a Bonferroni correction would pay for the two comparisons.

1 Familywise error rate control

Setup: Consider the following three null hypotheses: $H_i$, no difference in exposure between test-positives and test-negatives; $H_{ii}$, no difference in exposure between test-positives and controls; and $H_{iii}$, no difference in exposure between the tested individuals (test-positives or test-negatives) and controls. In the following, $p_i$, $p_{ii}$ and $p_{iii}$ denote the three p-values calculated for these hypotheses from the corresponding comparisons (i), (ii) and (iii).

In this setup a method provides level-$\alpha$ familywise error rate control if the probability of rejecting any true null hypothesis among the three null hypotheses is at most $\alpha$. In the following we let $R_S$ denote the event that at least one of the nulls among $\{H_s : s \in S\}$ is rejected, where $S \subseteq \{i, ii, iii\}$. We show here that the familywise error rate is controlled for both Method 1 and Method 2.

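Method 1. Note first that $H_{iii}$ is false if and only if at least one of $H_i$ or $H_{ii}$ is false.

Since Method 1 can reject $H_{iii}$ in step (2) only when both $H_i$ and $H_{ii}$ are rejected at step (1), we have $R_{\{iii\}} \subseteq R_{\{i\}} \cap R_{\{ii\}}$, hence $R_{\{i,ii,iii\}} = R_{\{i,ii\}}$.

To show familywise error rate control, consider now the different possibilities of the three hypotheses being true or false separately.

(a)  When all three hypotheses are true, the familywise error rate is
$$\Pr(R_{\{i,ii,iii\}}) = \Pr(R_{\{i,ii\}}) \le \alpha.$$

(b)  When $H_i$ is true but $H_{ii}$ is false, hence $H_{iii}$ is false, the familywise error rate is
$$\Pr(R_{\{i\}}) \le \alpha.$$

(c)  Finally, when $H_{ii}$ is true but $H_i$ is false, hence $H_{iii}$ is false, the familywise error rate is
$$\Pr(R_{\{ii\}}) \le \alpha.$$

Hence, the familywise error rate is always controlled.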

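Method 2.  First we expand the notation $R_S$ to denote the event that at least one of the nulls among $\{H_s : s \in S\}$ is rejected, where now $S \subseteq \{i, ii, iii, i \wedge iii\}$ and $H_{i \wedge iii}$ is the hypothesis that both $H_i$ and $H_{iii}$ are true. Thus, $H_{i \wedge iii}$ being false is the same as at least one of $H_i$ and $H_{iii}$ being false, and only when both $H_i$ and $H_{iii}$ are true is $H_{i \wedge iii}$ true.

Now we use the result that $p_i$ and $p_{iii}$ are essentially independent, so that $p_{i \wedge iii}$, Fisher's combination of these two p-values, is a valid p-value under $H_{i \wedge iii}$ (see footnote 1).

Footnote 1: Two analyses are essentially independent if the joint distribution of the p-values from these analyses is stochastically larger than the uniform distribution on the unit square. Here, comparisons (i) and (iii) are nearly independent since we can show that, when $H_i$ and $H_{iii}$ are true,
$$\Pr(p_i \le a,\ p_{iii} \le b) \le ab$$
for all $a, b \in [0, 1]$. With larger sample size this inequality becomes sharper, and asymptotically they are independent.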
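As a small illustration of Fisher's combination used here, the R sketch below combines two p-values by referring minus twice the log of their product to a chi-square distribution with 4 degrees of freedom. The helper name fisher_combine and the numeric inputs are assumptions made for illustration; they are not part of the original analysis.

```r
# Fisher's combination of two p-values (hypothetical helper name).
fisher_combine <- function(p1, p2) {
  stat <- -2 * log(p1 * p2)
  pchisq(stat, df = 4, lower.tail = FALSE)
}

# Illustrative inputs, not data from the study.
p_combined <- fisher_combine(0.05, 0.05)
print(p_combined)

# With 4 degrees of freedom the upper tail has the closed form
# q * (1 - log(q)) for q = p1 * p2, which the function should match.
closed_form <- 0.05 * 0.05 * (1 - log(0.05 * 0.05))
print(all.equal(p_combined, closed_form))
```

The closed form makes it easy to see that the combined p-value is always larger than the raw product of the two p-values.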

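Consider again the different combinations of the three hypotheses being true or false. We can reduce some effort in this enumeration by noting that $H_{iii}$ is false if and only if at least one of $H_i$ or $H_{ii}$ is false.

(a)  When all three of $H_i$, $H_{ii}$ and $H_{iii}$ are true, the familywise error rate is
$$\Pr(R_{\{i,ii,iii\}}) \le \Pr(p_{ii} < \alpha/2) + \Pr(p_{ii} \ge \alpha/2,\ p_{i \wedge iii} < \alpha/2) \le \alpha/2 + \alpha/2 = \alpha.$$

(b)  When $H_i$ is true but $H_{ii}$ is false, hence $H_{iii}$ is false, the familywise error rate is
$$\Pr(R_{\{i\}}) \le \Pr(p_i < \alpha) \le \alpha.$$

(c)  Finally, when $H_{ii}$ is true but $H_i$ is false, hence $H_{iii}$ is false, the familywise error rate is
$$\Pr(R_{\{ii\}}) \le \Pr(p_{ii} < \alpha) \le \alpha.$$

Hence, the familywise error rate is always controlled.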
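A quick Monte Carlo check can make case (a) concrete. The sketch below assumes, purely for illustration, that all three nulls are true and that the three p-values are independent and uniform; in real data comparisons (i) and (ii) share the test-positives, so this is a simplification. The helper name method2_any_rejection is hypothetical, and its steps follow Method 2 as coded in Section 3, with the p-values compared to lambda in step (3).

```r
# Vectorized check of whether Method 2 rejects at least one of the
# three hypotheses (hypothetical helper; inputs are vectors of p-values).
method2_any_rejection <- function(p_i, p_ii, p_iii, alpha = 0.05) {
  r_ii_step1 <- p_ii < alpha / 2                      # Step (1)
  lambda <- ifelse(r_ii_step1, alpha, alpha / 2)
  p_comb <- pchisq(-2 * log(p_i * p_iii), df = 4, lower.tail = FALSE)
  r_comb <- p_comb < lambda                           # Step (2)
  r_i <- r_comb & (p_i < lambda)                      # Step (3)
  r_iii <- r_comb & (p_iii < lambda)
  r_ii_step4 <- r_i & r_iii & (p_ii < alpha)          # Step (4)
  r_ii_step1 | r_i | r_iii | r_ii_step4
}

set.seed(1)
n_sim <- 100000
# Assumption: independent uniform p-values, i.e. all nulls true.
fwer_hat <- mean(method2_any_rejection(runif(n_sim), runif(n_sim), runif(n_sim)))
print(fwer_hat)
```

Under these assumptions the estimate should come out at or below 0.05, up to Monte Carlo error.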

2 Confidence sets for the magnitude of effects

Notation:  We can create confidence sets for the effects of the exposure using the methods discussed in the letter. Some new notation is needed. In the following a subscript $p$ is for test-positives, $n$ for test-negatives, and $c$ for the added controls. Also, $N$ with an appropriate subscript denotes the count of a particular group of individuals. For example, $N_{p,e}$ denotes the number of exposed test-positives and $N_{n,u}$ the number of unexposed test-negatives, and $N_{pn,e}$ is the number of exposed test-positives or test-negatives.

Data tables:  The collected data can be tabulated in three tables corresponding to the three comparisons (i), (ii) and (iii).

Comparison (i)              Exposed        Unexposed
Test-positive               $N_{p,e}$      $N_{p,u}$
Test-negative               $N_{n,e}$      $N_{n,u}$
Total                       $N_{pn,e}$     $N_{pn,u}$

Comparison (ii)             Exposed        Unexposed
Test-positive               $N_{p,e}$      $N_{p,u}$
Control                     $N_{c,e}$      $N_{c,u}$
Total                       $N_{pc,e}$     $N_{pc,u}$

Comparison (iii)            Exposed        Unexposed
Test-positive or negative   $N_{pn,e}$     $N_{pn,u}$
Control                     $N_{c,e}$      $N_{c,u}$
Total                       $N_{pnc,e}$    $N_{pnc,u}$

A p-value for any one of the three comparisons can be calculated from the corresponding table, e.g., using Fisher’s exact test. For example, $p_i$ is the p-value calculated from the 2-by-2 table for comparison (i) with the numbers $N_{p,e}$, $N_{p,u}$, $N_{n,e}$ and $N_{n,u}$.
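To illustrate the calculation, the R sketch below runs Fisher's exact test on a 2-by-2 table for comparison (i) and checks that the same p-value is obtained from individual-level exposure and status vectors. All counts and the object names tab_i, e_i and g_i are made up for illustration and are not data from the study.

```r
# Hypothetical counts for comparison (i); not data from the study.
tab_i <- matrix(c(12, 30,
                  25, 20),
                nrow = 2, byrow = TRUE,
                dimnames = list(status = c("test-positive", "test-negative"),
                                exposure = c("exposed", "unexposed")))

# p-value from the 2-by-2 table.
p_from_table <- fisher.test(tab_i)$p.value

# The same p-value from individual-level vectors, one entry per person.
g_i <- rep(c("test-positive", "test-negative"), times = c(12 + 30, 25 + 20))
e_i <- c(rep(c("exposed", "unexposed"), times = c(12, 30)),
         rep(c("exposed", "unexposed"), times = c(25, 20)))
p_from_vectors <- fisher.test(e_i, g_i)$p.value

print(c(p_from_table, p_from_vectors))
```

The two-sided p-value does not depend on which variable is placed in the rows, so the table form and the vector form agree.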

Effects of interest:  We have three effects of interest for the exposure: one between test-positives and test-negatives, one between test-positives and controls, and one between test-negatives and controls. We denote these effects as $\delta_i$, $\delta_{ii}$ and $\delta_{iii}$, which are defined below. These are called attributable effects.

The effect $\delta_i$ is a ratio. Its numerator is the number of individuals who became test-positive because of the exposure but would have been test-negative in its absence, minus the number of individuals who became test-negative because of the exposure but would have been test-positive in its absence. Its denominator is the number of exposed test-positives or test-negatives. Notice that $\delta_i$ is a number between -1 and 1, and $\delta_i = 0$ if the exposure did not move anyone from test-negative to test-positive or the reverse. If $\delta_i$ is positive, there are individuals for whom the exposure caused them to become test-positive. Similarly, if $\delta_i$ is negative, there are individuals for whom the exposure caused them to become test-negative. In summary, $\delta_i$ is the net effect of the exposure on becoming test-positive over test-negative for exposed tested individuals.
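The arithmetic of this definition can be written out in a few lines of R. The helper name attributable_effect and the numbers are illustrative assumptions; in practice the two movement counts are unobserved potential-outcome quantities, which is why the effect is inferred through tests rather than computed directly.

```r
# Net attributable effect for comparison (i) (hypothetical helper name).
# moved_to_positive: exposed individuals who are test-positive because of
#   the exposure but would have been test-negative without it.
# moved_to_negative: the reverse movement.
# n_exposed_tested: number of exposed test-positives or test-negatives.
attributable_effect <- function(moved_to_positive, moved_to_negative,
                                n_exposed_tested) {
  (moved_to_positive - moved_to_negative) / n_exposed_tested
}

# Illustrative numbers only: (8 - 2) / 40.
delta_example <- attributable_effect(8, 2, 40)
print(delta_example)
```

A positive value means that, on net, the exposure moved individuals toward testing positive.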

The second effect $\delta_{ii}$ is defined similarly. By our definition, $\delta_{ii}$ is the net effect of the exposure on becoming test-positive over control, relative to all exposed individuals who are either test-positive or control. We have $\delta_{ii} = 0$ if the exposure did not make any change in who became test-positive over control or the reverse.

Finally, we define a third attributable effect $\delta_{iii}$ in the same way to denote the net effect of the exposure on becoming test-negative over control, relative to all exposed individuals who are not test-positive.

A method that calculates p-values using the three tables above is testing the hypothesis of no effect of the exposure, that is, $\delta_i = 0$, $\delta_{ii} = 0$ and $\delta_{iii} = 0$.

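Confidence sets:  We construct confidence sets for the effects $\delta_i$, $\delta_{ii}$ and $\delta_{iii}$. To do this we have to explain how to test that $\delta_i = \delta_i^0$, $\delta_{ii} = \delta_{ii}^0$ and $\delta_{iii} = \delta_{iii}^0$, where $\delta_i^0$, $\delta_{ii}^0$ and $\delta_{iii}^0$ could be different from 0, rather than testing only the hypothesis of no effect of the exposure. When they are different from 0, we adjust the observed tables based on these effects to create tables of the potential outcomes under no exposure. Here $N_{p,e}$, $N_{n,e}$ and $N_{c,e}$ are the numbers of exposed test-positives, test-negatives and controls, a subscript $u$ marks the corresponding unexposed counts, and $N_{pn,e} = N_{p,e} + N_{n,e}$, $N_{pc,e} = N_{p,e} + N_{c,e}$ and $N_{nc,e} = N_{n,e} + N_{c,e}$.

Adjusted comparison (i)     Exposed                                                         Unexposed
Test-positive               $N_{p,e} - \delta_i^0 N_{pn,e}$                                 $N_{p,u}$
Test-negative               $N_{n,e} + \delta_i^0 N_{pn,e}$                                 $N_{n,u}$

Adjusted comparison (ii)    Exposed                                                         Unexposed
Test-positive               $N_{p,e} - \delta_{ii}^0 N_{pc,e}$                              $N_{p,u}$
Control                     $N_{c,e} + \delta_{ii}^0 N_{pc,e}$                              $N_{c,u}$

Adjusted comparison (iii)   Exposed                                                         Unexposed
Test-positive or negative   $N_{pn,e} - \delta_{ii}^0 N_{pc,e} - \delta_{iii}^0 N_{nc,e}$   $N_{pn,u}$
Control                     $N_{c,e} + \delta_{ii}^0 N_{pc,e} + \delta_{iii}^0 N_{nc,e}$    $N_{c,u}$

Using either Method 1 or Method 2 we could test these three tables at level $\alpha$. Either method will make decisions to reject or not reject these adjusted tables. Then we write $r_i$, $r_{ii}$ and $r_{iii}$ as binary variables, each a function of the hypothesized values, which are 1 or 0 according to whether comparison (i), (ii) or (iii) is rejected, respectively, based on these adjusted tables. Our confidence set is
$$\{(\delta_i^0, \delta_{ii}^0, \delta_{iii}^0) : r_i = 0,\ r_{ii} = 0,\ r_{iii} = 0\}.$$

Since either method performed at level $\alpha$ provides familywise error rate control at $\alpha$, this confidence set will have a minimal coverage of $1 - \alpha$ for both Method 1 and Method 2.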
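The test-inversion idea can be sketched in R for a single comparison. The sketch below is a simplification and not the full procedure: it handles only comparison (i), uses one Fisher's exact test in place of Method 1 or Method 2, searches a grid of hypothesized effects, and assumes the adjustment moves a rounded count of exposed individuals between the two rows, as read from the verbal definition of the effect. The helper name adjust_table and all counts are hypothetical.

```r
# Adjust a 2-by-2 table for a hypothesized attributable effect delta0.
# Rows: test-positive, test-negative; columns: exposed, unexposed.
# Assumption: the net number moved is rounded to a whole count.
adjust_table <- function(tab, delta0) {
  moved <- round(delta0 * sum(tab[, 1]))
  tab[1, 1] <- tab[1, 1] - moved   # exposed test-positives under no exposure
  tab[2, 1] <- tab[2, 1] + moved   # exposed test-negatives under no exposure
  tab
}

# Hypothetical counts; not data from the study.
tab_i <- matrix(c(12, 30,
                  25, 20), nrow = 2, byrow = TRUE)

alpha <- 0.05
grid <- seq(-1, 1, by = 0.01)
keep <- vapply(grid, function(d) {
  adj <- adjust_table(tab_i, d)
  if (any(adj < 0)) return(FALSE)      # infeasible hypothesized value
  fisher.test(adj)$p.value >= alpha    # retained if not rejected
}, logical(1))

conf_set <- grid[keep]
print(range(conf_set))
```

Hypothesized values that would require negative counts are treated as infeasible and excluded; the retained grid points approximate a confidence set for the effect in comparison (i) alone.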

3 R code to implement Method 2

Let p_i, p_ii and p_iii be variables in R that record the p-values from the three comparisons. They can be calculated using the syntax p_i <- fisher.test(e_i, g_i)$p.value, where e_i is a variable recording the exposure status and g_i is a variable recording the case status, both restricted to the test-positives and test-negatives. The pairs e_ii, g_ii and e_iii, g_iii play the same role in the following code for comparisons (ii) and (iii), respectively.

## Significance level for familywise error rate control
alpha <- 0.05
alpha.2 <- alpha/2

## p-values computed from the three comparisons
p_i <- fisher.test(e_i, g_i)$p.value
p_ii <- fisher.test(e_ii, g_ii)$p.value
p_iii <- fisher.test(e_iii, g_iii)$p.value

### Start of Method 2 ###
r_i <- r_ii <- r_iii <- 0   # rejection indicators: 1 = reject, 0 = do not reject
## Step (1): test comparison (ii) at level alpha/2
r_ii <- 1*(p_ii < alpha.2)
# level left for (i) and (iii): alpha if (ii) was rejected, alpha/2 otherwise
lambda <- ifelse(r_ii == 1, alpha, alpha - alpha.2)
## Step (2): test (i) and (iii) jointly
# Fisher’s combination
p_i_or_iii <- pchisq(-2*log(p_i*p_iii), 4, lower.tail=FALSE)
r_i_or_iii <- 1*(p_i_or_iii < lambda)
## Step (3): if the joint test rejects, test (i) and (iii) separately
if (r_i_or_iii == 1) {
     r_i <- 1*(p_i < lambda)
     r_iii <- 1*(p_iii < lambda)
}
## Step (4): if both (i) and (iii) are rejected, retest (ii) at level alpha
if (r_i == 1 && r_iii == 1)  r_ii <- 1*(p_ii < alpha)
### Final inference
c(r_i, r_ii, r_iii)
#### END OF CODE ####
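For experimenting with the procedure, the same steps can be wrapped in a function that takes the three p-values directly. This is a sketch under the assumption that step (3) compares the p-values p_i and p_iii with lambda; the function name method2 and the example p-values are hypothetical and are not results from any study.

```r
# Method 2 wrapped as a function of the three p-values (hypothetical name).
method2 <- function(p_i, p_ii, p_iii, alpha = 0.05) {
  r_i <- r_ii <- r_iii <- 0
  r_ii <- 1 * (p_ii < alpha / 2)                          # Step (1)
  lambda <- if (r_ii == 1) alpha else alpha / 2
  p_i_or_iii <- pchisq(-2 * log(p_i * p_iii), 4, lower.tail = FALSE)
  if (p_i_or_iii < lambda) {                              # Steps (2) and (3)
    r_i <- 1 * (p_i < lambda)
    r_iii <- 1 * (p_iii < lambda)
  }
  if (r_i == 1 && r_iii == 1) r_ii <- 1 * (p_ii < alpha)  # Step (4)
  c(r_i = r_i, r_ii = r_ii, r_iii = r_iii)
}

# Illustrative p-values only.
print(method2(p_i = 0.01, p_ii = 0.03, p_iii = 0.20))
print(method2(p_i = 0.01, p_ii = 0.03, p_iii = 0.02))
```

In the first call only comparison (i) is rejected. In the second call all three are rejected: comparison (ii) has a p-value of 0.03, which exceeds alpha/2, but it is retested at level alpha in step (4) once both (i) and (iii) have been rejected.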