1 Brief study overview
The SEARCH-IPT Study is a cluster randomized trial designed to evaluate whether a multicomponent intervention increases uptake of isoniazid (INH) preventive therapy (IPT) and reduces the incidence of tuberculosis (TB) in Uganda (ClinicalTrials.gov: NCT03315962). Details of the multiphase trial design and procedures can be found in the Study Protocol. Here, we provide the statistical analysis plan for health outcomes in Phase 1 of the study. Analysis plans for qualitative outcomes, cost-effectiveness outcomes, and business outcomes are available elsewhere. A history of changes to this analysis plan is given in the Appendix.
Study design: Phase 1 of the SEARCH-IPT Study aims to increase provider prescription and patient use of IPT among adults in HIV care in Uganda. Within the Eastern, East-Central, and Southwestern regions of Uganda, districts were first grouped into 14 clusters of 4–7 districts. To improve statistical power [1, 2], these groups were pair-matched on characteristics expected to be predictive of IPT uptake: region, number of adults in active HIV care, urbanicity, and presence of a SEARCH universal HIV “test and treat” trial site. Within each pair, the groups were then randomized to intervention or the standard-of-care. There are 7 groups in each arm. Formal power calculations are given in the Appendix.
Study arms: In districts randomized to the intervention, the groups are the mini-collaboratives in which the intervention package is delivered to District Health Officers (DHOs) and District TB/Leprosy Supervisors (DTLSs), mid-level health managers who oversee health service delivery (DHOs) and TB-specific activities (DTLSs) in Uganda. The SEARCH-IPT intervention package is detailed in the Study Protocol. In brief, the social network-based, behavioral change model consists of teaching mini-collaboratives with business leadership and management training; reporting, review, and discussion of IPT uptake at mini-collaborative meetings; and a two-way SMS system for DHOs/DTLSs and their front-line providers to help enable IPT uptake.
Districts randomized to the control receive no study interventions, beyond ensuring that the Ministry of Health IPT guidelines are disseminated.
Study timeline: As described elsewhere, there were many events unrelated to the study that are likely to impact health outcomes in Phase 1. Immediately prior to study launch (end of 2017), there was a massive shortage of IPT that lasted until the end of 2018. Then starting in the third quarter of 2019 (Q3-2019), the Ministry of Health led a country-wide program to increase IPT use with support from PEPFAR (hereafter called “the 100-day IPT push”). Additionally in 2019, the Ugandan government divided select districts into two new districts. Finally, the COVID-19 pandemic began in March 2020.
To account for drug shortages and answer the most relevant scientific question, the measurement period for the main study outcomes (IPT uptake and HIV-associated TB incidence) will begin at the quarter corresponding to the first-year meeting of the mini-collaboratives, which occurred in the intervention arm only. Specifically, follow-up begins in Q1-2019 in Southwestern districts and in Q2-2019 in the Eastern and East-Central districts. All districts are followed through the close of Phase 1, and analyses will be over two years (8 quarters) of follow-up. Data from the quarter immediately preceding the follow-up period (i.e., Q4-2018 in the Southwest and Q1-2019 in the East and East-Central) are assumed to provide baseline data.
Descriptive analyses: At baseline, we will describe the study population by region (East, East-Central, and Southwest) by providing summary measures of district-level characteristics, including but not limited to:

Number of districts

Number of participating DHOs/DTLSs

Number of adults in active HIV care at the district level

Number of adults in active HIV care in the two largest clinics in each district

HIV prevalence

Incidence of IPT uptake

Incidence of HIV-associated TB disease
We will also provide these summary measures by arm. We will additionally provide descriptions of secular events and non-study activities that are expected to impact study outcomes.
2 Primary outcome: IPT uptake
The primary endpoint is “IPT uptake”: the rate at which eligible HIV-positive adults receive an IPT prescription in health facilities overseen by DHOs participating in the SEARCH-IPT trial. This endpoint will be measured over 8 quarters (starting Q1-2019 in the Southwest and in Q2-2019 in the East and East-Central) among adults in active HIV care in two clinics in each district in the trial. These clinics were selected based on size, generally the biggest, and are assumed to be representative of the overall district. (This assumption of generalizability may or may not be reasonable in smaller and more rural districts.)
Data source: For each clinic, aggregate data on the total number of IPT starts and the total number of adults in active HIV care in each quarter will be extracted from the Ministry of Health HMIS database. Data on IPT initiation are available by sex, and data on adults in active care are available by sex and 1st/2nd/3rd line ARV regimen. Primary analyses will pool over sex and ARV regimen. Secondary analyses will be stratified on sex.
Missing data: There is no reason to believe data will be differentially captured by arm. Additionally, we are unable to differentiate between missing data on IPT prescriptions and simply 0 prescriptions for that clinic-quarter. Therefore, if data are missing on IPT prescriptions for a given clinic-quarter, we will assume there were 0 starts in that clinic-quarter; this will provide a lower bound on the total IPT starts for that clinic. However, if for a given clinic, data on IPT prescriptions are missing for all quarters, we will exclude that clinic.
Likewise, there is no way to differentiate between missing data on the number of patients on 2nd/3rd line ARV regimens and simply 0 patients on those lines. If data on the number of patients on a 2nd or 3rd line ARV regimen are missing, we will assume that 0 patients are on those lines for that clinic-quarter. We do not anticipate data on the active care size to be completely missing for a given clinic-quarter; if this happens, we will estimate the clinic-quarter active care size by averaging the size in the prior clinic-quarter with the size in the subsequent clinic-quarter.
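The missing-data rules above could be sketched as follows. This is a minimal illustration, not the study's code: the function names and the use of `None` for a missing value are assumptions, and edge cases the plan does not address (a missing first or last quarter, or consecutive missing quarters) are simply left unfilled.

```python
from typing import List, Optional


def impute_starts(starts: List[Optional[int]]) -> Optional[List[int]]:
    """Treat a missing quarterly count of IPT starts as 0.

    Returns None when every quarter is missing, signalling that the
    clinic would be excluded from the analysis.
    """
    if all(s is None for s in starts):
        return None
    return [0 if s is None else s for s in starts]


def impute_care_size(sizes: List[Optional[float]]) -> List[Optional[float]]:
    """Fill a missing active care size with the mean of its neighbours.

    Only interior quarters with both neighbours observed are filled;
    anything else stays None (an assumption of this sketch).
    """
    filled = list(sizes)
    for q, size in enumerate(sizes):
        if size is None and 0 < q < len(sizes) - 1:
            before, after = sizes[q - 1], sizes[q + 1]
            if before is not None and after is not None:
                filled[q] = (before + after) / 2
    return filled
```

For example, a clinic reporting starts in only some quarters keeps its observed counts and receives 0 elsewhere, which is why the resulting total is a lower bound.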
Incidence rate calculations: Because the number of adults in HIV care changes over time and the probability of starting IPT is not uniform over the 8 quarters of follow-up, we focus on estimation and inference of the incidence rate of IPT initiation, as opposed to the cumulative incidence. For each clinic, we will determine the total number of IPT starts and the total person-time-at-risk of IPT initiation (described below) over the two-year follow-up period. In primary analyses, we will calculate the district-level incidence rate as the total starts in a district (summed over the 2 clinics) divided by the total person-time-at-risk for that district (again summed over the 2 clinics). In secondary analyses, we will calculate the district-level incidence rate by averaging the clinic-specific incidence rates within each district.
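To make the difference between the primary (pooled) and secondary (averaged) district-level rates concrete, here is a small sketch. The function names and the example numbers are hypothetical; each clinic is represented as a `(starts, person_time)` pair.

```python
from typing import List, Tuple

Clinic = Tuple[float, float]  # (total IPT starts, total person-time-at-risk)


def pooled_rate(clinics: List[Clinic]) -> float:
    """Primary approach: total starts divided by total person-time."""
    total_starts = sum(starts for starts, _ in clinics)
    total_pt = sum(pt for _, pt in clinics)
    return total_starts / total_pt


def mean_of_clinic_rates(clinics: List[Clinic]) -> float:
    """Secondary approach: simple average of clinic-specific rates."""
    rates = [starts / pt for starts, pt in clinics]
    return sum(rates) / len(rates)


# Hypothetical district with one large and one small clinic.
district = [(100.0, 400.0), (50.0, 100.0)]
print(pooled_rate(district), mean_of_clinic_rates(district))
```

The pooled rate implicitly weights clinics by their person-time, while the averaged rate gives each clinic equal weight, so the two can differ noticeably when clinic sizes are unequal.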
Because data for each clinic-quarter are only available at the aggregate level, we will take the following approach to calculating the person-time-at-risk (PT). In line with standard practice [3], we will assume that IPT initiations occur uniformly in each clinic-quarter. In primary analyses, we will also assume that all who started IPT during follow-up remain in active HIV care for the remainder of follow-up. While the latter assumption is supported by prescribing practices (i.e., to patients who are stable in care), we will conduct two sensitivity analyses. In the first, we will assume that 97.5% of persons who previously started IPT remain in active care at the same clinic from quarter to quarter. In the second, we will assume that 0% of persons who previously started IPT remain in active care from quarter to quarter. The second sensitivity analysis is expected to be extremely conservative and will provide a lower bound on the estimated incidence rates. Altogether, PT for each clinic-quarter will be calculated as the quarter-specific number in active HIV care minus half the IPT starts in that quarter and X% of the previous IPT starts, where X%=100% in the primary analysis, 97.5% in the first sensitivity analysis, and 0% in the second sensitivity analysis. The resulting incidence rates will be converted from being in ‘clinic-quarters’ to ‘person-years’ for ease of interpretation.
In a secondary analysis, we will calculate the two-year cumulative incidence of IPT initiation as the total number of IPT starts divided by the average active care size over follow-up. As before, we will calculate the district-level cumulative incidence in two ways: (1) the total starts in a district (summed over the 2 clinics) divided by the total active care size for that district (again summed over the 2 clinics); and (2) by averaging the clinic-specific cumulative incidences within each district. We will also report total starts by arm.
All analyses will assume that all adults in active HIV care are eligible for IPT at the beginning of follow-up (Q1-2019 in the Southwest and Q2-2019 in the East/East-Central). By randomization, the number in active HIV care and the number who had previously started IPT should be balanced between arms.
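One literal reading of the person-time rule above can be sketched as follows. The function and argument names are hypothetical, and the sketch subtracts X% of the running total of earlier starts exactly as the summary sentence states; a compounding quarter-to-quarter retention model is another possible reading that is not implemented here. Dividing by 4 is assumed to be the conversion from quarters to years.

```python
from typing import List


def person_time_years(
    active_care: List[float],
    starts: List[float],
    retained: float = 1.0,
) -> float:
    """Person-time-at-risk for one clinic over follow-up, in person-years.

    active_care[q] is the number in active HIV care in quarter q,
    starts[q] is the number of IPT starts in quarter q, and `retained`
    is X expressed as a proportion (1.0, 0.975, or 0.0 in the plan).
    """
    total_quarters = 0.0
    previous_starts = 0.0
    for n_active, n_starts in zip(active_care, starts):
        # Starts are assumed uniform within the quarter, so each starter
        # contributes half a quarter at risk; earlier starters who are
        # assumed to remain in care are no longer at risk.
        total_quarters += n_active - 0.5 * n_starts - retained * previous_starts
        previous_starts += n_starts
    return total_quarters / 4.0


def incidence_rate(
    active_care: List[float],
    starts: List[float],
    retained: float = 1.0,
) -> float:
    """IPT starts per person-year at risk."""
    return sum(starts) / person_time_years(active_care, starts, retained)
```

With `retained=0.0` the person-time is larger than under `retained=1.0`, so the resulting rate is smaller, which matches the plan's description of that analysis as a lower bound.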
Effect estimation: To estimate the effect of the SEARCH-IPT intervention on IPT uptake, we will compare IPT incidence rates by arm using a two-stage approach that appropriately accounts for the cluster randomized study design [1]. In Stage 1, the incidence of IPT uptake will be estimated for each district, as described above. Then in Stage 2, uptake will be compared between arms with targeted minimum loss-based estimation (TMLE) [4], adaptively selecting the district-level adjustment variables that optimize precision, while preserving Type-I error control [5, 6]. Specifically, we will use leave-one-out cross-validation to select from the following set of baseline covariates the ones which maximize empirical efficiency: baseline IPT uptake, baseline active care size, or nothing (unadjusted). In secondary analyses, we will implement an unadjusted effect estimator as the contrast in the average IPT uptake between arms.
The primary effect measure will be on the relative scale and for the districts involved in the study (i.e., the sample rate ratio) [7, 8, 9, 10, 11]:
$$\frac{\frac{1}{J}\sum_{j=1}^{J}\alpha_j Y_j(1)}{\frac{1}{J}\sum_{j=1}^{J}\alpha_j Y_j(0)}$$
where $J$ is the number of districts, $\alpha_j$ are weights for district $j$, and $Y_j(a)$ is the counterfactual IPT uptake over the two years of follow-up if, possibly contrary to fact, district $j$ were in treatment arm $a$. We will test the null hypothesis that the SEARCH-IPT intervention did not improve IPT uptake, as compared to standard-of-care, with a one-sided test at the 5% significance level. In secondary analyses, we will examine the effect on the absolute scale:
$$\frac{1}{J}\sum_{j=1}^{J}\alpha_j Y_j(1) - \frac{1}{J}\sum_{j=1}^{J}\alpha_j Y_j(0).$$
We will report point estimates and two-sided 95% confidence intervals for the arm-specific endpoints as well as the relative and absolute effects.
Primary analyses will weight the districts equally: $\alpha_j=1$ for all $j$. In secondary analyses, we will evaluate the effect at the clinic level, and in sensitivity analyses, we will weight proportionally to active care size [12].
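As a rough illustration of the relative and absolute effect measures, the sketch below computes the unadjusted contrast of arm-specific mean incidence rates described for the secondary analyses. It is not the TMLE with adaptive covariate selection planned for the primary analysis. The function names and the example rates are hypothetical, and any non-default weights are assumed to sum to the number of districts in the arm.

```python
from typing import Dict, List, Optional


def arm_mean(rates: List[float], weights: Optional[List[float]] = None) -> float:
    """Weighted mean of district-level incidence rates within one arm.

    With weights=None every district gets weight 1, mirroring the
    equal-weight primary analysis.
    """
    if weights is None:
        weights = [1.0] * len(rates)
    return sum(w * r for w, r in zip(weights, rates)) / len(rates)


def unadjusted_effects(
    intervention: List[float], control: List[float]
) -> Dict[str, float]:
    """Unadjusted rate ratio and rate difference between arms."""
    mean_1 = arm_mean(intervention)
    mean_0 = arm_mean(control)
    return {"rate_ratio": mean_1 / mean_0, "rate_difference": mean_1 - mean_0}


# Hypothetical district-level rates (IPT starts per person-year).
effects = unadjusted_effects([0.30, 0.50], [0.20, 0.20])
print(effects)
```

A rate ratio above 1 (equivalently, a positive rate difference) would indicate higher IPT uptake in the intervention arm.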
Primary analyses will include all districts, including those created during follow-up. Secondary analyses will restrict to the original districts, which had the longest exposure to the intervention.
Statistical inference: Recall that 4–7 districts were put into groups for the purposes of pair-matching and treatment randomization. In the intervention arm, these clusters are the basis for mini-collaborative activities. In the control arm, there are no mini-collaborative meetings or any other study activities. Instead, the control condition corresponds with the standard of care. Therefore, it is reasonable to assume districts in the control arm are independent.
Given this independence structure (at the mini-collaborative level in the intervention arm and at the district level in the control arm), we take the following approach to statistical inference. In the primary analysis, we will assume there are 7 independent units corresponding to the mini-collaboratives in the intervention arm, and 40 independent units corresponding to the districts in the control arm (total of 47 independent units). In sensitivity analyses, we will treat the randomized cluster as the independent unit (total of 14 independent units). In all analyses, we will use the influence function for standard error estimation [11] and the Student’s t-distribution, with degrees of freedom equal to the number of independent units minus 2, as a finite sample approximation to the standard normal distribution [1].
Primary analyses will break the matched pairs used for randomization. For completeness, we will also conduct a paired t-test (7 groups/arm), implement a permutation test, and also estimate the intervention effect with mixed (random) effects models and with generalized estimating equations (GEE) [13, 14].
Sensitivity analyses: In addition to the sensitivity analyses described above, we will conduct the following sensitivity analyses. To minimize the impact of secular trends, we will exclude the “100-day IPT push” (Q3-2019). To better understand the impact of the COVID-19 pandemic, which caused a lockdown to be initiated in Q2-2020, we will conduct an analysis excluding the post-lockdown period and an analysis excluding the pre-lockdown period. Given the expected skewed distribution of clinic sizes, we will conduct two sensitivity analyses excluding clinics with baseline active care size exceeding 2500 patients and exceeding 5000 patients.
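The plan lists a permutation test among the supplementary inference approaches without specifying its details. One simple version, assuming the test statistic is the difference in mean outcomes between arms and that arm labels are shuffled freely across independent units (that is, ignoring the matched pairs), might look like the sketch below. The function name, the number of permutations, and the seed are all illustrative choices.

```python
import random
from typing import List


def permutation_p_value(
    treated: List[float],
    control: List[float],
    n_perm: int = 5000,
    seed: int = 0,
) -> float:
    """One-sided Monte Carlo permutation p-value.

    Tests whether the mean outcome among treated units exceeds the mean
    among control units by more than expected under random relabelling.
    """
    rng = random.Random(seed)
    n_treated = len(treated)
    observed = sum(treated) / n_treated - sum(control) / len(control)
    pooled = treated + control  # new list; inputs are not modified
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_treated = pooled[:n_treated]
        perm_control = pooled[n_treated:]
        diff = sum(perm_treated) / len(perm_treated) - sum(perm_control) / len(perm_control)
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly 0.
    return (extreme + 1) / (n_perm + 1)
```

A pair-respecting variant would instead flip arm labels within each matched pair; which version the study used is not stated in this plan.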
Subgroup analyses: We will repeat the above analyses within strata defined by sex, baseline active care size (dichotomized at 1000 patients), and region.
Additional analyses: We will examine how IPT uptake varies over time by plotting the quarterly incidence rate (as described above) and the cumulative incidence (quarter-specific starts divided by quarter-specific number of adults on ART) over time by arm, region, and sex. We will also formally evaluate trends over time using TMLE to estimate parameters of a marginal structural model and using GEE [4, 14, 15]. We will also examine how DHO/DTLS participation mediates the intervention effect with TMLE. Finally, we will conduct an “as-treated” analysis where intervention districts whose DHOs/DTLSs had poor participation in mini-collaboratives are analyzed as receiving the standard-of-care (i.e., control).
3 Secondary outcome: HIV-associated TB incidence
We will also examine the impact of the SEARCH-IPT intervention on “HIV-associated TB incidence”: the rate at which persons in active HIV care are diagnosed with TB disease in districts overseen by DHOs participating in the SEARCH-IPT trial. Thus, the target and analytic population include all persons in active HIV care, regardless of their age, sex, ARV regimen, IPT eligibility, or IPT uptake. As with the primary endpoint, TB incidence will be measured over 8 quarters (starting Q1-2019 in the Southwest and in Q2-2019 in the East and East-Central).
Data source: For each district, aggregate data on the total number of TB diagnoses among persons in active HIV care and the total number of persons in active HIV care in each quarter will be extracted from the Ministry of Health HMIS database. While data on the active care size are available by 1st/2nd/3rd line ARV regimen, all analyses will pool over ARV regimen.
Missing data: There is no reason to believe data will be differentially captured by arm. As before, we are unable to distinguish between missing values and 0 persons diagnosed with TB. Therefore, if data are missing on TB cases for a given district-quarter, we will assume there were 0 cases in that district-quarter. However, if for a given district, data on TB cases are missing for all quarters, we will exclude that district from the analysis.
If data on the number of patients on a 2nd or 3rd line ARV regimen are missing, we will again assume that 0 patients are on those lines in that district-quarter. We do not anticipate data on the active care size to be completely missing for a given district-quarter; if this happens, we will estimate the district-quarter active care size by averaging the size in the prior district-quarter with the size in the subsequent district-quarter.
Incidence calculations: Because the number of persons in active HIV care changes over time and the rate of TB disease diagnosis may not be uniform over the 8 quarters of follow-up, we again focus on the incidence rate, as opposed to the cumulative incidence. For each district, we will estimate the TB incidence rate as the total number of cases divided by the total person-time-at-risk (PT) over the two-year follow-up period. As before, when calculating the incidence rate, we will assume events (here, TB disease diagnoses) occur uniformly in each district-quarter and that a fixed proportion of prior TB cases remain in active care within the same district from quarter to quarter. Then in each district-quarter, the PT will be calculated as the quarter-specific number in active HIV care minus half the cases occurring in that quarter and X% of the previous cases, where X%=100% in the primary analysis, 97.5% in the first sensitivity analysis, and 0% in the second sensitivity analysis. The resulting incidence rates will again be converted from being in ‘district-quarters’ to ‘person-years’ for ease of interpretation. In secondary analyses, we will estimate the district-specific cumulative incidence of TB as the total number of cases divided by the average active care size. We will also report the total number of TB disease diagnoses by arm.
All analyses will assume all persons in active HIV care are at risk of TB disease at the beginning of follow-up (Q1-2019 in the Southwest and Q2-2019 in the East/East-Central). By randomization, the number in active HIV care and the number who had previously been diagnosed with TB disease should be balanced between arms.
Effect estimation & statistical inference: To estimate and obtain inference for the effect of the SEARCH-IPT intervention on HIV-associated TB, we will compare the district-level TB incidence rates by arm using a two-stage approach analogous to that for the primary outcome. In Stage 1, the incidence of TB diagnoses will be estimated for each district, as described above. Then in Stage 2, district-level incidence will be compared between arms with TMLE to adaptively select from the following set of district-level adjustment variables the ones that optimize precision: baseline TB incidence, baseline active care size, or nothing (unadjusted). In secondary analyses, we will implement the unadjusted effect estimator as the contrast in the average TB incidence rates between arms. We will test the null hypothesis that the SEARCH-IPT intervention did not reduce the incidence of HIV-associated TB, as compared to standard-of-care, with a one-sided test at the 5% significance level.
As before, the primary effect measure will be on the relative scale and for the districts involved in the study (i.e., the sample incidence ratio). In secondary analyses, we will examine the effect on the absolute scale. We will also report point estimates and two-sided 95% confidence intervals for the arm-specific endpoints and effect estimates. Primary analyses will weight the districts equally, while sensitivity analyses will weight proportionally to baseline active care size [12]. We will use the same approach to statistical inference as the primary endpoint.
Sensitivity analyses: We will also conduct the sensitivity analyses specified for the primary endpoint. Here, we will exclude districts whose baseline active care size exceeds 10,000 patients and exceeds 20,000 patients.
Subgroup analyses: We will repeat the above analyses within strata defined by baseline active care size (dichotomized at 5000 patients), region, and age (pending data availability).
Additional analyses: We will examine how TB incidence varies over time by plotting the quarterly incidence rate (as described above) and the cumulative incidence (quarter-specific number of TB cases divided by quarter-specific number on ART) over time by arm and region. Using methods analogous to those for the primary endpoint, we will evaluate time trends and the impact of DHO/DTLS participation.
Secondary endpoint: Pending data availability, we will repeat the above analyses to examine the SEARCH-IPT effect on TB treatment initiation.
4 Secondary outcome: Knowledge, attitudes, and practices regarding IPT
To assess mechanisms through which the intervention operates, we will conduct annual quantitative surveys among DHOs and DTLSs to evaluate changes in their knowledge, attitudes, and practices regarding IPT. Details of these surveys are available elsewhere; here, we focus on the statistical analysis to evaluate change from baseline in familiarity with IPT, knowledge of IPT’s health benefits, and practical challenges in TB management. Specific survey questions and their coding are given in the Appendix.
Data source: Surveys will be administered to all DHOs and DTLSs by study staff at the time of randomization and then 1 and 2 years later. Study staff will attempt to reach and survey all participants in both arms.
As an example of a question, consider “On a scale of 1 to 5, with 5 being very familiar with IPT, and 1, not knowing much yet about IPT, how familiar are you with IPT?” The possible answers are “1: No knowledge of IPT; 2: Somewhat unfamiliar–low knowledge of IPT; 3: Somewhat familiar; 4: Familiar; 5: Very familiar–high knowledge of IPT”. Thus, the survey questions will elicit Likert-type answers with evenly spaced responses. In line with standard practice for such outcomes [1], we will evaluate them quantitatively. Specifically, for each district, we will calculate the average response among its representatives at baseline, at year 1, and at year 2.
Effect estimation: To estimate and obtain inference for the effect of the SEARCH-IPT intervention on IPT knowledge/attitudes/practices, we will first calculate a change outcome as the average response at follow-up minus the average response at baseline in each district. We will then compare these outcomes with the unadjusted effect estimator (equivalent to the Student’s t-test) and test the null hypothesis of no improvement in IPT knowledge/attitudes/practices, as compared to standard-of-care, with a one-sided test at the 5% significance level.
We will focus on effect measures on the absolute scale and for the districts with both baseline and follow-up data. Primary analyses will focus on the change from baseline to year 1. Secondary analyses will focus on change from baseline to year 2 and from year 1 to year 2. We will report point estimates and two-sided 95% confidence intervals for the arm-specific annual responses, the arm-specific estimates of change, and the effect estimates. Primary analyses will weight the districts equally and include all districts, including those created during follow-up. We will use the same approach to statistical inference as the primary endpoint.
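The change outcome and its unadjusted between-arm contrast can be illustrated with a short sketch. The function names and example responses are hypothetical, and the sketch stops at the point estimate; the standard error and t-based inference described in the plan are not shown.

```python
from statistics import mean
from typing import List


def district_change(baseline: List[int], followup: List[int]) -> float:
    """Average Likert response at follow-up minus average at baseline
    among one district's respondents (DHOs and DTLSs)."""
    return mean(followup) - mean(baseline)


def mean_change_difference(
    intervention_changes: List[float], control_changes: List[float]
) -> float:
    """Unadjusted effect: difference in mean district-level change."""
    return mean(intervention_changes) - mean(control_changes)


# Hypothetical district: two respondents, each answering on a 1-5 scale.
print(district_change(baseline=[2, 3], followup=[4, 5]))
```

A positive difference in mean change would indicate greater improvement in the intervention arm for questions where higher scores are better.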
Additional analyses: We will repeat the above analyses stratifying on whether the respondent is a DHO or a DTLS.
5 Secondary outcome: IPT completion among persons starting IPT
A key intermediate between IPT uptake and TB prevention is IPT completion. Therefore, we will also assess the effect of the SEARCH-IPT intervention on “IPT completion”, defined as dispensation of a full course of IPT within 9 months of initiation. As detailed in the corresponding SOP, this endpoint will be measured among 800 persons (400 persons/arm) who are aged 15+ years, HIV-infected, and started IPT at one of 16 facilities in the Southwestern region. Within each clinic, these participants will be sampled such that half initiated IPT before Q3-2019 and half initiated in Q3-2019 (during the “100-day IPT push”).
Data source: For each sampled participant, de-identified data on IPT completion (via INH pills dispensed over 9 months) will be obtained from the IPT register and blue cards. Additionally, data on the following variables will be recorded: gender, date of birth, marital status, ARV regimen, ARV switches while on IPT (date and new regimen), most recent HIV RNA viral load, whether on Septrin or Dapsone while taking INH, and diagnosis of active TB.
Effect estimation: When evaluating IPT completion among the subsample of initiators, we will evaluate the arm-specific and relative risk of IPT completion with an individual-level TMLE, adjusting for sex, age, and timing of IPT initiation. The details of this approach are given in van der Laan et al. [16]. In brief, this approach assumes the adjustment covariates are sufficient to “block” the impact of shared factors that may differ between clusters, here clinics. In other words, two participants from two different clinics would have the same outcome probability if they had the same gender, age, and IPT initiation date and their respective clinics had received the same level of the treatment. This approach also relies on the assumption that sampled participants are conditionally independent, given the treatment assignment and the adjustment set. These assumptions are reasonable given that, beyond the intervention, the major drivers of IPT prescribing practices and, thus, completion are secular factors (e.g., the Q3-2019 “100-day IPT push”) and the participant’s timing of IPT initiation is included in the adjustment set.
To flexibly control for sex, age, and timing of IPT initiation, we will implement TMLE with Super Learner, an ensemble machine learning algorithm that creates the best weighted combination of predictions from a set of candidate learners [17]. Our candidates will include multivariate adaptive regression splines, main terms logistic regression, and the simple mean. In sensitivity analyses, we will implement TMLE using main terms logistic regression instead of Super Learner. Additional sensitivity analyses will include marital status in the adjustment set and include penalized regression in the Super Learner library.
We will also evaluate IPT completion using an analysis analogous to that for the primary endpoint. Specifically, treating the clinic as the independent unit, we will use adaptive pre-specification to select among the following individual-level adjustment variables the ones that maximize precision in TMLE: sex, age, and timing of IPT initiation [5, 18]. We will also report unadjusted estimates from this approach as well as conduct an analysis weighting proportionally to the clinic’s total IPT starts and an analysis treating clinics as fixed effects.
For all approaches, we will test the null hypothesis of no improvement from the intervention with a one-sided test at the 5% significance level. We will also report point estimates and two-sided 95% confidence intervals for the relative effect, the absolute effect, and the arm-specific mean outcomes. We will also report total completions by arm.
Additional analyses: We will repeat the above analyses within subgroups defined by sex, timing of IPT initiation (before the Q3-2019 “100-day IPT push” or during Q3-2019), age group (15–24 years or 25+ years), and marital status (never married, married/living together, divorced/separated/widowed, missing). We will also use an individual-level TMLE to examine predictors of non-completion. We will provide descriptive statistics on the subsample, including their demographics and care outcomes (e.g., ARV switches).
Secondary endpoint: We will repeat the above analyses to examine the effect of the SEARCH-IPT intervention on viral suppression, defined as HIV RNA viral level <400 copies/mL.
6 Appendix
6.1 History of changes:

Version 1.0 was locked on July 18, 2021 prior to unblinding and effect estimation for IPT uptake (the primary outcome), IPT completion among persons starting IPT, and HIV-associated TB incidence.

Version 2.0 was created to include the pre-specified analysis plan for the quantitative surveys on knowledge, attitudes, and practices regarding IPT. Version 2.0 was locked on September 15, 2021, prior to unblinding and effect estimation for the survey outcomes.

Version 2.1, this version, was created to correct grammatical errors and other typos. Version 2.1 was locked on November 10, 2021 prior to submission of the manuscript.
6.2 Select survey questions:
The following are examples of survey questions, designed to understand the impact of the SEARCH-IPT intervention on IPT knowledge, attitudes, and practices among DHOs and DTLSs. Full survey materials are available elsewhere.

How familiar are you with IPT? With responses: 1: no knowledge of IPT; 2: somewhat unfamiliar – low knowledge of IPT; 3: somewhat familiar; 4: familiar; 5: very familiar – high knowledge of IPT.

How strong is the evidence that INH prevents active TB in HIV-infected patients? With responses: 1: very weak; 2: weak; 3: mixed; 4: strong; 5: very strong.

How strong is the evidence that INH prevents death in HIV-infected patients? With responses: 1: very weak; 2: weak; 3: mixed; 4: strong; 5: very strong.

How much risk of INH drug resistance is there if a person develops active TB after completing IPT? With responses: 1: no risk of INH resistance; 2: low risk; 3: moderate risk; 4: high risk; 5: very high risk.

How difficult is it for providers in this district to add INH to standard care for HIV-infected people in order to prevent TB? With responses: 1: very easy; 2: easy; 3: neither hard nor easy; 4: difficult; 5: very difficult.

How hard is it to influence changes in practice among front-line providers around TB management? With responses: 1: very easy; 2: easy; 3: neither hard nor easy; 4: difficult; 5: very difficult.
6.3 Power calculations:
Sample size and power calculations were based on the standard formulas for one-sided tests in cluster randomized trials with an incidence rate endpoint [1]. As a conservative approximation, these calculations considered the randomization groups to be the independent unit. We also expect these calculations to be conservative because of the precision gained through covariate adjustment in the analysis.
We estimated 14 groups (7 groups/arm) would provide 80% power to detect at least a 10% absolute increase in IPT uptake (the primary outcome) from 22 per 100 person-years under the control, assuming a coefficient of variation of and 21,500 person-years of follow-up in each group. As shown in the following Figure, these calculations are fairly insensitive to the expected amount of follow-up in each group. If uptake is greater than expected (e.g., 30 per 100 person-years) under the control, we anticipate remaining well-powered to detect at least a 13.5% increase in uptake.
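For readers who want to explore such calculations, the sketch below implements the widely used Hayes and Moulton [1] sample size formula for an unmatched cluster randomized trial with an incidence rate endpoint. Whether the study used exactly this variant is an assumption. The function name is hypothetical, and the coefficient of variation of 0.25 in the example call is purely illustrative, since the value used in the study's calculation is not given in the text above.

```python
from statistics import NormalDist


def clusters_per_arm(
    rate_control: float,
    rate_intervention: float,
    person_years: float,
    cv: float,
    alpha: float = 0.05,
    power: float = 0.80,
    one_sided: bool = True,
) -> float:
    """Approximate number of clusters required per arm.

    rate_control and rate_intervention are events per person-year,
    person_years is the follow-up per cluster, and cv is the
    between-cluster coefficient of variation of the true rates.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha) if one_sided else z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    # Within-cluster (Poisson) variation plus between-cluster variation.
    variance_term = (rate_control + rate_intervention) / person_years + cv**2 * (
        rate_control**2 + rate_intervention**2
    )
    return 1 + (z_alpha + z_beta) ** 2 * variance_term / (
        rate_control - rate_intervention
    ) ** 2


# Illustrative call: 22 vs. 32 per 100 person-years, hypothetical cv=0.25.
print(clusters_per_arm(0.22, 0.32, person_years=21500, cv=0.25))
```

With this much follow-up per cluster, the between-cluster term dominates the variance, which is consistent with the statement that the calculations are fairly insensitive to the amount of follow-up.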
References
[1] R.J. Hayes and L.H. Moulton. Cluster Randomised Trials. Chapman & Hall/CRC, Boca Raton, 2009.
[2] L.B. Balzer, M.L. Petersen, M.J. van der Laan, and the SEARCH Consortium. Adaptive pair-matching in randomized trials with unbiased and efficient effect estimation. Statistics in Medicine, 34(6):999–1011, 2015. doi: 10.1002/sim.6380.
[3] K.J. Rothman, S. Greenland, and T.L. Lash. Modern Epidemiology. Lippincott Williams & Wilkins, Philadelphia, 3rd edition, 2008.
[4] M. van der Laan and S. Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, New York Dordrecht Heidelberg London, 2011.
[5] L. Balzer, M. van der Laan, M. Petersen, and the SEARCH Collaboration. Adaptive pre-specification in randomized trials with and without pair-matching. Statistics in Medicine, 35(10):4528–4545, 2016. doi: 10.1002/sim.7023.
[6] L.B. Balzer, M. van der Laan, J. Ayieko, M. Kamya, et al. Two-stage TMLE to reduce bias and improve efficiency in cluster randomized trials. Biostatistics, In press, 2021. URL https://arxiv.org/abs/2106.15737.
[7] J. Neyman. Sur les applications de la theorie des probabilites aux experiences agricoles: Essai des principes (In Polish). English translation by D.M. Dabrowska and T.P. Speed (1990). Statistical Science, 5:465–480, 1923.
[8] D.B. Rubin. Comment: Neyman (1923) and causal inference in experiments and observational studies. Statistical Science, 5(4):472–480, 1990.
[9] G.W. Imbens. Nonparametric estimation of average treatment effects under exogeneity: a review. Review of Economics and Statistics, 86(1):4–29, 2004. doi: 10.1162/003465304323023651.
[10] K. Imai. Variance identification and efficiency analysis in randomized experiments under the matched-pair design. Statistics in Medicine, 27(24):4857–4873, 2008. doi: 10.1002/sim.3337.
[11] L.B. Balzer, M.L. Petersen, and M.J. van der Laan. Targeted estimation and inference of the sample average treatment effect in trials with and without pair-matching. Statistics in Medicine, 35(21):3717–3732, 2016. doi: 10.1002/sim.6965.
[12] A. Benitez, M.L. Petersen, M. van der Laan, N. Santos, E. Butrick, D. Walker, R. Ghosh, P. Otieno, P. Waiswa, and L.B. Balzer. Comparative methods for the analysis of cluster randomized trials. https://arxiv.org/abs/2110.09633, 2021.
[13] N.M. Laird and J.H. Ware. Random-effects models for longitudinal data. Biometrics, 38(4):963–974, 1982. PMID: 7168798.
[14] K.Y. Liang and S.L. Zeger. Longitudinal Data Analysis Using Generalized Linear Models. Biometrika, 73(1):13–22, 1986. URL http://www.jstor.org/stable/2336267.
[15] M.L. Petersen, J. Schwab, S. Gruber, N. Blaser, M. Schomaker, and M.J. van der Laan. Targeted maximum likelihood estimation for dynamic and static longitudinal marginal structural working models. Journal of Causal Inference, 2(2), 2014. doi: 10.1515/jci-2013-0007.
[16] M.J. van der Laan, M.L. Petersen, and W. Zheng. Estimating the effect of a community-based intervention with two communities. Journal of Causal Inference, 1(1):83–106, 2013.
[17] M.J. van der Laan, E.C. Polley, and A.E. Hubbard. Super learner. Statistical Applications in Genetics and Molecular Biology, 6(1):25, 2007. doi: 10.2202/1544-6115.1309.
[18] L.B. Balzer, W. Zheng, M.J. van der Laan, M.L. Petersen, and the SEARCH Collaboration. A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure. Stat Meth Med Res, 28(6):1761–1780, 2019.