1 Introduction
Researchers are often interested in using longitudinal data to estimate the causal effects of hypothetical timevarying treatment interventions (equivalently, strategies or rules) on the mean or risk of a future outcome in a study population. Standard regression or conditioning methods for confounding control generally fail to recover such causal effects when timevarying confounders are themselves affected by past treatment (Robins and Hernán, 2009). For example, in studies of the effect of different timevarying antiretroviral treatment strategies on longterm mortality risk in HIVinfected patients, CD4 cell count is a timevarying confounder; i.e., CD4 cell count at followup time is a risk factor for both future treatment initiation and future mortality. In addition, CD4 cell count at followup time is, itself, affected by whether treatment has been previously initiated. As another example, in studies of the effect of different timevarying interventions on daily minutes of physical activity on longterm coronary heart disease (CHD) risk in a healthy population, body mass index (BMI) is a timevarying confounder; i.e., BMI at followup time is a risk factor for future inactivity and future CHD. Also, BMI at time is, itself, affected by an individual’s past physical activity.
In such settings, alternative estimators derived from Robins’s gformula may recover effects of timevarying treatment interventions under untestable assumptions, including that sufficient covariates are measured to control confounding by unmeasured risk factors (Robins, 1986). The package gfoRmula implements in R one such estimator: the parametric gformula, also known as parametric gcomputation or the plugin gformula (Hernán and Robins, 2018)
. In general, the gformula characterized by a userspecified treatment intervention is a highdimensional sum or integral over all observed treatment and confounder histories. This sum is over a function of (i) the observed outcome mean (or, in survival settings, timevarying hazards) conditional on each observed treatment and confounder history, (ii) the observed joint distribution of the confounders at each time
conditional on each observed treatment and confounder history, and (iii) an intervention density defining the timevarying userspecified treatment rule. The parametric gformula estimates this function in realistic highdimensional settings via a Monte Carlo simulation that relies on consistent estimates of the quantities (i) and (ii), as well as the userspecified intervention density.The gfoRmula package accommodates general userspecified timevarying treatment interventions which may be static or dynamic (Hernán et al., 2006; Orellana et al., 2010a, b; van der Laan et al., 2005; Murphy et al., 2001; Cain et al., 2010; Young et al., 2011) and, further, deterministic or random (Díaz Muñoz and van der Laan, 2012; Haneuse and Rotnitzky, 2013; Young et al., 2014, 2018; Kennedy, 2019). The algorithm implemented by the package can additionally estimate the extended gformula of Robins et al. (2004) which, under a stronger no unmeasured confounding assumption, may identify effects of treatment assignment rules that depend on the natural value of treatment at ; the value of treatment that would have been observed at were the intervention discontinued right before (Richardson and Robins, 2013). In addition to the quantities (i) and (ii), the parametric extended gformula relies on an estimate of the observed distribution of treatment at each time conditional on past treatment and confounder history.
The package also allows: 1) binary (e.g. treat versus do not treat) or continuous/multilevel (e.g. dose, daily minutes of physical activity) timevarying treatments; 2) different types of outcomes (survival or continous/binary end of followup); 3) data with competing events (survival outcomes) or truncation by death (end of followup outcomes) and loss to followup and other types of censoring events; 4) different options for handling competing events in the case of survival outcomes; 5) a random measurement/visit process; 6) joint interventions on multiple treatments; and 7) general incorporation of a priori knowledge of the data structure. The gfoRmula R package adapts many of the capabilities of the GFORMULA SAS macro to implement the parametric gformula (Logan et al., 2016). However, unlike the SAS macro, the gfoRmula R package more easily allows users to incorporate their own helper functions for estimating (i) and (ii) beyond automated options included within the package.
The structure of the article is as follows: In Section 2, we review the longitudinal data structure of interest, the gformula and identifying assumptions that give equivalence between this function and a counterfactual mean or risk under a userspecified timevarying treatment strategy. In Section 3, we review the parametric gformula estimation algorithm. In Section 4, we describe input data set requirements and core features of the package. In Section 5, we give various code examples. In Section 6, we describe additional, more advanced package features. Finally, in Section 7, we provide a discussion.
2 Background
2.1 Observed data structure
We consider a longitudinal study where measurements of treatment(s), and covariates are regularly updated on each of
subjects over a specified followup period. Let denote fixed followup intervals (e.g. months) with baseline measurements taken in interval and interval corresponding to the specified end of followup (e.g. 60 months).In each interval assume the following are measured: Let
be a treatment variable (or vector of treatment variables) measured in interval
(e.g. an indicator of antiretroviral treatment initiation by interval ; minutes of physical activity in interval ) and a vector of timevarying covariates (e.g., CD4 cell count, BMI) assumed to precede with possibly including timefixed baseline covariates (e.g., race, baseline age). In clinical cohorts, including studies based on electronic medical records, may be defined in terms of last measured values of a covariate relative to interval . For example, in a clinical cohort of HIVinfected patients, CD4 cell count is not measured every month. It is only measured in a month when the patient comes for a visit. In this case may be defined in terms of the last measured value of CD4 cell count relative to month . Further, may contain an indicator of whether that last measured value is current or not (i.e. if a visit at has occurred) (Hernán et al., 2009).Let denote the outcome of interest. may correspond to either a fixed (continuous or binary) end of followup outcome in interval (e.g. blood pressure in interval ; an indicator of obesity in interval ) or a survival outcome; i.e., an indicator of failure from an event of interest by (e.g. an indicator of stroke by ). For survival outcomes, we implicitly include in an indicator of failure from the event of interest by earlier (). Past values of outcomes at the end of followup may also be components of (e.g. when the outcome is obesity status at , obesity at any may be a component of ). Note that “end of followup” may correspond to any userspecified followup of interest for which data is available and need not correspond to the administrative end of the study (e.g. the user may specify as 10 month followup even if the administrative end of the study was 60 months).
Let denote an indicator of censoring and an indicator of a competing event (for a survival outcome) or truncation by death (for an end of followup outcome) by interval , respectively. When a survival outcome is not subject to competing events (e.g. allcause mortality), we define for all . For end of followup outcomes, the user must define as an implicit component of (i.e. as a censoring event). For survival outcomes, the user may choose to define as an implicit component of or . Under the latter choice, competing events are not treated as censoring events. See Young et al. (2019) as well as below regarding how this choice impacts the interpretation of the analysis and estimation algorithms.
We denote the history of a random variable using overbars; for example,
is the observed treatment history through interval . By notational convention, we set and to be identically .2.2 Causal effect definitions
In each interval , we generally define an intervention on that may, at most, depend on past measured variables as a random draw from an intervention density with a possible realization of .
We classify an intervention
on as deterministic if may only equal zero or one for all and ; otherwise we classify it as random. We may further classify the intervention as static if does not depend on for any ; otherwise we classify it as dynamic. See Young et al. (2014) and Young et al. (2018), as well as Sections 4.9 and 6.5, for examples.Beyond interventions of the form , we will also define an intervention on that, in addition to past measured variables, depends on the natural value of treatment at . We define such an intervention as a random draw from an alternative intervention density where is any value in the support of which, in the observational study, coincides with the natural value of treatment at (Richardson and Robins, 2013; Young et al., 2014).
Our goal is to estimate the causal effect on the mean of (which corresponds to the risk of the event of interest by in the case of survival outcomes) had, contrary to fact we implemented two different hypothetical interventions on at all in the study population where these interventions may be any userspecified choices of either or . Implicit in the definition of all interventions is “abolish censoring throughout followup” or “Set at all ”. By this, how the user chooses to define will impact the interpretation of the estimand (Hernán and Robins, 2018). For survival outcomes, when is defined as an implicit component of , this effect is a special case of a direct effect that does not capture any treatment effect on the competing event; otherwise it is a special case of a total effect that may capture these effects (Young et al., 2019).
2.3 Identifying assumptions and the gformula
Let be a deterministic regime, strategy, or intervention (static or dynamic) on that depends, at most, on the measured covariate and treatment history, characterized by the intervention density that also eliminates censoring, where is any component of and is recursively defined by the function of , .
Define and as the outcome and covariate histories, respectively, for an individual in the study population had, possibly contrary to fact, his/her treatment been assigned according to a deterministic regime . Further, let be the set of all deterministic interventions on (both static and dynamic). Following Robins (1986), the mean of had all subjects been assigned treatment according to and had censoring been eliminated is equivalent to where is a specific weight defined in terms of the choice of . See also the appendix of Young et al. (2014).
We now define three specific identifying conditions:

Exchangeability: For all
(1) The exchangeability condition (1) is expected to hold in an experiment where the treatment and censoring are physically randomized at each possibly depending on treatment and covariate history . This condition is not, however, guaranteed to hold in an observational study, or in a randomized trial with censoring, and cannot be empirically examined. Exchangeability is sometimes referred to as the assumption of “no unmeasured confounding” and the “measured confounder history through ”.

Positivity:
(2) where

Consistency: If and then and .
Given the three conditions above hold for all such that positivity holds when we replace the observed treatment density with the intervention density , then is equivalent to the gformula characterized by (Robins, 1986):
(3) 
where is the observed joint density of the confounders at conditional on treatment and covariate history, and remaining uncensored, through . We use a summation symbol in (3) and elsewhere for notational simplicity. However, in general, when contains any continuous components, sums would be replaced with integrals.
When interest is in a survival outcome, expression (3) is the gformula for risk of the event of interest by under intervention which, pulling the implicit prior survival indicator out of , can be more explicitly written as:
(4) 
As above, when the survival outcome is subject to competing events, we may choose to treat competing events () as an implicit component of or , . Under the former, expression (4) will not depend on the distribution of competing events. Under the latter, it will depend on this distribution; in this case, we can more explicitly write expression (4) as
(5) 
See Young et al. (2019) for details.
Conditions required for identification based on only observed variables of the mean/risk of under an alternative intervention characterized by , which additionally depends on the history of the natural value of treatment, are generally stronger than those defined above. We refer the reader elsewhere for details of these conditions (Richardson and Robins, 2013; Young et al., 2014). Provided these stronger conditions hold, the mean of under an intervention characterized by is equivalent to the extended gformula of Robins et al. (2004)
(6) 
where is the observed treatment density conditional on the measured past evaluated at some possibly realized values .
In motivating the general estimation algorithm described in the next section, it is useful to consider a generic version of the gformula for the mean/risk of the outcome characterized by either an intervention density which depends on the natural value of treatment at or which depends, at most, on the measured past treatment and confounder history. For and realization , we have :
(7) 
is algebraically equivalent to (6) when is chosen as some and algebraically equivalent to (3) when is alternatively chosen as some as, in this latter case, the expression does not depend on the observed treatment density (Young et al., 2014).
Similarly, for survival outcomes, we can expressly write the gformula for risk by under generic treatment interventions
(8) 
when the outcome is subject to competing events and competing events are not treated as censoring events. Following arguments above and in Young et al. (2019), replacing with 1 for all and would give the gformula for risk by under an intervention when competing events are treated as censoring events and, thus, under an intervention that eliminates competing events along with other possible forms of censoring.
3 Estimation algorithm
3.1 Survival outcomes
The following describes the general computational algorithm for estimating the gformula for risk of the outcome (the event of interest by ) under a userspecified intervention . This corresponds to (i) expression (8) when competing events are not treated as censoring events; (ii) a restricted version of (8) that replaces with 1 when competing events are treated as censoring events for all (i.e. an implicit component of ) and ; and (iii) a restricted version of (8) with for all when the outcome is not subject to competing events.
Let such that for all ,
where is the component of the vector , for a, generally, arbitrary permutation of these components. Note that certain permutations will be preferred when a priori knowledge of the distributions of certain components of conditional on values of other components is known (see Section 6.1).
For a userchosen value of , treatment rule and permutation of , , we apply the following algorithm to a subjectinterval input data set constructed according to the instructions of Section 4.2:

Using all subject interval records:

If , estimate the conditional densities under a distributional assumption on given “history”
, . This distribution will be used to simulate a value of in Step 2 at each . For example, dichotomouswill, by default, be generated from a Bernoulli distribution with mean
. For with many levels, the user can select among several prespecified distributions (e.g. Normal) or can rely on a usersupplied distribution. The user has the option to estimate the conditional mean of for a specified function of the “history” using prespecified fit options (e.g. generalized linear models with specified link function) or other usersupplied functions (e.g. generalized additive models, classification and regression trees) that produce a fitted object based on one line of data. Syntax for these specifications is described in Section 4.6 via specification of the parameterscovtypes
andcovparams
. These conditional mean models are pooled over time and thus the user must specify the function of time , if any, on which they will depend (see Sections 4.6 and 4.7). When the variance of
is not a function of the mean under the distributional assumption (e.g. when is assumed Normal), the variance is automatically estimated by the model residual mean squared error. 
Estimate the conditional probability (discrete hazard) of the event of interest at each time
conditional on treatment and covariate history and surviving and remaining uncensored to the previous time,. These estimates are automatically based on fitting a pooled over time logistic regression model with
the dependent variable where the user specifies the function of “history” included as independent variables in the fit. See Section 4.8. 
If the event of interest is subject to competing events and competing events are not defined as censoring events: Estimate the conditional probability (discrete hazard) of the competing event at each time conditional on treatment and covariate history and surviving and remaining uncensored to the previous time . These estimates are automatically based on fitting a pooled over time logistic regression model with the dependent variable where the user specifies the function of “history” and time included as independent variables in the fit. See Section 4.8.


Select a value , possibly such that . If resample the original ids times with replacement and create a new data set where each possibly resampled record has a unique id . Using the baseline covariate data from the original observations at baseline (if ) or the resampled observations (if ), for and , do the following:

If , set to the observed values of for id . Otherwise, if , iteratively draw from estimates of the conditional densities (based on specifications from step 1.a) evaluated at , , the previously drawn covariates and , the previously assigned treatment through under the userchosen strategy.

Denote the observed (at ) or drawn (at ) value(s) of treatment(s) in the vector as . Assign according to the user specified rule which may or may not depend on . For example, suppose is defined as the static deterministic strategy “always set treatment to the value 1.” In this case, is always set to 1. Another example is “if the natural value of treatment at is less than 30, set treatment to 30. Otherwise, do not intervene.” In this case, the observed/drawn value of is examined. If then is set to 30. Otherwise, is set to . A third example is “no intervention on ” or the socalled natural course intervention. In this case, the intervention density is defined as the observed treatment density and is always set to . See Sections 4.9 and 6.5 for required syntax for specifying the treatment rules.

Estimate based on specifications from step 1.b. Denote this estimate .

If the event of interest is subject to competing events and competing events are not defined as censoring events: Estimate based on specifications from step 1.c. Denote this estimate .


Compute the intervention risk estimate as
(9) In the special case where either the event of interest is not subject to competing events or competing events are treated as censoring events is set to 0 for all in (9).
Steps 2 and 3 above can be repeated for multiple userdefined treatment rules
and risk ratio/risk difference estimates constructed based on a specified reference intervention. The natural course intervention is always automatically implemented and is the default reference unless otherwise specified by the user. Ninetyfive percent confidence intervals are computed based on repeating the entire algorithm (Steps 13) in
bootstrap samples based on percentiles of the bootstrap distribution for user specified . The user can also opt to additionally output estimates of the risk (risk ratios/risk differences) by all times , in addition to the risk by .In addition to the parametric gformula estimate of the natural course risk, a nonparametric estimate will also be computed automatically and reported in the output. When censoring events are present in the data, this is a completely unadjusted estimate of the natural course risk (i.e. it relies on marginal exchangeability for censoring)(Young et al., 2011). When competing risks are absent or treated as censoring events, this is computed as the complement of a productlimit survival estimator (Kaplan and Meier, 1958) in the observed data. When competing risks are present and not treated as censoring events, this is computed using an Aalen Johansen estimator (Aalen and Johansen, 1978). See Young et al. (2019) and Logan et al. (2016) for additional details.
3.2 Fixed end of followup outcomes
The algorithm for estimating the gformula for the mean of an outcome at the end of followup under a userspecified intervention is nearly identical to that for the risk in the case of survival outcomes described in the previous section but relies on a slightly different input data structure (see Section 4.2). In the modified algorithm
Step 1.b. is replaced by

Modified Step 1.b.: Using only records on line , estimate the mean of the outcome at conditional on treatment and covariate history and remaining uncensored to the previous time,
. This estimate is automatically based on fitting a linear regression model with
the dependent variable where the user specifies the function of “history” . Because this fit will only use records on line of the data it will not be a function of time. See Section 4.8.
Step 2.c. is replaced by:

Modified Step 2.c.: For only, estimate based on specifications from step 1.b. Denote this estimate .
Step 3 is replaced by

Modified Step 3: Compute the intervention mean estimate as
(10)
As discussed above, in this case, if a subject dies prior to , this must be treated as a censoring event ( must be considered a component of the vector , ) and steps 1.c and 2.d are not implemented.
4 Using the gfoRmula package
4.1 Specifying the outcome type
The gfoRmula package gives the user access to three different gformula
function variants, each intended for a different outcome type. gformula_survival
should be used for survival outcomes (e.g. an indicator of allcause mortality by ); gformula_continuous_eof
should be used for fixed continuous end of followup outcomes (e.g. blood pressure at ); and gformula_binary_eof
should be used for fixed binary end of followup outcomes (e.g. indicator of high cognitive score at ).
4.2 Required structure of the input dataset
The input dataset to the gformula
function must be an R data.table
. For all outcome types, each row of the input dataset should contain one record for each time (which must be a numeric variable in R, and the name of which is specified by the parameter time_name
) for each subject present at baseline (specified by the parameter id
). The index must start at 0 and increase in increments of 1. Each column of the data set will index a timevarying covariate , . A subject’s baseline confounders (the timefixed components of ) should be repeated on each line for that subject in the data set. A subject who is censored in interval will have only records with time index on the last line. Other requirements are dependent on the outcome type which we describe below. Each subject can have a maximum of records with , the end of followup of interest.
4.2.1 Survival outcomes
In this case, the data set should additionally contain a column indicating, on line whether the event of interest occurred by the next interval. A subject who first has the event of interest in interval will have only records with time index on the last line. If then the user has the choice to code as either NA
or 0. Records coded as NA
will be excluded when fitting the discrete hazard model in step 1.b of the algorithm and in nonparametric estimation of the natural course risk (see Section 3) while records coded as 0 will be included in computing these estimates. This choice impacts how “risk sets” at are defined. It will make no difference to the estimates when the intervals are made small enough such that there are no failures in intervals where there are also censoring events. Otherwise, this choice may impact effect estimates to some degree. Note that by (i) the fact that the algorithms of the previous section do not rely on an estimate of the distribution of censoring and (ii) the required structure of the input data set such that subjects have no more records after they are censored, the input data set does not actually require any variables defining censoring indicators.
If the event of interest is subject to a competing event for some subjects and some and the user chooses to treat competing events as censoring events then should be treated like any component of and coding requirements outlined above must be applied. In turn, in this case, the input data set does not require a variable for . If, alternatively, competing events are not treated as censoring events, then the user must include in the input data set a timevarying indicator of the competing event . If for a given subject, then that subject will only have lines in the data with time index on the last line and, on that line, must be coded NA
. See the data sets basicdata_nocomp
and basicdata
included with the package for examples of input data sets for survival outcomes without and with competing events, respectively. The user can specify the parameter time_points
to be a followup interval in order to compute the risk estimates through (with the max number of records for an individual in the data set). The default value of time_points
is .
4.2.2 End of followup outcomes
In this case, the data set should additionally contain a column with the value of the outcome in interval . Only the value of this variable on line is used in the algorithm and it does not matter what values are coded on lines . These may be left blank or given any value. As discussed above, death or other truncation events, if present, must be treated as censoring events in this case. For subjects who are first censored in interval , these subjects will have records in the data and on line must be coded as NA
. For subjects censored prior to interval , as above, the value of will be ignored in the estimation algorithm. See the data set continuous_eofdata
included with the package for an example of an input data set for an end of followup outcome. Unlike for survival outcomes, the default value of time_points
() cannot be changed for end of followup outcomes.
4.3 Specifying the input data set, individual identifier, time index, outcome and competing events
The input data set is specified by the argument obs_data
, the column of this data set specifying the individual identifier by the argument id
, the column of this data set specifying the time index by the argument time_name
, the column of this data set specifying the outcome by the argument outcome_name
and, if present and not defined as a censoring event, the column of this data set specifying a competing event by the argument compevent_name
. Sample syntax is given by:
gformula_survival(..., obs_data = obs_data, id = ’ID’, time_name = ’t0’, outcome_name = ’Y’, compevent_name = ’D’)
The argument compevent_name
should only be specified for survival outcomes and when the user chooses not to treat competing events as censoring events. Note that the input data set must be a data.table
object.
4.4 Specifying timevarying and baseline covariates
The timevarying covariates are defined by the vector argument covnames
. Baseline confounders (the timefixed components of ) are defined by the vector argument basecovs
. Sample syntax is given by:
gformula_survival(..., covnames = c(’Z1’, ’Z2’, ’Z3’), basecovs = c(’race’,’sex’)¯¯)
When the user chooses to allow outcome and/or covariate distributions to be certain discrete functions of time, an additional argument must be added to covnames
(see Section 4.7).
4.5 Generating covariate histories
Recall that, in step 1.a. of the estimation algorithm (Section 3), the user must specify a function of “history” when estimating the conditional distribution of each conditional on “history” for each . For survival outcomes, the user must specify a function of “history” when estimating the observed conditional hazards of the event of interest at each in step 1.b. (and of the competing event in step 1.c. if competing events are not treated as censoring events) conditional on “history” . For end of followup outcomes, the user must specify a function of “history” when estimating the mean of the outcome at conditional on “history” only.
The arguments histvars
and histories
must be used to specify any desired functions of history that will be used for estimation in step 1 (and then, in turn, in the simulation in step 2) that cannot be defined within a model statement in R . For example, consider a timevarying covariate named Z4
in the input data set and suppose the user assumes that the distributions of “future” covariates , depend on the history of (or ) through the cumulative average of this variable through each ; i.e. for and otherwise. Then the user must use histvars
and histories
to create in the data set obs_data
this additional timevarying covariate containing the cumulative average of at each . Alternatively, if the user made the assumption that the distributions of these “future” covariates/outcomes only depend on through the current value at or a transformation of the current value (i.e. square root, restricted cubic spline, quadratic or cubic function) which can be made within an R model statement, then no additional covariates need to be created as these transformations can be made within a model statement (see Section 4.6).
Desired functions of history that cannot be created within a model statement in R must be created using histvars
and histories
to ensure that estimation decisions in step 1 are carried forward correctly through the simulation in step 2. Precoded functions of history for a covariate named Zj
in the input data set include:

lagged
: this adds a variable to the specified input data set namedlagi_Zj
, containing the th lag ofZj
relative to the current time (i.e. its value at for ) for all with the desired number of lags.lagi_Zj
is set to 0 on lines with 
cumavg
: this adds a variable to the specified input data set namedcumavg_Zj
, which contains the cumulative average ofZj
up until the current time . It is set toZj
at . 
lagavg
: this adds a variable to the specified input data set namedlag_cumavgi_Zj
which contains the th lag of the cumulative average ofZj
relative to the current time , .lag_cumavgi_Zj
is set to 0 on lines with .
Note the desired number of lags is specified using the argument covmodels
(see Section 4.6).
The package will apply the function of history listed in the th element of the vector argument histories
to all variables listed in the th element of the list argument histvars
. Therefore, the length of histvars
and histories
need to match. Sample syntax that would add lagged, cumulative average and lagged cumulative average functions of the covariates Z1
, Z2
and Z3
as new variables to the input data set dataname
for use in estimation of conditional distributions/hazards/means (see Sections 4.64.8) is as follows:
gformula_survival(..., obs_data = dataname, covnames = c(’Z1’, ’Z2’, ’Z3’), basecovs = c(’race’, ’sex’)¯, histories = c(lagged, cumavg, lagavg), histvars = list(c(’Z1’, ’Z2’, ’Z3’), c(’Z1’, ’Z2’, ’Z3’), c(’Z1’, ’Z2’, ’Z3’)))
4.6 Specifying covariate distributions
The vector argument covtypes
is used to specify the distributions of each timevarying covariate conditional on a function of history , and , how parameters of those distributions will be estimated and, in turn, how covariate values in step 2 of the algorithm will be simulated. The gfoRmula package supplies a number of preprogrammed options for input to the argument covtypes
which are described below: ’binary’
, ’normal’
, ’categorical’
, ’bounded normal’
, ’zeroinflated normal’
, ’truncated normal’
, ’absorbing’
and ’categorical time’
. Each of the elements of covtypes
requires its own set of specific subparameters, which are contained within the list argument covparams
. Each subparameter vector within this list must be the same length as covnames
and covtypes
.
We now give a description of each of the preprogrammed options for input to covtypes
along with required and optional subparameters:

’binary’
: The mean of the covariate conditional on history (see Section 4.5) is estimated via the estimated coefficients of a generalized linear model (GLM) (usingglm
in R), where the family of the GLM isbinomial
. The covariate values are simulated at each time in step 2 of the estimation algorithm (Section 3) given the previously simulated history under intervention by sampling from a Bernoulli distribution with parameter this estimated conditional mean. Assuming the covariate in question is the th entry incovnames
, then the subparametercovmodels
, which is a vector of lengthcovnames
, must contain as its th entry a model statement for estimating the mean of the covariate in the th entry ofcovnames
as a function of history and possiblytime_name
. Optionally, the user can also specify the th entry ofcovlink
, which is a vector containing the link function(s) for the GLM specified in the
Comments
There are no comments yet.