1 Introduction
Autoregressive models and their extensions are powerful tools for the analysis of timeseries data. These approaches have a rich history in forecasting (de200625; granger2014forecasting), econometrics (lutkepohl2004applied; hamilton2020time), and statistics (box2015time). Despite the widespread use of these methods, autoregressive models remain underutilized when estimating the effect of interventions, in part due to the difficulty interpreting the model parameters. The parameters from a standard autoregressive model are difficult to interpret because these models control for the lagged outcome, a variable which is endogenous to the intervention of interest. Because of this endogeneity, the estimated coefficient relating the intervention to the outcome represents only the fraction of the overall policy effect that is not mediated through the autoregressive effect (pearl2009causality). If the regression coefficient within an autoregressive model were to be interpreted as the intervention’s effect, it would be biased relative to the true effect (nickell1981biases; achen2000lagged).
For linear and additive autoregressive models, there is a literature on estimating the relationship of one time series to another. These approaches typically involve reparameterizing the autoregressive model to ensure that the model assumptions are met and that the model parameters represent the effect of interest, or to use vector autoregressive models
(engle1987co; johansen1991estimation; keele2004not; lutkepohl2004applied). A prominent example of using reparameterization is cointegration and error correction in the autoregressive distributed lag model (pesaran1998autoregressive; dufour1998exact; hassler2006autoregressive; philips2018have). While these approaches and concepts are well developed for linear and additive autoregressive models, guidance for nonlinear or nonadditive models is unclear. This is a particular challenge when studying incident data, such as counts of crime, mortality or disease, where linear models often make untenable assumptions, and models with nonlinear link functions and nonnormally distributed error are more appropriate and more commonly used.
Methods for modeling autoregressive incidence data vary and include approaches that treat the counts as continuous and use traditional autoregressive models, approaches that include the lagged count into the mean function (fokianos2009poisson), approaches that model the counts as mixtures of Poisson processes (mckenzie1986autoregressive), and statespace approaches that postulate a measurement model and a transition equation (brandt2000dynamic). With such varied approaches, there is little clarity on how best to model autoregressive count data and much less on how to do so in a way that ensures model coefficients are unbiased and can be interpreted as effect estimates.
Due to this complexity, methodological guidance for estimating intervention effects in timeseries data has often been to avoid autoregressive models, and use other approaches such as differenceindifferences methods (shang2018interaction; kondo2015difference; angrist2008mostly) and synthetic control methods (abadie2010synthetic; ben2021synthetic; arkhangelsky2019synthetic)
. However, a recent simulation study highlighted that many of these alternate approaches can suffer inflated TypeI error rates, bias, and extremely low power when estimating statelevel policy effects on firearm mortality data
(schell2018evaluating). The authors identified a negative binomial autoregressive model with the statelevel policy effects coded as first differences as being the optimal choice based on its empirical performance relative to the other methods considered. However, the recommended approach still showed a small amount of bias, was not clearly generalizable to other outcomes, and does not have a simple extension beyond the first order autoregressive model. Subsequent work by schell2020 improved on this approach by using numerical integration to estimate marginal effects that are interpretable, but this procedure is computationally burdensome and difficult for nontechnical audiences to understand. Moreover, it is unclear how higherorder autoregressive lags could be added to this model.In this paper, we explore the use of a modification of autoregressive models for count data originally proposed in zeger1988markov for Poisson models, and extended to generalized autoregressive moving average models by benjamin2003generalized. We show that a simple approximation facilitates direct interpretation of the model parameters and allows for any order autoregressive coefficient, overcoming a major limitation of using autoregressive models when analyzing the effects of state or local policies on crime data. We perform an empirical simulation study to verify the properties of the approach using 36 years of statelevel firearm mortality data from the United States. The simulation was designed to evaluate these methods in a complex nonlinear context: we use a negative binomial model, estimate effects that change over time, and investigate the approach using both first and secondorder autoregressive models. Finally, we present a case study demonstrating that the model recovers substantively equivalent estimates for one state gun law as found in schell2020.
2 Methods
2.1 Linear Autoregressive Models
We begin by defining effects of interest as the effects of a policy at specific times after implementation. While we focus our presentation on binary policy indicators, the effects and models discussed in this paper are appropriate for other variables of interest, e.g., continuous variables. Let be an outcome for observation at time , with and . Let denote an indicator of a policy of interest being active for observation at time . Denote the vector of for time to as . We begin by defining an effect of interest at time given :
where is a sequence of interest of . For example, could denote the implementation of a gun law at such that . In this case, would represent a comparison of the expected outcome at time with the policy implemented at time to the expected outcome at time had the policy not been implemented up through time .
Note that the effects defined here are a deviation from the typical shortrun or longrun interpretation of autoregressive model coefficients. Instead, these effects are defined at specific time points given a hypothetical implementation of a policy, which provides actionable information about the effect of a policy in the short and midrun. These effects and the results of this paper can be generalized to allow for a comparison of two arbitrary policy sequences by comparing to for two sequences of interest of , and .
As an illustrative example, consider the linear autoregressive model of order , such that:
where are independent with mean 0. Under this model, can be shown to be:
This highlights the main limitation of the linear autoregressive model for estimating the effect of a policy – the effect is a function of the autoregressive coefficient, the regression coefficient on the policy, and the policy sequence of interest. We consider a special case of benjamin2003generalized that avoids these limitations and has been discussed elsewhere (zeger1988markov), which we will refer to as the debiased autoregessive model (DAM):
(1) 
Figure 1 provides a graphical representation of this model. Note that the indirect path from to through is cancelled out by the direct path from to . This removes the accumulation of the policy effect over time through these indirect paths, leads to a model coefficient that is directly interpretable, and motivates the terminology “debiased” autoregressive model. Specifically, under the DAM specification it can be shown that
Here, we point out that the effect of interest is defined only by the regression coefficient and the value of the policy at time . In this way, the model coefficients provide a directly interpretable effect as the total effect of policy at time .
The DAM can be generalized to allow for higher order autoregressive terms, an arbitrary function of the policy vector, and inclusion of covariates. Let be the history of the policy up to time , and let be similarly defined covariate histories up to time . Let , , and be arbitrary scalar functions, and let . The DAM of autoregressive order , DAM(), is given by:
(2)  
Under this specification, the effect of interest can be shown to be . The effect of interest is directly defined by the functions specified in the regression model and is free of all autoregressive terms. We allow the functions , , and to be different at each time , but in practice, most specifications will parameterize the functions to pool information across time points. One example is a distributed lag formulation of order for , i.e., .
We partition the covariates into two sets, and , to allow for additional flexibility in the specification of the model. The inclusion of is flexibility beyond that allowed in benjamin2003generalized. If is empty, then the functions of the covariates specified in are interpretable in a manner similar to . If is empty, then the functions of the covariates specified in have properties similar to those of standard autoregressive models, e.g., the effect of the covariate is allowed to accumulate over time.
The proposed model in (2) includes some of the most common models for effect estimation in longitudinal data as special cases: firstdifferences models and twoway fixed effect models. In particular, a firstdifferences model is a special case of (2) when the autoregressive coefficient is set to 1. By allowing for the estimation of the autoregressive coefficient, (2) is appropriate in a wider range of settings where fixing the autoregressive coefficient to unity results in a violation of the model assumptions and poor effect estimates.
Similarly, the standard twoway fixed effect model falls out as a special case whenever time and unit fixed effects are added to (2). If the assumptions of the standard twoway fixed effect are met, i.e., errors are independent conditionally on the fixed effects, the autoregressive term will be zero. Thus, when the standard twoway fixed effects model assumptions are met in the sample, (2) with time and unit fixed effects reduces to the twoway fixed effect model. However, allowing the model to estimate the autoregressive term should allow the model to be more appropriate in a wider range of settings than the standard twoway fixed effects model.
2.2 Extensions to Negative Binomial Models
To facilitate the use of a negative binomial regression model with a log link function, the previously defined effect is modified to be a rate ratio instead of rate difference. That is,
(3) 
Motivated by the previous results, we describe two specifications of negative binomial models with similar form for the linear predictor of the expectation from (1) and (2). First, an autoregressive negative binomial of order is given by:
(4) 
Under this model, a simple closed form for the effects of interest in (3) is unlikely to exist except in special cases. Numerical integration approaches can be used to estimate the expectations in (3); see schell2020 for an example application. Second, a negative binomial DAM of order , NBDAM(), is given by:
(5)  
A technical discussion and the properties of this model when is empty can be found in benjamin2003generalized. Unlike the DAM(), the NBDAM() does not have a closed form for the effect of interest. To avoid computation problems associated with numerical integration and to facilitate both interpretation and inference, we provide an approximation to the effect of interest as:
(6) 
This approximation can be derived by replacing with its corresponding inside of (5). Such an approximation is not exact due to the nonlinearities in the model, but we will explore its properties using a simulation study in Section 3.2.
The structure of the given in (5) provides great flexibility in the specification of the policy effect on outcomes. As previously noted, one specification that maintains flexibility is a distributed lag structure, i.e., . However, it may be of interest to constrain the structure in a parsimonious fashion to further improve interpretability and boost power. With this in mind, we propose a simple specification of that allow for the effect of a policy to be a combination of an instantaneous effect and an effect that phases in linearly (on the log rate) over a prespecified time period. For illustration, we will assume that the policy becomes active at the beginning of the period covered by such that the linear phasein effect accumulates during the period represented by , and we assume the period of the linear phase in is . Then,
(7) 
Under this specification, the approximation to the total effect of a policy at year is given by:
The Appendix provides a more nuanced discussion of these effects, including an extension that account for the fact that a law may be in effect for a fraction of the time period associated with observations at time (e.g., laws are enacted at various time points throughout the year).
Bayesian estimation of this model is straightforward and allows for the specification of priors:
where , , , and
are hyperparmeters controlling the level of information in the prior distributions. The choice of a uniform distribution from 0 to 1 for the autoregressive coefficients rules out values of these coefficients that would lead to models that are outside of our expectations, e.g., a value of an autoregressive coefficient that is less than 0 or greater than 1. Implications of autoregressive coefficients outside of this range can be found in
benjamin2003generalized. The choice of hyperparameters for the
’s can be set such that the prior distributions are noninformative, or they can be chosen to place the prior distribution in a region based on expert knowledge.3 Demonstration of model performance
We demonstrate the performance of the NBDAM in two ways. First, we simulate the effects of hypothetical laws, comparing law effect estimates recovered using the proposed model to estimates from other candidate models. Next, using genuine state law data, we demonstrate through a case study that the NBDAM recovers estimates of the effects of child access prevention laws (CAP laws) on total firearm deaths that are substantively equivalent to using (4) with numerical integration (schell2020).
3.1 Data
Our simulation and our model of CAP law effects use longitudinal data from 50 states for the years 1980–2016. The primary outcome for these estimates are statelevel counts of annual firearm deaths from the Vital Statistics System, which contains information on coroners’ cause of death determinations for over 99% of all deaths in the U.S (nchs_vital).
Following procedures described in schell2020
, our model of the effects of state laws on firearms deaths controls for 34 statelevel timevarying demographic, economic, crime, law enforcement, and gun ownership characteristics. These include characteristics known to be associated with firearm deaths or which are are commonly used when analyzing statelevel differences in health or crime. These variables are all taken from federal government sources and constitute descriptive statistics for each state for each year in the studied period. Because many of these covariates are collinear, the model includes only the first 17 principal components extracted from the larger set of 34 covariates, which together accounted for 95% of the variance in the full matrix of covariates. The model also includes year fixed effects.
Additional information about each variable and our dimensionreduction techniques can be found in schell2018evaluating.
3.2 Simulation Study
This simulation study focuses on variations of negative binomial autoregressive models based on the findings of schell2018evaluating, who recommended these models over other common approaches such as twoway fixed effects models. The synthetic control method for staggered adoption (ben2021synthetic) was also considered, but using the software available at the time of writing, the approach had large variance and always estimated a null effect. We compare four alternative approaches to estimating the effect of laws, each a negative binomial model with lagged dependent variables:
In each model, the estimated effect is extracted using the model coefficient directly; that is, no numerical integration approaches are used. The change coded model was included as it was the preferred method of schell2018evaluating. The 17 principle components were included as linear effects in each model, and is left empty in the NBDAM models.
We consider a hypothetical policy that is implemented in 15 states on randomly selected years. The data generating scheme starts with the observed timeseries in each state, randomly draw 15 states that implement a hypothetical policy, randomly assign dates in which the policy takes effect, and adds or subtracts firearm deaths from the observed timeseries after the laws “implementation” based on the effect of the policy we have assigned to it. The purpose of this simulation is to verify that the proposed model in (5) and the approximation described in (6) provide approximately unbiased inference.
For a hypothetical policy that has an instantaneous risk ratio of and an effect with risk ratio that phases in over 5 years, the steps of the empirical simulation study are as follows:

Randomly select 15 states that implement the hypothetical policy.

Random assign implementation dates for each of the 15 randomly selected states between January 1981 and December 2007.

The total effect of the hypothetical policy during the first 5 years after implementation is given by (7) with . The total effect of the policy is assumed to be constant after 5 years, .

In each of the 15 states, multiply the observed number of firearm deaths by the total effect for each year after the implementation of the hypothetical policy.

Analyze these data using the methods under consideration.

Repeat 1,000 times.
This procedure is repeated for all combinations of
Table 1 shows the simulation results for a hypothetical policy that has no effect (). All models are approximately unbiased. The effect coded model has the lowest MSE, but suffers from inflated Type I error. Both the NBDAM(1) and NBDAM(2) have lower MSE than the change coded model, and both have slightly higher Type I error rates than the nominal 5%. The NBDAM(2) has lower MSE than the NBDAM(1).
Table 2 shows the simulation results for a hypothetical policy that has both an instantaneous and phasein effect (). The effect coded model is strongly biased towards the null and has low power, while the change coded model is has a small bias away from the null. Both the NBDAM(1) and NBDAM(2) are approximately unbiased. The NBDAM(2) has higher power than its NBDAM(1) counterpart (0.910 versus 0.807, respectively), and the lowest mean squared error of all approaches considered.
Figure 2 illustrates each model’s bias at different values of law effect parameters. Panel (a) show the bias 5years after the law’s implementation when varying the true instantaneous effect when there is no phase in effect (). Both NBDAMs are approximately unbiased, while the effect coded and change coded models are both biased. The effect coded model is biased towards the null, estimating a nearly null effect regardless of the true value. The change coded model is biased away from the null, with an increasing bias as the magnitude of the true effect increases.
Panel (b) show the bias 5years after the law’s implementation when varying the true phasein effect and imposing no instantaneous effect (). Both NBDAM models are approximately unbiased, while the effect coded and change coded models are both biased. As in Panel (a), the effect coded model is biased towards the null, estimating a nearly null effect regardless of the true value. The change coded model is slightly biased away from the null, with an increasing bias as the magnitude of the true effect increases. The biases shown in Figure 2 (a) and (b) for the effect coded and change coded models are expected based on the strength of the autocorrelation (the unadjusted autocorrelation is estiamted at ). The more correlated the outcome is over time, the more bias towards the null the effect coded model will have, while the change coded model will become increasingly unbiased for stronger autocorrelation.
Model  True RR  Bias  SD  MSE  Type I Error 

Effect coded  1.00  0.0000  0.0072  0.0001  0.113 
Change coded  1.00  0.0024  0.0433  0.0019  0.030 
NBDAM(1)  1.00  0.0009  0.0361  0.0013  0.070 
NBDAM(2)  1.00  0.0012  0.0303  0.0009  0.062 
Model  True RR  Bias  SD  MSE  Power 

Effect coded  0.9025  0.0867  0.0072  0.0076  0.416 
Change coded  0.9025  0.0215  0.0378  0.0019  0.769 
NBDAM(1)  0.9025  0.0046  0.0329  0.0011  0.807 
NBDAM(2)  0.9025  0.0034  0.0277  0.0008  0.910 
4 Case Study: The Impact of Child Access Prevention Laws
Using the same data described above for the simulation study, we estimate the effect of CAP laws on total firearm deaths. This is a simplification of the analysis described in schell2020 which used these same data and estimated the effects of CAP and two other gun laws simultaneously. We code states as having CAP laws if they impose civil or criminal penalties for storing a handgun in a manner that allows access by a minor according to the RAND State Firearm Law Database (cherney2018development). We consider two models. First, we estimate the negative binomial model of schell2020 that included four versions of the policy effect variables: the instant effect and five year phasein effect described in (7), as well as their firstdifferences. Marginal effects were extracted from these models using numerical integration (schell2020). Second, we considered the NBDAM(2) with defined by (7) with . The risk ratio is estimated using the simple approximation in (6). Both models were fit using Stan in R (rstan) with 10,000 MCMC iterations across four chains.
Figure 4
plots the posterior median and 95% credible intervals of the risk ratio measuring the impact of CAP laws on total firearm deaths. The posterior medians are represented by the solid lines, and the 95% credible intervals are represented by the dashed lines. Both the posterior medians and the 95% credible intervals are qualitatively similar, and the differences between the results from the two models is not substantively meaningful. However, the 95% credible intervals for the risk ratio after 5 years from implementation for the NBDAM(2) does not contain 1, 0.930 (0.868–0.997), while the interval for the model of
schell2020 does, 0.937 (0.871–1.008).5 Discussion
This paper evaluates a class of autoregressive models for estimating effects in time series data that allow for direct interpretation of model parameters. The proposed approach is a novel application of these autoregressive models, rather than a postestimation adjustment to the standard autoregressive model model. The parameters from a standard autoregressive model are not interpretable without additional computation because the model conditions on the lagged outcome, which may be endogenous to the variable of interest (pearl2009causality). Fortunately, for linear autoregressive models, the magnitude of this endogeneity can be computed at each time point as a function of model coefficients. The debiased autoregressive models of this paper effectively subtract the endogenous component from the models at each time point during estimation.
The closed form results presented in this paper for linear autoregressive models are not generally available for other, nonlinear models. We propose a simple approximation that facilitates interpretation, and our empirical simulation study supports the use of such approximations when analyzing the effect of gun policies on total firearm deaths. We believe that total firearm deaths represents the expected behavior of many other outcomes when analyzing the effects of gun policy, but it is possible that the approximation is not suitable for outcomes in other domains. In light of that, we recommend that researchers who are concerned whether the proposed approach is appropriate for inference in their specific analysis conduct a simulation study within their dataset to verify the results of this study extend to their proposed analysis. Further research is required to better understand the limits of this approach for other applications.
.1 Defining the Instant and Linear Phasein Effects
The outcomes discussed in this paper are measured as aggregations of events during specific windows in time, e.g., annual firearm deaths, and the statelevel policies of interest may not align exactly with these measurement periods. As such, the effects of the policies may only be active during part of the year. To overcome this challenge, we parameterize our effects of interest using a continuous time scale, and then average these effects over the period of observation (on the log scale). Without loss of generality, assume that a policy becomes active at time , where time is scaled to represent years. Assuming an instantaneous risk ratio of the policy in continuous time of size , the average on the log risk ratio during the period between and is given by the fraction of year in which the policy was active, , and 1 for all time periods after .
For an effect that is assumed to phasein linearly on the log scale, the form is only slightly more complex. Again, assume that a policy become active at time , and assume a log risk ratio that phasesin linearly over a year period, i.e., . The effect at any time point is then given by:
Averaging the log risk ratio between two subsequent years provides a closed form:
.2 Additional Simulation Results
Figure 5 provides the bias of different models for estimating the total effect of a policy at various time points after implementation. Of note, the NBDAMs are approximately unbiased for all effects across all years, while the effect coded and changed coded models are not.
Figure 6 provides the MSE of different models for estimating the total effect of a policy at various time points after implementation. The effect coded model has the lowest MSE when there is no true effect, but has worse performance as the size of the true effect increases. This occurs because the effect coded model is biased towards the null. The remaining models have similar patterns of MSE, with the change coded model tending to have lower MSE for estimating the instantaneous effect (those at Year 0), and the NBDAMs having lower MSE for the total effect at 5 years after implementation.
Comments
There are no comments yet.