Count time series is an active area of current research, with several recent review papers and books appearing on the topic (fokianos2012count, davis2016handbook, weiss2018introduction, davis2021count). Gaussian models, which are completely characterized by the series’ mean and autocovariance structure, may inadequately describe count series, especially when the counts are small. This paper uses a recent advance in count modeling in jia2021count to develop a very general count time series model with seasonal characteristics. Specifically, a transformation technique is used to convert a standardized seasonal correlated Gaussian process into a seasonal count time series. The modeling paradigm allows any marginal count distribution to be produced, has very general correlation structures that can be positive or negative, and can be fitted via likelihood methods. While our most basic setup produces strictly stationary count series (in a periodic sense), nonstationary extensions, particularly those involving covariates, are easily achieved.
With denoting the known period of the data, our objective is to model a time series in time that has a count marginal distribution and periodic properties with known period . A seasonal notation uses to denote the series during the th season of cycle . Here, is the seasonal index and . We assume that there are total observations, taken at the times . To avoid trite work with edge effects, we assume that is a whole number.
We seek to construct count series having the cumulative distribution functionfor each cycle — this stipulation imposes a periodic marginal distribution on the series. In fact, our constructed series will be strictly periodically stationary: for each and all times
, the joint distribution ofcoincides with that of . We use notations such as and interchangeably, the latter being preferred when seasonality is emphasized.
Some previous seasonal count time series models are now reviewed. The most widely used seasonal count time series models to date develop periodic versions of discrete integer-valued autoregressions (PINAR models) — see monteiro2010integer, santos2019periodic, and bentarzi2020some. For example, a first order PINAR series obeys the difference equation
Here, for each season and denotes the classical thinning operator: for an independent and identically distributed (IID) sequence of zero-one Bernoulli trials
and a count-valued random variablethat is independent of , . The noises
are periodic independent and identically distributed (IID) count-valued random variables having finite second moments.
The PINAR model class has drawbacks. Even in the stationary case, PINAR models cannot produce some marginal distributions. joe2016markov quantifies the issue in the stationary case, showing that only marginal distributions in the so-called discrete self-decomposable family can be achieved. Another issue is that PINAR models must have non-negative correlations. Negatively correlated count series do arise (kachour2009first, livsey2018multivariate, jia2021count). Likelihood inference for PINAR and INAR models can be challenging; moreover, adding covariates to the models is non-trivial. See joe2019likelihood for more in the stationary setting.
A different method for constructing seasonal count series uses a periodic renewal point processes as in fralix2012renewal and livsey2018multivariate. Here, a zero-one binary sequence is constructed to be periodically stationary and denote IID copies of . The periodic count series is constructed via the superposition
Here, is a periodic IID sequence of count valued random variables independent of the . For example, to obtain a correlated sequence with Poisson marginal distributions, is taken as independent Poisson in , with having the mean . Then it is easy to see that
is Poisson distributed with mean. fralix2012renewal, lund2016renewal, and livsey2018multivariate show how to produce the classical count marginal distributions (Poisson, binomial, and negative binomial) with this setup and consider processes constructed by clipping Gaussian processes.
While binary-based models typically have negative correlations whenever does, it can be difficult to achieve some marginal distributions. A prominent example of this is the often sought generalized Poisson marginal. Perhaps worse, likelihood methods of parameter inference appear intractable — current parameter inference methods use Gaussian pseudo-likelihoods, which only use the series’ mean and covariance structure. See davis2021count for additional detail.
Before proceeding, a clarification needs to be made. The models constructed here posit a particular count marginal distribution for the data a priori. This differs from dynamic linear modeling goals, where count models are often built from conditional specifications. For a time-homogeneous AR(1) example, a dynamic linear model might employ the state space setup , where , , and is zero mean Gaussian noise. Such a setup produces a conditional Poisson distribution, not a series with a Poisson marginal distribution. In fact, as asmussen_foss_2014 show, the marginal distribution in the above Poisson state space setup can be far from Poisson.
Additional work on periodic count series is contained in morina2011statistical, monteiro2015periodic, bentarzi2017periodic, aknouche2018periodic, santos2019theory, aknouche2018periodic, and ouzzani2019mixture. Most of these references take one of the above approaches. Motivated by jia2021count, this paper presents a different approach.
The rest of this paper proceeds as follows. The next section reviews periodic time series methods, focusing on periodic autoregressive moving-average (PARMA) and seasonal autoregressive moving-average (SARMA) difference equation structures. Section 3 clarifies our model and its properties. Section 4 narrates parameter estimation methods and Section 5 studies these techniques via simulation. Section 6 analyzes a twenty year segment of weekly rainy day counts in Seattle, Washington. Section 7 concludes with comments and remarks.
2 Periodic Time Series Background
This section briefly reviews periodic (seasonal) time series. Our future count construction uses a series , standardized to and , and having Gaussian marginal distributions. While the mean of is zero, periodic features in the autocorrelation function of , which we denote by , will become paramount.
We call a PARMA() series if it obeys the periodic ARMA() difference equation
Here,. The autoregressive order is and the moving-average order is , which are taken constant in the season for simplicity. The autoregressive and moving-average coefficients are and , respectively, during season . We tacitly assume that model parameters are identifiable from the covariance of the series. This may require more than the classical causality and invertibility conditions (reinsel2003elements). Gaussian PARMA solutions are strictly stationary with period as long as the autoregressive polynomial does not have a root on the complex unit circle — see lund1999modeling for quantification. Not all PARMA parameters are free due to the restriction ; the following example delves further into the matter.
Example 3.1 A PAR(1) series with period obeys the recursion
where is zero mean white noise with . The quantities and are assumed periodic in with period . This difference equation is known to have a unique (in mean squared) and causal solution whenever there is a stochastic contraction over an entire cycle: (lund1999modeling).
To impose , take a variance on both sides of (2) and set to infer that , which we tacitly assume is positive for all . This uses , which follows by causality. The covariance structure of can now be extracted as
for . .
Another class of periodic models in use today are the SARMA series. SARMA series are actually time-stationary models, but have comparatively large autocorrelations at lags that are multiples of the period . The most basic SARMA() series obeys a difference equation driven at lags that are multiples of the period :
where is zero mean independent noise with a constant variance. In this setup, unless is a whole multiple of the period . As such, many authors allow to have additional correlation, specifically a zero mean ARMA() series. This results in a model that can have non-zero correlations at any lag; however, the model is still stationary and does not have any true periodic features. Since the model is stationary, we write .
Example 3.2 A SAR(1) series with period and AR(1) obeys the difference equation pair
where is zero mean white noise with variance , , and . Combining these two difference equations results in a stationary and causal AR() model for .
Imposing and taking a variance in the first equation in (4) gives
To proceed, use equation (4)’s causal solutions and to get
for any . Combining the last two equations, we see that taking
To extract the covariance structure of , multiply both sides of (4) by and take expectations to get the Yule-Walker type equations
This system can be rewritten in vector form as
where and . .
PARMA and SARMA methods are compared in detail in lund2011choosing. PARMA models are usually more applicable since the immediate past of the process is typically more influential than past process lags at multiples of the period . Applications in the environment (vecchia1985periodic, bloomfield1994periodic, lund1995climatological, tesfaye2004identification) tend to be PARMA; SARMA structures are useful in economics (franses1994multivariate, franses2004periodic, hurd2007periodically). PARMA reviews are gardner1975characterization, lund1999modeling, and gardner2006cyclostationarity; statistical inference for PARMA models is studied in lund2000recursive, basawa2001large, basawa2004first, lund2006parsimonious, shao2004least, and shao2006mixture. SARMA inference is addressed in chatfield1973box.
Our methods extend the work in jia2021count with Gaussian transformations (copulas) to the periodic setting. Let denote the time series to be constructed, which takes values in the count support set . Our construction works with a latent Gaussian series with zero mean and a unit variance at all times. Then is obtained from via
is the cumulative distribution function (CDF) of the standard normal distribution andis the desired marginal distribution for during season . Here,
is the quantile function
As jia2021count shows, this construction leaves with the marginal distribution for every and . This model is very flexible: any marginal distribution can be achieved for any desired season , even continuous ones. The marginal distribution can have the same form or be different for distinct seasons . Any marginal distribution whatsoever can be achieved; when count distributions are desired, the quantile definition in (8) is the version of the inverse CDF that produces the desired marginals.
3.1 Properties of the Model
Toward ARMA and PARMA model order identification, if is an -dependent series, then and are independent when since is Gaussian. By (7), and are also independent and is also -dependent. From the characterization of stationary moving averages (Proposition 3.2.1 in Brockwell_Davis_1991) and periodic moving-averages in Shao_Lund_2004, we see that if is a periodic moving average of order , then is also a periodic moving average of order . Unfortunately, analogous results for autoregressions do not hold. For example, if is a periodic first order autoregression, may not be a periodic autoregression of any order (jia2021count).
We now derive the covariance structure of via Hermite expansions. Let be the covariance of at times and , where . Then can be related to via Hermite expansions. To do this, let and write the Hermite expansion of as
Here, is the th Hermite coefficient for season , whose calculation is described below, and is the th Hermite polynomial defined by
The first three Hermite polynomials are , , and . Higher order polynomials can be found via the recursion , which follows from (10).
The polynomials and are orthogonal with respective to the standard Gaussian measure if : for a standard normal unless (in which case ). The Hermite coefficients are computed from
where is the standard normal density.
Lemma 2.1 in han2016correlation shows that
where denotes the season corresponding to time . Let denote the variance of . Then the ACF of is
which is a power series in with th coefficient
jia2021count call a link function and a link coefficient. When is stationary and does not depend on , they show that , , and . It is not true that in any case nor is in the periodic case; indeed, stationary or periodically stationary count processes with arbitrarily positive or negative correlations may not exist. For example, the pair , where is standard normal has correlation -1, but two Poisson random variables, both having mean , whose correlation is -1, do not exist.
The model produces the most flexible correlation structures possible in a pairwise sense. Specifically, consider two distinct seasons and and suppose that and are the corresponding marginal distributions for these seasons. Then Theorems 2.1 and 2.5 in whitt1976bivariate show that the bivariate random pair having the marginal distributions and , respectively, with the largest correlation has form and , where is a uniform[0,1] random variable. To achieve the largest correlation, one simply takes to have unit correlation at these times; that is, take . Since is uniform[0,1], the claim follows. For negative correlations, the same results in whitt1976bivariate also show that the most negatively correlated pair that can be produced has the form and . This is produced with a Gaussian series having , which is obtained by selecting . Then is again uniform[0,1] and , since for all real .
The previous paragraph implies that one cannot construct more general autocorrelation functions for count series than what has been constructed above — they do not exist. Negatively correlated count series do arise (kachour2009first, livsey2018multivariate, jia2021count) and can be described with this model class. In the stationary case where the marginal distribution is constant over all seasons , a series with for all is achieved by taking , where is standard normal. This unit correlation property will not carry over to our periodic setting. For example, a random pair having a Poisson marginal with mean during season and a Poisson marginal with mean during season with a unit correlation do not exist when . This said, the model can produce any correlation structures within “the range of achievable correlations". As such, the model class here is quite flexible.
The amount of autocorrelation that inherits from is now discussed. An implication of the result below, which establishes monotonicity of the link function by showing that its derivative is positive, is that the larger the autocorrelations are in , the larger the autocorrelations are in . We state the result below and prove it in the Appendix.
3.2 Calculation and Properties of the Hermite Coefficients
An important numerical task entails calculating , which only depends on by (11). To do this, rewrite in the form
where the convention is made. We also take and . Then for , integration by parts yields
Simplifying this telescoping sum gives
Notice that the summands in (18) are zero whenever . Lemma 2.1 in jia2021count shows that the expansion converges whenever for some . This condition automatically holds for time series, which implicitly require a finite second moment. For count distributions with a finite support, becomes unity for large . For example, a Binomial marginal distribution with 7 trials is considered in our later application. Here, the summation can be reduced to . For count distributions on a countably infinite support, approximating (18) requires truncation of an infinite series. This is usually not an issue: numerically, quickly converges to unity as for light tailed distributions — or equivalently, . In addition to (18), can also be approximated by Gaussian quadrature; see gauss.quad.prob in the package statmod in R. However, the approximation in (18) is more appealing in terms of simplicity and stability (jia2021count).
4 Parameter Inference
This section develops likelihood methods of inference for the model parameters via particle filtering and sequential Monte Carlo techniques. With many count time series model classes, likelihoods are intractable (davis2021count). Accordingly, researchers have defaulted to moment and composite likelihood techniques. However, if the count distributional structure truly matters, likelihood methods should “feel" this structure and return superior parameter estimates. Gaussian pseudo-likelihood estimates, which are based only on the mean and autocovariance of the series, are developed in jia2021count in the stationary case. jia2021count presents an example where Gaussian pseudo-likelihood estimates perform well and an example where they perform poorly.
For notation, let contain all parameters in the marginal distributions and contain all parameters governing the evolution of . The data denote our realization of the series.
The likelihood function is simply a high dimensional multivariate normal probability. To see this, use (7) to get
for some numbers and (these are clarified below but are not important here). The covariance matrix of only depends on , not on . Unfortunately, evaluating a high dimensional multivariate normal probability is numerically challenging for large . While methods to handle this problem exist (kazianka2010copula, kazianka2013approximate, bai2014efficient), they often contain substantial estimation bias.
An alternative approach, which is the one taken here, uses simulation methods to approximate the likelihood. General methods in this category include the quasi-Monte Carlo methods of genz2002comparison and the prominent Geweke–Hajivassiliou–Keane (GHK) simulator of geweke1991efficient and hajivassiliou1996simulation. The performance of these methods, along with an additional “data cloning" approach, are compared in han2020maximum, where the author shows that estimators from these methods are similar, but that the GHK methods are much faster, having a numerical complexity as low as order . Here, is the pre-selected number of sample paths to be simulated (the number of particles). As we will subsequently see, the GHK simulator works quite well for large .
jia2021count propose a sequential importance sampling method that uses a modified GHK simulator. In essence, importance sampling is used to evaluate integrals by drawing samples from an alternative distribution and averaging their corresponding weights. Suppose that we seek to estimate . Then
where is called the weight and the proposed distribution is called the importance distribution. Without loss of generality, we assume that whenever and that for
. Then the importance sampling estimate of the integral is the law of large numbers justified average
where are IID samples drawn from the proposed distribution . With the notation , notice that the likelihood in (19) has form
Observe that and only depend on and the data . Specifically,
where is defined in (16) and is the season at time . Here, it is best to choose a proposed distribution such that 1) for and otherwise; 2) the weight is easy to compute; and 3) can be efficiently drawn from . Our GHK simulator satisfies all three conditions.
To develop our GHK sampler further, we take advantage of the latent Gaussian structure in the PARMA or SARMA series . In simple cases,
may even be a Markov chain. The GHK algorithm samples, depending on the its previous history and , from a truncated normal density. Specifically, let denote the truncated normal density of given the history and . Then
are the one-step-ahead mean and standard deviation ofconditioned on . Again, and only depend on . Here, we choose the importance sampling distribution
Elaborating further, let denote a normal random variable with mean and variance that is known to lie in , where . Then is first drawn from . Thereafter, are sequentially sampled from the distribution in (21). The proposed importance sampling distribution is efficient to sample, has the desired distributional support, and induces an explicit expression for the weights:
Using (21) gives
Define the initial weight . We then recursively update the weights via
at time during the sequential sampling procedure. At the end of the sampling, we obtain
In the classic GHK simulator, and are obtained from a Cholesky decomposition of the covariance matrix of . Here, they are based on the PARMA or SARMA model for .
The full sequential importance sampling procedure is summarized below.
Initialize the process by sampling from the distribution. Define the weight by
Now iterate steps 2 and 3 over . Conditioned on , generate
For example, in the PAR(1) model, for , with the startup condition ; for with the startup .
Define the weight via
The above generates a fair draw of a single “particle path" with the property that the series generated from yields the observations . Repeating this process independent times gives simulated process trajectories. Let be these trajectories and denote their corresponding weights at time by .
The importance sampling estimate is given by
A large provides more accurate estimation.
The popular “L-BGSF-B" gradient step and search method is used to optimize the estimated likelihood