1 Introduction
This paper develops the theory and methods for copula modeling of stationary discretevalued time series. Since the majority of discrete cases involve modeling integer counts supported on some subset of , we hereafter isolate on nonnegative integer count structures. Our methods are based on a copula transformation of a latent Gaussian stationary series and are able to produce any desired count marginal distribution. It is shown that the model class produces the most flexible pairwise correlation structures possible, including negatively dependent series. Model parameters are estimated via two methods: 1) a Gaussian pseudo likelihood approach, developed with some new Hermite expansion techniques, that only employs the mean and the autocovariance of the series, and 2) a particle filtering approach that adapts hidden Markov model (HMM) techniques to general (not necessarily Markov) latent Gaussian stationary series to approximate the true likelihood. Extensions to nonstationary settings, particularly those where covariates arise, are also discussed.
The theory of stationary Gaussian time series is by now well developed. A central result is that one can have a stationary Gaussian series , with lag autocovariance , if and only if is symmetric ( for ) and nonnegative definite, viz.
for every , and real numbers (see Theorem 1.5.1 in [6]). Such a result does not hold for count series. For example, existence of a stationary series with a Poisson marginal distribution is not guaranteed when is merely symmetric and nonnegative definite. In fact, in the Poisson case, such a result is false: is symmetric and nonnegative definite, but it it is impossible to have and jointly distributed with the same Poisson marginal distribution and a correlation of (the reader is challenged to verify this nonexistence). In principle, distributional existence issues are checked with Kolmogorov’s consistency criterion (see Theorem 1.2.1 in [6]); in practice, one needs a specified joint distribution to check for consistency. Phrased another way, Kolmogorov’s consistency criterion is not a constructive result and does not illuminate how to build time series having the desired marginal distributions and correlation structures. Owing to this, count time series have been constructed from a variety of methods over the years. We now discuss past approaches to the stationary correlated count problem; a recent overview of stationary methods is contained in [34]. While extensions to nonstationary cases with covariates are considered later, the fundamental problem lies with constructing models for stationary count series.
Borrowing from the success of autoregressive movingaverage (ARMA) models in describing stationary Gaussian series, early count authors constructed correlated count series from discrete autoregressive movingaverage (DARMA) and integer autoregressive movingaverage (INARMA) difference equation methods. Focusing on the first order autoregressive case for simplicity, a DAR(1) series with marginal distribution is obtained by generating from and then recursively setting
where is generated as independent and identically distributed (IID) with marginal distribution and are generated IID independent Bernoulli trials, independent of , with . Induction shows that has marginal distribution for every . INAR(1) series are built with a thinning operator and parameter defined by
for any count valued random variable
, whereis again a collection of IID Bernoulli trials with success probability
. The INAR(1) difference equation iswhere is an IID countvalued random sequence.
DARMA methods were initially explored in [25, 26], but were subsequently discarded by practitioners because their sample paths can remain constant for long periods in highly correlated cases (observe that for DAR(1) series); INARMA series are still used today. Both INAR(1) and DAR(1) models have a correlation function of form , which cannot be negative since . While one can add higher order autoregressive and even movingaverage terms to the DAR(1) and INAR(1) setups (see [37, 38, 39]), all correlations remain nonnegative. In contrast to the Gaussian ARMA brethren, DARMA and INARMA models do not span the entire range of possible correlation structures for stationary count series. Extensions of DARMA and INARMA methods are considered in [28], but again, none can produce series with negative correlations.
Blight [5] and [9] take a different approach, constructing the desired count marginal distribution by combining IID copies of a correlated Bernoulli series in various ways. By using a binary constructed from a stationary renewal sequence [5, 9, 27], a variety of marginal distributions, including binomial, Poisson, and negative binomial, were produced; moreover, these models can have negative correlations. While these models do not necessarily produce the most negatively correlated count structures possible, they often come close to achieving this bound. The work in [33] derives explicit autocovariance functions when is made by binning a stationary Gaussian sequence into zeroone categories and gives an example of a hurricane count data set where negative correlations arise. That said, some important count marginal distributions, including generalized Poisson, are not easily built from this suite of methods. The results here easily generate any desired count marginal structure.
Other count model structures studied include Gaussian based processes rounded to their nearest integer [29], hierarchical Bayesian count model approaches [3], generalized ARMA methods [4], and others. Each approach has some drawbacks. For example, rounding Gaussian processes to their nearest integer makes it difficult to produce a specific prescribed marginal distribution. Many hierarchical Bayesian procedures also exist for count series. These methods typically posit conditional distributions in lieu of marginal distributions. For example, in a Poisson regression, the Poisson marginal stipulation is being imposed on the marginal distribution of the data. This is not the same as positing a conditional Poisson setup where one takes given some random to have a Poisson distribution with mean . Indeed, as [2] shows, once the randomness of is taken into account, the true marginal distribution can be far from Poisson.
As noted above, a very flexible class of stationary count time series models is constructed here, employing a latent Gaussian process and copula transformations. These techniques have recently shown promise in spatial statistics [14, 22], multivariate modeling [42, 43], and regression [35], but the theory has yet to be developed for count time series ([35, 32] provide some partial results). Our objectives here are severalfold. On a methodological level, it is shown, through some newly derived expansions based on Hermite polynomials, that accurate and efficient numerical quantification of the correlation structure of the copula count model class is feasible. Based on a result of [46], the class produces the most flexible pairwise correlation structures possible — a property quantified in Remark 2.2. Connections of the copula model are also revisited to both importance sampling schemes, where the popular GHK sampler is adapted to the time series context, and to the HMM literature, which allows natural extensions of the GHK sampler and computation of quantities other than the likelihood. All methodological contributions are tested on both synthetic and real data. Other prominent count time series works include [17, 12, 11, 18, 16].
The works [35, 32] are perhaps the closest papers to this study. While the general latent Gaussian construct adopted is the same, our work differs in that explicit autocovariance expansions are developed via Hermite expansions, flexibility and optimality issues of the model class are addressed, simple Gaussian pseudolikelihood estimation of model parameters is developed, and both the importance sampling and HMM connections are explored in greater depth. More detail on the connections to [35, 32] can be found in the main body of the paper. The works [22, 24] are also closely related and will be referenced below with commentary, but their focus is on spatial count modeling.
The rest of this paper proceeds as follows. The next section introduces our latent Gaussian count model and establishes its basic mathematical and statistical properties. Section 3 moves to estimation, developing two techniques. The first method is a Gaussian pseudolikelihood approach that involves only the mean and covariance of the series and can be numerically optimized to rapidly obtain model parameter estimates. The second method uses particle filtering techniques to construct an approximation of the true likelihood of the series. Here, connections to HMM and importance sampling methodologies are made. Section 4 presents simulation results, showing some standard cases where the simpler Gaussian pseudolikelihood approach performs reasonably well. The section also shows a case where particle filtering likelihood estimates, which feel the entire joint count distributional structure, are superior. Section 5 analyzes a count series containing the annual number of nohitter Major League Baseball games since 1893. Here, two covariates are considered: pitching mound height and the number of games played in each season. Section 6 closes the paper with some comments and suggestions for future research.
2 Theory
We are interested in constructing stationary time series that have marginal distributions from several families of count structures supported in , including:

Binomial (Bin()): , , ;

Poisson (Pois()): , with ;

Mixture Poisson (mixPois()): , where with the mixture probabilities such that and with for each ;

Negative binomial (NB()): , with and ;

Generalized Poisson (GPois()): , with and ;

ConwayMaxwellPoisson (CMP): , with , , and a normalizing constant making the probabilities sum to unity.
The negative binomial, generalized Poisson (when
), and ConwayMaxwellPoisson distributions are overdispersed in that their variances are larger than their respective means. This is the case for sample variances and means of many observed count time series.
Let
be the stationary count time series of interest. Suppose that one wants the marginal cumulative distribution function (CDF) of
for each of interest to be, depending on a vector
containing all CDF model parameters. The series will be modeled through the copula type transformation(1) 
Here,
(2) 
where is the CDF of a standard normal variable and
(3) 
is the generalized inverse (quantile function) of the nondecreasing CDF
. The process is assumed to be standard Gaussian, but possibly correlated in time :(4) 
that is, for each . This approach was recently used by [42, 35, 22] with good results. The autocovariance function (ACVF) of , denoted by , is the same as the autocorrelation function (ACF) due to the standard normal assumption and depends on another vector of ACVF parameters.
The model in (1)–(3) has appeared in other bodies of literature under different nomenclature. In particular, [7, 8] call this setup the normal to anything (NORTA) procedure in operations research and [20] calls this a translational model in mechanical engineering. The goal here is to give a reasonably complete analysis of the probabilistic and statistical properties of these models.
The construction in (1) ensures that the marginal CDF of is indeed . Elaborating, the probability integral transformation theorem shows that
has a uniform distribution over
for each ; a second application of the result justifies that has marginal distribution for each . Temporal dependence in will induce temporal dependence in as quantified in the next section. For notation, let(5) 
denote the ACVF of .
2.1 Relationship between autocovariances
The autocovariance functions of and can be related using Hermite expansions (see Chapter 5 of [41]). More specifically, let
(6) 
be the expansion of in terms of the Hermite polynomials
(7) 
The first three Hermite polynomials are , , and ; higher order polynomials can be obtained from the recursion . The Hermite coefficients are
(8) 
The relationship between and is key and is extracted from Chapter 5 of [41] as
(9) 
where the power series is
(10) 
In particular,
(11) 
depends only on the parameters in the marginal distribution . Note also that
(12) 
where
(13) 
and . The function maps into (but not necessarily onto) . For future reference, note that and by (11),
(14) 
Using (6) and gives
(15) 
however, is not necessarily in general. As such, “starts” at , passes through , and connects to . Examples will be given in Section 2.4.
From (12), one can see that
(16) 
This will be useful later. Equation (12) shows that a positive leads to a positive . The same holds for a negative sign since is, in fact, monotone increasing (see Proposition 2.1 below) and crosses zero at (the negativeness of when can also be deduced from the nondecreasing nature of via an inequality on page 20 of [45] for Gaussian variables).
The quantity is called a link function, and , , are called link coefficients. (Sometimes, slightly abusing the terminology, we shall also use these terms for and , respectively.) A key feature in (9) is that the effects of the marginal CDF and the ACVF are “decoupled” in the sense that the correlation parameters in do not influence the coefficients in (9) — this will be useful in estimation later.
Remark 2.1.
The short and longrange dependence properties of can be extracted from those of . Recall that a time series is shortrange dependent (SRD) if . According to one definition, a series is longrange dependent (LRD) if , where is the LRD parameter and is a slowly varying function at infinity [41]. The ACVF of such LRD series satisfies . If is SRD, then so is by (16). On the other hand, if is LRD with parameter , then can be either LRD or SRD. The conclusion depends, in part, on the Hermite rank of , which is defined as . Specifically, if , then is SRD; if , then is LRD with parameter (see [41], Proposition 5.2.4). For example, when the Hermite rank is unity, is LRD with parameter for all ; when , is LRD with parameter for .
Remark 2.2.
The construction in (1)–(2) yields models with very flexible autocorrelations. In fact, the methods achieve the most flexible correlation possible for when and have the same marginal distribution . Indeed, let and define similarly with replaced by . Then, as shown in Theorem 2.5 of [46],
where is a uniform random variable over . Since and for a standard normal random variable , the maximum and minimum correlations and are indeed achieved with (1)–(2) when and , respectively. The preceding statements are nontrivial for only since is attained whenever . It is worthwhile to compare this to the discussion surrounding (15). Finally, all correlations in are achievable since in (13) is continuous in . The flexibility of correlations for Gaussian copula models in the spatial context was also noted and studied in [22], especially when compared to a competing class of hierarchical, e.g. Poisson, models.
The preceding remark all but settles flexibility of autocovariance debates for stationary count series. Flexibility is a concern when the count series is negatively correlated, an issue arising in the hurricane data in [33]. Since a general count marginal distribution can also be achieved, the model class appears quite general.
2.2 Covariates
There are situations where stationarity is not desired. Such scenarios can often be accommodated by simple variants of the above setup. For concreteness, consider a situation where nonrandom covariates are available to explain the series at time — call these . If one wants to have the marginal distribution , where is a vectorvalued function of containing parameters, then simply set
(17) 
and reason as before.
Link functions, not to be confused with in (12)–(13), can be used when parametric support set bounds are encountered. As an example, a Poisson regression with correlated errors can be formulated via
Here, the exponential link guarantees that the Poisson parameter is positive and are regression coefficients. The above construct requires the covariates to be nonrandom; should covariates be random, the marginal distribution may change.
2.3 Calculation and properties of the Hermite coefficients
Several strategies for Hermite coefficient computation are available. We consider the stationary setting here for simplicity. Since in (2) is discrete, the following approach proved simple, stable, and revealing. Let denote all parameters appearing in the marginal distribution . For fixed, define the mass and cumulative probabilities of via
(18) 
where dependence on is notationally suppressed. Note that
(19) 
(take as a convention). When , we take and, when , we take . Using this in (8) provides, for ,
Plugging (7) into the above equation and simplifying provides
(20)  
The telescoping nature of the series in (20) provides
(21) 
(convergence issues are addressed in Lemma 2.1 below). When (that is, or ), the summand is interpreted as zero. Before proceeding, the following results will clarify a number of coefficient issues. The key technical step is Lemma 2.1 below. As noted in these remarks and also in the next section, (21) is appealing from a numerical standpoint and also sheds light on the behavior of the Hermite coefficients.
Lemma 2.1.
The representation in (21) is valid whenever for some .
Proof.
Observe that one obtains (21) from (20) if, after changing to for notational simplicity,
(22) 
To see that this holds when for some , suppose that for all , since otherwise the sum in (22) has a finite number of terms. Since is a polynomial of degree , for some constant that depends on . The sum in (22) can hence be bounded (up to a constant) by
(23) 
To show that (23) converges, it suffices to show that
(24) 
since as
. Mill’s ratio for a standard normal distribution states that
as . Substituting gives as . Taking logarithms in the last relation and ignoring constant terms, order arguments show that as . Substituting into (24) provides(25) 
Remark 2.3.
From a numerical standpoint, the expression in (21) is evaluated as follows. The families of marginal distributions considered in this work have fairly “light” tails, meaning that approaches unity rapidly as . This means that becomes exactly unity numerically for small to moderate values of . Let be the smallest such value. For example, for the Poisson distribution with parameter and Matlab software, , , and . For , the numerical value of is infinite and the terms in (21) are numerically zero and can be discarded. Thus, (21) becomes
(28) 
Alternatively, one could calculate the Hermite coefficients using Gaussian quadrature methods, as discussed e.g. in [22], p. 51. The approach based on (28) though is certainly simpler numerically. Furthermore, as noted below, the expression (28) can shed further light on the behavior of the Hermite coefficients.
Remark 2.4.
Assuming that the are evaluated through (28), their asymptotic behavior as can be quantified. We focus on , whose squares are the link coefficients. The asymptotic relation for Hermite polynomials states that as for each . Using this and Stirling’s formula ( as ) show that
(29) 
Numerically, this approximation, which does not involve Hermite polynomials, was found to be accurate for even moderate values of . It implies that decays (up to a constant) as . While this might seem slow, these coefficients are multiplied by in (9), which decay geometrically rapidly in to zero, except in degenerate cases when .
The computation and behavior of the link coefficients are now examined for several families of marginal distributions. Figure 1 shows plots of on a vertical log scale over a range of parameter values for for the Poisson and negative binomial (with ) distributions. A number of observations are worth making.
Since and by construction, the parameter values in Figure 1 with close to (or close to ) implies that most of the “weight” in the link coefficients is contained in the first coefficient, with higher order coefficients being considerably smaller and decaying with increasing . This takes place in the approximate ranges for the Poisson distribution and in the negative binomial distribution with . Such cases will be called “condensed”. As shown in Section 2.4 below, in the condensed case is close to . In the condensed case, correlations in and are similar.
Noncondensed cases are referred to as “diffuse”. Here, weight is spread to many link coefficients. This happens in the approximate ranges for the Poisson distribution and and for the negative binomial distribution with . This was expected for small s and small s: these cases correspond to discrete random structures that are nearly degenerate in the sense that they concentrate at (as or ). For such cases, large negative correlations in (15) are not possible; hence, cannot be close to and correlations in and are different. The diffuse range for the negative binomial distribution remains to be understood, although it is likely again some form of degeneracy.
2.4 Calculation and properties of link functions
We now study calculation of in (13), which requires truncation of the sum to for some . Note again that the link coefficients are multiplied by in (9) before they are summed, which decays to zero geometrically rapidly in for most stationary of interest when . The link coefficients for large are therefore expected to play a minor role. We now set and explore consequences of this choice.
Remark 2.5.
An alternative procedure would bound (29) by . Now let be the smallest for which the bound is smaller than some predetermined error tolerance . In the Poisson case with , for example, such are , and . These are close to the chosen value of . A different bound and resulting truncation in the spatial context can be found in [22], Lemma 2.2.
Figure 2 plots (solid line) for the Poisson and negative binomial distributions for several parameter values. The link function is computed by truncating its expansion to as discussed above. The condensed cases and (perhaps this case is less condensed) and lead to curves that are close to . However, the diffuse cases appear more delicate. Diffusivity and truncation of the infinite series in (13) lead to a computed link function that does not have (see (14)); in this case, one should increase the number of terms in the summation.
Though deviations from might seem large (most notably for the negative binomial distribution with ), this seems to arise only in the more degenerate cases associated with diffusivity; moreover, this occurs only when linking an ACVF of for lags for which is close to unity. For example, note that if the link deviation is from unity at (as it is approximately for the negative binomial distribution with ), the error for linking as 0.8 (or smaller but positive) would be no more than . In practice, any link deviation could be partially corrected by adding one extra “pseudo link coefficient”, in our case, a 26th coefficient, which would make the link function pass through . The resulting link function is depicted in the dashed line in Figure 2 around the point and essentially coincides with the original link function for all ’s except possibly for values that are close to unity.
The situation for negative and, in particular, around is different: the theoretical value of in (15) is not explicitly known. However, a similar correction could be achieved by first estimating through a MonteCarlo simulation and adding a pseudo 26th coefficient making the computed link function connect to the desired value at . This is again depicted for negative via the dashed lines in Figure 2, which is visually distinguishable only near (and then only in some cases).
Remark 2.6.
In our ensuing estimation work, a link function needs to be evaluated multiple times; hence, running MonteCarlo simulations to evaluate can become computationally expensive. In this case, the estimation procedure is fed precomputed values of
on a grid of parameter values and interpolation is used for intermediate parameter values.
The next result further quantifies the structure of the link function. The result implies that is nondecreasing as a function of . The link’s strict monotonicity is known from [20] when is nondecreasing and differentiable, which does not hold in our case. (Nonstrict) monotonicity for arbitrary nondecreasing is also argued in [7]. Our argument extends strict monotonicity to our setting and identifies an explicit form for the link function’s derivative.
Proposition 2.1.
This result is proven in Appendix A.
Remark 2.7.
The antiderivative
does not seem to have a closed form expression for general . (If it did, then one could integrate (30) explicitly and get a closed form expression for .) But a number of numerical ways to evaluate the above integral over a finite interval have been studied; see, for example, [19], Section 2.
2.5 Particle filtering and the HMM connection
This subsection studies the implications of the latent structure of our model, especially as it relates to HMMs and importance sampling approaches. This will be used in constructing particle filtering (PF) approximations to the true likelihood of the model and in goodnessoffit assessments. The first suggested PF approximation of the likelihood is essentially the popular GewekeHajivassiliouKeane (GHK) sampler for the truncated multivariate normal distribution, as discussed in more detail in Remark 2.9 below. Otherwise, we adhere more closely to the terminology and approaches from the HMM literature.
Our main HMM reference is [15]. As in that monograph, the observations are taken to start at time zero. The following notations are key: let denote the onestepahead linear prediction of the latent Gaussian series from the history . This will be expressed as . The weights , , can be computed recursively in and efficiently from the ACVF of via the DurbinLevinson (DL) algorithm, for example. By convention, . Let be the corresponding meansquared prediction error.
The following problems take centre stage:
Filtering  
Prediction 
as well as numerically computing and for a given function . Here, refers to an expectation conditioned on . These quantities are needed to evaluate likelihoods in Section 3.2 and for model diagnostics in Section 3.3.
Our first task is to derive expressions for the above distributions. We use the notation
(31) 
its role stemming from
Comments
There are no comments yet.