A new INARMA(1, 1) model with Poisson marginals

10/16/2019
by Johannes Bracher

We suggest an INARMA(1, 1) model with Poisson marginals which extends the INAR(1) model in a similar way as the INGARCH(1, 1) extends the INARCH(1) model. The new model is equivalent to a binomially thinned INAR(1) process. This allows us to obtain some of its stochastic properties and to use inference methods for hidden Markov models. The model is compared to various other models in two case studies.



1 Introduction

Time series of counts are encountered in a broad variety of contexts. Two popular modelling approaches are the INAR (integer-valued autoregressive, [1]) and INGARCH (integer-valued generalized autoregressive conditional heteroscedasticity, [5]) classes. In this article an extension of the Poisson INAR(1) model is proposed which parallels the generalization of the INARCH(1) to the INGARCH(1, 1) model. We give some properties of the new model, which we refer to as INARMA(1, 1), and point out how to do inference via methods for hidden Markov models. The performance of the model is compared to various INAR- and INGARCH-type models in two case studies.

2 The Poisson INAR(1) and INARCH(1) models

The Poisson INAR(1) model with parameters α ∈ [0, 1) and ν > 0 is defined as [1]

    X_t = α ∘ X_{t-1} + ε_t.   (1)

The operator ∘ denotes binomial thinning, i.e. α ∘ X = Σ_{i=1}^X Z_i with Z_1, Z_2, … independently Bernoulli(α) distributed, implying α ∘ X | X ~ Bin(X, α). The sequence ε_1, ε_2, … consists of independent Poisson random variables with rate ν. All thinning operations are performed independently of each other and of the ε_t. Moreover, at each time t, ε_t and α ∘ X_{t-1} are independent of all X_s, s < t. Marginally the X_t are then Poisson distributed with rate μ = ν/(1 - α); the autocorrelation function is ρ(d) = α^d.
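These stationary properties are easy to check by simulation. The following is a minimal sketch (the function name and parameter values are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_inar1(alpha, nu, n, rng):
    """Simulate a Poisson INAR(1) path: X_t = alpha ∘ X_{t-1} + eps_t."""
    x = np.empty(n, dtype=int)
    x[0] = rng.poisson(nu / (1 - alpha))           # draw from the stationary law
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)  # binomial thinning alpha ∘ X_{t-1}
        x[t] = survivors + rng.poisson(nu)         # plus Poisson immigration
    return x

alpha, nu = 0.5, 1.0
x = simulate_inar1(alpha, nu, 200_000, rng)
print(x.mean())                          # ≈ nu / (1 - alpha) = 2.0
print(np.corrcoef(x[:-1], x[1:])[0, 1])  # ≈ alpha = 0.5
```

The simulated mean matches the stationary Poisson rate and the lag-one autocorrelation matches the thinning probability.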

The Poisson INARCH(1) model is usually defined as [10]

    X_t | X_{t-1}, X_{t-2}, … ~ Pois(ν + α X_{t-1})   (2)

with ν > 0 and α ≥ 0, but can also be formulated as (compare [11])

    X_t = α ∗ X_{t-1} + ε_t,   (3)

where ε_1, ε_2, … is again a sequence of independent Poisson random variables with rate ν. We define the operator ∗ via α ∗ X | X ~ Pois(αX), where for αX = 0 we include the degenerate Poisson distribution with all mass at zero. Note that while X_{t-1} in (3) is integer-valued, α ∗ Y is also defined for real-valued Y ≥ 0. If α < 1, the process is stationary with μ = E(X_t) = ν/(1 - α) and Var(X_t) = μ/(1 - α^2), i.e. X_t is over-dispersed for α > 0. The autocorrelation function is again ρ(d) = α^d.
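The over-dispersion of the INARCH(1) model can likewise be illustrated by simulation (again a sketch with illustrative parameter values):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_inarch1(alpha, nu, n, rng):
    """Simulate INARCH(1): X_t | X_{t-1} ~ Poisson(nu + alpha * X_{t-1})."""
    x = np.empty(n, dtype=int)
    x[0] = rng.poisson(nu / (1 - alpha))   # start at the stationary mean level
    for t in range(1, n):
        x[t] = rng.poisson(nu + alpha * x[t - 1])
    return x

alpha, nu = 0.5, 1.0
x = simulate_inarch1(alpha, nu, 400_000, rng)
mu = nu / (1 - alpha)
print(x.mean())   # ≈ mu = 2.0
print(x.var())    # ≈ mu / (1 - alpha**2) ≈ 2.67, i.e. over-dispersed
```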

3 Extension to models with ARMA(1, 1)-like covariance structure

The INARCH(1) model can be extended to the Poisson INGARCH(1, 1) model [5]

    X_t | X_{t-1}, X_{t-2}, … ~ Pois(λ_t)   (4)
    λ_t = ν + a X_{t-1} + b λ_{t-1}   (5)

with ν > 0 and a, b ≥ 0. In the following we assume X_t to be stationary, which is the case if a + b < 1. We can then express the model using the operator ∗ from (3). Consider

    ψ_t = (λ_{t+1} - ν/(1 - b)) / (1 - b),

which after some simple algebra leads to

    ψ_t = (1 - κ) ψ_{t-1} + α X_t,  where κ = 1 - b and α = a/(1 - b).

Note that the recursive definition (5) of λ_t implies λ_t ≥ ν/(1 - b) in stationarity, so that non-negativity of ψ_t is ensured. An alternative display of (4)–(5) is then

    X_t = κ ∗ ψ_{t-1} + ε_t   (6)
    ψ_t = (1 - κ) ψ_{t-1} + α X_t   (7)

with the ε_t now independently Pois(ν/(1 - b)) distributed. Stationarity of X_t implies μ = E(X_t) = E(λ_t) and one obtains [5]

    μ = ν/(1 - a - b),
    Var(X_t) = μ (1 - (a + b)^2 + a^2) / (1 - (a + b)^2),
    ρ(d) = a (1 - b(a + b)) / (1 - (a + b)^2 + a^2) · (a + b)^{d-1},  d ≥ 1,

i.e. the second-order properties of an ARMA(1, 1) process.
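A short simulation sketch of the INGARCH(1, 1) recursion (4)–(5) illustrates the ARMA(1, 1)-type geometric decay of the autocorrelation function beyond lag one (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_ingarch11(nu, a, b, n, rng):
    """Simulate INGARCH(1,1): X_t | past ~ Poisson(lam_t),
    lam_t = nu + a * X_{t-1} + b * lam_{t-1}."""
    x = np.empty(n, dtype=int)
    lam = nu / (1 - a - b)          # stationary mean as a neutral start
    x[0] = rng.poisson(lam)
    for t in range(1, n):
        lam = nu + a * x[t - 1] + b * lam
        x[t] = rng.poisson(lam)
    return x

nu, a, b = 1.0, 0.3, 0.4
x = simulate_ingarch11(nu, a, b, 400_000, rng)
print(x.mean())                     # ≈ nu / (1 - a - b) ≈ 3.33
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
r2 = np.corrcoef(x[:-2], x[2:])[0, 1]
print(r2 / r1)                      # ≈ a + b = 0.7 (geometric decay factor)
```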

We now suggest a similar generalization of the Poisson INAR(1) model which we call Poisson INARMA(1, 1). It is defined as

    X_t = κ ∘ ψ_{t-1} + ε_t   (8)
    ψ_t = (1 - κ) ∘ ψ_{t-1} + α ∘ X_t   (9)

where κ ∈ (0, 1], α ∈ [0, 1) and the ε_t are independently Pois(ν) distributed. Again, all thinning operations are independent of each other and of the ε_t. At each t, ε_t, κ ∘ ψ_{t-1} and α ∘ X_t are independent of all X_s, s < t, and, given X_t, α ∘ X_t is independent of ψ_{t-1}. This formulation parallels (6)–(7) as, using E(κ ∘ ψ | ψ) = κψ, it is easily seen that

    E(X_t | ψ_{t-1}) = κ ψ_{t-1} + ν  and  E(ψ_t | ψ_{t-1}, X_t) = (1 - κ) ψ_{t-1} + α X_t.

However, (9) implies a dependence between the two thinnings κ ∘ ψ_{t-1} and (1 - κ) ∘ ψ_{t-1}, entering into X_t and ψ_t, respectively, as they are forced to sum up to ψ_{t-1}. Unlike in the INGARCH model (6)–(7), ψ_t is discrete-valued here (it can be shown to be an INAR(1) process with autoregressive parameter 1 - κ(1 - α) and Pois(αν) innovations). This is necessary to ensure well-definedness of the binomial thinning κ ∘ ψ_{t-1} and is achieved by replacing the multiplications from (7) by binomial thinnings. (Footnote: The INGARCH(1, 1) model, too, can be expressed with a discrete-valued process in place of ψ_t; details are omitted due to space constraints.)

As in an INAR(1) model, the X_t are marginally Poisson distributed under model (8)–(9), the rate being

    μ = ν/(1 - α).   (10)

The autocorrelation function is

    ρ(d) = ακ (1 - κ(1 - α))^{d-1},  d ≥ 1.   (11)

These properties are easy to show using the representation of X_t as a binomially thinned INAR(1) process, see the next section. Thus the new model, too, has the second-order properties of an ARMA(1, 1) process, justifying the name INARMA(1, 1). (Footnote: As mentioned in [5], Lemma 2, the fact that the autocovariance structure of X_t coincides with that of a stationary ARMA(1, 1) process is sufficient for X_t to be an ARMA(1, 1) process itself.) Note, however, that the formulation differs from other models referred to as INARMA in the literature (e.g. [7]). The INAR(1) model corresponds to the boundary case κ = 1 of the new class. In comparison to the INGARCH(1, 1) model with the same parameters, the new model has lower dispersion, and its autocorrelation function is damped if κ < 1.
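The model can be simulated directly from its population reading (juveniles mature with probability κ and then count towards X_t, each fertile female adds a juvenile with probability α, and there are Pois(ν) fertile immigrants). The sketch below, with illustrative parameter values and names, checks the equidispersed Poisson marginal and the damped lag-one autocorrelation:

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_inarma11(alpha, kappa, nu, n, rng):
    """Juvenile/fertile recursion: a Bin(psi, kappa) share of the juveniles
    matures (entering X_t), the complement stays juvenile, so the two
    thinnings sum to psi by construction; each fertile female adds a
    juvenile w.p. alpha; Pois(nu) fertile immigrants."""
    x = np.empty(n, dtype=int)
    psi = rng.poisson(nu * alpha / ((1 - alpha) * kappa))  # stationary juvenile mean
    for t in range(n):
        matured = rng.binomial(psi, kappa)   # one thinning of psi ...
        remaining = psi - matured            # ... and its complement
        x[t] = matured + rng.poisson(nu)
        psi = remaining + rng.binomial(x[t], alpha)
    return x

alpha, kappa, nu = 0.5, 0.5, 0.5
x = simulate_inarma11(alpha, kappa, nu, 400_000, rng)
print(x.mean(), x.var())                 # both ≈ nu/(1-alpha) = 1.0 (Poisson)
print(np.corrcoef(x[:-1], x[1:])[0, 1])  # ≈ alpha * kappa = 0.25, damped
```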

4 Alternative displays of INARMA(1, 1) and link to other models

The INARMA(1, 1) model can be interpreted as follows: X_t is the number of fertile females in a population and ψ_t is the (unobserved) number of juvenile, i.e. not yet fertile, females. ε_t is the number of fertile female immigrants (there is no immigration of juveniles). Females do not die before reaching fertility and at each time of their juvenile period have a probability of κ to transition to the fertile state. They stay fertile for exactly one time period and can have at most one female offspring, the probability of which is α. A graphical display of such a system can be found in Figure 1.

Figure 1: Interpretation of the INARMA(1, 1) process in the form of a flow diagram.

The time from a female’s birth to her fertile period obviously follows a geometric distribution with parameter κ. We can use this to express the model as an INAR(∞) model

    X_t = Σ_{j=1}^∞ α_j ∘ X_{t-j} + ε_t   (12)

with α_j = ακ(1 - κ)^{j-1} and dependent thinning operations given by

    N_t | X_t ~ Bin(X_t, α)   (13)
    P(W_{ti} = w) = κ(1 - κ)^{w-1},  w = 1, 2, …   (14)
    α_j ∘ X_{t-j} = Σ_{i=1}^{N_{t-j}} 1(W_{t-j,i} = j).   (15)

Here, N_t is the number of female offspring born in period t and W_{ti} is the waiting time until fertility for the ith of the N_t females born at time t. The definition (13)–(15) of the dependent thinnings implies that α_j ∘ X_{t-j} | X_{t-j} ~ Bin(X_{t-j}, α_j) for each j, under the constraint that the thinnings α_1 ∘ X_s, α_2 ∘ X_s, … of a given X_s sum up to N_s ≤ X_s.

The representation (12)–(15) nicely illustrates the relationship of the INARMA(1, 1) model to other common models. Replacing the geometric waiting time distribution in (14) by a one-point distribution with P(W_{ti} = 1) = 1 yields the INAR(1) model, while a categorical distribution with support {1, …, p} gives the INAR(p) model by Alzaid and Al-Osh [2] (see [3] for details). Replacing the binomial offspring distribution in (13) by a Poisson distribution, i.e. setting

    N_t | X_t ~ Pois(αX_t),

yields the INGARCH(1, 1) model. Due to space restrictions we do not detail this, but it is straightforward to show using the INARCH(∞) representation of the INGARCH(1, 1) model ([12], p. 76) and some basic properties of the Poisson distribution. The INARCH(1) and INARCH(p) models can be obtained by using again one-point and categorical waiting time distributions in (14).
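The branching construction (binomial offspring plus geometric maturation delays, plus Poisson immigration of fertile females) can also be simulated directly. The sketch below, with an illustrative helper name and parameter values, checks that the stationary mean comes out as ν/(1 - α):

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_branching(alpha, kappa, nu, n, burn, rng):
    """Each fertile female has one daughter w.p. alpha; each daughter
    becomes fertile after a Geometric(kappa) delay; Pois(nu) immigration."""
    horizon = n + burn
    pending = np.zeros(horizon + 1000, dtype=int)  # maturations scheduled per period
    x = np.zeros(horizon, dtype=int)
    for t in range(horizon):
        x[t] = pending[t] + rng.poisson(nu)        # matured juveniles + immigrants
        daughters = rng.binomial(x[t], alpha)      # offspring born in period t
        for _ in range(daughters):
            w = rng.geometric(kappa)               # waiting time W >= 1
            if t + w < len(pending):               # ignore arrivals past the buffer
                pending[t + w] += 1
    return x[burn:]

x = simulate_branching(0.5, 0.5, 0.5, 100_000, 500, rng)
print(x.mean())   # ≈ nu / (1 - alpha) = 1.0
```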

We recently encountered INAR(∞) models of type (12)–(15) in [3], where we extended work by Fernández-Fontelo et al. [6] on underreported INAR models. We showed that such a model is equivalent to a binomially thinned INAR(1) model given by

    X_t = κ' ∘ λ_t   (16)
    λ_t = α' ∘ λ_{t-1} + ε'_t   (17)

with α' = 1 - κ(1 - α) and κ' = ακ/α', where the ε'_t are independently Pois(να'/α) distributed and, as before, all thinning operations are performed independently. This represents an interesting parallel to the Gaussian ARMA(1, 1) model which, as shown for instance in [9], can be obtained by adding homoscedastic measurement error to a Gaussian AR(1) process. This third representation of the process makes the derivation of equations (10)–(11) easy (see [6], Section 2 and Appendix A; our model corresponds to a special case of the class discussed there). Also, it implies that many properties of the INAR(1) process translate to the INARMA(1, 1) model, e.g. the marginal Poisson distribution and time-reversibility [8].
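A binomially thinned INAR(1) process is easy to simulate as well. The sketch below (generic latent parameters a, ν and thinning probability q, all illustrative) checks the Poisson marginal, i.e. mean ≈ variance, and the ARMA(1, 1)-type autocorrelation pattern: the lag-one correlation is damped by the thinning probability, with geometric decay thereafter:

```python
import numpy as np

rng = np.random.default_rng(6)

def simulate_thinned_inar1(a, nu, q, n, rng):
    """Latent INAR(1): lam_t = a ∘ lam_{t-1} + Pois(nu);
    observed X_t = Bin(lam_t, q) with a fresh thinning at every t."""
    lam = np.empty(n, dtype=int)
    lam[0] = rng.poisson(nu / (1 - a))                    # stationary start
    for t in range(1, n):
        lam[t] = rng.binomial(lam[t - 1], a) + rng.poisson(nu)
    return rng.binomial(lam, q)                           # independent thinnings

a, nu, q = 0.7, 0.9, 0.5
x = simulate_thinned_inar1(a, nu, q, 400_000, rng)
print(x.mean(), x.var())     # both ≈ q * nu / (1 - a) = 1.5 (Poisson marginal)
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
r2 = np.corrcoef(x[:-2], x[2:])[0, 1]
print(r1, r2 / r1)           # r1 ≈ q * a = 0.35, r2 / r1 ≈ a = 0.7
```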

5 Inference

Inference for higher-order INAR models with dependent thinning operations is challenging as the likelihood is generally intractable [2]. For our model, however, we can exploit the representation (16)–(17) as a binomially thinned INAR(1) process. As described in Fernández-Fontelo et al. [6], Section 3.2, the forward algorithm [14], a standard method for inference in hidden Markov models, can be applied to evaluate the likelihood of this model (again note that our model corresponds to a special case of the class treated in [6]). As the state space of the latent process λ_t is infinite, truncation at some reasonably large value is required. The maximum of the log-likelihood is then obtained by numerical optimization.
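A minimal sketch of such a likelihood evaluation, assuming the thinned-INAR(1) form with latent transition λ_t = a ∘ λ_{t-1} + Pois(ν) and emission X_t | λ_t ~ Bin(λ_t, q) (symbol names and the truncation point M are illustrative):

```python
import numpy as np
from scipy.stats import binom, poisson

def loglik_thinned_inar1(x, a, nu, q, M=50):
    """Log-likelihood of a binomially thinned INAR(1) via the forward
    algorithm, truncating the latent state space at M."""
    states = np.arange(M + 1)
    # transition matrix of the latent chain: lam_t = a ∘ lam_{t-1} + Pois(nu)
    P = np.zeros((M + 1, M + 1))
    for i in states:
        surv = binom.pmf(np.arange(i + 1), i, a)           # survivors of a ∘ i
        for k in range(i + 1):
            P[i] += surv[k] * poisson.pmf(states - k, nu)  # plus immigration
    # emission probabilities: X_t | lam_t = j ~ Bin(j, q), shape (T, M+1)
    E = binom.pmf(np.asarray(x)[:, None], states[None, :], q)
    f = poisson.pmf(states, nu / (1 - a)) * E[0]           # stationary initial law
    ll = 0.0
    for t in range(1, len(x)):
        c = f.sum()
        ll += np.log(c)
        f = ((f / c) @ P) * E[t]                           # scaled forward step
    return ll + np.log(f.sum())

x = np.array([1, 2, 0, 1, 3, 1])
print(loglik_thinned_inar1(x, a=0.5, nu=1.0, q=0.7))
```

In practice this function would be handed to a numerical optimizer over (a, ν, q); the scaling step keeps the recursion from underflowing for longer series.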

6 Case studies

We apply the four models from Sections 2 and 3 to two data sets. The first example consists of the gold particle counts from Westgren [13], a data set which is often used in the literature. For instance, Weiß ([12], p. 48) applies Poisson INAR(1), INAR(2) and CINAR(2) models to these data. To make our results comparable to these analyses, we fit all models to observations 501–870 of Series (C). As a second example we use weekly counts of mumps in Bavaria, Germany, from week 1/2014 to week 52/2017 (downloaded from www.survstat.rki.de on 8 Oct 2018). Mumps, a viral disease, used to be a common childhood disease. Since the introduction of a vaccine it has become rare in Germany, but it remains under routine surveillance. The data are displayed in Figure 2. Both time series show slowly decaying autocorrelation functions, indicating that relaxing the AR(1) assumption may be beneficial. While the particle counts are approximately equidispersed (mean 1.55, variance 1.65), the mumps data show some overdispersion (mean 2.49, variance 3.93).

Figure 2: Case studies: gold particle counts and weekly numbers of reported mumps cases in Bavaria.

Table 1 shows parameter estimates and AIC values for the gold particle data. For comparison we added the results Weiß [12] obtains for INAR(2) and CINAR(2) models (see there for details). To assess the out-of-sample predictive performance we computed mean logarithmic scores (logS; [4]) of one-step-ahead forecasts for the second half of the time series. For each of these forecasts the models were re-fitted to the data which were already available at the respective time point ("rolling forecast"). Note that the log score is negatively oriented, i.e. smaller values are better.

Model                   Parameter estimates       AIC    logS
Poisson INAR(1)         0.73  0.53                1040   1.642
Poisson INAR(2)         0.54  0.47  0.18          1027   1.610
Poisson CINAR(2)        0.60  0.41  0.19          1027   1.611
Poisson INARCH(1)       0.75  0.52  0.00          1057   1.624
Poisson INARMA(1, 1)    0.31  0.67  0.80          1014   1.577
Poisson INGARCH(1, 1)   0.47  0.54  0.70  1.85    1047   1.592

Table 1: Parameter estimates, AIC values and mean log scores (logS) for the gold particle data. Mean log scores for INAR(2) and CINAR(2) were computed using code from the supplement of [12].
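The mean logarithmic score reduces to averaging the negative predictive log-probabilities of the observations. A minimal sketch for Poisson one-step-ahead predictive distributions (toy data and illustrative parameter values, not the estimates from the tables):

```python
import numpy as np
from scipy.stats import poisson

def mean_log_score(obs, pred_means):
    """Mean logarithmic score of Poisson one-step-ahead predictive
    distributions; negatively oriented, i.e. smaller is better."""
    return -poisson.logpmf(obs, pred_means).mean()

# toy series; predictive means of an INARCH(1)-type model: nu + alpha * x_{t-1}
x = np.array([2, 1, 0, 3, 2, 2, 1])
nu, alpha = 1.0, 0.5
print(mean_log_score(x[1:], nu + alpha * x[:-1]))
```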

The INARMA(1, 1) model has the best in-sample and out-of-sample performance. Interestingly it also outperforms the two AR(2) models from [12], indicating that observations more than two time steps back still contain additional information.

The corresponding results for the mumps data can be found in Table 2. While the INARMA(1, 1) model again represents a considerable improvement compared to the INAR(1), the INGARCH(1, 1) model performs best. This is not surprising given the overdispersion in the data.

Model                   Parameter estimates       AIC    logS
Poisson INAR(1)         2.21  0.11                842    2.017
Poisson INARCH(1)       2.08  0.15  9.01          832    2.010
Poisson INARMA(1, 1)    1.21  0.25  0.52          827    1.963
Poisson INGARCH(1, 1)   1.12  0.26  0.52  18.97   816    1.955

Table 2: Parameter estimates, AIC values and mean log scores (logS) for the mumps data.

7 Discussion

We suggested an INARMA(1, 1) model with Poisson marginals which draws conceptually from the INGARCH(1, 1) model [5] and the INAR(p) model by Alzaid and Al-Osh [2]. We provided an alternative representation in terms of an offspring distribution and a waiting time distribution which illuminates the close relation to several existing models from the literature. A third representation as a binomially thinned INAR(1) process turned out to be useful for inference and for obtaining some stochastic properties. In our case studies the model performed favourably for equidispersed data. For overdispersed data it outperformed the INAR(1) model, but achieved lower performance than the INGARCH(1, 1) model. This raises the question of how overdispersion could be accommodated in the model. Alternative immigration distributions could be considered, but the model would then no longer have a representation as a binomially thinned version of a Markov process. Obtaining its stochastic properties and evaluating the likelihood would become more involved. Another open question is how higher-order INARMA(p, q) models could be defined.


Data and code Data and R code are available at github.com/jbracher/inarma.


Acknowledgements The author thanks Christian H. Weiß for helpful feedback.

References

  • [1] M.A. Al-Osh and A.A. Alzaid. First-order integer-valued autoregressive (INAR(1)) process. J Time Ser Anal, 8(3):261–275, 1987.
  • [2] A.A. Alzaid and M.A. Al-Osh. An integer-valued pth-order autoregressive structure (INAR(p)) process. J Appl Probab, 27(2):314–324, 1990.
  • [3] J. Bracher. Comment on “Under-reported data analysis with INAR-hidden Markov chains”. Stat Med, 38(5):893–898, 2019.
  • [4] C. Czado, T. Gneiting, and L. Held. Predictive model assessment for count data. Biometrics, 65(4):1254–1261, 2009.
  • [5] R. Ferland, R. Latour, and D. Oraichi. Integer-valued GARCH process. J Time Ser Anal, 27(6):923–942, 2006.
  • [6] A. Fernández-Fontelo, A. Cabaña, P. Puig, and D. Moriña. Under-reported data analysis with INAR-hidden Markov chains. Stat Med, 35(26):4875–4890, 2016.
  • [7] P. Neal and T. Subba Rao. MCMC for integer-valued ARMA processes. J Time Ser Anal, 28(1):92–110, 2006.
  • [8] S. Schweer. On the time-reversibility of integer-valued autoregressive processes of general order. In A. Steland, E. Rafajłowicz, and K. Szajowski, editors, Stochastic Models, Statistics and Their Applications, pages 169–177, Cham, 2015. Springer.
  • [9] J. Staudenmayer and J.P. Buonaccorsi. Measurement error in linear autoregressive models. J Am Stat Assoc, 100(471):841–852, 2005.
  • [10] C.H. Weiß. The INARCH(1) model for overdispersed time series of counts. Commun Stat A-Theor, 39(6):1269–1291, 2010.
  • [11] C.H. Weiß. A Poisson INAR(1) model with serially dependent innovations. Metrika, 78(7):829–851, 2015.
  • [12] C.H. Weiß. An Introduction to Discrete-Valued Time Series. Wiley, Hoboken, NJ, 2018.
  • [13] A. Westgren. Die Veränderungsgeschwindigkeit der lokalen Teilchenkonzentration in kolloiden Systemen. Arkiv för Matematik, Astronomi och Fysik, 11(14):1–24, 1916.
  • [14] W. Zucchini and I. MacDonald. Hidden Markov Models for Time Series. Chapman and Hall/CRC, New York, NY, 2009.