An important aspect of financial high-frequency data analysis is the modeling of durations between events. This includes the modeling of times between transactions (trade durations), times until a cumulative price change of a given size (price durations) and times until the cumulative volume reaches a given level (volume durations). Financial durations exhibit strong serial correlation, i.e. long durations are usually followed by long durations and short durations are followed by short durations. To capture this time dependence, Engle and Russell (1998) proposed the autoregressive conditional duration (ACD) model. The ACD model is analogous to the GARCH volatility model and enjoys similar popularity in the financial durations field. For surveys of duration analysis, see Pacurar (2008), Bauwens and Hautsch (2009), Hautsch (2011) and Saranjeet and Ramanathan (2018).
Traditional duration models are based on continuous distributions. Table 1 reviews continuous distributions used in the ACD literature. The ACD specification is traditionally based on a time-varying mean with some additional constant shape parameters. The data is, however, inherently discrete. This is also the case for financial durations, whether they are recorded with a precision of seconds or milliseconds. Discreteness of real data is the first motivation of our paper. Generally, there are three ways of dealing with discrete values of observed variables.
The first approach considers random variables with a continuous distribution and ignores the discreteness of the data. This is a valid approach, and often the best solution, when data are recorded with high precision (e.g. durations with millisecond precision). However, if the precision is low (e.g. durations with second precision), the bias in estimators increases and the size of hypothesis tests is distorted (see Schneeweiss et al., 2010). Tricker (1984) and Taraldsen (2011) explore the effects of rounding on the exponential distribution, while Tricker (1992) deals with the gamma distribution. In autoregressive processes, rounding errors can further accumulate, making continuous models unreliable (see Zhang et al., 2010 and Li and Bai, 2011).
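To see how severe the rounding bias can be, consider a small simulation (our illustration, not taken from the cited studies): exponential durations with an assumed true rate of 2 events per second are estimated by maximum likelihood once on the exact values and once on values floored to whole seconds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential durations with rate 2 (mean 0.5 seconds), recorded two ways:
# at full precision and rounded down to whole seconds.
true_rate = 2.0
x = rng.exponential(scale=1 / true_rate, size=100_000)
x_floored = np.floor(x)

# The exponential MLE of the rate is the reciprocal of the sample mean.
rate_exact = 1 / x.mean()        # close to the true rate of 2
rate_floored = 1 / x_floored.mean()  # strongly biased upwards
```

Here most durations under one second are floored to zero, so the sample mean collapses and the estimated rate is several times the true value, in line with the distortions reported in the literature above.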
The second approach considers random variables with a continuous distribution and takes into account the partial identification and interval uncertainty of the observations caused by rounding or grouping (see Manski, 2003). In financial volatility analysis, discrete values of prices are often (among other effects) captured by the market microstructure noise (see Hansen and Lunde, 2006). To our knowledge, Grimshaw et al. (2005) is the only paper addressing the issue of rounding in financial durations analysis. They found that ignoring the discreteness of data leads to a distortion of time-dependence tests in financial durations.
The third approach considers random variables with a discrete distribution. In financial analysis, prices are directly modeled by discrete distributions; see e.g. Russell and Engle (2005) and Koopman et al. (2015). Kabasinskas et al. (2012) use discrete distributions to model counts of zero price changes. In our paper, we follow the discrete approach to financial durations and utilize time series models of counts.
There are many trade durations that are exactly zero or very close to zero. Zero durations can be caused by split transactions, i.e. large trades broken into two or more smaller trades. Veredas et al. (2002) offer another explanation as they notice that many simultaneous transactions occur at round prices suggesting many traders post limit orders to be executed at round prices. Zero durations can as well just be independent transactions executed at very similar times and originating from different sources. Whatever the reason for zero durations, ignoring them can cause problems in estimation as many widely used distributions have strictly positive support and zero values have therefore zero density. Liu et al. (2018) examine the effect of zero durations on integrated volatility estimation. The presence of zero durations is the second motivation of our paper. The literature suggests several different ways of dealing with zero durations.
The most common approach, dating back to Engle and Russell (1998), is to discard zero durations. Specifically, observations with the same timestamp are merged together, with the resulting price calculated as an average of prices weighted by volume. This helps with estimation, but the distribution of durations is distorted, as zero durations that are just independent transactions executed at similar times should be kept in the dataset.
Instead of discarding zero durations, Bauwens (2006) sets them to a small given value. Again, this helps with estimation, but the distribution of durations is distorted, as zero durations that correspond to split transactions should be omitted from the dataset.
The information about zero durations can also be utilized in a model. Zhang et al. (2001) include an indicator of multiple transactions as an explanatory variable in their regression model.
Another way of incorporating zero durations in a model is to directly include excessive zero values in the underlying distribution. For continuous distributions, the zero-augmented models proposed by Hautsch et al. (2014) can be used. (The use of zero-augmented models was also suggested by Prof. T. V. Ramanathan during the 3rd Conference and Workshop on Statistical Methods in Finance, Chennai, December 16–19, 2017.) However, in high-precision data, there are no exact zero values but rather very small positive values, many of which should be considered as zeros. Grammig and Wellner (2002) suggest treating successive trades with either non-increasing or non-decreasing prices within one second as one large trade (i.e. as zero durations). The issue with this approach is that these successive trades can just as well be independent and originate from different sources. It is therefore a difficult task to identify whether close-to-zero durations indicate actual split transactions.
It is more convenient to model zero durations in a discrete framework. When the values are grouped, zero durations corresponding to split transactions manifest themselves as an excessive probability of the group containing zero values. For discrete distributions, the zero-inflated extension of Lambert (1992) can be used. This is the approach we suggest in this paper.
Given the discussion above, we propose in this paper a new zero-inflated autoregressive conditional duration (ZIACD) model. We directly take into account the discreteness of durations and utilize the negative binomial distribution to accommodate overdispersion in durations (see Boswell and Patil, 1970; Cameron and Trivedi, 1986; Christou and Fokianos, 2014). The excessive zero durations caused by split transactions are captured by the zero-inflated modification of the negative binomial distribution (see Greene, 1994). The time-varying location parameter follows the specification of generalized autoregressive score (GAS) models, also known as dynamic conditional score models (see Creal et al., 2008, 2013; Harvey, 2013). In the GAS framework, time-varying parameters depend on their lagged values and a scaled score of the conditional observation density. GAS models belong to the class of observation-driven models (Cox, 1981). Koopman et al. (2016) find that observation-driven models based on the score perform comparably to parameter-driven models in terms of predictive accuracy. Observation-driven models (including the GAS model) can be estimated in a straightforward manner by the maximum likelihood method. In this paper, we establish the invertibility of the GAS filter for the ZIACD model and the consistency and asymptotic normality of the maximum likelihood estimator.
In an empirical study, we analyze 30 stocks that form the Dow Jones Industrial Average (DJIA) index with values of trade durations rounded down to seconds. We compare the Poisson, geometric and negative binomial distributions together with their zero-inflated modifications. We find that the proposed ZIACD model is a good fit as it captures both overdispersion and excessive zero values. The portion of zeros caused by split transactions ranges from 37% up to 90% depending on the stock with the average of 63%.
We also compare the proposed ZIACD model with continuous models based on the exponential, Weibull, gamma and generalized gamma distributions. In a simulation study, we find that when data are rounded, the estimates of the continuous model are biased while the proper use of the discrete model identifies true parameters. Furthermore, our empirical duration data has very high precision and as we round them to seconds for the discrete model, we lose some information. The use of the continuous approach, however, also causes a loss of information as close-to-zero durations need to be removed or set to a given threshold value for estimation purposes. We find that the loss of decimals is significantly less severe than the loss of zeros imposed by the continuous approach. Finally, we find that the proposed ZIACD model outperforms the continuous models in terms of predictive accuracy.
The rest of the paper is structured as follows. In Section 2, we propose the ZIACD model based on the zero-inflated negative binomial distribution with time-varying location parameter and prove its asymptotic properties. In Section 3, we describe characteristics of financial durations data and fit the proposed model within a discrete framework. In Section 4, we compare the proposed discrete model with continuous models. We conclude the paper in Section 5.
Table 1: Continuous distributions used in the ACD literature.

| Paper | Distribution | Dynamic parameter | Shape parameters |
| --- | --- | --- | --- |
| Engle and Russell (1998) | Exponential | Mean | 0 |
| Engle and Russell (1998) | Weibull | Mean | 1 |
| Lunde (1999) | Generalized gamma | Mean | 2 |
| Grammig and Maurer (2000) | Burr | Mean | 2 |
| Hautsch (2001) | Generalized F | Mean | 3 |
| Leiva et al. (2014) | Power-exponential Birnbaum–Saunders | Median | 2 |
| Leiva et al. (2014) | Student's t Birnbaum–Saunders | Median | 2 |
| Zheng et al. (2016) | Fréchet | Mean | 1 |
2 Discrete Duration Model
Let $t_i$, $i = 0, 1, \ldots, n$, be random variables denoting the times of transactions. Trade durations are then defined as $x_i = t_i - t_{i-1}$ for $i = 1, \ldots, n$. As we operate in a discrete framework, we assume $t_i \in \mathbb{N}_0$, $i = 0, \ldots, n$, and hence $x_i \in \mathbb{N}_0$, $i = 1, \ldots, n$. We further assume trade durations to follow some given discrete distribution with conditional probability mass function $p(x_i | f_i, \theta)$, where $x_i$ are observations and $f_i$ are time-varying parameters for $i = 1, \ldots, n$, and $\theta$ are static parameters. First, we consider trade durations to follow the negative binomial distribution. Next, we extend the negative binomial distribution to capture excessive zeros using the zero-inflated model. For the time-varying parameters, we use the generalized autoregressive score model. The model utilizes the score for the time-varying parameters defined as
\[ \nabla_i = \frac{\partial \ln p(x_i | f_i, \theta)}{\partial f_i} \]
and the Fisher information for the time-varying parameters defined as
\[ \mathcal{I}_i = \mathrm{E}\left[\nabla_i \nabla_i' \,\middle|\, f_i\right] = -\mathrm{E}\left[\frac{\partial^2 \ln p(x_i | f_i, \theta)}{\partial f_i \partial f_i'} \,\middle|\, f_i\right]. \]
Note that the latter equality requires some regularity conditions (Lehmann and Casella, 1998).
2.1 Negative Binomial Distribution
Non-negative integer variables are commonly analyzed using count data models based on a specific underlying distribution, most notably the Poisson distribution and the negative binomial distribution (see Cameron and Trivedi, 2013). A distinctive feature of the Poisson distribution is that its expected value is equal to its variance. This characteristic is too strict in many applications, as count data often exhibit overdispersion, i.e. a variance higher than the expected value. A generalization of the Poisson distribution overcoming this limitation is the negative binomial distribution, with one parameter determining its expected value and another parameter determining its excess dispersion.
The negative binomial distribution can be derived in many ways (see Boswell and Patil, 1970). We use the NB2 parametrization of Cameron and Trivedi (1986) derived from the Poisson–gamma mixture distribution. It is the most common parametrization used in negative binomial regression according to Cameron and Trivedi (2013). We consider the location parameter $\mu_i > 0$ to be time-varying, i.e. $f_i = \mu_i$, while the dispersion parameter $\alpha > 0$ is static. The probability mass function is
\[ \mathrm{P}[x_i = x] = \frac{\Gamma(x + \alpha^{-1})}{\Gamma(\alpha^{-1})\, x!} \left(\frac{1}{1 + \alpha \mu_i}\right)^{\alpha^{-1}} \left(\frac{\alpha \mu_i}{1 + \alpha \mu_i}\right)^{x}, \qquad x \in \mathbb{N}_0. \]
The expected value and variance are
\[ \mathrm{E}[x_i] = \mu_i, \qquad \mathrm{var}[x_i] = \mu_i (1 + \alpha \mu_i). \]
The score for the parameter $\mu_i$ is
\[ \nabla_i = \frac{x_i - \mu_i}{\mu_i (1 + \alpha \mu_i)}. \]
The Fisher information for the parameter $\mu_i$ is
\[ \mathcal{I}_i = \frac{1}{\mu_i (1 + \alpha \mu_i)}. \]
Special cases of the negative binomial distribution include the Poisson distribution for $\alpha \to 0$ and the geometric distribution for $\alpha = 1$.
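As a sanity check on the formulas in this subsection, the following sketch (our illustration; the function names are ours) evaluates the NB2 pmf in log space together with the score and the Fisher information for the location parameter.

```python
import math

def nb2_pmf(x, mu, alpha):
    """NB2 probability mass function with location mu and dispersion alpha."""
    r = 1.0 / alpha
    log_p = (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
             - r * math.log(1.0 + alpha * mu)
             + x * math.log(alpha * mu / (1.0 + alpha * mu)))
    return math.exp(log_p)

def nb2_score(x, mu, alpha):
    """Score of the NB2 log pmf with respect to the location mu."""
    return (x - mu) / (mu * (1.0 + alpha * mu))

def nb2_fisher(mu, alpha):
    """Fisher information for the location mu."""
    return 1.0 / (mu * (1.0 + alpha * mu))
```

For, say, mu = 2 and alpha = 0.5, summing over the support confirms that the pmf sums to one, the mean is mu, the variance is mu(1 + alpha mu) and the expectation of the squared score reproduces the Fisher information 1/(mu(1 + alpha mu)).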
2.2 Zero-Inflated Distribution
The zero-inflated distribution is an extension of a discrete distribution allowing the probability of zero values to be higher than the probability given by the original distribution. In the zero-inflated distribution, values are generated by two components – one component generates only zero values, while the other component generates integer values (including zero values) according to the original distribution. Lambert (1992) proposed the zero-inflated Poisson model and Greene (1994) used the zero-inflated model for the negative binomial distribution.
The zero-inflated negative binomial distribution is a discrete distribution with three parameters. We consider the location parameter $\mu_i$ to be time-varying, while the dispersion parameter $\alpha$ and the probability of excessive zero values $\pi \in [0, 1)$ are static, i.e. $f_i = \mu_i$ and $\theta = (\alpha, \pi)'$. The variable $x_i$ follows the zero-inflated negative binomial distribution if
\[ x_i = \begin{cases} 0 & \text{with probability } \pi, \\ \tilde{x}_i & \text{with probability } 1 - \pi, \end{cases} \]
where $\tilde{x}_i$ follows the negative binomial distribution with location $\mu_i$ and dispersion $\alpha$.
The first process generates only zeros and corresponds to split transactions, while the second process generates values from the negative binomial distribution and corresponds to regular transactions. The probability mass function is
\[ \mathrm{P}[x_i = x] = \begin{cases} \pi + (1 - \pi)(1 + \alpha \mu_i)^{-\alpha^{-1}} & \text{for } x = 0, \\[1ex] (1 - \pi) \dfrac{\Gamma(x + \alpha^{-1})}{\Gamma(\alpha^{-1})\, x!} \left(\dfrac{1}{1 + \alpha \mu_i}\right)^{\alpha^{-1}} \left(\dfrac{\alpha \mu_i}{1 + \alpha \mu_i}\right)^{x} & \text{for } x \geq 1. \end{cases} \]
The expected value and variance are
\[ \mathrm{E}[x_i] = (1 - \pi) \mu_i, \qquad \mathrm{var}[x_i] = (1 - \pi) \mu_i \left(1 + \alpha \mu_i + \pi \mu_i\right). \]
The score for the parameter $\mu_i$ is
\[ \nabla_i = \begin{cases} -\dfrac{(1 - \pi)(1 + \alpha \mu_i)^{-\alpha^{-1} - 1}}{\pi + (1 - \pi)(1 + \alpha \mu_i)^{-\alpha^{-1}}} & \text{for } x_i = 0, \\[2ex] \dfrac{x_i - \mu_i}{\mu_i (1 + \alpha \mu_i)} & \text{for } x_i \geq 1. \end{cases} \]
The Fisher information for the parameter $\mu_i$ is
\[ \mathcal{I}_i = \frac{(1 - \pi)^2 (1 + \alpha \mu_i)^{-2\alpha^{-1} - 2}}{\pi + (1 - \pi)(1 + \alpha \mu_i)^{-\alpha^{-1}}} + (1 - \pi) \left(\frac{1}{\mu_i (1 + \alpha \mu_i)} - \frac{(1 + \alpha \mu_i)^{-\alpha^{-1}}}{(1 + \alpha \mu_i)^2}\right). \]
Special cases of the zero-inflated negative binomial distribution include the negative binomial distribution for $\pi = 0$, the zero-inflated Poisson distribution for $\alpha \to 0$ and the zero-inflated geometric distribution for $\alpha = 1$.
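The zero-inflated pmf can be sketched analogously to the NB2 case; this hypothetical helper (ours, not code from the paper) adds the extra zero mass pi on top of the NB2 component.

```python
import math

def zinb_pmf(x, mu, alpha, pi):
    """Zero-inflated NB2 pmf: an excess zero with probability pi, otherwise
    a draw from the NB2 distribution (which may itself be zero)."""
    r = 1.0 / alpha
    log_nb = (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
              - r * math.log(1.0 + alpha * mu)
              + x * math.log(alpha * mu / (1.0 + alpha * mu)))
    nb = math.exp(log_nb)
    return pi + (1.0 - pi) * nb if x == 0 else (1.0 - pi) * nb
```

Summing over the support confirms that the pmf is proper and that the expected value equals (1 - pi) mu, i.e. zero inflation shrinks the mean of the underlying NB2 component.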
2.3 Generalized Autoregressive Score Dynamics
Generalized autoregressive score (GAS) models (Creal et al., 2008, 2013), also known as dynamic conditional score models (Harvey, 2013), capture the dynamics of time-varying parameters by an autoregressive term and the scaled score of the conditional observation density (or the conditional observation probability mass function in the case of a discrete distribution). The time-varying parameters follow the recursion
\[ f_{i+1} = \omega + B f_i + A S(f_i) \nabla_i, \]
where $\omega$ are the constant parameters, $B$ are the autoregressive parameters, $A$ are the score parameters, $S(f_i)$ is the scaling function for the score and $\nabla_i$ is the score. As the scaling function, we consider
unit scaling, i.e. $S(f_i) = I$,
the square root of the inverse of the Fisher information scaling, i.e. $S(f_i) = \mathcal{I}_i^{-1/2}$,
the inverse of the Fisher information scaling, i.e. $S(f_i) = \mathcal{I}_i^{-1}$.
Note that each scaling function results in a different GAS model. The long-term mean and unconditional value of the time-varying parameters is $\bar{f} = (I - B)^{-1} \omega$. The parameters in (12) are assumed to be unbounded. However, some distributions require bounded parameters (e.g. variance greater than zero). The standard solution in the GAS framework is to use an unbounded parametrization $\tilde{f}_i = h(f_i)$, which follows the GAS recursion instead of the original parametrization $f_i$, i.e.
\[ \tilde{f}_{i+1} = \tilde{\omega} + \tilde{B} \tilde{f}_i + \tilde{A} \tilde{S}(\tilde{f}_i) \tilde{\nabla}_i, \]
where $\tilde{\omega}$ are the constant parameters, $\tilde{B}$ are the autoregressive parameters, $\tilde{A}$ are the score parameters, $\tilde{S}(\tilde{f}_i)$ is the reparametrized scaling function for the score and $\tilde{\nabla}_i$ is the reparametrized score. The reparametrized score equals
\[ \tilde{\nabla}_i = \frac{\partial \ln p(x_i | \tilde{f}_i, \theta)}{\partial \tilde{f}_i} = \left(\dot{h}(f_i)\right)^{-1} \nabla_i, \]
while the Fisher information of the reparametrized model equals
\[ \tilde{\mathcal{I}}_i = \left(\dot{h}(f_i)\right)^{-2} \mathcal{I}_i, \]
where $\dot{h}$ is the derivative of $h$ with respect to $f_i$.
The GAS specification includes many commonly used econometric models. For example, the GAS model with the normal distribution, the inverse of the Fisher information scaling and time-varying variance results in the GARCH model, while the GAS model with the exponential distribution, the inverse of the Fisher information scaling and time-varying expected value results in the ACD model (Creal et al., 2013). The GAS framework can be utilized for discrete models as well. Koopman et al. (2015) used discrete copulas based on the Skellam distribution for high-frequency stock price changes. Koopman and Lit (2017) used the bivariate Poisson distribution for the number of goals in football matches and the Skellam distribution for the score difference. Gorgi (2018) used the Poisson distribution as well as the negative binomial distribution for offensive conduct reports.
2.4 Zero-Inflated Autoregressive Conditional Duration Model
In our model, we consider observations $x_i$ to follow the zero-inflated negative binomial distribution with the time-varying location parameter $\mu_i$ and static parameters $\alpha$ and $\pi$ specified in (8). We use a reparametrization with the exponential link for the location parameter, i.e. $\mu_i = \exp(\tilde{f}_i)$. The parameter $\tilde{f}_i$ then follows the recursion
\[ \tilde{f}_{i+1} = \omega + \beta \tilde{f}_i + a \tilde{S}(\tilde{f}_i) \tilde{\nabla}_i, \]
where $\omega$ is the constant parameter, $\beta$ is the autoregressive parameter, $a$ is the score parameter and $\tilde{S}(\tilde{f}_i) \tilde{\nabla}_i$ is the scaled score. Note that both the scaling function and the score are with respect to the reparametrization $\tilde{f}_i = \ln \mu_i$, and can be obtained from (14) and (15). The long-term mean and unconditional value of $\tilde{f}_i$ is then $\omega / (1 - \beta)$, i.e. $\exp(\omega / (1 - \beta))$ in the original restricted parametrization.
In the rest of the paper, we focus on the unit scaling $\tilde{S}(\tilde{f}_i) = 1$. In Section 3.4, we compare the unit scaling with the square root of the inverse of the Fisher information scaling and the inverse of the Fisher information scaling and show that the differences between estimated coefficients are negligible. The scaled score for the zero-inflated negative binomial distribution with the unit scaling is given by
\[ \tilde{\nabla}_i = \begin{cases} -\dfrac{(1 - \pi) \mu_i (1 + \alpha \mu_i)^{-\alpha^{-1} - 1}}{\pi + (1 - \pi)(1 + \alpha \mu_i)^{-\alpha^{-1}}} & \text{for } x_i = 0, \\[2ex] \dfrac{x_i - \mu_i}{1 + \alpha \mu_i} & \text{for } x_i \geq 1. \end{cases} \]
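Putting the pieces together, a minimal sketch of the ZIACD filter with unit scaling follows; the parameter names omega, beta and a are our notation, and the starting point is the unconditional mean of the recursion.

```python
import math

def ziacd_filter(x, omega, beta, a, alpha, pi):
    """Filter ftilde_i = log(mu_i) through the GAS recursion with unit
    scaling, returning the fitted locations mu_i (illustrative sketch)."""
    ftilde = omega / (1.0 - beta)  # start at the unconditional mean
    mus = []
    for x_i in x:
        mu = math.exp(ftilde)
        mus.append(mu)
        q = (1.0 + alpha * mu) ** (-1.0 / alpha)
        if x_i == 0:
            # score of the zero-inflated NB2 w.r.t. log(mu) at a zero duration
            score = -(1.0 - pi) * mu * q / ((1.0 + alpha * mu)
                                            * (pi + (1.0 - pi) * q))
        else:
            score = (x_i - mu) / (1.0 + alpha * mu)
        ftilde = omega + beta * ftilde + a * score
    return mus
```

With a = 0 the recursion stays at the unconditional location exp(omega / (1 - beta)); a positive score parameter lets long observed durations push the filtered location up and zeros pull it down.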
2.5 Estimation and Asymptotic Properties
Let us denote by
\[ \theta = (\omega, \beta, a, \alpha, \pi)' \]
the static parameter vector which defines the dynamics of the GAS model proposed in (16). The static parameter vector is estimated by the method of maximum likelihood,
\[ \hat{\theta} = \arg\max_{\theta \in \Theta} \mathcal{L}(\theta), \]
where $\mathcal{L}(\theta)$ denotes the log likelihood function. The log likelihood is obtained from a sequence of observations $x_1, \ldots, x_n$, which depends on the filtered time-varying parameter $\tilde{f}_i(\theta)$, and is given by
\[ \mathcal{L}(\theta) = \sum_{i=1}^{n} \ln p\left(x_i \,\middle|\, \tilde{f}_i(\theta), \theta\right). \]
In our case, the log likelihood is based on the zero-inflated negative binomial distribution,
\[ \ln p\left(x_i \,\middle|\, \tilde{f}_i, \theta\right) = \begin{cases} \ln\left(\pi + (1 - \pi)(1 + \alpha \mu_i)^{-\alpha^{-1}}\right) & \text{for } x_i = 0, \\[1ex] \ln(1 - \pi) + \ln \Gamma(x_i + \alpha^{-1}) - \ln \Gamma(\alpha^{-1}) - \ln x_i! \\ \quad - \alpha^{-1} \ln(1 + \alpha \mu_i) + x_i \ln \dfrac{\alpha \mu_i}{1 + \alpha \mu_i} & \text{for } x_i \geq 1, \end{cases} \]
where $\mu_i = \exp(\tilde{f}_i)$.
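The log likelihood can then be evaluated in a single pass that filters the location alongside the likelihood accumulation. The sketch below is our illustration (the parameter ordering is an assumption); its negative could be passed to a numerical optimizer such as scipy.optimize.minimize.

```python
import math

def ziacd_loglik(params, x):
    """Log likelihood of the ZIACD model with unit scaling for a duration
    series x; params = (omega, beta, a, alpha, pi)."""
    omega, beta, a, alpha, pi = params
    ftilde = omega / (1.0 - beta)
    loglik = 0.0
    for x_i in x:
        mu = math.exp(ftilde)
        r = 1.0 / alpha
        q = (1.0 + alpha * mu) ** (-r)
        if x_i == 0:
            loglik += math.log(pi + (1.0 - pi) * q)
            score = -(1.0 - pi) * mu * q / ((1.0 + alpha * mu)
                                            * (pi + (1.0 - pi) * q))
        else:
            loglik += (math.log(1.0 - pi) + math.lgamma(x_i + r)
                       - math.lgamma(r) - math.lgamma(x_i + 1)
                       - r * math.log(1.0 + alpha * mu)
                       + x_i * math.log(alpha * mu / (1.0 + alpha * mu)))
            score = (x_i - mu) / (1.0 + alpha * mu)
        ftilde = omega + beta * ftilde + a * score
    return loglik
```

On a zero-heavy sample, a larger zero-inflation probability pi yields a higher log likelihood, which is what the estimator exploits when identifying the share of split transactions.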
Below, we show that the maximum likelihood estimator of the ZIACD model is consistent and asymptotically normal. The proof follows the structure laid down in Blasques et al. (2014), but we focus on the particular case of discrete data with a probability mass function. In contrast, Blasques et al. (2014) treat the general case of continuous data with a smooth probability density function.
Filter invertibility is crucial for statistical inference in the context of observation-driven time-varying parameter models; see e.g. Straumann and Mikosch (2006), Wintenberger (2013) and Blasques et al. (2014). The filter $\{\tilde{f}_i(\theta, \tilde{f}_1)\}_{i \in \mathbb{N}}$ initialized at some point $\tilde{f}_1$ is said to be invertible if it converges almost surely exponentially fast to a unique limit strictly stationary and ergodic sequence $\{\tilde{f}_i(\theta)\}_{i \in \mathbb{Z}}$,
\[ \left|\tilde{f}_i(\theta, \tilde{f}_1) - \tilde{f}_i(\theta)\right| \xrightarrow{\mathrm{e.a.s.}} 0 \quad \text{as } i \to \infty. \]
Let $\mathcal{L}(\theta)$ evaluated at the limit sequence $\{\tilde{f}_i(\theta)\}$ denote the log likelihood which depends on the limit time-varying parameter, and let $\mathcal{L}_\infty(\theta)$ denote the limit log likelihood function
\[ \mathcal{L}_\infty(\theta) = \mathrm{E}\left[\ln p\left(x_i \,\middle|\, \tilde{f}_i(\theta), \theta\right)\right]. \]
Proposition 1 appeals to the results in Blasques et al. (2014) to establish the invertibility of the score filter. The proof is presented in Appendix A. In Example 1, we illustrate how invertibility can be verified in the current context. Below, we let $\mathrm{E}_i$ denote the conditional expectation given the information available at time $i$.
Proposition 1 (Filter invertibility).
Let the observed data $\{x_i\}$ be strictly stationary and ergodic and let $\Theta$ be a compact set which ensures that
Then the filter $\{\tilde{f}_i(\theta, \tilde{f}_1)\}$ is invertible, uniformly in $\theta \in \Theta$.
Consider the case of the score model for the zero-inflated negative binomial distribution with the unit scaling. We note that the conditions of Proposition 1 are easily satisfied for strictly stationary and ergodic data $\{x_i\}$ with a logarithmic moment, and for a compact parameter space $\Theta$.
We note that condition (i) of Proposition 1 holds since
which holds as the parameter vector lies on the compact set , and is a given point in . Condition (ii) of Proposition 1 holds as
since has a logarithmic moment, is compact and . Finally, the contraction condition (iii) in Proposition 1 is satisfied uniformly in since
This can be simplified by noting that
This, in turn, implies that
Proposition 1 gives us sufficient elements to characterize the asymptotic behavior of the ML estimator. Theorem 1 below establishes the strong consistency of the ML estimator as the sample size diverges to infinity. The proof is presented in Appendix A and is based on the theory laid down in Blasques et al. (2014). The proof relies on the shape of the log likelihood function for the zero-inflated negative binomial model. Theorem 1 uses the invertibility properties established in Proposition 1 for our zero-inflated negative binomial score model and obtains the consistency of the ML estimator by imposing some additional moment conditions. The moment conditions in Theorem 1 are written as high-level conditions that apply to most ML estimation settings. These include a bounded moment for the log likelihood and a logarithmic moment for the score. The high-level formulation of these assumptions gives us flexibility in applying the results to a wide range of designs of our score model, although it is admittedly abstract. In Example 2 below, however, we note that the moment assumptions are directly implied by a single moment bound on the data. The derivations in this example also make clear that the same result applies to many formulations of the score model for the zero-inflated negative binomial distribution.
Theorem 1 (Consistency of the ML estimator).
Let the conditions of Proposition 1 hold, let the likelihood have one bounded moment and let the score have a logarithmic moment.
Finally, suppose that $\theta_0$ is the unique maximizer of the limit log likelihood function $\mathcal{L}_\infty(\theta)$ over the parameter space $\Theta$. Then $\hat{\theta} \to \theta_0$ almost surely as $n \to \infty$.
Consider again the score model for the zero-inflated negative binomial distribution with the unit scaling. The bounded moment for the log likelihood stated in Theorem 1 holds trivially if the data have a bounded moment. This follows directly from the fact that the log likelihood is bounded in $\mu_i$ and bounded by a linear function in $x_i$.
Note that since we use the unit scaling in Theorem 1, the scaled score reduces to the score itself, $\tilde{S}(\tilde{f}_i) \tilde{\nabla}_i = \tilde{\nabla}_i$.
Finally, Theorem 2 establishes the $\sqrt{n}$-consistency rate of $\hat{\theta}$ and the asymptotic normality of the standardized estimator $\sqrt{n}(\hat{\theta} - \theta_0)$ as $n \to \infty$. We follow Blasques et al. (2014) closely, but formulate somewhat higher-level assumptions that allow us to be more concise than the primitive assumptions explored in Blasques et al. (2014). The proof is presented in Appendix A. After Theorem 2, we use an example to illustrate how these conditions can be verified in the current context.
Theorem 2 (Asymptotic normality of the ML estimator).
Let the conditions of Theorem 1 hold. Furthermore, let the zero-inflated negative binomial score model be correctly specified and let $\theta_0$ lie in the interior of $\Theta$. Additionally, assume that
the first-order derivatives of the log likelihood have four bounded moments at $\theta_0$,
the second-order derivatives of the log likelihood have one uniform bounded moment,
the third-order derivatives of the log likelihood have a uniform logarithmic bounded moment,
the first and second derivatives of the filtering process converge almost surely, exponentially fast, to a limit stationary and ergodic sequence with four bounded moments.
Then the estimator is asymptotically Gaussian,
\[ \sqrt{n}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} \mathrm{N}\left(0, \mathcal{I}^{-1}(\theta_0)\right) \quad \text{as } n \to \infty. \]