Minimum reversion in multivariate time series

11/26/2018 ∙ by Torsten Kleinow, et al. ∙ University of Amsterdam ∙ Heriot-Watt University

We propose a new multivariate time series model in which we assume that each component has a tendency to revert to the minimum of all components. Such a specification is useful to describe phenomena where each member of a population, while subject to random noise, mimics the behaviour of the best-performing member. We show that the proposed dynamics generate co-integrated processes. We characterize the model's asymptotic properties for the case of two populations and show a stabilizing effect on long-term dynamics in simulation studies. An empirical study involving human survival data in different countries provides an example which confirms the occurrence of the phenomenon of reversion to the minimum in real data.




1 Introduction

When multivariate time series are used to describe the joint dynamics of stochastic processes, often an a priori assumption of a stabilizing mechanism is made. We expect, for example, that many economic variables will fluctuate randomly over time, but we do not find it plausible that they will diverge while doing so. This is because certain long-term equilibrium relationships are assumed to be present between such variables, despite the volatility we see in them over short time horizons. Many economic examples are now known (see for example Engle & Granger (1987)) and other fields in which such co-integration relationships are found include neuroscience (Østergaard et al. (2017)) and gene differentiation in populations (Hössjer & Ryman (2014)).

A possible model feature that implements such a stabilizing mechanism in multivariate time series is mean reversion: the tendency of all components to drift towards a constant, which has the interpretation of the long term average of the series. Adding a mean reversion term to all increments of a multivariate discrete random walk makes the series second-order stationary. This also guarantees that the distance between any two components will be stationary.

In this paper we propose a different mechanism: instead of making all components tend to a priori chosen constants, we impose that at every time step they move, in expectation, towards the minimal value among all components. When different components in the time series signify the value of a certain common variable among different groups, such an assumption can be interpreted as all groups mimicking the behaviour of the group which currently has the ’best’ value [footnote 1: We consider reversion to the minimum in this paper and only consider biometric variables for which low values are deemed to be best. One could of course also apply our analysis in cases where there is a drift towards the maximum. Our choice here is motivated by the particular empirical example we give in the last section, where we consider multivariate time series for human mortality.].

If knowledge about what helps or hinders a group in minimizing a certain quantitative indicator is communicated between different groups, and each group is capable of using this knowledge to its benefit, the best practices are implemented in all groups over time. The overall effect would be that the group which has achieved the best (i.e. the minimal) value would be ’followed’ over time by those with worse achievements, as they ’learn’ or ’mimic’ what is beneficial. We will show that this effect leads to a co-integration relationship between groups, but also to an additional downward drift that no group would achieve on its own or if the minimum reversion effect were not present. In that sense, learning from the best performer generates improvements for all that would not occur if individual groups were left to their own devices.

This hypothesis of ’reversion to the minimum’ in multivariate time series can be tested by specifying models with and without such an effect and applying the Bayesian Information Criterion to compare the goodness of fit. In this paper we provide such specifications, analyse the properties of models where it is present, and give an empirical example of a specific time series where clear evidence for this effect is found.

There are a number of models in the literature that are related to the one we propose in this paper. Systems of diffusion processes have been proposed in which the drift and diffusion coefficients of an individual process depend on the rank of that process within the system. For example, Fernholz (2002) and Banner et al. (2005) introduce the Atlas model to describe the market capitalization of firms in equity markets, and Ichiba et al. (2013) apply their model to define optimal investment strategies. The Atlas model is applied by Sartoretti & Hongler (2013) to describe the dynamics of a swarm, and Ruzmaikina & Aizenman (2005) and Shkolnikov (2009) consider the evolution of competing particle systems in discrete time and study the distribution of the gaps between the particles. Balázs et al. (2014) introduced a continuous time Markov jump model for interacting particles with a jump rate which depends linearly on the distance of the particle from the center of mass of the whole group.

In contrast to the models proposed in the literature, we consider a system of discrete time processes under the assumption that the distribution of each component’s stochastic increments does not depend on the rank of the component but on its distance to the minimum component. This feature leads to multivariate time series with short term volatility and long term equilibrium relationships, while it gives a clear interpretation of the presence of the stabilizing mechanism in the long run.

The remainder of the paper is organised as follows. In Section 2 we introduce our model and analyse some of its theoretical properties. Section 3 presents empirical evidence that a ’minimum reversion’ effect can be found in human mortality data. Finally, we provide some conclusions and suggestions for further research in Section 4.

2 Minimum Reversion Model

2.1 Specification

Let for certain with and let be a multivariate time series in with components with for a given . We denote the minimum value among the different components at time by


and specify the following model for the dynamics of the components with :


The , , and are given constants. We define and assume that the are independent and identically distributed

-dimensional random variables with a multivariate Gaussian distribution. More precisely, we assume that


for independent standard Gaussian variables , which creates a correlation structure parameterized by the constants .
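The equations of this subsection can be sketched in LaTeX as follows. The symbols used here ($n$, $X_i$, $M$, $\mu_i$, $\lambda_i$, $\phi_i$, $\sigma_{ij}$, $Z_j$) and the exact time indexing are assumptions chosen to match the verbal description; they are not notation confirmed by the text. Only the equation numbers (1) and (2) are taken from the cross-references in the text.

```latex
% Assumed notation: a sketch consistent with the verbal description only.
\begin{align}
M(t) &= \min_{1 \le i \le n} X_i(t), \tag{1} \\
X_i(t+1) - X_i(t) &= \mu_i - \lambda_i \bigl( X_i(t) - M(t) \bigr)
   + \phi_i \bigl( X_i(t) - X_i(t-1) \bigr) + \varepsilon_i(t+1), \tag{2} \\
\varepsilon_i(t) &= \sum_{j=1}^{n} \sigma_{ij} Z_j(t),
   \qquad Z_j(t) \sim N(0,1) \ \text{i.i.d.} \notag
\end{align}
```

Under this reading, with $\lambda_i > 0$ the second term on the right-hand side of (2) vanishes for the component that currently attains the minimum and is negative for all others.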

The constant drift , the first order autoregression coefficient for the differenced series and the Gaussian increment are standard features in time series modelling. Our innovation concerns the parameters which quantify the ’learning effect’. To discuss the properties of the newly proposed term in (2) we first consider the model with and . This introduces a downward drift for when is not the component with the lowest current value.
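A minimal simulation sketch of such dynamics is given below. It assumes the update rule "increment = drift, minus reversion strength times distance to the current minimum, plus autoregression coefficient times previous increment, plus correlated Gaussian noise"; the function name, argument names and the convention that the increment before the start is zero are all hypothetical choices made for illustration.

```python
import random


def simulate_min_reversion(x0, mu, lam, phi, chol, steps, seed=0):
    """Simulate one path of an assumed minimum reversion model.

    Assumed update (not quoted from the text):
        dX_i(t+1) = mu_i - lam_i * (X_i(t) - M(t)) + phi_i * dX_i(t) + eps_i(t+1)
    where M(t) = min_i X_i(t) and eps = chol @ Z for i.i.d. standard normal Z.
    Returns a list of states, each a list of n floats.
    """
    rng = random.Random(seed)
    n = len(x0)
    x = list(x0)
    prev_dx = [0.0] * n  # assumption: no increment before the first step
    path = [list(x)]
    for _ in range(steps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlated increments built from independent standard normals.
        eps = [sum(chol[i][j] * z[j] for j in range(n)) for i in range(n)]
        m = min(x)
        dx = [mu[i] - lam[i] * (x[i] - m) + phi[i] * prev_dx[i] + eps[i]
              for i in range(n)]
        x = [x[i] + dx[i] for i in range(n)]
        prev_dx = dx
        path.append(list(x))
    return path


# Usage: three components, no drift, no autoregression, independent unit noise.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
path = simulate_min_reversion([0.0, 1.0, 2.0], [0.0] * 3, [0.1] * 3,
                              [0.0] * 3, identity, steps=500)
print(min(path[-1]), max(path[-1]))
```

Setting the noise matrix to zero makes the mechanism visible in isolation: every component above the minimum closes a fixed fraction of its gap at each step, while the minimum itself stays put.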

The specification in (2) with and also creates a downward trend in the minimum process defined in (1). To form some intuition for this result we consider the case where for all , that is, the are independent for any fixed . In that case we have

Substituting gives an expression for and since for all and for at least one

, this probability must be strictly greater than

and smaller than or equal to . This establishes that the probability of a downward movement is always greater than and that it is largest when all components attain the same value, that is, for all . The probability is increasing with the number of components .
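The bounds described above can be checked numerically. The sketch below assumes zero drift, no autoregressive term, independent Gaussian increments, and reversion parameters between 0 and 1; under those assumptions the minimum fails to decrease only if every component stays at or above the current minimum, which gives a product of Gaussian probabilities. The function name and arguments are hypothetical.

```python
from statistics import NormalDist


def prob_minimum_decreases(x, lam, sigma):
    """Probability that the minimum moves down in one step.

    Assumes X_i(t+1) = X_i(t) - lam_i * (X_i(t) - M(t)) + eps_i with independent
    eps_i ~ N(0, sigma_i^2), zero drift and no autoregressive term.
    """
    m = min(x)
    cdf = NormalDist().cdf
    p_no_drop = 1.0
    for xi, li, si in zip(x, lam, sigma):
        # Component i stays at or above m iff eps_i >= -(1 - lam_i) * (x_i - m).
        p_no_drop *= cdf((1.0 - li) * (xi - m) / si)
    return 1.0 - p_no_drop


# All components equal: the probability reaches its upper bound 1 - 2**(-n).
print(prob_minimum_decreases([0.0, 0.0, 0.0], [0.1] * 3, [1.0] * 3))
# One component far above the other: the probability approaches 1/2.
print(prob_minimum_decreases([0.0, 100.0], [0.1] * 2, [1.0] * 2))
```

The component that attains the minimum always contributes a factor of one half, which is why the probability cannot fall to one half or below as long as the other factors are smaller than one.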

Individual components are thus not stationary but they turn out to be co-integrated, as the following result shows.

Proposition 1.

If all processes in (2) have a common minimum reversion parameter and a common drift and there is no autoregressive term, so , and for all , then the processes are co-integrated.


Fix a and define for any . We then find for any

Since we obtain that is a stationary AR(1) process for all .

Furthermore, we find that and therefore

The first term in the last expression is a minimum over stationary processes and the other terms are stationary too, hence is stationary. ∎
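The key step of the proof can be written out as follows. The notation is assumed for illustration: a common drift $\mu$ and reversion parameter $\lambda$, no autoregressive term, a fixed reference component $j$, and $Y_i(t) = X_i(t) - X_j(t)$. The update rule used is the form suggested by the verbal description of the model, not a formula quoted from the text.

```latex
% Assumed notation: Y_i(t) = X_i(t) - X_j(t), M(t) = \min_k X_k(t).
\begin{align*}
Y_i(t+1) &= Y_i(t) - \lambda\bigl(X_i(t) - M(t)\bigr) + \lambda\bigl(X_j(t) - M(t)\bigr)
            + \varepsilon_i(t+1) - \varepsilon_j(t+1) \\
         &= (1-\lambda)\,Y_i(t) + \varepsilon_i(t+1) - \varepsilon_j(t+1), \\
M(t) - X_j(t) &= \min_{1 \le k \le n} Y_k(t), \\
X_j(t+1) - X_j(t) &= \mu + \lambda \min_{1 \le k \le n} Y_k(t) + \varepsilon_j(t+1).
\end{align*}
```

The common drift and the minimum cancel in the difference, so each $Y_i$ is an AR(1) process with coefficient $1-\lambda$, which is stationary when $|1-\lambda| < 1$.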

2.2 An analysis of the two-dimensional case

To characterize the dynamics of minimum reversion, and to show that it induces a downward drift even for the case where the drift parameters are taken to be zero, we look at the simplest non-trivial case, that of two dimensions. This means the dynamics of the time series become:


Note that we take and for the analysis in this section and we assume that the correlation between the i.i.d. Gaussian variables and equals for all .

We define [footnote 2: If we let and , but this event has zero probability for all .]

Proposition 2.

For the model (4)-(5) with we have that




with .


Let the indices of the minimum and maximum be denoted by and . We first determine the distribution of the minimum and maximum [footnote 3: Since the variables are Gaussian, this distribution can be obtained directly as well; see Roberts (1966)]. Substituting in (4)-(5) gives, for


We find

The term is independent of

and has a Gaussian distribution with mean zero and variance

so for all . We thus find that equals [footnote 4: Here and in the sequel we use to denote the cumulative distribution function of a standard Gaussian random variable.]


if we define the shorthand notation and . The function increases monotonically from its minimum for towards zero for . Likewise,

Since for a

with the standard normal distribution, we must have that


Let , and define , then we have for and that which shows that the process satisfies the geometric ergodicity conditions in Theorem 15.0.1 of Meyn et al. (2009) on the domain . This proves that for any choice of the distribution function of converges to a stationary distribution . This implies that for the function should satisfy

The solution to this equation is with the parameter , i.e. the stationary distribution of is the distribution of if . The stationary distribution of is therefore the distribution of , which gives (7).

Due to (8), the expected change in under the stationary distribution for equals plus the effect of the reversion to the minimum. This effect was shown above to be which gives the second equality in (6) after some rewriting. The first one follows from the fact that converges to a stationary distribution. ∎
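The two-dimensional limits can be evaluated in closed form under a simple reading of the model. The sketch below assumes zero drift, no autoregressive term, a common reversion parameter, and increments with common standard deviation and a given correlation; the difference of the two components is then a Gaussian AR(1) process, and the expected gap is the mean of its absolute value. These expressions are derived here from that assumption rather than copied from (6) and (7), and the function names are hypothetical; for unit standard deviation and zero correlation they agree with the "Exact" columns of Tables 1 and 2.

```python
import math


def stationary_gap(lam, sigma=1.0, rho=0.0):
    """Stationary expectation of max minus min for two components.

    Assumes D(t+1) = (1 - lam) * D(t) + eta with Var(eta) = 2 * sigma^2 * (1 - rho),
    so D is Gaussian with variance Var(eta) / (1 - (1 - lam)^2) and E|D| follows.
    """
    var_d = 2.0 * sigma ** 2 * (1.0 - rho) / (lam * (2.0 - lam))
    return math.sqrt(2.0 / math.pi) * math.sqrt(var_d)


def extra_drift(lam, sigma=1.0, rho=0.0):
    """Stationary expected increment per component caused by minimum reversion.

    Each component is the maximum half of the time (by symmetry), and only the
    maximum is pulled down, by lam times the gap.
    """
    return -0.5 * lam * stationary_gap(lam, sigma, rho)


for lam in (0.0125, 0.1, 0.4):
    print(lam, round(extra_drift(lam), 4), round(stationary_gap(lam), 4))
```

Both quantities scale with the square root of one minus the correlation, so strongly correlated increments reduce the gap and the induced drift alike.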

If we choose certain parameter values, the asymptotic drifts in Proposition 2 can also be estimated by Monte Carlo simulations. We generated paths to approximate the stationary distributions and used these to estimate the limit of the expectation of the extra drift (Table 1) and the limit of the expectation of the difference between the minimum and maximum (Table 2) for , and in the absence of correlation between the increments for the two time series. Simulations also allow us to estimate the extra drift generated by the minimum reversion term in more than two dimensions, i.e. for , and we show some examples in Tables 1 and 2. As expected, we find that both the magnitude of the drift generated by the minimum reversion and the difference between the minimum and maximum increase with the population size . The former increases and the latter decreases when the strength of the reversion to the minimum, which is determined by the parameter , increases.
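A simulation of this kind can be sketched as follows. Note the swap in technique: instead of generating many independent paths as described above, this sketch averages along one long path after a burn-in period, which estimates the same stationary expectations when the process is ergodic. It assumes zero drift, no autoregressive term, and independent standard Gaussian increments; the function name and its defaults are hypothetical.

```python
import random


def estimate_stationary_drift(n, lam, steps=100_000, burn_in=1_000, seed=1):
    """Estimate the stationary extra drift and the stationary max-min gap.

    Assumed update: X_i(t+1) = X_i(t) - lam * (X_i(t) - M(t)) + Z_i(t+1),
    with Z_i i.i.d. standard normal. Uses a time average over a single path.
    Returns (drift estimate, gap estimate).
    """
    rng = random.Random(seed)
    x = [0.0] * n
    total_pull = 0.0   # running sum of the average distance to the minimum
    total_range = 0.0  # running sum of max minus min
    for t in range(burn_in + steps):
        m = min(x)
        if t >= burn_in:
            total_pull += sum(xi - m for xi in x) / n
            total_range += max(x) - m
        x = [xi - lam * (xi - m) + rng.gauss(0.0, 1.0) for xi in x]
    return -lam * total_pull / steps, total_range / steps


drift, gap = estimate_stationary_drift(n=2, lam=0.4)
print(drift, gap)  # to be compared with the first simulation column of Tables 1 and 2
```

The same function can be called with larger values of n to explore how the induced drift and the gap grow with the number of components.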

Exact Simulation
0.0125 -0.0447 -0.0448 -0.0671 -0.0817 -0.1129 -0.1401 -0.1641
0.025 -0.0635 -0.0635 -0.0952 -0.1158 -0.1602 -0.1987 -0.2329
0.05 -0.0903 -0.0903 -0.1355 -0.1649 -0.2280 -0.2828 -0.3314
0.1 -0.1294 -0.1294 -0.1942 -0.2362 -0.3266 -0.4052 -0.4748
0.2 -0.1881 -0.1881 -0.2821 -0.3432 -0.4745 -0.5887 -0.6899
0.4 -0.2821 -0.2821 -0.4231 -0.5147 -0.7118 -0.8830 -1.0348
Table 1: Generated drift for different minimum reversion parameters according to (6) (second column) and using simulations (third to last column).
Exact Simulation
0.0125 7.1589 7.1593 10.7386 13.0627 18.0644 22.4106 26.2618
0.025 5.0781 5.0774 7.6185 9.2656 12.8138 15.8962 18.6282
0.05 3.6137 3.6132 5.4202 6.5931 9.1178 11.3112 13.2565
0.1 2.5887 2.5886 3.8831 4.7229 6.5316 8.1031 9.4965
0.2 1.8806 1.8806 2.8209 3.4313 4.7454 5.8866 6.8988
0.4 1.4105 1.4104 2.1156 2.5735 3.5591 4.415 5.1742
Table 2: Expectation of the stationary distribution for for different minimum reversion parameters according to (7) (second column) and using simulations (third to last column).

3 Evidence for a Learning Effect in Mortality Rates

In this section we use the time series model in (2) to model mortality rates in several countries. The implicit assumption is that a population with a high mortality rate ”copies” the behaviour of individuals in a population with low mortality. Such a model is consistent with the spread of medical advances and changes in behaviour like a reduction in smoking prevalence.

3.1 Modelling Mortality - The Common Age Effect Model

To obtain the time series, we first fit a stochastic mortality model to observed death counts. We assume that the number of deaths, , in population at age in calendar year

is a random variable with a Poisson distribution, that is,


where is the hazard rate (also known as the ”force of mortality” in the literature) and refers to the central exposure to risk.
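The Poisson assumption translates directly into a log-likelihood for observed death counts. The sketch below assumes that the Poisson mean is the hazard rate multiplied by the central exposure to risk, which is the usual convention in this setting; the function and argument names are hypothetical, and the inputs are flat sequences with one entry per combination of population, age and calendar year.

```python
import math


def poisson_log_likelihood(deaths, exposures, hazards):
    """Sum of Poisson log-probabilities of the death counts.

    Assumes each count has mean hazard * exposure (an assumption about the
    stripped equation, based on the description of the two quantities).
    """
    total = 0.0
    for d, e, m in zip(deaths, exposures, hazards):
        mean = m * e
        # log of exp(-mean) * mean**d / d!
        total += d * math.log(mean) - mean - math.lgamma(d + 1)
    return total


# Usage with made-up numbers: 2 deaths observed among 200 person-years.
for hazard in (0.005, 0.01, 0.02):
    print(hazard, poisson_log_likelihood([2], [200.0], [hazard]))
```

For a single cell the log-likelihood is maximised when the hazard equals deaths divided by exposure, which is 0.01 in the usage example.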

We then use a stochastic model for the force of mortality that incorporates the population-specific time series defined in (2). As we are modelling the mortality in multiple populations simultaneously and wish to make our model suitable for a wide age range, we use a modification of the Lee-Carter model (Lee & Carter (1992)) with common age effects, as suggested by Kleinow (2015):


where the age effects and do not depend on the population . Having age effects that are common to all populations ensures that the individual components

(known as ”period effects”) are comparable across populations as they are rescaled by a common vector

and shifted by a common vector . The parameters in (10) are not identifiable since

for any real numbers and . To identify a unique set of parameters, we impose the following constraints on the parameter vectors and :

for a fixed reference age . Applying those two constraints means that for in every population . In other words, we can interpret the period effect as the fitted log mortality rate at the reference age in population . For our empirical analysis we choose the reference age .
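The effect of the identifiability constraints can be illustrated with a small sketch. It assumes that the log hazard rate is a common age-specific level plus a common age-specific loading times the period effect, and that the constraints fix the level to zero and the loading to one at the reference age; both are inferred from the statement that the period effect equals the fitted log mortality rate at the reference age. All names (`alpha`, `beta`, `kappa`, `normalise_cae`) are hypothetical.

```python
def normalise_cae(alpha, beta, kappa, ref):
    """Rescale parameters so that alpha[ref] == 0 and beta[ref] == 1.

    alpha, beta: dicts mapping age to value (common to all populations).
    kappa: dict mapping (year, population) to the period effect.
    The fitted values alpha[x] + beta[x] * kappa[key] are left unchanged.
    Requires beta[ref] != 0.
    """
    a0, b0 = alpha[ref], beta[ref]
    new_beta = {x: b / b0 for x, b in beta.items()}
    new_alpha = {x: alpha[x] - new_beta[x] * a0 for x in alpha}
    new_kappa = {key: a0 + b0 * k for key, k in kappa.items()}
    return new_alpha, new_beta, new_kappa


def log_rate(alpha, beta, kappa, age, key):
    """Fitted log hazard rate under the assumed model form."""
    return alpha[age] + beta[age] * kappa[key]


# Usage with made-up numbers.
alpha = {60: -4.0, 70: -3.0}
beta = {60: 2.0, 70: 3.0}
kappa = {(2000, "A"): 0.5}
a, b, k = normalise_cae(alpha, beta, kappa, ref=60)
print(a, b, k)
```

After normalisation the period effect for each population and year coincides with the fitted log rate at the reference age, which is what makes the period effects directly comparable across populations.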

We estimate the age effects and and the period effects using the maximum likelihood method based on the Poisson model in (9) for observed death counts and exposure values .

Figure 1: Estimated values of (left) and (right) for the mortality rates of females based on data for the period 1921–2011. We can clearly see the phenomenon known as ”age heaping”: when people report the death of a person without knowing their exact age, they often report the age to be a multiple of .

3.2 Data

The empirical death counts and exposure data have been obtained from the Human Mortality Database (HMD), see HMD (2018). We consider data for ages and for two ranges of calendar years: 1921–2011 or 1951–2011. Table 3 shows a list of countries included in our study together with the HMD country codes. As indicated in the table, there are some countries for which data from 1921 are not available, and those countries have been excluded from our empirical study for that range of years.

Country HMD Code 1921–2011 Country HMD Code 1921–2011
The Netherlands NLD Sweden SWE
Denmark DNK Belgium BEL
Finland FIN England & Wales GBRTENW
France FRATNP Switzerland CHE
Australia AUS Italy ITA
Austria AUT Ireland IRL
Norway NOR Japan JAP
Canada CAN New Zealand NZL_NP
Portugal PRT Spain ESP
Table 3: List of countries included in our study. For all countries mortality data for males and females are considered for calendar years 1951–2011. The table also indicates which countries are included in our study for the longer observation period 1921–2011. Note that for France and England & Wales the total populations are considered rather than the civilian population.

3.3 Empirical Results

Figure 1 shows the estimated values of the age effects and obtained from data for females based on the observation period 1921–2011. The age effects for the other data sets are not shown since they all have a very similar shape. To illustrate our data and the impact of minimum reversion, Figure 2 shows the estimated values of for females for the years 1921–2011, together with two projected scenarios based on the model in (2). On the left-hand side the projection includes the minimum reversion effect based on the estimated parameter , and on the right-hand side we set .

To investigate the significance of the learning effect, we compare the Bayesian Information Criterion (BIC) for models without minimum reversion () with the more general model in (2). If k denotes the number of parameters and n is the total number of observations across all populations and all years, the BIC value equals k ln(n) − 2 ln(L), where L is the maximum value of the likelihood function for the model in (2), so that lower BIC values indicate a better trade-off between fit and complexity. This means

where is a constant which does not depend on the parameters that must be estimated, and are vectors with components

and is the covariance matrix of so for and .
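The BIC comparison itself is simple arithmetic once the maximised log-likelihood is available. The sketch below uses the standard definition of the BIC and the log-likelihoods and parameter counts reported in Table 4 for females in 1921–2011. The number of observations, 1246, is not stated in the text; it is an assumption, used here only because it reproduces the reported BIC values to within rounding. The function name is hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values are better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


N_OBS = 1246  # assumption: inferred from the reported BIC values, not given in the text

bic_without = bic(3886.16, 30, N_OBS)  # nested model without minimum reversion
bic_with = bic(3890.61, 31, N_OBS)     # model with minimum reversion
print(bic_without, bic_with, bic_with < bic_without)
```

Adding one parameter costs ln(n) in the criterion, so the extra term is favoured only if it raises the log-likelihood by more than half of that amount.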

We determined the BIC values for different specifications in which parameter values may or may not be constrained to be the same for all different groups in . Based on this analysis, the parameters and are taken to be population-specific while the parameters , and are common to all populations. The obtained BIC values and parameter estimates are shown in Table 4 for four data sets [footnote 5: We do not show the values and for every group in but only report , the average of over all groups , and , which is defined by the requirement that is the average of over all groups . Estimated parameter values for all countries, genders and data periods are available upon request.].

Model                      Log-likelihood  Parameters  BIC       Parameter estimates
Females in calendar years 1921–2011
Without minimum reversion  3886.16         30          -7558.49  -0.0224  n/a     -0.2998  0.0338  0.4996
With minimum reversion     3890.61         31          -7560.27  -0.0194  0.0191  -0.2895  0.0336  0.4977
Males in calendar years 1921–2011
Without minimum reversion  4151.49         30          -8089.15  -0.0150  n/a     -0.1890  0.0297  0.5207
With minimum reversion     4155.65         31          -8090.35  -0.0131  0.0185  -0.1804  0.0295  0.5224
Females in calendar years 1951–2011
Without minimum reversion  3854.23         42          -7411.38  -0.0210  n/a     -0.3414  0.0288  0.4717
With minimum reversion     3861.62         43          -7419.08  -0.0161  0.0177  -0.3394  0.0285  0.4641
Males in calendar years 1951–2011
Without minimum reversion  4100.51         42          -7903.94  -0.0246  n/a     -0.3320  0.0252  0.5845
With minimum reversion     4105.98         43          -7907.81  -0.0192  0.0167  -0.3219  0.0250  0.5832
Table 4: Goodness of fit and parameter values of the model in (2) without and with minimum reversion, estimated for different data sets.

We notice that the BIC values always improve when is not restricted to be zero: in Table 4 the BIC decreases by between 1.2 and 7.7 points across the four data sets, which shows that the learning effect adds to the goodness of fit of the model. We also observe that the estimated drift parameter is smaller in absolute value when minimum reversion is included in the model, from which we conclude that some of the mortality improvements in the populations are driven by learning effects from others who have lower mortality rates.

4 Conclusions and Further Research

Figure 2: Estimated values of for the mortality rates of females based on data for the period 1921–2011. The simulated scenarios are based on the full model in (2) (left) and the nested model with (right).

We have shown that model specifications that include a minimum reversion term resulted in better BIC values than specifications without such a term when they were fitted to several datasets from the Human Mortality Database. Visual inspection of the projections generated by the time series with minimum reversion in Figure 2 also shows a clear improvement. This testifies to the usefulness of incorporating a ”learning effect” in the time series.

Several extensions of the proposed model are possible. In particular, the minimum in (2) could be replaced by other rank statistics. Another possible direction of research is the study of a continuous time version of minimum reversion, where the drift term of a diffusion process in multiple dimensions is a function of the distance of the process to the minimum of its components. The properties of these processes could then be compared to the rank-based diffusion processes for particle systems mentioned in the Introduction of this paper.


  • Balázs et al. (2014) Balázs, M., Rácz, M. Z. & Tóth, B. (2014). Modeling flocks and prices: Jumping particles with an attractive interaction. Annales de l’Institut Henri Poincaré - Probabilités et Statistiques 50, 425–454.
  • Banner et al. (2005) Banner, A. D., Fernholz, R. & Karatzas, I. (2005). Atlas Model of Equity Markets. The Annals of Applied Probability 15, 2296–2330.
  • Engle & Granger (1987) Engle, R. F. & Granger, C. W. J. (1987). Co-integration and error correction: Representation, estimation, and testing. Econometrica 55, 251–276.
  • Fernholz (2002) Fernholz, E. R. (2002). Stochastic Portfolio Theory. Springer.
  • HMD (2018) HMD (2018). Human Mortality Database: University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Available at or (data downloaded on 30 April 2018).
  • Hössjer & Ryman (2014) Hössjer, O. & Ryman, N. (2014). Quasi equilibrium, variance effective size and fixation index for populations with substructure. Journal of Mathematical Biology 69, 1057–1128.
  • Ichiba et al. (2013) Ichiba, T., Pal, S. & Shkolnikov, M. (2013). Convergence rates for rank-based models with applications to portfolio theory. Probab. Theory Relat. Fields 156, 415–448.
  • Kleinow (2015) Kleinow, T. (2015). A common age effect model for the mortality of multiple populations. Insurance: Mathematics and Economics 63, 147–152. Special Issue: Longevity Nine - the Ninth International Longevity Risk and Capital Markets Solutions Conference.
  • Lee & Carter (1992) Lee, R. D. & Carter, L. R. (1992). Modeling and Forecasting U.S. Mortality. Journal of the American Statistical Association 87, 659–675.
  • Meyn et al. (2009) Meyn, S., Tweedie, R. L. & Glynn, P. W. (2009). Markov Chains and Stochastic Stability. Cambridge Mathematical Library. Cambridge University Press, 2nd ed.
  • Østergaard et al. (2017) Østergaard, J., Rahbek, A. & Ditlevsen, S. (2017). Oscillating systems with cointegrated phase processes. Journal of Mathematical Biology 75, 845–883.
  • Roberts (1966) Roberts, C. (1966). A correlation model useful in the study of twins. Journal of the American Statistical Association 61, 1184–1190.
  • Ruzmaikina & Aizenman (2005) Ruzmaikina, A. & Aizenman, M. (2005). Characterization of invariant measures at the leading edge for competing particle systems. The Annals of Probability 33, 82–113.
  • Sartoretti & Hongler (2013) Sartoretti, G. A. & Hongler, M.-O. (2013). Soft control of swarms: analytical approach. Proceedings of the 5th International Conference on Agents and Artificial Intelligence.

  • Shkolnikov (2009) Shkolnikov, M. (2009). Competing Particle Systems Evolving by I.I.D. Increments. Electronic Journal of Probability 14, 728–751.