1 Introduction
When multivariate time series are used to describe the joint dynamics of stochastic processes, an a priori assumption of a stabilizing mechanism is often made. We expect, for example, that many economic variables will fluctuate randomly over time, but we do not find it plausible that they will diverge while doing so. This is because certain long-term equilibrium relationships are assumed to be present between such variables, despite the volatility we see in them over short time horizons. Many economic examples are now known (see for example Engle & Granger (1987)) and other fields in which such cointegration relationships are found include neuroscience (Østergaard et al. (2017)) and gene differentiation in populations (Hössjer & Ryman (2014)).
A possible model feature that implements such a stabilizing mechanism in multivariate time series is mean reversion: the tendency of all components to drift towards a constant, which has the interpretation of the long-term average of the series. Adding a mean reversion term to all increments of a multivariate discrete random walk makes the series second-order stationary. This also guarantees that the distance between any two components will be stationary.
In this paper we propose a different mechanism: instead of making all components tend to a priori chosen constants, we impose that at every time step they move, in expectation, towards the minimal value among all components. When different components in the time series signify the value of a certain common variable among different groups, such an assumption can be interpreted as all groups mimicking the behaviour of the group which currently has the ’best’ value.^{1}
^{1} We consider reversion to the minimum in this paper and only consider biometric variables for which low values are deemed to be best. One could of course also apply our analysis in cases where there is a drift towards the maximum. Our choice here is motivated by the particular empirical example we give in the last section, where we consider multivariate time series for human mortality.
If knowledge about what is beneficial or detrimental for a group to minimize a certain quantitative indicator is communicated between different groups, and each group is capable of using this knowledge to their benefit, the best practices are implemented in all groups over time. The overall effect would be that the group which has achieved the best (i.e. the minimal) value would be ’followed’ over time by those with worse achievements, as they ’learn’ or ’mimic’ what is beneficial. We will show that this effect will lead to a cointegration relationship between groups but also to an additional downward drift that no group would achieve on its own or if the minimum reversion effect were not present. In that sense, learning from the best performer generates improvements for all that would not occur if individual groups were left to their own devices.
This hypothesis of ’reversion to the minimum’ in multivariate time series can be tested by specifying models with and without such an effect and applying the Bayesian Information Criterion to compare the goodness of fit. In this paper we provide such specifications, analyse the properties of models where it is present, and give an empirical example of a specific time series where clear evidence for this effect is found.
There are a number of models in the literature that are related to the one we propose in this paper. Systems of diffusion processes have been proposed in which the drift and diffusion coefficients of an individual process depend on the rank of that process within the system. For example, Fernholz (2002) and Banner et al. (2005) introduce the Atlas model to describe the market capitalization of firms in equity markets, and Ichiba et al. (2013) apply their model to define optimal investment strategies. The Atlas model is applied by Sartoretti & Hongler (2013) to describe the dynamics of a swarm, and Ruzmaikina & Aizenman (2005) and Shkolnikov (2009) consider the evolution of competing particle systems in discrete time and study the distribution of the gaps between the particles. Balázs et al. (2014) introduced a continuous time Markov jump model for interacting particles with a jump rate which depends linearly on the distance of the particle from the center of mass of the whole group.
In contrast to the models proposed in the literature, we consider a system of discrete time processes under the assumption that the distribution of each component’s stochastic increments does not depend on the rank of the component but on its distance to the minimum component. This feature leads to multivariate time series with short-term volatility and long-term equilibrium relationships, while it gives a clear interpretation of the presence of the stabilizing mechanism in the long run.
The remainder of the paper is organised as follows. In Section 2 we introduce our model and analyse some of its theoretical properties. Section 3 presents a study in which we provide empirical evidence that a ’minimum reversion’ effect can be found in human mortality data. Finally, we provide some conclusions and suggestions for further research in Section 4.
2 Minimum Reversion Model
2.1 Specification
Let for certain with and let be a multivariate time series in with components with for a given . We denote the minimum value among the different components at time by
(1) 
and specify the following model for the dynamics of the components with :
(2) 
The , , and are given constants. We define and assume that the are independent and identically distributed
dimensional random variables with a multivariate Gaussian distribution. More precisely, we assume that
(3) 
for independent standard Gaussian variables , which creates a correlation structure parameterized by the constants .
The constant drift , the first order autoregression coefficient for the differenced series and the Gaussian increment are standard features in time series modelling. Our innovation concerns the parameters which quantify the ’learning effect’. To discuss the properties of the newly proposed term in (2) we first consider the model with and . This introduces a downward drift for when is not the component with the lowest current value.
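As a rough illustration of these dynamics, the following Python sketch simulates a single path under an assumed additive form of (2): each increment is the sum of a constant drift, a pull towards the current minimum, an autoregressive term in the previous increment and a correlated Gaussian shock. The function name `simulate_min_reversion`, its argument names and the one-factor form of the shocks are assumptions of this sketch and are not taken from the model specification itself.

```python
import numpy as np


def simulate_min_reversion(x0, n_steps, mu, lam, a, sigma, rho, rng):
    """Simulate one path of an assumed minimum reversion model.

    Assumed update (notation of this sketch only):
        x[t+1] = x[t] + mu - lam * (x[t] - min(x[t])) + a * (x[t] - x[t-1]) + eps
    with eps = sigma * (sqrt(rho) * z_common + sqrt(1 - rho) * z_own), 0 <= rho <= 1.
    """
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    # Scalars are broadcast, so parameters may be common or component-specific.
    mu, lam, a, sigma = (
        np.broadcast_to(np.asarray(p, dtype=float), (n,)) for p in (mu, lam, a, sigma)
    )
    path = np.empty((n_steps + 1, n))
    path[0] = x0
    prev_increment = np.zeros(n)  # no increment is available before time 0
    for t in range(n_steps):
        z_common = rng.standard_normal()
        z_own = rng.standard_normal(n)
        eps = sigma * (np.sqrt(rho) * z_common + np.sqrt(1.0 - rho) * z_own)
        gap = path[t] - path[t].min()  # distance of each component to the minimum
        increment = mu - lam * gap + a * prev_increment + eps
        path[t + 1] = path[t] + increment
        prev_increment = increment
    return path


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    path = simulate_min_reversion([0.0, 1.0, 2.0], 200, 0.0, 0.1, 0.0, 1.0, 0.0, rng)
    print(path[-1])
```

With the noise switched off (`sigma = 0`), zero drift and no autoregressive term, the component at the minimum stays put while the others close a fixed fraction of their gap at each step, which is the pull towards the minimum described above.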
The specification in (2) with and also creates a downward trend in the minimum process defined in (1). To form some intuition for this result we consider the case where for all , that is, the are independent for any fixed . In that case we have
Substituting gives an expression for and since for all and for at least one
, this probability must be strictly greater than
and smaller than or equal to . This establishes that the probability of a downward movement is always greater than and that it is largest when all components attain the same value, that is, for all . The probability is increasing with the number of components .
Individual components are thus not stationary, but they turn out to be cointegrated, as the following result shows.
Proposition 1.
If all processes in (2) have a common minimum reversion parameter and a common drift and there is no autoregressive term, so , and for all , then the processes are cointegrated.
Proof.
Fix a and define for any . We then find for any
Since we obtain that is a stationary AR(1) process for all .
Furthermore, we find that and therefore
The first term in the last expression is a minimum over stationary processes and the other terms are stationary too, hence is stationary. ∎
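The bounds on the probability of a downward move of the minimum, discussed before Proposition 1, can be checked numerically. The sketch below assumes zero drift, no autoregressive term and independent Gaussian increments with a common standard deviation, so that each component moves to `x_i - lam * (x_i - m) + eps_i`, where `m` is the current minimum. The function names and this parameterisation are assumptions made for illustration only.

```python
import math


def std_normal_cdf(z):
    """Cumulative distribution function of a standard Gaussian variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_minimum_decreases(x, lam, sigma):
    """One-step probability that the minimum of the components falls.

    Assumed dynamics: x_i' = x_i - lam * (x_i - m) + eps_i with m = min(x)
    and independent eps_i ~ N(0, sigma^2).
    """
    m = min(x)
    prob_none_below = 1.0
    for xi in x:
        # Component i ends at or above m if eps_i >= -(1 - lam) * (xi - m).
        prob_none_below *= std_normal_cdf((1.0 - lam) * (xi - m) / sigma)
    return 1.0 - prob_none_below


if __name__ == "__main__":
    print(prob_minimum_decreases([0.0, 0.0, 0.0], lam=0.2, sigma=1.0))
    print(prob_minimum_decreases([0.0, 1.0, 3.0], lam=0.2, sigma=1.0))
```

Under these assumptions the probability equals one minus one half raised to the number of components when all components coincide, and it stays above one half however far the other components are from the minimum.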
2.2 An analysis of the two-dimensional case
To characterize the dynamics of minimum reversion, and to show that it induces a downward drift even for the case where the drift parameters are taken to be zero, we look at the simplest nontrivial case in two dimensions. This means the dynamics of the time series become:
(4)  
(5) 
Note that we take and for the analysis in this section and we assume that the correlation between the i.i.d. Gaussian variables and equals for all .
We define^{2}
^{2} If we let and , but this event has zero probability for all .
Proof.
Let the indices of the minimum and maximum be denoted by and . We first determine the distribution of the minimum and maximum.^{3}
^{3} Since the variables are Gaussian, this distribution can be obtained directly as well; see Roberts (1966).
Substituting in (4)–(5) gives, for
that
We find
The term is independent of
and has a Gaussian distribution with mean zero and variance
so for all . We thus find that equals^{4}
(8)
^{4} Here and in the sequel we use to denote the cumulative distribution function of a standard Gaussian random variable.
if we define the shorthand notation and . The function increases monotonically from its minimum for towards zero for . Likewise,
Since for a
with the standard normal distribution, we must have that
Let , and define , then we have for and that which shows that the process satisfies the geometric ergodicity conditions in Theorem 15.0.1 of Meyn et al. (2009) on the domain . This proves that for any choice of the distribution function of converges to a stationary distribution . This implies that for the function should satisfy
The solution to this equation is with the parameter , i.e. the stationary distribution of is the distribution of if . The stationary distribution of is therefore the distribution of , which gives (7).
Due to (8), the expected change in under the stationary distribution for equals plus the effect of the reversion to the minimum. This effect was shown above to be which gives the second equality in (6) after some rewriting. The first one follows from the fact that converges to a stationary distribution. ∎
If we choose certain parameter values, the asymptotic drifts in Proposition 2 can also be estimated by Monte Carlo simulations. We generated
paths to approximate the stationary distributions and used these to estimate the limit of the expectation of the extra drift (Table 1) and the limit of the expectation of the difference between the minimum and maximum (Table 2) for , and in the absence of correlation between the increments for the two time series. Simulations also allow us to estimate the extra drift generated by the minimum reversion term in more than two dimensions, i.e. for and we show some examples in Tables 1 and 2. As expected, we find that both the drift generated by the minimum reversion and the difference between the minimum and maximum increase with the population size . The former increases and the latter decreases when the strength of the reversion to the minimum, which is determined by the parameter , increases.
Table 1: Limit of the expected extra drift generated by the minimum reversion term. The first two value columns refer to two components; the remaining simulation columns refer to more than two components.
Reversion parameter  Exact  Simulation  
0.0125  0.0447  0.0448  0.0671  0.0817  0.1129  0.1401  0.1641 
0.025  0.0635  0.0635  0.0952  0.1158  0.1602  0.1987  0.2329 
0.05  0.0903  0.0903  0.1355  0.1649  0.2280  0.2828  0.3314 
0.1  0.1294  0.1294  0.1942  0.2362  0.3266  0.4052  0.4748 
0.2  0.1881  0.1881  0.2821  0.3432  0.4745  0.5887  0.6899 
0.4  0.2821  0.2821  0.4231  0.5147  0.7118  0.8830  1.0348 
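Table 2: Limit of the expected difference between the maximum and the minimum. The first two value columns refer to two components; the remaining simulation columns refer to more than two components.
Reversion parameter  Exact  Simulation  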
0.0125  7.1589  7.1593  10.7386  13.0627  18.0644  22.4106  26.2618 
0.025  5.0781  5.0774  7.6185  9.2656  12.8138  15.8962  18.6282 
0.05  3.6137  3.6132  5.4202  6.5931  9.1178  11.3112  13.2565 
0.1  2.5887  2.5886  3.8831  4.7229  6.5316  8.1031  9.4965 
0.2  1.8806  1.8806  2.8209  3.4313  4.7454  5.8866  6.8988 
0.4  1.4105  1.4104  2.1156  2.5735  3.5591  4.415  5.1742 
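The two-component entries in Tables 1 and 2 can be approximated with a short Python sketch. It assumes zero drift, no autoregressive term and a common volatility, in which case the signed difference between the two components follows an AR(1) recursion with coefficient one minus the reversion parameter, and the spread is its absolute value. The closed-form expressions and all function names below are this sketch's own derivation under those assumptions, not formulas quoted from Proposition 2.

```python
import math

import numpy as np


def expected_spread_two_components(lam, sigma=1.0, rho=0.0):
    # Stationary mean of |Y| for the AR(1) difference Y' = (1 - lam) * Y + e1 - e2,
    # where Var(e1 - e2) = 2 * sigma^2 * (1 - rho).
    return 2.0 * sigma * math.sqrt((1.0 - rho) / (math.pi * lam * (2.0 - lam)))


def extra_drift_two_components(lam, sigma=1.0, rho=0.0):
    # Only the larger component is pulled down, by lam * spread, so the
    # average level falls by half of that amount per step (reported as a magnitude).
    return 0.5 * lam * expected_spread_two_components(lam, sigma, rho)


def simulate_spread(lam, n_paths=20000, n_steps=300, sigma=1.0, rho=0.0, seed=0):
    """Monte Carlo estimate of the long-run mean spread between two components."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_paths, 2))
    for _ in range(n_steps):
        common = rng.standard_normal((n_paths, 1))
        own = rng.standard_normal((n_paths, 2))
        eps = sigma * (math.sqrt(rho) * common + math.sqrt(1.0 - rho) * own)
        gap = x - x.min(axis=1, keepdims=True)  # distance to the current minimum
        x = x - lam * gap + eps
    return float(np.mean(x.max(axis=1) - x.min(axis=1)))


if __name__ == "__main__":
    for lam in (0.1, 0.2, 0.4):
        print(lam, expected_spread_two_components(lam), simulate_spread(lam))
```

With unit volatility and uncorrelated increments, the closed forms appear to agree with the "Exact" columns of Tables 1 and 2 to four decimals, which suggests that those parameter values underlie the tables. For small reversion parameters the simulation needs more steps than the default to get close to the stationary regime.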
3 Evidence for a Learning Effect in Mortality Rates
In this section we use the time series model in (2) to model mortality rates in several countries. The implicit assumption is that a population with a high mortality rate ”copies” the behaviour of individuals in the population with low mortality. Such a model is consistent with the spread of medical advances and changes in behaviour like a reduction in smoking prevalence.
3.1 Modelling Mortality: The Common Age Effect Model
To obtain the time series, we first fit a stochastic mortality model to observed death counts. We assume that the number of deaths, , in population at age in calendar year
is a random variable with a Poisson distribution, that is,
(9) 
where is the hazard rate (also known as the ”force of mortality” in the literature) and refers to the central exposure to risk.
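For concreteness, the Poisson assumption can be turned into a log-likelihood in a few lines of Python. The sketch assumes the usual form in which the expected number of deaths is the central exposure multiplied by the hazard rate; the function and argument names are illustrative only.

```python
import math

import numpy as np


def poisson_log_likelihood(deaths, exposure, hazard):
    """Log-likelihood of death counts, assuming deaths ~ Poisson(exposure * hazard).

    All three arguments are arrays of matching (or broadcastable) shape, for
    example indexed by age, calendar year and population.
    """
    deaths = np.asarray(deaths, dtype=float)
    expected = np.asarray(exposure, dtype=float) * np.asarray(hazard, dtype=float)
    # log(d!) computed as lgamma(d + 1) to avoid overflow for large counts.
    log_factorial = np.vectorize(math.lgamma)(deaths + 1.0)
    return float(np.sum(deaths * np.log(expected) - expected - log_factorial))


if __name__ == "__main__":
    print(poisson_log_likelihood([12, 30], [1000.0, 1500.0], [0.01, 0.02]))
```

Maximising this function over the parameters that determine the hazard rate gives the maximum likelihood estimates; the term involving the factorial does not depend on those parameters and could be dropped for that purpose.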
We then use a stochastic model for the force of mortality that incorporates the population-specific time series defined in (2). As we are modelling the mortality in multiple populations simultaneously and wish to make our model suitable for a wide age range, we use a modification of the Lee-Carter model (Lee & Carter (1992)) with common age effects, as suggested by Kleinow (2015):
(10) 
where the age effects and do not depend on the population . Having age effects that are common to all populations ensures that the individual components
(known as ”period effects”) are comparable across populations as they are rescaled by a common vector
and shifted by a common vector . The parameters in (10) are not identifiable since for any real numbers and . To identify a unique set of parameters, we impose the following constraints on the parameter vectors and :
for a fixed reference age . Applying those two constraints means that for in every population . In other words, we can interpret the period effect as the fitted log mortality rate at the reference age in population . For our empirical analysis we choose the reference age .
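The identification step can be illustrated with a small Python sketch. It assumes that (10) has the Lee-Carter form in which the log hazard equals a common age-specific level plus a common age-specific sensitivity times the population-specific period effect, and that the two constraints fix the level at zero and the sensitivity at one for the reference age. The names `log_hazard`, `normalise`, `alpha`, `beta` and `kappa` are chosen for this sketch.

```python
import numpy as np


def log_hazard(alpha, beta, kappa):
    # alpha, beta: shape (ages,); kappa: shape (years, populations).
    return alpha[:, None, None] + beta[:, None, None] * kappa[None, :, :]


def normalise(alpha, beta, kappa, ref):
    """Rescale parameters so that alpha[ref] = 0 and beta[ref] = 1.

    The fitted log hazards are unchanged; beta[ref] must be nonzero.
    """
    c, d = alpha[ref], beta[ref]
    kappa_new = c + d * kappa        # period effect becomes the log hazard at ref
    beta_new = beta / d
    alpha_new = alpha - beta_new * c
    return alpha_new, beta_new, kappa_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha = rng.normal(-5.0, 1.0, size=6)
    beta = rng.uniform(0.5, 1.5, size=6)
    kappa = rng.normal(0.0, 1.0, size=(10, 3))
    a2, b2, k2 = normalise(alpha, beta, kappa, ref=2)
    print(np.allclose(log_hazard(alpha, beta, kappa), log_hazard(a2, b2, k2)))
```

After normalisation the period effect of each population coincides with its fitted log mortality rate at the reference age, which is what makes the period effects directly comparable across populations.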
We estimate the age effects and and the period effects using the maximum likelihood method based on the Poisson model in (9) for observed deaths counts and exposure values .
3.2 Data
The empirical death counts and exposure data have been obtained from the Human Mortality Database (HMD), see HMD (2018). We consider data for ages and for two ranges of calendar years: 1921–2011 and 1951–2011. Table 3 shows a list of countries included in our study together with the HMD country codes. As indicated in the table, there are some countries for which data from 1921 are not available and those countries have been excluded from our empirical study for that range of years.
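Table 3: Countries included in the study with their HMD country codes. A ✓ in the column 1921–2011 marks countries included for that range of years; a – marks countries for which data from 1921 are not available.
Country  HMD Code  1921–2011  Country  HMD Code  1921–2011  
The Netherlands  NLD  ✓  Sweden  SWE  ✓  
Denmark  DNK  ✓  Belgium  BEL  ✓  
Finland  FIN  ✓  England & Wales  GBRTENW  ✓  
France  FRATNP  ✓  Switzerland  CHE  ✓  
Australia  AUS  ✓  Italy  ITA  ✓  
Austria  AUT  –  Ireland  IRL  –  
Norway  NOR  ✓  Japan  JAP  –  
Canada  CAN  ✓  New Zealand  NZL_NP  –  
Portugal  PRT  –  Spain  ESP  ✓  
USA  USA  –  Iceland  ISL  ✓  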
3.3 Empirical Results
Figure 1 shows the estimated values of the age effects and obtained from data for females based on the observation period 1921–2011. The age effects for the other data sets are not shown since they all have a very similar shape. To illustrate our data and the impact of minimum reversion, Figure 2 shows the estimated values of for females for the years 1921–2011 in the dataset, together with two projected scenarios based on the model in (2). On the left-hand side the projection includes the minimum reversion effect based on the estimated parameter and on the right-hand side we set .
To investigate the significance of the learning effect, we compare the Bayesian Information Criterion (BIC) for models without minimum reversion () with the more general model in (2). If denotes the number of parameters and is the total number of observations across all populations and all years, the BIC value equals , where is the maximum value of the likelihood function for the model in (2). This means
where is a constant which does not depend on the parameters that must be estimated, and are vectors with components
and is the covariance matrix of so for and .
We determined the BIC values for different specifications in which parameter values may or may not be constrained to be the same for all different groups in . Based on this analysis, the parameters and are taken to be population-specific while the parameters , and are common to all populations. The obtained BIC values and parameter estimates^{5} are shown in Table 4 for four data sets.
^{5} We do not show the values and for every group in but only report , the average of over all groups , and , which is defined by the requirement that is the average of over all groups . Estimated parameter values for all countries, genders and data periods are available upon request.
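Table 4: BIC values and parameter estimates for four data sets. A – marks the minimum reversion parameter where it is restricted to zero.
Model  Log-likelihood  Number of parameters  BIC  Parameter estimates  
Females in calendar years 1921–2011  
Without minimum reversion  3886.16  30  7558.49  0.0224  –  0.2998  0.0338  0.4996  
With minimum reversion  3890.61  31  7560.27  0.0194  0.0191  0.2895  0.0336  0.4977  
Males in calendar years 1921–2011  
Without minimum reversion  4151.49  30  8089.15  0.0150  –  0.1890  0.0297  0.5207  
With minimum reversion  4155.65  31  8090.35  0.0131  0.0185  0.1804  0.0295  0.5224  
Females in calendar years 1951–2011  
Without minimum reversion  3854.23  42  7411.38  0.0210  –  0.3414  0.0288  0.4717  
With minimum reversion  3861.62  43  7419.08  0.0161  0.0177  0.3394  0.0285  0.4641  
Males in calendar years 1951–2011  
Without minimum reversion  4100.51  42  7903.94  0.0246  –  0.3320  0.0252  0.5845  
With minimum reversion  4105.98  43  7907.81  0.0192  0.0167  0.3219  0.0250  0.5832  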
We notice that the BIC values always improve when is not restricted to be zero, which shows that the learning effect adds to the goodness of fit of the model. We also observe that the estimated drift parameter is reduced when minimum reversion is included in the model, from which we conclude that some of the mortality improvements in the populations are driven by learning effects from others who have lower mortality rates.
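The comparison can be retraced with a few lines of Python. The sketch assumes that the BIC is reported in the convention where larger values are better (twice the maximised log-likelihood minus the number of parameters times the logarithm of the number of observations), that the first numerical column of Table 4 is the maximised log-likelihood, and that the number of observations is 14 populations times 89 yearly increments for 1921–2011 and 20 populations times 59 increments for 1951–2011. These observation counts are inferred here because they reproduce the tabulated values; they are not stated in the text.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    # Convention assumed here: larger values indicate a better model.
    return 2.0 * log_likelihood - n_params * math.log(n_obs)


# (log-likelihood, number of parameters, assumed number of observations)
FEMALES_1921 = {
    "without minimum reversion": (3886.16, 30, 14 * 89),
    "with minimum reversion": (3890.61, 31, 14 * 89),
}

if __name__ == "__main__":
    for label, (loglik, k, n) in FEMALES_1921.items():
        print(label, round(bic(loglik, k, n), 2))
```

Under these assumptions, the extra parameter costs one logarithm of the sample size (about 7.1 here), so the model with minimum reversion is preferred whenever twice the gain in log-likelihood exceeds that penalty.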
4 Conclusions and Further Research
We have shown that model specifications that include a minimum reversion term resulted in better BIC values than specifications without such a term when they were fitted to several datasets from the Human Mortality Database. Visual inspection of the projections generated by the time series with minimum reversion in Figure 2 also shows a clear improvement. This testifies to the usefulness of the incorporation of a ”learning effect” in the time series.
Several extensions of the proposed model are possible. In particular, the minimum in (2) could be replaced by other rank statistics. Another possible direction of research is the study of a continuous time version of minimum reversion, where the drift term of a diffusion process in multiple dimensions is a function of the distance of the process to the minimum of its components. The properties of these processes could then be compared to the rank-based diffusion processes for particle systems mentioned in the Introduction of this paper.
References
 Balázs et al. (2014) Balázs, M., Rácz, M. Z. & Tóth, B. (2014). Modeling flocks and prices: Jumping particles with an attractive interaction. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 50, 425–454.
 Banner et al. (2005) Banner, A. D., Fernholz, R. & Karatzas, I. (2005). Atlas Model of Equity Markets. The Annals of Applied Probability 15, 2296–2330.
 Engle & Granger (1987) Engle, R. F. & Granger, C. W. J. (1987). Cointegration and error correction: Representation, estimation, and testing. Econometrica 55, 251–276.
 Fernholz (2002) Fernholz, E. R. (2002). Stochastic Portfolio Theory. Springer.
 HMD (2018) HMD (2018). Human Mortality Database: University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Available at www.mortality.org or www.humanmortality.de (data downloaded on 30 April 2018).
 Hössjer & Ryman (2014) Hössjer, O. & Ryman, N. (2014). Quasi equilibrium, variance effective size and fixation index for populations with substructure. Journal of Mathematical Biology 69, 1057–1128.
 Ichiba et al. (2013) Ichiba, T., Pal, S. & Shkolnikov, M. (2013). Convergence rates for rank-based models with applications to portfolio theory. Probab. Theory Relat. Fields 156, 415–448.
 Kleinow (2015) Kleinow, T. (2015). A common age effect model for the mortality of multiple populations. Insurance: Mathematics and Economics 63, 147–152. Special Issue: Longevity Nine, the Ninth International Longevity Risk and Capital Markets Solutions Conference.
 Lee & Carter (1992) Lee, R. D. & Carter, L. R. (1992). Modeling and Forecasting U.S. Mortality. Journal of the American Statistical Association 87, 659–675.
 Meyn et al. (2009) Meyn, S., Tweedie, R. L. & Glynn, P. W. (2009). Markov Chains and Stochastic Stability. Cambridge Mathematical Library. Cambridge University Press, 2nd ed.
 Østergaard et al. (2017) Østergaard, J., Rahbek, A. & Ditlevsen, S. (2017). Oscillating systems with cointegrated phase processes. Journal of Mathematical Biology 75, 845–883.
 Roberts (1966) Roberts, C. (1966). A correlation model useful in the study of twins. Journal of the American Statistical Association 61, 1184–1190.
 Ruzmaikina & Aizenman (2005) Ruzmaikina, A. & Aizenman, M. (2005). Characterization of invariant measures at the leading edge for competing particle systems. The Annals of Probability 33, 82–113.

 Sartoretti & Hongler (2013) Sartoretti, G. A. & Hongler, M.-O. (2013). Soft control of swarms: analytical approach. Proceedings of the 5th International Conference on Agents and Artificial Intelligence 1.
 Shkolnikov (2009) Shkolnikov, M. (2009). Competing Particle Systems Evolving by I.I.D. Increments. Electronic Journal of Probability 14, 728–751.