# Exact and asymptotic properties of δ-records in the linear drift model

The study of records in the Linear Drift Model (LDM) has attracted much attention recently due to applications in several fields. In the present paper we study δ-records in the LDM, defined as observations which are greater than all previous observations, plus a fixed real quantity δ. We give analytical properties of the probability of δ-records and study the correlation between δ-record events. We also analyse the asymptotic behaviour of the number of δ-records among the first n observations and give conditions for convergence to the Gaussian distribution. As a consequence of our results, we solve a conjecture posed in J. Stat. Mech. 2010, P10013, regarding the total number of records in an LDM with negative drift. Examples of application to particular distributions, such as Gumbel or Pareto, are also provided. We illustrate our results with a real data set of summer temperatures in Spain, where the LDM is consistent with the global-warming phenomenon.


## 1 Introduction

Extreme values and records have attracted considerable effort and attention since the beginnings of statistics and probability, due to their intrinsic interest and their mathematical challenges. An important motivation for studying records comes from their connections with other interesting problems and, of course, from their countless practical applications in different fields such as climatology [1, 2, 3, 4], sports [5, 6, 7], finance [8, 9] or biology [10]. Moreover, records have been used in statistical inference because, in some contexts, data is inherently composed of record observations [11, 12, 13, 14]. The classical probabilistic setting of independent and identically distributed (iid) random observations has been extensively studied. The main results in this framework can be found in the monographs [15, 16, 17]. In recent years there has been an increasing interest in the study of records in correlated observations such as random walks or time series [18, 19, 20, 21, 22, 23].

An interesting departure from the iid model, which introduces time-dependence between observations, results from adding a deterministic linear trend to the iid observations, thus obtaining the so-called Linear Drift Model (LDM). This model was first introduced in [24] and later developed in [25, 26, 27]. The model was also considered in [28], under a wide range of scenarios, and has proven particularly useful in the study of global warming phenomena [4, 29]. Furthermore, the importance of this model is not only related to applications but also to its mathematical structure. For instance, the study of records in the LDM can be helpful in determining whether the underlying distribution is heavy-tailed or not [30, 31].

Also, different generalizations of the notion of record, such as near-records [32, 33, 34] or δ-exceedance records [35, 36], have been proposed recently. We will work with δ-records, first introduced in [37], which are observations greater than all previous entries, plus a fixed quantity $\delta$. In the iid setting, the distribution [38, 39], process structure [40] and asymptotic properties [41] of δ-records have been studied. In the case $\delta<0$, where δ-records are more numerous than records, their use in statistical inference has been recently proposed and positively assessed; see [41, 42, 43].

In this work, we study δ-records from observations obeying the LDM, while revisiting some open questions about records. We analyse the positivity and continuity of the asymptotic δ-record probability as a function of $\delta$ and of the trend parameter $c$. We also obtain a law of large numbers and a central limit theorem for the counting process of δ-records, thus extending the corresponding results in [24]. Furthermore, we completely characterize the finiteness of the number of δ-records and, in particular, we solve a conjecture posed in [28], about the finiteness of the number of usual records in the LDM with negative trend.

We assess the effect of $\delta$ on the δ-record probabilities and correlations, for explicitly solvable models. Some of the results obtained in these examples are new and shed light on the behaviour of record events when the underlying distribution is heavy-tailed. Finally, we illustrate our results by analyzing a real dataset of temperatures, which fits the LDM with a trend parameter consistent with the global-warming phenomenon.

## 2 δ-records in the linear drift model

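Our objects of interest in this paper are δ-records, formally defined as follows: given a sequence of observations $(Y_n)_{n\ge1}$ and a parameter $\delta\in\mathbb{R}$, $Y_1$ is defined conventionally as a δ-record and, for $n\ge2$, $Y_n$ is a δ-record if $Y_n>\max\{Y_1,\ldots,Y_{n-1}\}+\delta$.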

Note that δ-records are just (upper) records, if $\delta=0$. If $\delta>0$, a δ-record is necessarily a record and δ-records are a subsequence of records. On the other hand, if $\delta<0$, a δ-record can be smaller than the current maximum, so records are a subsequence of δ-records.
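The definition can be made concrete with a minimal sketch that flags δ-records in a finite sequence. The function name `delta_record_indicators` and the toy sample are illustrative assumptions, not part of the paper.

```python
from typing import Iterable, List


def delta_record_indicators(ys: Iterable[float], delta: float) -> List[int]:
    """Return 0/1 flags; flag j is 1 when the j-th observation is a delta-record."""
    flags: List[int] = []
    running_max = None
    for y in ys:
        if running_max is None:
            # The first observation is a delta-record by convention.
            flags.append(1)
            running_max = y
        else:
            # Strict inequality: y must exceed the previous maximum plus delta.
            flags.append(1 if y > running_max + delta else 0)
            running_max = max(running_max, y)
    return flags


sample = [3.0, 2.5, 3.2, 3.1, 4.0]  # hypothetical data
print(delta_record_indicators(sample, 0.0))   # usual records
print(delta_record_indicators(sample, 0.5))   # delta > 0: subsequence of records
print(delta_record_indicators(sample, -0.5))  # delta < 0: records are a subsequence
```

On this sample the flags for $\delta=0.5$ are a subset of those for $\delta=0$, which in turn are a subset of those for $\delta=-0.5$, as the remark above predicts.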

Throughout this paper we assume that the $Y_n$ are random variables obeying the LDM, that is, $Y_n$ can be represented as

$$Y_n=X_n+cn,\quad n\ge1, \tag{1}$$

where $c\in\mathbb{R}$ is the trend parameter and $(X_n)_{n\ge1}$ is a sequence of iid random variables, with (absolutely continuous) cumulative distribution function (cdf) $F$ and probability density function (pdf) $f$. Another important parameter of the model is the right-tail expectation of the $X_n$, defined as

$$\mu_+=\int_0^\infty x f(x)\,dx.$$

For simplicity, we assume the existence of an interval of real numbers $(x_-,x_+)$, with $-\infty\le x_-<x_+\le\infty$, such that $f(x)>0$, for all $x\in(x_-,x_+)$, and $f(x)=0$ otherwise. Note that $F(x_-)=0$ and $F(x_+)=1$.

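Let $1_{j,\delta}$ denote the indicator of the event {$Y_j$ is a δ-record}. That is, $1_{j,\delta}=1$ if $Y_j$ is a δ-record and $1_{j,\delta}=0$ otherwise. So, the number of δ-records up to index $n$ is computed as $N_{n,\delta}=\sum_{j=1}^n 1_{j,\delta}$.

Under the LDM, the probability of {$Y_j$ is a δ-record} is easily computed by conditioning, as

$$p_{j,\delta}:=E[1_{j,\delta}]=\int_{-\infty}^{\infty}\prod_{i=1}^{j-1}F(x+ci-\delta)\,f(x)\,dx,$$

where $E$ denotes the mathematical expectation. Moreover, the asymptotic δ-record probability is given by the formula

$$p_\delta:=\lim_{n\to\infty}p_{n,\delta}=\int_{-\infty}^{\infty}\prod_{i=1}^{\infty}F(x+ci-\delta)\,f(x)\,dx, \tag{2}$$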

which is mathematically justified by the monotone convergence theorem for integrals.
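The conditioning step can be spelled out as follows; this is a sketch of the standard argument, written with the notation of (1). Given $X_j=x$, observation $Y_j$ is a δ-record exactly when $X_i+ci<x+cj-\delta$ for every $i<j$, and the $X_i$ are iid with cdf $F$, so

$$P[\,Y_j \text{ is a δ-record}\mid X_j=x\,]=\prod_{i=1}^{j-1}F\bigl(x+c(j-i)-\delta\bigr)=\prod_{k=1}^{j-1}F(x+ck-\delta),$$

where the last equality is the reindexing $k=j-i$. Integrating against $f(x)\,dx$ gives the expression for $p_{j,\delta}$. Since every factor lies in $[0,1]$, the partial products are nonincreasing in $j$, which is what allows the limit to be taken inside the integral.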

In what follows we occasionally write $p_{n,\delta}(c)$, $p_\delta(c)$, etc. to emphasize the dependence on the trend parameter $c$.

## 3 Properties of the δ-record probabilities

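We begin with a simple property about the asymptotic δ-record probability of an affine transformation of the LDM. Let $\tilde Y_n=\tilde X_n+cn$, with $\tilde X_n=a+bX_n$, $a\in\mathbb{R}$, and $b>0$. If $\tilde p_\delta(c)$ is the δ-record probability in this model, then it holds

$$\tilde p_\delta(c)=p_{\delta/b}\!\left(\frac{c}{b}\right).$$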
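A short derivation shows where the rescaling comes from. It assumes the transformed model has the form $\tilde Y_n=a+bX_n+cn$ with $a\in\mathbb{R}$ and $b>0$; this form is an assumption made for the sketch. For $j<n$,

$$a+bX_n+cn>a+bX_j+cj+\delta\iff X_n+\frac{c}{b}\,n>X_j+\frac{c}{b}\,j+\frac{\delta}{b},$$

because dividing by $b>0$ preserves the inequality. Hence $\tilde Y_n$ is a δ-record in the transformed model exactly when $X_n+(c/b)n$ is a $(\delta/b)$-record in an LDM with trend $c/b$. The location shift $a$ cancels and plays no role.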

We consider next some analytical properties of $p_{n,\delta}(c)$ and $p_\delta(c)$, as functions of $c$ and $\delta$. We note first that both are increasing in $c$ and decreasing in $\delta$. Moreover, it is easy to see that $p_{n,\delta}(c)$ is decreasing in $n$ and continuous in $c$, converging to 1 as $c\to\infty$. The continuity of $p_\delta(c)$ is less clear because of the infinite product within the integral in (2).

### 3.1 Positivity of pδ(c)

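We show that the positivity of $p_\delta(c)$ depends on $c$ and $\delta$ and on the right-tail behaviour of $F$. We consider two cases depending on $\mu_+$:

1. $\mu_+=\infty$. In this case $p_\delta(c)=0$, for all $c$ and $\delta$.

To justify this claim, we show that $\prod_{i=1}^{\infty}F(x+ci-\delta)=0$ for all $x$.

If $c<0$ the conclusion is immediate because $F(x+ci-\delta)\to0$, as $i\to\infty$.

If $c=0$, we note that $\mu_+=\infty$ implies $x_+=\infty$ and so, $F(x-\delta)<1$. Thus $\prod_{i=1}^{\infty}F(x-\delta)=0$.

Finally, if $c>0$, we note that $\mu_+=\infty$ implies $\sum_{i=1}^{\infty}(1-F(x+ci-\delta))=\infty$, which in turn implies $\prod_{i=1}^{\infty}F(x+ci-\delta)=0$. This follows from the definition of $\mu_+$ and from Taylor’s expansion of the logarithm.

Distributions with $\mu_+=\infty$ can be considered as “right-heavy-tailed” and we observe that, for such distributions, the linear trend has no impact on the asymptotic probability of a δ-record. This class of distributions includes the Pareto and Fréchet, with shape parameter $\alpha\le1$.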

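2. $\mu_+<\infty$. As in the previous case, we have three situations depending on the sign of $c$.

For $c<0$, $p_\delta(c)=0$, for all $\delta$, since $\prod_{i=1}^{\infty}F(x+ci-\delta)=0$, for all $x$.

If $c=0$,

$$p_\delta(0)=\int_{-\infty}^{\infty}\prod_{j=1}^{\infty}F(x-\delta)\,f(x)\,dx=\int_{x_++\delta}^{\infty}f(x)\,dx, \tag{3}$$

which is positive if and only if $x_+<\infty$ and $\delta<0$.

Finally, if $c>0$, then $p_\delta(c)>0$ if and only if $\delta<x_+-x_-+c$. Indeed, note that, if $\delta\ge x_+-x_-+c$, then $F(x+c-\delta)=0$, for all $x\in(x_-,x_+)$, and so, only the first observation (by convention) is a δ-record. Conversely, if $\delta<x_+-x_-+c$, then the interval $(\max\{x_-,x_-+\delta-c\},x_+)$ is nonempty and, for every $x$ in this interval, we have $F(x+ci-\delta)>0$, for all $i\ge1$. Now, since $F(x+ci-\delta)\to1$ as $i\to\infty$, and $\mu_+<\infty$, we have $\sum_{i=1}^{\infty}(1-F(x+ci-\delta))<\infty$, which implies $\prod_{i=1}^{\infty}F(x+ci-\delta)>0$ and, so $p_\delta(c)>0$.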

Summarizing the above findings, we state

###### Theorem 1

$p_\delta(c)>0$ if and only if $\mu_+<\infty$ and one of the following conditions holds

1. $c>0$ and $\delta<x_+-x_-+c$,

2. $c=0$, $x_+<\infty$ and $\delta<0$.

### 3.2 Continuity of pδ(c)

As commented at the beginning of this section, the continuity of $p_\delta(c)$ is not obvious. However, thanks to Theorem 1 we can restrict attention to distributions with finite right-tail expectation since, otherwise, $p_\delta(c)$ vanishes and continuity is trivial. Thus, we assume throughout this section that $\mu_+<\infty$.

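A first interesting fact, which is rigorously proved in Proposition 6 of the Appendix, is that the infinite product in (2), viewed as a function of $c$, is continuous at every $c>0$ (see Proposition 6 for the precise conditions on $x$ and $\delta$). Then, thanks to the bounded convergence theorem of integration, we conclude that $p_\delta(c)$ is continuous, at every $c>0$.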

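The continuity at $c=0$ is subtler to establish and depends on the sign of $\delta$ and the finiteness of $x_+$, the right-end point of $F$. Note that, for every $c>0$ and $N\ge1$, we have

$$\prod_{j=1}^{\infty}F(x-\delta)\le\prod_{j=1}^{\infty}F(x+cj-\delta)\le\prod_{j=1}^{N}F(x+cj-\delta).$$

Then, taking the limit as $c\to0^+$ in the above inequalities,

$$\prod_{j=1}^{\infty}F(x-\delta)\le\lim_{c\to0^+}\prod_{j=1}^{\infty}F(x+cj-\delta)\le F(x-\delta)^N.$$

Therefore, $\lim_{c\to0^+}\prod_{j=1}^{\infty}F(x+cj-\delta)$ is 0, if $x<x_++\delta$, and 1 otherwise. Then, by the dominated convergence theorem,

$$\lim_{c\to0^+}p_\delta(c)=\int_{-\infty}^{\infty}\lim_{c\to0^+}\prod_{j=1}^{\infty}F(x+cj-\delta)\,f(x)\,dx=\int_{x_++\delta}^{\infty}f(x)\,dx.$$

Thus, $p_\delta(c)$ is right-continuous at $c=0$ by (3). Regarding left-continuity at 0, recall that $p_\delta(c)=0$ for $c<0$. So, $p_\delta(c)$ is discontinuous at 0 if and only if $x_+<\infty$ and $\delta<0$.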

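We now show the continuity of $p_\delta(c)$ as a function of $\delta$. The result is trivial if $c<0$, since $p_\delta(c)=0$, for all $\delta$. For $c=0$, note that, by (3), $p_\delta(0)=1-F(x_++\delta)$, which is continuous since $F$ is a continuous function.

If $c>0$ and $(\delta_n)$ is a sequence converging to $\delta$, we prove that

$$\lim_{n\to\infty}\prod_{i=1}^{\infty}F(x+ci-\delta_n)=\prod_{i=1}^{\infty}F(x+ci-\delta), \tag{4}$$

for all $x\in\mathbb{R}$, $x\ne x_-+\delta-c$. Indeed, let $x<x_-+\delta-c$, then $F(x+c-\delta)=0$ yielding $\prod_{i=1}^{\infty}F(x+ci-\delta)=0$. Also $F(x+c-\delta_n)=0$ for $n$ large enough and (4) follows. Let now $x>x_-+\delta-c$ and $\varepsilon>0$ such that $x+c-(\delta+\varepsilon)>x_-$. Then, for $n$ large enough, we have $\delta_n<\delta+\varepsilon$ and

$$-\sum_{i=1}^{\infty}\log F(x+ci-\delta_n)\le-\sum_{i=1}^{\infty}\log F(x+ci-(\delta+\varepsilon))<\infty,$$

since $\mu_+<\infty$. So (4) holds, and continuity follows.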

In the following theorem we summarize conditions for continuity of .

###### Theorem 2

The asymptotic δ-record probability $p_\delta(c)$, as a function of $c$, is

• continuous at every $c\ne0$ and right-continuous at $c=0$, for all $\delta$;

• discontinuous at $c=0$ if and only if $x_+<\infty$ and $\delta<0$, and

• continuous in $\delta$, for all $c$.

## 4 Exactly solvable models

In general it is not possible to compute exactly the probabilities $p_{n,\delta}(c)$ or $p_\delta(c)$. We show below explicit results for the Gumbel distribution and for particular instances of the Dagum family of distributions.

### 4.1 The Gumbel distribution

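Let $F(x)=\exp(-e^{-x})$, for $x\in\mathbb{R}$, be the Gumbel distribution. Note that $\mu_+<\infty$. Then, if $c\ne0$,

$$\prod_{j=1}^{n-1}F(x+cj-\delta)=F(x)^{\sum_{j=1}^{n-1}e^{-cj+\delta}}=F(x)^{e^{\delta}\frac{e^{-c}-e^{-nc}}{1-e^{-c}}}$$

and, if $c=0$, $\prod_{j=1}^{n-1}F(x-\delta)=F(x)^{(n-1)e^{\delta}}$. So, from (2) we get

$$p_{n,\delta}(c)=\int_{-\infty}^{\infty}F(x)^{e^{\delta}\frac{e^{-c}-e^{-nc}}{1-e^{-c}}}f(x)\,dx=\frac{1-e^{-c}}{1-e^{-c}+e^{\delta}(e^{-c}-e^{-nc})},$$

if $c\ne0$, and

$$p_{n,\delta}(0)=\frac{1}{(n-1)e^{\delta}+1}.$$

Note that, taking limits as $n\to\infty$, in the above formulas, we obtain

$$p_\delta(c)=\frac{1-e^{-c}}{e^{\delta}e^{-c}+1-e^{-c}}=\frac{1}{1+\frac{e^{-c}}{1-e^{-c}}e^{\delta}},$$

if $c>0$ and $p_\delta(c)=0$, if $c\le0$, as expected from Theorem 1.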

Also, for every $c>0$, $p_\delta(c)$ decreases with $\delta$ as a logistic function of $\delta$. Figure 1 shows the behaviour of $p_\delta(c)$ as a function of $c$ and $\delta$.
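As a sanity check, the closed form for the Gumbel case can be compared with a direct numerical evaluation of the integral defining $p_{n,\delta}(c)$. The sketch below uses only the standard library; the function names and the midpoint rule after the substitution $u=F(x)$ are choices made here for illustration, not the authors' procedure.

```python
import math


def gumbel_cdf(x: float) -> float:
    """Standard Gumbel cdf F(x) = exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


def p_closed(n: int, delta: float, c: float) -> float:
    """Closed-form p_{n,delta}(c) for the Gumbel distribution."""
    if c == 0.0:
        return 1.0 / ((n - 1) * math.exp(delta) + 1.0)
    a = 1.0 - math.exp(-c)
    return a / (a + math.exp(delta) * (math.exp(-c) - math.exp(-n * c)))


def p_quadrature(n: int, delta: float, c: float, m: int = 20000) -> float:
    """Midpoint rule for the integral of prod_i F(x + c*i - delta) f(x) dx,
    after the substitution u = F(x), i.e. x = -log(-log(u)), u in (0, 1)."""
    total = 0.0
    for k in range(m):
        u = (k + 0.5) / m
        x = -math.log(-math.log(u))
        prod = 1.0
        for i in range(1, n):
            prod *= gumbel_cdf(x + c * i - delta)
        total += prod
    return total / m


def p_limit(delta: float, c: float) -> float:
    """Asymptotic probability p_delta(c); zero when c <= 0."""
    if c <= 0.0:
        return 0.0
    return (1.0 - math.exp(-c)) / (math.exp(delta - c) + 1.0 - math.exp(-c))


print(p_closed(5, 0.3, 0.5), p_quadrature(5, 0.3, 0.5))
```

For large $n$ and $c>0$, `p_closed(n, delta, c)` approaches `p_limit(delta, c)`, in line with the limit formula above.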

### 4.2 The Dagum family of distributions

The random variables in the Dagum family of distributions have cdf given by $F(x)=(1+(x/b)^{-a})^{-q}$, for $x>0$, and $F(x)=0$, for $x\le0$, where $a,b,q$ are positive parameters. Two important distributions in the Dagum family are the Loglogistic, with $q=1$, and the Pareto (up to a shift), with $a=1$, $q=1$. For simplicity, in this example we limit our attention to the case $a=1$, which has $\mu_+=\infty$.

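By Theorem 1 we know that $p_\delta(c)=0$, for every $c$ and $\delta$, so we chose to analyse the speed of convergence of $p_{n,\delta}(c)$ to 0, for some values of $q$. To that end, observe that the formula for $p_{n,\delta}(c)$ takes the manageable form

$$p_{n,\delta}(c)=\int_{(\delta-c)^+}^{\infty}\prod_{i=1}^{n-1}\left(\frac{x+ci-\delta}{x+b+ci-\delta}\right)^{q}f(x)\,dx, \tag{5}$$

which becomes simpler if we further assume that $b=c$ (that is, the trend parameter of the LDM is equal to the scale parameter of the distribution). From (5) we get

$$p_{n,\delta}(c)=\int_{(\delta-c)^+}^{\infty}\left(\frac{x+c-\delta}{x+cn-\delta}\right)^{q}f(x)\,dx. \tag{6}$$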

Note that the Pareto(1,1) distribution, taking $q=1$, is included as a particular case. This distribution will be studied at the end of this example and later, in section 5, in the context of δ-record correlations.

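We introduce the notation $p^{(q)}_{n,\delta}(c)$ to make explicit the dependence of $p_{n,\delta}(c)$ on $q$. First, for records ($\delta=0$) we have,

$$\begin{aligned}
p^{(q)}_{n,0}(c)&=cq\int_0^{\infty}x^{q-1}(x+cn)^{-q}(x+c)^{-1}\,dx\\
&=qn^{-q}\int_0^1t^{q-1}\left(1-t\,\frac{n-1}{n}\right)^{-q}dt &(7)\\
&=\frac{q}{(n-1)^q}\int_1^n\frac{(y-1)^{q-1}}{y}\,dy, &(8)
\end{aligned}$$

where the second equality follows from the change of variable $t=x/(x+c)$ and the third from $y=n/(n-(n-1)t)$.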

Observe that (7) and (8) do not depend on $c$ and so, for the sake of simplicity, we write $p^{(q)}_{n,0}$. Moreover, from formula (7) we see that

$$p^{(q)}_{n,0}=n^{-q}\,{}_2F_1\bigl(q,q;q+1;(n-1)/n\bigr),$$

where ${}_2F_1$ is the Gauss hypergeometric function.

Also, from (8) and using the binomial expansion, for $q\in\mathbb{N}$, we readily obtain

$$p^{(q)}_{n,0}=\frac{q}{(n-1)^q}\left((-1)^{q-1}\log n+\sum_{k=1}^{q-1}\binom{q-1}{k}\frac{(-1)^{q-1-k}}{k}\,(n^k-1)\right). \tag{9}$$

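The asymptotic behaviour of $p^{(q)}_{n,0}$, for any $q>0$, can be obtained from (8). For $q=1$, (9) yields $p^{(1)}_{n,0}=\log n/(n-1)$. For $q>1$, the leading term in the integral in (8) is $n^{q-1}/(q-1)$, so $p^{(q)}_{n,0}\sim q/((q-1)n)$. For $q<1$, the integral in (8) converges and, using formula 3.191.2 in [44], we get

$$p^{(q)}_{n,0}\sim n^{-q}q\int_1^{\infty}\frac{(y-1)^{q-1}}{y}\,dy=n^{-q}q\,\Gamma(1-q)\Gamma(q).$$

Thus,

$$p^{(q)}_{n,0}\sim\begin{cases}n^{-q}q\,\Gamma(1-q)\Gamma(q),&\text{if }0<q<1,\\[4pt]\dfrac{\log n}{n},&\text{if }q=1,\\[4pt]\dfrac{q}{q-1}\,\dfrac{1}{n},&\text{if }q>1.\end{cases} \tag{10}$$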
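The integral representation (8) and the binomial closed form (9) can be cross-checked numerically for integer $q$, and the $1/n$ rate in (10) can be observed for $q>1$. The following is a minimal sketch with hypothetical function names and a plain midpoint rule chosen here for simplicity.

```python
import math


def p_integral(q: float, n: int, m: int = 100000) -> float:
    """Formula (8): q/(n-1)^q * integral_1^n (y-1)^(q-1)/y dy, midpoint rule."""
    h = (n - 1) / m
    s = 0.0
    for k in range(m):
        y = 1.0 + (k + 0.5) * h
        s += (y - 1.0) ** (q - 1) / y
    return q * s * h / (n - 1) ** q


def p_binomial(q: int, n: int) -> float:
    """Formula (9), valid for integer q >= 1."""
    total = (-1) ** (q - 1) * math.log(n)
    for k in range(1, q):
        total += math.comb(q - 1, k) * (-1) ** (q - 1 - k) * (n ** k - 1) / k
    return q * total / (n - 1) ** q


for q in (1, 2, 3):
    print(q, p_integral(q, 10), p_binomial(q, 10))

# For q > 1, n * p approaches q / (q - 1), as stated in (10).
print(10**6 * p_binomial(3, 10**6))
```

The midpoint rule is adequate here because, for integer $q$, the integrand in (8) is smooth on $[1,n]$; for $q<1$ the integrand is singular at $y=1$ and a more careful quadrature would be needed.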

It is interesting to observe that the limiting behaviour of , as a function of the power of the tail , seems to match the asymptotic behaviour of when is the Fréchet distribution (, ) and the tuning parameter is the trend , studied in [27].

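We now consider $\delta\ne0$ and investigate whether $p^{(q)}_{n,\delta}\sim p^{(q)}_{n,0}$, as $n\to\infty$. This result can be expected since, as $\mu_+=\infty$, the variables take very large values, so $\delta$ may have little influence on the probability of δ-record, in the long term.

From (6) we may evaluate $p^{(q)}_{n,\delta}$, for any $q\in\mathbb{N}$, although the computation becomes lengthy as $q$ grows. We have carried out the computation with values of $q$ from 1 to 7, and obtained

$$p^{(1)}_{n,\delta}\sim\frac{\log(n)}{n},\qquad p^{(q)}_{n,\delta}\sim\frac{q}{q-1}\,\frac{1}{n},\quad q=2,\ldots,7.$$

So, from (10) we have $p^{(q)}_{n,\delta}\sim p^{(q)}_{n,0}$, at least for $q=1,\ldots,7$.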

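For noninteger values of $q$, the limit behaviour of (6) is harder to analyse. To get a tractable expression, we impose $\delta=c$. Proceeding as above, we have, for $n\ge3$,

$$p^{(q)}_{n,\delta}=\frac{q(n-1)^q}{(n-2)^{2q}}\int_1^{n-1}\frac{(y-1)^{2q-1}}{y^{q+1}}\,dy.$$

Therefore, we have

$$p^{(q)}_{n,\delta}\sim\begin{cases}n^{-q}\,\dfrac{\Gamma(2q)\Gamma(1-q)}{\Gamma(q)},&\text{if }0<q<1,\\[6pt]\dfrac{\log n}{n},&\text{if }q=1,\\[6pt]\dfrac{q}{q-1}\,\dfrac{1}{n},&\text{if }q>1.\end{cases}$$

So, under the above stated conditions, $p^{(q)}_{n,\delta}\sim p^{(q)}_{n,0}$, for $q\ge1$, but this is not the case if $q<1$.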

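To conclude this example, we study the particular case of the Pareto distribution, that is, $q=1$, and take $b=c=1$. The probability of δ-record is explicitly computed as:

$$p_{n,\delta}=\int_{\delta\vee1}^{\infty}\frac{x-\delta}{x^2(x+n-1-\delta)}\,dx=\frac{1}{(n-1-\delta)^2}\left((n-1)\log\left(\frac{n-\min\{1,\delta\}}{\max\{1,\delta\}}\right)-\min\{1,\delta\}(n-1-\delta)\right), \tag{11}$$

if $\delta\ne n-1$ and $p_{n,\delta}=1/(2(n-1))$, if $\delta=n-1$. Figure 2 shows the behaviour of $p_{n,\delta}$ as a function of $n$ and $\delta$.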
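Formula (11) can be checked against a direct numerical evaluation of the integral on its left-hand side. The sketch below is illustrative only: the function names are hypothetical, and the substitution $x=\max\{1,\delta\}/u$ is a convenience chosen here to map the infinite range onto $(0,1)$.

```python
import math


def pareto_p_closed(n: int, delta: float) -> float:
    """Right-hand side of (11); requires delta != n - 1."""
    lo, hi = min(1.0, delta), max(1.0, delta)
    a = n - 1 - delta
    return ((n - 1) * math.log((n - lo) / hi) - lo * a) / a ** 2


def pareto_p_numeric(n: int, delta: float, m: int = 50000) -> float:
    """Midpoint rule for the integral in (11) after substituting x = hi / u.

    With dx = hi / u^2 du = x^2 / hi du, the integrand becomes
    (x - delta) / ((x + n - 1 - delta) * hi) on u in (0, 1).
    """
    hi = max(1.0, delta)
    a = n - 1 - delta
    s = 0.0
    for k in range(m):
        u = (k + 0.5) / m
        x = hi / u
        s += (x - delta) / ((x + a) * hi)
    return s / m


for d in (-1.0, 0.0, 0.5, 2.5):
    print(d, pareto_p_closed(6, d), pareto_p_numeric(6, d))
```

For $\delta=0$ the formula reduces to $\log n/(n-1)$, which agrees with the $q=1$ case of (9).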

## 5 Correlations

The indicators of δ-records are in general not independent in the case of iid random variables, see [41]. In [31] the authors study the dependence of record events in the LDM, by means of the following dependence index ($\delta=0$ in their case)

$$l_n(c,\delta):=\frac{P[\text{obs. } n \text{ and } n+1 \text{ are δ-records}]}{P[\text{obs. } n \text{ is δ-record}]\,P[\text{obs. } n+1 \text{ is δ-record}]}=\frac{E[1_{n,\delta}1_{n+1,\delta}]}{E[1_{n,\delta}]\,E[1_{n+1,\delta}]}.$$

If the events are independent, then $l_n(c,\delta)=1$. Otherwise, values greater or smaller than 1 indicate positive or negative correlation, respectively. That is, neighbouring δ-records tend to attract or repel each other, if $l_n(c,\delta)>1$ or $l_n(c,\delta)<1$.

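In order to manipulate $l_n(c,\delta)$ we consider the decomposition

$$E[1_{n,\delta}1_{n+1,\delta}]=E[1_{n,\delta}1_{n+1,\delta}1_{\{Y_n<Y_{n+1}\}}]+E[1_{n,\delta}1_{n+1,\delta}1_{\{Y_n>Y_{n+1}\}}], \tag{12}$$

which, for $\delta<0$, can be written as

$$\begin{aligned}
E[1_{n,\delta}1_{n+1,\delta}]&=\int_{-\infty}^{\infty}\left(\int_{s-c}^{\infty}\prod_{j=1}^{n-1}F(s+cj-\delta)f(t)\,dt+\int_{s-c+\delta}^{s-c}\prod_{j=2}^{n}F(t+cj-\delta)f(t)\,dt\right)f(s)\,ds\\
&=\int_{-\infty}^{\infty}\left((1-F(s-c))\prod_{j=1}^{n-1}F(s+cj-\delta)+\int_{s-c+\delta}^{s-c}\prod_{j=2}^{n}F(t+cj-\delta)f(t)\,dt\right)f(s)\,ds,
\end{aligned} \tag{13}$$

and, for $\delta\ge0$,

$$\begin{aligned}
E[1_{n,\delta}1_{n+1,\delta}]&=\int_{-\infty}^{\infty}\int_{s-c+\delta}^{\infty}\prod_{j=1}^{n-1}F(s+cj-\delta)f(t)\,dt\,f(s)\,ds\\
&=\int_{-\infty}^{\infty}(1-F(s-c+\delta))\prod_{j=1}^{n-1}F(s+cj-\delta)f(s)\,ds,
\end{aligned} \tag{14}$$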

since the second term in (12) vanishes.
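The vanishing of the second term for $\delta\ge0$ follows in one line from the definition of δ-record: if $Y_{n+1}$ is a δ-record, then

$$Y_{n+1}>\max\{Y_1,\ldots,Y_n\}+\delta\ \ge\ Y_n+\delta\ \ge\ Y_n,$$

so the event {$Y_{n+1}$ is a δ-record and $Y_n>Y_{n+1}$} is empty. The same chain explains the lower limit $s-c+\delta$ in (14): writing $X_n=s$ and $X_{n+1}=t$, the condition $Y_{n+1}>Y_n+\delta$ reads $t+c(n+1)>s+cn+\delta$, that is, $t>s-c+\delta$.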

As for $p_{n,\delta}(c)$, it is not possible to explicitly compute $l_n(c,\delta)$, in general. Nevertheless, it is still possible to describe the behaviour of the dependence index in some particular cases.

### 5.1 The Gumbel distribution

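Let $c>0$ and let $F$ be the Gumbel distribution, as in section 4.1. When $\delta<0$, elementary but lengthy computations yield

$$\lim_{n\to\infty}E[1_{n,\delta}1_{n+1,\delta}]=\frac{(e^c-1)^2(e^c-e^{\delta}+1)}{(e^c+e^{\delta}-1)(e^{2c}+e^{\delta}-1)}$$

and

$$l_\infty(c,\delta):=\lim_{n\to\infty}l_n(c,\delta)=\frac{(e^c+e^{\delta}-1)(e^c-e^{\delta}+1)}{e^{2c}+e^{\delta}-1}.$$

By differentiating with respect to $c$, we see that $l_\infty(c,\delta)$ is decreasing in $c$ and bounded below by 1, since $\lim_{c\to\infty}l_\infty(c,\delta)=1$. With respect to $\delta$ we find that the derivative vanishes at

$$\delta=\log\left(1-e^{2c}+\sqrt{e^{4c}-e^{2c}}\right),$$

and then, for any $c>0$,

$$\max_{\delta<0}l_\infty(c,\delta)=\frac{2e^{2c}\left(\sqrt{e^{2c}(e^{2c}-1)}-e^{2c}+1\right)}{\sqrt{e^{2c}(e^{2c}-1)}}=2\left(e^{2c}-\sqrt{2e^{3c}\sinh(c)}\right).$$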

Note also that .

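For $\delta\ge0$,

$$\lim_{n\to\infty}E[1_{n,\delta}1_{n+1,\delta}]=\frac{e^c(e^c-1)^2}{(e^c+e^{\delta}-1)(e^{c+\delta}-e^c+e^{2c}-e^{\delta}+e^{2\delta})}$$

and

$$l_\infty(c,\delta)=\frac{e^c(e^c+e^{\delta}-1)}{e^{c+\delta}-e^c+e^{2c}-e^{\delta}+e^{2\delta}}.$$

We note that $l_\infty(c,\delta)=1$, for all $c>0$, if $\delta=0$, which results in the asymptotic independence of consecutive record indicators in the LDM. Also, there are no critical points for the index when $\delta>0$. So, in this case $l_\infty(c,\delta)$ is increasing in $c$ with $\lim_{c\to\infty}l_\infty(c,\delta)=1$, and decreasing in $\delta$, with $\lim_{\delta\to\infty}l_\infty(c,\delta)=0$, as can be seen in Figure 3. Gathering these results, we conclude that $l_\infty(c,\delta)=1$ if and only if $\delta=0$. The asymptotic independence for records ($\delta=0$) was proved in [26]; we have shown here that δ-records attract each other for $\delta<0$ and repel each other for $\delta>0$.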
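The two branches of the limiting index for the Gumbel case, together with the location and value of its maximum over negative $\delta$, can be evaluated directly. This is a small sketch using the formulas of this subsection; the function names are hypothetical and the value $c=0.5$ is an arbitrary choice for illustration.

```python
import math


def l_inf(c: float, delta: float) -> float:
    """Limiting dependence index for the Gumbel LDM with trend c > 0."""
    ec, ed = math.exp(c), math.exp(delta)
    e2c = math.exp(2.0 * c)
    if delta < 0:
        return (ec + ed - 1.0) * (ec - ed + 1.0) / (e2c + ed - 1.0)
    return ec * (ec + ed - 1.0) / (ec * ed - ec + e2c - ed + ed * ed)


def argmax_delta(c: float) -> float:
    """Critical point of l_inf in delta (a negative value)."""
    e2c = math.exp(2.0 * c)
    return math.log(1.0 - e2c + math.sqrt(e2c * e2c - e2c))


def max_l_inf(c: float) -> float:
    """Closed form of the maximum of l_inf over delta < 0."""
    return 2.0 * (math.exp(2.0 * c) - math.sqrt(2.0 * math.exp(3.0 * c) * math.sinh(c)))


c = 0.5
d_star = argmax_delta(c)
print(d_star, l_inf(c, d_star), max_l_inf(c))
print(l_inf(c, -1.0), l_inf(c, 0.0), l_inf(c, 1.0))
```

The printed values show an index above 1 for negative $\delta$, equal to 1 at $\delta=0$ and below 1 for positive $\delta$, which matches the attraction and repulsion statement above.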

### 5.2 The Pareto distribution

Let $F$ be the Pareto(1,1) distribution and $c=1$. The probability of δ-record is given in section 4.2. Computations of $E[1_{n,\delta}1_{n+1,\delta}]$ are cumbersome and the explicit expression of $l_n(1,\delta)$ can be found in Appendix 10.5.

We have and , for every . Also, , for all , that is, -record-attraction grows unboundedly, as increases. Moreover, it can be proved that as , where is a constant depending on .

The sublinear growth of $l_n(1,\delta)$ as $n$ increases can be observed in the right panel of Figure 4, for different values of $\delta$, as well as the decrease in $\delta$. Also, for fixed $n$ (left panel of Figure 4), there is a negative value of $\delta$ where the correlation reaches a maximum, as in the Gumbel case. Note that, for negative and small positive values of $\delta$, $l_n(1,\delta)>1$, while, for big values of $\delta$, $l_n(1,\delta)<1$.

## 6 Asymptotic behaviour of Nn,δ

In sections 3 and 4 we have presented properties of the probability that observation $Y_n$ is a δ-record. In this section we analyse the random variable $N_{n,\delta}$, defined as the number of δ-records among the first $n$ observations, and study its behaviour as $n\to\infty$.

Depending on $F$, $c$ and $\delta$, it might be the case that only finitely many δ-records are observed. We give necessary and sufficient conditions for this to happen. On the other hand, if $N_{n,\delta}$ grows to infinity, we investigate if the ratio $N_{n,\delta}/n$ converges (in a certain stochastic sense) to $p_\delta(c)$ and, in that case, how the oscillations of $N_{n,\delta}$ around $np_\delta(c)$ are distributed.

Recall that, in the classical record model ($c=\delta=0$), the number of records $N_n$ grows to infinity, and there are universal results ensuring that, for any continuous $F$, $N_n/\log n$ converges to 1, almost surely (a.s.) and $(N_n-\log n)/\sqrt{\log n}$ has, asymptotically, a standard Gaussian distribution. However, when $\delta\ne0$, results in [41] and [45] for the model with $c=0$, show that $N_{n,\delta}$ may grow to a finite limit and, when it diverges, the corresponding limit laws depend both on $F$ and $\delta$. We begin by analyzing the situation where $N_{n,\delta}$ has a finite limit.

### 6.1 Finiteness of the total number of δ-records

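Let $N_{\infty,\delta}:=\lim_{n\to\infty}N_{n,\delta}$ be the total number of δ-records along the sequence $(Y_n)_{n\ge1}$. In this section we find necessary and sufficient conditions for the finiteness of $N_{\infty,\delta}$ and $E[N_{\infty,\delta}]$.

Clearly, these questions are related to the asymptotic behaviour of $p_{n,\delta}(c)$. If $p_\delta(c)>0$, then we can expect $N_{\infty,\delta}=\infty$. On the other hand, if $p_\delta(c)=0$, it may happen that $N_{n,\delta}$ grows sublinearly to $\infty$ or $N_{\infty,\delta}<\infty$. Since, by Theorem 1, the positivity of $p_\delta(c)$ is linked to the finiteness of $\mu_+$, we split the analysis in two cases: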

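1. $\mu_+=\infty$. In this situation, $N_{\infty,\delta}=\infty$ for any $c$ and $\delta$.

To check this assertion, we first prove that $N_{\infty,0}=\infty$. Observe that $\mu_+=\infty$ implies $x_+=\infty$ and

$$\sum_{n=1}^{\infty}P[Y_n>a]=\sum_{n=1}^{\infty}P[X_n>a-cn]=\sum_{n=1}^{\infty}(1-F(a-cn))=\infty,\quad\forall a\in\mathbb{R}. \tag{15}$$

From (15) and the second Borel-Cantelli lemma, we conclude that $Y_n>a$ infinitely often (i.o.), for any $a$, and so, $\sup_n Y_n=\infty$, with probability one. This fact clearly implies $N_{\infty,0}=\infty$. Now, since, for $\delta<0$, $N_{\infty,\delta}\ge N_{\infty,0}$, we get $N_{\infty,\delta}=\infty$. On the other hand, for $\delta>0$, the event

$$\left\{X_n+(c-\delta)n>\max_{1\le j\le n-1}\{X_j+(c-\delta)j\}\right\}\ \text{implies}\ \left\{X_n+cn>\max_{1\le j\le n-1}\{X_j+cj\}+\delta\right\},$$

that is, $N_{\infty,\delta}(c)\ge N_{\infty,0}(c-\delta)$. Therefore, $N_{\infty,\delta}=\infty$.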
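The implication between the two events can be verified term by term. For $\delta>0$ and any $j<n$, adding $\delta n$ to both sides of $X_n+(c-\delta)n>X_j+(c-\delta)j$ gives

$$X_n+cn>X_j+cj+\delta(n-j)\ \ge\ X_j+cj+\delta,$$

because $n-j\ge1$. Taking the maximum over $j\le n-1$ shows that every record of the LDM with trend $c-\delta$ is a δ-record of the LDM with trend $c$ built on the same $X_n$. Since the argument for (15) applies to any real trend, the number of records with trend $c-\delta$ is infinite, and so is the number of δ-records with trend $c$.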

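2. $\mu_+<\infty$. We distinguish three scenarios depending on the sign of $c$.

If $c>0$, we first assume $\delta<x_+-x_-+c$. In this case, we have $p_\delta(c)>0$ and $N_{\infty,\delta}=\infty$ is an immediate consequence of the law of large numbers in Theorem 5 below. If $\delta\ge x_+-x_-+c$, only the first observation will be a δ-record as shown in section 3.1, so $N_{\infty,\delta}=1$.

If $c=0$ and $\delta\le0$, then $N_{\infty,\delta}=\infty$, since $N_{\infty,\delta}\ge N_{\infty,0}=\infty$. If $c=0$ and $\delta>0$, the situation is more complicated. In fact, $N_{\infty,\delta}<\infty$ if and only if

$$\int_0^{\infty}\frac{1-F(x+\delta)}{(1-F(x))^2}\,f(x)\,dx<\infty,$$

which is also equivalent to $E[N_{\infty,\delta}]<\infty$. This is shown in Proposition 7 of the Appendix, by relating this question to the counting process of geometric records, as studied in [45].

If $c<0$, we proceed as in (15) to obtain

$$\sum_{n=1}^{\infty}P[Y_n>a]=\sum_{n=1}^{\infty}(1-F(a-cn))<\infty,\quad\forall a\in\mathbb{R},$$

where the last inequality follows from $\mu_+<\infty$. Thus, the first Borel-Cantelli lemma ensures that $P[Y_n>a \text{ i.o.}]=0$, for all $a$, so $Y_n\to-\infty$. Then, there exists a random index $n_0$ such that $Y_n\le Y_1+\delta$ for all $n\ge n_0$ and, consequently, $N_{\infty,\delta}<\infty$. In this case, we can also prove that $E[N_{\infty,\delta}]<\infty$; see Proposition 9.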

Summarizing the above, we give a complete characterization of the (almost sure) finiteness of the number of -records in the next theorem.

###### Theorem 3

$N_{\infty,\delta}<\infty$ almost surely if and only if one of the following conditions holds

1. and ,

2. , and