# Records for Some Stationary Dependent Sequences

For a zero-mean, unit-variance second-order stationary univariate Gaussian process we derive the probability that a record occurs at time n, say X_n, and derive its distribution function. We study the joint distribution of the arrival time process of records, the distribution of the increments between the first and second record and between the second and third record, and we compute the expected number of records. We also consider two consecutive or non-consecutive records, one at time j and one at time n, and we derive the probability that the joint records (X_j,X_n) occur as well as their distribution function. The probability that the records X_n and (X_j,X_n) take place, and the arrival time of the n-th record, are independent of the marginal distribution function, provided that it is continuous. These results actually hold for a second-order stationary process with Gaussian copulas. We extend some of these results to the case of a multivariate Gaussian process. Finally, for a strictly stationary process satisfying some mild conditions on the tail behavior of the common marginal distribution function F and on the long-range dependence of the extremes of the process, we derive the asymptotic probability that the record X_n occurs and derive its distribution function.


## 1 Introduction

Let $\{X_n,\,n\ge 1\}$ be a sequence of identically distributed random variables (rvs), and denote by $F$ the common univariate marginal distribution function. For any $n\ge 1$, set $M_n:=\max(X_1,\ldots,X_n)$. For simplicity, we set $R_1:=1$, that is, $X_1$ is counted as a record. The rv $X_n$ is a record if $X_n>M_{n-1}$. Such an event is coded by the indicator function $R_n:=\mathbb{1}(X_n>M_{n-1})$. When the $X_n$ are independent, many results on records are already known (e.g., gal87; arnbn98; resn08, Ch. 4; barakat2017; falkkp2018). In the multivariate case various definitions of records are possible and have been investigated both in the past and more recently; see, e.g., golres89, hashhue05, hwang2010, domfalzot18, to name a few. In this work we consider complete records: these are random vectors which are univariate records in each component. Precisely, let $\{\mathbf{X}_n,\,n\ge 1\}$ be a strictly stationary sequence of $d$-dimensional random vectors (rvs). Let $F$ be the common joint distribution function of $\mathbf{X}_n$ with margins $F_1,\ldots,F_d$. The rv $\mathbf{X}_n$ is a complete record if

$$\mathbf{X}_n>\max_{1\le i\le n-1}\mathbf{X}_i,$$

where the maximum is computed componentwise. We denote the rv coding the occurrence of a complete record at time $n$ by $R_n$.

Except for haiman1987 and haiman1998, as far as we know, most of the available results on records concern sequences of independent random variables or vectors. In the present work we derive some new results on the records of a stationary sequence of dependent random variables and dependent random vectors, under appropriate conditions on the dependence structure.

At first we consider a univariate second-order stationary Gaussian process with zero mean and unit variance. This means that for every $n\ge 1$, $\mathbb{E}(X_n)=0$ and $\operatorname{Var}(X_n)=1$, and the autocovariance of the process is translation-invariant, depending only on the time difference, i.e. for every $i,j\ge 1$, $\operatorname{Cov}(X_i,X_j)=\rho_{i,j}$, where $\rho_{i,j}$ is a function only of the separation $|i-j|$, and $\rho_{i,i}=1$ for every $i$. We derive the probability that a record occurs at time $n$, say $X_n$, and the distribution of $X_n$, given that it is a record. Furthermore, we derive the joint distribution of the arrival time process of records and, more specifically, the distribution of the increments between the first and second record and between the second and third record. We compute the expected number of records which, depending on the type of correlation structure of the Gaussian process, can be finite or infinite. We also focus on joint records: we derive the probability that two consecutive or non-consecutive records take place at times $j$ and $n$, say $X_j$ and $X_n$, as well as the joint distribution of $(X_j,X_n)$, given that they are both records.

We highlight that many of our findings, such as the probability that the records $X_n$ and $(X_j,X_n)$ take place and the arrival time of the $k$-th record, are independent of the marginal distribution function $F$, provided that it is continuous. As a consequence, the results actually hold for second-order stationary sequences with Gaussian copulas. On the contrary, the distribution of a record (two records), conditional on it being a record (them being records), does depend on $F$.

Next we consider a strictly stationary process satisfying some mild conditions on the tail behavior of the common marginal distribution function $F$ and on the long-range dependence of the extremes of the process. More specifically, it is assumed that $F$ is attracted by the so-called Generalized Extreme-Value family of distributions, and that maxima over sufficiently separated intervals within the time span are approximately independent. Within this setting we derive the probability that $X_n$ is a record, the distribution of $X_n$ (given that it is a record), and the expected number of records.

We complete the work by considering a zero-mean, unit-variance multivariate second-order stationary Gaussian process. We derive the probability that a complete record occurs at time $n$, and we compute the distribution of $\mathbf{X}_n$ (given that it is a record), as well as the probability that two complete records occur at times $j$ and $n$, and the joint distribution of $(\mathbf{X}_j,\mathbf{X}_n)$ (given that they are records).

The paper is organized as follows. In Section 2.1 we introduce some notation used throughout the paper and we briefly review some basic concepts on the multivariate closed skew-normal distribution. In Section 2.2 we present our main results on records for a univariate second-order stationary Gaussian process. In Section 2.3 we provide the asymptotic probability and distribution function of a record at time $n$ for a strictly stationary process that satisfies some appropriate conditions. Finally, in Section 3 we extend some of the results derived in Section 2.2 to the case of multivariate second-order stationary Gaussian processes.

## 2 Univariate Case

### 2.1 Preliminary results and notation

Throughout the paper we use the following notation. The symbol $X\sim N_d(\mu,\Sigma)$, $d\ge 1$, means a $d$-dimensional random vector that follows a multivariate Gaussian distribution with mean $\mu$ and positive-definite covariance matrix $\Sigma$, and $\bar{\Sigma}:=\sigma^{-1}\Sigma\sigma^{-1}$ is the correlation matrix, where $\sigma$ is the diagonal matrix of the standard deviations. Its cumulative distribution function (cdf) and probability density function (pdf) are denoted by $\Phi_d(x;\mu,\Sigma)$ and $\phi_d(x;\mu,\Sigma)$, with $x\in\mathbb{R}^d$. When $\mu=0$ and $\Sigma=I_d$, where $I_d$ is the identity matrix, we write $\Phi_d(x)$ and $\phi_d(x)$ for simplicity.

We indicate with $1_{m\times n}$ ($0_{m\times n}$) a matrix of dimension $m\times n$ whose elements are all equal to one (zero). We omit the subscripts when the dimensions of the matrices are clear from the context.

We introduce the notion of a multivariate closed skew-normal (CSN) random vector, and we do so by using the so-called conditioning representation (genton2004, Ch. 2). Let $U_1\sim N_m(0,\Omega)$ be independent of $\varepsilon\sim N_n(0,\Sigma)$, where $\Omega$ and $\Sigma$ are positive-definite covariance matrices, and let $\Delta\in\mathbb{R}^{n\times m}$, $\mu\in\mathbb{R}^n$ and $\xi\in\mathbb{R}^m$. Let $U_0:=\Delta U_1-\mu+\varepsilon$; then $U_0\sim N_n(-\mu,\Gamma)$, where $\Gamma:=\Sigma+\Delta\Omega\Delta^\top$. Define $X$ equal to $\xi+U_1$, under the condition that $U_0\ge 0$, denoted by $X\stackrel{d}{=}\xi+(U_1\mid U_0\ge 0)$. The $m$-dimensional random vector $X$ follows a multivariate closed skew-normal distribution, in symbols $X\sim CSN_{m,n}(\xi,\Omega,\Delta,\mu,\Sigma)$, whose pdf is, for all $x\in\mathbb{R}^m$,

$$\psi_{m,n}(x;\xi,\Omega,\Delta,\mu,\Sigma)=\phi_m(x-\xi;\Omega)\,\frac{\Phi_n(\Delta(x-\xi);\mu,\Sigma)}{\Phi_n(0;\mu,\Gamma)}. \tag{1}$$

We denote the cdf of $X$ by $\Psi_{m,n}(\,\cdot\,;\xi,\Omega,\Delta,\mu,\Sigma)$. When some of $\xi$, $\mu$ and $\Omega$ equal their default values ($0$, $0$ and $I_m$, respectively), we omit them among the parameters for simplicity. We recall that the closed skew-normal distribution is also known in the literature as the unified multivariate skew-normal distribution, which simply uses a different parametrization (e.g., Ch. 7.1.2 in azzalini2013skew). The exposition of our results benefits from the parametrization used by the closed skew-normal distribution.

We recall that if $X\sim CSN_{m,n}(\xi,\Omega,\Delta,\mu,\Sigma)$ then

$$\Psi_{m,n}(x;\xi,\Omega,\Delta,\mu,\Sigma)=\frac{\Phi_{n+m}(\tilde{x};\tilde{\Omega})}{\Phi_n(0;\mu,\Gamma)}, \tag{2}$$

where

$$\tilde{x}=\begin{pmatrix}-\mu\\ x-\xi\end{pmatrix},\qquad \tilde{\Omega}=\begin{pmatrix}\Gamma & -\Delta\Omega\\ -\Omega\Delta^\top & \Omega\end{pmatrix},$$

see azzalini2010. Furthermore, for $b\in\mathbb{R}^m$ and $A\in\mathbb{R}^{q\times m}$ of full row rank, we have

$$b+X \sim CSN_{m,n}(\xi+b,\Omega,\Delta,\mu,\Sigma), \tag{3}$$

$$AX \sim CSN_{q,n}(A\xi,\Omega^*,\Delta^*,\mu,\Sigma^*), \tag{4}$$

where $\Omega^*=A\Omega A^\top$, $\Delta^*=\Delta\Omega A^\top(A\Omega A^\top)^{-1}$ and $\Sigma^*=\Sigma+\Delta\Omega\Delta^\top-\Delta\Omega A^\top(A\Omega A^\top)^{-1}A\Omega\Delta^\top$ (see genton2004, Ch. 2, for details).
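One convenient way to simulate from a CSN law follows directly from a conditioning representation consistent with the pdf in (1): sample $U_1\sim N_m(0,\Omega)$ and $\varepsilon\sim N_n(0,\Sigma)$, and keep $\xi+U_1$ whenever $\Delta U_1-\mu+\varepsilon\ge 0$ componentwise; the acceptance rate is then $\Phi_n(0;\mu,\Gamma)$. The following numpy rejection sampler is our own illustrative sketch, not code from the paper.

```python
import numpy as np

def rvs_csn(size, xi, Omega, Delta, mu, Sigma, rng=None):
    """Rejection sampler for CSN_{m,n}(xi, Omega, Delta, mu, Sigma).

    Uses the representation X = xi + (U1 | U0 >= 0), where
    U1 ~ N_m(0, Omega), U0 = Delta @ U1 - mu + eps, eps ~ N_n(0, Sigma).
    The acceptance probability equals Phi_n(0; mu, Gamma), with
    Gamma = Sigma + Delta @ Omega @ Delta.T.
    """
    rng = np.random.default_rng(rng)
    xi, mu = np.atleast_1d(xi), np.atleast_1d(mu)
    m, n = len(xi), len(mu)
    out = []
    while len(out) < size:
        k = 4 * (size - len(out)) + 16          # oversample, then filter
        U1 = rng.multivariate_normal(np.zeros(m), Omega, size=k)
        eps = rng.multivariate_normal(np.zeros(n), Sigma, size=k)
        U0 = U1 @ np.asarray(Delta).T - mu + eps
        out.extend(xi + U1[np.all(U0 >= 0, axis=1)])
    return np.array(out[:size])
```

With $m=n=1$, $\xi=\mu=0$ and $\Omega=\Delta=\Sigma=1$, this reduces to a standard skew-normal with slant $\delta=1/\sqrt{2}$, whose mean is $\delta\sqrt{2/\pi}=1/\sqrt{\pi}\approx 0.564$.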

### 2.2 Records of dependent univariate Gaussian sequences

Let $\{X_n,\,n\ge 1\}$ be a second-order stationary Gaussian sequence of dependent rvs. Without loss of generality, assume for simplicity that $\mathbb{E}(X_n)=0$ and $\operatorname{Var}(X_n)=1$ for every $n\ge 1$. Throughout the paper we will refer to such a process as a stationary standard Gaussian (SSG) sequence. For any $n\ge 2$, let $I\subset\{1,\ldots,n\}$ and identify the $|I|$-dimensional and $|I^\complement|$-dimensional subvector partition $(X_I,X_{I^\complement})$ of $X_{1:n}$, with the corresponding partition of the parameter $\Sigma$. By $|A|$ we denote the number of elements of a set $A$.

Our results rely on the following well-known result on the conditional distributions derived from a joint Gaussian distribution. Precisely, let $X_{1:n}\sim N_n(0,\Sigma)$ with corresponding partition of the parameters $\Sigma_{I,I}$, $\Sigma_{I,I^\complement}$, $\Sigma_{I^\complement,I}$ and $\Sigma_{I^\complement,I^\complement}$; then in ander84 (Theorem 2.5.1) it is established that the conditional distribution of $X_{I^\complement}$ given that $X_I=x_I$ is, for all $x_I\in\mathbb{R}^{|I|}$,

$$X_{I^\complement}\mid X_I=x_I\sim N_{|I^\complement|}\big(\mu_{I^\complement},\Sigma_{I^\complement,I^\complement;I}\big),\qquad \mu_{I^\complement}=\Sigma_{I^\complement,I}\bar{\Sigma}^{-1}_{I,I}x_I,\qquad \Sigma_{I^\complement,I^\complement;I}=\bar{\Sigma}_{I^\complement,I^\complement}-\Sigma_{I^\complement,I}\bar{\Sigma}^{-1}_{I,I}\Sigma_{I,I^\complement}. \tag{5}$$

Furthermore, we denote the related correlation matrix by

$$\bar{\Sigma}_{I^\complement,I^\complement;I}=\sigma^{-1}_{I^\complement,I^\complement;I}\,\Sigma_{I^\complement,I^\complement;I}\,\sigma^{-1}_{I^\complement,I^\complement;I},$$

where $\sigma_{I^\complement,I^\complement;I}$ is the diagonal matrix of the standard deviations associated with $\Sigma_{I^\complement,I^\complement;I}$. For any $n\ge 2$, when $I=\{n\}$ we simplify the notation, writing $\Sigma_{1:n-1,1:n-1;n}$ and $\bar{\Sigma}_{1:n-1,1:n-1;n}$; entrywise we further simplify the notation by $\sigma_{i,j;n}$ and $\rho_{i,j;n}$.

In our first result we compute the probability that $X_n$ is a record together with its distribution. It is well known that $\Pr(R_n=1)=1/n$ in the case of independent rvs with identical continuous df (see, e.g., gal87) and that the distribution of $X_n$, given that it is a record, equals that of the largest observation among $X_1,\ldots,X_n$ (falkkp2018).

###### Proposition 2.1.

Let $\{X_n,\,n\ge 1\}$ be a SSG sequence of rvs. For every $i,j\ge 1$, let $\rho_{i,j}:=\operatorname{Cov}(X_i,X_j)$, with $\rho_{i,i}=1$. Then, the probability that $X_n$ is a record and the distribution of $X_n$, given that it is a record, are equal to

$$\Pr(R_n=1)=\Phi_{n-1}(0;\Gamma_{1:n-1;1:n-1}),$$

$$\Pr(X_n\le x\mid R_n=1)=\Psi_{1,n-1}\big(x;\varrho_{1:n-1},\bar{\Sigma}_{1:n-1,1:n-1;n}\big),$$

where $\Gamma_{1:n-1;1:n-1}$ is a variance-covariance matrix whose entries of the associated correlation matrix are

$$\gamma_{i,j;n}=\frac{1+\rho_{i,j}-\rho_{i,n}-\rho_{j,n}}{2\sqrt{(1-\rho_{i,n})(1-\rho_{j,n})}},\qquad i\ne n,\ j\ne n, \tag{6}$$

and $\bar{\Sigma}_{1:n-1,1:n-1;n}$ is a correlation matrix with entries

$$\rho_{i,j;n}=\frac{\rho_{i,j}-\rho_{i,n}\rho_{j,n}}{\sqrt{(1-\rho^2_{i,n})(1-\rho^2_{j,n})}},\qquad i\ne n,\ j\ne n.$$
###### Proof.

The probability that $X_n$ is a record is

$$\Pr(X_n>M_{n-1})=\int_{-\infty}^{+\infty}\Pr\big(X_i<z,\ i=1,\ldots,n-1\mid X_n=z\big)\,\phi(z)\,\mathrm{d}z=\int_{-\infty}^{+\infty}\Phi_{n-1}\big(z\varrho_{1:n-1};\bar{\Sigma}_{1:n-1,1:n-1;n}\big)\,\phi(z)\,\mathrm{d}z=\Phi_{n-1}(0;\Gamma_{1:n-1;1:n-1}),$$

where

$$\Gamma_{1:n-1;1:n-1}=\bar{\Sigma}_{1:n-1,1:n-1;n}+\varrho_{1:n-1}\varrho^\top_{1:n-1}, \tag{7}$$

$$\varrho_{1:n-1}=\sigma^{-1}_{1:n-1,1:n-1;n}\big(1_{n-1}-\bar{\Sigma}_{1:n-1,n}\big)=\left(\sqrt{\frac{1-\rho_{i,n}}{1+\rho_{i,n}}},\ \forall\, i\in I^\complement\right)^\top. \tag{8}$$

To obtain the second equality we used the formula in (5); the last equality follows from Lemma 7.1 in azzalini1996multivariate. With similar steps, we obtain the distribution for the record $X_n$,

$$\Pr(X_n\le x\mid R_n=1)=\frac{\Pr(X_n\le x,\,X_n>M_{n-1})}{\Pr(X_n>M_{n-1})}=\frac{\int_{-\infty}^{x}\phi(z)\,\Phi_{n-1}\big(z\varrho_{1:n-1};\bar{\Sigma}_{1:n-1,1:n-1;n}\big)\,\mathrm{d}z}{\Phi_{n-1}(0;\Gamma_{1:n-1;1:n-1})}\equiv\Psi_{1,n-1}\big(x;\varrho_{1:n-1},\bar{\Sigma}_{1:n-1,1:n-1;n}\big).$$

∎

The correlations $\rho_{i,j}$, $i\ne j$, in Proposition 2.1 must be compatible with a positive semi-definite correlation matrix (Kurowicka and Cooke, 2006), but they must also be such that $-1\le\gamma_{i,j;n}\le 1$, i.e.

$$(\rho_{i,n}+\rho_{j,n}-1)-2\sqrt{(1-\rho_{i,n})(1-\rho_{j,n})}\le\rho_{i,j}\le(\rho_{i,n}+\rho_{j,n}-1)+2\sqrt{(1-\rho_{i,n})(1-\rho_{j,n})}.$$
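The covariance structure behind (6) can be checked numerically: the event that $X_n$ is a record is $\{X_i-X_n<0,\ i<n\}$, and the differences $X_i-X_n$ form a Gaussian vector with covariances $1+\rho_{i,j}-\rho_{i,n}-\rho_{j,n}$, whose correlations are exactly the $\gamma_{i,j;n}$ in (6). A Monte Carlo sketch of this equivalence (our own, using an AR(1)-type correlation $\rho_{i,j}=\varphi^{|i-j|}$ as a working assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi, trials = 5, 0.5, 200_000

# AR(1)-type correlation matrix rho_{i,j} = phi^{|i-j|} (illustrative choice)
idx = np.arange(n)
R = phi ** np.abs(idx[:, None] - idx[None, :])

# Direct estimate of Pr(R_n = 1) = Pr(X_i < X_n for all i < n)
X = rng.multivariate_normal(np.zeros(n), R, size=trials)
p_direct = np.mean(np.all(X[:, :-1] < X[:, [-1]], axis=1))

# Same probability via the differences D_i = X_i - X_n, whose covariance
# matrix has entries 1 + rho_{ij} - rho_{in} - rho_{jn}
C = 1 + R[:-1, :-1] - R[:-1, [-1]] - R[[-1], :-1]
D = rng.multivariate_normal(np.zeros(n - 1), C, size=trials)
p_diff = np.mean(np.all(D < 0, axis=1))

print(p_direct, p_diff)  # the two estimates agree up to Monte Carlo error
```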
###### Remark 2.2.

Assume in Proposition 2.1 that $\rho_{i,j}=0$ for all $i\ne j$. Then,

$$\Pr(R_n=1)=\Phi_{n-1}\big(0;I_{n-1}+1_{n-1}1^\top_{n-1}\big)=\mathbb{E}\big(\Phi_{n-1}(1_{n-1}Z;I_{n-1})\big)=\int_{-\infty}^{+\infty}\Phi_{n-1}(1_{n-1}z;I_{n-1})\,\phi(z)\,\mathrm{d}z=\int_{-\infty}^{+\infty}\Phi^{n-1}(z)\,\phi(z)\,\mathrm{d}z=n^{-1},$$

where $Z\sim N(0,1)$. As expected, we obtain the results in gal87 and Lemma 1.1 in falkkp2018. Furthermore,

$$\Pr(X_n\le x\mid R_n=1)=\Psi_{1,n-1}(x;1_{n-1},I_{n-1})=n\int_{-\infty}^{x}\Phi_{n-1}(1_{n-1}z;I_{n-1})\,\phi(z)\,\mathrm{d}z=n\int_{-\infty}^{x}\Phi^{n-1}(z)\,\phi(z)\,\mathrm{d}z=\Phi^{n}(x).$$
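In the independent case the remark gives closed forms that are easy to verify by simulation: $\Pr(R_n=1)=1/n$ and $\Pr(X_n\le x\mid R_n=1)=\Phi(x)^n$. A quick numpy check (our illustration):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
n, trials = 5, 400_000

# iid standard Gaussian rows; a record occurs when the last entry is the maximum
X = rng.standard_normal((trials, n))
is_record = np.all(X[:, :-1] < X[:, [-1]], axis=1)

# Pr(R_n = 1) should be close to 1/n = 0.2
p_hat = is_record.mean()

# Conditional df of a record at x = 0.5 should be close to Phi(0.5)^n
x = 0.5
Phi = 0.5 * (1 + erf(x / sqrt(2)))          # standard normal cdf at x
F_hat = np.mean(X[is_record, -1] <= x)

print(p_hat, 1 / n)
print(F_hat, Phi ** n)
```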

Let

$$T(k):=\inf\Big\{m\in\mathbb{N}:\ \sum_{i=1}^{m}R_i=k\Big\},\qquad k\ge 2,\qquad T(1):=1,$$

be the arrival time of the $k$-th record.
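Operationally, the arrival times are just the positions where the cumulative count of records reaches $k$. A small helper (our own sketch):

```python
import numpy as np

def arrival_times(x):
    """Return the record arrival times T(1), T(2), ... (1-based),
    with T(1) = 1 since the first observation counts as a record."""
    x = np.asarray(x, dtype=float)
    times = [1]
    running_max = x[0]
    for n in range(1, len(x)):
        if x[n] > running_max:
            times.append(n + 1)
            running_max = x[n]
    return times
```

For instance, `arrival_times([3, 1, 4, 1, 5, 9, 2, 6])` returns `[1, 3, 5, 6]`.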

###### Lemma 2.3.

Let $\{T(k),\,k\ge 1\}$ be the arrival time process of records. Let $1=j_1<j_2<\cdots<j_k$, where $k\ge 2$, and set $I:=\{j_1,\ldots,j_k\}$ and $I^\complement:=\{1,\ldots,j_k\}\setminus I$. Then,

$$\Pr\big(T(i)=j_i,\ i=2,\ldots,k\big)=\Phi_{j_k-k}\big(0;\Gamma_{I^\complement,I^\complement}\big)\,\Psi_{k-1,j_k-k}\big(0;D\bar{\Sigma}_{I,I}D^\top,\Delta,\bar{\Sigma}_{I^\complement,I^\complement;I}\big),$$

where $D=(d_{i,j})\in\mathbb{R}^{(k-1)\times k}$ with $d_{i,i}=1$, $d_{i,i+1}=-1$ and zero otherwise,

$$\Delta=\varrho_{I^\complement,I^\complement}\bar{\Sigma}_{I,I}D^\top\big(D\bar{\Sigma}_{I,I}D^\top\big)^{-1}, \tag{9}$$

$$\Gamma_{I^\complement,I^\complement}=\varrho_{I^\complement,I^\complement}\bar{\Sigma}_{I,I}\varrho^\top_{I^\complement,I^\complement}+\bar{\Sigma}_{I^\complement,I^\complement;I}, \tag{10}$$

$$\varrho_{I^\complement,I^\complement}=\sigma^{-1}_{I^\complement,I^\complement;I}\big(B-\Sigma_{I^\complement,I}\bar{\Sigma}^{-1}_{I,I}\big), \tag{11}$$

and

$$B:=\begin{pmatrix}1_{j_2-2}&0_{j_2-2}&\cdots&0_{j_2-2}&0_{j_2-2}\\ 0_{j_3-j_2-1}&1_{j_3-j_2-1}&\cdots&0_{j_3-j_2-1}&0_{j_3-j_2-1}\\ \vdots&\vdots&&\vdots&\vdots\\ 0_{j_k-j_{k-1}-1}&0_{j_k-j_{k-1}-1}&\cdots&1_{j_k-j_{k-1}-1}&0_{j_k-j_{k-1}-1}\end{pmatrix}\in\mathbb{R}^{(j_k-k)\times k}. \tag{12}$$
###### Proof.

We have

$$\Pr\big(T(i)=j_i,\ i=2,\ldots,k\big)=\Pr\big(M_{j_i+1:j_{i+1}-1}<X_{j_i},\ i=1,\ldots,k-1,\ X_{j_1}<X_{j_2}<\cdots<X_{j_k}\big)=\Pr\big(X_{I^\complement}<BX_I,\ X_{j_1}<\cdots<X_{j_k}\big),$$

where $B$ is given in (12). By conditioning on $X_I$ through (5) and standardizing the random vector $X_{I^\complement}$, we obtain

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{z_k}\cdots\int_{-\infty}^{z_2}\Phi_{j_k-k}\big(\varrho_{I^\complement,I^\complement}z;\bar{\Sigma}_{I^\complement,I^\complement;I}\big)\,\phi_k\big(z;\bar{\Sigma}_{I,I}\big)\,\mathrm{d}z=\Phi_{j_k-k}\big(0;\Gamma_{I^\complement,I^\complement}\big)\int_{-\infty}^{+\infty}\int_{-\infty}^{z_k}\cdots\int_{-\infty}^{z_2}\psi_{k,j_k-k}\big(z;\bar{\Sigma}_{I,I},\varrho_{I^\complement,I^\complement},\bar{\Sigma}_{I^\complement,I^\complement;I}\big)\,\mathrm{d}z=\Phi_{j_k-k}\big(0;\Gamma_{I^\complement,I^\complement}\big)\Pr(Z_1<Z_2<\cdots<Z_k),$$

where $Z:=(Z_1,\ldots,Z_k)^\top\sim CSN_{k,j_k-k}\big(\bar{\Sigma}_{I,I},\varrho_{I^\complement,I^\complement},\bar{\Sigma}_{I^\complement,I^\complement;I}\big)$ and where $\Gamma_{I^\complement,I^\complement}$ and $\varrho_{I^\complement,I^\complement}$ are given in (10) and (11).

By recalling formula (4), we obtain

$$\begin{pmatrix}Z_1-Z_2\\ \vdots\\ Z_{k-1}-Z_k\end{pmatrix}=\begin{pmatrix}1&-1&0&0&\cdots&0\\ 0&1&-1&0&\cdots&0\\ \vdots&&\ddots&\ddots&&\vdots\\ 0&\cdots&0&0&1&-1\end{pmatrix}\begin{pmatrix}Z_1\\ \vdots\\ Z_k\end{pmatrix}=DZ\sim CSN_{k-1,j_k-k}\big(D\bar{\Sigma}_{I,I}D^\top,\Delta,\bar{\Sigma}_{I^\complement,I^\complement;I}\big),$$

where $\Delta$ is given in (9). Hence $\Pr(Z_1<\cdots<Z_k)=\Pr(DZ<0)=\Psi_{k-1,j_k-k}\big(0;D\bar{\Sigma}_{I,I}D^\top,\Delta,\bar{\Sigma}_{I^\complement,I^\complement;I}\big)$. ∎

In the next result we establish the distribution of the arrival time $T(2)$ of the second record as well as that of the increment $X_{T(2)}-X_{T(1)}$.

###### Theorem 2.4.

Let $\{X_n,\,n\ge 1\}$ be a SSG sequence of rvs, and let $\rho_{i,j}$, $i<j$, denote its correlations. Assume that $\rho_{i,j}$ decays suitably as $j-i\to\infty$. For $n\ge 2$, the distribution of the arrival time of the second record is

$$\Pr(T(2)=n)=\begin{cases}1/2,& n=2,\\[2pt] \Phi_{n-2}\big(0;\Gamma_{2:n-1,2:n-1}\big)-\Phi_{n-1}\big(0;\Gamma_{2:n,2:n}\big),& n>2,\end{cases} \tag{13}$$

where $\Gamma_{2:n-1,2:n-1}$ and $\Gamma_{2:n,2:n}$ are defined similarly to (7). Furthermore, for every $x>0$, the distribution of the increment $X_{T(2)}-X_{T(1)}$ is

$$H(x)=\sum_{n\ge 2}\Big(\Phi_{n-1}\big(u_x;\Gamma_{2:n,2:n}\big)-\Phi_{n-1}\big(0;\Gamma_{2:n,2:n}\big)\Big), \tag{14}$$

where $u_x$ is an $(n-1)$-dimensional vector.

###### Proof.

When $n=2$ we have

$$\Pr(T(2)=2)=\Pr(X_2>X_1)=1/2.$$

For $n>2$ we have

$$\Pr(T(2)=n)=\Pr\big(X_i<X_1,\ i=2,\ldots,n-1,\ X_n>X_1\big)=\Pr\big(X_i<X_1,\ i=2,\ldots,n-1\big)-\Pr\big(X_i<X_1,\ i=2,\ldots,n\big).$$

Therefore, (13) follows by similar arguments to those used in Proposition 2.1. It must be checked that the probabilities in (13) sum to one:

$$\sum_{n\ge 2}\Pr(T(2)=n)=\frac12+\lim_{N\to\infty}\sum_{n=3}^{N}\Big(\Phi_{n-2}\big(0;\Gamma_{2:n-1,2:n-1}\big)-\Phi_{n-1}\big(0;\Gamma_{2:n,2:n}\big)\Big)=\lim_{N\to\infty}\Big(1-\Phi_2\big(0;\Gamma_{1:2,1:2}\big)+\Phi_2\big(0;\Gamma_{1:2,1:2}\big)-\cdots+\Phi_{N-2}\big(0;\Gamma_{1:N-2,1:N-2}\big)-\Phi_{N-1}\big(0;\Gamma_{1:N-1,1:N-1}\big)\Big)=1-\lim_{N\to\infty}\Phi_{N-1}\big(0;\Gamma_{1:N-1,1:N-1}\big)=1.$$

∎
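In the iid case, (13) combined with Remark 2.2 reduces to $\Pr(T(2)=n)=\frac{1}{n-1}-\frac{1}{n}=\frac{1}{n(n-1)}$, which telescopes to one; this is easy to confirm by simulation, using the fact that for iid data $T(2)$ is simply the first time $n$ with $X_n>X_1$ (our sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
trials, horizon = 200_000, 30

# iid N(0,1) paths; T(2) is the first n >= 2 with X_n > X_1, since any
# earlier X_i beating X_1 would itself have been the second record.
X = rng.standard_normal((trials, horizon))
beats_first = X[:, 1:] > X[:, [0]]
first_beat = np.argmax(beats_first, axis=1) + 2  # column 0 corresponds to n = 2
found = beats_first.any(axis=1)                  # discard paths with T(2) > horizon

for n in (2, 3, 5):
    p_hat = np.mean(found & (first_beat == n))
    print(n, p_hat, 1 / (n * (n - 1)))           # empirical vs. 1/(n(n-1))
```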

Let $(\tilde{X}_1,\ldots,\tilde{X}_{n-1})$ be a zero-mean, unit-variance Gaussian random vector whose correlation entries are the $\gamma_{i,j;n}$ in (6), so that $P_n:=\Pr(R_n=1)=\Phi_{n-1}\big(0;\Gamma_{1:n-1;1:n-1}\big)=\Pr\big(\tilde{X}_i\le 0,\ i=1,\ldots,n-1\big)$. Clearly $0\le P_n\le 1$. We recall that $\Pr(X_i\le 0)=1/2$ for every $i$. By the Fréchet inequalities we have that

$$A_n:=\max\Big(0,\ \sum_{i=1}^{n}\Pr(X_i\le 0)-(n-1)\Big)=\max(0,\,1-n/2)\le P_n\le 1/2.$$

For $n>2$ we derive the following upper bound $B_n$ on $P_n$. Precisely,

$$\begin{aligned}P_n&=\Pr\Big(\sum_{i=1}^{n-1}\mathbb{1}(\tilde{X}_i\le 0)\ge n-1\Big)=\Pr\Big\{\sum_{i=1}^{n-1}\Big(\mathbb{1}(\tilde{X}_i\le 0)-\frac12\Big)\ge\frac{n-1}{2}\Big\}\\ &\le\Pr\Big\{\Big|\sum_{i=1}^{n-1}\Big(\mathbb{1}(\tilde{X}_i\le 0)-\frac12\Big)\Big|\ge\frac{n-1}{2}\Big\}\\ &\le\frac{4}{(n-2)^2}\,\mathbb{E}\Big[\Big\{\sum_{i=1}^{n-1}\Big(\mathbb{1}(\tilde{X}_i\le 0)-\frac12\Big)\Big\}^2\Big]\\ &=\frac{4}{(n-2)^2}\sum_{i=1}^{n-1}\sum_{j=1}^{n-1}\operatorname{Cov}\big(\mathbb{1}(\tilde{X}_i\le 0),\mathbb{1}(\tilde{X}_j\le 0)\big)\\ &=\frac{4}{(n-2)^2}\sum_{i=1}^{n-1}\sum_{j=1}^{n-1}\big(P_{i,j;n}-1/4\big)=:B_n,\end{aligned}$$

where $P_{i,j;n}:=\Pr(\tilde{X}_i\le 0,\tilde{X}_j\le 0)$ is a bivariate Gaussian cdf with correlation $\gamma_{i,j;n}$ that is given in (6). In the third row we used Chebyshev's inequality. Setting $P_{h;n}:=P_{i,i+h;n}$, $h=0,\ldots,n-2$, we rewrite $B_n$ as

$$\begin{aligned}B_n&=\frac{4}{(n-2)^2}\sum_{h=0}^{n-2}2(n-h)\big(P_{h;n}-1/4\big)\\ &=\frac{8}{n}\Big(1+\frac{2}{n}\Big)^2\big(P_{0;n}-1/4\big)+\frac{8}{n}\Big(1+\frac{2}{n}\Big)^2\sum_{h=1}^{n-2}\Big(1-\frac{h}{n}\Big)\big(P_{h;n}-1/4\big)\\ &=\alpha_n+\beta_n,\end{aligned}$$

where $\alpha_n:=\frac{8}{n}\big(1+\frac{2}{n}\big)^2\big(P_{0;n}-1/4\big)$ and

$$\gamma_{h;n}=\frac{1+\rho_{0,h}-\rho_{0,n-i}-\rho_{h,n-i}}{2\sqrt{(1-\rho_{0,n-i})(1-\rho_{h,n-i})}},\qquad h=0,\ldots,n-2.$$

Now, when $h=0$ we obtain $\gamma_{0;n}=1$ and therefore $P_{0;n}=1/2$, and as a consequence the term $\alpha_n\to 0$ as $n\to\infty$. We rewrite the term $\beta_n$ as

$$\beta_n=\frac{8}{n}\Big(1+\frac{2}{n}\Big)^2\sum_{h=1}^{n-2}\big(P_{h;n}-1/4\big)-\frac{8}{n}\Big(1+\frac{2}{n}\Big)^2\sum_{h=1}^{n-2}\frac{h}{n}\big(P_{h;n}-1/4\big).