## 1 Introduction

Many real-world processes, like global weather data, water reservoir levels, biological or medical signals, economic time series, etc., are intrinsically non-stationary Andreas et al. (2008); Nerini et al. (2017); Boashash et al. (2013); Couts et al. (1966); Young (1994): their probability density function (PDF) deforms as time evolves. Analyzing such processes with classical tools, like, e.g., two-point correlation assessment Yang and Shahabi (2005), requires a stationarity hypothesis. The stationarity hypothesis can be either strict or weak: while strict stationarity requires all moments of the process — and hence its PDF — to be time-independent, weak stationarity is achieved when the first moment and the covariance function are time-independent and the variance is finite at all times Dębowski (2007). Even the weaker hypothesis is often very restrictive and not realistic over long time periods. When the signal has a drift or a linear trend, another approach is to focus on its time-increments or time-derivatives: assuming that the increments or the time-derivative are stationary is then a more realistic hypothesis. For real-world processes, the stationarity of the increments, or even the stationarity of the signal itself, is often argued to be valid when considering small chunks of data spanning a short enough time range Yaglom (1955); Ibe (2013); Frisch (1995), so that slow evolutions of higher-order moments can be neglected. The present article focuses on non-stationary processes with increments that are stationary and centered; this hypothesis ensures that the processes do not have any trend or drift.

Shannon information theory provides a very general framework to study stationary processes Shannon (1948); Kantz and Schreiber (2003), and some attempts to analyze non-stationary processes have been reported Vu et al. (2009); Ray and Chowdhury (2010); Gómez Herrero et al. (2015). Contrary to most classical approaches, like, e.g., linear response theory in statistical physics or solid state physics, this framework is not restricted to the study of two-point correlations and linear relationships: it allows quantifying higher-order dependencies Granero-Belinchón et al. (2019) and nonlinear dynamics Kantz and Schreiber (2003). Information theory can be straightforwardly applied to any non-stationary time process: by carefully studying how the probability density and the dependencies of the process evolve in time, a time-evolving Shannon entropy can be defined. The drawback of this approach is that it requires many realizations of the time-evolution of the process, as it relies on having enough statistics over the realizations Gómez Herrero et al. (2015).

Unfortunately, obtaining enough data is very difficult in real-world systems: in the best-case scenario a few realizations can be recorded experimentally, and usually only a single realization is accessible. In this paper, we develop a methodology that can be applied to a single realization, in order to analyze a non-stationary signal with stationary centered increments. We describe a time-averaged framework that gathers all available data points in a time window representing a single realization, whether it is the full experimental time duration or just a fraction of it Vu et al. (2009).

The present paper is organized as follows. In section 2, we present the general framework of information theory for a non-stationary signal, and our new framework that exploits time averages. We then give particular emphasis to self-similar processes. In section 3, we benchmark our framework in the special case of Gaussian self-similar signals, a model situation where analytical developments are possible. In section 4, we explore the case of non-Gaussian self-similar processes. Finally, in section 5, we drop the hypothesis of self-similarity and apply our framework to a multifractal process.

## 2 Information theory for non-stationary processes

### 2.1 Non-stationary processes with stationary increments

In this article, we consider non-stationary processes with stationary increments. Such a process $x(t)$ can be written as a motion obtained by integrating a stationary noise $w(t)$:

$$x(t) = x(0) + \int_0^t w(u)\,\mathrm{d}u \qquad (1)$$

where $x(0)$ and $w(0)$ are the values at time $t=0$, both of which can be set to 0 without loss of generality.

Nowadays signals are recorded and stored on digital media, which amounts in practice to considering a set of data sampled at discrete times $t_k = k\,\delta t$, where $k \in \mathbb{N}$. We further assume that the signals are equi-sampled, i.e., $\delta t$ is constant, and we choose $\delta t = 1$. So we consider in this article discrete-time processes and we express them as motions obtained by integrating a stationary noise according to:

$$x(t) = x(0) + \sum_{k=1}^{t} w(k) \qquad (2)$$

where again $x(0)$ can be set to 0. Eq.(2) can also be replaced by the recursive form

$$x(t) = x(t-1) + w(t) \qquad (3)$$

If the noise $w$ is not centered, i.e., has a non-zero statistical mean $\mu = \langle w \rangle$, we introduce the centered noise $\tilde{w}(t) = w(t) - \mu$. The equations for the motion read:

$$x(t) = x(0) + \sum_{k=1}^{t} \tilde{w}(k) + \mu t \qquad (4)$$

$$= \tilde{x}(t) + \mu t \qquad (5)$$

where $\tilde{x}(t)$ is the motion built on the centered noise $\tilde{w}$. The process $x(t)$ can thus be interpreted as a motion built on the stationary centered noise together with an additive deterministic drift, which is the linear trend $\mu t$.

In this article, we study motions without trend, so we impose that the noise is centered, i.e., that its statistical mean $\langle w \rangle = 0$. Besides the simple centering of the increments $w(t) \to w(t) - \langle w \rangle$, any detrending method can be applied to $x(t)$, e.g., using moving averages. As a consequence, the motion is centered: its statistical mean $\langle x(t) \rangle = 0$ at all times $t$. Nevertheless, its variance, and all its higher-order moments, may depend on time: the motion is a non-stationary process with stationary increments. Typical examples of such processes are Brownian motion and fractional Brownian motion Mandelbrot and Van Ness (1968), both of which have a variance that evolves with time.
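As an illustration, the construction above (centering the noise, then integrating it into a motion, eqs.(2)–(4)) can be sketched in a few lines of Python; the white noise below is only a stand-in for the stationary noises considered in this article, and the function name is ours:

```python
import numpy as np

def motion_from_noise(w):
    """Center the noise w, then integrate it (cumulative sum, eq. (2))
    to obtain a trend-free motion with stationary increments."""
    w_centered = w - w.mean()          # remove the empirical mean (the drift mu)
    x = np.cumsum(w_centered)          # discrete integration, x starts at w_centered[0]
    return x

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000) + 0.5  # stationary noise with a non-zero mean
x = motion_from_noise(w)

# The increments of the motion recover the centered noise exactly:
increments = np.diff(x)
assert np.allclose(increments, (w - w.mean())[1:])
```

Because the noise has been centered, the motion carries no linear trend: its final value is the sum of the centered noise, which vanishes by construction.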

### 2.2 General framework

For a generic non-stationary process $x(t)$, the probability density function (PDF) $p_{x(t)}$ changes with time. The information theory framework can be applied to each random variable $x(t)$, i.e., at each time $t$. To do so, the PDF of $x(t)$ needs to be estimated at each time $t$, which in practice requires many realizations to be available Gómez Herrero et al. (2015).

To analyze the temporal dynamics of a random process at a given time $t$, we consider the $m$-dimensional vector obtained with the Takens time-embedding procedure Takens (1981):

$$\mathbf{x}^{(m,\tau)}(t) = \left(x(t), x(t-\tau), \ldots, x(t-(m-1)\tau)\right) \qquad (6)$$

The embedding dimension $m$ controls the order of the statistics that are considered, and the delay $\tau$ defines a time scale. We define below some information theory quantities that are functionals of the $m$-point joint distribution $p_{\mathbf{x}^{(m,\tau)}(t)}$, in order to characterize linear and non-linear temporal dynamics.

##### Shannon entropy

The entropy of $\mathbf{x}^{(m,\tau)}(t)$ is:

$$H\left(\mathbf{x}^{(m,\tau)}(t)\right) = -\int p_{\mathbf{x}^{(m,\tau)}(t)}(\mathbf{x}) \log p_{\mathbf{x}^{(m,\tau)}(t)}(\mathbf{x})\,\mathrm{d}\mathbf{x} \qquad (7)$$

This quantity depends on time $t$, as well as on the embedding dimension $m$ and the delay $\tau$. We further note it $H^{(m,\tau)}(t)$, where the time is indicated in parentheses and the parameters as upper indices. It measures the amount of information characterizing the $m$-dimensional PDF of the process at time $t$ sampled at scale $\tau$. When $m=1$, the entropy does not depend on $\tau$ and does not probe the dynamics of the process; we then note it $H(t)$, dropping the upper indices. However, for embedding dimension $m \geq 2$ the entropy depends on the linear and non-linear dynamics of the process. Indeed, the entropy involves arbitrarily high-order moments of the joint PDF $p_{\mathbf{x}^{(m,\tau)}(t)}$. As usual, the entropy does not depend on the first moment of the distribution.
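The time-embedding of eq.(6) is straightforward to implement. The sketch below (the function name is ours, chosen for illustration) builds one embedded vector per admissible time from a scalar series; entropy and mutual information estimators then operate on its rows:

```python
import numpy as np

def takens_embedding(x, m, tau):
    """Return one row per admissible time t:
    (x(t), x(t - tau), ..., x(t - (m-1)*tau)), following eq. (6)."""
    x = np.asarray(x)
    N = len(x)
    t0 = (m - 1) * tau                 # first admissible time index
    # column j holds x(t - j*tau), for t in [t0, N)
    return np.column_stack([x[t0 - j * tau : N - j * tau] for j in range(m)])

x = np.arange(10)
emb = takens_embedding(x, m=3, tau=2)
# emb[0] is (x(4), x(2), x(0)) = [4, 2, 0]
```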

Using the time-increments of size $\tau$, $\delta_\tau x(t) = x(t) - x(t-\tau)$, it can be shown (see appendix A) that the amount of information measured by $H^{(m,\tau)}(t)$ is the same as the amount of information in the vector $\left(x(t), \delta_\tau x(t), \delta_\tau x(t-\tau), \ldots, \delta_\tau x(t-(m-2)\tau)\right)$, i.e.,

$$H\left(\mathbf{x}^{(m,\tau)}(t)\right) = H\left(x(t), \delta_\tau x(t), \ldots, \delta_\tau x(t-(m-2)\tau)\right) \qquad (8)$$

For processes with stationary increments, the marginal distribution of $x(t)$ may be strongly time-dependent, but the marginal distribution of any increment is time-independent. Eq.(8) thus suggests that the time-dependence of $H^{(m,\tau)}(t)$ originates mainly from $x(t)$, the first component of the rewritten embedded vector. Nevertheless, it should be observed that although the increments, considered by themselves, have a stationary dependence structure, the covariance of $x(t)$ with any of the increments is a priori non-stationary.

##### Mutual information and auto-mutual information

The mutual information measures the amount of information shared by two processes. For two non-stationary time-embedded vectors $\mathbf{x}^{(m,\tau)}(t)$ and $\mathbf{y}^{(p,\tau)}(t)$, it is defined as:

$$I\left(\mathbf{x}^{(m,\tau)}(t); \mathbf{y}^{(p,\tau)}(t)\right) = H\left(\mathbf{x}^{(m,\tau)}(t)\right) + H\left(\mathbf{y}^{(p,\tau)}(t)\right) - H\left(\mathbf{x}^{(m,\tau)}(t), \mathbf{y}^{(p,\tau)}(t)\right) \qquad (9)$$

In the following, we use the auto-mutual information to measure, for a single process $x$, the shared information between two successive time-embedded vectors of dimension $m$ and $p$ Granero-Belinchon et al. (2017):

$$I^{(m,p,\tau)}(t) = I\left(\mathbf{x}^{(m,\tau)}(t); \mathbf{x}^{(p,\tau)}(t+p\tau)\right) \qquad (10)$$

The auto-mutual information defined in (10) probes the dynamics of the process at time $t$ by measuring the dependencies between two consecutive chunks of $m$ and $p$ points sampled every $\tau$.

##### Entropy rate

The entropy rate, or entropy gain Crutchfield and Feldman (2003), of order $m$ at time $t$ measures the increase of Shannon entropy when the embedding dimension is increased from $m$ to $m+1$. It is defined as the variation of Shannon entropy between $\mathbf{x}^{(m+1,\tau)}(t)$ and $\mathbf{x}^{(m,\tau)}(t-\tau)$, two successive time-embedded versions of the process:

$$h^{(m,\tau)}(t) = H\left(\mathbf{x}^{(m+1,\tau)}(t)\right) - H\left(\mathbf{x}^{(m,\tau)}(t-\tau)\right) \qquad (11)$$

$$= H\left(x(t)\right) - I\left(x(t); \mathbf{x}^{(m,\tau)}(t-\tau)\right) \qquad (12)$$

Within the general framework, the entropy, the mutual information and the entropy rate are well defined at any time $t$ for a non-stationary process. Although this framework can formally be used to analyze non-stationary processes at any time $t$, in practice it is often impossible to assess statistics at a fixed time $t$, as the number of available realizations from real-world datasets may be very small. To overcome this issue, we propose in the next section another framework that considers averages over a finite and possibly large time window, which represents for example the duration of an experimental measurement.

### 2.3 Practical time-averaged framework

We now focus on non-stationary processes with stationary increments. We develop in this section a pragmatic approach which can be applied when a single time trace of a non-stationary signal is available.

We first present a very formal perspective that defines a time-averaged PDF of a non-stationary process. We then propose a practical approach which uses a very simple estimation of such a time-average PDF. We finally use this practical approach to define all the information quantities that we are interested in.

#### 2.3.1 Time-averaged framework

Using a formal perspective, we consider the global statistics of the dataset, forgetting its time dynamics, and we formally define the time-averaged probability density function in the time window $[t_0, t_0+T]$:

$$\bar{p}_{t_0,T}(x) = \frac{1}{T} \int_{t_0}^{t_0+T} p_{x(t)}(x)\,\mathrm{d}t \qquad (13)$$

Because of the time-average, this probability density function does not depend on a single time $t$ but on the starting time $t_0$ and the duration $T$ of the time window.

In the case of a stationary process, the PDF $p_{x(t)}$ is independent of $t$, so the PDF $\bar{p}_{t_0,T}$ is independent of $t_0$ and $T$.

In the case of a non-stationary process with stationary centered increments, the dependence on $t_0$ only appears in the mean of the time-averaged PDF $\bar{p}_{t_0,T}$. As a consequence, since the Shannon entropy does not depend on the mean, none of the information theoretic quantities depends on $t_0$.

In the case of a non-stationary process with stationary but non-centered increments, there is a drift: the first moment of $p_{x(t)}$ evolves linearly with time. When integrated in time in eq.(13), this induces a deformation of the time-averaged PDF $\bar{p}_{t_0,T}$, which a priori affects moments of any order. As a consequence, the Shannon entropy is then expected to depend on $t_0$.

In the following, we focus on non-stationary processes with stationary centered increments, described in section 2.1.

#### 2.3.2 Practical framework

In practice, given a time series of length $T$, we propose to very roughly approximate the PDF $\bar{p}_{t_0,T}$ defined in (13) with the normalized histogram $\hat{p}_{t_0,T}$ of all the data points $\{x(t),\ t_0 \le t < t_0+T\}$ available in the time window. This is a very strong assumption, as $\bar{p}_{t_0,T}$ is a priori very different from any $p_{x(t)}$, and a priori very different from the histogram constructed by cumulating all the available data in the interval. This pragmatic approach comes down to treating the set of available data points in the time interval exactly in the same way as if it were a set of data points originating from a stationary, albeit unknown, process, and then estimating its PDF.

In the following, we drop the hat in the notations, and consider only the ersatz probabilities $p_{t_0,T}$ in place of the time-averaged probabilities $\bar{p}_{t_0,T}$. As we discuss later in section 6, if several experimental realizations are available, it is of course possible to use them to enhance the estimation of the time-averaged PDF.
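As a minimal numerical sketch of this pragmatic approach (using a plug-in histogram estimator for simplicity; the estimators actually used later in this article are nearest-neighbors based), the ersatz entropy of a single pooled window can be computed as:

```python
import numpy as np

def ersatz_entropy_histogram(x, n_bins=64):
    """Plug-in estimate of the ersatz Shannon entropy (m = 1):
    all points of the window are pooled into one normalized histogram,
    exactly as if they came from a stationary process."""
    counts, edges = np.histogram(x, bins=n_bins)
    dx = edges[1] - edges[0]
    p = counts / counts.sum()          # ersatz probabilities
    p = p[p > 0]
    # differential entropy: -sum p log p + log(bin width)
    return -np.sum(p * np.log(p)) + np.log(dx)

rng = np.random.default_rng(1)
gauss = rng.standard_normal(100_000)
h = ersatz_entropy_histogram(gauss)
# for a stationary standard Gaussian signal, h is close to
# 0.5*log(2*pi*e) ~ 1.42
```

For a genuinely non-stationary signal with stationary centered increments, the same function applied to a window of length $T$ yields the ersatz entropy, which then depends on $T$.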

#### 2.3.3 Information theory quantities in the practical framework

Given a time series of length $T$, and considering the ersatz PDFs $p_{t_0,T}$, we define the time-averaged Shannon entropy, the time-averaged auto-mutual information and the time-averaged entropy rate, as described below.

##### Ersatz Shannon entropy

We define the ersatz entropy of the time-embedded signal $\mathbf{x}^{(m,\tau)}$ as the entropy of the time-averaged PDF $p^{(m,\tau)}_{t_0,T}$:

$$\tilde{H}^{(m,\tau)}(T) = -\int p^{(m,\tau)}_{t_0,T}(\mathbf{x}) \log p^{(m,\tau)}_{t_0,T}(\mathbf{x})\,\mathrm{d}\mathbf{x} \qquad (14)$$

$\tilde{H}^{(m,\tau)}(T)$ gives the amount of information of the set of values of the signal in the time interval $[t_0, t_0+T]$, and hence it can be interpreted as the total information characterizing the temporal trajectory of the process. If the process has stationary centered increments, the total amount of information in the trajectory depends only on its length $T$, and not on its starting time $t_0$. In that sense, the ersatz entropy is not stationary.

Using the rewriting (8), we argue that this dependence on $T$ originates from $x(t)$ — the first component of the vector — which has a time-dependent marginal distribution. Because the other components of the rewritten vector are increments, they have by hypothesis a stationary dependence structure. So increasing the embedding dimension $m$ does not impact the dependence of the ersatz entropy on the window size $T$, but only its dependence on the increment size $\tau$.

##### Auto-mutual information

We define the ersatz auto-mutual information as:

$$\tilde{I}^{(m,p,\tau)}(T) = \tilde{H}^{(m,\tau)}(T) + \tilde{H}^{(p,\tau)}(T) - \tilde{H}^{(m+p,\tau)}(T) \qquad (15)$$

##### Entropy rate

We define the ersatz entropy rate over a time interval of size $T$ as:

$$\tilde{h}^{(m,\tau)}(T) = \tilde{H}^{(m+1,\tau)}(T) - \tilde{H}^{(m,\tau)}(T) \qquad (16)$$

$$= \tilde{H}^{(1)}(T) - \tilde{I}^{(m,1,\tau)}(T) \qquad (17)$$

From (16), we may expect a cancellation of the main dependence on $T$, which is the same for $\tilde{H}^{(m+1,\tau)}(T)$ and $\tilde{H}^{(m,\tau)}(T)$. As a consequence, the ersatz entropy rate should be stationary, in the sense that it should not depend on the length $T$ of the time interval that is considered.

If the available samples span a very large time window, one may consider using multiple non-overlapping time windows of size $T$ starting at various times $t_0$. Because of the stationarity and zero mean of the increments, and hence the independence of the ersatz quantities on $t_0$, it is possible to average the different estimations of the ersatz quantities obtained in each window. It is also possible to use all the non-overlapping windows to populate the histogram and thus enhance the estimation of the time-averaged PDF. Each of these two operations increases the statistics and hence improves the estimation.

### 2.4 Self-similar processes

In this section, we focus on the special case of self-similar processes, i.e., signals which exhibit monofractal scale invariance Mandelbrot (1982). Such processes have been used as a satisfying first approximation to model or describe a wide variety of phenomena, such as ionic transport Mauritz (1989), fluid turbulence Chevillard et al. (2012), climate Kavvas et al. (2015), river flows Rigon et al. (1996), cloud structure Gotoh and Fujii (1998) or earthquakes Console et al. (2003), as well as neural signals Ivanov et al. (2009), stock markets Drozdz et al. (1999); Cont et al. (1997), texture patterns Uhl and Wimmer (2015) or internet traffic Chakraborty et al. (2004). A process $x$ is monofractal scale-invariant if there exists a real number $\mathcal{H}$ such that for all $\lambda > 0$, the probability density functions of $x(\lambda t)$ and $\lambda^{\mathcal{H}} x(t)$ are equivalent. $\mathcal{H}$ is called the Hurst exponent. If $-1 < \mathcal{H} < 0$, the process is stationary and called a fractional noise. If $0 < \mathcal{H} < 1$, the process is non-stationary with stationary increments. The case $\mathcal{H} = 1/2$ corresponds to the traditional Brownian motion.
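Numerically, the Hurst exponent of a self-similar process can be read off the growth of the standard deviation, since std$(x(t)) \propto t^{\mathcal{H}}$. A minimal sketch with ordinary Brownian motion ($\mathcal{H} = 1/2$), the one self-similar case obtainable as a plain cumulative sum of white noise:

```python
import numpy as np

rng = np.random.default_rng(2)
n_real, n_pts = 2000, 1024
# ensemble of Brownian motions: cumulative sums of white noises
x = np.cumsum(rng.standard_normal((n_real, n_pts)), axis=1)

times = np.array([16, 64, 256, 1024])
stds = x[:, times - 1].std(axis=0)     # std over realizations, at fixed t

# the log-log slope estimates the Hurst exponent; expect ~0.5
slope = np.polyfit(np.log(times), np.log(stds), 1)[0]
```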

Assuming $0 < \mathcal{H} < 1$, the scale invariance property can be expressed as Flandrin (1992):

$$p_{x(\lambda t)}(x) = \lambda^{-\mathcal{H}}\, p_{x(t)}\!\left(\lambda^{-\mathcal{H}} x\right) \qquad (18)$$

The scale invariance property of a process transfers to its increments, as well as to any of its time-embedded versions:

$$p_{\mathbf{x}^{(m,\lambda\tau)}(\lambda t)}(\mathbf{x}) = \lambda^{-m\mathcal{H}}\, p_{\mathbf{x}^{(m,\tau)}(t)}\!\left(\lambda^{-\mathcal{H}} \mathbf{x}\right) \qquad (19)$$

This relation makes it possible to express the non-stationary PDF of $\mathbf{x}^{(m,\tau)}(t)$ at any time $t$ as a function of the PDF at unit-time ($t=1$). This is done by using the factor $\lambda = t$ in eq.(19), i.e., by rescaling each coordinate of the embedded vector by the factor $t^{\mathcal{H}}$.

Using eq.(13), it is straightforward to see that a scale invariance property of the form (19) is also valid for the time-averaged PDF $\bar{p}_{t_0,T}$.

Because of its definition (2) as a cumulative sum of a noise, a motion can be seen as accumulating the correlations between successive points of the noise. When performing a time-embedding, the particular case $\tau = 1$ is interesting: considering the relation (8), we may expect that the information contained in the time-embedded motion is closely related to the information contained in the time-embedded noise. This is no longer the case when $\tau > 1$.

##### Fractional Brownian motion

The fractional Brownian motion (fBm) was proposed by Mandelbrot and Van Ness Mandelbrot and Van Ness (1968) and quickly became a benchmark for self-similarity and long-range dependence. The fBm is the only Gaussian self-similar process with stationary increments. It is characterized by its Hurst exponent $\mathcal{H} \in (0,1)$.

The fBm is a motion, obtained by integrating according to (2) a fractional Gaussian noise (fGn), defined as a centered Gaussian process with the correlation structure

$$c_w(\tau') = \langle w(t)\, w(t+\tau') \rangle = \frac{\sigma^2}{2}\left(|\tau'+1|^{2\mathcal{H}} - 2|\tau'|^{2\mathcal{H}} + |\tau'-1|^{2\mathcal{H}}\right) \qquad (20)$$

The fGn is a stationary noise with standard deviation $\sigma$. It is scale-invariant with Hurst exponent $\mathcal{H} - 1$.

The non-stationary covariance structure of the fBm reads

$$\langle x(t_1)\, x(t_2) \rangle = \frac{\sigma^2}{2}\left(|t_1|^{2\mathcal{H}} + |t_2|^{2\mathcal{H}} - |t_1 - t_2|^{2\mathcal{H}}\right) \qquad (21)$$

where $\sigma^2 = \langle x(1)^2 \rangle$ is the variance of the motion at unit-time.
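Since the fBm is Gaussian, the entropy of any time-embedded vector follows directly from its covariance matrix, $H = \frac{1}{2}\log\left((2\pi e)^m \det\Sigma\right)$. The sketch below builds $\Sigma$ from eq.(21) and checks the exact $m=1$ scaling $H(t) = H(1) + \mathcal{H}\log t$; the value $\mathcal{H} = 0.7$ is an arbitrary choice for illustration:

```python
import numpy as np

def fbm_cov(t1, t2, hurst, sigma=1.0):
    """Covariance of the fBm, eq. (21)."""
    return 0.5 * sigma**2 * (abs(t1)**(2*hurst) + abs(t2)**(2*hurst)
                             - abs(t1 - t2)**(2*hurst))

def gaussian_entropy(cov):
    """Shannon entropy of a multivariate Gaussian: 0.5*log((2*pi*e)^m det(cov))."""
    m = cov.shape[0]
    return 0.5 * np.log((2*np.pi*np.e)**m * np.linalg.det(cov))

def fbm_embedded_entropy(t, m, tau, hurst):
    """Entropy of the embedded vector (x(t), x(t-tau), ..., x(t-(m-1)tau))."""
    times = np.array([t - j*tau for j in range(m)], dtype=float)
    cov = np.array([[fbm_cov(a, b, hurst) for b in times] for a in times])
    return gaussian_entropy(cov)

h1 = fbm_embedded_entropy(1.0, 1, 1.0, hurst=0.7)
h4 = fbm_embedded_entropy(4.0, 1, 1.0, hurst=0.7)
# exact for m = 1: H(t) - H(1) = Hurst * log(t)
```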

#### 2.4.1 General framework

We show below how the theoretical information quantities depend on time $t$ and delay $\tau$. We start from the relation (8) between the entropy of the time-embedded vector and the entropy of the increments, and we normalize each component of the vector by its standard deviation. The standard deviation of the motion evolves with time as $\sigma t^{\mathcal{H}}$, while the standard deviation of the increments is independent of $t$, thanks to the stationarity of the increments, and evolves with the size $\tau$ of the increment as $\sigma \tau^{\mathcal{H}}$. So we have:

$$H^{(m,\tau)}(t) = H\left(x(t), \delta_\tau x(t), \ldots, \delta_\tau x(t-(m-2)\tau)\right) \qquad (22)$$

$$= H\left(\frac{x(t)}{\sigma t^{\mathcal{H}}}, \frac{\delta_\tau x(t)}{\sigma \tau^{\mathcal{H}}}, \ldots, \frac{\delta_\tau x(t-(m-2)\tau)}{\sigma \tau^{\mathcal{H}}}\right) + \mathcal{H}\log t + (m-1)\mathcal{H}\log\tau + m\log\sigma \qquad (23)$$

We then use the scaling law (19) with $\lambda = t$ to relate the joint probability at a given time $t$ to the joint probability at unit-time $t=1$, which leads to:

$$H\left(\mathbf{x}^{(m,\tau)}(t)\right) = H\left(\mathbf{x}^{(m,\tau/t)}(1)\right) + m\mathcal{H}\log t \qquad (24)$$

$$= H\left(x(1), \delta_{\tau/t}\, x(1), \ldots, \delta_{\tau/t}\, x\!\left(1-(m-2)\tfrac{\tau}{t}\right)\right) + m\mathcal{H}\log t \qquad (25)$$

Using (8) again at time $t=1$, we have $H\left(\mathbf{x}^{(m,\tau/t)}(1)\right) = H^{(m,\tau/t)}(1)$, so we can express the time-dependent Shannon entropy (7) for self-similar processes as:

$$H^{(m,\tau)}(t) = H^{(m,\tau/t)}(1) + m\mathcal{H}\log t \qquad (26)$$

The entropy rate can be rewritten with (11) and (26) as:

$$h^{(m,\tau)}(t) \simeq h^{(m,\tau/t)}(1) + \mathcal{H}\log t \qquad (27)$$

where $h^{(m,\tau/t)}(1)$ is the entropy rate at time $t=1$, using the rescaled time delay $\tau/t$.

Although the two quantities $H^{(m,\tau/t)}(1)$ and $h^{(m,\tau/t)}(1)$ are considered at a fixed time $t=1$, they still depend on $t$ via the delay $\tau/t$. Because $\tau/t$ is small as soon as $t \gg \tau$, we expect that the dependence of the entropy on time is mainly in $\mathcal{H}\log t$, and that the entropy rate is almost time-independent.

##### Fractional Brownian motion

The PDF of the fBm is Gaussian at any time $t$, so we can express its Shannon entropy and entropy rate at time $t$ by using eq.(26) and the expression of the Shannon entropy of a Gaussian multivariate process Zografos and Nadarajah (2005). We obtain the following approximated expressions:

$$H^{(m,\tau)}(t) \simeq H(1) + \mathcal{H}\log t + (m-1)\mathcal{H}\log\tau \qquad (28)$$

$$h^{(m,\tau)}(t) \simeq H(1) + \mathcal{H}\log\tau \qquad (29)$$

where $H(1) = \frac{1}{2}\log(2\pi e \sigma^2)$ is the entropy of the fBm at unit-time. These formulae are exact for $m=1$, but for $m \geq 2$, constant terms as well as corrections in $\tau/t$ have been omitted for clarity.

#### 2.4.2 Practical time-averaged framework

For a generic self-similar process, we are not able to derive analytical results in the practical time-averaged framework. Nevertheless, the behaviors expected for a generic non-stationary process with stationary increments hold: i) the ersatz entropy is not stationary, in the sense that it depends on the length $T$ of the time interval; ii) the ersatz entropy rate is stationary.

##### Fractional Brownian motion

The ersatz entropy of the fBm over a time window of size $T$ can be expressed by averaging its covariance structure (21) over the window. We obtain Granero-Belinchon et al. (2016):

$$\tilde{H}^{(1)}(T) = H(1) + \mathcal{H}\log T + C_{\mathcal{H}} \qquad (30)$$

where $C_{\mathcal{H}}$ is a constant depending only on $\mathcal{H}$. The entropy of the fBm thus increases linearly with the logarithm of the window size $T$: the larger the time window, the more information there is in the trajectory.

The auto-mutual information of the fBm can be derived in the same way using (15) for $m=p=1$:

$$\tilde{I}^{(1,1,\tau)}(T) = \mathcal{H}\log\frac{T}{\tau} + C_{\mathcal{H}} - \epsilon_\tau(T) \qquad (31)$$

with $C_{\mathcal{H}}$ the same constant as in eq.(30),

where $\epsilon_\tau(T)$ is a correction in $\tau/T$ which is positive and vanishes when the scale is small compared to the window size:

$$\epsilon_\tau(T) > 0, \qquad \lim_{\tau/T \to 0} \epsilon_\tau(T) = 0 \qquad (32\text{--}33)$$

The ersatz auto-mutual information depends logarithmically on the scale $\tau$ and the window size $T$: the larger the window size $T$ or the smaller the scale $\tau$, the stronger the dependencies.

The ersatz entropy rate of order $m=1$ is obtained by combining (30) and (31) according to (17):

$$\tilde{h}^{(1,\tau)}(T) = H(1) + \mathcal{H}\log\tau + \epsilon_\tau(T) \qquad (34)$$

which is independent of $T$ up to corrections in $\tau/T$, while being linear in $\log\tau$ with a constant slope $\mathcal{H}$. The correction $\epsilon_\tau(T)$ in eq.(34) is positive, see eq.(33).

Comparing (28) with (30) shows that for the fBm, the ersatz entropy dependence on $T$ is exactly the same as the entropy dependence on $t$. Comparing (29) with (34) shows that the entropy rate and the ersatz entropy rate do not depend on $t$ or $T$, up to corrective terms that are negligible if the scale $\tau$ is not too large. We also see explicitly that both quantities evolve with the scale as $\mathcal{H}\log\tau$, again up to corrections of order $\tau/t$ and $\tau/T$.

The example of the fBm suggests that for a scale-invariant process the evolution of any information theory quantity with the scale $\tau$ is the same within the practical time-averaged framework as within the general framework. We push this analysis further in the next sections, by exploring whether this property holds when the process is non-Gaussian.

## 3 Benchmarking the practical framework with the fBm

We focus in this section on the fractional Brownian motion, for which analytical expressions were derived in the previous sections. We use the fBm not only to benchmark our estimators of information theory quantities, but also to illustrate the use of the practical framework and the expected behavior of the ersatz quantities when used on a self-similar process with a given Hurst exponent $\mathcal{H}$.

### 3.1 Characterization of the estimates

#### 3.1.1 Data

To obtain a fBm, we integrate a fractional Gaussian noise (fGn). We use the circulant matrix method Helgason et al. (2011) to impose the correlation structure (20) of the fGn. Then, we center and normalize the noise such that its standard deviation $\sigma$ is equal to one. We then take the cumulative sum to obtain the fBm. Throughout this article, the Hurst exponent $\mathcal{H}$ is fixed for all the processes used to illustrate our results, but we have checked that the results hold for any other value of $\mathcal{H}$.
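The synthesis step can be sketched as follows; this is a simplified circulant-embedding (Davies–Harte-style) implementation in the spirit of the cited method, not the actual code of Helgason et al. (2011), and $\mathcal{H} = 0.7$ is an arbitrary illustrative value:

```python
import numpy as np

def synthesize_fgn(n, hurst, rng):
    """Generate n samples of a unit-variance fractional Gaussian noise
    with Hurst exponent `hurst`, by circulant embedding of the covariance (20)."""
    k = np.arange(n)
    # fGn autocovariance r(k) = 0.5*(|k+1|^2H - 2|k|^2H + |k-1|^2H)
    r = 0.5 * ((k + 1.0)**(2*hurst) - 2.0*k**(2*hurst) + np.abs(k - 1.0)**(2*hurst))
    # first row of the circulant extension, length 2(n-1)
    row = np.concatenate([r, r[-2:0:-1]])
    lam = np.fft.fft(row).real         # eigenvalues of the circulant matrix
    lam = np.maximum(lam, 0.0)         # clip tiny negative round-off values
    M = len(row)
    z = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    y = np.fft.fft(np.sqrt(lam) * z) / np.sqrt(M)
    return y.real[:n]                  # real part: one Gaussian sample path

rng = np.random.default_rng(3)
w = synthesize_fgn(4096, hurst=0.7, rng=rng)
x = np.cumsum(w)                       # integrate to obtain the fBm, eq. (2)
```

The real part of `y` is a Gaussian vector whose covariance matches the circulant matrix, hence the fGn covariance (20) over the first `n` lags.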

#### 3.1.2 Procedure

We estimate the Shannon entropy with our own implementation of the $k$-nearest-neighbors estimator of Kozachenko and Leonenko Kozachenko and Leonenko (1987). We estimate the auto-mutual information with the algorithm provided by Kraskov, Stogbauer and Grassberger Kraskov et al. (2004). This estimator is also based on a nearest-neighbors search and it provides — amongst several good properties — a built-in cancellation of the bias difference originating from each of the two arguments. In the following, we note $k$ the number of neighbors, which is the only parameter of the estimators. The entropy rate is then computed using eq.(17).
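For reference, the Kozachenko–Leonenko estimate can be sketched as follows for a one-dimensional signal (a brute-force $O(N^2)$ version for clarity; a production implementation would use a neighbor-search tree):

```python
import numpy as np

def digamma_int(n):
    """Digamma at a positive integer n: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + np.sum(1.0 / np.arange(1, n))

def entropy_knn_1d(x, k=5):
    """Kozachenko-Leonenko k-nearest-neighbors entropy estimate (1-D):
    H ~ psi(N) - psi(k) + log(2) + mean_i log(eps_i),
    where eps_i is the distance from x_i to its k-th nearest neighbor
    and log(2) is the log-volume of the 1-D unit ball."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    d = np.abs(x[:, None] - x[None, :])      # pairwise distances
    d.sort(axis=1)
    eps = d[:, k]                            # column 0 is the point itself
    return (digamma_int(N) - digamma_int(k) + np.log(2.0)
            + np.mean(np.log(eps)))

rng = np.random.default_rng(4)
h_est = entropy_knn_1d(rng.standard_normal(2000), k=5)
# expected close to 0.5*log(2*pi*e) ~ 1.42 for a standard Gaussian
```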

We generate for each motion a set of independent realizations of fixed size $N$ and fixed Hurst exponent $\mathcal{H}$. We compute averages of the estimates over the realizations, and use the standard deviation across realizations as error bars in the different graphs.

#### 3.1.3 Convergence / bias

We detail here how the ersatz entropy rate evolves with $k$ and $N$. We report in Fig. 1a our results for all possible values of the couples $(k, N)$, while $\tau$ is set to 1 here. According to eq.(34), the ersatz entropy rate of the fBm converges for large $N$ to the value $H(1)$ (horizontal black line in Fig. 1a), thanks to the vanishing of the correction term $\epsilon_\tau$, according to (33).

Fig. 1a can be interpreted as describing the behavior of the bias of the estimator. This bias vanishes non-monotonically as $k/N$ decreases: the bias is first positive, diminishes toward negative values, and then converges to zero. This behavior was previously reported for the $k$-nn mutual information estimator applied to stationary processes Kraskov et al. (2004); Gao et al. (2018); Granero-Belinchón et al. (2019), and we confirm it is valid for the fBm.

We observed the same convergence for a large range of scales $\tau$: the ersatz entropy rate then converges to $H(1) + \mathcal{H}\log\tau$ for large $N$ with the same behavior of the bias.

#### 3.1.4 Standard deviation of the estimates

We present in Fig. 2a the evolution with $N$ of the standard deviation of the ersatz entropy, auto-mutual information and entropy rate, for a fixed number of neighbors $k$. The standard deviation of both the entropy and the mutual information is large, and does not decrease when $N$ — and hence the number of samples — increases. On the contrary, the standard deviation of the entropy rate is much smaller and decreases when $N$ increases. We attribute this feature to the dependence of the quantities on the observation time $T$, see eqs.(30) and (31) for the fBm. While $\tilde{H}$ and $\tilde{I}$ increase as $\mathcal{H}\log T$, this is not the case for $\tilde{h}$, which is independent of $T$ (up to small corrections, negligible for small $\tau/T$). Although it is difficult to explain why the standard deviations of the entropy and the mutual information remain constant when $N$ increases, it seems that this results from a balance between the non-stationarity (in $T$) and the increased statistics. On the contrary, for the entropy rate, which is stationary, the decrease of the standard deviation is as expected.

As a conclusion, both the bias and the standard deviation of the ersatz entropy rate increase when $k$ increases or $N$ decreases, and can be made arbitrarily small by increasing the window size $T$. In the remainder of this article, we keep $N$ and $k$ fixed, also when studying the behavior of information theoretic quantities on the scale $\tau$.

### 3.2 Dependence on times $T$ and $\tau$

In this section, we present a detailed numerical study of the ersatz entropy, auto-mutual information and entropy rate of the fBm. In particular, we present a quantitative comparison with the analytical expressions (28, 29) in the general framework, as well as with the analytical expressions (30, 31, 34) in the practical framework for the fBm. These comparisons allow us: first, to validate the analytical expressions obtained for the fBm in the practical framework, and second, to show that the information theoretic quantities in the practical framework evolve with $T$ and $\tau$ exactly as their counterparts in the general framework evolve with $t$ and $\tau$. To compare analytical and numerical results, we vary the window size $T$, the scale $\tau$ and the embedding dimension $m$.

#### 3.2.1 Entropy and auto-mutual information

##### Dependence on $T$

The left column of Fig. 3 shows the ersatz Shannon entropy (Fig. 3a) and auto-mutual information (Fig. 3c) at a given scale $\tau$, as a function of $T$. The evolution of these two quantities for $m=1$ is very close to $\mathcal{H}\log T$, which is represented by a continuous black line. This is in agreement with eq.(30) and eq.(31). For $m \geq 2$, we obtain in the practical framework the behaviors predicted in the general framework, replacing $t$ by $T$ in the equations. We observe that the auto-mutual information does not depend on the embedding dimension $m$, while the entropy does, with an offset that seems to depend linearly on $m$. The dependence of the entropy and the auto-mutual information on the time window $T$ is the signature of the non-stationarity of the signal.

##### Dependence on $\tau$

The right column of Fig. 3 shows the ersatz Shannon entropy and auto-mutual information for a fixed window size $T$ when varying the scale parameter $\tau$. The ersatz Shannon entropy behaves as $(m-1)\mathcal{H}\log\tau$, see Fig. 3b, in agreement with eq.(26) or eq.(28). The ersatz auto-mutual information behaves as $-\mathcal{H}\log\tau$ for any embedding $m$, see Fig. 3d, in agreement with eq.(31), thus suggesting this formula is valid for any embedding dimension.

#### 3.2.2 Stationarity of the entropy rate

Fig. 4a shows that the ersatz entropy rate with embedding dimension $m=1$ is almost constant when $T$ is varied. For embedding dimensions $m \geq 2$, there is a small variation, of about 15%, much smaller than the 200% variation observed for either the entropy or the auto-mutual information (Fig. 3a,c) over the same range of $T$. This small dependence on $T$ could be due to the correction in eq.(34), which may depend on $m$. We argue that it is mostly due to the bias, which increases with the embedding dimension. Indeed, we observe that the entropy rate seems to converge for larger $T$ to the same value for all $m$. As a larger $T$ corresponds to a larger sampling of the statistics, the bias is reduced, as reported in Fig. 1. Moreover, for $m=1$, eq.(34) predicts a positive correction that vanishes when $T$ is large: on the contrary, we observe a convergence to a lower value, which hints that the bias is negative and larger than the theoretical correction. This suggests that the form of eq.(34) is still valid for embedding dimensions $m \geq 2$.

#### 3.2.3 Entropy rate dependence on scale

Fig. 4b shows that for a fixed window size $T$ the ersatz entropy rate is proportional to $\log\tau$. We have added a black line defined by the linear function $H(1) + \mathcal{H}\log\tau$, as suggested by eq.(34) without the corrective term. This black line perfectly describes the evolution of the entropy rate with the scale $\tau$, which is independent of the embedding dimension $m$.

To observe the finer evolution of the entropy rate with the scale $\tau$, we subtract the main contribution $\mathcal{H}\log\tau$ and we plot $\tilde{h}^{(m,\tau)}(T) - \mathcal{H}\log\tau$ for different embedding dimensions in Fig. 5. We observe a slight increase, which is larger for larger embedding dimensions. For $m=1$, the correction term can be evaluated from eq.(32) and is very small; it does not account for the evolution reported here, which is probably due to the bias, which increases when the number of points — proportional to $T/\tau$ — decreases and when the embedding dimension increases.

For a scale-invariant self-similar process, the standard deviation of the increments of size $\tau$ behaves as $\sigma(\tau) = \sigma\tau^{\mathcal{H}}$. Subtracting $\mathcal{H}\log\tau$ amounts to subtracting $\log(\sigma(\tau)/\sigma)$: for each scale $\tau$, this corresponds to normalizing the down-sampled data (taking one point every $\tau$ points) by the standard deviation of the increments of size $\tau$. When the Hurst exponent is a priori unknown, $\sigma(\tau)$ can be computed and used to obtain the main contribution $\log(\sigma(\tau)/\sigma)$; the fine evolution of the entropy rate with $\tau$ can thus be used as a tool to probe deviations from the self-similarity assumption, which is interesting for multifractal signals.
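The normalization described above can be sketched as follows: estimate the standard deviation $\sigma(\tau)$ of the increments from the data, fit the slope of $\log\sigma(\tau)$ versus $\log\tau$ to obtain the Hurst exponent, and use it to form the main contribution to subtract. The example uses ordinary Brownian motion ($\mathcal{H} = 1/2$) built from a white noise:

```python
import numpy as np

def increment_std(x, scales):
    """Standard deviation of the increments x(t) - x(t - s) for each scale s."""
    return np.array([np.std(x[s:] - x[:-s]) for s in scales])

rng = np.random.default_rng(5)
x = np.cumsum(rng.standard_normal(2**18))   # Brownian motion, Hurst = 1/2

scales = np.array([1, 2, 4, 8, 16, 32, 64, 128])
sigmas = increment_std(x, scales)

# the slope of log(sigma) vs log(scale) estimates the Hurst exponent
H_est = np.polyfit(np.log(scales), np.log(sigmas), 1)[0]
# main contribution to subtract from the entropy rate at each scale
main_contribution = H_est * np.log(scales)
```

For a multifractal signal, the residual after this subtraction would retain a scale dependence, which is precisely the deviation from self-similarity probed in section 5.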

## 4 Application of the practical framework to non-Gaussian self-similar processes

In this section, we turn to non-Gaussian processes and describe the results obtained when the time-averaged framework is generalized to this larger class of processes.

##### Procedure

We construct two different motions, in the very same way as we did for the fBm. We integrate two log-normal noises synthesized with the same log-normal marginal distribution and the same correlation function (20) as the fGn, but with different dependence structures. To generate these noises, we use the methodology proposed in Helgason et al. (2011): the log-normal marginal is obtained by applying two different transformations, built from the cumulative distribution function of the targeted log-normal distribution, to a Gaussian white noise: the Hermitian transformation of rank 1 and the even-Hermitian transformation of rank 2. This synthesis is performed with the toolbox provided at www.hermir.org. Once the two log-normal noises have been generated, they are integrated using eq.(2) to obtain two non-stationary scale-invariant processes with non-Gaussian statistics.

The dependence structures of the two log-normal noises were previously studied in detail Granero-Belinchón et al. (2019): while the correlation function is the same for the two noises — and identical to the targeted one of the fGn given by (20) — the complete dependence structure was shown to be different.

To study these two non-stationary and non-Gaussian motions, we use again the same number of realizations, the same number of points $N$ and the same number of neighbors $k$, and we focus on the same embedding dimension $m$ and Hurst exponent $\mathcal{H}$ as for the fBm.

##### Bias and standard deviation

We report in Fig. 1b and 1c the evolution of the ersatz entropy rate of the Hermitian and the even-Hermitian log-normal processes as a function of $k$ and $N$. We observe exactly the same behavior as for the fBm: the entropy rate converges to $H(1)$, the entropy of the log-normal process at unit-time^1.

^1 If $y$ is a log-normal process such that $\log y$ is Gaussian with mean $\mu_G$ and standard deviation $\sigma_G$, the entropy of $y$ can be expressed as Granero-Belinchón et al. (2019): $H(y) = \frac{1}{2}\log(2\pi e \sigma_G^2) + \mu_G$.

We report in Fig. 2b and Fig. 2c the behavior of the standard deviation of the estimators. Again, exactly as for the fBm, the standard deviation is large for the ersatz entropy and the ersatz auto-mutual information, while it is much smaller for the ersatz entropy rate.
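The article does not restate its estimator here, but the "neighbors" mentioned above refer to nearest-neighbor entropy estimation; as a generic, dependency-free sketch of that family, the block below implements a 1-d Kozachenko-Leonenko k-nearest-neighbor entropy estimate (with a hand-rolled digamma function) and checks it against the known Gaussian entropy ½ log2(2πe) ≈ 2.05 bits. All names are ours.

```python
import numpy as np

def digamma(x):
    """Digamma function via upward recurrence and an asymptotic expansion."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    return r + np.log(x) - 1.0 / (2.0 * x) - 1.0 / (12.0 * x**2)

def kl_entropy_bits(x, k=4):
    """Kozachenko-Leonenko k-nearest-neighbor estimate of the differential
    entropy (in bits) of a 1-d sample. In sorted order, the k nearest
    neighbors of a point are among its k predecessors and k successors."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    log_eps = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - k), min(n - 1, i + k)
        d = np.sort(np.abs(x[lo:hi + 1] - x[i]))
        log_eps[i] = np.log(d[k])          # distance to the k-th neighbor
    # c_1 = 2 is the volume of the 1-d unit ball.
    h_nats = digamma(n) - digamma(k) + np.log(2.0) + log_eps.mean()
    return h_nats / np.log(2.0)

# Sanity check on a Gaussian sample: true entropy is 0.5*log2(2*pi*e) bits.
rng = np.random.default_rng(2)
h_est = kl_entropy_bits(rng.standard_normal(4000), k=4)
```

The standard deviation of such estimators over realizations is exactly the quantity reported in Fig. 2.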

Again, both the bias and the standard deviation of the entropy rate increase when the embedding dimension increases or when the size of the statistical sample decreases, and they can be made arbitrarily small by increasing the number of points. These results do not depend on the marginal distribution: they have been obtained not only for the fBm, with Gaussian statistics, but also for the two motions built on log-normal noises.

##### Dependence on times T and τ

The evolution of the ersatz entropy rate with the time-window size T for the two motions is presented in Fig. 6a). As was the case for the fBm, the ersatz entropy rate depends only weakly on T, and seems to converge for larger T, up to a small corrective term, to the entropy of the log-normal process at unit time.

The evolution of the ersatz entropy rate with the time scale τ is presented in Fig. 6b). In the same way as for the fBm, we again observe a large increase, almost proportional to log2(τ). Because this strong tendency originates from the increase of the standard deviation of the increments of size τ when τ increases, we again normalize the entropy rate by subtracting H log2(τ). Results are presented in Fig. 7, together with results for the fBm for comparison.

The normalized ersatz entropy rate of the motion built from the even-Hermitian log-normal noise appears almost independent of τ. This behavior is identical to the one observed for the fBm, but the remaining constant value is different. The ersatz entropy rate of the fBm (in black) and of the even-Hermitian motion (in red) both behave exactly as H log2(τ), which is the expected behavior for a self-similar process, see eq. (27). On the contrary, the motion built with the Hermitian transformation of rank 1 exhibits an additional variation in τ: the normalized entropy rate evolves from the value expected for a motion built with a log-normal noise at small τ (the value obtained for the even-Hermitian process at any τ) up to the value expected for a Gaussian process at large τ (the value obtained for the fBm at any τ).

As a conclusion, one can estimate the Hurst exponent of a perfectly self-similar process as the slope of the linear fit of the ersatz entropy rate in log2(τ). This is a valid approach for the fBm and for the motion built from the noise constructed with the even-Hermitian transformation, because the ersatz entropy rate then behaves linearly in log2(τ). On the contrary, the motion built using a Hermitian transformation of rank 1 does not appear as perfectly self-similar. This can indeed be verified by plotting the normalized PDFs (setting the standard deviation to unity) of the increments of the motions for various values of τ. As can be seen in Fig. 8, the PDFs of the increments of the "standard log-normal process" vary with the scale τ, while those of the "even-Hermitian motion" remain identical. For τ = 1, the increments are nothing but the noises themselves, which are log-normal, as prescribed. For large τ, the increments of the "even-Hermitian motion" remain log-normal, while the increments of the "standard log-normal motion" deform and seem to become more Gaussian. The ersatz entropy rate captures this fine evolution perfectly.
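The comparison of normalized increment PDFs across scales can also be made quantitative. As a small sketch (functions ours, not from the paper), the block below standardizes the increments at two scales and computes a two-sample Kolmogorov-Smirnov statistic between them; for a self-similar signal such as Brownian motion the statistic stays small, while it would grow for the rank-1 Hermitian motion.

```python
import numpy as np

def std_increments(x, tau):
    """Increments of size tau, normalized to unit standard deviation."""
    d = x[tau:] - x[:-tau]
    return d / d.std()

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (maximum CDF distance)."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side='right') / len(a)
    cdf_b = np.searchsorted(b, grid, side='right') / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

# Brownian motion is self-similar: its standardized increments have the
# same (Gaussian) distribution at every scale.
rng = np.random.default_rng(3)
bm = np.cumsum(rng.standard_normal(2**16))
d_ks = ks_statistic(std_increments(bm, 1), std_increments(bm, 16))
```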

## 5 Application of the practical framework to a multifractal process

We now explore the proposed time-averaged framework on the multifractal random walk, to illustrate how it performs on a multifractal process. The multifractal random walk (MRW) Bacry et al. (2001); Bacry and Muzy (2002) is a popular multiplicative cascade process widely used to model systems that exhibit multifractal properties Delour et al. (2001). Like the fBm, the MRW is a motion obtained by integrating, again with eq. (2), a stationary noise such that

δx(t) = ε(t) e^{ω(t)},   (35)

where ε is a fGn with parameter H and ω is a Gaussian random process, independent of ε, with correlation function

c_ω(t1, t2) = λ² ln( L / (|t1 − t2| + 1) )  if |t1 − t2| < L, and 0 otherwise,   (36)

where L is the integral scale.
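A simplified synthesis of such a noise can be sketched as follows, under assumptions that differ from the paper's setup: ε is taken as white Gaussian noise instead of a fGn (i.e. the H = 1/2 case), ω is drawn by factorizing its log-correlated covariance directly, and the values of n, L and λ² are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)
n, L, lam2 = 1024, 128, 0.05        # length, integral scale, intermittency

# Log-correlated covariance of omega, truncated at the integral scale:
# cov(t1, t2) = lam2 * ln(L / (|t1 - t2| + 1)) when |t1 - t2| < L, else 0.
k = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
cov = lam2 * np.log(np.maximum(L / (k + 1.0), 1.0))

# Draw omega by factorizing the covariance; eigenvalues are clipped at
# zero to absorb tiny negative values caused by the discretization.
w, v = np.linalg.eigh(cov)
omega = v @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(n))

# Simplified stationary noise: white Gaussian eps modulated by the
# multiplicative cascade factor exp(omega), then integrated into a motion.
eps = rng.standard_normal(n)
noise = eps * np.exp(omega)
mrw = np.cumsum(noise)
```

For long signals, a circulant-embedding (FFT-based) synthesis of ω would replace the O(n³) eigendecomposition used here for clarity.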

The MRW is a scale-invariant process: the power spectrum of its time-derivative behaves as a power law, whose exponent defines the Hurst exponent that would be obtained for a fGn with the corresponding parameter. Any moment of order q of the increments of size τ behaves as a power law of τ with an exponent ζ(q). Contrary to the fBm, the MRW is not exactly self-similar and exhibits intermittency: ζ(q) is not a linear function of q, as would be expected for a self-similar process. As a consequence, the shape of the PDF of the increments depends on the scale.

We choose the parameter H such that the power spectrum of the noise is identical to the one of the fBm used in the former sections. We set the intermittency parameter λ² to a value widely used to model the intermittency of the Eulerian turbulent velocity field Chevillard et al. (2012).

Figure 9 compares the evolution of the PDF of the increments of the fBm and of the MRW. As expected, no change is observed for the fBm, while the PDF of the MRW has wider tails for smaller τ. The fBm is perfectly self-similar, while the MRW exhibits intermittency Granero-Belinchón et al. (2018): the PDF of its increments is deformed when the scale τ of the increments is varied, although no analytical expression of this PDF is available.
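The widening of the tails at small scales can be summarized by a single scalar, the flatness (kurtosis) of the increments, a standard intermittency diagnostic; the sketch below (names ours) computes it at two scales for a Brownian motion, where it stays close to the Gaussian value 3, whereas for the MRW it would exceed 3 at small τ and decrease as τ grows.

```python
import numpy as np

def flatness(x, tau):
    """Flatness (kurtosis) of the increments of size tau: equal to 3 for
    Gaussian increments, larger at small tau for intermittent signals."""
    d = x[tau:] - x[:-tau]
    d = d - d.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2

# For an ordinary Brownian motion the increments stay Gaussian at every
# scale, so the flatness stays close to 3 at all tau.
rng = np.random.default_rng(5)
bm = np.cumsum(rng.standard_normal(2**16))
f1, f32 = flatness(bm, 1), flatness(bm, 32)
```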

We apply our practical framework and plot in Fig. 10a) the evolution of the ersatz entropy rate of the MRW with T. Again, the entropy rate seems to be independent of T. We nevertheless observe a small tendency to increase towards the entropy of the MRW at unit time. Here, because there is no analytical expression of the PDF, this value cannot be derived analytically and we estimate it numerically.

The dependence on the time scale τ is plotted in Fig. 10b). We again observe a strong linear evolution of the ersatz entropy rate in log2(τ). After subtracting this strong tendency (Fig. 11), we still observe an evolution with τ, but this evolution appears much weaker than for the Hermitian log-normal motion (blue curve in Fig. 7). Indeed, the deformation of the PDFs of the increments when varying τ is much slower for the MRW (Fig. 9b) than for the Hermitian log-normal motion (Fig. 8a).

## 6 Discussion and Conclusions

We proposed a new framework in information theory to analyze a non-stationary process by considering it as resulting from a gedanken stationary process and estimating the PDF by cumulating all available samples in a time interval of size T. This framework hence considers a PDF obtained by time-averaging over a time window of size T, and then proceeds to compute the associated information theory quantities. In particular, the ersatz entropy that is then defined can be interpreted as the amount of information characterizing the complete trajectory of the process over the time window. If we assume that the increments of the process are stationary and centered, then this ersatz entropy and all other ersatz information-theoretic quantities depend only on the duration T and not on the starting time of the window.

We illustrated our approach by focusing first on a model system: the fractional Brownian motion. We derived in this context the analytical expressions of the ersatz entropy, ersatz auto-mutual information and ersatz entropy rate, which allowed a pedagogical description of our new information theory quantities. We also reported how the ersatz quantities behave when the time-interval size T and the embedding time scale τ are varied: we obtained analytical expressions for the smallest embedding dimensions and confirmed them numerically for larger ones. Besides the fBm, we reported numerical observations for various self-similar or multifractal processes. The ersatz entropy always diverges logarithmically in T, while the ersatz entropy rate remains almost independent of T. The examination of how the ersatz entropy rate depends on the scale τ provides a fine exploration of either the self-similarity or the multifractality of the process.

This exploration of the multifractality of a non-stationary process with stationary increments using the ersatz entropy rate gives a viewpoint very similar to the one obtained when analyzing the increments of the process with the regular Shannon entropy, as reported in Granero-Belinchón et al. (2018). We are currently investigating how to relate the two approaches quantitatively.

In the same vein, the ersatz entropy rate allowed us to discriminate two different non-stationary processes and to expose fine differences in their self-similarity properties (figure 7), in close relation to a method using the entropy rate of the increments of the signal, as exposed in Granero-Belinchón et al. (2019). A possible connection is also under investigation.

Throughout this article, we have estimated the ersatz quantities of a process on a single trajectory of this process; this situation corresponds to the worst-case scenario where only a single realization of the process is known. If enough experimental data are available, one can improve the estimation of the ersatz quantities in two ways. First, if the same experiment has been conducted multiple times, and thus multiple realizations are available over the time interval, one can use all these independent realizations to enhance the estimation of the time-averaged PDF. Second, if a single but long enough realization is available, one can split it into multiple time intervals of size T, and then use these intervals as independent realizations, as in the first case. This latter situation is made possible by the assumption that the increments of the signal are not only stationary, but also centered.
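The second strategy can be sketched in a few lines (a minimal illustration, with names of our choosing): a long realization is cut into non-overlapping windows of length T, and each window is re-centered at its own starting value, which is legitimate precisely because the increments are assumed stationary and centered.

```python
import numpy as np

def split_into_windows(x, T):
    """Split one long realization into consecutive non-overlapping windows
    of length T, each re-centered at its own starting value. Stationary,
    centered increments make these windows usable as independent
    realizations over [0, T]."""
    n_win = len(x) // T
    w = x[:n_win * T].reshape(n_win, T)
    return w - w[:, :1]

# Example: one long Brownian trajectory cut into 10 pseudo-realizations.
rng = np.random.default_rng(6)
long_run = np.cumsum(rng.standard_normal(10 * 256))
windows = split_into_windows(long_run, 256)
```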

## Appendix A Entropy of a time-embedded signal

The time-embedded vector (eq. 6) can be mapped into a vector involving the increments of the signal by a linear transformation:

(37)

where the transformation matrix is the band matrix defined as: