The estimation of physical model parameters from observed data is a frequent problem in many areas, such as in machine learning[RAISSI2019686, HARLIM2021109922_2021], geophysics [artigomestrado2017, FWI_EAGEITALY_PLOSONE2020], biology [InverseProblemsBiologia1, InverseProblemsBiologia2], physics [BA2018300, Razavy], among others [inverseproblems_quimica, plosonewilton_2019, HEIDEL2021109865]. Such a task is solved through the so-called inverse problem, which consists of identifying physical parameters that can not be directly measured from the observations [tarantola2005book]. From a practical viewpoint, in the inverse problem, physical model parameters are estimated by matching the calculated data to the observed data by optimizing an objective function [MENKE20121]. The objective function in the least-squares sense is widely used, which is based on the assumption that errors are independent and identically distributed (iid
) by a standard Gaussian probability distribution[tarantola2005book]. Although this approach is quite popular, the least-squares estimation is biased if the errors are non-Gaussian, violating the Gauss-Markov theorem [gauss_markovtheorem_KENDALL, THOMSON2014219]. Indeed, just a few outliers are enough for the least-squares criterion to be inappropriate [robustMethods1988].
In this way, a lot of non-Gaussian criteria have been proposed to mitigate the inverse problem sensitivity to aberrant measurements (outliers). The most common criterion to deal with non-Gaussian errors is based on the -norm of the difference between the calculated and the observed data, in which the errors are assumed to be iid
according to a double exponential distribution (Laplace distribution)[tarantolanormal1_1987]. Although this approach is known for being outlier-insensitive, this criterion is singular in cases where the error is null (or close to zero). Thus, it is necessary to assume that the absolute error is greater than zero according to the machine precision used, which generates an indeterminacy problem from a computational point of view. To avoid the singularity of this approach, hybrid criteria which combine the least-squares distance with the least-absolute-values criterion (-norm) have been proposed and successfully applied for parameter robust estimation [huberOriginal1973, HuberFWI_Guitton, BubeLangan_1997_hybridObjectiveFunction]. However, hybrid approaches require the determination of a threshold parameter, which demands boring trial-and-error investigations, increasing the computational cost [residualnormbrossier].
Indeed, objective functions based on heavy-tailed probability distributions, such as the Cauchy-Lorentz distribution[CauchyRegression_2017] and the Student’s t-distribution [Ubaidillah_2017_studentT], have demonstrated robust properties for unbiased data inversion. However, both approaches assume a fixed probability distribution of errors, not adapting to the particularities of the model or data at hand. In this sense, objective functions based on generalized distributions are interesting because they might be adapted to the specificities of the erratic data by selecting an adequate free-parameter. In fact, several generalized approaches have been proposed to deal with erratic data [GeneralizedGaussian_SEG2015, Sacchi_SEG_generalized_2020, PhysRevE.104.024107, PSI_Jackson_2021, FWI_qLaplace_2021]. Thus, generalized distributions based on the Rényi, Tsallis and Kaniadakis statistics have generated objective functions robust to erratic noise [sergio2021extensive].
In this work, we consider deformed Gaussian distributions associated to generalized statistical mechanics in the sense of Rényi (-statistics) [Renyi1965informationTheory], Tsallis (-statistics) [Tsallis1988] and Kaniadakis (-statistics) [kaniadakis2001] to mitigate the undesirable effects of outliers on estimates of physical parameters. In particular, we place the objective functions based on -, - and -generalizations in the broad context of the Gauss’ law of error [Tanaka2019, errorLawTsallisSuyari, WADA200689], see Refs. [sergio2021extensive, daSilva2020robust, PhysRevE.101.053311]. The three deformed Gaussian distributions mentioned above have already demonstrated robust properties in many applications [sergio2021extensive, daSilva2020robust, PhysRevE.101.053311, igoTsallisEntropy, FWIqGAuss_EAGE_2020, Lima2021, daSilva_2021_EPJPLUS_q_k_Laplace, daSilva_SEG_IMAGE_2021_qGaussian]. However, the entropic index associated with each of these approaches that make the inversion process more robust requires thorough investigation. In this regard, we analyse and compare the generalized objective functions from a statistical and numerical point of view in order to obtain the optimum value of the entropic index. The workflow of the experiments employed in this work is summarized in Fig. 1 in which it is represented a flowchart of an inverse problem. We call attention that in our framework, generalized statistics define the norm employed in the inversion problem solution.
We have organized this article as follows. In Section 2 we present a brief review on the solution of inverse problems using the maximum likelihood method in the conventional framework, as well as in the framework of Rényi, Tsallis and Kaniadakis. Moreover, in Section 2.4 we discuss the similarities among the generalized objective functions by considering a numerical test whose purpose is to estimate line model parameters; and finally, Section 3 is destined to apply the methodology presented in the article to address a classic geophysical problem that consists of estimating the acoustic impedance model using seismic data post-stack contaminated with spike noise.
2 A brief review of generalized statistical in inverse theory: Maximum likelihood methods
An inverse problem is formulated as an optimization task, in which an objective function describes how well the estimated model matches the measurements [tarantola2005book]. In this regard, the model parameters are obtained by solving the following linear system [tarantola2005book, MENKE20121]:
where are the calculated data, in which represents the forward operator. It is worth emphasizing that inverse problems are ill-posed [hadamard], which means that the solution of Eq. (1) is not unique. This is due, in general, to the fact that the observed data are band-limited and noisy. In this way, it is necessary to find methodologies capable to solve Eq. (1) for cases where the observed data, , is corrupted by measurement errors, limitations in data acquisition, among other factors.
In the conventional approach, model parameters are estimated by maximizing the objective function is derived from the maximization of the Boltzmann-Gibbs-Shannon entropy (BGS):
subject to the following constraints:
where is a probability function and represents the difference between observed and calculated data. As well known in the literature, the probability distribution determined from the optimization of the BGS functional entropy subject to the constraints in Eqs. (3) and (4) corresponds to the standard Gaussian distribution (see, for instance, Section 2 of Ref. [Lima2021]):
In other words, inverse problems, in the conventional framework, are solved from the premise that errors are independent and identically distributed (iid) according to a standard Gaussian likelihood function, which is formulated as the following optimization problem:
A likelihood function is a useful tool to estimate physical parameters from the empirical data. In practice, inverse problems based on the Gauss’ law of error are formulated in a least-squares sense. To see this, we notice that maximizing the likelihood function in (6) is equivalent to minimizing the negative of the log-likelihood:
However, it is worth emphasizing that in several problems the errors are non-Gaussian and, therefore, in such contexts the conventional approach becomes inefficient, especially in the presence of aberrant measures (outliers) [Claerbout1973].
As is already known, we obtain with Eq. (5) an adjustment to the data that is nothing more than a mean. For example, if we want to estimate , where is the
-th element of the parameter vector, from Eq. (5) we find the following stationary point
Therefore, if the observed data is contaminated with outliers, what we find using the conventional approach is the measurement of a mean, which in turn is strongly influenced by outliers and even more by a large amount of outliers.
For an objective function to admit a minimum, it must satisfy the condition
In Eq. (9) are arbitrary constants and is the so-called influence function [Lima2021]. The influence function Eq. (9) informs us about the sensitivity of the objective function to outliers. In this sense, if for a certain observed data , the objective function is sensitive to outliers and, therefore, it is not a robust estimator. A robust estimate, however, will indicate for . What we have in the conventional approach, however, is that the conventional objective function, Eq. (7), is linearly influenced by outliers, as in the following equation
The conventional theory of inverse problems, based on Gaussian statistics, failures with errors outside the Gaussian domain. In this sense, we look for alternatives to generalize Gaussian statistics in order to find more robust methods to deal with outliers.
2.1 Rényi’s Framework
Based on information theory, A. Rényi [Renyi1965informationTheory, renyi1961measures] introduced a general information entropy as a one-parameter generalization of the BGS entropy. Rényi entropy (-entropy) functional is expressed by:
where is the entropic index. Furthermore, the entropy in Eq. (11) shares many of the properties of the BGS entropy Eq. (2), such as: it is non-negative, it is additive, and it has an extreme for the case of equiprobability [Renyi1965informationTheory]. The fundamental difference between these two entropies resides in the non-conservation of the concavity, which is associated to the choice of index . Furthermore, Rényi’s entropy recovers Shannon’s entropy at the limit . Applications of Rényi entropy can be found in several fields [Wang2018, SnchezMoreno2010, Dong2016].
Taking into account the constraints in Eqs. (3) and (4), -entropy is maximized by the -generalized Gaussian distribution, -Gaussian, which is expressed in the form [Costa2003, JOHNSON2007, Tanaka2019]:
where if and . In addition, is the normalizing constant:
with representing the gamma function. At the limit , the ordinary Gaussian probability distribution Eq. (5) is recovered. Figure 2 shows some curves of the -Gaussian probability distribution. In particular, we note that at the limit the probability distribution approaches a strongly peaked function.
Following the same path discussed of the previous Section, that is, using the maximum likelihood method, we find a generalized objective function [sergio2021extensive]:
This function will be called -objective function and at limit the conventional objective function, Eq. (7), is recovered. To investigate the behaviour of the -objective function regarding outliers we compute the influence function, as defined in Eq. (9), named -influence function:
2.2 Tsallis’s Framework
Based on multifractals quantities and long-range interactions, C. Tsallis postulates an alternative form for the entropy to generalize the standard statistical mechanics [Tsallis1988]. Since then, a wide variety of applications have been performed based on Tsallis -statistics [PICOLIJR2009, DASILVA2021125539, Erick_2021_q_Pareto_EPBJ, DASILVA_CORSO_2021_Marsquakes]. The Tsallis approach is based on the -entropy, defined as follows:
where is the entropic index (also known as nonextensive parameter). The choice of the entropic index assigns new properties to the entropy functional, and in the limit case it recovers the conventional BGS entropy.
By considering the maximum entropy principle for -entropy, a -generalization of Gauss’ law of error was formulated in Ref. [Suyari2005] assuming that the errors follow an optimal probability distribution. In this regard, the optimal probability function is computed by maximizing the -entropy constrained to the normalization condition, Eq. (3), and the -variance [qlogPlastino, qExpectation] given by:
in which is the escort probability function [Schlogl1993thermodynamics, abe2000remark]. The probability distribution resulting from the aforementioned optimization problem is known by the -Gaussian distribution:
where the normalization constant is given by [prato1999nonextensive]:
A comparison between the conventional Eq. (2) and Tsallis approach Eq. (16) reveals that most probable events gain greater weight in the entropy calculation for the case in which . In this sense, the usual average is replaced by an average that depends on the choice of the index and so the higher the value of this index [hasegawa2009properties], the most likely events receive higher weights. Figure. 4 show illustrative curves of the -Gaussian probability distribution. It is important to note that at the limit we have a behaviour that reminds us of the Rényi distribution in : at this limit both distributions display a peaked behaviour.
Applying the probabilistic maximum-likelihood method in the -Gaussian distribution, we have the following objective function [daSilva2020robust]
In order to check the influence of outliers for our objective function, we calculate the influence function [Lima2021]:
Equation (21) reveals that the Tsallis framework also shows a robust objective function that is resistant to outliers. Figure 5 displays a couple of influence function curves. At the limit , the choice of index implies because the sum inside the brackets will always result in a negative quantity: . On the other hand, for the sum inside the brackets turns into a large positive number, leading to .
2.3 Kaniadakis’s Framework
G. Kaniadakis proposed a new way to calculate the entropy based on the principle of Kinetic Interaction [kaniadakis2001, PhysRevEkaniadakis2002]. This new -entropy that generalizes the BGS statistics is given by:
Kaniadakis statistics has been applied in different contexts [Kaniadakis2020_epidemologia, DASILVA2021110622, Kaniadakis2021_powerlaw]. The conventional entropy (BGS) is recovered in the limit of the entropic index . Kaniadakis’ framework not only includes conventional BGS statistics, but it is related to other statistics, such as the famous quantum statistics of Fermi-Dirac and Bose-Einstein, as well as the Tsallis [kaniadakis2001, Santos2011].
Based on the -Gaussian statistic, the reference [sergio2021kaniadakis] presents an error law that can be applied to a variety of problems and that, because it has a heavy tails distribution, it is able to satisfactorily work with outliers. The -Gaussian distribution is given by
where is the normalization constant given by
and depends on the index and is given by:
Some curves of the -Gaussian distribution are shown in Fig. 6. We notice that the distribution has heavy tails when choosing and at this limit, as well as in the -Gaussian and -Gaussian distributions, the distribution shows a peaked behaviour that resembles the Dirac delta distribution.
In this scenario, the inverse problem is therefore formulated as the problem of optimizing the -objective function that derives from the principle of maximum likelihood
and the analysis of the robustness of this objective function can be performed by the -influence function, which is given by:
The curves of the -objective and -influence functions are shown in Fig. 7. In Fig. 7(c)-(d) we notice that as we increase the value of index the influence of distant values of decreases. In particular, at the limit we observe the curve that indicates less influence for measurements. Looking at Eq. (27) we noticed that for we have .
2.4 Comparing the performance of objective functions
In this section, we present a simple numerical experiment in order to quantitatively analyse the robustness of objective functions based on generalized statistics. The experiment consists of estimating the coefficients of a linear polynomial, , from observed data contaminated by outliers. In this regard, we consider the independent variable within the range to generate numbers obeying a linear polynomial with coefficients and
. Then we contaminate the numbers generated with a Gaussian distribution with zero-mean and standard-deviation. In addition, the variable is contaminated with outliers in region , of , the outliers are given by where
is a Gaussian random variable.
Conventionally, this problem is treated by minimizing the square of the residuals based on Gaussian statistics [MENKE20121, weisberg2005applied]. However, as discussed in Eq. (8), the Gaussian estimate is not appropriate to solve problems with discrepant values. In this sense, we propose to estimate the coefficients using the objective functions of Rényi (Eq. (14)), Tsallis (Eq. (20)) and Kaniadakis (Eq. (26)). The values of the entropic index were used between , and . Inside each interval, uniform spaced values were taken.
The calculated models, , are compared with the ideal model to find for each objective function the best entropic index that achieve an optimal fitting. Figure 9 correlates the estimated values of the intercept, , and the slope, , with the employed entropic index. We can see that as we move away from the Gaussian limit, we find a better fit.
We compared the lines obtained with the estimated parameters, calculating the Mean Absolute Error, , between the calculated line, , and the ideal line, . The obtained results are: with index ; with and with . The represents the average deviation between predicted and reference values, and the best result is achieved for close to zero. In contrast, the calculated with the conventional objective function was .
With this test, we observe that the results improve as we move away from the Gaussian limit. In particular, we notice that in Fig. 9 the green curve corresponding to the Tsallis result shows the steepest decrease after the Gaussian limit (), while the red curve (Kaniadakis) presents a curve that slowly decreases after the Gaussian limit.
An analysis of the influence functions, Eqs. (15), (21) and (27), reveals that they are zero for , regardless of the choice of entropic indexes (obviously, disregarding the conventional limit). In addition, we notice that in the limits and the influence functions are merely a function that depends on the inverse of . Thus, the three objective functions, Eqs. (14), (20) and (26), are resistant to outliers and have an entropic index limit in which they are equivalent.
3 Numerical experiments
In this section, we present numerical experiments to demonstrate the outlier-resistance of the data-inversion method based on generalized statistics by considering an important problem that comes from geophysics, which is an important process to obtain estimates of subsurface properties. In particular, we address a problem of seismic inversion known as Post-Stack Inversion (PSI) [psi]. The goal of PSI is to estimate, from the observation of the seismic data, the acoustic impedance, which is a property of the rock defined as the product of the density of the rock and the speed of the acoustic wave in the subsurface [sen2006seismic].
The forward problem is formulated through the following relationship: . Here, represents the seismic data calculated by the parameters which is used to estimate the acoustic impedance . The operators and are described by [wu2020seismic]:
Where is the wavelet operator, which computes the convolution between the seismic signal and . Finally, represents the first order derivative operator. In addition, we can group these two matrices into a single operator . In this way, we compute the residuals between the observed data and the calculated data .
To analyse the outlier-resistance of the objective functions presented in Section 2, we consider a portion of the synthetic geological Marmousi2 [Versteeg_marmousi, MarmousiMartin] model as a benchmark (true model). In particular, we take into account the acoustic impedance model that consists of of depth and of distance as depicted in Fig. 10. The seismic source used was a Ricker wavelet [rickerSource, rickerSource2] with the peak frequency (the most energetic frequency).
We test the robustness of the generalized objective functions using seismic data contaminated with white Gaussian noise with low intensity (taking a signal-to-noise ratio equal toin all scenarios) and spikes (outliers) with different intensities, as shown in Fig. 11. The spikes were added by randomly choosing positions in the seismic data and adding peaks with intensities between and times the original amplitude, where is a Gaussian random variable. We considered noise scenarios, where the difference is in the percentages of samples contaminated by the outliers. In this regard, the number of samples were chosen from to of the data samples, with steps of .
For each spiky-noise scenario, we carried out data-inversions employing the -, - and -objective functions, Eqs. (14), (20) and (26), respectively with , and using 200 values for each of these parameters with uniform spacing between intervals. In total, for each objective function, we get results. To minimize the objective functions, we employ the conjugate gradient method [stiefel1952methodsCG, scales1997geoinversetheory], defining a maximum of 10 iterations and a tolerance error .
We consider the Pearson’s correlation coefficient [evans1996straightforward] as the statistical metric to compare the PSI results, which is defined as:
where and is the difference between the true and recovered acoustic impedance models, , and their respective averages, . The coefficient R assume values between and . The case of R close to zero implies in absence of correlation. The correlation being strong when it approaches one.
Figures 12 and 13 show the recovered acoustic impedance for conventional and generalized objective functions. We notice that the PSI results obtained with the conventional objective function are severely affected by the presence of outliers. On the other hand, at the limits , and the influence of spikes are minimized, and good estimates are obtained.
We summarize the PSI results for all numerical simulations performed in the present work in Fig. 14. In this figure, we remark that the objective functions present satisfactory results in , and limit cases, as predicted by the numerical experiment presented in Section 2.4. Indeed, in these limit cases, the generalized objective functions are robust tools capable of mitigate the influence of outliers, which leads to good results even for high spike contamination. In addition, it should be noted that the PSI results related with the reddish regions of the heatmap in Fig. 14 show a strong correlation regardless of the contamination rate employed.
In this work we explore robust methods based on the Rényi, Tsallis and Kaniadakis generalized statistics. Since the solution of the data-inversion strongly depends on the employed objective function, the generalized objective functions are indeed valuable for this purpose. In fact, given a proper choice of the entropic index, it is possible to get a robust objective function that handles errors that do not obey the Gaussian statistics.
In particular, we investigate a special example of non-Gaussian errors: outliers. In this scenario, we seek to answer some basic questions: (i) what is the most appropriate choice for entropic indexes? (ii) Which of the proposed methods is more resistant to outliers? For the first question, we find that there is a limit for every method in which the objective functions are able to ignore aberrant values without compromising the results. In addition, we note that the properties of the -, - and -generalized distributions and the respective objective functions have similar characteristics at the limit: .
To conclude, it is worth emphasizing that although these methodologies have been successfully employed in geophysical applications, our proposals are easily adaptable to a wide variety of parameter estimation problems. In this regard, we hope that the methodologies proposed in this work are of great value for the modelling of complex systems with numerous unknown variables, as the generalized objective functions are able to reduce the computational cost by accelerating the convergence of the process of optimization, as shown by analysing the influence function.
J.V.T. de Lima, J.M. de Araújo, G. Corso and G.Z. dos Santos Lima gratefully acknowledge support from Petrobras through the project "Statistical physics inversion for multi-parameters in reservoir characterisation" at Federal University of Rio Grande do Norte. J.M. de Araújo thanks Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for his productivity fellowship (grant no. 313431/2018-3). G. Corso acknowledges CNPq for support through productivity fellowship (grant no. 307907/2019-8).
Conflicts of interest
The authors declare that they have no conflict of interest.
Author contribution statement
All authors contributed equally to this work.