First derivatives at the optimum analysis (fdao): An approach to estimate the uncertainty in nonlinear regression involving stochastically independent variables

by   Carlos Sevcik, et al.

An important problem of optimization analysis surges when parameters such as {θ_i}_i=1, ... ,k , determining a function y=f(x | {θ_i}) , must be estimated from a set of observables { x_j,y_j}_j=1, ... ,m . Where {x_i} are independent variables assumed to be uncertainty-free. It is known that analytical solutions are possible if y=f(x | θ_i) is a linear combination of {θ_i=1, ... ,k}. Here it is proposed that determining the uncertainty of parameters that are not linearly independent may be achieved from derivatives ∂ f(x | {θ_i})∂θ_i at an optimum, if the parameters are stochastically independent.



There are no comments yet.


page 15

page 28


Efficient Computation of High-Order Electromagnetic Field Derivatives for Multiple Design Parameters in FDTD

This paper introduces a new computational framework to derive electromag...

Consistent regression of biophysical parameters with kernel methods

This paper introduces a novel statistical regression framework that allo...

A nonlinear system related to investment under uncertainty solved using the fractional pseudo-Newton method

A nonlinear algebraic equation system of two variables is numerically so...

Matroid Partition Property and the Secretary Problem

A matroid ℳ on a set E of elements has the α-partition property, for som...

Fast Derivatives for Multilinear Polynomials

The article considers linear functions of many (n) variables - multiline...

Learning Combinations of Sigmoids Through Gradient Estimation

We develop a new approach to learn the parameters of regression models w...

Enhancing approximation abilities of neural networks by training derivatives

Method for increasing precision of feedforward networks is presented. Wi...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

An important problem of optimization analysis surges when it is desired to guess the parameters , which minimize or maximize a function such as


vertical bars mean given that and in this paper curly braces indicate sets. Eq. (1) is called objective function (obf) in optimization analysis (Rivett, 1973, pg. 5) where it represents a process (not necessarily including random components) whose parameters have to be determined for the process to operateat an optimum; econometric systems, water reservoirs and electric networks are some examples. Eq. (1) is also called regression function (rgf

) in regression analysis

(Reg, 2018) where a set of parameter values must be determined from a set of observables . In regression analysis are independent variates (called explanatory variables or predictors) which are usually assumed to be uncertainty-free and and their associated observed dependent variates (response variables) are . Analytical solutions are possible, and many are well known (see for example (Montgomery et al., 2015)), when involves a linear combination of . In the case of not linearly independent pairs (Seber and Wild, 1989; Schittkowski, 2002)

in spite of some empirical approaches subdividing data into subsets with quasi linear regression intervals

(Spearman, 1908; Kärber, 1931; Thompson, 1947; Miller and Urich, 2001; Miller and Ullich, 2004; Oosterbaan, 2017)

to deremine subset confidence intervals to estimate uncertainties, it remains true that:

“In general, there is no closed-form expression for the best-fitting parameters, as there is in linear regression. Usually numerical optimization algorithms are applied to determine the best-fitting parameters. Again in contrast to linear regression, there may be many local minima of the function to be optimized and even the global minimum may produce a biased estimate. In practice, estimated values of the parameters are used, in conjunction with the optimization algorithm, to attempt to find the global minimum of a sum of squares ” Non (2018).

The problem of local minima is inherent to the function fitted and cannot be avoided. Yet, efficient minimization algorithms, which start searching from a set of user provided parameters in case of many “well behaved functions” converge towards the global optimum if is within a certain boundary of the global optimum. There is, however, no analytical solution to the problem of knowing the width the boundary of guaranteed convergence to the global optimum (minimum or maximum) for a given rgf.

The gradient () represents the slope of function’s graph, that is points in the direction of the greatest rate of increase or decrease of a function (Apostol, 1969, Ch. 8). If an optimization process fitting the rgf reaches an optimum, , the following implication


is true. It is usually assumed that the rgf exactly describes the system whose optimum set of values is being determined, and that differences between model and data (called residuals) result from lack of accuracy in measuring a given which depends on .

Yet, data from the physical world always contains uncertainty which does not result from measurement error. The sources are many:

  • Observation instrument limitations make the measurements fuzzy. this is the case of the point dispersion function of optical instruments, which is dependent on the wavelength of light used to observe (Scalettar et al., 1996).

  • Impossibility of accurately measuring something, a classical example deals with the velocity and position of a particle (Heisenberg, 1927).

  • Observing reality with a scope (aim or purpose) modifies the object observed (Sassoli de Bianchi, 2013a, b), this is specially relevant to quantum physics, but applies to any measurement (draining current, compressing with a caliper, heating, etc.) to, hopefully, a minor extent.

  • Uncertainty is essential to life, otherwise any noxious factor would affect equally a whole species population making its extinction likelier, thus any parameter measured on living beings is significantly variable, uncertain, fuzzy (Conrad, 1977).

  • Ffuzziness appears also when the object measured changes more or less cyclically in time, the height of the Mont Blanc peak (like most other mountains) is a well known case Gilluly (1949); Evans (2015).

  • Since temperature () is , molecules vibrate and rapidly change between conformations and molecular properties are fuzzy too (Sherwood, 1972; Mörters, 2009; Mörters and Peres, 2010).

  • In high energy physics the existence of a particle was evidenced by an energy peak which had to be differentiated from background noise (The CMS Collaboration, 2015).

  • All processes of chemical or electrical intercellular communication are stochastic in nature (Del Castillo and Katz, 1954; Armstrong and Bezanilla, 1974; Fishman et al., 1975; Neher and Sakmann, 1976).

Figure 1: Plot of the Minkovski sumset in Eq. (4). Dotted arrow represent

bijection as a vector. A dashed arrow drawn to stress graphically that

and are bijections one of them representing objective function parameters at the optimum, and, the other one, their inherent uncertainties. All sets in the figure have same cardinality , , which equals the number of parameters required by the objective function. ‟˝, indicates the origin of coordinates. Other details in the text of the communication.

In all these instances observations randomly vary, not due to experimental “error” but due to the stochastic nature of the process under study by it self. This is most likely the case for processes such as the ones described with the Hill [Eq. (17)] or Boltzmann [Eq. (45)] equations, when these equations are used mechanistically, not just as curve fitting processes. The inter-molecular reaction parameters of the Hill equation are scholastically independent, the maximum effect of speed of catalysis () does not depend on the affinity constant of the reactants () and none of them depends on the molecularity of the reaction (number of molecules of one kind reacting with a molecule of another kind, ) Ariëns et al. (1964); Segel (1975); Érdi and Tóth (1989). A similar reasoning can be used in connection with the Boltzmann equation [Eq. (45)] where and are mechanistically independent. In both cases rgf parameters are causally independent. Briefly unther these conditions we have

Condition 1.

where indicates stochastic independence, the concept of causal independence between variables or factors (Hogg and Craig, 1978; Spohn, 1980; Ghorbal and Schürman, 2002), a notion which has also recently been used in fields such as quantum thermodynamics (abd M. P. Muller and Pastena, 2015; Lostaglio et al., 2015; Goold et al., 2016), is opposite to meaning stochastic dependence. We may write a function representing these conditions based on Eq. (2)


, is a Minkowski sum Weisstein (2018) of two independents sets such as


as iluistrated in figure 1, Eq. (4) is a bijection for which the following holds:


In Eq. (3),

is a random variable with a location parameter

with variability depending on a parameter . This suits “pathological distribution”, such as the Cauchy probability density function (pdf

), whose statistical central moments are undefined (have no meaning) and thus neither its population mean nor its variance are defined

(Cramér, 1991; Pitman, 1993; Wolfram, 2003), for which , the median and its width factor

. When dealing with other probability functions having defined central momentsdd, Eq. (

3) may be rewritten making where is the mean and is the variance. Since a gradient is a linear combination of all the function’s first derivatives (Apostol, 1969), at an optimum :


In tems of finite diferences


and thus


which shows that it is possible to predict small changes in parameter from small changes of ref when the regression function is at an optimum, but fluctuates stochastically, if the partial derivative of the regression function about its parameters known.

If, fluctuates stochastilly about , the fluctuations are the residuals, :


This paper proposes that the higher a partial derivative respect to in Eq. (6) is, the more it contributes to rgf changes. Combining Eqs. (8) and (9) as


a set of parameter fluctuation estimates.

In this work the set is used to determine how much of the empirical rgf uncertainty at an optimum is contributed by each parameter, and methods are presented to enable statistical comparisons between those parameters determined under different experimental conditions.

2 Methods

2.1 Monte Carlo random variable simulation.

To test the goodness of fitting curves to data, random data with known statistical properties were generated using Monte Carlo simulation (Dahlquist and Björk, 1974). For this purpose sets of pairs were generated as


where, as said, is the variance and is the Cauchy pdf scale factor. Thus for population having defined nean and variance:


When needed, Gaussian pseudo-random vaiables were generated using the Box and Muller (1958) algorithm as modified by Press et al. (1992). Fundamental to all Monte Carlo simulations (Dahlquist and Björk, 1974) is a good uniform (pseudo) random (PRNG) number generator. Data for all numerical simulations carried out in this work were produced using random numbers (

) with continuous rectangular (uniform) distribution in the closed interval [0,1] or

. All were generated using the 2002/2/10 initialization-improved 623-dimensionally equidistributed uniform pseudo random number generator MT19937 algorithm (Matsumoto and Nishimura, 1998; Panneton et al., 2006). The generator has passed the stringent DIEHARD statistical tests (Marsaglia et al., 2003; Bellamy, 2013). It uses 624 words of state per generator and is comparable in speed to other generators. It has a Mersenne prime period of ().

The MT19937 requires an initial starting value called seed. The seed used was a 64-bit unsigned integer obtained using the /dev/random Linux PRNG , which saves environmental noise from device drivers and other sources into an entropy pool. Device /dev/random gets temporarily blocked, and stops producing random bytes, when the entropy of the device gets low, and commences producing output again when it recovers to safe levels. No such delays were perceived during this work. Using /dev/random seed makes exceedingly unlikely () that the same sequence, , of is used twice. Calculations were programmed in C++ using g++ version 5.4.0 20160609 with C++14 standards, umder Linux Mint version 18.2 running on an Apple MacBook Air computer with 8 GB RAM, Intel ® CoreTM i7-4650U CPU @ 1.70 GHz × 4 with a 500 GB disk.

2.2 Statistical procedures.

2.2.1 Fitting functions to data.

Functions were adjusted to data using a simplex minimization (Nelder and Mead, 1965). The simplex procedure was designed to minimize diferences between empirical data assumed to obey a function such as , where is a set of observables, and a model function . In this work the simplex was designed to minimize


instead of the least squares procedure (Montgomery et al., 2015)

. Least squares give too much weight to outliers which may be due to random data variability (Such as in the Cauchy distribution case, see

3.3.2), but could stem from gross deviation from a prescribed experimental procedure or to error in calculating or recording numerical values.

2.2.2 Details of simplex optimizations.

Simplex parameter initialization.

Simplex algorithm requires initial parameter values, , and an initial increment value, . is the initial fraction to change the parameters which is subsequently modified by the algorithm as optimization continues (Nelder and Mead, 1965).

Criteria to stop optimizations.

Optimization continued until one of the following conditions was fulfilled:

Condition 2.

Keep looping while


with calculated as indicated in Eq. (13). In Eq. (14), indicates loop number.


Condition 3.

Keep looping while in Eq. (14) was used to stop optimization, to prevent the algorithm from running forever when Condition 2 was not possible or would take too long to reach.

Simulation of residuals.

The simplex was implemented to provide a set , used to calculate the uncertainties of estimated as described in Section 3.

2.2.3 On statistical procedures utilized.

Gaussianity of data was tested witth the Jarque-Bera test, which allso provides data on skewness and kurtosis of data

(Bera and Jarque, 1981) and with the Shapiro-Wilks test (Shapiro and Wilk, 1965). Unless otherwise is indicated, data are presented as medians and their 95% confidence intervals (95% CI) calculated using nonparametric Moses (Hollander and Wolfe, 1973) statistics. Other data are presented as medians and their 95% confidence interval calculated with the procedure of Hodges and Lehmann (Hollander and Wolfe, 1973). Statistical significance of differences was decided with Mann–Whitney (Wilcoxon) test. Multiple comparisons were done with the nonparametric Kruskall-Wallis analysis of variance. See Hollander and Wolfe (1973) for all not specified details of nonparametric methods used. Statistical differences between samples were considered significant when the probability that they stem from chance was () (Ioannidis, 2005; Benjamin et al., 2018).

3 Results and discussion.

Figure 2: Cell death induced by fractions isolated from Polychirun constellatum (Savigny, 1846) in 4T1 breast cancer cell cultures (Quintana-Hernández, 2017). Percent of death calculated with Eq. (53). Ordinate is the percentage of dead cells, abscissa is concentration ([D] in mg/mL) of fraction tested, plotted in decimal logarithmic scale. Data presented as medians () and their 95% confidence interval (bracket lines) calculated as indicated by Hodges and Lehmann (Hodges, Jr. and Lehmann, 1963). Straight lines were used to connect medians to help interpretation. The number of data processed for each fraction concentration was (nb = 10, nd = 48, nf = 5). Other details in the text of the communication and Appendix D.
Figure 3: Cell death induced by fractions isolated from P. constellatum (Savigny, 1816) in 4T1 breast cancer cell cultures (Quintana-Hernández, 2017). Ordinate was clipped at -20% to increase visibility of the biologically meaningfull range of effects. All other details are equal to Figure 2 except for the sigmoid curves drawn which were calculated fitting Eq. (18) to the data with a simplex optimization procedure minimizing deviations between curves and data as indicated by Eq. (13). Values used to draw the curves are in Table 3.1. The simplex algorithm was initialized with the same set of values for the five fractions:

3.1 A challenging data set obtained with a procedure commonly used in cell biology.

Fraction y0 ym Km n
(%) (%) (mg/mL)

FII () ()
FV () () ()

Parameters of the modified Hill Equation (18): , offset parameter; , maximum effect; , concentration producing half maximum effect and , is called Hill coefficient or molecularity in some pharmacology and enzymology work (Segel, 1975). The simplex algorithm was initialized with the same set of values for the five fractions: ; was set as 0.1. All data presented as medians and their 95% CI between parenteses. Confidence intervals calculated with the Hodges and Lehman (Hodges, Jr. and Lehmann, 1963) procedure based on data determined as indicated in relation with Eq. (10). Sizes of were: FI, : FII : FIII, ; FIV, and FV, . Differences in were due to sample sizes and data peculiarities due to which Eq. (52) produced undefined values called NaN (Not a Number, such as or ) in C++ or in values such as or called inf in C++. NaN and inf results were eliminated from the calculations. Other details in the text of the communication.

Table 1: Parameters characterizing the regression curves in Figure 3.

The data shown in Figure 2 (taken from (Quintana-Hernández, 2017)) are effects of several fractions (FI – FV) isolated from P. constellatum (Savigny, 1846) which were able to kill 4T1 breast cancer cells. Apparent effects calculated with Eq. (53) are presented in the ordinate, versus fraction concentration indicated in the abscissa (in mg/mL).

There are several oddities in the data, the effects at some concentrations are very disperse (s indicated by the brackets representing 95% CI), and, notably at low concentrations, they indicate negative percentages of death. At the lowest concentrations even median values are slightly, but significantly, bellow zero; this could be expected if the background correction [Eqs. (50) and (51)] actually overcorrects absorbance data. The large variability is most likely a result of subtractive cancellation, combined with the quotient represented by Eq. (52) which are prone to produce variance in [Eq. (52)] and [Eq. (53)].

Gaussianity tests suggests that all data set in the figures were leptokurtic and skewed. The Jarque-Bera test (Bera and Jarque, 1981) (based on data skewedness and kurtosis) indicated that the probability of data sets in the figures are Gaussian is . The Shaarepiro-Wilk test (Shapiro and Wilk, 1965) indicated the same low probability of Gaussianity ().

3.1.1 An example using a modified Hill equation.

Experimental method used in (Quintana-Hernández, 2017) to estimate cell mortality shown in Figures 2 and 3 (Mosmann, 1983; Denizot and Lang, 1986; Swinehart, 1962) have been cited at least 47278 times in the literature (July 22, 2017, source:

). This indicates that, in spite of its odd management of uncertainty, the method is believed useful by a substantial number of researchers. Notwithstanding the oddities of data in Figure

2, there are several features evidenced by the medians: all five fractions increased cell mortality as concentration raised, and in all cases there is a sigmoid aspect of the dose–effect semilogarithmic plots.

The first example using the theory expressed by Eqs. (6) – (10) is fitting data to a modified Hill (1910, 1913) equation. When the Hill equation is plotted as effect versus concentration’s logarithm, a sigmoid curve is produced. In its classical form, the Hill equation is used in enzyme kinetics and in pharmacology to represent the interaction of one or more molecules of subrate with the catalytic site of an enzyme, or of a drug’s molecule with its receptor site; it derives from the mass action law (Rang, 2009):


When the effect of a drug or the rate of enzyme cathalysis (), depend linearly on molecules of D binding receptor R, the followng holds (Rang, 2009):


Under these conditions is called the molecularity of the reaction. Also, is used in situations where properties of the enzyme or drug receptor are modified during the interaction, the, so called, cooperative schemes, where is plainly named Hill coefficient (Monod et al., 1965; Segel, 1975).

The Hill equation (Hill, 1913) in its original form is


which does not include a term for “offset,” occurring when .

Since data in the figures seems to include an overcorrection for the basal absorbance, this modified Hill equation will be used. as a particular case, in our analysis:


its first derivatives on are given in Section B.1 as Eqs. (31) – (34).

Figure (3) shows the results of adjusting Eq. (18) to the data of Quintana-Hernández (2017). In all cases the simplex optimization started from the same set of values: and . Since Eqs. (50) – (52) produce 24000 points per concentration, the number of pairs in each fraction’s regression analysis ranged 120000 – 144000 in the plots shown in Fig. 3. Interestingly, the curves in Fig. 3 follow, rather closely, the median percent of dead cells at each concentration in all the plots. This is particularly clear for FI and FV. The parameter values describing the curves are in Table 3.1. The curves in Figure 3, and the sets of data in Table 3.1 “look good” but uncertainty estimator for the parameters are necessary to properly state which fraction effect differs from which fractions, specially if the outliers sugested by the 95% CI and “skewness” analysis are considered.

Medians and their 95% CIs of Quintana-Hernández’s compounds’ (Quintana-Hernández, 2017) sets [Eq. (10)] are presented in Table 3.1. All sets used to guess residuals in Table 3.1 were tested for Gaussianty with the Shapiro-Wilk and Jarque-Bera methods, and both procedures predicted a probability that any of the sets is Gaussian.

Fraction Loops Parameter Sk Ku Range
ym ,

ym ()
ym ()
ym ()

Parameters of the modified Hill Equation (18): , offset parameter; , maximum effect; , concentration producing half maximum effect, , is called Hill constant; Ku, kurtosis and Sk, skewness. Range, skewness and kurtosis have the usual statistical meanings. Loop, indicates the number of times a parameter was changed during the simplex optimization (Nelder and Mead, 1965). For fractions I, II, III and V optimización stopped when when Condition 2 was fulfilled. In case of FIV the optimization was topped fulfilling Condition 3 after 3 h attempting unsuccessfully to fulfill Condition 2. See the text for further discussion.

Table 2: Parameters characterizing the sets used to calculate parameter uncertainty in Table 3.1. was set as 0.1.

3.1.2 An insight on the complexity of the data used for this example

Table 3.1.1 presents data on the number of iterations required by the simplex algorithm to converge to the optimum reaching Condition 2. The only exception is data for FIV, which after 1024007 (number labeled with an asterisk in the table) loops, was still unable to reach Condition 2 and after h of iterations (in the author’s computer) the process was stopped after reaching Condition 3. The table also presents some statistical properties of the sets used to calculate the uncertainty of the parameters characterizing curves fitted to data in Figure 3.

The data in the table indicates that in all cases presented sets are highly leptokurtic and very skewed, and in some cases (as indicated by the ranges presented at the leftmost column of the table), very wide ranges indicated that extreme values were observed. These extreme values are in all likelihood due to subtractive cancellation in Eqs. (50) and (51) combined with division by very small numbers in Eq. (52) and by the nature of the distribution of ratios per se (see Section 3.2.1). Interestingly, most data points seem closely packed around the median value, since the 95% CI of the medians are narrow.

r Simulated Predicted Range Loops Sk Kr
3 y0
ym ()
10 y0
ym ()
100 y0
ym ()
2000 y0 ()
ym ()
n 2

The concentrations required by Eq. (18) were defined as: , , , , , and ; is the number of values used to calculate medians, 95% confidence in terval and ranges. Parameters and heading have same meaning as used in Table 3.1.1; indicates the number of random Cauchy values of type [Eq. (23)] which were simulated for each concentration. Please notice that the units of are irrelevan as long as they are equal to the units of [D]. See the text for further discussion.

Table 3: Fitting Cauchy data generated with Eq. (23) setting and simplex optimization to the modified Hill equation. In all cases the optimization started with , , and ; was set as 0.1.

Table 3.2.2 presents Monte Carlo simulation of Gaussian random data distributed about Eq. (18). Data was calculated setting the concentration term at the following values (in arbitrary units): , , , , , and . The number of replicates were simulated for each concentration were: 3, 10, 100 and 2000.

3.2 Monte Carlo simulation of data described by Hill’s equation modified as in Eq. (18).

3.2.1 Fitting Cauchy-distributed data to the modified Hill equation.

The analysis in Section 3.1.2 suggests that using the first derivatives of the regression function may produce confidence limits for stochastically independent parameters obtained from non-linear regressions. To simulate this kind of data with Monte Carlo methods we must consider the statistical properties of a quotient of two Gaussian random variables having and variance , ,distributed following the Cauchy distribution (also called Lorentz, Cauchy-Lorentz or Breit–Wigner distribution) (Cramér, 1991; Pitman, 1993; Wolfram, 2003) which has a pdf



. The probability distribution function (

PDF) is


which is symmetric about , the median and mode of the distribution. The maximum value or amplitude of the Cauchy pdf is , located at , is called the scale factor. Using Eq. (20), it is easy to calculate the probability , thus is the % CI of , the 69% CI (like the CI in Gaussian statistics) is . The broadness the Cauchy distribution “shoulders” becomes quite evident when a 95% CI is calculated, for Caucy variables, wider than for Gaussian variables.

All statistical central moments, mean, variance, kurtosis and skewedness, of the Cauchy distribution are undefined: they lack any meaning. The Cauchy distribution is considered an example of a “pathological” distribution function. Thus even when populations and are Gaussian,, population [see Eqs. (5052)], their quotient, will be “pathologically” distributed and its mean and variance will be undefined. Sample values will be symmetrically distributed about , but empirical sample means () will be increasingly variable as the number observations increases, due to increased likelihood of encountering sample points with a large absolute value (“outliers”). A similar situation applies to empirical sample variance, . Neither nor provide any information on the pdf. The distribution of will be equal to outlying observations distribution; i.e., is just an estimator of any single outlying observation from the sample. Similarly, calculating will result in values which grow larger as more observations are collected (Rothenberg et al., 1964; Fama and Roll, 1968; Lohninger, 2012).

Lemma 1.

[Wilks (1962), pg. 156] If is a random variable having a PDF then the random variable has the rectangular distribution .


This follows at once from the fact that the PDF of is


which is the pdf of the rectangular distribution . ∎

Wilks’ Wilks (1962) , is denoted in this paper. Lemma 1 enables to simulate random variables distributed as using uniform random variables and the following expression

Figure 4: The Cauchy probability distribution an a set of representative empirical distribution functions for FIII from Quintana-Hernández (2017) A- Cauchy probability density [Eq. (19)] function calculated with and . B- Cauchy probability distribution function . C- Several empirical probability distribution functions (Shorack and Wellner, 1986; Pitman, 1993) estimated for FIII at concentrations of 0.003, 0.01, 0.03 and 1 mg/ml. This curves were selected because are representative of the ones obtained with other fractions. The empirical curves were plotted after subtracting the median of each killed cells fraction from the remaining values in the set. Other details in the text of the communication.

Figure 4 presents a plot of a Cauchy probability density function (pdf) [Eq. (19)] calculated with and (Panel 4A), and a Cauchy probability distribution function (PDF) [Eq. (20)] also calculated with and (Panel 4B). Also in Figure 4 (Panel 4C) is a selection of empirical distribution functions (Shorack and Wellner, 1986; Pitman, 1993) determined for the sets of killed cells fractions observed with FIII (Quintana-Hernández, 2017); this set was representative of other observed with the remaining fractions in Figures 2 and 3. Sets and [Eqs. (50) and (51)] were found not to be Gaussian using the Jarque-Bera (Bera and Jarque, 1981) and Shapiro-Wilks (Shapiro and Wilk, 1965) test, this only means that in addition to their most likely “pathological” distribution the precise nature of this distribution remains unknown. Yet, Figure 4C suggests that the empirical PDF of data in Figures 2 and 3 resemble the Cauchy PDF in Figure 4B.

To test the procedure discussed in this paper Cauchy-distributed data sets were generated using Monte Carlo simulation combining Eqs. (18) and (22) as


Eq, (23) was used to generaate data sets, processed as Quintana-Hernandez’ data Quintana-Hernández (2017)(Section 3.1.1).

Some results of the fits of Eq. (18) to Cauchy data appear in Table 3.1.2 together with their apparent sample skewness (Pearson, 1894; Pearsons, 1963) calculated as


and their apparent sample kurtosis calculated as


where is the apparent sample mean. The definition represented by Eq. (25) is presented here since there are controversies and discrepancies in the definition and interpretation of “kurtosis” and “excess kurtosis” or “Pearson’s kurtosis”, in the literature (Pearson, 1905; Faleschini, 1948; Pearsons, 1963; Proschan, 1965; Darlington, 1970; Ali, 1974; Johnson et al., 1980; Moors, 1986; Ruppert, 1987; Westfall, 2014), kurtosis is used here in sense of Moors (1986):

“High kurtosis, therefore, may arise in two situations: (a) concentration of probability mass near (corresponding to a peaked unimodal distribution) and (b) concentration of probability mass in the tails of the distribution.”

In spite of their undefined central moments, both conditions occur in Cauchy-distributed variables if mean is replaced by median in the preceding Moors’ quote (see Figure 5). Estimated and were incompatible to the ones expected for Gaussian variables (), this is not surprising since the data subject of the simplex optimization were generated for Cauchy-distributed random variables. The values of indicate that the parameter estimates are asymmetrically distributed about the mean (median?), and increases with sample size. Thus the parameter estimates are not exactly Cauchy-distributed either (Cramér, 1991; Pitman, 1993). Data in Tables 3.1.2 through 3.2.2 are very leptokurtic, clustered about the median, but extreme values are observed as indicated by the parameters’ ranges (DeCarlo, 1997). Yet, in spite of the wide ranges, 95% CIs are relatively narrow suggesting strong clustering of data arround the median. Raising the initial did not improve consistently the final estimate of , and worsened the estimations of the other parameters too.

Figure 5: Gauss and Cauchy probability density (pdf, left panels) and probability distribution (PDF, right panels) functions. The functions are centered at the Gauss mean () and the Cauchy median (). Upper left: Gauss pdf; upper right: Gauss PDF; lower left: Cauchy pdf; lower right: Cauchy PDF. Numbers near the Gauss curves indicate the value of the variance (1, 2, 3 or 4) for each curve, numbers near the Cauchy curves indicate yhe value of the width factor ( 1, 2, 3 or 4). The figure enables a naked eye comparison of the two distributions, and shows that the Cauchy distribution has a sharper peak (most evident when in the figure) at the median () of the distribution, and has broader shoulders (particularly evident in the Cauchy PDF plot) as ), both factors explain apparently higher kurtosis () sample estimates. Other details in the text of the communication.

Table 3.1.2 shows that the parameters used to simulate data distributed as Cauchy are well predicted if a simplex optimization is used. At least if the simplex is initiated with a “reasonable” set of parameters .

Table 3.2.1 presents data generated with Eq. (23) using a parameter set , and the simplex optimization started from as in Table 3.1.2. As seen in the Table 3.2.1, were well estimated. Yet, in all instances presented in Table 3.2.1.

To check if starting the simplex from higher values of improves estimate, in Table 3.2.2 are results calculated for exactly the same Monte Carlo data used in Table 3.2.1, but starting the simplex optimization with . As indicated in Section B.1 and shown in Figure 6, raising the initial value of did not improve the estimation of the parameter. More surprising, rising the initial value of n worsened the estimates of all the other parameters characterizing Eq. (18). Taken togheter the results in tables 3.2.1 and 3.2.2, suggests that is the most difficult parameter to estimate in Equation (18). The difficulty to estimate correctly agrees with the discussion on in Section B.1.

In Tables, specially in optimizations with larger , some sets of Monte Carlo simulated data did not reach Condition 2, and the optimization stopped on Condition 3 (numbers with asterisks), this appears once in each Table, but each of these data sets could have been replaced by results obtained for other pseudorandom sets of data, generated under similar conditions. where Condition 2 was indeed achieved. I.e., failure to comply with Condition 2 did not always occurred. Instances where Condition 3 stopped calculations were included in the tables to show that they do occur. Athough in Tables all set of parameters are apparently leptokurtic and skewed, increased with the number of points () used for each simulated “concentration” in the optimizations, did not increase as much as but was noticeably large wit . As previously said, and depend on the 2nd, 3rd and 4th central moments of a distribution which do not exist for the Cauchy pdf, thus sample values cannot converge towards any population value as required by sampling theory (Wilks, 1962), since population mean, variance, skewness and kurtosis do not exist for Cauchy distributed data. Sample variance and mean, grow, more or less randomly, with sample size (Section 3.2.1). This makes the mean to vary considerably even when several hundred or thousand random, Cauchy dystributed, numbers are averaged (Rothenberg et al., 1964; Fama and Roll, 1968; Lohninger, 2012). Yet, with all the uncertainties, the and estimates clearly indicate that data in tables are non-Gaussian.

r Simulated Predicted Range Loops Sk Kr
3 y0 (-5.5,  41.0)
ym 100 () ()
n 5.352
10 y0
ym ()
4.995 30.0
100 y0 -5 -5.9
ym ()
n 15
2000 y0 ()
ym ()
Km 12272
n 15 106.3 12272

Parameters and heading have same meaning as used in Table 3.1.1; indicates the number of random Cauchy values fitted to Eq. (18) which were exactly the same used in Table 3.1.2. Please notice that the units of are irrelevan as long as they are equal to the units of [D]. Parameter names () with a bullet () indicated that, tested with the Jarque-Bera test, the probability that the parameter is not Gaussian by chance is not too low (). Number of loops with an asterisk indicates the optimization stopped after fulfilling Condition 3. Other conditions as in Table 3.1.2. Please notice that in all instances. See the text of the communication for further discussion.

Table 4: Fitting Cauchy data generated with Eq. (23) setting and simplex optimization to the modified Hill equation. In all cases the optimization started with , , and ; was set as 0.1.

3.2.2 Fitting Gauss distributed data to the modified Hill equation.

To evaluate the behavior of Eq. (18) when Gaussian variables are adjusted to the equation, normal variates generated as


and used to test the ability of the method proposed here to determine the parameters, with , in which and were the simulated effects’ sample variances and means, respectively. Tables 3.2.2 to 3.2.2 follow the same sequence of simulated sets changes and initial values as data in Tables 3.1.2 to 3.2.2, but the data adjusted to Eq. (18) was Gaussian and generated as indicated in the prior paragraph. As in the case of Cauchy distributed data in Section 3.2.1, are well estimated using the simplex minimization described. Yet, ( symbol is used to mean very significantly different) in all instances presented in Table 3.2.1, this suggests that is the most difficult parameter to estimate if the initiating value of in the simplex is very different from the value used in the Monte Carlo simulation.

(used as initiation value in Tables 3.1.2 to 3.2.2) means that the initial parameters begin increasing by 10%. In calculations, not published here, the simulations in Tables 3.1.2 to 3.2.2 were carried out setting , and yet , and the estimates of the other parameters became worse. This agrees with the discussion about Eqs. (31) to (34) in Section B.1 which holds for Gaussian random variables, since the discussion in Section B.1 is distribution independent.

r Simulated Predicted Range Loops Sk Kr
3 y0 -5 8.64 (-5.45, 41.02) (-18.1, 95.9) 31829 1.260 3.147
ym 100 93.7 () () -4.072 18.0
Km 0.01 0.20 (0.06, 0.52) (-0.07, 1.07) 1.260 3.147
n 15 11.84 (11.70, 12.17) (11.6, 12.72) 1.260 3.147
10 y0 -5 2.38 (1.22, 4.05) (-357.8, 62.1) 2176 -4.995 30.0
ym 100 84.6 (, 92.0) () -4.89 30.79
Km 0.01 -0.007 (-0.023, 0.007) (-0.334, 0.363) -0.352 8.142
n 15 9.46 (9.45, 9.48) (9.3, 9.8) -0.352 8.142
100 y0 -5 -4.19 (-4.50, -3.88) (-1925.1, 972.1) 1060 -12.2 255.4
ym 100 97.61 (96.66, 98.39) () 25.6 669.7
Km 0.01 0.006 (0.003, 0.009) (-19.2, 9.8) -12.2 255.4
n 15 14.99 (14.99, 15.00) (-4.2, 24.8) 106.3 255.4
2000 y0 -5 26.5 (26.2, 26.8) () 2010 32.9 12272.1
ym 100 59.3 (28.4, 63.6) () -113.6 13261.5
Km 0.01 0.07 (0.06, 0.07) (-197.2, 1105.3)