Jensen-Shannon Divergence as a Goodness-of-Fit Measure for Maximum Likelihood Estimation and Curve Fitting

09/28/2018 ∙ by Mark Levene, et al. ∙ Vilnius University ∙ Birkbeck, University of London

The coefficient of determination, known as R^2, is commonly used as a goodness-of-fit criterion for fitting linear models. R^2 is somewhat controversial when fitting nonlinear models, although it may be generalised on a case-by-case basis to deal with specific models such as the logistic model. Assume we are fitting a parametric distribution to a data set using, say, the maximum likelihood estimation method. A general approach to measure the goodness-of-fit of the fitted parameters, which we advocate herein, is to use a nonparametric measure for model comparison between the raw data and the fitted model. In particular, for this purpose we put forward the Jensen-Shannon divergence (JSD) as a metric, which is bounded and has an intuitive information-theoretic interpretation. We demonstrate, via a straightforward procedure making use of the JSD, that it can be used as part of maximum likelihood estimation or curve fitting as a measure of goodness-of-fit, including the construction of a confidence interval for the fitted parametric distribution. We also propose that the JSD can be used more generally in nonparametric hypothesis testing for model selection.

1 Introduction

We assume a general scenario, where we have some data from which we derive an empirical distribution that is fitted with maximum likelihood [33] or curve fitting [11] to some, possibly parametric, distribution [26].

The coefficient of determination, R^2 [32], is a well-known measure of goodness-of-fit for linear regression models. Despite its wide use, in its original form it is not fully adequate for nonlinear models [1], where the author recommends defining R^2 as a comparison of a given model to the null model, claiming that this view allows for the generalisation of R^2. Further, in [38] the inappropriateness of R^2 for nonlinear models is clearly demonstrated via a series of Monte Carlo simulations. In [5], a novel R^2 measure based on the Kullback-Leibler divergence [7] was proposed as a measure of goodness-of-fit for regression models in the exponential family.

Alternative likelihood-based methods have also been proposed. In particular, the Akaike information criterion (AIC) and its counterpart, the Bayesian information criterion (BIC) [4, 41], are widely used criteria for model selection. Both the AIC and BIC are based on the maximised value of the likelihood function, with penalty terms to discourage overfitting. The likelihood ratio test is also an established method for model selection between a null model and an alternative maximum likelihood model [42, 27]. Despite the popularity of maximum likelihood methods, there is some controversy in their application as goodness-of-fit tests [16].

The Jensen-Shannon divergence (JSD) [28, 9] is a symmetric form of the nonparametric Kullback-Leibler divergence [7], providing a measure of distance between two probability distributions. It has been employed in a wide range of applications, such as detecting edges in digital images [13], measuring the similarity of texts [31], training generative adversarial networks [14], comparing genomes in bioinformatics [37], distinguishing between quantum states in physics [29], and as a measure of distance between distributions in a social setting [10].

Here we apply the JSD as an alternative measure of goodness-of-fit of a parametric distribution, acting as the model, to an empirical distribution, which comprises the raw data. The JSD provides a direct measure of goodness-of-fit without the need for the maximum value of the likelihood function, as used in the AIC and BIC, or any linearity assumptions about the model being fitted, as are often made when using R^2.

The rest of the paper is organised as follows. In Section 2, we introduce the JSD and some of its characteristics. In Section 3, we define the JSD as a measure of goodness-of-fit within the context of distribution fitting and define the notion of the JSD factor. In Section 4, we describe some experiments we carried out, with simulated data in Subsection 4.1 and empirical data in Subsection 4.2, to test the viability of using the JSD as a measure of goodness-of-fit. Finally, in Section 5, we give our concluding remarks.

2 The Jensen-Shannon Divergence

Let p and q be finite distributions, and let m = (p + q)/2 be a mixture distribution of p and q. Then the JSD between p and q, denoted by JSD(p, q), is given by

JSD(p, q) = H(m) - (1/2) H(p) - (1/2) H(q),    (1)

where the entropy of a distribution p, denoted by H(p), is defined as

H(p) = - Σ_x p(x) log p(x).    (2)

We note that the JSD is bounded from above [9] (by 1 when logarithms are taken to base 2), and can thus be readily normalised; for convenience we will assume that the JSD is normalised. Moreover, the JSD may still be used if p and q are improper, i.e. if their probabilities sum to less than one; see for example [10]. We further note that in order for the JSD to be a metric we need to take its square root [9], and thus whenever we compute the value of the JSD we will assume that its square root is taken.
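To make the definition concrete, the following is a minimal Python sketch (not the authors' pyjsd package [24]) of equations (1) and (2) for two discrete distributions over the same support; it uses base-2 logarithms so that the value lies in [0, 1], and returns the square root so that the result is a metric. The example distributions are illustrative only.

import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def jsd(p, q):
    """Square root of the JSD of two probability vectors on the same support, as in (1)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)                      # the mixture distribution of (1)
    div = entropy(m) - 0.5 * (entropy(p) + entropy(q))
    return np.sqrt(max(div, 0.0))          # guard against tiny negative rounding errors

# Illustrative example: two distributions over the same three outcomes.
print(jsd([0.1, 0.4, 0.5], [0.2, 0.3, 0.5]))   # 0 if identical, 1 if the supports are disjoint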

The intuition behind the JSD is as follows. We have knowledge of both distributions p and q, and we would like to know how distant they are from each other. In order to do so, we set up a simple experiment, where we take a sample x of length one from the mixture distribution m. Now, how much information do we gain from observing x? The answer is exactly JSD(p, q). If we are none the wiser, i.e. x could equally have come from p or q, then JSD(p, q) = 0, and for all intents and purposes we consider p to be equal to q. On the other hand, we may have some information about whether x comes from p or q, and the information we gain, on a scale between 0 and 1, is exactly JSD(p, q).

Now, let X be a discrete random variable associated with m, and let Z be a binary indicator random variable, which is associated with p if Z = 1 and with q if Z = 0. As in [15], it can be shown that

JSD(p, q) = H(X) - H(X | Z),    (3)

where

H(X | Z) = (1/2) H(p) + (1/2) H(q),    (4)

and H(X | Z) is the conditional entropy of X conditioned on Z [7].

It is worth noting that, asymptotically, the JSD is distributed as a quarter of a Chi-squared distribution [19] with the appropriate number of degrees of freedom [9], i.e.,

4 JSD(p, q) ≈ Σ_x (p(x) - q(x))^2 / q(x),    (5)

where the right-hand side of (5) is the Chi-squared goodness-of-fit test statistic [40]. When the number of degrees of freedom is large, a Normal approximation of the Chi-squared statistic is often used [44]; see also [3] for an application of this approximation.

The JSD extends to cumulative distributions in a natural manner by replacing probability mass functions with their cumulative counterparts. More specifically, this is formalised by using the extension of the Kullback-Leibler divergence [7] to cumulative distributions given in [45, Definition 2.1], and the fact that

JSD(p, q) = (1/2) KL(p || m) + (1/2) KL(q || m),    (6)

where the Kullback-Leibler divergence of two distributions, p and q, denoted by KL(p || q), is defined as

KL(p || q) = Σ_x p(x) log ( p(x) / q(x) ).    (7)

An important fact to note is that the square root of the cumulative version of the JSD is also a metric [35], and thus it essentially possesses the same properties as the non-cumulative JSD, but with a different normalisation constant. Moreover, it is often advantageous to use the cumulative distribution instead of the probability mass function, as it may be easier to interpret and manipulate, and it also acts to smooth the data. For these reasons, we prefer to employ the cumulative JSD in our experiments, and from now on, for convenience, we will not make any distinction between the two versions, referring to both simply as the JSD.
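As a rough illustration, the sketch below applies the decomposition of (6) and (7) directly to two cumulative distributions evaluated on a common grid. It is a simplification of the cumulative JSD (in particular, it does not rescale by the different normalisation constant mentioned above), and the empirical-versus-fitted-Normal example is an assumption made purely for illustration.

import numpy as np
from scipy import stats

def cumulative_jsd(P, Q):
    """Square root of the JSD applied to two cumulative distributions on the same grid."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    M = 0.5 * (P + Q)
    def kl(A, B):                      # Kullback-Leibler term, as in (7)
        nz = A > 0
        return np.sum(A[nz] * np.log2(A[nz] / B[nz]))
    return np.sqrt(0.5 * kl(P, M) + 0.5 * kl(Q, M))

# Example: empirical CDF of a sample against the CDF of a fitted Normal distribution.
data = stats.norm.rvs(size=1000, random_state=0)
grid = np.sort(data)
ecdf = np.arange(1, len(grid) + 1) / len(grid)
fitted_cdf = stats.norm.cdf(grid, *stats.norm.fit(data))   # norm.fit returns (mean, std)
print(cumulative_jsd(ecdf, fitted_cdf))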

In the experiments we will make use of the bootstrap method [8], which is a technique for computing a confidence interval that relies on random resampling with replacement from a given sample data set. The bootstrap method is usually nonparametric, making no distributional assumptions about the data set employed.
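A hedged sketch of such a bootstrap confidence interval is given below. For simplicity it uses the plain percentile interval, whereas the experiments in Section 4 use the basic bootstrap method of [8, Section 5.3.1]; the statistic to be bootstrapped is a placeholder argument supplied by the caller, and in our setting it would be the JSD of a resampled data set against the fitted model.

import numpy as np

def bootstrap_ci(data, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile confidence interval for `statistic` from resampling with replacement."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    values = [statistic(rng.choice(data, size=len(data), replace=True))
              for _ in range(n_resamples)]
    lb, ub = np.quantile(values, [alpha / 2, 1 - alpha / 2])
    return lb, ub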

3 The JSD as a Goodness-of-Fit Measure

Making use of the JSD as a measure of goodness-of-fit is quite straightforward. Assume that the data is a sample from a parametric distribution f, with parameters θ, and that f is fitted with maximum likelihood [33] or curve fitting [12] to the empirical distribution of the data, e.

The goodness-of-fit of the distribution f, with parameters θ, to the empirical distribution e is now defined as

JSD(e, h),    (8)

where h is a finite distribution, which is distributed according to f with parameters θ. We note that (8) does not restrict the JSD, and so it is also possible to measure one empirical distribution against any other.
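The following sketch illustrates (8) with SciPy: the hypothesised family is fitted by maximum likelihood, a finite sample is drawn from the fitted distribution, and the two samples are compared with the JSD over a common set of histogram bins. For simplicity it uses the non-cumulative JSD via scipy.spatial.distance.jensenshannon (which already returns the square root); the bin count, the chosen family and the simulated sample are illustrative assumptions, and the experiments in Section 4 use the cumulative version instead.

import numpy as np
from scipy import stats
from scipy.spatial.distance import jensenshannon   # square root of the JSD

def jsd_goodness_of_fit(data, family=stats.gamma, bins=50, seed=1):
    """JSD between the data and a sample drawn from its maximum likelihood fit, as in (8)."""
    params = family.fit(data)                                    # maximum likelihood estimates
    fitted = family.rvs(*params, size=len(data), random_state=seed)
    edges = np.histogram_bin_edges(np.concatenate([data, fitted]), bins=bins)
    p, _ = np.histogram(data, bins=edges)
    q, _ = np.histogram(fitted, bins=edges)
    return jensenshannon(p, q, base=2), params                   # normalised to [0, 1]

# Illustrative example: data simulated from a Gamma distribution with shape 2 and scale 2.
sample = stats.gamma.rvs(2, scale=2, size=10000, random_state=0)
score, params = jsd_goodness_of_fit(sample, family=stats.gamma)
print(score, params)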

The Bayes factor [20] is a method for model comparison, taking the ratio of the likelihood of the data under the alternative hypothesis to the likelihood of the data under the null hypothesis. In particular, the Bayes factor is advocated as an alternative to null hypothesis significance testing, which depends only on the data and considers the models arising from both the null and alternative hypotheses [18].

The JSD factor is a reformulation of the Bayes factor in terms of the JSD, defined as

JSD(e, h_0) / JSD(e, h_1),    (9)

where h_0 and h_1 are finite distributions arising, respectively, from the null hypothesis, H_0, and the alternative hypothesis, H_1; it is the odds ratio of choosing the alternative hypothesis, H_1, in preference to the null hypothesis, H_0.
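Reusing the jsd_goodness_of_fit helper and the simulated Gamma sample from the previous sketch, the JSD factor of (9) can be computed as a simple ratio; treating the Normal fit as the null model and the Gamma fit as the alternative is an illustrative choice, not a prescription from the paper.

from scipy import stats

jsd_alt, _  = jsd_goodness_of_fit(sample, family=stats.gamma)   # alternative hypothesis H_1
jsd_null, _ = jsd_goodness_of_fit(sample, family=stats.norm)    # null hypothesis H_0
print("JSD factor:", jsd_null / jsd_alt)                        # values above 1 favour H_1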

4 Experiments and Analysis

To assess the use of the JSD as a goodness-of-fit measure we provide experimental results with simulated data from various parametric distributions, including the Uniform, Normal, Log-normal, Exponential, Gamma, Beta, Weibull, Pareto [26] and q-Gaussian [39] distributions. Our methodology for the experiments with simulated data (see Subsection 4.1) was as follows:

  1. First we generated a data set, say X, of size N from a given distribution, say f, with chosen parameters, say θ, which was then taken to be the empirical distribution.

  2. We then considered X to be distributed according to a hypothesised distribution, g, where g may not be the same as f, and used the maximum likelihood method to obtain the parameters of X, say φ, assuming its distribution was g. (Obviously, if g = f, then φ is expected to be very close to θ.)

  3. Next, assuming that X was distributed according to g with parameters φ, we generated a second data set, Y, from distribution g with parameters φ.

  4. Finally, we evaluated JSD(X, Y) as a measure of the goodness-of-fit of g, with parameters φ, to X, and computed a 95% confidence interval for the JSD from 1000 bootstrap resamples using the basic bootstrap percentile method [8, Section 5.3.1].

For the experiments with empirical data sets we followed the same methodology, with the difference that the data set was an empirical data set rather than a generated one.

For each set of experiments we followed the methodology described above for several possible alternative parametric distributions, g, and then computed the JSD factor between the best and a lower performing distribution. As mentioned towards the end of Section 2, the cumulative version of the JSD was used in all the experiments.
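The sketch below strings the earlier helpers together into one end-to-end simulated experiment along the lines of steps 1-4 above, reusing jsd_goodness_of_fit from Section 3 and, for step 4, the bootstrap_ci helper from Section 2; the given distribution, candidate families, sample size and seed are illustrative assumptions rather than the settings used in the paper.

import numpy as np
from scipy import stats

given = stats.gamma.rvs(2, scale=2, size=100_000, random_state=0)        # step 1
candidates = {"Gamma": stats.gamma, "Weibull": stats.weibull_min,
              "Log-normal": stats.lognorm, "Normal": stats.norm}
scores = {name: jsd_goodness_of_fit(given, family=family)[0]             # steps 2 and 3
          for name, family in candidates.items()}
ranked = sorted(scores, key=scores.get)                                   # smallest JSD first
best, runner_up = ranked[0], ranked[1]
print(scores)
print("JSD factor (best versus second best):", scores[runner_up] / scores[best])
# Step 4: a confidence interval for each JSD can be obtained by passing
# lambda d: jsd_goodness_of_fit(d, family=...)[0] to the bootstrap_ci helper.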

The tables showing the results are given in the appendix at the end of the paper. For the Normal and Log-normal distributions, the first parameter is the mean and the second the standard deviation, while for the Gamma and Weibull distributions, the first parameter is the shape and the second the scale. On the other hand, for the Uniform distribution the first parameter is the lower bound and the second the upper bound, for the Beta distribution the first parameter is α and the second β, while for the q-Gaussian the first parameter is the shape and the second the scale, as in (10). The Exponential and Pareto distributions are characterised by a single parameter. In addition, the lower bound of the 95% bootstrap confidence interval is denoted by lb and the upper bound by ub.

4.1 Experiments with Simulated Data

We now provide commentary on the results for the simulated data, shown in Tables 4, 5, 6, 7, 8, 9 and 10, which are given in an appendix at the end of the paper. We note that all the computations described in this subsection were carried out using the Matlab software package. Table 2 summarises, for all experiments, the JSD factors between the best (highlighted in bold) and second best (highlighted in italics) performing distributions. In all cases, apart from experiment 6 for the Beta distribution with α = 50 and β = 50, shown in Table 9, the JSD factor overwhelmingly supports the given distribution, as we would expect. The reason for the relatively low JSD factor in this particular case is the known fact that the Beta distribution can be approximated by the Normal distribution when α and β are large [36].

It is evident that the larger the size of the data set, the more accurate the JSD will be. In Table 1 we demonstrate how the accuracy of the JSD increases as the data set size increases, when both the given and hypothesised distributions are Normal with mean 0 and standard deviation 1; it is also noticeable that the maximum likelihood estimates of the parameters converge to the correct values as the size of the data set increases. In Figure 1 we show that the decrease of the JSD follows a power law with an exponent of approximately 0.5, that is, the JSD decays roughly as N^(-1/2), where N represents the data size.
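A small sketch of the scaling check reported in Figure 1 is given below: it regresses log(JSD) on log(N) for the (JSD, data set size) pairs of Table 1, so a slope close to -0.5 corresponds to the N^(-1/2) decay noted above; the values are copied from Table 1 as printed.

import numpy as np

N = np.array([32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384,
              32768, 65536, 131072, 262144, 524288, 1048588])
jsd_values = np.array([0.0459, 0.0375, 0.0204, 0.0180, 0.0164, 0.0080, 0.0038, 0.0055,
                       0.0034, 0.0018, 0.0013, 0.0007, 0.0007, 0.0006, 0.0003, 0.0003])
slope, intercept = np.polyfit(np.log(N), np.log(jsd_values), 1)
print(slope)   # expected to be close to -0.5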

Parameter 1 Parameter 2 JSD Data set size
0.1941 1.0013 0.0459 32
0.1844 1.1171 0.0375 64
0.0714 1.0367 0.0204 128
0.0497 1.0010 0.0180 256
0.0236 1.0378 0.0164 512
-0.0171 1.0223 0.0080 1024
0.0301 1.0138 0.0038 2048
0.0135 1.0091 0.0055 4096
-0.0042 0.9821 0.0034 8192
-0.0065 1.0078 0.0018 16384
-0.0075 1.0038 0.0013 32768
-0.0009 0.9945 0.0007 65536
-0.0006 1.0012 0.0007 131072
0.0014 0.9998 0.0006 262144
0.0017 0.9978 0.0003 524288
0.0000 0.9997 0.0003 1048588
Table 1: JSD resulting from increasing the data set size, where the given distribution is Normal with mean 0 and standard deviation 1.
Figure 1: Power-law fit, approximately proportional to N^(-1/2), for the data in Table 1.
Experiment Distribution JSD factor
1 Normal 803.9692
2 Log-normal 270.5271
3 Gamma 85.8914
4 Gamma 17.9510
5 Beta 99.0703
6 Beta 6.4432
7 Beta 30.8739
Table 2: JSD factors between the best and second best performing distributions for the simulated data.

In the first experiment the given distribution was Normal with mean 0 and standard deviation 1, and the hypothesised distributions were Normal and Uniform. The JSD factor between the JSDs of the Normal and Uniform distributions is 803.9692, which can be derived from Table 4. In the second experiment the given distribution was Log-normal with mean 0 and standard deviation 1, and the hypothesised distributions were Normal, Uniform, Log-normal, Gamma and Weibull. The JSD factor between the JSDs of the Log-normal distribution, which is the smallest, and the Gamma distribution, whose JSD is the closest to it, is 270.5271, which can be derived from Table 5. In the third experiment the given distribution was Gamma with shape 2 and scale 2, and the hypothesised distributions were Normal, Uniform, Log-normal, Weibull and Gamma. The JSD factor between the JSDs of the Gamma distribution, which is the smallest, and the Weibull distribution, whose JSD is the closest to it, is 85.8914, which can be derived from Table 6. In the fourth experiment the given distribution was Gamma with shape 50 and scale 2, and the hypothesised distributions were Normal, Uniform, Log-normal, Weibull and Gamma. The JSD factor between the JSDs of the Gamma distribution, which is the smallest, and the Log-normal distribution, whose JSD is the closest to it, is 17.9510, which can be derived from Table 7. In the fifth experiment the given distribution was Beta with parameters α = 2 and β = 2, and the hypothesised distributions were Normal, Log-normal, Gamma, Weibull and Beta. The JSD factor between the Beta distribution, whose JSD is the smallest, and the Normal distribution, whose JSD is the closest to it, is 99.0703, which can be derived from Table 8. In the sixth experiment the given distribution was Beta with parameters α = 50 and β = 50, and the hypothesised distributions were Normal, Log-normal, Gamma, Weibull and Beta. The JSD factor between the Beta distribution, whose JSD is the smallest, and the Normal distribution, whose JSD is the closest to it, is 6.4432, which can be derived from Table 9. In the seventh and final experiment the given distribution was Beta with parameters α = 60 and β = 30, and the hypothesised distributions were Normal, Log-normal, Gamma, Weibull and Beta. The JSD factor between the Beta distribution, whose JSD is the smallest, and the Normal distribution, whose JSD is the closest to it, is 30.8739, which can be derived from Table 10.

4.2 Analysis of Empirical Data

We now provide commentary on the results for the empirical data sets, shown in Tables 11, 12 and 13, which are given in an appendix at the end of the paper. We note that all the computations described in this subsection were carried out using Python. Table 3 summarises, for all three data sets, the JSD factors between the best (highlighted in bold) and a lower performing distribution.

The first empirical data set we consider contains detailed voting results of party vote shares in different polling stations during the Lithuanian parliamentary election of 1992 (the data was obtained from [21]). Note that we consider only the top three parties and have renormalised the original data so that the total vote share of the top three parties sums to one in each polling station. This data set was first considered in [22], where an agent-based model generating the Beta distribution, and reasonably well reproducing detailed election results, was proposed. In [23] a statistical comparison between four distributions commonly used in sociophysics, the Normal, Log-normal, Beta and Weibull, was carried out using the Watanabe-Akaike information criterion (WAIC) [43], which is a generalisation of the AIC. The comparison concluded that the Beta and Weibull distributions provide the best fits for the empirical data. However, their respective WAIC scores were within each other's confidence intervals, and therefore no final conclusion was made. Here we obtain a similar result: the Beta and Weibull distributions clearly have the overall best JSD scores; however, as before, their confidence intervals overlap (see Table 11). As was noted in [23], the Beta and Weibull distributions are similar when the observed mean is close to zero and the observed variance is reasonably small. In the empirical analysis this similarity is further increased when the sample size is small. In addition, the Gamma and Weibull distributions behave similarly for the estimated parameter values. Therefore, we report the JSD factor between the best performing distribution (highlighted in bold) and the next best distribution which is not a Beta, Gamma or Weibull distribution (see Table 3).

The second data set we consider contains the log-returns of two different exchange rates. We consider the BTC/JPY exchange rate on the bitFlyer exchange during the time period between July 4, 2017 and July 4, 2018 (the data was obtained from [2]), as well as the EUR/USD exchange rate during the time period between June 1, 2000 and September 1, 2010 (the data was obtained from [17]). We consider the daily and one minute log-returns. For this data set we use the moving block bootstrap [25] with a block size of one day.
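A hedged sketch of a single moving block bootstrap resample is shown below; the block length is expressed in observations (for example, the number of one minute returns in a trading day), and both that choice and the resampling details are assumptions of this sketch rather than a description of the exact procedure used for Table 12.

import numpy as np

def moving_block_resample(series, block_len, rng=None):
    """Resample a 1-D time series by concatenating randomly chosen overlapping
    blocks of length block_len until the original length is reached."""
    rng = rng or np.random.default_rng()
    series = np.asarray(series)
    n = len(series)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    return np.concatenate([series[s:s + block_len] for s in starts])[:n]

# Confidence intervals for the JSD are then obtained by recomputing the JSD on
# many such resampled series, in the same spirit as the bootstrap_ci helper of Section 2.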

In the econophysics literature it is commonly accepted that log-returns are power-law distributed [6]. One of the commonly used fits for the log-returns is the so-called q-Gaussian distribution [39], which we add to our analysis for this empirical data set. Here we use the parametrization of the q-Gaussian distribution given in [34]:

(10)

which is equivalent to a Student's t-distribution [26].

However, as can be seen in Table 12, we find that the Gamma and Weibull distributions noticeably outperform the q-Gaussian distribution. The performance of the Gamma and Weibull distributions is similar, because for the estimated parameter values both distributions behave reasonably similarly; for these parameter values they are reasonably close to the Exponential distribution. Therefore, we report the JSD factor between the best performing distribution (highlighted in bold) and the next best distribution, which is neither a Gamma nor a Weibull distribution (see Table 3).

For the fourth sample in the second empirical data set, i.e. the EUR/USD one minute log-returns, the Log-normal and q-Gaussian distributions unexpectedly had the best performance. Though they are far from similar over the considered range of observable values and parameter values, they most likely attained similar scores due to the shape of the empirical distribution: the Log-normal distribution seems to represent the smaller log-returns well, while the q-Gaussian is better at describing the tail events.

The third data set we consider is the European soccer data set [30], which contains thousands of matches played in European national championships throughout 2008–2016. From this data set we have extracted five random teams and computed the inter-goal times for each team. We have treated goals scored during extra time as scored in the 45th minute (if scored during the first half) or the 90th minute (if scored during the second half). In this analysis we have added the Exponential and Pareto distributions. For the estimated parameter values the Gamma and Weibull distributions behave similarly to the Exponential distribution. Note that the shape parameter values of the Gamma and Weibull distributions are very close to 1 and the respective scale parameter values are similar; in this case it is known that the Gamma and Weibull distributions are equivalent to the Exponential distribution with the appropriate scale parameter value. We therefore report the JSD factor between the best performing distribution (highlighted in bold) and the next best distribution, which is neither an Exponential, Gamma nor Weibull distribution (see Table 3). We observe that for the ELC sample the obtained JSD factor is the lowest and the JSD score is the largest. This is most likely due to this team having played opponents with a wider variety of skill. In particular, it played in both the top and the second tiers of the national championship during the considered time period, resulting in a goal scoring rate with higher variation.

Data set Sample Distribution JSD factor
1 SK Weibull 2.7128
1 LKDP Beta 2.9507
1 LDDP Weibull 1.8957
2 Daily BTC/JPY Gamma 2.3077
2 1 min BTC/JPY Gamma 2.7037
2 Daily EUR/USD Weibull 3.5381
2 1 min EUR/USD Log-normal 1.0514
3 TOT Exponential 4.3233
3 GLA Exponential 3.5087
3 MUN Exponential 4.2010
3 VAL Weibull 5.6370
3 ELC Weibull 2.5022
Table 3: JSD factors between the best and a lower performing distribution for the empirical data sets.

5 Concluding Remarks

We have proposed the Jensen-Shannon divergence (JSD) as a goodness-of-fit measure for data fitted with maximum likelihood estimation or curve fitting. Our experiments with simulated and empirical data in Section 4, for a variety of parametric distributions, show that for simulated data the method is unequivocal in its preference for the true distribution (see Subsection 4.1), and that for empirical data the method is effective in selecting the more likely distributions from a selection of hypothesised distributions (see Subsection 4.2).

As we have shown in Section 2, the JSD has a precise information-theoretic meaning, and the JSD factor has an intuitive meaning in terms of an odds ratio, in analogy to the Bayes factor. Moreover, the implementation of the JSD as a measure of goodness-of-fit or for model comparison is relatively straightforward; see [24] for a Python implementation of the JSD.

Ultimately, more experience with empirical data sets is needed for a definitive assessment of how the JSD performs in practice.

References

  • [1] R. Anderson-Sprecher. Model comparisons and R^2. The American Statistician, 48:113–117, 1994.
  • [2] Bitcoincharts.com. http://api.bitcoincharts.com/v1/csv/.
  • [3] J. Borges and M. Levene. Testing the predictive power of variable history web usage. Soft Computing, 11:717–727, 2006.
  • [4] K.P. Burnham and D.R. Anderson. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer-Verlag, New York, second edition, 2002.
  • [5] A.C. Cameron and F.A.G. Windmeijer. An R-squared measure of goodness of fit for some common nonlinear regression models. Journal of Econometrics, 77:329–342, 1997.
  • [6] R. Cont. Empirical properties of asset returns: Stylized facts and statistical issues. Quantitative Finance, 1:1–14, 2001.
  • [7] T.M. Cover and J.A. Thomas. Elements of Information Theory. Wiley Series in Telecommunications. John Wiley & Sons, Hoboken, New Jersey, second edition, 2006.
  • [8] A.C. Davison and D.V. Hinkley. Bootstrap Methods and their Applications. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, UK, 1997.
  • [9] D. Endres and J. Schindelin. A new metric for probability distributions. IEEE Transactions on Information Theory, 49:1858–1860, 2003.
  • [10] T. Fenner, M. Levene, and G. Loizou. A multiplicative process for generating the rank-order distribution of UK election results. Quality & Quantity, 52:1069–1079, 2018.
  • [11] J. Fox. Applied Regression Analysis and Generalized Linear Models. Sage Publications, Thousand Oaks, CA, third edition, 2016.
  • [12] R.J. Freund, W.J. Wilson, and P. Sa. Regression Analysis: Statistical Modeling of a Response Variable. Academic Press, San Diego, CA, second edition, 2006.
  • [13] J.F. Gómez-Lopera, J. Martínez-Aroza, A.M. Robles-Pérez, and R. Román-Roldán. An analysis of edge detection by using the Jensen-Shannon divergence. Journal of Mathematical Imaging and Vision, 13:35–56, 2000.
  • [14] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Proceedings of Advances in Neural Information Processing Systems (NIPS) 27, pages 2672–2680, Montreal, 2014.
  • [15] I. Grosse, P. Bernaola-Galván, P. Carpena, R. Román-Roldán, J. Oliver, and H.E. Stanley. Analysis of symbolic sequences using the Jensen-Shannon divergence. Physical Review E, 65:041905–1 – 041905–16, 2002.
  • [16] J. Heinrich. Pitfalls of goodness-of-fit from likelihood. In Proceedings of the Conference on Statistical Problems in Particle Physics, Astrophysics and Cosmology (PHYSTAT2003), pages 52–55, Stanford, Ca., 2003.
  • [17] HistData.com. http://www.histdata.com/download-free-forex-data/.
  • [18] A.F. Jarosz and J. Wiley. What are the odds? A practical guide to computing and reporting Bayes factors. Journal of Problem Solving, 7:Article 2, 8pp, 2014.
  • [19] N.L. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions, Volume 1, chapter 18, Chi-square distributions including Chi and Rayleigh, pages 415–493. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, New York, NY, second edition, 1994.
  • [20] R.E. Kass and A.E. Raftery. Bayes factors. Journal of the American Statistical Association, 90:773–795, 1995.
  • [21] A. Kononovicius. Lithuanian parliamentary election data. http://github.com/akononovicius/lithuanian-parliamentary-election-data.
  • [22] A. Kononovicius. Empirical analysis and agent-based modeling of the Lithuanian parliamentary elections. Complexity, Article ID 7354642:15 pages, 2017.
  • [23] A. Kononovicius. Modeling of the parties’ vote share distributions. Acta Physica Polonica A, 133:1450–1458, 2018.
  • [24] A. Kononovicius and M. Levene. PyJSD: Python implementation of the Jensen-Shannon divergence. http://github.com/akononovicius/pyjsd.
  • [25] J.-P. Kreiss and S.N. Lahiri. Bootstrap methods for time series. In T.S. Rao, S.S. Rao, and C.R. Rao, editors, Handbook of Statistics Volume 30, Time Series Analysis: Methods and Applications, pages 3–26. North-Holland, Oxford, 2012.
  • [26] K. Krishnamoorthy. Handbook of Statistical Distributions with Applications. CRC Press, Boca Raton, FL, second edition, 2015.
  • [27] F. Lewis, A. Butler, and L. Gilbert. A unified approach to model selection using the likelihood ratio test. Methods in Ecology and Evolution, 2:155–162, 2011.
  • [28] J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37:145–151, 1991.
  • [29] A.P. Majtey, P.W. Lamberti, and D.P. Prato. Jensen-Shannon divergence as a measure of distinguishability between mixed quantum states. Physical Review A, 72:052310–1–052310–6, 2005.
  • [30] H. Mathien. European soccer dataset. http://www.kaggle.com/hugomathien/soccer.
  • [31] A. Mehri, M. Jamaati, and H. Mehri. Word ranking in a single document by Jensen–Shannon divergence. Physics Letters A, 379:1627–1632, 2015.
  • [32] H. Motulsky. Intuitive Biostatistics. Oxford University Press, Oxford, 1995.
  • [33] I.J. Myung. Tutorial on maximum likelihood estimation. Journal of Mathematical Psychology, 47:90–100, 2003.
  • [34] S. Nadarajah and S. Kotz. On the q-type distributions. Physica A, 377:465–468, 2007.
  • [35] H.-V. Nguyen and J. Vreeken. Non-parametric Jensen-Shannon divergence. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), pages 173–189, Porto, 2015.
  • [36] D.B. Peizer and J.W. Pratt. A normal approximation for binomial, F, beta, and other common, related tail probabilities, I. Journal of the American Statistical Association, 63:1416–1456, 1968.
  • [37] G.E. Sims, S.-R. Jun, G.A. Wu, and S.-H. Kim. Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proceedings of the National Academy of Sciences of the United States of America, 106:2677–2682, 2009.
  • [38] A.-N. Spiess and N. Neumeyer. An evaluation of R^2 as an inadequate measure for nonlinear models in pharmacological and biochemical research: a Monte Carlo approach. BMC Pharmacology, 10, 2010. 11 pages.
  • [39] C. Tsallis. Economics and finance: q-Statistical stylized features galore. Entropy, 19(9):457, 2017.
  • [40] V. Voinov, M. Nikulin, and N. Balakrishnan. Chi-Squared Goodness of Fit Tests with Applications. Academic Press, London, UK, 2013.
  • [41] S.I. Vrieze. Model selection and psychological theory: A discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychological Methods, 17:228–243, 2012.
  • [42] Q.H. Vuong. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57:307–333, 1989.
  • [43] S. Watanabe. A widely applicable Bayesian information criterion. Journal of Machine Learning Research, 14:867–897, 2013.
  • [44] E. Wilson and M. Hilferty. The distribution of chi-square. Proceedings of the National Academy of Sciences of the United States of America, 17:684–688, 1931.
  • [45] G. Yari and A. Saghafi. Unbiased Weibull modulus estimation using differential cumulative entropy. Communications in Statistics – Simulation and Computation, 41:1372–1378, 2012.

Appendix A Tables with results from the experiments with simulated and empirical data

Distribution Parameter 1 Parameter 2 JSD lb ub
Normal 0.0003 0.9995 0.0002 0.0002 0.0005
Uniform -4.7467 4.8122 0.1947 0.1911 0.1950
Table 4: Experiment 1: Normal distribution with mean 0 and standard deviation 1.
Distribution Parameter 1 Parameter 2 JSD lb ub
Normal 1.6492 2.1497 0.1520 0.1513 0.1527
Uniform 0.0054 104.6298 0.9001 0.8816 0.9005
Log-normal 0.0001 1.0006 0.0002 0.0002 0.0005
Gamma 1.1373 1.4501 0.0481 0.0477 0.0484
Weibull 1.0002 1.6494 0.0543 0.0540 0.0547
Table 5: Experiment 2: Log-normal distribution with mean 0 and standard deviation 1.
Distribution Parameter 1 Parameter 2 JSD lb ub
Normal 3.9999 2.8282 0.0743 0.0741 0.0746
Uniform 0.0020 34.4431 0.5539 0.5137 0.5543
Log-normal 1.1164 0.8020 0.0308 0.0305 0.0311
Gamma 2.0031 1.9969 0.0002 0.0002 0.0005
Weibull 1.4831 4.4386 0.0150 0.0147 0.0153
Table 6: Experiment 3: Gamma distribution with shape 2 and scale 2.
Distribution Parameter 1 Parameter 2 JSD lb ub
Normal 100.0013 14.1418 0.0132 0.0130 0.0135
Uniform 44.9060 186.2970 0.2073 0.2019 0.2103
Log-normal 4.5952 0.1421 0.0059 0.0057 0.0062
Gamma 50.0161 1.9994 0.0003 0.0002 0.0006
Weibull 7.2817 106.2044 0.04636 0.0460 0.04667
Table 7: Experiment 4: Gamma distribution with shape 50 and scale 2.
Distribution Parameter 1 Parameter 2 JSD lb ub
Normal 0.4999 0.2237 0.0246 0.0244 0.0248
Log-normal -0.8340 0.6020 0.0583 0.0581 0.0586
Gamma 3.7158 0.1345 0.0399 0.0396 0.0401
Beta 1.9973 1.9988 0.0002 0.0002 0.0005
Weibull 2.3816 0.5628 0.0255 0.0253 0.0258
Table 8: Experiment 5: Beta distribution with α = 2 and β = 2.
Distribution Parameter 1 Parameter 2 JSD lb ub
Normal 0.5000 0.0497 0.0010 0.0008 0.0013
Log-normal -0.6982 0.1007 0.0128 0.0125 0.0131
Gamma 99.7537 0.0050 0.0086 0.0083 0.0088
Beta 50.0429 50.0496 0.0002 0.0002 0.0004
Weibull 10.6949 0.5225 0.0369 0.0365 0.0372
Table 9: Experiment 6: Beta distribution with α = 50 and β = 50.
Distribution Parameter 1 Parameter 2 JSD lb ub
Normal 0.6666 0.0494 0.0064 0.0062 0.0067
Log-normal -0.4084 0.0750 0.0158 0.0156 0.0161
Gamma 179.5350 0.0037 0.0127 0.0125 0.0130
Beta 60.0885 30.0540 0.0002 0.0002 0.0005
Weibull 14.6275 0.6893 0.0326 0.0323 0.0329
Table 10: Experiment 7: Beta distribution with α = 60 and β = 30.
Distribution Parameter 1 Parameter 2 JSD lb ub
SK – Sąjūdžio koalicija
Normal 0.2412 0.1132 0.0264 0.0229 0.0307
Log-normal -1.5556 0.5721 0.0243 0.0203 0.0290
Gamma 3.9083 0.0617 0.0126 0.0097 0.0161
Beta 3.0771 9.7045 0.0091 0.0067 0.0125
Weibull 2.2492 0.2722 0.0090 0.0068 0.0127
LKDP – Lietuvos krikščionių demokratų partija
Normal 0.1575 0.0921 0.0305 0.0276 0.0339
Log-normal -2.0435 0.6884 0.0168 0.0138 0.0202
Gamma 2.7196 0.0579 0.0060 0.0044 0.0084
Beta 2.3282 12.4475 0.0057 0.0044 0.0082
Weibull 1.7869 0.1773 0.0081 0.0060 0.0110
LDDP – Lietuvos demokratinė darbo partija
Normal 0.6013 0.1516 0.0169 0.0135 0.0236
Log-normal -0.5459 0.2892 0.0524 0.0468 0.0582
Gamma 13.5242 0.0445 0.0422 0.0361 0.0485
Beta 5.4454 3.6014 0.0105 0.0093 0.0181
Weibull 4.5344 0.6588 0.0089 0.0081 0.0172
Table 11: Empirical data set 1: Lithuanian parliamentary election 1992.
Distribution Parameter 1 Parameter 2 JSD lb ub
Daily – BTC/JPY (bitFlyer)
Normal 0.0415 1.0000 0.0099 0.0055 0.0144
Log-normal -0.9039 1.1431 0.0030 0.0017 0.0050
Gamma 1.1015 0.6174 0.0013 0.0010 0.0028
Weibull 1.0276 0.6882 0.0015 0.0010 0.0030
q-Gaussian 3.9028 1.0413 0.0060 0.0048 0.0116
One minute – BTC/JPY (bitFlyer)
Normal 0.0011 1.0000 0.0140 0.0121 0.0158
Log-normal -1.2561 1.5771 0.0073 0.0070 0.0077
Gamma 0.7717 0.7993 0.0027 0.0026 0.0029
Weibull 0.8468 0.5654 0.0029 0.0028 0.0031
q-Gaussian 2.8964 0.5871 0.0082 0.0069 0.0097
Daily – EUR/USD (Forex)
Normal 0.0179 1.0000 0.0038 0.0026 0.0053
Log-normal -0.7227 1.1086 0.0047 0.0042 0.0052
Gamma 0.6102 1.2489 0.0010 0.0007 0.0014
Weibull 1.1579 0.8021 0.0006 0.0004 0.0011
q-Gaussian 7.8686 2.2146 0.0022 0.0018 0.0039
One minute – EUR/USD (Forex)
Normal 0.0004 1.0000 0.0139 0.0136 0.0142
Log-normal -0.3236 0.6165 0.0086 0.0085 0.0086
Gamma 2.3423 0.3881 0.0115 0.0113 0.0116
Weibull 1.3548 1.0057 0.0143 0.0142 0.0144
q-Gaussian 2.9820 0.6415 0.0090 0.0087 0.0093
Table 12: Empirical data set 2: Daily and one minute log-returns of the EUR/USD and BTC/JPY exchange rates.
Distribution Parameter 1 Parameter 2 JSD lb ub
TOT – Tottenham Hotspur (English Premier League)
Normal 57.7346 58.2559 0.0369 0.0326 0.0423
Gamma 1.0565 54.6477 0.0066 0.0055 0.0109
Weibull 1.0183 58.1866 0.0062 0.0054 0.0102
Exponential 57.7346 - 0.0057 0.0057 0.0115
Pareto 1.6809 - 0.0249 0.0200 0.0310
GLA – Borussia Monchengladbach (German Bundesliga)
Normal 61.5088 61.3039 0.0378 0.0334 0.0424
Gamma 1.0186 60.3877 0.0072 0.0063 0.0111
Weibull 1.0012 61.5400 0.0067 0.0062 0.0106
Exponential 61.5088 - 0.0067 0.0062 0.0127
Pareto 1.6850 - 0.0235 0.0187 0.0303
MUN – Manchester United (English Premier League)
Normal 48.3174 49.8385 0.0353 0.0315 0.0394
Gamma 1.0497 46.0306 0.0065 0.0051 0.0101
Weibull 1.0084 48.4960 0.0060 0.0050 0.0094
Exponential 48.3174 - 0.0058 0.0050 0.0103
Pareto 1.6886 - 0.0245 0.0203 0.0301
VAL – Valencia CF (Spanish La Liga)
Normal 57.2779 52.8406 0.0321 0.0284 0.0365
Gamma 1.1212 51.0879 0.0052 0.0048 0.0090
Weibull 1.0725 58.8645 0.0051 0.0047 0.0093
Exponential 57.2779 - 0.0058 0.0051 0.0122
Pareto 1.6547 - 0.0286 0.0238 0.0347
ELC – Elche CF (Spanish La Liga/La Liga 2)
Normal 102.1875 87.1523 0.0393 0.0263 0.0658
Gamma 1.3220 77.2953 0.0172 0.0161 0.0349
Weibull 1.1929 108.4610 0.0157 0.0157 0.0332
Exponential 102.1875 - 0.0279 0.0192 0.0581
Pareto 1.6154 - 0.0398 0.0257 0.0632
Table 13: Empirical data set 3: Inter-goal times recorded in the European soccer data set.