Power laws distributions in objective priors

05/15/2020
by   Pedro L. Ramos, et al.
Universidade de São Paulo

The use of objective priors in Bayesian applications has become common practice for analyzing data without subjective information. These priors are usually obtained from formal rules, so that the data provide the dominant information in the posterior distribution. However, they are typically improper and may lead to improper posteriors. Here, we show, for a general family of distributions, that the objective priors obtained for the parameters either follow a power-law distribution or have an asymptotic power-law behavior. We find that the exponents of these power laws lie between 0.5 and 1. Understanding this behavior allows us to verify whether such priors lead to proper or improper posteriors directly from the exponent of the power law. The general family considered in our study includes essential models such as the Exponential, Gamma, Weibull, Nakagami-m, Half-Normal, Rayleigh, Erlang, and Maxwell-Boltzmann distributions, to list a few. In summary, we show that understanding the mechanisms that shape the priors provides essential information that can be used in situations where additional complexity is present.


1 Introduction

Bayesian methods have become ubiquitous among statistical procedures and have provided important results in areas from medicine to engineering [21, 31]. In the Bayesian approach, the parameters of a statistical model are assumed to be random variables [8], unlike the frequentist approach, which treats these parameters as constants. Moreover, a subjective ingredient can be included in the model to reproduce the knowledge of a specialist (see O’Hagan et al. [26]). On the other hand, in many situations we are interested in a prior distribution which guarantees that the information provided by the data will not be overshadowed by subjective information. In this case, an objective analysis is recommended, using non-informative priors derived from formal rules [9, 19]. Although several studies have used weakly informative priors (flat priors) as presumed non-informative priors, Bernardo [8] argued that using simple proper priors, supposed to be non-informative, often hides significant unwarranted assumptions, which may easily dominate, or even invalidate, the statistical analysis.

The objective priors are constructed by formal rules [19] and are usually improper, i.e., they do not correspond to a proper probability distribution and could lead to improper posteriors, which is undesirable. According to Northrop and Attalides [25], there are no simple conditions that can be used to prove that an improper prior yields a proper posterior for a particular distribution. Therefore, a case-by-case investigation is needed to check the propriety of the posterior distribution. For the Stacy [29] general family of distributions, we overcome this problem by proving that if the objective priors asymptotically follow a power-law model with the exponent in certain regions, then the resulting posteriors are proper or improper accordingly. As a result, one can check whether the posterior is proper or improper directly from the behavior of the improper prior as a power-law model.

Understanding the situations in which the data follow a power-law distribution can indicate the mechanisms that describe the natural phenomenon in question. Power-law distributions appear in many physical, biological, and man-made phenomena. For instance, they can be used to describe biological networks [27], infectious diseases [14], the sizes of craters on the moon [23], the intensity function in repairable systems [22] and energy dissipation in cyclones [10] (see also [15, 2, 24]). The probability density function of a power-law distribution can be represented as

$$f(x) = c\,x^{-k}, \tag{1}$$

where $c$ is a normalizing constant and $k$ is the exponent parameter. In applications of Bayesian methods the normalizing constant is usually omitted and the prior can be represented by $\pi(\theta) \propto \theta^{-k}$.
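As a quick numerical illustration of why a pure power-law prior on the positive half-line is improper, the sketch below integrates $\theta^{-k}$ over a finite window and records where the integral stays bounded. The function names and the symbol `k` for the exponent are illustrative choices and are not taken from the paper.

```python
import math


def power_law_mass(k: float, lower: float, upper: float) -> float:
    """Integral of theta**(-k) over [lower, upper], for 0 < lower < upper."""
    if math.isclose(k, 1.0):
        return math.log(upper / lower)
    p = 1.0 - k
    return (upper ** p - lower ** p) / p


def integrable_near_zero(k: float) -> bool:
    """theta**(-k) has finite mass on (0, 1] exactly when k < 1."""
    return k < 1.0


def integrable_at_infinity(k: float) -> bool:
    """theta**(-k) has finite mass on [1, inf) exactly when k > 1."""
    return k > 1.0


if __name__ == "__main__":
    # Exponents in the range discussed in the text (0.5 to 1).
    for k in (0.5, 0.75, 1.0):
        near_zero = power_law_mass(k, 1e-12, 1.0)
        far_tail = power_law_mass(k, 1.0, 1e12)
        print(f"k={k}: mass on (1e-12, 1) = {near_zero:.4g}, "
              f"mass on (1, 1e12) = {far_tail:.4g}")
```

No exponent makes both ends integrable at once, so whether the posterior is proper has to be decided by how the likelihood tempers the prior at each end.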

In this paper, we analyze the behavior of different objective priors for the parameters of many distributions. We show that their asymptotic behavior follows power-law models with exponents between 0.5 and 1. In these cases, they may lead to proper or improper posteriors depending on the exponent values of the priors. Situations where a power-law distribution is observed with an exponent smaller than one were reported by Goldstein et al. [15], Deluca and Corral [11] and Hanel et al. [17]. The objective priors are obtained from Jeffreys’ rule [19], Jeffreys’ prior [18] and reference priors [7, 8, 4]. Although the posterior distribution may be proper, the posterior moments can be infinite. Therefore, we also provide sufficient conditions to verify whether the posterior moments are finite. These results are important because recognizing the power-law behavior of the prior for a particular distribution gives an understanding of the shape of the prior that can be used in situations where additional complexity (e.g., random censoring, long-term survival, among others) is present and priors from formal rules are more difficult or impossible to obtain.

The remainder of this paper is organized as follows. Section 2 presents the theorems that provide necessary and sufficient conditions for the posterior distributions to be proper, depending on the asymptotic behavior of the prior as a power-law model. We also discuss sufficient conditions to check whether the posterior moments are finite. Section 3 presents a study of the behavior of the objective priors. Finally, Section 4 summarizes the study with concluding remarks.

2 A general model

The Stacy family of distributions plays an important role in statistics and has proven to be very flexible in practice for modeling data from several areas, such as climatology, meteorology, medicine, reliability and image processing, among others [29]. A random variable X follows Stacy’s model if its probability density function (PDF) is given by

$$f(x \mid \phi, \mu, \alpha) = \frac{\alpha}{\Gamma(\phi)}\,\mu^{\alpha\phi}\,x^{\alpha\phi-1}\exp\{-(\mu x)^{\alpha}\}, \quad x > 0, \tag{2}$$

where $\Gamma(\phi) = \int_0^\infty e^{-x}x^{\phi-1}\,dx$ is the gamma function, $\alpha > 0$ and $\phi > 0$ are the shape parameters and $\mu > 0$ is a scale parameter. Stacy’s model unifies many important distributions, as shown in Table 1.

Distribution
Exponential 1 1
Rayleigh 1 2
Half-Normal 0.5 2
Maxwell Boltzmann 2
scaled chi-square 0.5n 1
chi-square 2 0.5n 1
Weibull 1
Generalized Half-Normal 2
Gamma
Erlang
Nakagami
Wilson-Hilferty
Lognormal

Table 1: Distributions included in the Stacy family of distributions (see equation 2).
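To make the special cases in Table 1 concrete, the following sketch evaluates a generalized gamma density and compares it with three textbook densities. It assumes the rate-type parameterization $f(x) = \frac{\alpha}{\Gamma(\phi)}\mu^{\alpha\phi}x^{\alpha\phi-1}e^{-(\mu x)^{\alpha}}$ with parameter names `phi`, `mu` and `alpha`. Both the parameterization and the names are assumptions made for this illustration, and the mapping of the scale parameter would change under a different convention.

```python
import math


def stacy_pdf(x: float, phi: float, mu: float, alpha: float) -> float:
    """Generalized gamma density under the ASSUMED rate-type parameterization."""
    if x <= 0.0:
        return 0.0
    log_f = (math.log(alpha) - math.lgamma(phi)
             + alpha * phi * math.log(mu)
             + (alpha * phi - 1.0) * math.log(x)
             - (mu * x) ** alpha)
    return math.exp(log_f)


def exponential_pdf(x: float, rate: float) -> float:
    return rate * math.exp(-rate * x)


def rayleigh_pdf(x: float, sigma: float) -> float:
    return x / sigma ** 2 * math.exp(-x ** 2 / (2.0 * sigma ** 2))


def half_normal_pdf(x: float, sigma: float) -> float:
    return math.sqrt(2.0 / math.pi) / sigma * math.exp(-x ** 2 / (2.0 * sigma ** 2))


if __name__ == "__main__":
    x, sigma = 0.7, 1.5
    mu = 1.0 / (sigma * math.sqrt(2.0))  # rate matching a Gaussian-type scale sigma
    print(stacy_pdf(x, 1.0, 2.0, 1.0), exponential_pdf(x, 2.0))    # phi=1,   alpha=1
    print(stacy_pdf(x, 1.0, mu, 2.0), rayleigh_pdf(x, sigma))      # phi=1,   alpha=2
    print(stacy_pdf(x, 0.5, mu, 2.0), half_normal_pdf(x, sigma))   # phi=0.5, alpha=2
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma function for large shape values.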

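The inference procedures related to the parameters are conducted using the joint posterior distribution for $\theta$, which is given by the product of the likelihood function $L(\theta; x)$ and the prior distribution $\pi(\theta)$ divided by a normalizing constant $d(x)$, resulting in

$$\pi(\theta \mid x) = \frac{\pi(\theta)\,L(\theta; x)}{d(x)}, \tag{3}$$

where

$$d(x) = \int_{\mathcal{A}} \pi(\theta)\,L(\theta; x)\,d\theta \tag{4}$$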

and is the parameter space of . Considering any prior in the form our main aim is to analyze the asymptotic behavior of the priors that leads to power-law distributions allowing to find necessary and sufficient conditions for the posterior to be proper, i.e., .
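The propriety condition can be checked in closed form for the simplest member of the family. The sketch below is an illustration for the exponential model only, not the general result of this section. It takes a sample of size `n` with sum `s` from an exponential distribution with rate $\mu$ and a power-law prior $\mu^{-k}$, so that the normalizing constant is $\int_0^\infty \mu^{n-k}e^{-\mu s}\,d\mu = \Gamma(n-k+1)/s^{\,n-k+1}$, which is finite exactly when $n-k+1>0$. All function and argument names are hypothetical.

```python
import math


def log_normalizer(n: int, s: float, k: float) -> float:
    """log d(x) for an exponential sample (size n, sum s) and prior mu**(-k)."""
    shape = n - k + 1.0
    if shape <= 0.0:
        return math.inf  # the integral diverges at mu -> 0: improper posterior
    return math.lgamma(shape) - shape * math.log(s)


def posterior_moment(n: int, s: float, k: float, m: int) -> float:
    """E[mu**m | x]; the posterior is Gamma(shape=n-k+1, rate=s) when proper."""
    shape = n - k + 1.0
    if shape <= 0.0:
        raise ValueError("improper posterior")
    return math.exp(math.lgamma(shape + m) - math.lgamma(shape) - m * math.log(s))


def numeric_normalizer(n: int, s: float, k: float,
                       grid: int = 20000, upper: float = 60.0) -> float:
    """Midpoint-rule cross-check of d(x) on (0, upper / s)."""
    h = upper / s / grid
    total = 0.0
    for i in range(grid):
        mu = (i + 0.5) * h
        total += mu ** (n - k) * math.exp(-mu * s) * h
    return total


if __name__ == "__main__":
    n, s = 3, 2.0
    for k in (0.5, 0.75, 1.0):
        print(k, math.exp(log_normalizer(n, s, k)), numeric_normalizer(n, s, k))
```

In this special case any exponent between 0.5 and 1 gives a proper posterior with finite moments of every order as soon as one observation is available.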

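In order to study such asymptotic behavior, the following definitions and propositions will be useful to prove the results related to the posterior distribution. Let $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$ denote the extended real number line with the usual order $(\geq)$, let $\mathbb{R}^{+}$ denote the positive real numbers and $\mathbb{R}^{+}_{0}$ denote the positive real numbers including $0$, and denote $\overline{\mathbb{R}}^{+}$ and $\overline{\mathbb{R}}^{+}_{0}$ analogously. Moreover, if $M \in \mathbb{R}^{+}$ and $a \in \overline{\mathbb{R}}^{+}$, we define $M \cdot a$ as the usual product if $a \in \mathbb{R}$, and $M \cdot a = \infty$ if $a = \infty$.

Definition 2.1.

Let $a \in \overline{\mathbb{R}}^{+}$ and $b \in \overline{\mathbb{R}}^{+}$. We say that $a \lesssim b$ if there exists $M \in \mathbb{R}^{+}$ such that $a \leq M \cdot b$. If $a \lesssim b$ and $b \lesssim a$ then we say that $a \propto b$.

In other words, by Definition 2.1 we have that $a \lesssim b$ if either $a < \infty$ or $b = \infty$, and we have that $a \propto b$ if either $a < \infty$ and $b < \infty$, or $a = b = \infty$.

Definition 2.2.

Let $g: \mathcal{U} \to \overline{\mathbb{R}}^{+}_{0}$ and $h: \mathcal{U} \to \overline{\mathbb{R}}^{+}_{0}$, where $\mathcal{U} \subset \mathbb{R}$. We say that $g(x) \lesssim h(x)$ if there exists $M \in \mathbb{R}^{+}$ such that $g(x) \leq M\,h(x)$ for every $x \in \mathcal{U}$. If $g(x) \lesssim h(x)$ and $h(x) \lesssim g(x)$ then we say that $g(x) \propto h(x)$.

Definition 2.3.

Let $\mathcal{U} \subset \mathbb{R}$, $a \in \overline{\mathcal{U}}$, $g: \mathcal{U} \to \mathbb{R}^{+}$ and $h: \mathcal{U} \to \mathbb{R}^{+}$. We say that $g(x) \underset{x \to a}{\lesssim} h(x)$ if $\limsup_{x \to a} g(x)/h(x) < \infty$. If $g(x) \underset{x \to a}{\lesssim} h(x)$ and $h(x) \underset{x \to a}{\lesssim} g(x)$ then we say that $g(x) \underset{x \to a}{\propto} h(x)$.

The meanings of the relations $g(x) \underset{x \to a^{+}}{\lesssim} h(x)$ and $g(x) \underset{x \to a^{-}}{\lesssim} h(x)$ for $a \in \mathbb{R}$ are defined analogously. Note that, if for some $c \in \mathbb{R}^{+}$ we have $\lim_{x \to a} g(x)/h(x) = c$, then it follows directly that $g(x) \underset{x \to a}{\propto} h(x)$. The following proposition is a direct consequence of the above definition.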

Proposition 2.4.

Let , , , , and let , , and be continuous functions with domain such that and . Then the following hold

The following proposition relates Definition 2.2 and Definition 2.3.

Proposition 2.5.

Let and be continuous functions on , where and . Then if and only if and .

Proof.

See Appendix 4.2.∎

Note that if and are continuous functions on , then by continuity it follows directly that and therefore for every . This fact and the Proposition 2.5 imply directly the following.

Proposition 2.6.

Let and be continuous functions in , where and , and let . Then if (or ) we have that (respectively ).

2.1 Case when $\phi$ is known

Let be of the form (3) but considering fixed and , the normalizing constant is given by

(5)

where is the parameter space. Here our purpose reduce to analyze and find necessary and sufficient conditions for .

Theorem 2.7.

Suppose that for all , that , and suppose that and the priors have asymptotic power-law behaviors with

such that with , or with , then is proper.

Proof.

See Appendix 4.3. ∎

Theorem 2.8.

Suppose that , , and the priors have asymptotic power-law behaviors where and one of the following hold:

  • ; or

  • where with ; or

  • where with ,

then is improper.

Proof.

See Appendix 4.4. ∎

Theorem 2.9.

Let and the behavior of , follows asymptotic power-law distributions given by

for , and . The posterior related to is proper if and only if with , or with , and in this case the posterior mean of and are finite, as well as all moments.

Proof.

Since the posterior is proper, by Theorem 2.7 we have that with or with .

Let . Then , where and , and we have

Since with or with , it follows from Theorem 2.7 that the posterior

related to the prior is proper. Therefore

Analogously one can prove that

Therefore we have proved that if a prior satisfying the assumptions of the theorem leads to a proper posterior, then the priors and also leads to proper posteriors. It follows by induction that also leads to proper posteriors for any and , which concludes the proof. ∎

2.2 Case when $\alpha$ is known

Let be of the form (3) but considering fixed and , the normalizing constant is given by

(6)

where is the parameter space. Let , our purpose is to find necessary and sufficient conditions where .

Theorem 2.10.

Suppose that for all , that , and suppose that and the priors have asymptotic power-law behaviors with

such that , and , then is proper.

Proof.

See Appendix 4.5. ∎

Theorem 2.11.

Suppose that and that , and suppose that and the priors have asymptotic power-law behaviors where and one of the following hold

  • ;

  • such that with ; or

  • such that with

then is improper.

Proof.

See Appendix 4.6. ∎

Theorem 2.12.

Let and the behavior of , follows asymptotic power-law distributions given by

for , and . The posterior related to is proper if and only if with , and in this case the posterior mean of is finite for this prior, as well as all moments relative to , and the posterior mean of is not finite.

Proof.

Since the posterior is proper, by Theorem 2.11 we have that and .

Let . Then , where and , and we have

But since it follows from Theorem 2.10 that the posterior

relative to the prior is proper. Therefore

Analogously one can prove using the item ii) of the Theorem 2.11 that

since in this case .

Therefore we have proved that if a prior satisfying the assumptions of the theorem leads to a proper posterior, then the prior also leads to proper posteriors. It follows by induction that also leads to proper posteriors for any in , which concludes the proof. ∎

2.3 General case when $\phi$, $\mu$ and $\alpha$ are unknown

Theorem 2.13.

Suppose that for all , that , and suppose that and the priors have asymptotic power-law behaviors with

such that , , , and , then is proper.

Proof.

See Appendix 4.7. ∎

Theorem 2.14.

Suppose that and that , then the following items are valid

  • for all where , such that and one of the following hold

    • ;

    • ; where with ; or

    • ; where with and .

    then is improper.

  • such that and one of the following occur

    • and where either or ;

    • and where either or ;

    then is improper.

Proof.

See Appendix 4.8. ∎

Theorem 2.15.

Suppose that for all , and suppose that where the priors have asymptotic power-law behaviors with

then the posterior is proper if and only if , , , and . Moreover, if the posterior is proper then leads to a proper posterior if and only if , and .

Proof.

Notice that under our hypothesis, Theorems 2.13 and 2.14 are complementary, and thus the first part of the theorem is proved. Analogously, by Theorems 2.13 and 2.14 the prior leads to a proper posterior if and only if , , , and . The last two proportionalities are already satisfied since and . Combining the other inequalities the proof is completed. ∎

3 Some common objective priors with power-law asymptotic behavior

A common approach was suggested by Jeffreys, who considered different procedures for constructing objective priors. For a parameter $\theta \in (0, \infty)$ (see [19]), Jeffreys suggested using the prior $\pi(\theta) \propto \theta^{-1}$, i.e., a power-law distribution with exponent 1. The main justification for this choice is its invariance under power transformations of the parameters. As the parameters of the Stacy family of distributions are contained in the interval $(0, \infty)$, the prior using Jeffreys’ first rule is $\pi_1(\phi, \mu, \alpha) \propto (\phi\,\mu\,\alpha)^{-1}$.
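The invariance under power transformations can be checked directly. The following sketch takes the unnormalized density $1/\theta$, applies the change of variable $\eta = \theta^{a}$ and confirms that the induced density is again proportional to $1/\eta$. The function names and the exponent `a` are illustrative and do not come from the paper.

```python
def jeffreys_rule_density(theta: float) -> float:
    """Unnormalized prior 1 / theta on (0, inf)."""
    return 1.0 / theta


def density_after_power_transform(eta: float, a: float) -> float:
    """Unnormalized density of eta = theta**a induced by the 1/theta prior (a != 0)."""
    theta = eta ** (1.0 / a)
    # |d theta / d eta| = |(1/a) * eta**(1/a - 1)| = |theta / (a * eta)|
    jacobian = abs(theta / (a * eta))
    return jeffreys_rule_density(theta) * jacobian


if __name__ == "__main__":
    for a in (2.0, 0.5, -3.0):
        ratios = [density_after_power_transform(eta, a) * eta for eta in (0.1, 1.0, 7.0)]
        print(a, ratios)  # constant in eta, so the density is proportional to 1/eta
```

The constant of proportionality is $1/|a|$, which is irrelevant for an improper prior defined only up to a multiplicative constant.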

Let us consider the case when is known. Hence, the results are valid for the Gamma, Nakagami and Wilson-Hilferty distributions, among others. The Jeffreys’ first rule when is known follows power-law distributions with and . Hence the posterior distribution obtained is proper for all as well as its higher moments. This can be easily proved by noticing that as we can apply Theorem 2.12 with and it follows that the posterior is proper for as well as its moments.

On the other hand, under the general model where all the parameters are unknown, we have the posterior distribution (3) obtained using Jeffreys’ first rule is improper for all . Since , and , i.e., power-laws with exponent , we can apply Theorem 2.14 ii) with , where , and therefore we have that leads to an improper posterior for all .

Let us consider the cases where and the has different forms which can be written as

(7)

where is the index related to a particular prior. Therefore, our main focus will be to study the behavior of the priors .

One important objective prior is based on Jeffreys’ general rule [18] and known as Jeffreys’ prior. This prior is obtained through the square root of the determinant of the Fisher information matrix and has been widely used due to its invariance property under one-to-one transformations. The Fisher information matrix for the Stacy family of distributions was derived by [16] and its elements are given by

where is the trigamma function.
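To see how the Fisher information produces a power-law prior, the sketch below works through the one-parameter exponential model rather than the full Stacy information matrix of [16], which is not reproduced here. It computes the Fisher information of the rate $\mu$ numerically as the expected squared score and takes its square root. The exact value is $I(\mu) = 1/\mu^{2}$, so the resulting prior is proportional to $1/\mu$. The function names and integration settings are assumptions made for the illustration.

```python
import math


def fisher_information_exponential(mu: float, grid: int = 20000,
                                   upper: float = 50.0) -> float:
    """E[(d/dmu log f(X))**2] for f(x) = mu * exp(-mu * x), by the midpoint rule."""
    h = upper / mu / grid  # integrate x over (0, upper / mu)
    total = 0.0
    for i in range(grid):
        x = (i + 0.5) * h
        score = 1.0 / mu - x  # derivative of log f with respect to mu
        total += score * score * mu * math.exp(-mu * x) * h
    return total


def jeffreys_prior_exponential(mu: float) -> float:
    """Square root of the (scalar) Fisher information."""
    return math.sqrt(fisher_information_exponential(mu))


if __name__ == "__main__":
    for mu in (0.5, 2.0, 10.0):
        print(mu, jeffreys_prior_exponential(mu), 1.0 / mu)
```

For a parameter vector the same recipe applies with the determinant of the information matrix in place of the scalar information.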

Van Noortwijk [30] provided the Jeffreys’ prior for the general model, which can be expressed by (7) with

(8)
Corollary 3.1.

The prior has the asymptotic behavior given by

then the obtained posterior distribution is improper for all .

Proof.

Ramos et al. [28] proved that

(9)

Since , the hypotheses of Theorem 2.14, ii) hold with and , where , and therefore leads to an improper posterior for all . ∎

Let be known, then the Jeffreys’ prior has the form (7) where is given by

(10)
Corollary 3.2.

The prior has the asymptotic power-law behavior given by

then the obtained posterior is proper for as well as its higher moments.

Proof.

Here, we have , i.e, power-law distribution. Following [1] we have that , then , and thus

(11)

which implies . Moreover, from [1], we have that and thus

which implies .

Therefore we can apply Theorem 2.9 with and and therefore the posterior is proper and the posterior moments are finite for all . ∎

Fonseca et al. [13] considered the scenario where the Jeffreys’ prior has an independent structure, i.e., the prior has the form , where diag is the diagonal matrix of . For the general distribution the prior is given by (7) with

(12)

Notice that for (12) it is only necessary to know the behavior when , which provides enough information to verify that the posterior is improper.

Corollary 3.3.

The prior (12) has the asymptotic power-law behavior given by and the obtained posterior is improper for all .

Proof.

By Abramowitz and Stegun [1], we have the recurrence relations

(13)

It follows that

Hence, , which implies that

(14)

i.e., power-law distribution with exponent , then, Theorem 2.14 ii) can be applied with , and where and therefore leads to an improper posterior. ∎
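Arguments of this kind rest on how the trigamma function behaves at the two ends of the positive half-line. The sketch below is a standard-library implementation written for this illustration. It uses the standard recurrence $\psi'(x) = \psi'(x+1) + 1/x^{2}$ to shift the argument upward and then the usual asymptotic series, and it checks numerically that $x^{2}\psi'(x) \to 1$ as $x \to 0^{+}$ and $x\,\psi'(x) \to 1$ as $x \to \infty$. The function name and the cut-off at 6 are assumptions of the sketch.

```python
import math


def trigamma(x: float) -> float:
    """Trigamma function psi'(x) for x > 0 (recurrence + asymptotic series)."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)  # psi'(x) = psi'(x + 1) + 1 / x**2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi'(x) ~ 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    series = inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))
    return acc + series


if __name__ == "__main__":
    print(trigamma(1.0), math.pi ** 2 / 6.0)  # known value psi'(1) = pi^2 / 6
    for x in (1e-4, 1e-2):
        print(x, x * x * trigamma(x))          # approaches 1 as x -> 0+
    for x in (1e2, 1e4):
        print(x, x * trigamma(x))              # approaches 1 as x -> infinity
```

The two limits correspond to power-law behavior with exponent 2 near zero and exponent 1 at infinity for $\psi'$ itself. Square roots and combinations of such terms are what yield exponents such as 0.5 and 1 in priors built from the Fisher information.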

This approach can be further extended considering that only one parameter is independent. For instance, let be dependent parameters and be independent then under the partition the -Jeffreys’ prior is given by

(15)

For the general model the partition -Jeffreys’ prior is of the form (7) with