## 1 Introduction

In recent years, there has been growing interest in estimating information-theoretic measures of parametric distributions. The Shannon entropy, known as differential entropy in the continuous case and introduced by Claude Shannon [30], is an essential quantity that measures the amount of available information, or the uncertainty, in the outcome of a random process. Given a random variable X with density function f(x), the differential entropy is given by

$$H(X) = -\int f(x)\,\log f(x)\,dx. \tag{2}$$
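As a quick numerical illustration of this definition (a sketch, not part of the paper's development: it uses the standard exponential density f(x) = e^{-x}, whose differential entropy is known to equal 1), the integral can be evaluated by quadrature:

```python
import numpy as np
from scipy.integrate import quad

# For f(x) = exp(-x) on (0, inf), the integrand -f(x) * log f(x)
# simplifies to x * exp(-x), and the differential entropy equals 1.
integrand = lambda x: x * np.exp(-x)
H, _ = quad(integrand, 0, np.inf)
print(round(H, 6))  # → 1.0
```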

The differential entropy depends on the distribution parameters and, given a sample, it must be estimated. The most commonly used method to estimate the parameters is the maximum likelihood approach, due to its one-to-one invariance property: we need only estimate the parameters of the original model and plug them into the entropy function. Under this approach, many authors have derived entropy estimators for different distributions, such as the Weibull [9], inverse Weibull [33], log-logistic [13], and the exponential distribution with different shift origins [20], to list a few. A major drawback of maximum likelihood inference is that the obtained estimates are usually biased for small samples [11]. Another concern under small samples arises when constructing confidence intervals for the parameters, since such intervals are not precise and may not attain good coverage probabilities. In this case, studying the skewness of the maximum likelihood estimator (MLE) is essential to assess the quality of the interval [12]. To overcome these limitations, we can use objective Bayesian methods. In this context, inference for the parameters of the gamma distribution has been discussed earlier under this approach by Miller [23], Sun and Ye [31], Berger et al. [3], and Louzada and Ramos [21]. Moreover, Ramos et al. [27] revised the most common objective priors and provided necessary and sufficient conditions for the obtained posteriors and their higher moments to be proper.

Although these authors obtained different joint posterior distributions for the parameters of interest, the resulting posterior means cannot be directly plugged into the Shannon entropy. Under the Bayesian approach, it is necessary to obtain the posterior distribution of the entropy measure itself. In this context, Shakhatreh [29] recently derived different posterior distributions for the entropy using objective priors, assuming a Weibull distribution. However, that distribution's entropy expression is not as involved as the gamma distribution's. With this in mind, in this paper, focusing on the gamma distribution, we derive the posterior distributions of the entropy using objective priors, such as the Jeffreys prior [18], reference priors [7, 2, 3], and matching priors [32], and prove that the obtained posteriors are proper and can be used to construct the posterior distributions of the Shannon entropy. Moreover, even if a posterior distribution is proper, the posterior mean can be infinite, which is undesirable; thus, we also prove that the obtained posterior means for the entropy measure are finite. Finally, credibility intervals are obtained to construct accurate interval estimates.

The gamma distribution considered here is a two-parameter family of distributions, among the most well-known distributions used to model different stochastic processes and to make statistical inferences, and it has received attention from many fields. It surfaces in many areas of application, including financial analysis [10], climate analysis [17], reliability analysis [16, 19], and physics [14]. In particular, the gamma distribution includes the exponential, Erlang, and chi-square distributions as special cases.

A random variable X follows a gamma distribution if its probability density function, parametrized by a shape parameter α > 0 and a scale parameter β > 0, is given by

$$f(x \mid \alpha, \beta) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta}, \quad x > 0, \tag{3}$$

where Γ(·) is the gamma function.

The paper is organized as follows. Section 2 presents the maximum likelihood estimators for the gamma distribution parameters and the computation of the Shannon entropy. Section 3 presents the objective Bayesian analysis, deriving the posterior distributions of the Shannon entropy under the reparametrized model for different objective priors. Section 4 provides a simulation study to select the best objective prior. In Section 5, the methodology is illustrated on a real dataset. Some final comments are given in Section 6.

## 2 Frequentist approach

Classical (frequentist) inference is a commonly used approach to conduct parameter estimation for a particular distribution. In this case, the parameter is treated as fixed, and the MLE is commonly used to obtain the estimates. The MLE has good asymptotic properties, such as invariance, consistency, and efficiency. This procedure searches the parameter space for the point where the likelihood is maximized. Here, our main aim is to obtain the estimate of a function of the parameters. Hence, we first need to obtain the entropy measure H(X), as defined in (2), which quantifies the amount of uncertainty in the data. It should be noted that a higher value of H(X) indicates more uncertainty.

The entropy of the gamma density is given by

$$H(X) = \alpha + \log\beta + \log\Gamma(\alpha) + (1-\alpha)\,\psi(\alpha), \tag{4}$$

where ψ(·) = Γ′(·)/Γ(·) is the digamma function.
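As a sanity check (assuming the shape–scale parametrization, with shape α and scale β), the closed-form gamma entropy can be compared against SciPy's built-in evaluation:

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gammaln, digamma

def gamma_entropy(alpha, beta):
    # H = alpha + log(beta) + log Gamma(alpha) + (1 - alpha) * psi(alpha)
    return alpha + np.log(beta) + gammaln(alpha) + (1 - alpha) * digamma(alpha)

for a, b in [(0.5, 1.0), (2.0, 3.0), (7.5, 0.4)]:
    assert np.isclose(gamma_entropy(a, b), gamma(a, scale=b).entropy())
```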

Now, consider a change of variable by setting H equal to the entropy (4), which implies β = exp(H − α − log Γ(α) − (1 − α)ψ(α)). The aim of the transformation is to obtain a likelihood of H and α instead of α and β. Therefore, if x₁, …, xₙ is a complete sample from (3), then the likelihood function of H and α is given as

$$L(H, \alpha \mid \boldsymbol{x}) = \frac{1}{\Gamma(\alpha)^{n}\, \beta(H,\alpha)^{n\alpha}} \left( \prod_{i=1}^{n} x_i \right)^{\alpha - 1} \exp\!\left( -\frac{1}{\beta(H,\alpha)} \sum_{i=1}^{n} x_i \right), \tag{5}$$

where β(H, α) = exp(H − α − log Γ(α) − (1 − α)ψ(α)).

The log-likelihood function is given by

$$\ell(H, \alpha \mid \boldsymbol{x}) = -n\log\Gamma(\alpha) - n\alpha\log\beta(H,\alpha) + (\alpha - 1)\sum_{i=1}^{n}\log x_i - \frac{1}{\beta(H,\alpha)}\sum_{i=1}^{n} x_i. \tag{6}$$

The MLEs for the parameters are obtained by directly maximizing the log-likelihood function (6). Hence, after some algebraic manipulations, the MLEs Ĥ and α̂ are obtained as the solution of the system ∂ℓ/∂H = 0 and ∂ℓ/∂α = 0, which provides the maximum likelihood estimators of the entropy and shape parameter of the gamma distribution. Since this system cannot be solved in closed form, numerical techniques must be used to obtain the estimates.
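The invariance property can be sketched numerically: maximize the likelihood (here via `scipy.stats.gamma.fit` with the location fixed at zero, a stand-in for solving the score equations) and plug the MLEs into the entropy function. The sample size and tolerance below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(42)
true_a, true_scale = 2.0, 1.5
data = gamma(true_a, scale=true_scale).rvs(size=5000, random_state=rng)

# numerical ML fit of the two-parameter gamma (location fixed at 0)
a_hat, _, scale_hat = gamma.fit(data, floc=0)

# invariance: the MLE of the entropy is the entropy at the MLEs
H_hat = gamma(a_hat, scale=scale_hat).entropy()
H_true = gamma(true_a, scale=true_scale).entropy()
assert abs(H_hat - H_true) < 0.05
```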

Following [22], the MLEs are asymptotically normally distributed with a joint bivariate normal distribution given by

$$(\hat{H}, \hat{\alpha}) \sim N_{2}\big( (H, \alpha),\; n^{-1} I^{-1}(H, \alpha) \big) \quad \text{as } n \to \infty,$$

where I(H, α) is the Fisher information matrix for the reparametrized model, given by

$$I(H, \alpha) = \begin{pmatrix} \alpha & 1 - \alpha\, c(\alpha) \\ 1 - \alpha\, c(\alpha) & \psi'(\alpha) - 2c(\alpha) + \alpha\, c(\alpha)^{2} \end{pmatrix}, \qquad c(\alpha) = 1 + (1-\alpha)\psi'(\alpha), \tag{7}$$

and ψ′(·) is the derivative of the digamma function ψ(·), called the trigamma function.

In the present paper we are only interested in H, and thus, given the asymptotic distribution above and using the first diagonal element of the inverse of the Fisher information matrix, we can conclude that the confidence interval for the estimate of the entropy measure with a confidence level of 100(1 − γ)% is given by

$$\hat{H} \;\mp\; z_{\gamma/2}\, \sqrt{\frac{\big[\, I^{-1}(\hat{H}, \hat{\alpha})\, \big]_{11}}{n}}, \tag{8}$$

where γ is the significance level and z_{γ/2} is the upper (γ/2)-th percentile of the standard normal distribution.
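An equivalent way to obtain this interval is the delta method in the original (α, β) parametrization, which yields the same variance as the (1,1) element of the inverse Fisher information of the reparametrized model. The sketch below assumes the shape–scale notation; `entropy_ci` and its defaults are illustrative names, not from the paper.

```python
import numpy as np
from scipy.stats import gamma, norm
from scipy.special import gammaln, digamma, polygamma

def entropy_ci(data, level=0.95):
    # Wald interval for the gamma entropy via the delta method
    n = len(data)
    a, _, b = gamma.fit(data, floc=0)              # MLEs of shape and scale
    H = a + np.log(b) + gammaln(a) + (1 - a) * digamma(a)
    g = np.array([1 + (1 - a) * polygamma(1, a),   # dH/d(alpha)
                  1 / b])                          # dH/d(beta)
    fisher = np.array([[polygamma(1, a), 1 / b],   # per-observation Fisher
                       [1 / b, a / b**2]])         # information of (alpha, beta)
    se = np.sqrt(g @ np.linalg.inv(fisher) @ g / n)
    z = norm.ppf(1 - (1 - level) / 2)
    return H - z * se, H + z * se

rng = np.random.default_rng(1)
lo, hi = entropy_ci(rng.gamma(2.0, 1.5, size=2000))
```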

## 3 Bayesian Inference

Here, the parameters are treated as random variables, and the distribution representing the knowledge about them before the sample data are obtained is referred to as the prior distribution, denoted by π(·). After the data are observed, the information from the prior distribution and the likelihood function are combined through Bayes' theorem, resulting in the posterior distribution of the parameters given the data. In a Bayesian framework, Ramos et al. [27] analyzed the properties of the posterior distributions of the gamma distribution parameters and stated the conditions for these posteriors to be proper and to have finite moments. To obtain the posterior distributions for the entropy, we can use the one-to-one invariance property of the Jeffreys prior, reference prior, and matching prior, and thus we only need to obtain the Jacobian matrix of the reparametrization from (α, β) to (H, α). After some algebraic manipulations, we can conclude that the parameters can be written as

$$\alpha = \alpha, \qquad \beta = \exp\big( H - \alpha - \log\Gamma(\alpha) - (1-\alpha)\psi(\alpha) \big),$$

and thus, from the relations ∂β/∂H = β and ∂β/∂α = −β c(α), with c(α) = 1 + (1 − α)ψ′(α), it follows that the Jacobian matrix (J) relative to the change of variables is given by

$$J = \begin{pmatrix} 0 & 1 \\ \beta & -\beta\, c(\alpha) \end{pmatrix}, \qquad |\det J| = \beta(H, \alpha), \tag{9}$$

where β(H, α) = exp(H − α − log Γ(α) − (1 − α)ψ(α)).

The use of objective priors plays an essential role in Bayesian analysis when the data should provide the dominant information, so that the posterior distribution is not overshadowed by the prior. Such priors allow us to conduct objective Bayesian inference. On the other hand, in most situations they are not proper distributions and may lead to improper posteriors, invalidating the analysis, since the normalizing constant cannot be computed. Therefore, we need to check whether the obtained posterior is proper and its posterior mean finite. The priors for the entropy and their related posterior distributions are discussed in the next subsections.

Before we derive the priors and posterior distributions, hereafter we shall always assume that there are at least two distinct observations, that is, there exist i and j such that x_i ≠ x_j. Additionally, before we proceed, we present below a definition and a proposition that will be used to prove that the obtained posteriors are proper. In the following, let ℝ̄ denote the extended real number line and ℝ₊ the strictly positive real numbers. The following definition is a special case of the one presented in [25] and will play an important role in proving that the analyzed posterior distributions and posterior means are proper.

###### Definition 3.1.

Let , and , where and suppose that . Then, if , we say that .

Regarding the above definition, we have the following proposition from [25].

###### Proposition 3.2.

Let and be continuous functions in , where and , and let . Then implies in and implies in .
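The proposition is, in essence, a limit-comparison test for improper integrals: if two positive functions are proportional in the limit at an endpoint, their integrals near that endpoint converge or diverge together. A small numerical illustration (the functions g and h below are my own examples, not from [25]):

```python
import numpy as np
from scipy.integrate import quad

# g(t) = (1 - exp(-t)) / t**1.5 behaves like h(t) = t**-0.5 as t -> 0+,
# since (1 - exp(-t)) / t -> 1; because the integral of h over (0, 1)
# equals 2 < inf, the integral of g near 0 must be finite as well.
g = lambda t: (1 - np.exp(-t)) / t**1.5
h = lambda t: t**-0.5
Ih, _ = quad(h, 0, 1)
Ig, _ = quad(g, 0, 1)
print(round(Ih, 6))  # → 2.0
```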

### 3.1 Jeffreys prior

Jeffreys [18] described a procedure to obtain an objective prior that is invariant under one-to-one monotone transformations. This invariance property of the Jeffreys prior has been widely exploited to make statistical inferences from its posterior distribution. The prior is constructed as the square root of the determinant of the Fisher information matrix. Thus, the Jeffreys prior for the gamma distribution is given by

$$\pi_{J}(\alpha, \beta) \propto \frac{1}{\beta}\sqrt{\alpha\,\psi'(\alpha) - 1}. \tag{10}$$

Additionally, from the determinant of the Fisher information of the reparametrized model, or using the change of variables over the Jeffreys prior, we have

$$\pi_{J}(H, \alpha) \propto \sqrt{\alpha\,\psi'(\alpha) - 1}. \tag{11}$$

Finally, the joint posterior distribution for H and α produced by the Jeffreys prior is

$$\pi_{J}(H, \alpha \mid \boldsymbol{x}) \propto \frac{\sqrt{\alpha\,\psi'(\alpha) - 1}}{\Gamma(\alpha)^{n}\, \beta(H,\alpha)^{n\alpha}} \left( \prod_{i=1}^{n} x_i \right)^{\alpha - 1} \exp\!\left( -\frac{1}{\beta(H,\alpha)} \sum_{i=1}^{n} x_i \right). \tag{12}$$
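In practice, the posterior of the entropy can be explored by sampling (α, β) under the Jeffreys prior and transforming the draws through the entropy formula. The sketch below uses a random-walk Metropolis sampler on (log α, log β) with the standard gamma Jeffreys prior π(α, β) ∝ √(αψ′(α) − 1)/β; the step size, burn-in, and sample sizes are arbitrary illustrative choices, not the paper's algorithm.

```python
import numpy as np
from scipy.special import gammaln, digamma, polygamma

rng = np.random.default_rng(0)
x = rng.gamma(2.0, 1.5, size=1000)   # synthetic gamma(shape=2, scale=1.5) data
n, sx, slx = len(x), x.sum(), np.log(x).sum()

def log_post(la, lb):
    # gamma log-likelihood (shape a, scale b) + log Jeffreys prior
    # + Jacobian of the (log a, log b) transformation
    a, b = np.exp(la), np.exp(lb)
    ll = -n * gammaln(a) - n * a * np.log(b) + (a - 1) * slx - sx / b
    lp = 0.5 * np.log(a * polygamma(1, a) - 1) - np.log(b)
    return ll + lp + la + lb

# random-walk Metropolis on (log a, log b)
cur, cur_lp, draws = np.zeros(2), log_post(0.0, 0.0), []
for i in range(20000):
    prop = cur + 0.05 * rng.standard_normal(2)
    prop_lp = log_post(*prop)
    if np.log(rng.uniform()) < prop_lp - cur_lp:
        cur, cur_lp = prop, prop_lp
    if i >= 5000:                    # discard burn-in
        draws.append(cur)
a_s, b_s = np.exp(np.array(draws)).T

# posterior draws of the entropy and a 95% credible interval
H_s = a_s + np.log(b_s) + gammaln(a_s) + (1 - a_s) * digamma(a_s)
lo, hi = np.percentile(H_s, [2.5, 97.5])
```

The credible interval for the entropy is then read off directly from the percentiles of the transformed draws `H_s`.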

###### Theorem 3.3.

The posterior density (12) is proper for all n ≥ 2.

###### Proof.

Using the change of variables and denoting it follows that

where for all . Now, according to [25, 28], we have and and since

it follows by Proposition 3.2 that

Moreover, due to [25, 28] we have and , and since

are not all equal, due to the inequality of the arithmetic and geometric means we have

and thus it follows that

Therefore, from Proposition 3.2 it follows that

which concludes the proof. ∎

###### Theorem 3.4.

The posterior mean of H relative to (12) is finite for any n ≥ 2.

###### Proof.

Doing the change of variables and denoting , it follows that

Moreover, from the identity one obtains that

and thus, letting denote the absolute value operator and letting for all , and using the triangle inequality we have

where for all .

We shall now prove that and . Indeed, notice that for . Moreover, since due to Abramowitz [1] we have and it follows that

and thus

On the other hand, since due to Abramowitz [1] we have , it follows from L'Hôpital's rule that

and therefore, considering we have

and thus

Therefore, combining the obtained proportionality with the proportionalities proved in Theorem 3.3 and using Proposition 3.2 we have

Finally, using the proportionality , letting be as in the proof of Theorem 3.3 and using that for , it follows from the proportionalities proved during Theorem 3.3 and from Proposition 3.2 that

which concludes the proof. ∎

In order to sample from the posterior distribution, we obtain that the marginal posterior distribution of α is given by

and the conditional posterior distribution of H given α is given by

### 3.2 Reference prior

Bernardo [7] discussed a different approach to obtain a new class of objective priors, named reference priors. Subsequently, many studies developed formal and rigorous definitions to derive this class of prior distributions in different contexts [4, 5, 6, 2, 3]. The reference prior is obtained by maximizing the expected Kullback-Leibler (KL) divergence between the posterior and the prior, under some regularity conditions; maximizing the information the data are expected to add to the prior allows the data to have the maximum influence on the posterior distribution. Reference priors have essential properties, such as consistency under sampling, consistent marginalization, and invariance under one-to-one transformations [8]. Reference priors may depend on the ordering of the parameters of interest; hence, for the gamma distribution, we have two distinct priors, presented below.

#### 3.2.1 Reference prior when α is the parameter of interest

The reference prior when α is the parameter of interest and β is the nuisance parameter is given by

$$\pi_{1}(\alpha, \beta) \propto \frac{1}{\beta}\sqrt{\psi'(\alpha) - \frac{1}{\alpha}}. \tag{13}$$

Thus, using the Jacobian transformation, it follows that the related reference prior in the reparametrized model is given by

$$\pi_{1}(H, \alpha) \propto \sqrt{\psi'(\alpha) - \frac{1}{\alpha}}. \tag{14}$$

Finally, the joint posterior distribution for H and α, produced by the reference prior (14), is given by

$$\pi_{1}(H, \alpha \mid \boldsymbol{x}) \propto \frac{\sqrt{\psi'(\alpha) - 1/\alpha}}{\Gamma(\alpha)^{n}\, \beta(H,\alpha)^{n\alpha}} \left( \prod_{i=1}^{n} x_i \right)^{\alpha - 1} \exp\!\left( -\frac{1}{\beta(H,\alpha)} \sum_{i=1}^{n} x_i \right). \tag{15}$$

###### Theorem 3.5.

The posterior density (15) is proper for all n ≥ 2.

###### Proof.

Doing the change of variables , denoting and proceeding analogously as in the proof of Theorem 3.3 we have

where for all . Now, according to [25, 28], we have and , and since we proved in Theorem 3.3 that , it follows from Proposition 3.2 that

Moreover, from Abramowitz [1] we have , which combined with implies in . Therefore it follows that , and by Proposition 3.2 it follows that

which concludes the proof. ∎

###### Theorem 3.6.

The posterior mean of H relative to (15) is finite for all n ≥ 2.

###### Proof.

Proceeding analogously as in the proof of Theorem 3.4 it follows that

where is the same as defined in the proof of Theorem 3.4 and

Since in the proof of Theorem 3.4 we showed that , combining this with the proportionalities proved in Theorem 3.3 and with Proposition 3.2, we have

Finally, from the proof of Theorem 3.5 we know that , which implies directly that , and thus from Proposition 3.2 it follows that

which concludes the proof. ∎

The marginal posterior distribution of α is given by

Moreover, the conditional posterior distribution of H given α is given by

#### 3.2.2 Reference prior when H is the parameter of interest

The reference prior when β is the parameter of interest and α is the nuisance parameter is given by

$$\pi_{2}(\alpha, \beta) \propto \frac{\sqrt{\psi'(\alpha)}}{\beta}. \tag{16}$$

Therefore, in terms of the reparametrized model, the reference prior when H is the parameter of interest and α is the nuisance parameter is given by

$$\pi_{2}(H, \alpha) \propto \sqrt{\psi'(\alpha)}. \tag{17}$$

Finally, the joint posterior distribution for H and α, produced by the reference prior (17), is given by

$$\pi_{2}(H, \alpha \mid \boldsymbol{x}) \propto \frac{\sqrt{\psi'(\alpha)}}{\Gamma(\alpha)^{n}\, \beta(H,\alpha)^{n\alpha}} \left( \prod_{i=1}^{n} x_i \right)^{\alpha - 1} \exp\!\left( -\frac{1}{\beta(H,\alpha)} \sum_{i=1}^{n} x_i \right). \tag{18}$$

###### Theorem 3.7.

The posterior density (18) is proper for all n ≥ 2.

###### Proof.

Since it follows that for all and
