1 Introduction
Throughout, we let denote the set of Borel probability measures on . For , we abuse notation slightly and define , where denotes Euclidean length on . Thus,
is the usual variance in dimension
; it is the trace of the covariance matrix corresponding to for arbitrary dimension . A probability measure is said to be log-concave if for convex . All logarithms are taken with respect to the natural base.

Our results are best stated within the general framework of parametric statistics. To this end, we let be a dominated family of probability measures on a measurable space , with dominating finite measure . To each , we associate a density (w.r.t. ) according to
The Fisher information of the parametric family evaluated at is defined as
where denotes gradient with respect to . Note that is distinct from the information theorist’s Fisher information , defined as
for a probability measure having density with respect to Lebesgue measure. In the special case where is a location parameter, the two quantities coincide.
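To make these definitions concrete, here is a minimal numerical sketch; the Gaussian location family N(theta, sigma^2) is our illustrative choice, for which the two notions of Fisher information coincide and equal 1/sigma^2. The sketch estimates the Fisher information by Monte Carlo as the second moment of the score.

```python
import numpy as np

# Monte Carlo estimate of the Fisher information of the Gaussian location
# family N(theta, sigma^2). The score is
#   d/dtheta log f_theta(x) = (x - theta) / sigma^2,
# and the Fisher information E[score^2] equals 1/sigma^2.
rng = np.random.default_rng(0)
sigma, theta = 2.0, 0.5
x = rng.normal(theta, sigma, size=500_000)  # samples from f_theta

score = (x - theta) / sigma**2
fisher_hat = np.mean(score**2)

assert abs(fisher_hat - 1 / sigma**2) < 1e-2
```

For a location family, the same computation can equally be read as the information theorist's Fisher information of the noise distribution, illustrating the coincidence noted above.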
For a real-valued parameter and an observation , the basic question of parametric statistics is how well one can estimate from . Here, the Cramér-Rao bound is of central importance in proving lower bounds on estimation error, stating that
(1) 
for any unbiased estimator
. The assumption of unbiasedness is quite restrictive, especially since unbiased estimators may not always exist, or may be less attractive than biased estimators for any one of a variety of reasons (computability, performance, etc.). Under the assumption that the parameter is distributed according to some prior , the so-called Bayesian Cramér-Rao bound [1, 2] (also known as the van Trees inequality) states, under mild regularity assumptions, that
(2) 
where the expectation is over and, conditioned on , . As noted by Tsybakov [3, Section 2.7.3], this inequality is quite powerful since it does not impose any restriction on unbiasedness, is relatively simple to apply, and often leads to sharp results (including sharp constants). Tsybakov states that one primary disadvantage of (2) is that it applies only to loss. Although it does not appear to be widely known, this is actually not true. Indeed, Efroimovich proved in [4] that
(3) 
which is stronger than (2) by the maximumentropy property of Gaussians. Efroimovich’s inequality can be rearranged to give an upper bound on the mutual information
Such a general upper bound on can be useful in settings beyond those where (2) applies. For example, it can be used to give one direction of the key estimate in Clarke and Barron’s work showing that Jeffrey’s prior is least favorable [5]. It can also be applied to characterize Bayes risk measured under losses other than when coupled with a lower bound on mutual information (see, e.g., [6]). We remark that several systematic techniques exist for lower bounding the mutual information
in terms of Bayes risk (e.g., Fano’s method, or the Shannon lower bound for the rate distortion function), so finding a good upper bound is often the challenge. A typical heuristic is to bound
from above by the capacity of the channel , but this method has the disadvantages that (i) it discards information about the prior ; and (ii) capacity expressions are only explicitly known for very special parametric families (e.g., Gaussian channels). Efroimovich's inequality overcomes both of these obstacles, but has the undesirable property of being degenerate when . This can be a serious disadvantage in applications since many natural priors have infinite Fisher information, for example uniform measures on convex bodies.^{1}

^{1} Mollification may be a useful heuristic to compensate for infinite in low dimensions, but this becomes fundamentally problematic in high dimensions, where mollification picks up dimensional dependence and generally alters the boundary of a set where the measure concentrates.

Contributions
We make two main contributions, which we describe in rough terms here. Precise statements are given in Section 2. First, we establish a family of Bayesian Cramér-Rao-type bounds indexed by probability measures that satisfy a logarithmic Sobolev inequality on . This generalizes Efroimovich's inequality (3), which corresponds to the special case where the reference measure is taken to be Gaussian. Second, we specialize the first result to obtain an explicit Bayesian Cramér-Rao-type bound under the assumption of a log-concave prior . In dimension one, the result implies
(4) 
provided ; a correction is needed if this condition is not met^{2} (see Theorem 2 for a precise statement). In particular,
holds under our assumptions for a universal constant , regardless of whether is biased. This should be compared to the classical Cramér-Rao bound: morally speaking, (1) continues to hold (up to a modest constant factor) for any estimator , provided we are working with a log-concave prior which, together with , satisfies . Note that the crucial (and somewhat surprising) advantage relative to (3) is that the Fisher information does not appear.

^{2} It is easy to see why a condition like this is needed: if there were no such assumption, then we could let approximate a point mass, effectively showing that the Cramér-Rao bound holds – up to an absolute constant – for any estimator. This clearly cannot be true (consider constant, not equal to ).
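As a quick numerical companion to this discussion, the following sketch works through the conjugate Gaussian model theta ~ N(0, tau2), X | theta ~ N(theta, sigma2), an illustrative choice of ours rather than a model analyzed above: the posterior mean is a biased estimator, yet its Bayes risk meets the van Trees lower bound 1/(E[I(theta)] + I(pi)) with equality.

```python
import numpy as np

# Conjugate Gaussian model (illustrative choice):
#   theta ~ N(0, tau2),  X | theta ~ N(theta, sigma2).
# Here E[I(theta)] = 1/sigma2, the prior Fisher information is 1/tau2, and
# the posterior mean -- a *biased* estimator -- has Bayes risk
# tau2*sigma2/(tau2 + sigma2), matching the van Trees bound exactly.
rng = np.random.default_rng(1)
tau2, sigma2 = 1.0, 0.5
n = 400_000

theta = rng.normal(0.0, np.sqrt(tau2), size=n)
x = theta + rng.normal(0.0, np.sqrt(sigma2), size=n)
post_mean = x * tau2 / (tau2 + sigma2)        # Bayes estimator (biased)
risk_hat = np.mean((post_mean - theta) ** 2)  # Monte Carlo Bayes risk

van_trees = 1.0 / (1.0 / sigma2 + 1.0 / tau2)
assert abs(risk_hat - van_trees) < 5e-3
```

In this conjugate case no estimator, biased or not, improves on the bound, consistent with the theme that Cramér-Rao-type lower bounds can persist for arbitrary estimators once a suitable prior is in place.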
Organization
2 Main Results
2.1 Assumptions
As is typical of Cramér-Rao-type bounds, our main results require us to assume some mild regularity. In particular, for a given measure , we will refer to the following standard condition on the densities associated to :
(5) 
where denotes the gradient with respect to . We remark that this holds whenever the order of differentiation with respect to and integration with respect to can be exchanged (Leibniz rule).
2.2 Statement of Results
Our first main result establishes a family of Cramér-Rao-type bounds on the mutual information in terms of logarithmic Sobolev inequalities on . To this end, we recall the standard definitions of relative entropy and relative Fisher information (the parlance in which logarithmic Sobolev inequalities are framed). Consider , with and . The entropy of , relative to , is defined as
If the density is weakly differentiable, the Fisher information of , relative to , is defined according to
If is not weakly differentiable, we adopt the convention that so that our expressions make sense even in the general case.
A probability measure is said to satisfy a logarithmic Sobolev inequality with constant (or, for short) if, for all probability measures ,
The standard Gaussian measure on is a prototypical example of a measure that satisfies an LSI, and does so with constant . More generally, if with for and the identity matrix, then satisfies [7]; this result is known as the Bakry-Émery theorem, and we shall need it later in the proof of Theorem 2.
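The Gaussian case can be checked by hand. In the following sketch the pair nu = N(m, 1), mu = N(0, 1) is our illustrative choice: log(dnu/dmu)(x) = m*x - m^2/2, so the relative entropy is m^2/2 and the relative Fisher information is m^2, and the LSI with constant 1 holds with equality.

```python
import numpy as np

# Gaussian LSI with constant 1 (dimension one), checked on nu = N(m, 1)
# against mu = N(0, 1). Since log(dnu/dmu)(x) = m*x - m^2/2, the gradient
# of the log-density-ratio is the constant m.
rng = np.random.default_rng(2)
m = 1.7
samples = rng.normal(m, 1.0, size=500_000)       # samples from nu

rel_entropy = np.mean(m * samples - m**2 / 2)    # D(nu || mu) = m^2 / 2
rel_fisher = m**2                                # J(nu || mu) = m^2

# LSI(1): D(nu || mu) <= (1/2) J(nu || mu); Gaussian shifts give equality.
assert rel_entropy <= 0.5 * rel_fisher + 1e-2
assert abs(rel_entropy - m**2 / 2) < 1e-2
```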
With these definitions in hand, our first result is the following:
Theorem 1.
Let satisfy and assume the regularity condition (5) holds. For any probability measure on ,
(6) 
Inequality (6) improves the LSI for . Indeed, taking independent of renders , so that the LSI for is recovered. However, the proof of (6) follows from a relatively simple application of the LSI for and some basic calculus, so the two inequalities should be viewed as being formally equivalent in this sense.
Clearly, the statement of Theorem 1 allows us the freedom to choose the measure so as to obtain the tightest possible bound on . However, a notable example is obtained when is taken to be the standard Gaussian measure on . In this case, upon simplification we obtain
(7) 
Of note, (7) is not invariant to rescalings of the parameter . So, just as one passes from Lieb’s inequality to the entropy power inequality, we may optimize over all such scalings to obtain the following multidimensional version of (3):
Remark 1.
Efroimovich’s work [4] contains a slightly stronger multidimensional form, stated in terms of determinants of Fisher information matrices. As defined, our Fisher information quantities and
correspond to traces of the same matrices, leading to a weaker inequality by the arithmetic-geometric mean inequality. Nevertheless, the two inequalities should really be regarded as essentially equivalent, as they are both direct consequences of the one-dimensional inequality (where the two results coincide). See
[4, Proof of Theorem 5] for details. It is unclear whether a similar claim holds for non-Gaussian in (6).

We remark that (3) was discovered by Efroimovich in 1979, but does not appear to be widely known (we could not find a statement of the result outside the Russian literature). At the time of Efroimovich's initial discovery of (3), the study of logarithmic Sobolev inequalities was just getting started, having been largely initiated by Gross's work on the Gaussian case in 1975 [8]. In particular, the derivation of (3) (and, in particular, the less general van Trees inequality) from the Gaussian logarithmic Sobolev inequality does not appear to have been observed previously. So, from a conceptual standpoint, one contribution of Theorem 1 is that it demonstrates how Efroimovich's result (and the weaker van Trees inequality) emerges as one particular instance in the broader context of LSIs which, to our knowledge, have not found direct use in parametric statistics beyond their implications for measure concentration (see, e.g., [9]).
A nontrivial consequence of Theorem 1 is a general Cramér-Rao-type bound on , assuming only that is log-concave. Specifically, our second main result is the following:
Theorem 2.
Assume the parametric family satisfies (5) for equal to Lebesgue measure. Let satisfy for some scalar , where is the identity matrix. Define , . It holds that
(8) 
where
Remark 2.
The onedimensional inequality (4) follows directly from Theorem 2 for
, combined with the entropy lower bound for log-concave random variables
due to Marsiglietti and Kostina [10]. Similar statements hold for general dimension, albeit with a correction factor that depends on dimension (no correction is needed if the hyperplane conjecture is true; see
[11]).

The upper bound (8) should be viewed as a function of two nonnegative quantities: the products and . By the Brascamp-Lieb inequality [12], we always have ; this quantity depends only on the prior and distills what quantitative information is known about its degree of log-concavity. In particular, if is only known to be log-concave, then gives . In the other extreme case, if (e.g., if is a scaled standard Gaussian), we have the slightly improved bound . These bounds both essentially behave as for modestly large, so knowledge of (i.e., additional information about the measure ) only significantly affects the behavior of the upper bound (8) for small. To be precise, for near zero, the upper bound behaves as when , and if . Applications in asymptotic statistics consider a sequence of observations , conditionally independent given . In this case, grows linearly with , so the logarithmic behavior of the bound dominates, regardless of what is known about .
Let us now make a brief observation on the sharpness of Theorem 2. To this end, consider the classical Gaussian sequence model , where is independent of . In this case, the typical quantity of relevance is the signal-to-noise ratio , in terms of which we have the sharp upper bound
(9) 
Thus, in view of the previous discussion, we clearly see that Theorem 2 provides a sharp estimate in the regime where is moderately large. We do not yet know whether the bound is sharp for small and , but we believe that it should be.
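For intuition about the large-SNR regime, consider the scalar case theta ~ N(0, s), X = theta + Z with Z ~ N(0, 1); this normalization, under which s plays the role of the signal-to-noise ratio, is our illustrative choice. The mutual information is then exactly (1/2) log(1 + s), growing logarithmically in s, which matches the regime in which Theorem 2 is sharp:

```python
import numpy as np

# Exact mutual information in the scalar Gaussian model X = theta + Z,
# theta ~ N(0, s), Z ~ N(0, 1): I(theta; X) = 0.5 * log(1 + s) nats.
def mutual_info(s):
    return 0.5 * np.log1p(s)

# Logarithmic growth: multiplying (1 + s) by 10 adds 0.5 * log(10) nats.
assert abs(mutual_info(99.0) - 2 * mutual_info(9.0)) < 1e-12
# For large s, doubling the SNR adds roughly 0.5 * log(2) nats.
assert mutual_info(2e6) - mutual_info(1e6) < 0.5 * np.log(2)
```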
Finally, we remark that all results have correct dependence on dimension, as can be seen by testing on product measures.
2.3 Remarks on Applications
Applications of Cramér-Rao-type bounds to parameter estimation are numerous, and our results will generally apply in Bayesian settings. In particular, we believe corollaries such as (4) may be especially useful for proving lower bounds on Bayes risk when the prior is log-concave.
We note that our results are quite general in form, and therefore not restricted to applications in parametric statistics. To give one quick example, consider log-concave , normalized so that , and define , where are drawn i.i.d. according to . Then, an immediate corollary of Theorem 2 is that, for sufficiently large,
which is a sort of reverse entropy power inequality, holding for log-concave random vectors. This improves a result of Cover and Zhang [13] for sufficiently large, in which the leading coefficient in parentheses on the right is . This inequality should also be compared to the formulation of the hyperplane conjecture recently put forth by Marsiglietti and Kostina [14].

3 Proofs
This section contains the proofs of the main results.
3.1 Proof of Theorem 1
We may assume that the RHS of (6) is finite; otherwise, the claim is trivially true. Let , and note that is the joint density of with respect to . Define , and , which is well-defined a.e. Now, since satisfies , we have for a.e.
where we write in place of for brevity. Integrating both sides with respect to the density , we have
Now, observe that
where the penultimate identity follows by the product rule for derivatives and expanding the square. The final cross term is integrable; indeed, Cauchy-Schwarz yields
The exchange of integrals to obtain the last line is justified by Tonelli’s theorem. Therefore, by Fubini’s theorem,
where the last equality follows by the regularity assumption. Summarizing, we have
To finish, we observe that
which proves the claim.
3.2 Proof of Theorem 2
We require the following proposition, the proof of which is the most arduous part of the argument. The ideas of the proof are independent of Theorem 2, so it is deferred to the appendix.
Proposition 1.
Let be a probability density on , with convex.

For each , there exists a unique such that

For as in part (i), and each
To begin the proof, consider the log-concave density , where . For , let be the probability measure with density
where is a normalizing constant and is such that , which exists as a consequence of Proposition 1(i). Note that has density with respect to . Therefore, we may readily compute
By the Bakry-Émery theorem, satisfies , so it follows from Theorem 1 that
By Proposition 1(ii) and the inequality
holding by definition of , we have
(10) 
where are as defined in the statement of the theorem. Since the above holds for arbitrary , we now particularize by (optimally) choosing
if , and otherwise choosing
It can be verified that if , then this choice of ensures . On the other hand, if , then this choice of ensures . Hence, substituting into (10) and simplifying yields:
where is defined piecewise according to
This bound is actually better than what is stated in the theorem, but is clearly a bit cumbersome. Since , we note the simpler (yet still essentially as good) bound holding for in the range , completing the proof.
Acknowledgement
This work was supported in part by NSF grants CCF-1704967, CCF-0939370, and CCF-1750430.
References
 [1] R. D. Gill and B. Y. Levit. Applications of the van Trees inequality: a Bayesian Cramér-Rao bound. Bernoulli, 1(1-2):59–79, 1995.
 [2] H. L. van Trees. Detection, estimation, and modulation theory, part I: detection, estimation, and linear modulation theory. John Wiley & Sons, 1968.
 [3] A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer-Verlag New York, 2009.
 [4] S. Y. Efroimovich. Information contained in a sequence of observations (in Russian). Problems in Information Transmission, 15(3):24–39, 1979.
 [5] B. S. Clarke and A. R. Barron. Jeffreys’ prior is asymptotically least favorable under entropy risk. Journal of Statistical Planning and Inference, 41(1):37–60, 1994.

 [6] Y. Wu. Lecture notes for information-theoretic methods for high-dimensional statistics, July 2017.
 [7] D. Bakry and M. Émery. Diffusions hypercontractives. In Séminaire de Probabilités XIX 1983/84, pages 177–206. Springer, 1985.
 [8] L. Gross. Logarithmic Sobolev inequalities. American Journal of Mathematics, 97(4):1061–1083, 1975.
 [9] M. Ledoux. The concentration of measure phenomenon. Number 89. American Mathematical Society, 2001.
 [10] A. Marsiglietti and V. Kostina. A lower bound on the differential entropy of log-concave random vectors with applications. Entropy, 20(3):185, 2018.
 [11] S. Bobkov and M. Madiman. The entropy per coordinate of a random vector is highly constrained under convexity conditions. IEEE Transactions on Information Theory, 57(8):4940–4954, 2011.
 [12] H. J. Brascamp and E. H. Lieb. On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log-concave functions, and with an application to the diffusion equation. Journal of Functional Analysis, 22(4):366–389, 1976.
 [13] T. M. Cover and Z. Zhang. On the maximum entropy of the sum of two dependent random variables. IEEE Transactions on Information Theory, 40(4):1244–1246, 1994.
 [14] A. Marsiglietti and V. Kostina. New connections between the entropy power inequality and geometric inequalities. In 2018 IEEE International Symposium on Information Theory (ISIT), pages 1978–1982. IEEE, 2018.
Appendix
This appendix contains the proof of the following extended version of Proposition 1. It may be of independent interest.
Lemma 1.
Let be a probability density on , with convex.

For each , there exists a unique such that

For each , the map
has a unique global maximum at .

The map is continuous on . In particular, for each , there is a neighborhood of and such that for all .

For as in part (i), and each ,
Remark 3.
An intuitive interpretation is as follows: if we convolve a log-concave density with a Gaussian of variance , then the point of maximum likelihood of the resulting density (call it ) is unique, and changes smoothly as we adjust . The last part of the lemma gives a lower bound on the likelihood at . The only real surprise is the fact that is also the barycenter of the density proportional to , which is part (i) of the claim.
The proof of Lemma 1 starts by showing that the map defined by
is a contraction with respect to the usual Euclidean metric. Then, the claims follow from the well-known Banach fixed-point theorem:
Lemma 2 (Banach Fixed Point Theorem).
Let be a complete metric space, and let satisfy for all , where . Then has a unique fixed point . Moreover, if and , , then
(11) 
So, to begin, let denote the probability measure with density proportional to . We note that cannot split off an independent Gaussian factor with variance . Indeed, if this were the case, then after a suitable change of coordinates, we could assume splits off an independent Gaussian factor of variance in the first coordinate, so that
for some . Rearranging, this yields for some constant . This would imply that is not integrable in coordinate , a contradiction. Thus, we must have
for some . This follows from the Brascamp-Lieb inequality and the fact that Gaussians are the only extremizers.
By differentiating the th coordinate of at , we see that
Hence, the Jacobian of has entries . Recalling the variance inequality above,
so that is a contraction as claimed. Hence, the desired existence and uniqueness of follows from the Banach Fixed Point Theorem.
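The contraction above is easy to simulate. In the following one-dimensional sketch, a Laplace density stands in for the log-concave density, and the grid quadrature is a crude numerical stand-in for the integrals; both are our illustrative choices. The Banach iteration of the barycenter map converges to its fixed point, which is the mode of the Gaussian-smoothed density:

```python
import numpy as np

# Fixed-point iteration for the mode of f * N(0, t), f log-concave.
# T(x) is the barycenter of the density proportional to
#   y -> f(y) * exp(-(x - y)^2 / (2 * t)).
t = 0.5
y = np.linspace(-20.0, 20.0, 40001)    # uniform grid, spacing 1e-3
f = 0.5 * np.exp(-np.abs(y - 1.0))     # Laplace density with mode at 1

def T(x):
    w = f * np.exp(-((x - y) ** 2) / (2 * t))
    return float(np.sum(y * w) / np.sum(w))   # barycenter via grid sums

x = 5.0
for _ in range(200):                   # Banach iteration: x <- T(x)
    x = T(x)

# The fixed point is the mode of the smoothed density (here y = 1 by symmetry).
assert abs(x - 1.0) < 1e-3 and abs(T(x) - x) < 1e-6
```

Since the Gaussian tilt forces the variance of the tilted measure below t (by the Brascamp-Lieb inequality), the map contracts, so the iteration converges geometrically from any starting point.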
To prove the second claim, note that for any and ,
The strict inequality holds since is a contraction and for . Thus, for any not equal to , the map is strictly increasing on , so that achieves a unique global maximum at as claimed.
Toward proving the third claim, we first note that part (ii), proved above, yields a uniform bound on for all . In particular,
Since
is log-concave, it has finite moments of all orders, and we conclude