Minimum Relative Entropy Inference for Normal and Monte Carlo Distributions

07/13/2020 · Marcello Colasante et al.

We represent affine sub-manifolds of exponential family distributions as minimum relative entropy sub-manifolds. With such representation we derive analytical formulas for the inference from partial information on expectations and covariances of multivariate normal distributions; and we improve the numerical implementation via Monte Carlo simulations for the inference from partial information of generalized expectation type.




1 Introduction

Inference is ubiquitous in financial applications: stress-testing and scenario analysis, such as in [Mina and Xiao, 2001], explore the consequences of specific market scenarios on the distribution of the portfolio loss. Similarly, portfolio construction techniques such as [Black and Litterman, 1990] inject views on specific factor returns into the estimated distribution of a broad market.

A general approach to perform inference under partial information based on the principle of minimum relative entropy (MRE) was explored in [Meucci, 2010]. In the original paper, the general theory was supported by two applications: an analytical solution under normality, and a numerical algorithm for distributions represented by scenarios, such as Monte Carlo, historical, or categorical.

Here we enhance both the analytical and the numerical implementations of [Meucci, 2010], drawing from results in [Colasante, 2019].

In Section 2 we state well-known results to set the notation and background.

In Section 3, we embed the analytical MRE problem under normality and information on expectations and covariances of arbitrary linear combinations into a broader analytical framework. In computing the solution, we find that the updated expectation in [Meucci, 2010] must be adjusted by a term implied by the information on the covariances.

In Section 4, we address the MRE problem numerically. Most numerical applications of MRE that involve Monte Carlo sampling methods, such as stochastic approximation or sample path optimization algorithms, see [Schofield, 2007], can be inefficient. On the other hand, the scenario-based MRE algorithm in [Meucci, 2010] does not entail drawing scenarios, and as such is efficient, but it is subject to a curse of dimensionality which may affect precision. Here we improve the original scenario-based MRE in [Meucci, 2010] with an iterative procedure based on Hamiltonian Monte Carlo sampling [Chao et al., 2015], [Neal et al., 2011], thereby achieving higher precision.

In Section 5 we present a case study that applies and compares the analytical solution and the numerical algorithm.

Finally, in Section 6 we list the main contributions.

2 Background

In this section we briefly review well-known results; refer to [Jaakkola, 1999], [Cover and Thomas, 2006], [Amari and Nagaoka, 2000], [Amari, 2016] for more details.


Let $X \equiv (X_1, \dots, X_{\bar{n}})'$ be a target vector with a reference base distribution with support $\mathcal{X} \subseteq \mathbb{R}^{\bar{n}}$, as represented by the probability density function (pdf)

$$X \sim f_X , \tag{1}$$

that needs to be estimated via historical, maximum likelihood, GMM etc. Let $Z$ be a random vector of inference input variables, on which we have new information. Without loss of generality, we can assume that the inference input variables are transformations of the target variables

$$Z \equiv z(X) , \tag{2}$$

for a suitable multivariate function $z : \mathbb{R}^{\bar{n}} \to \mathbb{R}^{\bar{k}}$. In applications, the number of target variables is typically much larger than the number of inference variables

$$\bar{k} \ll \bar{n} . \tag{3}$$
Inference amounts to assessing the impact of some information, or subjective views, on the distribution of $X$, which can be expressed as constraints on the distribution of the inference variables

$$f_Z \in \mathcal{C} , \tag{4}$$

for a suitable constraint set $\mathcal{C}$; these constraints in general are violated by the base distribution (1).

The principle of minimum relative entropy (MRE) is a standard approach to inference with partial information. Let us denote the relative entropy between two distributions $f$ and $g$ as follows

$$\mathcal{E}(f \,\|\, g) \equiv \int_{\mathcal{X}} f(x) \ln \frac{f(x)}{g(x)} \, dx . \tag{5}$$

Then, according to the MRE, the updated inferred distribution is the closest to the base (1)

$$f_X^{up} \equiv \operatorname*{argmin}_{f} \, \mathcal{E}(f \,\|\, f_X) , \tag{6}$$

which at the same time satisfies the information constraints (4) induced by the inference variables.
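For distributions represented by finitely many scenarios, the relative entropy (5) reduces to a sum. A minimal Python sketch (the function name `relative_entropy` and the toy probability vectors are ours):

```python
import numpy as np

def relative_entropy(f, g):
    """Relative entropy E(f || g) = sum_i f_i ln(f_i / g_i) for probability vectors."""
    f, g = np.asarray(f, float), np.asarray(g, float)
    mask = f > 0  # terms with f_i = 0 contribute zero by convention
    return float(np.sum(f[mask] * np.log(f[mask] / g[mask])))

base = np.array([0.25, 0.25, 0.25, 0.25])
updated = np.array([0.4, 0.3, 0.2, 0.1])
# Non-negative, and zero exactly when the two distributions coincide
assert relative_entropy(base, base) == 0.0
assert relative_entropy(updated, base) > 0.0
```
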

In particular, here we consider information (4) expressed in terms of expectations

$$\mathbb{E}^{up}\{v(X)\} = \bar{\eta} , \tag{7}$$

where $\bar{\eta}$ is a $\bar{k} \times 1$ vector and $v$ is an arbitrary function. The equality conditions (7) cover a wide range of practical applications, such as information on volatilities, correlations, tail behaviors, etc. More general inequality constraints are also tractable, but beyond the scope of this article.

Then the MRE updated distribution (6) belongs to the exponential family class

$$f_X^{up} \in \{ f_{\theta} , \; \theta \in \mathbb{R}^{\bar{k}} \} , \tag{8}$$

which means the pdf reads

$$f_{\theta}(x) \equiv e^{\theta' v(x) - \psi(\theta)} f_X(x) , \tag{9}$$

where $\psi$ is the log-partition function

$$\psi(\theta) \equiv \ln \int_{\mathcal{X}} e^{\theta' v(x)} f_X(x) \, dx . \tag{10}$$

According to (8), the sufficient statistics $v$ are the information functions specifying the inference input variables (2); the expectation parameters $\bar{\eta}$ are the features quantifying the information constraints (7); and the natural parameters $\theta$ are the Lagrange multipliers of the MRE problem (6)-(7), which are related to the expectation parameters via the Legendre transform of the log-partition, or link function

$$\theta^{*} = (\nabla \psi)^{-1}(\bar{\eta}) . \tag{11}$$

The key to obtaining the MRE updated distribution (8) is the vector of Lagrange multipliers (11). However, solving (11) is not feasible in general.

3 Analytical results

To obtain analytical results, we make two further assumptions:

  • The base distribution (1) is of an exponential family class

$$f_X(x) \propto e^{\theta' u(x)} \mu(x) , \tag{12}$$

    for a reference measure $\mu$, natural parameters $\theta$ within a parameter domain $\Theta$, and sufficient statistics $u$.

  • The information is of expectation type (7) and linear in the sufficient statistics

$$v(x) \equiv a \, u(x) , \tag{13}$$

    for a matrix $a$.

Then, the MRE updated distribution (8) is a “curved” sub-family of the same exponential family class as the base [A.1]

$$f_X^{up}(x) \propto e^{\theta^{up\,\prime} u(x)} \mu(x) , \tag{14}$$

where the new natural parameters $\theta^{up}$ are an affine transformation (and thus not literally “curved”) of the optimal Lagrange multipliers $\theta^{*}$

$$\theta^{up} \equiv \theta + a' \theta^{*} , \tag{15}$$

as long as $\theta^{up} \in \Theta$.

3.1 Categorical distribution

For a trivial example of the result (15), let us consider for the base (1) a scenario-probability distribution (or generalized categorical distribution), which belongs to a specific exponential family class (12)

$$f_X(x) \propto e^{\sum_{j=1}^{\bar{j}} \theta_j u_j(x)} , \tag{16}$$

where the $x^{(j)}$ are joint scenarios for $j = 1, \dots, \bar{j}$; the canonical parameters $\theta_j$ are the multi-logit transformation of the scenario probabilities $p_j$, which are positive and sum to one; and the sufficient statistics $u_j(x) \equiv 1_{\{x = x^{(j)}\}}$ are the one-hot encoding functions, see e.g. [Amari, 2016]. In this framework, any expectation conditions as in (7) can be expressed as linear statements in the sufficient statistics (13)

$$v(x) = a \, u(x) , \tag{17}$$

where $a \equiv (v(x^{(1)}), \dots, v(x^{(\bar{j})}))$.

Then, from (14), the MRE updated distribution (8) must be a scenario-probability distribution as the base (16)

$$f_X^{up}(x) \propto e^{\sum_{j=1}^{\bar{j}} \theta_j^{up} u_j(x)} , \tag{18}$$

but with new probabilities $p_j^{up}$, as follows from (15)

$$p_j^{up} \propto p_j \, e^{\theta^{*\prime} v(x^{(j)})} , \tag{19}$$

for any $j = 1, \dots, \bar{j}$. This leads to the numerical MRE algorithm for scenario-probability distributions in [Meucci, 2008], which we use in Section 4.
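The scenario-based MRE update can be sketched in a few lines of Python: the convex dual of the entropy minimization is solved by Newton iterations, and the updated probabilities follow the exponential-tilt form of the scenario probabilities described above. This is a didactic sketch under our own naming and regularization choices, not the reference implementation of [Meucci, 2008]:

```python
import numpy as np

def discrete_mre(p_base, v, eta_bar, iters=50):
    """Scenario-based MRE update: given base probabilities p_j, statistics
    v(x^(j)) (rows of v) and targets eta_bar, return updated probabilities
    p_j^up proportional to p_j * exp(theta' v_j), with theta found by Newton
    steps on the convex dual of the entropy-minimization problem."""
    J, K = v.shape
    theta = np.zeros(K)
    for _ in range(iters):
        w = p_base * np.exp(v @ theta)
        p = w / w.sum()
        grad = v.T @ p - eta_bar                 # E_p[v] - eta_bar
        vc = v - p @ v                           # statistics centered at E_p[v]
        hess = (vc * p[:, None]).T @ vc          # Cov_p[v]
        theta -= np.linalg.solve(hess + 1e-12 * np.eye(K), grad)
    w = p_base * np.exp(v @ theta)
    return w / w.sum(), theta

# Five scenarios with uniform base probabilities; shift the mean to 0.5
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
p_up, theta = discrete_mre(np.full(5, 0.2), x[:, None], np.array([0.5]))
assert np.allclose(p_up @ x, 0.5)                # constraint met
```
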

3.2 Normal distribution

For a non-trivial instance of the result (15), let us consider the special case of (12)-(13) that generalizes the parametric MRE in [Meucci, 2010] and corrects an error therein.

More precisely, let us assume that the base (1) is a normal distribution, which belongs to a specific exponential family class (12)

$$X \sim \mathrm{N}(\mu, \sigma^2) , \tag{20}$$

where the canonical coordinates are suitable transformations of the expectation vector $\mu$ and the covariance matrix $\sigma^2$

$$\theta_1 \equiv (\sigma^2)^{-1} \mu , \qquad \theta_2 \equiv -\tfrac{1}{2} (\sigma^2)^{-1} , \tag{21}$$

and where the sufficient statistics are pure linear and quadratic functions

$$u(x) \equiv (x, \, x x') . \tag{22}$$
Then let us consider MRE inference as in (6)

$$f_X^{up} \equiv \operatorname*{argmin}_{f} \, \mathcal{E}(f \,\|\, f_X) , \tag{23}$$

under information on linear combinations of expectations and covariances

$$\mathbb{E}^{up}\{g X\} = \bar{\mu}_g , \qquad \operatorname{Cv}^{up}\{h X\} = \bar{\sigma}^2_h , \tag{24}$$

where $g$ is a full-rank matrix; $\bar{\mu}_g$ is a vector; $h$ is a full-rank matrix; and $\bar{\sigma}^2_h$ is a symmetric and positive definite matrix.

The inference constraints in the MRE problem (23) are not of expectation type (13). However, we can use a two-step approach to leverage the result (14)-(15).

First, we consider all the possible expectation constraints (13) compatible with the information (24)

$$\mathbb{E}^{up}\{g X\} = \bar{\mu}_g , \qquad \mathbb{E}^{up}\{h X\} = m , \qquad \mathbb{E}^{up}\{h X X' h'\} = \bar{\sigma}^2_h + m m' , \tag{25}$$

for any vector $m$; and the related MRE optimization

$$f_X^{up}(m) \equiv \operatorname*{argmin}_{f : (25)} \, \mathcal{E}(f \,\|\, f_X) . \tag{26}$$

Because of the expectation constraints (13), for any $m$ the solution must be normal due to (14), and we can compute it analytically [A.2]

$$f_X^{up}(m) = \mathrm{N}(\mu^{up}(m), \sigma^{2}_{up}) , \tag{27}$$

for a suitable function $\mu^{up}(m)$ and the same updated covariance matrix

$$\sigma^{2}_{up} \equiv \sigma^2 + h^{\dagger} (\bar{\sigma}^2_h - h \sigma^2 h') (h^{\dagger})' , \tag{28}$$

where $h^{\dagger}$ is a (right) pseudo-inverse matrix for $h$

$$h^{\dagger} \equiv \sigma^2 h' (h \sigma^2 h')^{-1} . \tag{29}$$

Second, we compute the optimal vector $m^{*}$ that minimizes the relative entropy

$$m^{*} \equiv \operatorname*{argmin}_{m} \, \mathcal{E}(f_X^{up}(m) \,\|\, f_X) , \tag{30}$$
which turns out to be a simple quadratic programming problem in $m$ [A.4]. Then the updated distribution (23) must be normal as in (27) [A.4]

$$f_X^{up} = \mathrm{N}(\mu^{up}, \sigma^{2}_{up}) , \tag{31}$$

with the updated expectation $\mu^{up}$ as follows

where $g^{\dagger}$ is a (right) pseudo-inverse matrix for $g$

and where the vector in the last term is defined as follows
In the special case of uncorrelated information variables under the base distribution (20)


the updated expectation (32) simplifies as [A.5]


where the last term on the right-hand side is a correction to [Meucci, 2010].
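As a sanity check of the analytical solution, the following Python sketch builds the updated covariance from a base covariance and a view on the covariance of linear combinations, using the right pseudo-inverse form that our reading of (28)-(29) suggests; the function name and the toy inputs are ours. The assertion verifies that the constrained combinations attain exactly the target covariance:

```python
import numpy as np

def mre_normal_cov_update(sigma2, h, sigma2_bar):
    """Updated covariance under the view h Sigma_up h' = Sigma_bar on a normal
    base: Sigma_up = Sigma + h_dag (Sigma_bar - h Sigma h') h_dag', with the
    right pseudo-inverse h_dag = Sigma h' (h Sigma h')^{-1}."""
    hsh = h @ sigma2 @ h.T
    h_dag = sigma2 @ h.T @ np.linalg.inv(hsh)
    return sigma2 + h_dag @ (sigma2_bar - hsh) @ h_dag.T

# Toy check on a 4-dimensional base with views on the first two variables
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
sigma2 = a @ a.T + 4.0 * np.eye(4)               # base covariance (SPD)
h = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
sigma2_bar = np.array([[2.0, 0.3],
                       [0.3, 1.5]])              # target covariance of h X
sigma2_up = mre_normal_cov_update(sigma2, h, sigma2_bar)
assert np.allclose(h @ sigma2_up @ h.T, sigma2_bar)   # view matched exactly
```

Since $h h^{\dagger}$ equals the identity, the constrained blocks of the updated covariance reproduce the target by construction, while the unconstrained directions inherit the base covariance.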

4 Numerical results

We consider base distributions (1) whose analytical expression is known, possibly up to a multiplicative constant

$$f_X(x) \propto q(x) , \tag{37}$$

for some known analytical function $q$, which we call the “numerator”.

Efficient Markov chain Monte Carlo (MCMC) techniques are available to draw scenarios from the broad class (37), see [Chib and Greenberg, 1995] and [Geweke, 1999]

$$\{ x^{(j)}, \; p_j \equiv 1/\bar{j} \}_{j=1}^{\bar{j}} . \tag{38}$$

In particular, in our implementations we chose Hamiltonian Monte Carlo sampling [Chao et al., 2015], [Neal et al., 2011].
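For concreteness, here is a minimal, self-contained Hamiltonian Monte Carlo sketch in Python (leapfrog integration plus a Metropolis acceptance step, as in [Neal et al., 2011]); it is a didactic stand-in, not the implementation used for the experiments, and all names and tuning values are ours:

```python
import numpy as np

def hmc_sample(log_q, grad_log_q, x0, n_draws=2000, eps=0.2, n_leap=10, seed=0):
    """Minimal HMC sampler for a density known up to a constant through its
    log-numerator ln q and its gradient."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, float)
    draws = np.empty((n_draws, x.size))
    for i in range(n_draws):
        p = rng.standard_normal(x.size)              # resample momentum
        x_new, p_new = x.copy(), p.copy()
        # leapfrog integration of the Hamiltonian dynamics
        p_new += 0.5 * eps * grad_log_q(x_new)
        for _ in range(n_leap - 1):
            x_new += eps * p_new
            p_new += eps * grad_log_q(x_new)
        x_new += eps * p_new
        p_new += 0.5 * eps * grad_log_q(x_new)
        # Metropolis acceptance on the total Hamiltonian
        log_accept = (log_q(x_new) - 0.5 * p_new @ p_new) \
                   - (log_q(x) - 0.5 * p @ p)
        if np.log(rng.random()) < log_accept:
            x = x_new
        draws[i] = x
    return draws

# Toy usage: sample a standard bivariate normal, q(x) proportional to exp(-x'x/2)
draws = hmc_sample(lambda x: -0.5 * x @ x, lambda x: -x, np.zeros(2))
```
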

With general inference of expectation type (7), the MRE updated distribution (8) is an exponential tilt of the base distribution (37), and therefore it again has an analytical expression, up to a constant

$$f_X^{up}(x) \propto q^{up}(x) \equiv e^{\theta^{*\prime} v(x)} q(x) , \tag{39}$$

for the optimal Lagrange multipliers $\theta^{*}$ that solve (11). Therefore, if we can compute or approximate $\theta^{*}$, we can draw scenarios from the updated distribution [A.6].

An efficient algorithm to compute an approximate updated distribution and approximate Lagrange multipliers $\hat{\theta}$ is the discrete MRE of [Meucci, 2008]

$$\{\hat{p}_j\}_{j=1}^{\bar{j}} \equiv \operatorname*{argmin}_{\{p_j\} : \sum_j p_j v(x^{(j)}) = \bar{\eta}} \; \sum_{j=1}^{\bar{j}} p_j \ln \frac{p_j}{1/\bar{j}} . \tag{40}$$

The quality of the approximation (40) can be measured by the discrete relative entropy caused by the information perturbation or, equivalently, by the exponential of its negative counterpart, i.e. the effective number of scenarios of [Meucci, 2012]

$$\widehat{ens} \equiv e^{-\sum_{j=1}^{\bar{j}} \hat{p}_j \ln (\bar{j} \hat{p}_j)} . \tag{41}$$
The approximation in general is poor for problems of large dimension $\bar{n}$: because the updated scenarios are the same as the base scenarios, when the information constraints (7) are strongly violated by the base distribution (37), the curse of dimensionality forces a few scenarios to carry most of the probability, which amounts to too low an effective number of scenarios (41). Instead, because of the low dimension (3) of the information constraints, the approximate Lagrange multipliers $\hat{\theta}$ are much more accurate. Here we show how to exploit this feature to obtain accurate representations of the updated distribution.
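The effective-number-of-scenarios diagnostic is straightforward to compute. The snippet below implements the statistic $\exp(-\sum_j p_j \ln p_j)$ of [Meucci, 2012]; for a uniform base, the exponential of the negative discrete relative entropy equals this statistic up to the factor $\bar{j}$ (the function name is ours):

```python
import numpy as np

def effective_number_of_scenarios(p):
    """ens = exp(-sum_j p_j ln p_j): equals the number of scenarios J for
    uniform probabilities and degrades toward 1 as the probability mass
    concentrates on few scenarios."""
    p = np.asarray(p, float)
    p = p[p > 0]                      # zero-probability terms contribute zero
    return float(np.exp(-np.sum(p * np.log(p))))

assert np.isclose(effective_number_of_scenarios(np.full(100, 0.01)), 100.0)
assert np.isclose(effective_number_of_scenarios([1.0, 0.0, 0.0]), 1.0)
```
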

To this purpose, let us write the exact updated numerator (39) as

$$q^{up}(x) = e^{(\theta^{*} - \hat{\theta})' v(x)} \big( e^{\hat{\theta}' v(x)} q(x) \big) , \tag{42}$$

which can be interpreted as an MRE tilt as in (39), but with a new base

$$q_1(x) \equiv e^{\hat{\theta}' v(x)} q(x) , \tag{43}$$

and new Lagrange multipliers

$$\theta_1 \equiv \theta^{*} - \hat{\theta} . \tag{44}$$

As long as the information conditions (7) are fixed, the true MRE updated distribution (39) is the same if we replace the original base (37) with the new one (43) [A.7]

$$e^{\theta_1' v(x)} q_1(x) = e^{\theta^{*\prime} v(x)} q(x) . \tag{45}$$

Moreover, when the information constraints (7) contradict the base distribution (37), the new base $f_1 \propto q_1$ in (43) is closer to the target than the original base (37)

$$\mathcal{E}(f_X^{up} \,\|\, f_1) \le \mathcal{E}(f_X^{up} \,\|\, f_X) , \tag{46}$$

because the numerical MRE multipliers $\hat{\theta}$ in (40) are close to the true ones $\theta^{*}$.

Hence, we can generate new scenarios from the updated base (43)

$$x_1^{(j)} \sim f_1 , \qquad j = 1, \dots, \bar{j} , \tag{47}$$

and use the simulation output as input for the discrete MRE algorithm (40) to obtain new multipliers $\hat{\theta}_1$ and new probabilities

$$\{\hat{p}_j^{(1)}\}_{j=1}^{\bar{j}} \equiv \operatorname*{argmin}_{\{p_j\} : \sum_j p_j v(x_1^{(j)}) = \bar{\eta}} \; \sum_{j=1}^{\bar{j}} p_j \ln \frac{p_j}{1/\bar{j}} . \tag{48}$$

The quality of the approximation (48) is better than the original output (40), because here the starting point is closer to the MRE updated distribution (46), and thus the curse of dimensionality is mitigated. Furthermore, the new output respects the inference constraints (7) exactly

$$\sum_{j=1}^{\bar{j}} \hat{p}_j^{(1)} v(x_1^{(j)}) = \bar{\eta} , \tag{49}$$

unlike the simulation input (47).

Then we can update the Lagrange multipliers

$$\hat{\theta} \mapsto \hat{\theta} + \hat{\theta}_1 , \tag{50}$$

and iterate (47)-(48). Convergence in the above routine occurs when the effective number of scenarios (41) rises above a given threshold

$$\widehat{ens} \ge 1 - \varepsilon , \tag{51}$$

where $\varepsilon$ is a small tolerance.

We summarize the iterative MRE in the following table.

0. Initialize the numerator (37)
1. Generate new scenarios (47)
2. Perform the discrete MRE (48)
3. Update the Lagrange multipliers (50)
4. Update the numerator (43)
5. Check convergence (51)
6. If converged, output the scenario-probability distribution; else go to step 1
Table 1: Iterative MRE algorithm.
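The routine of Table 1 can be sketched end to end on a toy problem. In the sketch below (all names and the toy setup are ours), the base is a standard normal, the information fixes the expectations, and, because an exponential tilt of a normal base with linear statistics is again normal, step 1 samples the tilted base exactly; this exact sampling stands in for the Hamiltonian Monte Carlo draws used in the paper:

```python
import numpy as np

def discrete_mre(p_base, v, eta_bar, iters=50):
    """Step 2 of Table 1: Newton solve of the scenario-based MRE dual."""
    theta = np.zeros(v.shape[1])
    for _ in range(iters):
        w = p_base * np.exp(v @ theta)
        p = w / w.sum()
        vc = v - p @ v                                 # centered statistics
        hess = (vc * p[:, None]).T @ vc                # Cov_p[v]
        theta -= np.linalg.solve(hess + 1e-12 * np.eye(len(theta)),
                                 v.T @ p - eta_bar)
    w = p_base * np.exp(v @ theta)
    return w / w.sum(), theta

# Toy iterative MRE: base X ~ N(0, I) in 2d, information E{X} = (1, -1),
# linear statistics v(x) = x, so the tilted base exp(theta'x) q(x) has mean theta
rng = np.random.default_rng(1)
eta_bar = np.array([1.0, -1.0])
J = 5000
theta_hat = np.zeros(2)                                # 0. initialize the numerator
for _ in range(5):
    x = theta_hat + rng.standard_normal((J, 2))        # 1. scenarios from tilted base
    p, theta_1 = discrete_mre(np.full(J, 1.0 / J), x, eta_bar)  # 2. discrete MRE
    theta_hat = theta_hat + theta_1                     # 3. update the multipliers
    ens_ratio = np.exp(-np.sum(p * np.log(J * p)))      # 5. convergence diagnostic
    if ens_ratio > 0.95:                                # 6. stop when near-uniform
        break
# The output respects the expectation constraints exactly, as in step 2
assert np.allclose(x.T @ p, eta_bar, atol=1e-6)
```

After the first pass the tilted base already carries most of the required shift in expectations, so the reweighting in subsequent passes is mild and the effective number of scenarios stays high.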

5 A case study

We consider target variables with a normal base distribution (20)


with homogeneous expectations and standard deviations


and homogeneous correlations


Then we consider information constraints (4) as follows


Figure 1: MRE updated distribution under the normal base (52) and inference constraints (55). In green, the location-dispersion ellipsoid and simulations from the base distribution. In orange and red, the location-dispersion ellipsoids stemming from the first- and second-step simulations of the iterative approach (Table 1), respectively. In black, the location-dispersion ellipsoid of the analytical solution and the third-step simulations.

Also, we assume that the constraints on correlations (55) do not alter the respective first and second moments of the variables, so that we can rewrite the information (55) as expectation conditions (7)

We simulate scenarios with uniform probabilities (38) from the normal base distribution (52). Then, from the base scenarios and the information (56), we compute the MRE updated distribution (39) using the iterative numerical routine of Table 1. The routine reaches convergence in three steps for the given threshold (51).

Equivalently, we can express the information (56) as constraints on linear combinations of expectations and covariances as in (24), where:

  • $g$ is a matrix as follows

  • $\bar{\mu}_g$ is a vector as follows

  • $h$ is a matrix as follows

  • $\bar{\sigma}^2_h$ is a matrix as follows


Then, from the base normal distribution (52) and the information (56), we compute analytically the MRE updated distribution (6), which is normal (31)


where the updated expectations (32) and standard deviations (28) read


and the updated correlations (28) read


In Figure 1 we report the results of the numerical and analytical approaches, and in Table 2 below we summarize the errors between the respective statistics.

Table 2: Iterative MRE: effective number of scenarios and errors.

6 Conclusions

In this article we showed how to solve analytically and numerically the MRE problem under exponential-family base distributions and partial information constraints of expectation type as in (13).

Under normal base distributions, we computed analytically the MRE solution (31) and fixed the formulation of the updated expectation originally proposed by [Meucci, 2010].

Under more general base distributions, we showed how to compute numerically the MRE solution via iterative Hamiltonian Monte Carlo simulations (Table 1) yielding a better approximation of the updated distribution than the original scenario-based algorithm in [Meucci, 2010].


  • [Amari and Nagaoka, 2000] Amari, S. and Nagaoka, H. (2000). Methods of Information Geometry. American Mathematical Society.
  • [Amari, 2016] Amari, S.-i. (2016). Information Geometry and Its Applications, volume 194. Springer.
  • [Black and Litterman, 1990] Black, F. and Litterman, R. (1990). Asset allocation: combining investor views with market equilibrium. Goldman Sachs Fixed Income Research.
  • [Chao et al., 2015] Chao, W.-L., Solomon, J., Michels, D., and Sha, F. (2015). Exponential integration for Hamiltonian Monte Carlo. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 1142–1151.
  • [Chib and Greenberg, 1995] Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49:327–335.
  • [Colasante, 2019] Colasante, M. (2019). Essays in Minimum Relative Entropy implementations for views processing. PhD thesis, Università di Bologna.
  • [Cover and Thomas, 2006] Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. Wiley, 2nd edition.
  • [Geweke, 1999] Geweke, J. (1999). Using simulation methods for Bayesian econometric models: Inference, development and communication. Econometric Reviews, 18:1–126.
  • [Jaakkola, 1999] Jaakkola, T. (1999). Maximum entropy estimation. Machine learning seminar notes.
  • [Magnus and Neudecker, 1979] Magnus, J. R. and Neudecker, H. (1979). The commutation matrix: Some properties and applications. Annals of Statistics, 7:381–394.
  • [Meucci, 2008] Meucci, A. (2008). Fully Flexible Views: Theory and practice. Risk, 21(10), 97-102.
  • [Meucci, 2010] Meucci, A. (2010). The Black-Litterman approach: Original model and extensions. The Encyclopedia of Quantitative Finance, Wiley.
  • [Meucci, 2012] Meucci, A. (2012). Effective number of scenarios with Fully Flexible Probabilities. GARP Risk Professional, 45-46.
  • [Mina and Xiao, 2001] Mina, J. and Xiao, J. (2001). Return to RiskMetrics: The evolution of a standard. RiskMetrics publications.
  • [Neal et al., 2011] Neal, R. M. et al. (2011). MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11).
  • [Schofield, 2007] Schofield, E. (2007). Fitting maximum-entropy models on large sample spaces. PhD thesis, Graz University of Technology Austria.

Appendix A Appendix

Here we discuss some technical results of Sections 3 and 4.

A.1 MRE with exponential-family base

Consider a base distribution (1) in the exponential family class as in (12), with natural parameters $\theta \in \Theta$, and hence with the following pdf


where $\psi$ denotes the log-partition function as in (10)


Then the updated distribution (9) reads


where in the second row we used the linearity of the inference functions with respect to the sufficient statistics $u$ as in (13); and where we defined


as in (15). Then, as long as $\theta^{up} \in \Theta$, the log-partition functions (65) satisfy


which implies our desired result (14).

A.2 MRE update with normal base and information on non-central moments

The pdf of the normal base distribution (20) can be written in canonical form within the exponential family class (12) as follows

where $\theta_1$ and $\theta_2$ identify the base canonical coordinates (21); and where the log-partition function (65), with respect to the reference measure and the sufficient statistics (22), reads