1 Introduction
Inference is ubiquitous in financial applications: stress-testing and scenario analysis, such as in [Mina and Xiao, 2001], explore the consequences of specific market scenarios on the distribution of the portfolio loss. Similarly, portfolio construction techniques such as [Black and Litterman, 1990] inject views on specific factor returns into the estimated distribution of a broad market.
A general approach to perform inference under partial information based on the principle of minimum relative entropy (MRE) was explored in [Meucci, 2010]. In the original paper, the general theory was supported by two applications: an analytical solution under normality, and a numerical algorithm for distributions represented by scenarios, such as Monte Carlo, historical, or categorical.
Here we enhance both the analytical and the numerical implementations of [Meucci, 2010], drawing from results in [Colasante, 2019].
In Section 2 we state well-known results to set the notation and background.
In Section 3, we embed the analytical MRE problem under normality and information on expectations and covariances of arbitrary linear combinations into a broader analytical framework. In computing the solution, we find that the updated expectation in [Meucci, 2010] must be adjusted by a term implied by the information on the covariances.
In Section 4, we address the MRE problem numerically. Most numerical applications of MRE that involve Monte Carlo sampling methods, such as stochastic approximation or sample-path optimization algorithms, see [Schofield, 2007], can be inefficient. On the other hand, the scenario-based MRE algorithm in [Meucci, 2010] does not entail drawing scenarios, and as such is efficient, but it is subject to the curse of dimensionality, which may affect precision. Here we improve the original scenario-based MRE in [Meucci, 2010] with an iterative procedure based on Hamiltonian Monte Carlo sampling [Chao et al., 2015], [Neal et al., 2011], thereby achieving more precision.
In Section 5 we present a case study that applies and compares the analytical solution and the numerical algorithm.
Finally, in Section 6 we list the main contributions.
2 Background
In this section we briefly review well-known results; refer to [Jaakkola, 1999], [Cover and Thomas, 2006], [Amari and Nagaoka, 2000], [Amari, 2016] for more details.
Let $X \equiv (X_1, \ldots, X_n)'$ be a target vector with a reference base distribution with support $\mathcal{X} \subseteq \mathbb{R}^n$, as represented by the probability density function (pdf)
$$X \sim f^{(0)}_X , \quad (1)$$
that needs to be estimated via historical, maximum likelihood, GMM, etc. Let $Z \equiv (Z_1, \ldots, Z_k)'$ be a random vector of inference input variables, on which we have new information. Without loss of generality, we can assume that the inference input variables are transformations of the target variables
$$Z \equiv g(X) , \quad (2)$$
for a suitable multivariate function $g$. In applications, the number of target variables is typically much larger than the number of inference variables
$$k \ll n . \quad (3)$$
Inference amounts to assessing the impact of some information, or subjective views, on the distribution of the target variables, which can be expressed as constraints on the distribution of the inference variables
(4) 
which in general are violated by the base distribution (1).
The principle of minimum relative entropy (MRE) is a standard approach to inference with partial information. Let us denote the relative entropy between two distributions $f$ and $\bar{f}$ as follows
$$\mathcal{E}(f \,\|\, \bar{f}) \equiv \int f(x) \ln \frac{f(x)}{\bar{f}(x)} \, dx . \quad (5)$$
Then, according to the MRE, the updated, inferred distribution is the one closest to the base (1)
$$f_X \equiv \operatorname*{argmin}_{f \ \mathrm{s.t.}\ (4)} \ \mathcal{E}(f \,\|\, f^{(0)}_X) , \quad (6)$$
which at the same time satisfies the information constraints (4) induced by the inference variables.
In particular, here we consider information (4) expressed in terms of expectations
$$\mathbb{E}\{v(Z)\} = \eta , \quad (7)$$
where $\eta$ is a vector and $v$ is an arbitrary function. The equality conditions (7) cover a wide range of practical applications, such as information on volatilities, correlations, tail behaviors, etc. More general inequality constraints are also tractable, but beyond the scope of this article.
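For instance (an illustrative construction of ours, with symbols $\bar{\mu}$ and $\bar{\sigma}$ not taken from the paper), information that a scalar inference variable $Z$ has expectation $\bar{\mu}$ and standard deviation $\bar{\sigma}$ can be encoded as two equality conditions of type (7) on the first two non-central moments:

```latex
\mathbb{E}\{Z\} = \bar{\mu},
\qquad
\mathbb{E}\{Z^{2}\} = \bar{\sigma}^{2} + \bar{\mu}^{2},
```

so that the implied variance is $\mathbb{E}\{Z^2\} - \mathbb{E}\{Z\}^2 = \bar{\sigma}^2$.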
Then the MRE updated distribution (6) belongs to the exponential family class
(8) 
which means the pdf reads
$$f_X(x) = f^{(0)}_X(x) \, e^{\theta' v(g(x)) - \psi(\theta)} , \quad (9)$$
where $\psi$ is the log-partition function
$$\psi(\theta) \equiv \ln \int f^{(0)}_X(x) \, e^{\theta' v(g(x))} \, dx . \quad (10)$$
According to (8), the sufficient statistics $v \circ g$ are the information functions specifying the inference input variables (2); the expectation parameters $\eta$ are the features quantifying the information constraints (7); and the natural parameters $\theta$ are the Lagrange multipliers of the MRE problem (6)-(7), which are related to the expectation parameters via the Legendre transform of the log-partition, or link function
$$\theta = (\nabla \psi)^{-1}(\eta) . \quad (11)$$
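In standard exponential-family notation (used here illustratively, with $\psi$ the log-partition, $\theta$ the natural parameters, and $\eta$ the expectation parameters), the link function in (11) can be sketched as the Legendre duality

```latex
\eta = \nabla_{\theta}\, \psi(\theta),
\qquad
\theta(\eta) = \operatorname*{argmax}_{\tilde{\theta}}
\left\{ \tilde{\theta}^{\prime} \eta - \psi(\tilde{\theta}) \right\},
```

so that inverting the gradient of the log-partition amounts to a smooth concave maximization, which is how the Lagrange multipliers are typically computed in practice.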
3 Analytical results
To obtain analytical results, we make two further assumptions: first, that the base distribution (1) itself belongs to an exponential family class (12); and second, that the inference functions are linear in the sufficient statistics of that class, so that the information (7) is of expectation type (13).
Then, the MRE updated distribution (8) is a “curved” subfamily of the same exponential family class as the base [A.1]
(14) 
where the new natural parameters are an affine transformation (and thus not literally “curved”) of the optimal Lagrange multipliers
(15) 
as long as the transformed parameters remain within the natural parameter domain.
3.1 Categorical distribution
For a trivial example of the result (15), let us consider for the base (1) a scenario-probability distribution (or generalized categorical distribution), which belongs to a specific exponential family class (12)
(16) 
where the joint scenarios for the target variables serve as support points; the canonical parameters are the multi-logit transformation of the scenario probabilities, which are positive and sum to one; and the sufficient statistics are the one-hot encoding functions, see e.g. [Amari, 2016]. In this framework, any expectation conditions as in (7) can be expressed as linear statements in the sufficient statistics (15)
(17) 
where .
Then, from (14), the MRE updated distribution (8) must be a scenario-probability distribution like the base (16)
(18) 
but with new probabilities, as follows from (15)
(19) 
for any scenario. This leads to the numerical MRE algorithm for scenario-probability distributions in [Meucci, 2008], which we use in Section 4.
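As a concrete sketch of this scenario-based algorithm ("entropy pooling"), the following minimal Python routine solves the discrete MRE problem in its convex dual form; all names are ours, and the routine assumes the equality constraints are stacked row-wise in a matrix `A` with targets `b`. It is a sketch of the approach in [Meucci, 2008], not its reference implementation.

```python
import numpy as np
from scipy.optimize import minimize

def discrete_mre(q, A, b):
    """Scenario-based MRE: find probabilities p minimizing
    sum_j p_j * log(p_j / q_j) subject to A @ p = b and sum(p) = 1,
    where q are the base scenario probabilities.

    The solution is an exponential tilt p_j ∝ q_j * exp(-lam' a_j),
    with multipliers lam obtained from an unconstrained convex dual."""
    def dual(lam):
        logw = np.log(q) - A.T @ lam            # log of un-normalized tilt
        return np.logaddexp.reduce(logw) + lam @ b
    lam = minimize(dual, np.zeros(len(b)), method="BFGS").x
    logw = np.log(q) - A.T @ lam
    p = np.exp(logw - np.logaddexp.reduce(logw))  # normalize in log space
    return p, lam
```

For example, tilting three equally likely scenarios with values (-1, 0, 1) toward an expectation of 0.5 shifts probability mass monotonically toward the largest scenario.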
3.2 Normal distribution
For a nontrivial instance of the result (15), let us consider the special case of (12)-(13) that generalizes the parametric MRE in [Meucci, 2008] and corrects an error therein.
More precisely, let us assume that the base (1) is a normal distribution, which belongs to a specific exponential family class (12)
(20) 
where the canonical coordinates are suitable transformations of the expectation vector $\mu$ and the covariance matrix $\Sigma$
$$\theta \equiv \left( \Sigma^{-1} \mu , \; -\tfrac{1}{2} \Sigma^{-1} \right) , \quad (21)$$
and where the sufficient statistics are pure linear and quadratic functions
$$v(x) \equiv (x , \; x x') . \quad (22)$$
Then let us consider MRE inference as in (6)
(23) 
under information on linear combinations of expectations and covariances
(24) 
where the loading matrix for the expectations is full-rank; the expectation target is a vector; the loading matrix for the covariances is full-rank; and the covariance target is a symmetric and positive definite matrix.
The inference constraints in the MRE problem (23) are not of expectation type (13). However, we can use a two-step approach to leverage this result.
First, we consider all the possible expectation constraints (13) compatible with the information (24)
(25) 
for any vector; and the related MRE optimization
(26) 
Because of the expectation constraints (13), for any such vector the solution must be normal due to (14), and we can compute it analytically [A.2]
(27) 
for a suitable function, and with the same updated covariance matrix
(28) 
where a (right) pseudo-inverse matrix appears, defined as follows
(29) 
Second, we compute the optimal vector that minimizes the relative entropy
(30) 
which turns out to be a simple quadratic programming problem [A.4]. Then the updated distribution (23) must be normal as in (27) [A.4]
(31) 
with updated expectation as follows
(32) 
where another (right) pseudo-inverse matrix appears, defined as follows
(33) 
and where an additional vector is defined as follows
(34) 
In the special case of uncorrelated information variables under the base distribution (20)
(35) 
the updated expectation (32) simplifies as [A.5]
(36) 
where the last term on the right hand side is a correction to [Meucci, 2010].
4 Numerical results
We consider base distributions (1) whose analytical expression is known, possibly up to a multiplicative constant term
(37) 
for some known analytical function, which we call the “numerator”.
Efficient Markov chain Monte Carlo (MCMC) techniques are available to draw scenarios from the broad class (37), see [Chib and Greenberg, 1995] and [Geweke, 1999]
(38) 
In particular, in our implementations we chose Hamiltonian Monte Carlo sampling [Chao et al., 2015], [Neal et al., 2011].
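For concreteness, here is a minimal sketch of a leapfrog-based HMC sampler (plain HMC as in [Neal et al., 2011], not the exponential-integrator variant of [Chao et al., 2015]; all names and default values are our own assumptions):

```python
import numpy as np

def hmc_draws(log_num, grad_log_num, x0, n_draws, step=0.1, n_leap=20, seed=0):
    """Minimal Hamiltonian Monte Carlo: samples from f(x) ∝ exp(log_num(x)),
    i.e. a density known only up to a normalization constant, as in (37)."""
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    draws = []
    for _ in range(n_draws):
        p = rng.standard_normal(x.shape)        # fresh Gaussian momentum
        xn, pn = x.copy(), p.copy()
        pn += 0.5 * step * grad_log_num(xn)     # leapfrog: half step in momentum
        for _ in range(n_leap - 1):
            xn += step * pn                     # full step in position
            pn += step * grad_log_num(xn)       # full step in momentum
        xn += step * pn
        pn += 0.5 * step * grad_log_num(xn)     # final half step
        # Metropolis correction for the discretization error
        log_alpha = (log_num(xn) - 0.5 * pn @ pn) - (log_num(x) - 0.5 * p @ p)
        if np.log(rng.uniform()) < log_alpha:
            x = xn
        draws.append(x.copy())
    return np.array(draws)
```

The leapfrog step sizes and trajectory lengths shown are arbitrary; in practice they must be tuned to the target distribution.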
With general inference of expectation type (7), the MRE updated distribution (8) is an exponential tilt of the base distribution (37), and therefore it again has an analytical expression, up to a constant
(39) 
for optimal Lagrange multipliers that solve (11). Therefore, if we can compute or approximate the multipliers, we can draw scenarios from the updated distribution [A.6].
An efficient algorithm to compute an approximate updated distribution and approximate Lagrange multipliers is the discrete MRE [Meucci, 2008]
(40) 
The quality of the approximation (40) can be measured by the discrete relative entropy caused by the information perturbation, or, equivalently, the exponential of its negative counterpart, i.e. the effective number of scenarios in [Meucci, 2012]
(41) 
The approximation is in general poor for problems of large dimension: because the scenarios are the same as the base scenarios, when the information constraints (7) are strongly violated by the base distribution (37), the curse of dimensionality forces a few scenarios to carry most of the probability, which amounts to a too-low effective number of scenarios (41). Instead, because of the low dimension of the information constraints (3), the approximate Lagrange multipliers are much more accurate. Here we show how to exploit this feature to obtain accurate representations of the updated distribution.
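Reading (41) as the exponential of the entropy of the updated probabilities, as defined in [Meucci, 2012] (a minimal reading for a uniform base; the relative-entropy variant differs only by a constant factor), a sketch of the diagnostic reads:

```python
import numpy as np

def effective_num_scenarios(p):
    """Effective number of scenarios as in [Meucci, 2012]: the exponential of
    the entropy of the probabilities p. Equals J for uniform p over J
    scenarios and tends to 1 when one scenario carries all the probability."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # convention: 0 * log(0) = 0
    return float(np.exp(-np.sum(p * np.log(p))))
```

A value far below the number of scenarios signals that the discrete MRE re-weighting has concentrated on a handful of scenarios, which is exactly the failure mode discussed above.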
To this purpose, let us write the exact updated numerator (39) as
(42) 
which can be interpreted as an MRE tilt as in (39), but with a new base
(43) 
and new Lagrange multipliers
(44) 
As long as the information conditions (7) are fixed, the true MRE updated distribution (39) is the same if we replace the original base (37) with the new one (43) [A.7]
(45) 
Moreover, when the information constraints (7) contradict the base distribution (37), the new base (43) is closer to the target than the original base (37)
(46) 
because the numerical MRE multipliers (40) are close to the true ones.
Hence, we can generate new scenarios from the updated base (43)
(47) 
and use the simulation output as input for the discrete MRE algorithm (40) to obtain new multipliers and new probabilities
(48) 
The quality of the approximation (48) is better than the original output (40), because here the starting point is closer to the MRE updated distribution (46) and thus the curse of dimensionality is mitigated. Furthermore, the new output respects the inference constraints (7) exactly
(49) 
unlike the simulation input (47).
Then we can update the Lagrange multipliers
(50) 
and iterate (47)-(48). Convergence in the above routine occurs when the effective number of scenarios (41) rises above a given threshold
(51) 
where .
We summarize the iterative MRE in Table 1.
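As an illustrative stand-in for that summary, the following self-contained toy (our own construction, not the paper's code) runs the iteration (47)-(48)-(50) on a one-dimensional standard normal base with the information E{X} = 1; since an exponential tilt of a normal is again normal, step (47) here samples the updated base exactly instead of via HMC:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
J = 5000          # number of scenarios per iteration
theta = 0.0       # accumulated natural parameter (tilt) of the updated base
eta = 1.0         # information constraint: E{X} = 1
for _ in range(10):
    # (47): draw scenarios from the current updated base, here N(theta, 1),
    # since exponentially tilting N(0, 1) just shifts its mean
    x = rng.standard_normal(J) + theta

    # (48): discrete MRE on the new scenarios with uniform base probabilities:
    # p_j ∝ exp(lam * x_j), with lam solving the one-dimensional convex dual
    def dual(v):
        l = float(v[0])
        return float(np.logaddexp.reduce(l * x) - np.log(J) - l * eta)
    lam = float(minimize(dual, [0.0]).x[0])
    logp = lam * x
    p = np.exp(logp - np.logaddexp.reduce(logp))

    ens = float(np.exp(-np.sum(p * np.log(p))))    # (41): diagnostic
    theta += lam                                   # (50): fold in multiplier
    if ens > 0.9 * J:                              # (51): stop when the new
        break                                      # probabilities are ~uniform
```

In a multivariate setting with a non-analytical tilt, the sampling step would instead use the Hamiltonian Monte Carlo sampler, while the re-weighting and multiplier-update steps are unchanged.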
5 A case study
We consider target variables with normal base distribution (20)
(52) 
with homogeneous expectations and standard deviations
(53) 
and homogeneous correlations
(54) 
Then we consider information constraints (4) as follows
(55) 
Also, we assume that the constraints on correlations (55) do not alter the respective first and second moments of the variables, so that we can rewrite the information (55) as expectation conditions (7)
(56) 
We simulate scenarios with uniform probabilities (38) from the normal base distribution (52). Then, from the base scenarios and the information (56), we compute the MRE updated distribution (39) using the iterative numerical routine of Table 1. The routine reaches convergence in three steps with a threshold (51).
Equivalently, we can express the information (56) as constraints on linear combinations of expectations and covariances as in (24), where the loading matrix for the expectations reads
(57) 
the expectation target vector reads
(58) 
the loading matrix for the covariances reads
(59) 
and the covariance target matrix reads
(60)
6 Conclusions
In this article we showed how to solve analytically and numerically the MRE problem under exponential-family base distributions and partial information constraints of expectation type as in (13).
Under normal base distributions, we computed analytically the MRE solution (31) and fixed the formulation of the updated expectation originally proposed by [Meucci, 2010].
Under more general base distributions, we showed how to compute numerically the MRE solution via iterative Hamiltonian Monte Carlo simulations (Table 1), yielding a better approximation of the updated distribution than the original scenario-based algorithm in [Meucci, 2010].
References
[Amari and Nagaoka, 2000] Amari, S. and Nagaoka, H. (2000). Methods of Information Geometry. American Mathematical Society.
[Amari, 2016] Amari, S. (2016). Information Geometry and Its Applications, volume 194. Springer.
[Black and Litterman, 1990] Black, F. and Litterman, R. (1990). Asset allocation: combining investor views with market equilibrium. Goldman Sachs Fixed Income Research.
[Chao et al., 2015] Chao, W.-L., Solomon, J., Michels, D., and Sha, F. (2015). Exponential integration for Hamiltonian Monte Carlo. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 1142–1151.
[Chib and Greenberg, 1995] Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49:327–335.
[Colasante, 2019] Colasante, M. (2019). Essays in Minimum Relative Entropy implementations for views processing. PhD thesis, Università di Bologna.
[Cover and Thomas, 2006] Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. Wiley, 2nd edition.
[Geweke, 1999] Geweke, J. (1999). Using simulation methods for Bayesian econometric models: Inference, development and communication. Econometric Reviews, 18:1–126.
[Jaakkola, 1999] Jaakkola, T. (1999). Maximum entropy estimation. http://people.csail.mit.edu/tommi/papers.html. Machine learning seminar notes.
[Magnus and Neudecker, 1979] Magnus, J. R. and Neudecker, H. (1979). The commutation matrix: Some properties and applications. Annals of Statistics, 7:381–394.
[Meucci, 2008] Meucci, A. (2008). Fully Flexible Views: Theory and practice. Risk, 21(10):97–102. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1213325.
[Meucci, 2010] Meucci, A. (2010). The Black-Litterman approach: Original model and extensions. The Encyclopedia of Quantitative Finance, Wiley. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1117574.
[Meucci, 2012] Meucci, A. (2012). Effective number of scenarios with Fully Flexible Probabilities. GARP Risk Professional, 45–46. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1971808.
[Mina and Xiao, 2001] Mina, J. and Xiao, J. (2001). Return to RiskMetrics: The evolution of a standard. RiskMetrics publications.
[Neal et al., 2011] Neal, R. M. et al. (2011). MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11).
[Schofield, 2007] Schofield, E. (2007). Fitting maximum-entropy models on large sample spaces. PhD thesis, Graz University of Technology, Austria.
Appendix A
A.1 MRE with exponential-family base
Consider a base distribution (1) in the exponential family class as in (12), and hence with the following pdf
(64) 
where the log-partition function is as in (10)
(65) 
Then the updated distribution (9) reads
(66) 
where in the second row we used the linearity of the inference functions with respect to the sufficient statistics as in (13); and where we defined
(67) 
as in (15). Then, as long as the transformed natural parameters remain admissible, the log-partition functions (65) satisfy
(68) 
which implies our desired result (14).
A.2 MRE update with normal base and information on non-central moments
The pdf of the normal base distribution as in (20) can be written in canonical form within the exponential family class as follows
(69) 
where the base canonical coordinates are as in (21); and where the log-partition function (65), with respect to the reference measure and the sufficient statistics (22), reads