Inference is ubiquitous in financial applications: stress-testing and scenario analysis, such as in [Mina and Xiao, 2001], explore the consequences of specific market scenarios on the distribution of the portfolio loss. Similarly, portfolio construction techniques such as [Black and Litterman, 1990]
inject views on specific factor returns into the estimated distribution of a broad market.
A general approach to perform inference under partial information based on the principle of minimum relative entropy (MRE) was explored in [Meucci, 2010]. In the original paper, the general theory was supported by two applications: an analytical solution under normality, and a numerical algorithm for distributions represented by scenarios, such as Monte Carlo, historical, or categorical.
In Section 2 we state well-known results to set the notation and background.
In Section 3, we embed the analytical MRE problem under normality and information on expectations and covariances of arbitrary linear combinations into a broader analytical framework. In computing the solution, we find that the updated expectation in [Meucci, 2010] must be adjusted by a term implied by the information on the covariances.
In Section 4, we address the MRE problem numerically. Numerical applications of MRE that rely on Monte Carlo sampling, such as stochastic approximation or sample-path optimization algorithms (see [Schofield, 2007]), can be inefficient. The scenario-based MRE algorithm in [Meucci, 2010], on the other hand,
does not entail drawing scenarios, and as such is efficient, but it is subject to the curse of dimensionality, which may affect precision. Here we improve the original scenario-based MRE in [Meucci, 2010] with an iterative procedure based on Hamiltonian Monte Carlo sampling [Chao et al., 2015], [Neal et al., 2011], thereby achieving higher precision.
In Section 5 we present a case study that applies and compares the analytical solution and the numerical algorithm.
Finally, in Section 6 we list the main contributions.
be a target vector with a reference base distribution with support
, as represented by the probability density function (pdf)
that needs to be estimated via historical analysis, maximum likelihood, GMM, etc. Let be a random vector of inference input variables, on which we have new information. Without loss of generality, we can assume that the inference input variables are transformations of the target variables
for a suitable multivariate function . In applications, the number of target variables is typically much larger than the number of inference variables.
Inference amounts to assessing the impact of some information, or subjective views, on the distribution of , which can be expressed as constraints on the distribution of the inference variables
which in general are violated by the base distribution (1).
The principle of minimum relative entropy (MRE) is a standard approach to inference with partial information. Let us denote the relative entropy between distributions as follows
Then, according to the MRE, the updated inferred distribution is the closest to the base (1)
which at the same time satisfies the information constraints (4) induced by the inference variables, or .
In particular, here we consider information (4) expressed in terms of expectation
where is a vector and is an arbitrary function. The equality conditions (7) cover a wide range of practical applications, such as information on volatilities, correlations, tail behaviors, etc. More general inequality constraints are also tractable, but beyond the scope of this article.
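Since the display equations did not survive extraction here, the generic form of the problem can be restated with placeholder symbols (these symbols are illustrative, not the paper's original notation: $f$ for the base pdf, $q$ for a candidate pdf, $g$ and $\bar{\mu}$ for the information function and feature vector):

```latex
\bar{f} \;=\; \operatorname*{argmin}_{q}\; \mathcal{E}\!\left(q \,\|\, f\right)
\qquad \text{subject to} \qquad
\mathbb{E}_{q}\!\left[\, g(X) \,\right] \;=\; \bar{\mu},
\qquad \text{where} \qquad
\mathcal{E}\!\left(q \,\|\, f\right) \;=\; \int q(x) \ln\!\frac{q(x)}{f(x)} \, dx .
```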
Then the MRE updated distribution (6) belongs to the exponential family class
which means the pdf reads
where is the log-partition function
According to (8) the sufficient statistics are the information functions specifying the inference input variables (2); the expectation parameters are the features quantifying the information constraints (7); and the natural parameters are the Lagrange multipliers of the MRE problem (6)-(7), which are related to the expectation parameters via the Legendre transform of the log-partition, or link function
3 Analytical results
To obtain analytical results, we make two further assumptions:
where the new natural parameters are an affine transformation (and thus not literally “curved”) of the optimal Lagrange multipliers
as long as .
3.1 Categorical distribution
) a scenario-probability distribution (or generalized categorical distribution), which belongs to a specific exponential family class (12)
where are joint scenarios for
; the canonical parameters are the multi-logit transformation of the scenarios probabilities
, which are positive and sum to one; and the sufficient statistics are the one-hot encoding functions, see e.g. [Amari, 2016]. In this framework, any expectation conditions as in (7) can be expressed as linear statements in the sufficient statistics (15)
but with new probabilities , as follows from (15)
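As a concrete illustration of such linear statements (the scenarios, the information function, and the target feature value below are all hypothetical), an expectation condition evaluated on joint scenarios becomes a linear constraint on the probability vector:

```python
import numpy as np

# Hypothetical base scenarios x_j (rows) with uniform base probabilities p_j.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))   # J = 1000 scenarios of a 2-d target
p = np.full(len(X), 1.0 / len(X))

# An expectation condition E_q[g(X)] = mu becomes the linear statement
# sum_j q_j g(x_j) = mu in the scenario probabilities q.
def g(x):
    return x[:, 0] ** 2              # e.g. a view on a second moment

mu_view = 1.5                        # hypothetical target feature value
G = g(X)                             # information function on the scenarios

# Under the base probabilities the constraint is (in general) violated:
print(G @ p)                         # close to 1.0, not 1.5
```

The base probabilities violate the constraint, which is exactly the situation the MRE update (6) resolves by reassigning probability across the same scenarios.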
3.2 Normal distribution
where the canonical coordinates are suitable transformations of the expectation vector and the covariance matrix
and where sufficient statistics are pure linear and quadratic functions
Then let us consider MRE inference as in (6)
under information on linear combinations of expectations and covariances
where is a full-rank matrix; is a vector; is a full-rank matrix; and is a symmetric and positive definite matrix.
for any vector ; and the related MRE optimization
for a suitable function and same updated covariance matrix
where is a (right) pseudo-inverse matrix for
Second, we compute the optimal vector that minimizes the relative entropy
with updated expectation as follows
where is a (right) pseudo-inverse matrix for
and where is an vector defined as follows
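The full formulas above are not reproduced in this text, but the expectation-only special case is a standard result and can be sketched as follows; the additional adjustment of the expectation implied by the covariance information, which the text derives, is deliberately omitted. All numerical values are hypothetical:

```python
import numpy as np

# Sketch of the expectation-only special case of the normal MRE update:
# a view a = A mu_up on linear combinations of expectations, with the
# covariance left untouched (no covariance views in this sketch).
mu = np.array([0.0, 0.0, 0.0])
sigma = np.array([[1.0, 0.5, 0.5],
                  [0.5, 1.0, 0.5],
                  [0.5, 0.5, 1.0]])

A = np.array([[1.0, -1.0, 0.0]])     # one view on a spread
a = np.array([0.2])                  # its asserted expectation

# Updated expectation: mu_up = mu + sigma A' (A sigma A')^{-1} (a - A mu)
K = sigma @ A.T @ np.linalg.inv(A @ sigma @ A.T)
mu_up = mu + K @ (a - A @ mu)

print(A @ mu_up)                     # reproduces the view: 0.2
```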
4 Numerical results
We consider base distributions (1) whose analytical expression is known, possibly up to a multiplicative constant term
for some known analytical function , which we call “numerator”.
Efficient Markov chain Monte Carlo (MCMC) techniques are available to draw scenarios from the broad class (37); see [Chib and Greenberg, 1995] and [Geweke, 1999].
With general inference of expectation type (7), the MRE updated distribution (8) is an exponential tilt of the base distribution (37), and therefore it again has an analytical expression, up to a constant
An efficient algorithm to compute an approximate updated distribution and approximate Lagrange multipliers is the discrete MRE [Meucci, 2008]
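A minimal sketch of the discrete (scenario-based) MRE with a single hypothetical expectation constraint: the updated probabilities are an exponential tilt of the base probabilities, with the Lagrange multiplier obtained by minimizing the convex dual of the MRE problem:

```python
import numpy as np
from scipy.optimize import minimize

# Base scenarios with uniform probabilities, and one expectation view
# sum_j q_j g_j = mu_view (all values hypothetical).
rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
p = np.full(x.size, 1.0 / x.size)
g = x                                # view on the expectation of X itself
mu_view = 0.3                        # asserted feature value

def dual(theta):
    # log-partition minus theta * mu_view: convex in theta
    w = theta * g
    return np.log(p @ np.exp(w - w.max())) + w.max() - theta * mu_view

theta_opt = minimize(lambda t: dual(t[0]), x0=[0.0]).x[0]

# Updated probabilities: exponential tilt of the base probabilities
q = p * np.exp(theta_opt * g)
q /= q.sum()

print(q @ g)                         # matches the view: ~0.3
```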
The quality of the approximation (40) can be measured by the discrete relative entropy caused by the information perturbation, or, equivalently, the exponential of its negative counterpart, i.e. the effective number of scenarios in [Meucci, 2012]
The approximation in general is poor for problems of large dimensions: because the scenarios are the same as the base scenarios, when the information constraints (7) are strongly violated by the base distribution (37), the curse of dimensionality forces a few scenarios to carry most of the probability, which amounts to too low an effective number of scenarios (41). Instead, because of the low dimension of the information constraints (3), the approximate Lagrange multipliers are much more accurate. Here we show how to exploit this feature to obtain accurate representations of the updated distribution.
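The effective number of scenarios (41) can be computed directly from the updated probabilities; a minimal sketch:

```python
import numpy as np

# Effective number of scenarios: the exponential of the entropy of the
# probability vector. Uniform probabilities over J scenarios give J;
# concentration on a few scenarios drives the value down toward 1.
def effective_number_of_scenarios(q):
    q = np.asarray(q, dtype=float)
    q = q[q > 0]                     # 0 * log(0) contributes nothing
    return np.exp(-q @ np.log(q))

print(effective_number_of_scenarios(np.full(100, 0.01)))        # 100.0
print(effective_number_of_scenarios([0.97, 0.01, 0.01, 0.01]))  # ~1.2
```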
To this purpose, let us write the exact updated numerator (39) as
which can be interpreted as an MRE tilt as in (39), but with a new base
and new Lagrange multipliers
because the numerical MRE multipliers (40) are close to the true ones.
Hence, we can generate new scenarios from the updated base (43)
and use the simulation output as input for the discrete MRE algorithm (40) to obtain new multipliers and new probabilities
The quality of the approximation (48) is better than the original output (40), because here the starting point is closer to the MRE updated distribution (46) and thus the curse of dimensionality is mitigated. Furthermore, the new output respects the inference constraints (7) exactly
unlike the simulation input (47).
Then we can update the Lagrange multipliers
We summarize the iterative MRE in the following table.
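As a toy illustration of the iteration (not the Hamiltonian Monte Carlo implementation of the text): with a standard normal base and the linear statistic g(x) = x, the exponentially tilted base is again normal and can be sampled exactly, so each pass re-centers the simulation and accumulates the multipliers. All numerical values are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

# Toy iterative MRE: base N(0, 1), statistic g(x) = x, so the tilted base
# N(0,1) * exp(theta x) is N(theta, 1) and direct sampling stands in for
# the HMC step of the text.
rng = np.random.default_rng(2)
J, mu_view = 5000, 1.5               # a view strongly violated by the base

def discrete_mre(x, mu):
    # one-dimensional discrete MRE: convex dual minimization, then the tilt
    def dual(theta):
        w = theta * x
        return np.log(np.mean(np.exp(w - w.max()))) + w.max() - theta * mu
    theta = minimize(lambda t: dual(t[0]), x0=[0.0]).x[0]
    q = np.exp(theta * x)
    return theta, q / q.sum()

theta_tot = 0.0
for _ in range(3):                   # a few iterations suffice here
    x = rng.normal(loc=theta_tot, scale=1.0, size=J)  # updated tilted base
    theta, q = discrete_mre(x, mu_view)
    theta_tot += theta               # accumulate the Lagrange multipliers

print(q @ x)                         # final output reproduces the view: ~1.5
```

Each pass starts from a base closer to the updated distribution, so fewer scenarios are wasted and the effective number of scenarios of the final output is higher than in the single-pass discrete MRE.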
5 A case study
We consider target variables with normal base distribution (20)
and homogeneous expectations, standard deviations
and homogeneous correlations
Then we consider information constraints (4) as follows
Also, we assume that the constraints on correlations (55) do not alter the respective first and second moments of the variables, so that we can rewrite the information (55) as expectation conditions (7)
We simulate scenarios with uniform probabilities (38) from the normal base distribution (52). Then, from the base scenarios and the information (56) we compute the MRE updated distribution (39) using the iterative numerical routine (1). The routine reaches convergence in three steps with a threshold (51).
is a matrix as follows
is a vector as follows
is a matrix as follows
is a matrix as follows
and the updated correlations (28) read
In this article we showed how to solve analytically and numerically the MRE problem under exponential-family base distributions and partial information constraints of expectation type as in (13).
- [Amari and Nagaoka, 2000] Amari, S. and Nagaoka, H. (2000). Methods of Information Geometry. American Mathematical Society.
- [Amari, 2016] Amari, S.-i. (2016). Information Geometry and Its Applications, volume 194. Springer.
- [Black and Litterman, 1990] Black, F. and Litterman, R. (1990). Asset allocation: combining investor views with market equilibrium. Goldman Sachs Fixed Income Research.
- [Chao et al., 2015] Chao, W.-L., Solomon, J., Michels, D., and Sha, F. (2015). Exponential integration for Hamiltonian Monte Carlo. Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 1142–1151.
- [Chib and Greenberg, 1995] Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49:327–335.
- [Colasante, 2019] Colasante, M. (2019). Essays in Minimum Relative Entropy implementations for views processing. PhD thesis, Università di Bologna.
- [Cover and Thomas, 2006] Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. Wiley, 2nd edition.
- [Geweke, 1999] Geweke, J. (1999). Using simulation methods for Bayesian econometric models: Inference, development and communication. Econometric Reviews, 18:1–126.
- [Jaakkola, 1999] Jaakkola, T. (1999). Maximum entropy estimation. http://people.csail.mit.edu/tommi/papers.html. Machine learning seminar notes.
- [Magnus and Neudecker, 1979] Magnus, J. R. and Neudecker, H. (1979). The commutation matrix: Some properties and applications. Annals of Statistics, 7:381–394.
- [Meucci, 2008] Meucci, A. (2008). Fully Flexible Views: Theory and practice. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1213325. Risk, 21(10), 97-102.
- [Meucci, 2010] Meucci, A. (2010). The Black-Litterman approach: Original model and extensions. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1117574. The Encyclopedia of Quantitative Finance, Wiley.
- [Meucci, 2012] Meucci, A. (2012). Effective number of scenarios with Fully Flexible Probabilities. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1971808. GARP Risk Professional, 45-46.
- [Mina and Xiao, 2001] Mina, J. and Xiao, J. (2001). Return to RiskMetrics: The evolution of a standard. RiskMetrics publications.
- [Neal et al., 2011] Neal, R. M. et al. (2011). MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11).
- [Schofield, 2007] Schofield, E. (2007). Fitting maximum-entropy models on large sample spaces. PhD thesis, Graz University of Technology, Austria.
Appendix A
A.1 MRE with exponential-family base
where denotes the log-partition function as in (10)
Then the updated distribution (9) reads
where in the second row we used the linearity of the inference functions with respect to the sufficient statistics as in (13); and where we defined
which implies our desired result (14).
A.2 MRE update with normal base and information on non-central moments