The large literature on portfolio optimization with higher moments can be viewed as a spectrum of assumptions, ranging from the most general and weakest to the most restrictive and strongest. These articles can be classified in different contexts. An important one is whether parametric or nonparametric modeling is used. These two paradigms are also discussed in other fields of research such as machine learning, where kernel methods such as the kernel support vector machine (kernel-SVM) or nonlinear kernel dimensionality reduction are nonparametric approaches, while methods like the variational auto-encoder (VAE) are examples of parametric approaches. Both paradigms have their roots in statistics, which distinguishes parametric and nonparametric approaches to statistical problems. In abstract terms, the parametric approach makes a finite-dimensional approximation of the probability distribution of returns, while the nonparametric approach assumes no explicit form of the distribution, because our ignorance of reality prevents us from making strong assumptions about it. In reality, the space of distributions of asset returns is extremely complex. Using parametric distributions therefore imposes a finite-dimensional structure on the space of return distributions, and hence on the moments of the returns that enter the objective function of portfolio optimization. The importance of the utility function becomes clear when a Taylor series expansion of utility around the expected return is written out, which reveals the connection between utility and higher moments such as skewness and kurtosis. Thus,
where the second-order term involves the variance of the return and the third-order term involves the third moment about the mean, which measures skewness. The following utility functions are used in the literature:
However, the third utility function is used more often because of its convenient properties. Taking the expectation of the utility after the Taylor expansion produces:
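For orientation, a generic fourth-order version of this expansion can be sketched as follows. The notation is an assumption and is not taken from the original: $W$ is the end-of-period return (or wealth), $\bar{W}=E[W]$, and $\sigma^{2}$, $s^{3}$ and $\kappa^{4}$ denote the second, third and fourth central moments of $W$.

```latex
\begin{align*}
U(W) &\approx \sum_{k=0}^{4} \frac{U^{(k)}(\bar{W})}{k!}\,(W-\bar{W})^{k},\\
E[U(W)] &\approx U(\bar{W}) + \frac{U''(\bar{W})}{2}\,\sigma^{2}
          + \frac{U'''(\bar{W})}{6}\,s^{3}
          + \frac{U''''(\bar{W})}{24}\,\kappa^{4}.
\end{align*}
```

The first-order term vanishes because $E[W-\bar{W}]=0$, which is why only the variance, skewness and kurtosis terms remain next to the utility of the expected return.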
Thus, the following optimization problem is produced:
where ⊗ stands for the Kronecker product. (Glawischnig and Seidl, 2013) use an iterative approach to solve this objective function without using polynomial goal programming (PGP). They start with a parameter value of twenty and then reduce it at each step down to two, to avoid short selling of more than one hundred percent. The algorithm is multi-start in order to obtain robust results. Their trick is to compute the skewness and kurtosis in the first iteration and then fix them, so that the full optimization problem becomes a valid quadratic program that quadratic solvers can handle. However, this iterative approach changes the nature of the optimization problem by neglecting the third- and fourth-order polynomial terms, which could make a big difference in the result. (Glawischnig and Seidl, 2013) and (Jondeau and Rockinger, 2006) use almost the same ideas, and their approach has an advantage over PGP because the weights in the objective arise naturally from the Taylor approximation of utility rather than from the arbitrary selection used in PGP.
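The role of the higher moments in the expected utility can be illustrated with a small sketch. It assumes an exponential (CARA) utility U(R) = -exp(-lambda R), which may or may not be the utility function the text refers to, and it uses made-up scenario returns; all function names are hypothetical. The co-skewness function evaluates the third moment element by element, which is the same quantity that the Kronecker form w' M3 (w ⊗ w) computes.

```python
import math

def portfolio_returns(scenarios, weights):
    """Portfolio return in each scenario (rows = scenarios, columns = assets)."""
    return [sum(w * r for w, r in zip(weights, row)) for row in scenarios]

def central_moment(values, order):
    """Sample central moment of the given order (divides by T, not T - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def coskewness_form(scenarios, weights):
    """Third central moment as sum_ijk w_i w_j w_k M3[i][j][k].

    This is the element-wise version of w' M3 (w kron w).
    """
    n, t = len(weights), len(scenarios)
    means = [sum(row[i] for row in scenarios) / t for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                m3_ijk = sum(
                    (row[i] - means[i]) * (row[j] - means[j]) * (row[k] - means[k])
                    for row in scenarios
                ) / t
                total += weights[i] * weights[j] * weights[k] * m3_ijk
    return total

def cara_expected_utility_4th_order(scenarios, weights, risk_aversion):
    """Fourth-order Taylor approximation of E[-exp(-lambda * R_p)] around the mean."""
    rp = portfolio_returns(scenarios, weights)
    mu = sum(rp) / len(rp)
    m2, m3, m4 = (central_moment(rp, k) for k in (2, 3, 4))
    lam = risk_aversion
    return -math.exp(-lam * mu) * (
        1.0 + lam**2 / 2.0 * m2 - lam**3 / 6.0 * m3 + lam**4 / 24.0 * m4
    )

# Made-up scenario returns for two assets (illustrative only).
SCENARIOS = [
    [0.02, 0.01],
    [-0.01, 0.03],
    [0.04, -0.02],
    [0.00, 0.01],
    [-0.03, 0.02],
]
WEIGHTS = [0.6, 0.4]
```

Under CARA utility the signs alternate: variance and kurtosis lower the approximate expected utility, while positive skewness raises it.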
2 Parametric and nonparametric modeling
The motivation for parametric modeling comes from several intuitions, such as constraining the structure of the return distributions of financial securities and approximating the complex nature of asset returns. (Adcock, 2014) assumes a coherent multivariate probability distribution for asset returns so that Stein's lemma can be used to construct the mean-variance-skewness efficient hyper-surface. The assumption is that asset returns are given by the convolution of a multivariate elliptically symmetric distribution and a multivariate distribution of non-negative random variables, so that efficient portfolios can be computed by quadratic programming on the efficient surface. The multivariate skew normal (MSN) distribution, which was first introduced in (Azzalini, 1996) and applied to finance in (Adcock and Shutes, 1999), has attracted many researchers and is at the center of many portfolio selection methods. One motivation for using the MSN is that utility maximization becomes simple, especially if utility is an exponential function of return, as explained in (Adcock, 2005),(Landsman et al., 2019). A drawback of this approach, explained in (Adcock, 2005), arises when the preference parameter is very high or very small. A bigger disadvantage is that the choice of an exponential utility function by itself shapes the mean-variance-skewness efficient surface. Another drawback is that generalizing this approach to kurtosis and higher moments is not straightforward. A generalization of the MSN can be found in (Sahu et al., 2003), where analytic forms of the densities are obtained and their distributional properties are studied. Parametric approaches to modeling return distributions are very diverse, as many researchers have noted: (Jondeau and Rockinger, 2003) uses a generalized Student-t distribution, and (Mencía and Sentana, 2009) uses location-scale mixtures of normals with maximum likelihood to infer the parameters. The log-normal distribution has also been used in the literature to model asset returns, but its skewness is a function of the mean and the variance, not a separate skewness parameter. An interesting example that combines parametric modeling (log-normal) of the portfolio distribution with goal programming is (Chang et al., 2008b).

(Glawischnig and Seidl, 2013) and (Jondeau and Rockinger, 2006) are examples of the nonparametric approach to portfolio optimization with higher moments, since no distribution is assumed. The difficulty with the skewness term is that it makes the optimization problem non-convex and therefore not tractable in general. This motivates (Konno et al., 1993), which, without any assumption on the form of the probability distribution of returns, approximates the third-order term due to skewness by a piecewise linear function; however, this approximation should be tested on more experimental data to see to what extent it is valid. (Konno et al., 1998) uses a similar approach and resolves the third-order nonlinearity of skewness by representing it as a difference of two convex functions and then applying branch and bound to solve the mean-variance-skewness portfolio optimization problem. A more detailed and visual derivation of (Konno et al., 1998) can be found in (Konno and Suzuki, 1995),(Konno and Yamamoto, 2005). The problem with the ideas in (Konno et al., 1993),(Konno et al., 1998),(Konno and Suzuki, 1995),(Konno and Yamamoto, 2005) is that they cannot easily be generalized to higher moments, and the order of complexity would be high. Another example of nonparametric higher moment portfolio optimization is based on the concept of the shortage function; the geometric representation of the mean-variance-skewness portfolio frontier is illustrated in (Kerstens et al., 2011).
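The remark that the log-normal distribution has no free skewness parameter can be made concrete with a short sketch. It uses the standard closed-form moments of the log-normal distribution; the function name is hypothetical.

```python
import math

def lognormal_skewness(mean, variance):
    """Skewness of a log-normal variable with the given mean and variance.

    The variance of the underlying normal is sigma^2 = ln(1 + variance / mean^2),
    and the skewness is (exp(sigma^2) + 2) * sqrt(exp(sigma^2) - 1).
    """
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    exp_sigma2 = 1.0 + variance / mean**2
    return (exp_sigma2 + 2.0) * math.sqrt(exp_sigma2 - 1.0)

# Once mean and variance are fixed, the skewness is fixed as well.
skew = lognormal_skewness(1.0, 0.04)
```

Writing c for the coefficient of variation (standard deviation divided by mean), the expression reduces to c^3 + 3c, so two log-normal variables with the same coefficient of variation always have the same skewness.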
3 A priori, interactive and posteriori
The literature on higher moment portfolio optimization can also be classified in a different context, namely how the preferences are presented in the optimization process. Three paradigms are discussed in the literature: a priori, interactive and posteriori. A priori methods such as goal programming and the utility function method are used when the preferences of the investor are known beforehand. In goal programming, each objective is first optimized on its own, regardless of the other objectives. Then a final scalar optimization problem is solved whose weights need to be fixed. Even if all possible weights are checked, some Pareto efficient solutions may still be missed. Examples of goal programming for higher moment portfolio optimization are (Aksarayli and Pala, 2018),(Bergh and van Rensburg, 2008). Although the algorithm is simple, the choice of appropriate weights is a matter of debate. It can also produce solutions that are not Pareto efficient, in which case some algorithm is needed to project them back onto the Pareto efficient set. Goal programming has many variants, as explained in (Aouni et al., 2014),(Tamiz et al., 2013):
1- Lexicographic
2- Weighted, see for example (Chang, 2011)
3- Polynomial, see for example (Chang et al., 2008a),(Chunhachinda et al., 1997),(Davies et al., 2009),(Lai, 1991),(Mhiri and Prigent, 2010),(Proelss and Schweizer, 2014), and also (Ghahtarani and Najafi, 2013) if robustness with respect to some coefficients of the optimization is a concern as well
4- Stochastic
5- Fuzzy
Another paradigm is to use posteriori methods, where the target is to find the whole Pareto efficient frontier. Such methods include the linear weighting method, the weighted geometric mean, Normal Boundary Intersection (NBI) (Audet et al., 2008), Modified Normal Boundary Intersection (MNBI), Normal Constraint, multiple-objective branch-and-bound, the epsilon-constraint method and Pascoletti Serafini scalarization. Finally, there is the interactive paradigm, where the investor iteratively solves the optimization and gets feedback from the solutions to find the Pareto optimal solutions. An example of using the epsilon-constraint method for portfolio optimization can be seen in (Xidonas et al., 2010a),(Xidonas et al., 2010b),(Xidonas et al., 2011), which also includes an interactive filtering process to consider investor preferences. Methods like NBI are computationally expensive, since an optimization problem must be solved at each iteration, but on the other hand they have a geometric intuition that can also be used for other methods, as explained in (Ghane-Kanafi and Khorram, 2015).
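The statement that checking all weights can still miss Pareto efficient solutions can be illustrated on a tiny, made-up set of objective vectors (both objectives are minimized). The sketch below is only an illustration on a finite set, with hypothetical function names; it compares the linear weighting method with the epsilon-constraint method.

```python
def weighted_sum_solutions(points, num_weights=101):
    """Points selected by minimizing w*f1 + (1-w)*f2 over a grid of weights w."""
    found = set()
    for i in range(num_weights):
        w = i / (num_weights - 1)
        best = min(points, key=lambda p: w * p[0] + (1.0 - w) * p[1])
        found.add(best)
    return found

def epsilon_constraint_solutions(points, epsilons):
    """Points selected by minimizing f1 subject to f2 <= eps, for each eps."""
    found = set()
    for eps in epsilons:
        feasible = [p for p in points if p[1] <= eps]
        if feasible:
            # Ties in f1 are broken by f2 to avoid weakly dominated points.
            found.add(min(feasible, key=lambda p: (p[0], p[1])))
    return found

# Three mutually non-dominated points; the middle one lies in a non-convex
# part of the front (0.6 + 0.6 > 1), above the line joining the other two.
FRONT = [(0.0, 1.0), (0.6, 0.6), (1.0, 0.0)]

by_weights = weighted_sum_solutions(FRONT)
by_epsilon = epsilon_constraint_solutions(FRONT, [1.0, 0.6, 0.0])
```

On this example the weighted sum only ever returns the two extreme points, whatever the weight, while the epsilon-constraint method recovers all three.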
4 Pascoletti Serafini Scalarization (SP)
The novelty of their approach lies in the conditions that bound the parameter a on a restricted hyperplane, but it can only be used in two dimensions and is therefore not applicable to our four-dimensional objective function, which covers skewness and kurtosis as well. (Khorram et al., 2014) attempts to find the restricted set for the parameter a, but considers only the trivial point zero for a and only the simple EP cone.
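As a reminder of how the parameters a and r act, the Pascoletti Serafini problem in its usual form (minimize t subject to a + t r - F(x) in K) can be sketched on a finite set of objective vectors. The sketch assumes the ordering cone K is the nonnegative orthant and that r has strictly positive components; the function name and the data are hypothetical.

```python
def pascoletti_serafini(points, a, r):
    """Solve min t s.t. a + t*r - y >= 0 (componentwise) over a finite set of y.

    Assumes K is the nonnegative orthant and every component of r is positive,
    so the smallest feasible t for a given y is max_i (y_i - a_i) / r_i.
    """
    if any(ri <= 0 for ri in r):
        raise ValueError("this sketch requires r > 0 componentwise")

    def smallest_t(y):
        return max((yi - ai) / ri for yi, ai, ri in zip(y, a, r))

    best = min(points, key=smallest_t)
    return smallest_t(best), best

# Made-up objective vectors (both objectives minimized).
FRONT = [(0.0, 1.0), (0.6, 0.6), (1.0, 0.0)]

t_mid, y_mid = pascoletti_serafini(FRONT, a=(0.0, 0.0), r=(1.0, 1.0))
t_top, y_top = pascoletti_serafini(FRONT, a=(0.0, 0.9), r=(1.0, 1.0))
```

Moving the reference point a while keeping the direction r fixed selects different points of the front, which is why the admissible set for a matters.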
4.1 Shortage Function (SF)
(Briec et al., 2004) introduced the shortage function as a portfolio performance measure in the traditional mean-variance portfolio framework. The original shortage function is computed by solving the following problem:
where g is the direction vector; the effect of choosing g is well described in (Kerstens et al., 2012). How the shortage function can be useful in the mean-variance-skewness portfolio framework is explained in (Briec et al., 2007). The connection between the shortage function and the NBI method is explained in the present paper, while the connection between the shortage function and polynomial goal programming (PGP) is well described in (Briec et al., 2013). Next, a special case of the shortage function, called the modified shortage function (MSF), is defined; it is useful for showing the connection with the NBI method.
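A minimal sketch of the mean-variance shortage function on a finite set of candidate portfolios is given below. It assumes the usual form of the measure, namely the largest delta such that some attainable portfolio has mean at least E + delta g_E and variance at most V - delta g_V, with both direction components positive. The candidate set, the names and the numbers are hypothetical.

```python
def shortage(evaluated, candidates, g_mean, g_var):
    """Shortage function of a (mean, variance) pair over a finite candidate set.

    For each candidate, the largest admissible delta is limited by both the
    mean improvement and the variance reduction; the best candidate is kept.
    """
    if g_mean <= 0 or g_var <= 0:
        raise ValueError("direction components must be positive in this sketch")
    mean, var = evaluated
    return max(
        min((c_mean - mean) / g_mean, (var - c_var) / g_var)
        for c_mean, c_var in candidates
    )

# Made-up (mean, variance) pairs of attainable portfolios.
CANDIDATES = [(0.05, 0.01), (0.08, 0.02), (0.10, 0.04)]

inefficient = shortage((0.06, 0.03), CANDIDATES, g_mean=1.0, g_var=1.0)
efficient = shortage((0.08, 0.02), CANDIDATES, g_mean=1.0, g_var=1.0)
```

A value of zero indicates that the evaluated portfolio cannot be improved in the direction g within the candidate set, while a positive value measures how far it can be moved toward the frontier.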
The modified shortage function (MSF) is defined as the solution of the following optimization problem:
4.2 NBI and SP
In this section, the equivalence of NBI and MSF is proved. One of the most important scalarization techniques is the normal boundary intersection (NBI) method, which reads as follows:
The optimization problem (4.3) is solved for different $\beta$ with $\sum_{i}\beta_{i}=1$. Here $F^{*}$ denotes the so-called ideal point, the matrix $\Phi$ consists of the columns $F(x^{i})-F^{*}$, where $x^{i}$ is a minimizer of the $i$-th objective, and the vector $\bar{n}$ is the unit normal vector to the hyperplane, pointing toward the negative orthant. A drawback of this method is that not every minimal point can be obtained as a solution of NBI. The following lemma shows the direct connection between NBI and the modified SP.
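Lemma (Eichfelder, 2008). A point $(\bar{s},\bar{x})$ is a maximal solution of NBI($\beta$) with $\beta\in\mathbb{R}^{m}$, $\sum_{i}\beta_{i}=1$, if and only if $(-\bar{s},\bar{x})$ is a minimal solution of $\overline{SP}(a,r)$ with $a=F^{*}+\Phi\beta$ and $r=-\bar{n}$.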
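The substitution behind this lemma can be checked in one line. The block below states the two problems in their standard form; the symbols ($F$ for the objective vector, $\Omega$ for the feasible set, $F^{*}$ for the ideal point, $\Phi$, $\bar{n}$, and the bar on SP for the equality-constrained variant) are assumed notation, since the original equations are not available here.

```latex
\begin{align*}
\text{NBI}(\beta):\quad
  & \max_{s,\,x}\; s
  \quad\text{s.t.}\quad \Phi\beta + s\,\bar{n} = F(x) - F^{*},
  \;\; s\in\mathbb{R},\; x\in\Omega,\\
\overline{\text{SP}}(a,r):\quad
  & \min_{t,\,x}\; t
  \quad\text{s.t.}\quad a + t\,r - F(x) = 0,
  \;\; t\in\mathbb{R},\; x\in\Omega.
\end{align*}
% With a = F^* + \Phi\beta, r = -\bar{n} and t = -s:
\[
a + t\,r - F(x)
  = F^{*} + \Phi\beta + (-s)(-\bar{n}) - F(x)
  = \Phi\beta + s\,\bar{n} - \bigl(F(x) - F^{*}\bigr) = 0 .
\]
```

The two constraint sets therefore coincide under $t=-s$, and maximizing $s$ is the same as minimizing $t$.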
4.3 NBI and MSF
Another connection, between NBI and the modified shortage function, is proved in the next proposition:
Modified shortage function scalarization is equivalent to NBI under the following substitutions:
The proof is straightforward by simple substitution. ∎
4.4 SP and SF
Shortage function scalarization is equivalent to Pascoletti Serafini scalarization under the following substitutions:
Since and the reference point can be set as any point in , substituting the reference point (a) and the direction (r) in the SP constraint, which is , and choosing the cone K to be trivial, the proof is complete. ∎
4.5 NBI and goal programming
It is shown here that the popular polynomial goal programming (PGP) method, which is widely used in the portfolio optimization literature, has a close connection with NBI. It is defined as follows:
Polynomial goal programming (PGP) is defined as:
A solution to NBI problem is also a solution to PGP portfolio optimization problem.
Since is the solution of NBI it satisfies the first order KKT conditions:
where represents the multipliers corresponding to the three equality constraints, namely the return, variance and skewness constraints. On the other hand, the first-order KKT conditions for PGP can be decomposed over two different sets of coordinates. The stationarity equations for the first set of coordinates give:
Now stationarity with respect to second set of coordinates yields:
Now expanding in 4.7 generates:
Simplifying 4.8 would produce:
Equivalently, is the solution of the PGP problem. Since F and X and in the NBI problem are the same as Z and in the PGP problem, and since for any solution and Lagrange multipliers there exist equivalent ones by suitable substitutions for and like (4.12), the proof is complete. ∎
There are quality measures that can be used to judge which algorithm is better than another. These measures are well explained in (Eichfelder, 2008), namely coverage error, uniformity level and cardinality (number of points).
These three measures are conflicting in nature, and another multiobjective optimization problem on top of the main one may be formed for a rigorous analysis. Satisfying these types of measures is what most researchers refer to as adaptive considerations. In another context, there are two important aspects of any multiobjective optimization, namely accuracy and diversity. The former forces the solutions to converge to the Pareto frontier, while the latter makes the efficient set as equidistant as possible. There are many other measures in the literature, such as the hypervolume indicator, but implementing some of them makes the algorithm very slow, and their convergence has not even been proved.
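The three quality measures can be sketched for a finite approximation of the Pareto front. The definitions used below are the common ones (coverage error as the largest distance from a reference front point to its nearest approximation point, uniformity level as the smallest distance between two approximation points); whether they match the exact formulas of the cited book is an assumption, and the reference front is made up.

```python
import math

def coverage_error(reference_front, approximation):
    """Largest distance from a reference point to its nearest approximation point."""
    return max(
        min(math.dist(y, a) for a in approximation) for y in reference_front
    )

def uniformity_level(approximation):
    """Smallest distance between two distinct approximation points (needs >= 2 points)."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(approximation)
        for q in approximation[i + 1:]
    )

def cardinality(approximation):
    """Number of points in the approximation."""
    return len(approximation)

# Made-up reference front and a coarser approximation of it.
REFERENCE = [(0.0, 1.0), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1.0, 0.0)]
APPROX = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
```

The conflict between the measures is visible here: adding points lowers the coverage error but raises the cardinality and lowers the uniformity level.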
So far, mostly Pascoletti Serafini methods have been discussed in the present paper, but there are other ideas in the literature, as depicted in Figure 2. Set oriented methods steer a set of solutions at each iteration, such as (Hernandez et al., 2013) for the cell mapping method or (Dellnitz et al., 2005) for subdivision algorithms. In the present paper, two adaptive algorithms for higher moment multiobjective portfolio optimization are given. The first one is an adaptive epsilon constraint method, while the second one is not based on scalarization; it has its roots in (Hillermeier, 2001) and has recently been developed further by many researchers, as in (Martín and Schutze, 2018), (Schutze et al., 2020), and generalized to problems with inequality constraints in (Beltran et al., 2020). Both of the proposed algorithms in this section are based on KKT conditions, but the approaches are slightly different. Both methods are designed to produce equidistant Pareto frontier points. The equidistance parameter is named differently in the two algorithms, to mirror the variables used in the related historical articles. So both methods are adaptive in the sense of producing equally spaced points on the efficient frontier, but no other considerations are taken for the delta and cardinality quality measures, since the methods are expected to produce good results and such considerations would add to the complexity of the algorithms. The first algorithm is based on the epsilon constraint method, which is a special case of Pascoletti Serafini scalarization, while the second algorithm is based on continuation methods; the connections are shown in Figure 2.
5.1 Adaptive Epsilon Constraint
Adaptive ECS is described as
The scalar optimization problem (5.1) can be formulated as
Theorem 2.27 in (Eichfelder, 2008) shows how SP can be related to the epsilon-constraint method via Lagrange multipliers. Thus, (5.2) is equivalent to SP(a,r) with the following substitutions for a and r:
The full algorithm is shown in Algorithm 1.
The simulation results for the adaptive epsilon constraint method are illustrated in Figure 3. As explained in the previous section, the epsilon constraint method is just a special case of SP, and this algorithm is used for multiobjective portfolio optimization with three objectives, namely return, variance and skewness.
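The idea of adapting epsilon through Lagrange multipliers can be sketched on a toy bi-objective problem. This is not Algorithm 1: the subproblem solver is replaced by a closed-form stand-in on a made-up front f1 = (1 - eps)^2, and the update rule is the plain first-order one, in which the multiplier mu of the constraint f2 <= eps gives the local slope of the front, so that a step of alpha / sqrt(1 + mu^2) in eps moves approximately alpha along the front. All names are hypothetical.

```python
import math

def solve_toy_subproblem(eps):
    """Closed-form stand-in for: min f1 s.t. f2 <= eps, on the front f1 = (1 - eps)^2.

    Returns (f1, f2, mu), where mu = -d f1 / d eps is the Lagrange multiplier
    of the epsilon constraint.
    """
    f1 = (1.0 - eps) ** 2
    mu = 2.0 * (1.0 - eps)
    return f1, eps, mu

def adaptive_epsilon_constraint(alpha, eps_start=0.0, eps_end=1.0):
    """Generate front points whose spacing is approximately alpha."""
    points = []
    eps = eps_start
    while eps <= eps_end:
        f1, f2, mu = solve_toy_subproblem(eps)
        points.append((f1, f2))
        # First-order estimate: a step d_eps moves sqrt(1 + mu^2) * d_eps on the front.
        eps += alpha / math.sqrt(1.0 + mu * mu)
    return points

points = adaptive_epsilon_constraint(alpha=0.1)
gaps = [math.dist(p, q) for p, q in zip(points, points[1:])]
```

On this toy front the consecutive distances stay within a few percent of alpha; the remaining deviation comes from the curvature that the first-order rule ignores.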
5.2 Adaptive Multi-start Pareto Tracer
There are two main ideas in this approach: 1- The KKT condition for a single objective is generalized to the multiobjective case, as described in (Hillermeier, 2001). 2- A predictor-corrector idea, which first predicts the next move in decision space and then corrects it by a multiobjective gradient descent. Algorithm 2 is a modification of (Martín and Schutze, 2018), obtained by running it in a multi-start way, shaping the distribution in objective space, and customizing it for portfolio optimization with three objectives, namely mean, variance and skewness. Consider the multiobjective optimization problem defined below:
where F is a vector of objectives. F in (5.4) is actually a three-dimensional vector of mean, variance and skewness, and the decision space has dimension n, which refers to the number of assets or factors in a multifactor investment framework. h in (5.4) is the constraint that the allocations to the different assets sum to one. A predictor-corrector method is developed in (Hillermeier, 2001) by considering
The set of KKT points of (5.4) is contained in the zero set of the map in (5.5), which is the idea behind many continuation methods. A simple representation of the tangent vectors to the Pareto set can then be written as
Thus the vector in 5.6 can be expressed as
where in (5.7) is defined as
and J is defined by
Now d has a special meaning: it expresses the first-order approximation of the movement in objective space for infinitesimal step sizes, and is defined below
Each choice of d generates a different distribution of points on the Pareto front. Since the resulting vectors are tangents to the Pareto set and the aim is an even spread of points on the Pareto front, the directions d should be selected such that they form an orthonormal basis of the tangent space to the Pareto front at F(x). One natural way to do this is to use a QR factorization and to select
can be solved. Now the predictor can be obtained. To produce an evenly distributed set of solutions along the Pareto front, the following approximation for t can be chosen
The predictor part explained so far moves along the tangent direction of the Pareto set to create an equidistant set of Pareto front points. The next part is the corrector, which takes care of convergence and is explained in (Fliege et al., 2009); the present paper implements it for the corrector part of the algorithm. Other methods can be used for this step, such as (Povalej, 2014), which approximates the second derivative matrices instead of evaluating them, although both methods have a superlinear rate of convergence. The corrector implemented in the present paper is based on the minimization of the following optimization problem.
Thus, the full predictor-corrector algorithm is shown in Algorithm 2.
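Two ingredients of the predictor can be sketched in a few lines. The sketch rests on assumptions that are not confirmed by the text above: that the quantity being factored is the KKT weight vector alpha (at a KKT point the weighted gradients sum to zero, so every first-order movement J nu in objective space is orthogonal to alpha), and that the step size is chosen as tau / ||J nu|| so that the predicted movement in objective space has length tau. Gram-Schmidt is used in place of a library QR routine; all names are hypothetical.

```python
import math

def tangent_basis(alpha):
    """Orthonormal basis of the hyperplane orthogonal to alpha (Gram-Schmidt).

    The vectors span the same space as columns 2..k of the Q factor in a
    QR factorization of alpha.
    """
    k = len(alpha)
    norm = math.sqrt(sum(a * a for a in alpha))
    basis = [[a / norm for a in alpha]]  # first vector: normalized alpha
    for i in range(k):
        v = [1.0 if j == i else 0.0 for j in range(k)]
        for b in basis:
            proj = sum(vj * bj for vj, bj in zip(v, b))
            v = [vj - proj * bj for vj, bj in zip(v, b)]
        length = math.sqrt(sum(vj * vj for vj in v))
        if length > 1e-10:
            basis.append([vj / length for vj in v])
        if len(basis) == k:
            break
    return basis[1:]  # drop alpha itself, keep the tangent directions

def step_size(jacobian, nu, tau):
    """Step t with ||J (t * nu)|| = tau: a first-order move of length tau in objective space."""
    j_nu = [sum(jij * nj for jij, nj in zip(row, nu)) for row in jacobian]
    return tau / math.sqrt(sum(c * c for c in j_nu))

# Made-up weight vector for three objectives and a made-up 3 x 2 Jacobian.
ALPHA = [0.5, 0.3, 0.2]
d1, d2 = tangent_basis(ALPHA)
t = step_size([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]], nu=[1.0, 0.0], tau=0.1)
```

For three objectives this yields two orthonormal directions in objective space, which is what an even two-dimensional spread of points on the front requires.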
The paradigms in higher moment multiobjective optimization are critically reviewed in the present paper, and the connections between some of them are explained. It has been proved that the shortage function method can be seen as a Pascoletti Serafini scalarization. Finally, two algorithms for portfolio optimization are suggested. The first one is based on the scalarization paradigm and is called the adaptive epsilon constraint method, while the second one is a type of continuation method and is called the adaptive multi-start Pareto Tracer, which bundles different local solutions to provide a global Pareto front by both exploration and exploitation.
7 Future Works
So instead of a fixed cone K for ordering, a variable cone is considered, and each point can have a different ordering corresponding to a different cone. The second suggested algorithm can be further developed by hybridizing it with evolutionary algorithms, such as genetic algorithms, to make it faster. Another line of research that could be theoretically interesting is to investigate whether the continuation method suggested in the second algorithm can be seen as a generalization of Pascoletti Serafini scalarization, since the continuation framework works directly on the KKT conditions of a multiobjective optimization problem, while scalarization takes advantage of the KKT conditions of a single-objective problem.
- Mean-variance-skewness efficient surfaces. European Journal of Operational Research 234, pp. 392–401.
- Portfolio selection based on the multivariate skew normal distribution. Financial Modeling 11, pp. 167–177.
- Exploiting skewness to build an optimal hedge fund with a currency overlay. The European Journal of Finance 11, pp. 445–462.
- A polynomial goal programming model for portfolio optimization based on entropy and higher moments. Expert Systems with Applications 94, pp. 185–192.
- Financial portfolio management through the goal programming model: current state-of-the-art. European Journal of Operational Research 234, pp. 536–545.
- Multiobjective optimization through a series of single objective formulations. Journal of Optimization 19, pp. 188–210.
- The multivariate skew normal distribution. Biometrika 83, pp. 715–726.
- The Pareto tracer for general inequality constrained multi-objective optimization problems. Mathematical and Computational Applications 25 (4).
- Hedge funds and higher moment portfolio selection. Journal of Derivatives and Hedge Funds 14, pp. 269–297.
- Mean variance skewness portfolio performance gauging: a general shortage function and its dual approach. Management Science 53, pp. 135–149.
- Single period Markowitz portfolio selection, performance gauging and duality, a variation on Luenberger shortage function. Journal of Optimization Theory and Applications 120, pp. 1–27.
- Portfolio selection with skewness: a comparison of methods and a generalized one fund result. European Journal of Operational Research 230, pp. 412–421.
- Multi-choice goal programming with utility functions. European Journal of Operational Research 215, pp. 439–445.
- Effect of intervalling and skewness on portfolio selection in developed and developing markets. Applied Financial Economics 18, pp. 1697–1707.
- Optimum allocation of weights to assets in a portfolio: the case of nominal annualization versus effective annualization of returns. Applied Financial Economics 18, pp. 1635–1646.
- Portfolio selection and skewness: evidence from international stock markets. Journal of Banking and Finance 21, pp. 143–167.
- Fund of hedge funds portfolio selection a multiple-objective approach. Journal of Derivatives and Hedge Funds 15, pp. 91–115.
- Covering Pareto sets by multilevel evolutionary subdivision techniques. Journal of Optimization Theory and Applications 124, pp. 113–136.
- Adaptive scalarization methods in multiobjective optimization. Springer.
- Numerical procedures in multiobjective optimization with variable ordering structures. Journal of Optimization Theory and Applications 42, pp. 1–26.
- Variable ordering structures in vector optimization. Springer.
- Newton’s method for multiobjective optimization. SIAM Journal on Optimization 20, pp. 602–626.
- Robust goal programming for multi objective portfolio selection problem. Economic Modelling 33, pp. 588–592.
- A new scalarization method for finding the efficient frontier in non-convex multi-objective problems. Applied Mathematical Modelling 39, pp. 7483–7498.
- Portfolio optimization with serially correlated, skewed and fat tailed index returns. CEJOR 21, pp. 153–176.
- Simple cell mapping method for multi-objective optimal feedback control design. International Journal of Dynamics and Control 1, pp. 231–238.
- Nonlinear multiobjective optimization: a generalized homotopy approach. Vol. 25, International Series on Numerical Mathematics. Basel: Birkhauser.
- Optimal portfolio allocation under higher moments. European Financial Management 12, pp. 29–55.
- Conditional volatility, skewness, and kurtosis: existence, persistence, and comovements. Journal of Economic Dynamics and Control 27, pp. 1699–1737.
- Geometric representation of the mean–variance–skewness portfolio frontier based upon the shortage function. European Journal of Operational Research 210, pp. 81–94.
- Benchmarking mean-variance portfolios using a shortage function: the choice of direction vector affects rankings. Journal of the Operational Research Society 63, pp. 1199–1212.
- A numerical method for constructing the Pareto front of multi-objective optimization problems. Journal of Computational and Applied Mathematics 261, pp. 158–171.
- A mean-absolute deviation-skewness portfolio optimization model. Annals of Operations Research 45, pp. 205–220.
- A mean-variance-skewness portfolio optimization model. Journal of Operations Research 38, pp. 173–187.
- A branch and bound algorithm for solving mean-risk-skewness portfolio models. Optimization Methods and Software 10, pp. 297–317.
- A mean-variance-skewness model: algorithm and applications. International Journal of Theoretical and Applied Finance 8, pp. 409–423.
- Portfolio selection with skewness. Review of Quantitative Finance and Accounting 1, pp. 293–305.
- Analytic solution to the portfolio optimization problem in a mean-variance-skewness model. The European Journal of Finance 1, pp. 165–178.
- Pareto tracer: a predictor–corrector method for multi-objective optimization problems. Engineering Optimization 50, pp. 516–536.
- Multivariate location–scale mixtures of normals and mean–variance–skewness portfolio allocation. Journal of Econometrics 153, pp. 105–121.
- International portfolio optimization with higher moments. International Journal of Economics and Finance 2, pp. 157–169.
- Scalarizing vector optimization problems. Optimization Theory and Applications 42, pp. 499–524.
- Quasi-Newton’s method for multiobjective optimization. Journal of Computational and Applied Mathematics 255, pp. 765–777.
- Polynomial goal programming and the implicit higher moment preferences of US institutional investors in hedge funds. Financial Markets and Portfolio Management 28, pp. 1–28.
- A new class of multivariate skew distributions with applications to Bayesian regression models. The Canadian Journal of Statistics 31, pp. 129–150.
- Pareto explorer: a global/local exploration tool for many-objective optimization problems. Engineering Optimization 52, pp. 832–855.
- On selecting portfolio of international mutual funds using goal programming with extended factors. European Journal of Operational Research 226, pp. 560–576.
- Equity portfolio construction and selection using multiobjective mathematical programming. J Glob Optim 47, pp. 185–209.
- IPSSIS: an integrated multicriteria decision support system for equity portfolio construction and selection. European Journal of Operational Research 210, pp. 398–409.
- Portfolio construction on the Athens Stock Exchange: a multiobjective optimization approach. Optimization 59, pp. 1211–1229.