Strategic Bayesian Asset Allocation

by Vadim Sokolov et al.

Strategic asset allocation requires an investor to select stocks from a given basket of assets. Bayesian regularization is shown to not only provide stock selection but also optimal sequential portfolio weights. The investor's perspective is to maximize alpha, the risk-adjusted return relative to a benchmark index. Incorporating investor preferences via regularization is related to the approaches of Black and Litterman (1992) and Puelz et al. (2015). Tailored MCMC algorithms are developed to calculate portfolio weights and perform selection. We illustrate our methodology with an application to stock selection from the SP100 and the top fifty holdings of the Renaissance Technologies and Viking Global hedge fund portfolios. Finally, we conclude with directions for future research.





1 Introduction

Strategic asset allocation requires an investor to select stocks from a given basket of assets with the goal of outperforming a benchmark index by a margin alpha. We propose a Bayesian regularization based on returns and volatility that simultaneously performs asset selection and optimal portfolio allocation. Our approach builds on the sequential asset allocation framework of Black and Litterman (1992), which incorporates investors' preferences on the returns, and on the stock selection approach of Puelz et al. (2015), who use the methodology of decoupling shrinkage and selection (DSS) (Hahn and Carvalho, 2015) to select stocks using a regularized linear model.

Our methodology differs from traditional stock selection techniques. First, we recast the portfolio selection problem as an input-output regularization problem with the goal of optimizing expected returns subject to regularization, such as a required number of stocks in the portfolio. Second, we use sparsity-inducing prior distributions such as the spike-and-slab and the horseshoe. The output of our algorithm provides a natural ordering of assets to include in the optimal portfolio, and this ordering can be tracked dynamically in time.

A related approach relies on regularization with $\ell_1$ penalties, which has been well studied in the portfolio context. $\ell_1$-based approaches have their shortcomings, including over-shrinkage and the inability to recover sparse signals for highly dependent data. To address this issue, several authors have proposed non-convex approaches. Gasso et al. (2009) and Giuzio and Paterlini (2018) use non-convex penalties to handle highly dependent data and to allocate portfolios during a crisis. Other non-convex penalties include the smoothly clipped absolute deviation (SCAD) of Fan and Li (2001) and its linear approximation (Zhang et al., 2009). The bridge, or $\ell_q$, penalty (Frank and Friedman, 1993; Polson et al., 2014) is a generalization of the more widely used $\ell_1$ (LASSO) and $\ell_2$ (ridge) penalties. We review those approaches in Appendix A.

The traditional mean-variance selection approach was proposed by Markowitz (1952) and is optimal under the assumption that historical means and variances will hold in the future. In practice, one estimates those moments from historical data. The Markowitz portfolio selection approach is sensitive to errors in the estimated mean and variance, which limits its empirical applications. This is improved by Bayesian weight shrinkage (Polson and Tew, 2000) and further improved by sparsity-inducing priors that impose investors' beliefs. Black and Litterman (1992) proposed using a Gaussian prior to perform the shrinkage and framed this approach in the context of combining quantitative and subjective beliefs into a predictive model.

Regularization is the central tool that allows investors to perform stock selection. This requires the selection of a norm on the portfolio weights. Several authors have applied regularization to the problem of portfolio allocation. DeMiguel et al. (2009) build on the work of Jagannathan and Ma (2003) and Ledoit and Wolf (2004) and propose a general mean-variance portfolio allocation framework in which the norm of the portfolio weights is constrained. They show the duality of the constraint-based approach and a Bayesian approach in which the investor assigns a prior distribution to each of the weights. Lobo et al. (2007) show that the inclusion of transaction costs makes the regularized formulation non-convex and propose convex relaxations that can be solved efficiently. Brodie et al. (2009) showed that regularization techniques improve the predictive power of statistical models for stock portfolios. Fan et al. (2012) show that the regularized mean-variance approach achieves performance similar to the theoretically optimal portfolio while using a covariance matrix estimated from a sample.

The contribution of this paper is both methodological and empirical. On the methodological side, we develop a new Bayesian approach to optimal asset selection and allocation using the mean-variance formulation. On the empirical side, we compute optimal asset allocation and selection for both static buy-and-hold and dynamic optimal re-balancing.

We propose and evaluate three sparsity-inducing priors for the portfolio allocation problem, namely the Laplace, the horseshoe (Carvalho et al., 2010), and the spike-and-slab. We demonstrate the empirical performance of those three approaches and use the LARS algorithm (Efron et al., 2004b) to find the posterior mode for the Laplace model, MCMC for the horseshoe (Hahn et al., 2019), and the Single Best Replacement (SBR) algorithm (Polson and Sun, 2017) to find the posterior mode of the spike-and-slab model.

Our approach addresses both the problem of choosing the number of assets to be included in the portfolio and the problem of over-fitting by applying regularization techniques. Kandel et al. (1995) used Bayesian analysis to address the problem of sampling error (distribution shifts). Carrasco and Noumon (2011) applied regularization techniques to address the degeneracy of the covariance matrix estimated for a large number of assets from a historical sample. Polson and Tew (2000) provide a Bayesian approach to accurately estimating covariance matrices for large-scale portfolio problems.

1.1 Connection with Previous Work

Mean-variance portfolio analysis has a long-standing place in financial econometrics, with ground-breaking work done by De Finetti (1940) and Markowitz (1952). A number of practical considerations, such as the preference to include as few stocks as possible in the portfolio, still need further research. Scalability of optimization algorithms and the incorporation of transaction costs were addressed by Perold (1984), and a number of authors have provided Bayesian solutions (Barberis, 2000). From a statistical perspective, the predictive model of stock returns used in the mean-variance approach over-fits.

Our work builds on other Bayesian stock selection strategies, such as those based on factor modeling; see Black and Litterman (1992); Carvalho et al. (2011); Aguilar and West (2000); Puelz et al. (2015); Getmansky et al. (2015). Our main assumption is the predictability of stock returns (Barberis, 2000; Kandel and Stambaugh, 1996), which is the main justification for the advice that an investor should invest heavily in stocks.

A Bayesian approach naturally allows an investor to incorporate uncertainty about mean-variance parameters. For example, Carvalho et al. (2011) addressed the problem of change in covariance estimates by dynamically updating them as new observations arrive (Jacquier and Polson, 2012; Polson and Tew, 2000). Robust minimax optimization techniques were recently proposed to account for uncertainty in the covariance matrix and to solve for the worst-case scenario; see Ismail and Pham (2019). Puelz et al. (2015) use Bayesian techniques to design a mean-variance portfolio with a small number of assets and analyze the trade-off between optimality and the number of assets to be included. Kozak et al. (2018) design sparse factor models for the analysis of a large number of cross-sectional stock returns. Jacquier and Polson (2012) proposed a decision-theoretic framework for asset allocation that relies on Bayesian analysis; see Jacquier and Polson (2010); Avramov and Zhou (2010); Polson and Tew (2000) for further discussion.

2 Strategic Bayesian Asset Allocation

2.1 Portfolio Regularization

The traditional mean-variance portfolio optimization problem assumes that the returns $r_t = (r_{1t}, \ldots, r_{pt})$ of each of the $p$ assets in a portfolio follow a distribution with mean $\mu$ and covariance matrix $\Sigma$. Therefore, the returns of each asset are a weakly stationary stochastic process. Let $P_{it}$ be the price of asset $i$ at time $t$, so that $r_{it} = P_{it}/P_{i,t-1} - 1$. The investor's objective is to minimize the variance (risk) of the portfolio while having a guaranteed return $\mu^*$. A portfolio is defined by a vector of weights (allocations) $w = (w_1, \ldots, w_p)^\top$. Thus, the variance of the portfolio is given by $w^\top \Sigma w$ and the mean is $w^\top \mu$. Then, the optimal portfolio is found by solving the following optimization problem

$$\min_w \; w^\top \Sigma w \quad \text{subject to} \quad w^\top \mu \ge \mu^*, \quad w^\top \mathbf{1} = 1, \quad w \ge 0, \qquad (1)$$

where weight $w_i$ is the fraction of wealth in asset $i$ held throughout the period.

The positivity constraint is added to guarantee that only long positions are included in the portfolio. Jagannathan and Ma (2003) show that adding the non-negative weight constraint is equivalent to shrinking elements of the covariance matrix, which leads to a reduced-risk portfolio and more stable allocations. This follows directly from the KKT (Karush-Kuhn-Tucker) optimality conditions for the portfolio optimization problem with the additional constraints $w \ge 0$ and $w^\top \mathbf{1} = 1$:

$$2\Sigma w - \lambda - \delta \mathbf{1} = 0, \qquad \lambda \ge 0, \qquad \lambda_i w_i = 0.$$

Here the $\lambda_i$'s and $\delta$ are Lagrange multipliers. Thus, solving the constrained problem is equivalent to solving the unconstrained problem with $\tilde{\Sigma}$, where $\tilde{\Sigma} = \Sigma - (\lambda \mathbf{1}^\top + \mathbf{1} \lambda^\top)$. The Lagrange multipliers associated with the constraint $w \ge 0$ effectively shrink the elements of the covariance matrix $\Sigma$. More specifically, $\Sigma_{ij}$ is reduced by $\lambda_i + \lambda_j$, and $\Sigma_{ii}$ is reduced by $2\lambda_i$.
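To make the mean-variance problem concrete, the equality-constrained version (dropping the positivity constraint) has a closed-form solution obtained by solving the KKT system. The NumPy sketch below illustrates this on made-up inputs; the covariance, expected returns, and target are hypothetical, not the paper's data.

```python
import numpy as np

# Minimal sketch: minimize w' Sigma w subject to w' mu = mu_star and w' 1 = 1.
# The KKT conditions reduce to a linear system in the Lagrange multipliers.
rng = np.random.default_rng(0)
p = 5
B = rng.normal(size=(p, p))
Sigma = B @ B.T + p * np.eye(p)        # a well-conditioned illustrative covariance
mu = rng.uniform(0.02, 0.10, size=p)   # hypothetical expected returns
mu_star = 0.06                         # required portfolio return

A = np.vstack([mu, np.ones(p)])        # constraint matrix: rows mu' and 1'
b = np.array([mu_star, 1.0])
Sinv_At = np.linalg.solve(Sigma, A.T)  # Sigma^{-1} A'
lam = np.linalg.solve(A @ Sinv_At, b)  # multipliers from (A Sigma^{-1} A') lam = b
w = Sinv_At @ lam                      # optimal weights satisfy both constraints

print(w.sum(), w @ mu)
```

The two constraints hold exactly at the solution; the positivity constraint, if added, turns the problem into a quadratic program that needs a numerical solver.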

In order to apply Bayesian inference algorithms, we re-cast the optimization problem as a hierarchical Bayesian linear model. The return of the portfolio at time $t$ is given by $r_t^\top w$, where $r_t$ is the return vector at time $t$. Then, the empirical estimate of the variance of the portfolio is given by

$$\widehat{\mathrm{Var}}(w) = \frac{1}{T} \sum_{t=1}^{T} \left( r_t^\top w - \bar{r}^\top w \right)^2, \qquad \bar{r} = \frac{1}{T} \sum_{t=1}^{T} r_t.$$

When the constraint $w^\top \bar{r} = \mu^*$ holds, the empirical risk is given by $\frac{1}{T} \sum_{t=1}^{T} (\mu^* - r_t^\top w)^2$. We re-write the risk minimization objective as a least-squares problem

$$\min_w \; \| \mu^* \mathbf{1}_T - R w \|_2^2 \quad \text{subject to} \quad w^\top \bar{r} = \mu^*, \quad w^\top \mathbf{1} = 1, \quad w \ge 0.$$

Here $R$ is the $T \times p$ matrix whose rows are the observed return vectors $r_t^\top$.

The return matrix $R$ is typically ill-conditioned, which leads to an unstable numerical solution of the above problem. This usually happens when assets are highly correlated: the columns of the matrix $R$ become almost linearly dependent and the matrix becomes ill-conditioned. One approach to stabilize the solution and to find sparse portfolios is to add a regularization penalty $\phi(w)$ to the objective function (Brodie et al., 2009). Another interpretation of the penalized objective (Puelz et al., 2015) is that it incorporates the investor's preference with regard to the number of stocks to be included in the portfolio:

$$\min_w \; \| \mu^* \mathbf{1}_T - R w \|_2^2 + \lambda \phi(w) \quad \text{subject to} \quad w^\top \bar{r} = \mu^*, \quad w^\top \mathbf{1} = 1. \qquad (2)$$

We do not include the positivity constraint in our regularized formulation and thus allow for short positions in the portfolio. The regularization penalty added to the objective function stabilizes the portfolio (Brodie et al., 2009), so the positive weight constraint can be excluded in a regularized formulation. In order to satisfy the constraint $w^\top \mathbf{1} = 1$, we modify the problem by subtracting the first column of the matrix $R$ from the other columns, estimate the linear coefficients $w_2, \ldots, w_p$ of the modified problem, and finally calculate $w_1 = 1 - \sum_{i=2}^{p} w_i$.
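The column-subtraction device for enforcing $w^\top \mathbf{1} = 1$ can be sketched as follows, using synthetic returns (illustrative only):

```python
import numpy as np

# Enforce sum(w) = 1 by eliminating one variable: subtract the first column
# of R from the others, solve for w_2..w_p, then back out w_1.
rng = np.random.default_rng(1)
T, p = 250, 6
R = rng.normal(0.0004, 0.01, size=(T, p))   # synthetic daily returns
mu_star = 0.0005                             # hypothetical target daily return

y = mu_star - R[:, 0]                        # move the eliminated asset to the RHS
R_mod = R[:, 1:] - R[:, [0]]                 # modified regressors
w_rest, *_ = np.linalg.lstsq(R_mod, y, rcond=None)
w = np.concatenate([[1.0 - w_rest.sum()], w_rest])

print(abs(w.sum() - 1.0))                    # weights sum to one by construction
```

The residuals of the original and modified problems coincide term by term, since $R w = R_{\cdot 1} + (R_{\cdot 2:p} - R_{\cdot 1}\mathbf{1}^\top) w_{2:p}$ whenever the weights sum to one.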

To simplify notation, denote the penalized empirical risk by $L(w) = \| \mu^* \mathbf{1}_T - R w \|_2^2 + \lambda \phi(w)$. The corresponding Lagrangian dual function associated with the optimization problem, with the positivity constraint excluded, is given by

$$g(\nu, \eta) = \inf_w \left\{ L(w) + \nu \left( \mu^* - w^\top \bar{r} \right) + \eta \left( 1 - w^\top \mathbf{1} \right) \right\}.$$

The dual function yields lower bounds on the optimal portfolio objective $L(w^*)$. For any $(\nu, \eta)$, we have

$$g(\nu, \eta) \le L(w^*).$$

For a specific value of $(\nu, \eta)$, equality holds. Since the term $\nu \mu^* + \eta$ does not depend on $w$, we re-write the problem as

$$\min_w \; \| \mu^* \mathbf{1}_T - R w \|_2^2 + \lambda \phi(w) - \nu\, w^\top \bar{r} - \eta\, w^\top \mathbf{1}.$$

We then select the values of the dual variables $\nu$ and $\eta$ using cross-validation.

The first widely used model with a regularization term was proposed by Black and Litterman (1992). The Black and Litterman (BL) model uses a quadratic regularization term, which can be interpreted as a mechanism to integrate quantitative and traditional portfolio building strategies. The BL model assumes a normal prior over the investor's beliefs about future returns. The objective function then combines loss minimization with a regularization term that encodes the investor's beliefs. In other words, the BL model combines quantitative and traditional management approaches and allows currently held beliefs to be updated using observed data (returns) to form new opinions.

Brodie et al. (2009) analyzed the case where the penalty function is based on the absolute value ($\ell_1$ norm) and showed that it leads to a stable solution. Puelz et al. (2015), on the other hand, viewed the absolute value penalty as a way to incorporate the investor's desire for a simple portfolio. Puelz et al. (2015) take a similar view to Black and Litterman and show how to encode an investor's preference to allocate her wealth among a small number of assets.

The penalty and constraint terms in (2) can also be viewed as judgements of an investor that need to be incorporated into the portfolio allocation decision making. To interpret those terms as prior judgements, we re-write the optimization problem as a Bayesian inference problem.

2.2 Stock Selection as a Bayesian Inference

The optimization problem (2) is equivalent to finding the mode of the posterior distribution for a linear Gaussian model with an exponential prior on the parameters and a sparsity prior (regularization):

$$y = X w + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I_T), \qquad p(w) \propto \exp\left( \eta\, w^\top \mathbf{1} \right) \exp\left( -\lambda \phi(w) \right).$$

Here $y = \mu^* \mathbf{1}_T$ and $X = R$. Since the negative log-posterior coincides, up to an additive constant, with the penalized objective, the mode of the log-posterior distribution over the coefficients of the above linear model,

$$\hat{w} = \arg\max_w \; \log p(w \mid y),$$

is equal to the solution of the optimization problem given by Equation (2). The exponential prior $\exp(\eta\, w^\top \mathbf{1})$ corresponds to the equality constraint $w^\top \mathbf{1} = 1$.

The exponential prior is conjugate to the Gaussian likelihood, and the posterior can be calculated analytically as follows:

$$N(y \mid X w, \sigma^2 I) \times \exp\left( \eta\, w^\top \mathbf{1} \right) \propto N\left( w \,\middle|\, m + \eta \sigma^2 (X^\top X)^{-1} \mathbf{1}, \; \sigma^2 (X^\top X)^{-1} \right),$$

where $m = (X^\top X)^{-1} X^\top y$ is the least-squares estimate. Thus, by combining the likelihood and the exponential prior, we get a normal posterior whose mean is shifted by $\eta \sigma^2 (X^\top X)^{-1} \mathbf{1}$ and whose covariance is unchanged. The resulting linear model is then

$$\tilde{y} = X w + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I),$$

where $\tilde{y} = y + \eta \sigma^2 X (X^\top X)^{-1} \mathbf{1}$ absorbs the shift induced by the exponential prior. The corresponding optimization problem is then

$$\min_w \; \| \tilde{y} - X w \|_2^2 + \lambda \phi(w).$$
A sparsity-inducing prior and the corresponding penalty lead to a stable numerical solution that is robust to estimation errors in the covariance and allow for sparse portfolios. Sparse portfolios are a better choice for non-professional investors. They reduce transaction costs by eliminating certain stocks and minimize the number of stocks that an investor needs to follow and research.

2.3 Sparsity-Inducing Prior Distributions

The Bayesian formulation of the portfolio selection problem provides insight and an alternative interpretation of the constraints and the corresponding penalty terms. Additionally, it allows us to quantify uncertainty over the portfolio weights. Fully Bayesian inference that relies on MCMC algorithms allows credible intervals to be calculated and thus uncertainty to be assessed. Efficient MCMC algorithms can be constructed by exploiting latent variable tricks. For example, the $\ell_1$ penalty corresponds to the Laplace prior distribution $p(w_i) \propto \exp(-\lambda |w_i|)$. A latent variable trick allows this prior to be re-written as a scale mixture of normals (Andrews and Mallows, 1974; West, 1987; Carlin and Polson, 1991). We introduce a latent variable $\tau_i^2$ with an exponential distribution,

$$w_i \mid \tau_i^2 \sim N(0, \tau_i^2), \qquad \tau_i^2 \sim \mathrm{Exp}(\lambda^2 / 2).$$

The equivalence with the $\ell_1$ penalty is obtained by integrating out $\tau_i^2$:

$$p(w_i) = \int_0^\infty N(w_i \mid 0, \tau_i^2)\, \frac{\lambda^2}{2} e^{-\lambda^2 \tau_i^2 / 2}\, d\tau_i^2 = \frac{\lambda}{2} e^{-\lambda |w_i|}.$$

In general, many widely used priors can be represented as variance-mean mixtures using latent variables. The resulting model is linear with heteroscedastic errors (Polson and Scott, 2013):

$$p(w_i) = \int_0^\infty \phi\left( w_i \mid \mu + \beta \tau_i^2, \tau_i^2 \right) p(\tau_i^2)\, d\tau_i^2,$$

where $\phi(\cdot \mid m, v)$ is the density function of a normal variable with mean $m$ and variance $v$.
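The scale-mixture representation can be checked by simulation: drawing the latent variance from an exponential distribution and then a normal given that variance reproduces a Laplace marginal. A minimal sketch with an arbitrary rate $\lambda = 2$ (illustrative values only):

```python
import numpy as np

# Sample w by first drawing the latent variance tau^2 ~ Exp(rate = lam^2/2),
# then w | tau^2 ~ N(0, tau^2); marginally w is Laplace with scale 1/lam.
rng = np.random.default_rng(2)
lam, n = 2.0, 200_000
tau2 = rng.exponential(scale=2.0 / lam**2, size=n)  # NumPy uses scale = 1/rate
w = rng.normal(0.0, np.sqrt(tau2))

# Laplace(scale b = 1/lam) has variance 2*b^2 = 2/lam^2 = 0.5 here.
print(w.var())
```

The empirical variance matches the Laplace variance $2/\lambda^2$, confirming that the mixture integrates to the double exponential density.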

In this section we describe a number of regularization priors, namely the horseshoe, the spike-and-slab, and the Laplace, together with MCMC strategies.

2.4 Horseshoe Priors

The horseshoe belongs to the global-local class of priors and is defined by a global parameter $\tau$ that does not depend on the index $i$ and a local parameter $\lambda_i$ which is different for each parameter $w_i$. The prior is defined by

$$w_i \mid \lambda_i, \tau \sim N(0, \lambda_i^2 \tau^2).$$

The global hyper-parameter $\tau$ shrinks all parameters towards zero, while the prior for the local parameter $\lambda_i$ has a tail that decays more slowly than an exponential rate, and thus allows $w_i$ not to be shrunk. The horseshoe prior assumes a half-Cauchy distribution over $\lambda_i$:

$$\lambda_i \sim C^+(0, 1).$$

Being constant at the origin, the half-Cauchy prior has good risk properties near the origin (Polson and Scott, 2009). Polson and Scott (2010) warn against using empirical-Bayes or cross-validation approaches to estimate $\tau$, due to the fact that the MLE estimate of $\tau$ is always in danger of collapsing to the degenerate value $\hat{\tau} = 0$ (Tiao and Tan, 1965).

A feature of the horseshoe prior is that it possesses both tail-robustness and sparse-robustness properties (Bhadra et al., 2017a), meaning an infinite spike at the origin and a very heavy tail that still ensures integrability. The horseshoe prior can also be specified hierarchically as

$$w_i \mid \lambda_i \sim N(0, \lambda_i^2), \qquad \lambda_i \mid \tau \sim C^+(0, \tau).$$

The marginal prior density of the horseshoe cannot be calculated analytically, but tight bounds (Carvalho et al., 2010) can be used instead:

$$\frac{K}{2} \log\left( 1 + \frac{4\tau^2}{w_i^2} \right) < p(w_i) < K \log\left( 1 + \frac{2\tau^2}{w_i^2} \right), \qquad K = \frac{1}{\sqrt{2\pi^3}}.$$

The motivation for the horseshoe penalty arises from the analysis of the prior mass and its influence on the posterior in both the tail and the behavior at the origin. The latter provides the key determinant of the sparsity properties of the estimator.

When Metropolis-Hastings MCMC is applied to horseshoe regression, it suffers from sampling issues: the funnel-shaped geometry of the horseshoe prior makes it challenging for MCMC to efficiently explore the parameter space. Piironen et al. (2017) proposed replacing the Cauchy prior with a half-t prior with small degrees of freedom and showed improved convergence behavior for the NUTS sampler (Hoffman and Gelman, 2014). Makalic and Schmidt (2016) proposed using a scale mixture representation of the half-Cauchy which leads to a conjugate hierarchy and allows a Gibbs sampler to be used. Johndrow et al. (2017) proposed two MCMC algorithms to calculate posteriors for horseshoe priors; the first addresses the computational cost in high dimensions by approximating matrix-matrix multiplication operations. For further details on computational issues and packages for horseshoe sampling, see Bhadra et al. (2017b). The issue of high dimensionality was also addressed by Bhattacharya et al. (2016).

One approach is to replace the thick-tailed half-Cauchy prior over $\lambda_i$ with a half-t prior with small degrees of freedom. This leads to a sparsity versus sampling-efficiency trade-off. Larger degrees of freedom for the half-t distribution lead to more efficient sampling algorithms, but are less sparsity inducing. For large degrees of freedom, the tails of the half-t are slimmer and we are required to choose a large scale to accommodate large signals. However, priors with a large scale are not able to shrink coefficients towards zero as much.
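The "horseshoe" name comes from the implied shrinkage factor $\kappa_i = 1/(1 + \lambda_i^2)$, which under a half-Cauchy local prior follows a Beta(1/2, 1/2) density with mass piled at both ends (near 0: signals left alone; near 1: noise shrunk to zero). A quick simulation check (illustrative only):

```python
import numpy as np

# For lambda ~ C+(0,1), kappa = 1/(1 + lambda^2) follows a Beta(1/2, 1/2)
# "horseshoe" density: U-shaped, with mean 1/2 and mass near both 0 and 1.
rng = np.random.default_rng(3)
lam = np.abs(rng.standard_cauchy(200_000))   # half-Cauchy draws
kappa = 1.0 / (1.0 + lam**2)

print(kappa.mean())                           # Beta(1/2, 1/2) mean is 0.5
print((kappa < 0.1).mean(), (kappa > 0.9).mean())  # mass piled at both ends
```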

2.5 Spike-and-slab Prior

The spike-and-slab is another sparsity-inducing prior widely used in Bayesian analysis. It assumes that the prior is a mixture of a point-mass distribution at zero and a Gaussian distribution (Polson and Sun, 2017):

$$w_i \sim (1 - \theta)\, \delta_0 + \theta\, N(0, \sigma_w^2).$$

Here $\theta \in (0, 1)$ controls the overall sparsity in $w$ and $\sigma_w^2$ allows for non-zero weights. By setting $\gamma_i \sim \mathrm{Bernoulli}(\theta)$, we get a Bernoulli-Gaussian mixture model for $w_i$ given by

$$w_i = \gamma_i z_i, \qquad z_i \sim N(0, \sigma_w^2).$$

Since $\gamma$ and $z$ are independent, we can write the joint density function as a product

$$p(\gamma, z) = \theta^{\|\gamma\|_0} (1 - \theta)^{p - \|\gamma\|_0} \prod_{i=1}^{p} \phi(z_i \mid 0, \sigma_w^2).$$

Here $\|\gamma\|_0$ is the number of non-zero entries in the vector $\gamma$, and $p$ is the length of the vector $w$. It can be shown that finding the MAP estimator for the linear model given by Equation (4) with the spike-and-slab prior is equivalent to solving the following $\ell_0$-penalized optimization problem for $S$ and $w_S$ (Soussen et al., 2011; Polson and Sun, 2017):

$$\min_{S, w_S} \; \| y - X_S w_S \|_2^2 + \lambda |S|.$$

Here $X_S$ is the matrix with the columns whose indices are in the set $S$, $S$ is the set of "active explanatory variables," and $w_S$ are their corresponding coefficients.
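A minimal sketch of a single-best-replacement-style greedy search for this $\ell_0$-penalized objective is below. It is an illustration on synthetic data, not the authors' implementation; the toggle rule is the obvious one (try inserting or removing each column, keep the single best move).

```python
import numpy as np

def l0_objective(X, y, S, lam):
    """Least-squares fit on the active set S plus an l0 penalty lam * |S|."""
    if not S:
        return float(y @ y)
    cols = sorted(S)
    w, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    r = y - X[:, cols] @ w
    return float(r @ r) + lam * len(S)

def single_best_replacement(X, y, lam, max_iter=50):
    """Greedily toggle the single column insertion/removal that most
    decreases the l0-penalized objective; stop when no move improves it."""
    S, best = set(), l0_objective(X, y, set(), lam)
    for _ in range(max_iter):
        moves = [(l0_objective(X, y, S ^ {j}, lam), j) for j in range(X.shape[1])]
        val, j = min(moves)
        if val >= best:
            break
        best, S = val, S ^ {j}
    return S

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 8))
y = 5.0 * X[:, 0] - 5.0 * X[:, 3] + rng.normal(0.0, 0.1, size=100)
S = single_best_replacement(X, y, lam=1.0)
print(sorted(S))
```

With strong signals and a moderate penalty, the greedy search recovers the true support; on harder problems it can stall in local optima, which is why the full SBR algorithm includes refinements.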

2.6 Laplace Prior

The double exponential (Laplace) prior distribution (Carlin and Polson, 1991) for each weight $w_i$ was previously shown to be an effective mechanism to regularize the portfolio (Brodie et al., 2009) and to incorporate the investor's preferences for the number of assets in the optimal portfolio (Puelz et al., 2015).

The log-posterior is then given by

$$\log p(w \mid y) = -\frac{1}{2\sigma^2} \| y - X w \|_2^2 - \frac{1}{b} \sum_{i=1}^{p} |w_i| + \text{const}.$$

For a fixed prior scale $b$, the posterior mode is equivalent to the $\ell_1$-penalized estimate with $\lambda = 2\sigma^2 / b$. A large variance of the prior is equivalent to a small penalty weight in the $\ell_1$-penalized objective function.

Carlin and Polson (1991); Carlin et al. (1992); Park and Casella (2008) used the representation of the Laplace prior as a scale mixture of normals to develop a Gibbs sampler that iteratively samples from the complete conditionals of $(w, \sigma^2, \tau^2)$ to estimate the joint posterior distribution. Thus, we do not need to apply cross-validation to find the optimal value of $\lambda$; the Bayesian algorithm does it "automatically." Given data $(X, y)$, where $X$ is the matrix of standardized regressors and $y$ is the vector of outputs, a Gibbs sampler for this model can be implemented using the scale mixture of normals representation of the Laplace prior on the model coefficients $w$.

The complete conditionals required for Gibbs sampling are given by (Park and Casella, 2008)

$$w \mid \tau^2, \sigma^2, y \sim N\left( A^{-1} X^\top y, \; \sigma^2 A^{-1} \right), \qquad A = X^\top X + D_\tau^{-1}, \qquad D_\tau = \mathrm{diag}(\tau_1^2, \ldots, \tau_p^2),$$
$$\sigma^2 \mid w, \tau^2, y \sim \mathrm{InvGamma}\left( \frac{n - 1}{2} + \frac{p}{2}, \; \frac{1}{2} \| y - X w \|_2^2 + \frac{1}{2} w^\top D_\tau^{-1} w \right),$$
$$1 / \tau_i^2 \mid w, \sigma^2 \sim \mathrm{InvGaussian}\left( \sqrt{\lambda^2 \sigma^2 / w_i^2}, \; \lambda^2 \right).$$

The formulas above assume that $X$ is standardized, i.e., the observations for each feature are scaled to have mean 0 and standard deviation 1, and that $y$ is centered. One can use empirical priors and initialize the parameters, for example, as

$$\sigma^2 = \widehat{\mathrm{Var}}(y), \qquad \tau_i^2 = 1, \qquad \lambda = \frac{p \sqrt{\sigma^2}}{\sum_{i=1}^{p} |\hat{w}_i^{LS}|},$$

where $\hat{w}^{LS}$ is the least-squares estimate. Here $n$ is the number of rows (observations) and $p$ is the number of columns (inputs) in the matrix $X$.
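The conditionals above translate into a short Gibbs sampler. The NumPy sketch below holds $\lambda$ fixed for simplicity (a full sampler would also update $\lambda$) and runs on synthetic data; it is an illustration, not the authors' code.

```python
import numpy as np

def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=1000, burn=500, seed=0):
    """Gibbs sampler for the Bayesian lasso via the scale-mixture-of-normals
    representation of the Laplace prior. lam is held fixed for simplicity."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w, sigma2, tau2 = np.zeros(p), 1.0, np.ones(p)
    XtX, Xty = X.T @ X, X.T @ y
    draws = []
    for it in range(n_iter):
        # w | tau2, sigma2, y ~ N(A^{-1} X'y, sigma2 * A^{-1})
        A_inv = np.linalg.inv(XtX + np.diag(1.0 / tau2))
        w = rng.multivariate_normal(A_inv @ Xty, sigma2 * A_inv)
        # sigma2 | w, tau2, y ~ InvGamma (drawn as scale / Gamma(shape, 1))
        resid = y - X @ w
        shape = (n - 1) / 2.0 + p / 2.0
        scale = 0.5 * resid @ resid + 0.5 * np.sum(w**2 / tau2)
        sigma2 = scale / rng.gamma(shape, 1.0)
        # 1/tau_i^2 | w, sigma2 ~ InverseGaussian(sqrt(lam^2 sigma2 / w_i^2), lam^2)
        mu_ig = np.sqrt(lam**2 * sigma2 / np.maximum(w**2, 1e-12))
        tau2 = 1.0 / rng.wald(mu_ig, lam**2)
        if it >= burn:
            draws.append(w)
    return np.array(draws)

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, 0.0, -3.0])
y = X @ w_true + rng.normal(0.0, 0.5, size=200)
w_post = bayesian_lasso_gibbs(X, y).mean(axis=0)
print(w_post)
```

The posterior mean recovers the strong coefficients and shrinks the null one towards zero; credible intervals come directly from the retained draws.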

There are several efficient optimization algorithms to compute the mode of the posterior distribution under a Laplace prior. The most widely used approaches are LARS (Efron et al., 2004a) and coordinate descent (Friedman et al., 2010). The advantage of LARS compared to other optimization techniques is that it provides a way to compute the sequence of solutions for different values of the penalty weight $\lambda$. The coordinate descent algorithm, which updates one parameter at a time while holding the others fixed, was shown to be more computationally efficient.
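Coordinate descent for the $\ell_1$-penalized problem is short enough to sketch directly (illustrative NumPy on synthetic data; in practice one would use an established package):

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding operator: the exact one-dimensional lasso solution."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1: cycle through
    coordinates, solving each one-dimensional subproblem exactly."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X**2).sum(axis=0)
    r = y - X @ w
    for _ in range(n_iter):
        for j in range(p):
            r = r + X[:, j] * w[j]           # remove coordinate j from the fit
            rho = X[:, j] @ r
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            r = r - X[:, j] * w[j]           # add the updated coordinate back
    return w

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 5))
w_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ w_true + rng.normal(0.0, 0.5, size=100)
w = lasso_cd(X, y, lam=20.0)
print(w)
```

Note the soft-thresholding step sets weak coordinates exactly to zero, which is what produces sparse portfolios; the surviving coefficients are shrunk towards zero by roughly $\lambda / \|x_j\|^2$.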

3 Application

In this section we apply our Bayesian sparse portfolio selection model to construct optimal portfolios. We evaluate selections made by three linear models with sparsity-inducing priors, namely the Laplace, the horseshoe (Carvalho et al., 2010), and the spike-and-slab. We use the LARS algorithm (Efron et al., 2004b) to find the posterior mode for the Laplace model, an MCMC algorithm to generate samples from the horseshoe model (Hahn et al., 2019), and the Single Best Replacement (SBR) algorithm (Polson and Sun, 2017) to find the posterior mode of the spike-and-slab model.

We demonstrate how the Laplace regularized portfolio and the corresponding LARS algorithm lead to an intuitive way to select an optimal portfolio and assign a selection order to the stocks to be included in the portfolio. We use daily returns from three different portfolios: one corresponds to a widely used stock index (SP100), and two are portfolios of stocks managed by two different hedge funds, namely Viking Global Investors and Renaissance Technologies. We use the top 50 holdings of each of the portfolios and apply our selection algorithms to design a sparse portfolio with a minimal risk level while guaranteeing to perform as well as the SP500 index. We used daily returns during the period from 2016-02-23 to 2018-02-15 (500 trading days) as our training data, and returns for the period 2018-02-16 to 2019-02-22 were used for calculating the out-of-sample performance of our portfolios. We calculated the penalty parameter using cross-validation.

3.1 Small Portfolio of 9 Stocks

First, we demonstrate how the outputs of the LARS algorithm, which finds the posterior mode for the Laplace model (the $\ell_1$-regularized formulation), can be used to rank the importance of individual stocks. LARS finds the posterior mode for the coefficients of a linear model with a Laplace prior. At every step of the LARS algorithm a new variable enters the active set, so the algorithm performs the same number of steps as there are variables. The order in which LARS adds the variables to the active set corresponds to their importance: variables added at the beginning lead to a model that fits the training data well and has low variance.

We select the top 9 holdings from the SP100, Renaissance, and Viking portfolios. We then select a portfolio using the $\ell_1$-penalized formulation with the LARS algorithm. The LARS algorithm adds one stock at a time to the portfolio, and we evaluate the out-of-sample performance at each step. We select the optimal portfolio as the one with the best out-of-sample returns. Figure 1 shows the weights assigned by the LARS algorithm at each iteration and the step at which the optimal portfolio was achieved.
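The select-then-evaluate loop can be sketched with a greedy forward ordering standing in for the LARS entry order (synthetic returns; illustrative only, not the paper's data):

```python
import numpy as np

# Greedy forward ordering as a stand-in for the LARS entry order: at each
# step add the asset most correlated with the current residual, refit the
# weights on the active set, and score the portfolio on held-out returns.
rng = np.random.default_rng(7)
T, p, mu_star = 500, 9, 0.0005
R_train = rng.normal(0.0004, 0.01, size=(T, p))   # synthetic daily returns
R_test = rng.normal(0.0004, 0.01, size=(250, p))  # held-out period
y = np.full(T, mu_star)

order, scores = [], []
resid = y.copy()
for _ in range(p):
    corr = np.abs(R_train.T @ resid)
    corr[order] = -np.inf                       # skip already-selected assets
    order.append(int(np.argmax(corr)))
    w_a, *_ = np.linalg.lstsq(R_train[:, order], y, rcond=None)
    resid = y - R_train[:, order] @ w_a
    rets = R_test[:, order] @ w_a               # out-of-sample portfolio returns
    scores.append(rets.mean() / rets.std())     # out-of-sample Sharpe ratio

best_k = int(np.argmax(scores)) + 1             # stocks in the best portfolio
print(order, best_k)
```

The entry order plays the role of the stock ranking, and the step with the best held-out score determines how many stocks the final portfolio keeps.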

(a) Viking (b) Renaissance (c) SP100
Figure 1: Sequence of stocks added to the portfolio as a function of the number of stocks allowed.

Visualizing the outputs of the LARS algorithm gives an investor a way to interpret the importance of each of the stocks in the portfolio and suggests modifications. If an investor wants a smaller portfolio, she can remove the stocks added later by the algorithm.

The next question is whether the LARS-selected portfolio performs better than a naive equally weighted portfolio or a traditional Markowitz portfolio selected by solving problem (1). We also compare our optimal portfolio to the performance of the SP500 index. Figure 2 shows the cumulative return (growth of $1 invested) of the LARS-selected optimal portfolio and compares it with the naive and SP500 portfolios.

(a) Viking (b) Renaissance (c) SP100
Figure 2: Out-of-sample cumulative return of an optimal LARS portfolio compared with naive portfolio and SP500

Table 1 shows the out-of-sample mean (μ) and standard deviation (σ) of the returns, as well as the Sharpe ratio (SR), for the portfolio selected with the LARS algorithm, the naive portfolio, and the Markowitz (QP) portfolio.

      Viking               Renaissance            SP100
      Naive  LARS   QP     Naive  LARS    QP      Naive   LARS    QP      SP500
μ     0.066  0.12   0.12   0.1    0.099   0.065   -0.029  -0.011  -0.038  0.0086
σ     4.033  3.67   3.65   2.8    2.383   2.349   3.779   3.467   3.476   2.6004
SR    4.124  7.94   8.37   9.3    10.466  7.026   -1.95   -0.764  -2.787  0.8376
Table 1: Out-of-sample performance (mean, standard deviation, and Sharpe ratio) of the LARS-selected portfolio compared with the naive portfolio, the Markowitz (QP) portfolio, and the SP500.

The optimal subset of stocks selected by the LARS algorithm from the top 8 Renaissance Technologies holdings contains 5 stocks (VRSN, PEP, DUK, HUM, and NVO). Compared to a naive portfolio, the LARS portfolio contains fewer stocks, is less risky, and has the same average return.

This small 8-stock portfolio example shows how the output of the LARS algorithm can be used to provide an intuitive visualization of the ranking of the stocks in the portfolio and to allow the investor to decide how to increase or decrease the number of positions. Further, the optimal allocations calculated by LARS lead to a portfolio with lower risk (standard deviation of 0.008) and higher return (mean of 0.0005) when compared to the SP500 and the naive equal-weight allocation.

3.2 Portfolio of 35 Stocks

To demonstrate further how Bayesian portfolio allocation can be used for selecting from a larger set of stocks, we compare models with Laplace ($\ell_1$), horseshoe, and spike-and-slab ($\ell_0$) regularization. We compare the shrinkage effect and the empirical out-of-sample performance of these three selection approaches.

The questions we want to answer are whether the weight shrinkage introduced by the LARS algorithm affects portfolio performance and whether the horseshoe or $\ell_0$ selectors lead to sparser portfolios. We select portfolios using LARS for the Laplace prior ($\ell_1$), an MCMC algorithm for the horseshoe prior, the Single Best Replacement (SBR) algorithm for the spike-and-slab prior ($\ell_0$), and a non-regularized least-squares approach. We apply all four algorithms to select portfolios from the top 35 holdings of the Viking and Renaissance hedge funds as well as from the SP100 stocks. Table 2 shows the out-of-sample mean (μ), standard deviation (σ), and Sharpe ratio (SR) of the daily returns multiplied by 252, as well as the number of stocks selected (k).

      Viking                      Renaissance                 SP100
μ     0.13   0.14   0.16   0.16   0.14   0.17   0.11   0.18   0.051  0.055  0.077  0.066
σ     3.21   3.2    3.3    3.23   2.19   2.14   2.26   2.05   2.401  2.386  2.533  2.52
SR    10.45  10.96  12.02  12.34  16.47  20.4   12.72  22.15  5.39   5.779  7.613  6.578
k     22     11     31     11     25     12     32     4      16     9      32     4
Table 2: Out-of-sample performance of the four selection approaches (four columns per stock set) for the Viking, Renaissance, and SP100 portfolios.

Figure 3 shows the cumulative return.

(a) Viking (b) Renaissance (c) SP100
Figure 3: Out-of-sample cumulative returns of the portfolios selected by the four approaches.

The horseshoe and $\ell_0$ selectors do out-perform the LARS selector and lead to sparser portfolios. From a practical standpoint, the $\ell_0$ selector not only leads to the best performing and most sparse portfolio, it is also the easiest to use when compared to the horseshoe selector: $\ell_0$ requires the investor to specify one parameter instead of two as in the horseshoe, and the penalty term in $\ell_0$ is arguably more interpretable than the one in the horseshoe.

4 Discussion

This paper presents a formulation of the traditional Markowitz portfolio selection quadratic programming optimization problem as a hierarchical Bayesian linear model. We have shown how the linear constraints of the optimization problem can be formulated as exponential priors of the corresponding linear model. The main advantage of the Bayesian formulation is the ability to incorporate the investor's subjective opinion about which assets should be included in the portfolio. Specifically, we demonstrated how sparsity priors can be used to stabilize portfolio selection and to select a small number of stocks for the portfolio. The sparsity priors correspond to investors' preferences for portfolios with a small number of stocks. We used our hierarchical Bayesian linear model formulation to demonstrate the empirical performance of several sparsity-inducing priors. We have shown that the horseshoe and spike-and-slab ($\ell_0$ penalty) priors not only lead to portfolios with a smaller number of stocks but also have better out-of-sample performance when compared to the Laplace prior ($\ell_1$ penalty) and the traditional Markowitz portfolio selection procedure. Inclusion of short positions in the regularized portfolio leads to better performance and lower risk while maintaining stability of the portfolio (no extremely large weights).

From a practical standpoint, the $\ell_0$ penalty leads to the best performing portfolio and requires the investor to specify only one parameter. On the other hand, the horseshoe prior and the corresponding MCMC algorithms allow different priors to be specified on different assets.


References

  • Aguilar and West (2000) Aguilar, O. and M. West
    Bayesian dynamic factor models and portfolio allocation. Journal of Business & Economic Statistics, 18(3):338–357.
  • Andrews and Mallows (1974) Andrews, D. F. and C. L. Mallows
    Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B, pp. 99–102.
  • Avramov and Zhou (2010) Avramov, D. and G. Zhou
    Bayesian portfolio analysis. Annu. Rev. Financ. Econ., 2(1):25–47.
  • Barberis (2000) Barberis, N.
    Investing for the long run when returns are predictable. Journal of Finance, 55(1):225–264.
  • Bhadra et al. (2017a) Bhadra, A., J. Datta, N. G. Polson, B. Willard, et al.
    The Horseshoe+ estimator of ultra-sparse signals. Bayesian Analysis, 12(4):1105–1131.
  • Bhadra et al. (2017b) Bhadra, A., J. Datta, N. G. Polson, and B. T. Willard
    Lasso meets Horseshoe. Statistical Science (to appear).
  • Bhattacharya et al. (2016) Bhattacharya, A., A. Chakraborty, and B. K. Mallick
    Fast sampling with Gaussian scale mixture priors in high-dimensional regression. Biometrika, 103(4):985–991.
  • Black and Litterman (1992) Black, F. and R. Litterman
    Global portfolio optimization. Financial Analysts Journal, 48(5):28–43.
  • Brodie et al. (2009) Brodie, J., I. Daubechies, C. De Mol, D. Giannone, and I. Loris
    Sparse and stable Markowitz portfolios. Proceedings of the National Academy of Sciences, 106(30):12267–12272.
  • Candes et al. (2008) Candes, E. J., M. B. Wakin, and S. P. Boyd
    Enhancing sparsity by reweighted $\ell_1$ minimization. Journal of Fourier Analysis and Applications, 14(5-6):877–905.
  • Carlin and Polson (1991) Carlin, B. P. and N. G. Polson
    Inference for nonconjugate Bayesian models using the Gibbs sampler. Canadian Journal of Statistics, 19(4):399–405.
  • Carlin et al. (1992) Carlin, B. P., N. G. Polson, and D. S. Stoffer
    A Monte Carlo approach to nonnormal and nonlinear state-space modeling. Journal of the American Statistical Association, 87(418):493–500.
  • Carrasco and Noumon (2011) Carrasco, M. and N. Noumon
    Optimal portfolio selection using regularization. Technical report, Citeseer.
  • Carvalho et al. (2011) Carvalho, C. M., H. F. Lopes, and O. Aguilar
    Dynamic stock selection strategies: A structured factor model framework. Bayesian Statistics, 9:1–21.
  • Carvalho et al. (2010) Carvalho, C. M., N. G. Polson, and J. G. Scott
    The Horseshoe estimator for sparse signals. Biometrika, 97(2):465–480.
  • De Finetti (1940) De Finetti, B.
    Il problema dei pieni. Istituto italiano degli attuari.
  • DeMiguel et al. (2009) DeMiguel, V., L. Garlappi, F. J. Nogales, and R. Uppal
    A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms. Management Science, 55(5):798–812.
  • Efron et al. (2004a) Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani
    Least angle regression. Ann. Statist., 32(2):407–499.
  • Efron et al. (2004b) Efron, B., T. Hastie, I. Johnstone, R. Tibshirani, et al.
    Least angle regression. The Annals of Statistics, 32(2):407–499.
  • Fan and Li (2001) Fan, J. and R. Li
    Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348–1360.
  • Fan et al. (2012) Fan, J., J. Zhang, and K. Yu
    Vast portfolio selection with gross-exposure constraints. Journal of the American Statistical Association, 107(498):592–606.
  • Frank and Friedman (1993) Frank, L. E. and J. H. Friedman
    A Statistical view of some chemometrics regression tools. Technometrics, 35(2):109–135.
  • Friedman et al. (2010) Friedman, J., T. Hastie, and R. Tibshirani
    Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1.
  • Gasso et al. (2009) Gasso, G., A. Rakotomamonjy, and S. Canu
    Recovering sparse signals with a certain family of nonconvex penalties and dc programming. IEEE Transactions on Signal Processing, 57(12):4686–4698.
  • Getmansky et al. (2015) Getmansky, M., P. A. Lee, and A. W. Lo
    Hedge funds: A dynamic industry in transition. Annual Review of Financial Economics, 7:483–577.
  • Giuzio and Paterlini (2018) Giuzio, M. and S. Paterlini
    Un-diversifying during crises: Is it a good idea? Computational Management Science.
  • Hahn and Carvalho (2015) Hahn, P. R. and C. M. Carvalho
    Decoupling shrinkage and selection in Bayesian linear models: a posterior summary perspective. Journal of the American Statistical Association, 110(509):435–448.
  • Hahn et al. (2019) Hahn, P. R., J. He, and H. F. Lopes
    Efficient sampling for Gaussian linear regression with arbitrary priors. Journal of Computational and Graphical Statistics, 28(1):142–154.
  • Hoffman and Gelman (2014) Hoffman, M. D. and A. Gelman
    The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1):1593–1623.
  • Ismail and Pham (2019) Ismail, A. and H. Pham
    Robust Markowitz mean-variance portfolio selection under ambiguous covariance matrix. Mathematical Finance, 29(1):174–207.
  • Jacquier and Polson (2010) Jacquier, E. and N. Polson
    Bayesian econometrics in finance. Handbook of Bayesian econometrics. UK: Oxford University Press.
  • Jacquier and Polson (2012) Jacquier, E. and N. G. Polson
    Asset allocation in finance: A Bayesian perspective. Hierarchical Models and MCMC: A Tribute to Adrian Smith, Pp. 56–59.
  • Jagannathan and Ma (2003) Jagannathan, R. and T. Ma
    Risk reduction in large portfolios: Why imposing the wrong constraints helps. The Journal of Finance, 58(4):1651–1683.
  • Johndrow et al. (2017) Johndrow, J. E., P. Orenstein, and A. Bhattacharya
    Scalable MCMC for Bayes shrinkage priors. arXiv:1705.00841.
  • Kandel et al. (1995) Kandel, S., R. McCulloch, and R. F. Stambaugh
    Bayesian inference and portfolio efficiency. The Review of Financial Studies, 8(1):1–53.
  • Kandel and Stambaugh (1996) Kandel, S. and R. F. Stambaugh
    On the predictability of stock returns: an asset-allocation perspective. The Journal of Finance, 51(2):385–424.
  • Kozak et al. (2018) Kozak, S., S. Nagel, and S. Santosh
    Shrinking the cross section. Journal of Financial Economics (forthcoming).
  • Ledoit and Wolf (2004) Ledoit, O. and M. Wolf
    A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365–411.
  • Lobo et al. (2007) Lobo, M. S., M. Fazel, and S. Boyd
    Portfolio optimization with linear and fixed transaction costs. Annals of Operations Research, 152(1):341–365.
  • Makalic and Schmidt (2016) Makalic, E. and D. F. Schmidt
    A simple sampler for the Horseshoe estimator. IEEE Signal Processing Letters, 23(1):179–182.
  • Markowitz (1952) Markowitz, H.
    Portfolio selection. The Journal of Finance, 7(1):77–91.
  • Park and Casella (2008) Park, T. and G. Casella
    The Bayesian lasso. Journal of the American Statistical Association, 103(482):681–686.
  • Perold (1984) Perold, A. F.
    Large-scale portfolio optimization. Management Science, 30(10):1143–1160.
  • Piironen et al. (2017) Piironen, J., A. Vehtari, et al.
    Sparsity information and regularization in the Horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11(2):5018–5051.
  • Polson and Scott (2009) Polson, N. G. and J. G. Scott
    Alternative global–local shrinkage rules using hypergeometric–beta mixtures. Technical report 14.
  • Polson and Scott (2010) Polson, N. G. and J. G. Scott
    Shrink Globally, Act Locally: Sparse Bayesian regularization and prediction. Bayesian Statistics, 9:501–538.
  • Polson and Scott (2013) Polson, N. G. and J. G. Scott
    Data augmentation for non-Gaussian regression models using variance-mean mixtures. Biometrika, 100(2):459–471.
  • Polson et al. (2014) Polson, N. G., J. G. Scott, and J. Windle
    The Bayesian bridge. Journal of the Royal Statistical Society: Series B, 76(4):713–733.
  • Polson and Sun (2017) Polson, N. G. and L. Sun
    Bayesian $\ell_0$-regularized least squares. Applied Stochastic Models in Business and Industry, (forthcoming).
  • Polson and Tew (2000) Polson, N. G. and B. V. Tew
    Bayesian portfolio selection: An empirical analysis of the S&P 500 index 1970–1996. Journal of Business & Economic Statistics, 18(2):164–173.
  • Puelz et al. (2015) Puelz, D., P. R. Hahn, and C. M. Carvalho
    Sparse mean-variance portfolios: A penalized utility approach. arXiv preprint arXiv:1512.02310.
  • Soussen et al. (2011) Soussen, C., J. Idier, D. Brie, and J. Duan
    From Bernoulli–Gaussian deconvolution to sparse signal restoration. IEEE Transactions on Signal Processing, 59(10):4572–4584.
  • Tiao and Tan (1965) Tiao, G. C. and W. Tan
    Bayesian analysis of random-effect models in the analysis of variance. I. Posterior distribution of variance-components. Biometrika, 52(1/2):37–53.
  • West (1987) West, M.
    On scale mixtures of normal distributions. Biometrika, 74(3):646–648.
  • Weston et al. (2003) Weston, J., A. Elisseeff, B. Schölkopf, and M. Tipping
    Use of the zero-norm with linear models and kernel methods. Journal of Machine Learning Research, 3(Mar):1439–1461.
  • Zhang et al. (2009) Zhang, T. et al.
    Some sharp performance bounds for least squares regression with $\ell_1$ regularization. The Annals of Statistics, 37(5A):2109–2144.

Appendix A Non-Convex Penalty Functions

To overcome the limitations of $\ell_1$ penalties, several authors have proposed non-convex approaches Gasso et al. (2009). Giuzio and Paterlini (2018) use an $\ell_q$ penalty to address the issue of highly dependent data and to allocate a portfolio during a crisis.

Some of the previously used non-convex penalties include the smoothly clipped absolute deviation (SCAD) penalty Fan and Li (2001), given by

$$\phi_{\lambda}(w_i) = \begin{cases} \lambda |w_i|, & |w_i| \le \lambda, \\ \dfrac{2a\lambda|w_i| - w_i^2 - \lambda^2}{2(a-1)}, & \lambda < |w_i| \le a\lambda, \\ \dfrac{(a+1)\lambda^2}{2}, & |w_i| > a\lambda, \end{cases}$$

and its linear approximation Zhang et al. (2009). The Bridge or $\ell_q$ penalty Frank and Friedman (1993) is a generalization of the more widely used $\ell_1$ (LASSO) and $\ell_2$ (Ridge) penalties and is given by

$$\phi(w) = \sum_{i=1}^{N} |w_i|^q, \qquad q > 0.$$

As $q$ approaches 0, this penalty approaches the $\ell_0$ penalty. Another smooth approximation to the $\ell_0$ penalty is the log-penalty Weston et al. (2003); Candes et al. (2008), given by

$$\phi(w) = \sum_{i=1}^{N} \log\left(\epsilon + |w_i|\right), \qquad \epsilon > 0,$$

which corresponds to a Student-t prior.
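The penalties above can be evaluated directly; the following sketch implements them numerically. The default $a = 3.7$ for SCAD follows Fan and Li (2001), while the function names and the default values of $\lambda$, $q$, and $\epsilon$ are illustrative assumptions.

```python
import numpy as np

def bridge(w, q=0.5, lam=1.0):
    """Bridge (l_q) penalty: lam * sum_i |w_i|^q, with q > 0."""
    return lam * np.sum(np.abs(w) ** q)

def log_penalty(w, eps=1e-3, lam=1.0):
    """Smooth l_0 surrogate: lam * sum_i log(eps + |w_i|)."""
    return lam * np.sum(np.log(eps + np.abs(w)))

def scad(w, lam=1.0, a=3.7):
    """SCAD penalty of Fan and Li (2001), summed over coordinates."""
    w = np.abs(w)
    p = np.where(
        w <= lam,
        lam * w,                                        # linear near zero
        np.where(
            w <= a * lam,
            (2 * a * lam * w - w**2 - lam**2) / (2 * (a - 1)),  # quadratic blend
            (a + 1) * lam**2 / 2,                       # constant tail
        ),
    )
    return p.sum()
```

Note the constant tail of SCAD: unlike the $\ell_1$ penalty, large weights incur no additional shrinkage, which is what makes the penalty non-convex and reduces the bias on large positions.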

Figure 4: Non-convex penalty functions used for sparse estimation: (a) SCAD; (b) linear SCAD.