1 Introduction
Portfolio optimization works best when the asset covariance matrix is optimally cleaned. The necessity to filter covariance matrices was recognized a long time ago in this context (Michaud 1989)
. Cleaning may be optimal in two respects: first, estimation noise has to be filtered out when the number of data points
is comparable to the number of assets(the socalled curse of dimensionality). This is often the case as the nonstationary nature of the dependence between asset price returns dictates to take as small a
as possible (Bongiorno and Challet 2020b). Many filtering methods have been proposed, either for covariance or correlation matrices themselves (see Bun et al. (2017) for a review) or for the portfolio optimization methods objectives (Markowitz 1959; Black and Litterman 1990; Duffie and Pan 1997; Hull and White 1998; Krokhmal et al. 2002; Roncalli 2013; Meucci et al. 2015). Secondly, a good filtering method should also be able to retain the most stable structure of dependence matrices. As a consequence, what filtering method is optimal may depend on asset classes and market conditions. For these reasons, using a flexible yet robust method brings a more consistent performance.A third ingredient to improve portfolio optimization is to account for stochastic volatility itself, i.e., to use both assetlevel volatility model and covariance matrix cleaning, as in Engle et al. (2019). Since this work is devoted to the influence of covariance cleaning itself, we will not use this ingredient.
Here we shall focus on covariance cleaning; we refer the reader to the extensive review of Bun et al. (2017)
. There are two main ways to clean covariance matrices: either to filter the eigenvalues of the corresponding correlation matrix and or to make assumptions on the structure of correlation matrices, i.e., to use an ansatz.
Eigenvalue filtering rests on the spectral decomposition of the covariance or correlation matrix into a sum of outer product of eigenvectors and eigenvalues. Remarkable recent progresses lead to the proof that provided if
and if the system is stationary, the Rotationally Invariant Estimator (RIE) (Bun et al. 2016) converges to the oracle estimator (which knows the realized correlation matrix) at fixed ratio and in the large system limit and . In practice, computing RIE is far from trivial for finite , i.e., for sparse eigenvalue densities; several numerical methods address this problem, such as QuEST (Ledoit et al. 2012), Inverse Wishart regularisation (Bun et al. 2017), or the crossvalidated approach (CV hereafter) (Bartz 2016). Note that these methods only modify the eigenvalues, and keep the empirical eigenvectors intact.The structurebased approach requires a wellchosen ansatz that needs to be suitable for the system under study. For example, linear shrinkage uses a target covariance (or correlation) matrix, and also has an eigenvalue filtering interpretation (Potters et al. 2005). Factor models belong to the structurebased approach. A particular case is hierarchical factor models, which have been shown to yield remarkably good Global Minimum Variance (GMV henceafter) portfolios (Tumminello et al. 2007a; Pantaleo et al. 2011; Tumminello et al. 2007b).
A problem of the hierarchical ansatz is its sensitivity to the bootstrapping of the original data, which does not yield many statistically validated clusters in correlation matrices of equity returns (Bongiorno et al. 2019). Very recently, we have leveraged this sensitivity to build a more flexible estimator, which consists in averaging filtered hierarchical clustering correlation or covariance matrices of bootstrapped data (BAHC) (Bongiorno and Challet 2020a). BAHC not only allows an imperfect hierarchical structure, i.e., a moderate overlapping among clusters, but also a probabilistic superposition of quite distinct hierarchical structures. When applied to GMV portfolios, BAHC yields similar or better realized risk than the optimal eigenvalues filtering methods but for a much smaller than its competitors, which gives portfolios that are much more reactive to changing market conditions. It can be further improved, as shown below.
This paper proposes to extend BAHC to account for the structure of the correlation matrix that is not described by BAHC, i.e., the residuals. The rationale is that the latter may also contain a structure that persists in the outofsample window, hence that they should not be erased by the filtering method. The idea is to apply to filter recursively the residuals and to average the filtered matrices of bootstrapped data.
The order of recursion, denoted by , is a parameter of the method, which we proposed to call BAHC. This new method is equivalent to BAHC when by convention. The higher , the finer the details kept by BAHC, which, as shown below, improves the outofsample GMV portfolios up to a point. When tends to infinity, the filtered correlation matrix converges to the unfiltered correlation matrix averaged over many bootstrap copies. This matrix is almost surely positive definite in the highdimensional regime , despite the fact that the empirical unfiltered correlation matrix is not positive definite Bongiorno (2020).
As shown below, the optimal average depends on the size of the insample window for a data set of US equities. It is generally an increasing function of the insample window length : for small , most of the variations of the empirical correlation matrices are due to estimation noise, which is best filtered by a small ; as increases, the relative importance of estimation noise decreases and thus a higher should be preferred.
2 Methods
Let us start with some notations of standard quantities: let be a matrix of price returns. Its covariance matrix, denoted by , has elements , where
(1) 
and where
is the sample mean of vector
. The Pearson correlation matrix has elements(2) 
As BAHC is an extension of BAHC, itself is a bootstrapped version of the strictly hierarchical filtering method of Tumminello et al. (2007a), let us start with the hierarchical clustering.
2.1 Hierarchical Clustering
Hierarchical clustering agglomerates groups of objects recursively according to a distance matrix taken here as with elements ; D respects all the axioms of a proper distance. Accordingly, the distance between clusters and , denoted by is defined as the average distance between their elements
(3) 
where and denote the , respectively elements of clusters and respectively.
Hierarchical agglomeration works as follows: one starts by giving each element its own cluster. Then, the two clusters with the smallest distance are merged into a new cluster which contains the elements . This algorithm is applied until all nodes form a single unique cluster. This defines a tree, called a dendrogram, which uniquely identifies the genealogy of cluster merges, denoted by .
2.2 Hierarchical Clustering Average Linkage Filtering (HCAL)
Defining a merging tree is not enough to clean correlation matrices. Tumminello et al. (2007a) propose to average all the elements of the subcorrelation matrix defined from the indices , i.e., to replace by
(4) 
where is the average distance between clusters and (see Eq. (3)), with set to 1. This defines the HCALcleaned correlation matrix , which corresponds to a hierarchical factor model (Tumminello et al. 2007a).
HCALfiltered matrices have two interesting properties: by construction, is positive definite when the correlation matrix is dominated by a global mode, i.e., when the average correlation is large, as in equity correlation matrices (Tumminello et al. 2007a). Secondly, is the simplest matrix that has the same dendrogram as the empirical correlation matrix ; this means that by applying the HCAL to both and , the resulting dendrograms will be identical. This, however, is also one of the main limitations of this approach as it prevents any overlap among clusters; in addition, the dendrogram of may not be the true one.
2.3 Bahc
The method we propose rests on two ingredients: a recursive HCAL filtering of the residuals of filtered correlation matrices, and bootstrapping the return matrix (timewise) in order to make the method more flexible, as in the BAHC method.
2.3.1 HCAL Filtering
Let us define the filtered matrix of order as . The residue matrix of order is then
(5) 
When , ; For any value of , we can apply the filtering procedure of sec. 2.2 to the residue matrix to obtain a filtered residue matrix . Then the HCAL filtered matrix is obtained with
(6) 
For example, correspond to HCALfiltered matrix. The recursive application of Eqs.(5) and (6) allows us to compute the filtered matrix at any order . It is worth noticing that by iterating Eqs.(5) and (6)
(7) 
since the residue become smaller and smaller. It is important to point out that, is not in general a semipositive definite matrix for , and in most cases, some small negative eigenvalues have been observed in our numerical experiments. These eigenvalues, according to Eq.(7), shrink to nonnegative values when approach infinity. For any order , we set the possibly negative eigenvalues to 0.
2.3.2 Bootstrapbased regularization
In the spirit of the BAHC method (Bongiorno and Challet 2020a), our recipe prescribes to create a set of bootstrap copies of the data matrix , denoted by . A single bootstrap copy of the data matrix is defined entrywise as , where is a vector of dimension obtained with random sampling by replacement of the elements of the vector . The vector , are sampled independently.
We compute the Pearson correlation matrix of each bootstrap of the data matrix, from which we derive the HCALfiltered matrix . Finally, the filtered Pearson correlation matrix is defined as the average over the filtered bootstrap copies, i.e.,
(8) 
While is a semipositive definite matrix, the average of these filtered rapidly becomes positivedefinite, as shown in Bongiorno (2020). This convergence is fast, and it is guaranteed almost surely if , but in most of the cases is reached for much smaller then .
Finally, BAHC filtered covariance is obtained from the sample univariate variance according to
(9) 
The main advantage of BAHC over HCAL is not to force to be embedded in a purely recursive hierarchical structure.
3 Results
3.1 Data
We consider the daily closetoclose returns from 19990104 to 20200331 of US equities, adjusted for dividends, splits, and other corporate events. More precisely, the data set consists of 1295 assets taken from the union of all the components of the Russell 1000 from 201006 to 202003. The number of stocks with data varies over time: it ranges from 497 in 19990218 to 1172 in 20180117.
3.2 Spectral Properties
One of the reasons why the original BAHC filtering achieves a similar or better realized variance than its competitors that focus on filtering the eigenvalues of the correlation matrix only, that the resulting eigenvectors have a larger overlap with the outofsample eigenvectors than the unfiltered empirical eigenvectors while still filtering eigenvalues nearly as well as the optimal methods (Bongiorno and Challet 2020a). This subsection is devoted to investigate how the eigenvector components change as is increased. It turnouts that the localization of eigenvectors is crucial in understanding the role of .
To understand why localization matters to portfolio optimization, it is worth recalling that Global Minimum Portfolios correspond to the optimal weights
(10) 
that is a sum by rows (or columns) of the inverted covariance matrix, then normalized to one. The inverted covariance matrix can be expressed in terms of spectral decomposition of as
(11) 
where and are respectively the th eigenvalue and and its associated eigenvector of . This means that the composition of the eigenvectors related to the highest eigenvalues is irrelevant and the portfolio allocation is dominated by the eigenvectors of the smallest eigenvalues. Let us assume that the eigenvalue are ordered, i.e., that . If the smallest eigenvalue is much smaller than all the others ones, i.e., , the largest part of investment will be on the stocks such that . Therefore, the localization of the nonzero elements of the eigenvectors is crucial to understand the portfolio allocation.
The statistical characteristics of the eigenvectors localization is typically described in term of Inverse Participation Ratio (IPR), defined as
(12) 
where the index refers the th eigenvalue. The smaller the value of , the more localized its associated eigenvector, the most localized case corresponding to IPR.
Figure 1(a)
reports the cumulative distribution function of
IPR of the eigenvectors for different recursion orders . The dependence on is obvious: BACH has the most localized eigenvectors and the larger the value of , the less localized the components of the eigenvectors. In the limit, one recovers the empirical, unfiltered, covariance matrix. In addition, the IPR of the latter two are hardly different from the random matrix null expectation obtained by shuffling price returns asset by asset in the data matrix.
Figure 1(b)
gives more details about the IPRs associated with the eigenvalues. It makes it obvious that IPRs are different for small eigenvalues, while no clear pattern emerges for the outliers
. Since the lowest eigenvalues are the ones that affect mainly GVM portfolio optimization, a filtering procedure that modifies the IPR of such eigenvalues will produce a substantial difference in the portfolio allocation.3.3 Global Minimum Variance portfolios
This part explores how the realized risk of GMV portfolios depends on the recursion order and compares it with the performance obtained from sample covariance and the CrossValidated (CV) eigenvalue shrinkage (Bartz 2016), which is a strong contender for the best realized risk (Bongiorno and Challet 2020a). Two types of tests are carried out: because our data covers many different market regimes and a variable number of assets, we first ask what is the average realized risk of each covariance cleaning method for a random collection of assets in a randomly chosen period of fixed length. This allows us to assess the performance of each cleaning scheme in a fair way and to control the effect of the calibration window length. In the second part, we compare the performance of these optimal portfolios with all available stocks at any given time.
3.3.1 Random assets, random periods
The experiments of this part are carried out in the following way: for each calibration window length we randomly choose a time between 20000103 and 20200330 that defines a calibration widow , and a test window with days. We then sample stocks over the available assets within the calibration and test windows. Finally, we compute the GMV portfolios with and without short positions using BAHC, the stateofart CrossValidated (CV) eigenvalue shrinkage (Bartz 2016) and the unfiltered empirical covariance matrix.
Figure 2 shows the realized risk of GMV portfolios obtained with the chosen filtering schemes and with the empirical (sample) covariance matrix. The BACH estimators outperform both CV and the sample covariance estimators for in the longshort case and for every for the longonly case (Figures 2(a) and 2(c)). The highest performance of CV is obtained for ; however, the highest absolute minimum is obtained for BAHC with , which requires much shorter calibration times and thus yields more reactive portfolios.
What values to take for (and for this data set) depends on . In the highdimensional regime (), i.e. for , the best results are obtained is for ; however, when increases the performance of becomes even worse than the sample covariance. From this analysis is clear that the larger the calibration window size, the larger the approximation order must be. Figures 2(b) and 2(d) show the average optimal that minimizes the realized risk as a function of for the longshort and longonly case. They confirm that a longer calibration window requires a higher approximation order both for the longshort and longonly cases; however, whereas for the longshort the increment seems linear with , this dependence for the longonly case is sublinear (and much noisier). It is worth remarking that the fits of the right plots of Figure 2 are obtained with : larger values of might further improve the performance for larger ; however, they would require a comparatively greater computational effort.
for different calibration windows; the error bars represent the standard deviation obtained by bootstrap resampling of the testperiod performance; the continuous black line comes from a linear regression. Upper plots correspond to the longshort portfolio, while the lower plots impose a longonly constraint.
3.3.2 Fulluniverse, full period backtest
In this section we performed a set of portfolio optimizations with monthly computations of new portolio weights (and relabancing) over the full timeperiod [20000102, 20200331] for all the considered covariance estimators and equally weighted portfolios (EQ hence after), and for different insample window lengths. The backtests include transaction costs set to 2 bps. A slight complication comes from the variable number of available assets. Thus, as any time, varies and generally increases with time at fixed calibration window length. In any case, it is worth keeping in mind that is relatively large, i.e., between 497 and 1172.
In particular, at each rebalancing timestep we considered all the available stocks listed in both the insample and outofsample periods. The present work investigates in detail short calibration windows: we chose a sequence of days by steps of days, i.e., about months. In addition, for sake of completeness we also included longer calibration windows . For each method and calibration window, we computed the realized annualized volatility, the realized annualized return, the Sharpe ratio, the grossleverage, the concentration of the portfolio, and the average turnover.
Figure 3 shows the behavior of equally weighted portfolios and GMV portfolios obtained by BAHC, the globally best value for the data set that we used. As expected, GMV reduce the realized risk with respect to EW portfolios, and cleaning the covariance matrix is clearly beneficial as well.
Let us start with realized risk, the focus of this paper. The realized risk of EQ portfolios is much larger than that resulting from the other methods, which is hardly surprising as the latter account for the covariance matrix (Tables Reactive Global Minimum Variance Portfolios with BAHC covariance cleaning and Reactive Global Minimum Variance Portfolios with BAHC covariance cleaning); BACH achieves the smallest realized risk, and the best value for increases as increases.
Although GMV does not guarantees a positive return, we also report the Sharpe ratios of the various filtering methods. Because computing Sharpe ratios with moments is not efficient for heavytailed variables, we use the efficient and unbiased momentfree SR estimator introduced in
Challet (2017) and implemented in Challet (2020). Sharpe ratios paint a picture similar to realized risk (see Tables Reactive Global Minimum Variance Portfolios with BAHC covariance cleaning and Reactive Global Minimum Variance Portfolios with BAHC covariance cleaning): BAHC outperforms all the other methods for medium to large values of for almost every both in the longonly and longshort cases, especially when the calibration window is smaller than a year, which corresponds to , i.e., quite deep in the highdimensional regime. In particular, in the longshort case, the highest SR equals for in the remarkably short calibration window days (about 5 months), and is significantly higher than the best performance of CV (SR) obtained for a much higher calibration window . This shows that reactive portfolio optimization is invaluable. In the longonly case, the improvement of BAHC is smaller: the best SR (1.13) is that of BAHC, whereas the highest SR of CV is .However, as shown from the cumulative performances in Figure 3, the relative performance changes over time. To overcome this limitation, we evaluated the SR every year, and we performed a dense ranking of all the methods after having rounded the related SRs to the second decimal (to ensure some equal ranks). Finally we associated a score to every method defined as the average dense rank over the years. The results for the longshort and longonly cases are summarized in Table 3.3.2. It is worth noting that different medium to large values of of BACH outperform all the other methods, and in particular the optimal performances are achieved with a calibration window length shorter than for CV by a factor of about four.
We checked that the portfolios obtained with BAHC are more concentrated than the other ones, which is consistent with the fact that the IPR of the relevant eigenvectors is smaller. The concentration of a portfolio can be measured with
(13) 
as proposed in Bouchaud and Potters (2003); However, as noticed in Pantaleo et al. (2011), this quantity does not have a clear interpretation when short selling is allowed. To overcome this issue, Pantaleo et al. (2011) introduced the metrics which measures the smallest number of stocks that amount for at least 90% of the invested capital. Accordingly, we used is for the longshort case and for the longonly one. Looking at Tables Reactive Global Minimum Variance Portfolios with BAHC covariance cleaning and Reactive Global Minimum Variance Portfolios with BAHC covariance cleaning, the number of stocks selected is systematically smaller for every and calibration window for BAHC for both longonly and longshort portfolios.
That said, BAHC has two drawbacks. First, the gross leverage is generally larger than for CV in the longshort case (see Table Reactive Global Minimum Variance Portfolios with BAHC covariance cleaning). However, if we compare the values of gross leverage corresponding to the larger SR for CV and BAHC for within one year, they differ only by ( for CV and for BAHC). On the other hand, without constraining the calibration window, the highest SR for CV is achieved for and the gross leverage reaches , which is larger than for other methods.
The other drawback of BAHC is that it requires a larger turnover for longshort portfolios. A natural turnover metrics, denoted by , was defined in Reigneron et al. (2020) as
(14) 
where is the number of rebalancing operations and is the initial time. measures the average changes in the portfolio allocation between two consecutive portfolio allocations. Table Reactive Global Minimum Variance Portfolios with BAHC covariance cleaning shows that BAHC has a typically twice as large as CV, except for for large for longshort portfolios. For the longonly case (Table Reactive Global Minimum Variance Portfolios with BAHC covariance cleaning) CV still outperforms BAHC in that respect, although not by much.All performance measures take into account into account the rebalancing costs. Note that the larger turnover comes from the fact that portfolios are more concentrated, i.e., select fewer assets. It is therefore more likely that the set of selected changes at every weight updates.
4 Discussion
By combining recursive hierarchical clustering average linkage and bootstrapping of the data matrix yields a globally better way to filtering asset price covariance matrices. We have shown that this method filters the eigenvectors associated with small eigenvalues of the covariance matrix by making them more concentrated, which in turn yields portfolios with fewer assets. Because BACH captures more of the persistent structure of covariance matrices with shorter calibration windows, it leads to better realized variance of Global Minimum Variance portfolios than even the best method that optimally filters the eigenvalues of the correlation matrix. Finally, it is able to achieve its best performance for significantly smaller calibration window lengths, which makes BAHC portfolios more reactive to changing market conditions. The main drawback is that it requires a larger turnover.
This is due, in part, to the fact that resulting portfolios are more concentrated, hence that the fraction of capital in which to invest change more rapidly than less specific methods. Whether this reflects a genuine change of market structure or a byproduct of the specific assumptions of BACH is an interesting open question.
Future work will investigate how BAHC may improve other kinds of portfolio optimization schemes and other financial applications of covariance matrices.
Acknowledgement(s)
This work was performed using HPC resources from the “Mésocentre” computing center of CentraleSupélec and École Normale Supérieure ParisSaclay supported by CNRS and Région ÎledeFrance (http://mesocentre.centralesupelec.fr/)
Funding
This publication stems from a partnership between CentraleSupélec and BNP Paribas.
References
 Crossvalidation based nonlinear shrinkage. arXiv preprint arXiv:1611.00798. Cited by: §1, §3.3.1, §3.3.
 Asset allocation: combining investor views with market equilibrium. Goldman Sachs Fixed Income Research 115. Cited by: §1.
 Covariance matrix filtering with bootstrapped hierarchies. arXiv preprint arXiv:2003.05807. Cited by: §1, §2.3.2, §3.2, §3.3.
 Nonparametric sign prediction of highdimensional correlation matrix coefficients. arXiv preprint arXiv:2001.11214. Cited by: §1.
 Nested partitions from hierarchical clustering statistical validation. arXiv preprint arXiv:1906.06908. Cited by: §1.
 Bootstraps regularize singular correlation matrices. arXiv preprint arXiv:2004.03165. Cited by: §1, §2.3.2.
 Theory of financial risk and derivative pricing: from statistical physics to risk management. Cambridge university press. Cited by: §3.3.2.
 Rotational invariant estimator for general noisy matrices. IEEE Transactions on Information Theory 62 (12), pp. 7475–7490. Cited by: §1.
 Cleaning large correlation matrices: tools from random matrix theory. Physics Reports 666, pp. 1–109. Cited by: §1, §1, §1.
 Sharper asset ranking from total drawdown durations. Applied Mathematical Finance 24 (1), pp. 1–22. Cited by: §3.3.2.
 SharpeRratio: momentfree estimation of sharpe ratios. Note: R package version 1.4.1 External Links: Link Cited by: §3.3.2.
 An overview of value at risk. Journal of derivatives 4 (3), pp. 7–49. Cited by: §1.
 Large dynamic covariance matrices. Journal of Business & Economic Statistics 37 (2), pp. 363–375. External Links: Document, Link, https://doi.org/10.1080/07350015.2017.1345683 Cited by: §1.

Value at risk when daily changes in market variables are not normally distributed
. Journal of derivatives 5, pp. 9–19. Cited by: §1.  Portfolio optimization with conditional valueatrisk objective and constraints. Journal of Risk 4, pp. 43–68. Cited by: §1.
 Nonlinear shrinkage estimation of largedimensional covariance matrices. The Annals of Statistics 40 (2), pp. 1024–1060. Cited by: §1.
 Portfolio selection: efficient diversification of investments. Vol. 16, John Wiley New York. Cited by: §1.
 Risk budgeting and diversification based on optimized uncorrelated factors. Available at SSRN 2276632. Cited by: §1.
 The Markowitz optimization enigma: is “optimized ” optimal?. Financial Analysts Journal 45 (1), pp. 31–42. Cited by: §1.
 When do improved covariance matrix estimators enhance portfolio optimization? an empirical comparative study of nine estimators. Quantitative Finance 11 (7), pp. 1067–1080. Cited by: §1, §3.3.2.
 Financial applications of random matrix theory: old laces and new pieces. Acta Physica Polonica B 36, pp. 2767. Cited by: §1.
 Agnostic allocation portfolios: a sweet spot in the riskbased jungle?. The Journal of Portfolio Management. Cited by: §3.3.2.
 Introduction to risk parity and budgeting. CRC Press. Cited by: §1.
 Hierarchically nested factor model from multivariate data. EPL (Europhysics Letters) 78 (3), pp. 30006. Cited by: §1, §2.2, §2.2, §2.
 Kullbackleibler distance as a measure of the information filtered from multivariate data. Physical Review E 76 (3), pp. 031123. Cited by: §1.
Comments
There are no comments yet.