1. Introduction
Multifactor investing strategies have gained wide adoption during the last decade, as they give investors a better understanding of the risk drivers underlying a portfolio. Such strategies promise to promote diversification and thus limit drawdowns during periods of financial turmoil (Ilmanen and Kizer, 2012; Kremer et al., 2018). However, out of the hundreds of existing factors, only a small number are truly significant in explaining the cross-section of stock returns (Feng et al., 2020; Harvey and Liu, 2021). Therefore, an important open question across both financial research and industry concerns factor redundancy. In particular, the adoption by researchers of different testing approaches (e.g., panel vs. cross-sectional regression), and the presence of model selection biases such as those entailed by omitted variables, may contribute to the publication of new redundant factors.
Motivated by these observations, we aim at gaining insights into the underlying dynamics of risk factor interactions by leveraging recent advances in causal structure learning (Vowels et al., 2021). More precisely, starting from the known results about correlations among factors (Feng et al., 2020; Harvey and Liu, 2021), in the spirit of Reichenbach’s principle of common cause (Reichenbach, 1956), we investigate whether there is an underlying causal structure within the universe of considered financial factors, and how this structure evolves over time.
Specifically, our computational task is as follows: we are given as input a dataset $X$ composed of $d$ time series of length $T$ of factor returns, and we want to learn a graph $G$ (such as the one in Figure 1) representing the causal relations among the factor values over time. In more detail, we are interested in understanding whether $X$ admits a functional representation in which each $x_i^t$ (i.e., the value of factor $i$ at timestamp $t$) is determined by at most $\tau_{\max}$ past values of a set of other factors named parents:
$$x_i^t = f_i\left(PA_i^{0}, PA_i^{1}, \dots, PA_i^{\tau_{\max}}, \varepsilon_i^t\right) \qquad (1)$$
Here $\varepsilon_i^t$ represents the noise term, while $PA_i^{\tau}$ indicates the set of risk factors that cause $x_i^t$ with lag $\tau$. When $\tau > 0$, $PA_i^{\tau}$ may contain $x_i$ as well. Conversely, as we cannot observe causal effects from present to past, $PA_i^{0}$ can contain neither $x_i$ nor any $x_j$ such that $x_i \in PA_j^{0}$, otherwise it would be impossible to define the direction of the causal relation. More generally, we require the set of Equations (1) to be acyclic, which means that feedback loops among variables are forbidden. This assumption plays a key role in causal inference since it allows causes to be set apart from effects.
If the aforementioned functional forms are assumed to be linear, Equations (1) can be compactly written in matrix notation as a structural vector autoregressive (SVAR) model:
$$x_t = B_0 x_t + \sum_{\tau=1}^{\tau_{\max}} B_\tau x_{t-\tau} + \varepsilon_t \qquad (2)$$
where $x_t$ is a column vector constituted by the observations at time $t$, and $B_\tau$ are the matrices of lagged causal effects with lag $\tau$ up to the maximum lag $\tau_{\max}$, such that entry $(i, j)$ of $B_\tau$ is non-zero iff factor $j$ is a parent of factor $i$ at lag $\tau$. In addition, the matrix of instantaneous causal effects $B_0$ respects the acyclicity requirement mentioned above. Finally, $\varepsilon_t$ is the column vector of random disturbances at time $t$.
We remark that Equation 2 should not be read as a usual system of equations, but rather as a set of functions describing how certain factors determine others. Indeed, the model is said to be structural since it allows variables (effects) to be computed by means of linear functions of other endogenous variables (causes), taking into account both instantaneous and lagged relations (the latter also referred to as interlayer connections). In particular, the model can be thought of as a combination of a structural equation model (SEM (Peters et al., 2017b)) and a vector autoregressive model (VAR (Sims, 1980)).
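Finally, note that Equation 2 entails a directed acyclic graph (DAG) in which there is a weighted edge from factor $j$ at time $t-\tau$ (with $\tau \geq 0$) to factor $i$ at time $t$ for every non-zero causal coefficient: this is the causal graph of factors along time as depicted in Figure 1.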
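To make the acyclicity requirement and the structural reading of Equation 2 concrete, the following is a minimal sketch in plain Python. It is not the estimation procedure used in the paper: the function names (`causal_order`, `simulate_svar`), the three-factor coefficient matrices, and the uniform (hence non-Gaussian) disturbances are all illustrative assumptions. The sketch checks that a matrix of instantaneous effects admits a causal ordering and then generates data from a one-lag model by evaluating each factor after its instantaneous causes.

```python
import random

def causal_order(B0):
    """Return an ordering of the factors compatible with B0, or None if B0 is cyclic.

    Convention (assumed): B0[i][j] != 0 means factor j instantaneously causes factor i.
    """
    d = len(B0)
    parents = [{j for j in range(d) if B0[i][j] != 0.0} for i in range(d)]
    remaining, order = set(range(d)), []
    while remaining:
        # Factors whose instantaneous parents have all been placed already.
        roots = sorted(i for i in remaining if not (parents[i] & remaining))
        if not roots:          # every remaining factor waits on another: a cycle
            return None
        order.extend(roots)
        remaining -= set(roots)
    return order

def simulate_svar(B0, B1, T, seed=0):
    """Simulate T steps of x_t = B0 x_t + B1 x_{t-1} + e_t with uniform noise."""
    order = causal_order(B0)
    if order is None:
        raise ValueError("B0 must encode a directed acyclic graph")
    rng = random.Random(seed)
    d = len(B0)
    x_prev, path = [0.0] * d, []
    for _ in range(T):
        x = [0.0] * d
        for i in order:  # causes are evaluated before their effects
            instantaneous = sum(B0[i][j] * x[j] for j in range(d))
            lagged = sum(B1[i][j] * x_prev[j] for j in range(d))
            x[i] = instantaneous + lagged + rng.uniform(-1.0, 1.0)
        path.append(x)
        x_prev = x
    return path

# Hypothetical 3-factor system: factor 0 -> factor 1 -> factor 2 within the same day.
B0 = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.3, 0.0]]
B1 = [[0.2, 0.0, 0.0], [0.0, 0.1, 0.0], [0.4, 0.0, 0.0]]
path = simulate_svar(B0, B1, T=50)
```

Processing the factors in causal order is what makes the model "structural": each value is computed from its causes rather than solved for simultaneously, which is only possible when the instantaneous effects form a DAG.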
In order to study the evolution of such a non-stationary system, we estimate Equation 2 by adopting a sliding window approach and performing a regression analysis on the inferred causal networks. Our data covers 11 risk factors concerning the US equity market, over a period of 29 years at daily frequency. Our main results can be summarized as follows:
Causal interactions between factors exhibit a statistically significant sparsification trend along time, with anomalies in periods of financial turmoil.

We expose a relationship between the density of the causal networks and investor sentiment. In particular, we show that periods of worsening sentiment and business cycle phase (see Section 3.2) are associated with a densification of the causal network. This phenomenon is driven by a growth in the outdegree of the market risk factor.

Finally, we conduct a comparative study between causation and correlation among factors. Our findings highlight how causal analysis better describes the importance of the market factor among the considered risk factors. Besides, while according to correlation analysis, the business cycle indicator is not related to the evolution of the factorial system, the study of causal structures reveals a statistically significant relationship with a confidence level.
The rest of the paper is organized as follows. Section 2 provides the background on factor investing and causal discovery. Section 3 describes the data used to conduct the analysis and presents our methodology. Section 4 reports the results of our analysis. Finally, Section 5 provides a discussion of our results together with some directions for future investigation.
2. Related Work
An equity risk factor is a variable able to explain the cross-section of expected stock returns, i.e., how the expected return varies among stocks. Its significance is usually assessed via linear regression models (Cochrane, 2009). Factor investing is well rooted in finance: it dates back to the first asset pricing model, the Capital Asset Pricing Model (CAPM), introduced by Sharpe (1964). CAPM looks at stock returns through the exposure to one factor, the market beta, and introduces a precise definition of risk and how it drives expected stock returns. Subsequently, Fama and French (1993) found that the exposure to small-cap stocks (SMB) and to cheaper, underpriced stocks (HML) provides two additional sources of risk not captured by CAPM. Based on this evidence, an enormous amount of research has been produced on factor investing, and hundreds of potential factors have been proposed (Cochrane, 2011; Harvey et al., 2015; McLean and Pontiff, 2016; Hou et al., 2017). This abundance of proposed factors has opened an important question across both financial research and industry concerning their redundancy. As already mentioned, Feng et al. (2020) and Harvey and Liu (2021) recently pointed out that only a small number of the existing factors are truly significant in explaining the cross-section of stock returns. Our study of risk factor interactions fits into this stream of research, and in particular tackles the problem by exploiting recent advances in causal learning (Pearl, 2009).
The possibility of examining a system under interventions (i.e., counterfactual analysis) makes causal analysis a powerful tool for studying complex systems (Peters et al., 2017a). However, in many cases, the underlying causal structure is unknown, and it is not possible to carry out randomized experiments in order to study the system at hand under distribution changes. Therefore, the interest in inferring causal structures from observational data, also known as causal structure learning, has grown significantly in recent years.
Causal structure learning algorithms can be classified into three main families: (i) constraint-based approaches, which make use of conditional independence tests to establish the presence of an edge between two nodes (Spirtes et al., 2000; Huang et al., 2020); (ii) score-based methods, which use several search procedures in order to optimize a certain score function (Heckerman et al., 1995; Chickering, 2002; Huang et al., 2018); (iii) structural causal models, which express a variable at a certain node as a function of its parents (Shimizu et al., 2006; Hoyer et al., 2008; Shimizu et al., 2011; Peters et al., 2014; Bühlmann et al., 2014). Additionally, as highlighted in Section 1, whenever we are dealing with a process that evolves over time, temporal ordering drives the causal inference procedure. Nevertheless, the main issue concerns the estimation of instantaneous effects, which must satisfy the acyclicity requirement. With regard to Equation (2), this means that the matrix of instantaneous causal effects must entail the structure of a DAG.
However, it is common to deal with non-Gaussian data. For instance, this is the case for equity time series, which show heteroscedasticity (i.e., the variance of the stock returns varies over time) and volatility clustering (i.e., large (small) swings in stock prices tend to group together) (Bollerslev, 1986). This observation implies that there is additional information, not described by the covariance matrix, that can be exploited to retrieve the instantaneous effects. Consequently, by leveraging a non-Gaussianity assumption on the disturbances, a series of linear non-Gaussian methods to estimate the model in Equation 2 have been proposed in the past years (Hyvärinen et al., 2010; Moneta et al., 2013). In particular, in this study we employ VARLiNGAM (Hyvärinen et al., 2010), as provided by the Python package lingam (https://github.com/cdt15/lingam) made available by its authors.
Econophysics is an interdisciplinary effort devoted to analyzing the risk contagion among financial institutions by representing the financial system as a network (Bardoscia et al., 2021). Our analysis differs from Econophysics as we do not focus on financial institutions, and instead aim to understand the dynamics of the factorial network, which captures sources of risk largely accepted and widely used by investors. In addition, we look for causal relationships between the observed variables by means of a machine learning causal model that, differently from existing work (Billio et al., 2012), allows us to evaluate the presence of instantaneous causal effects as well.
3. Data and Methodology
In this section, we first introduce the financial risk factors and provide some information about the factor dataset. Then, we focus on the additional variables selected as indicators of changes in both investor expectation and business cycle evolution. Finally, we present the adopted causal inference approach and introduce the generalized linear model (GLM) regression we employ to analyze the model results.
3.1. Financial factors
We consider 11 risk factors at daily frequency concerning the US equity market. The choice of equity factors from this market is driven by the greater availability of data and by the larger number of results in the financial literature that can be compared with those we obtain. More precisely, we deal with daily observations spanning 29 years, from 2 January 1991 to 31 December 2019. We include in our analysis the following published risk factors, gathered directly from the authors' websites: Excess Market Return (MktRF) (Jensen et al., 1972), Small Minus Big (SMB) and High Minus Low (HML) (Fama and French, 1993), Momentum (UMD) (Carhart, 1997), HML Devil (HML Dev) (Asness and Frazzini, 2013), Robust Minus Weak (RMW) and Conservative Minus Aggressive (CMA) (Fama and French, 2015), HXZ Investment (RIA) and HXZ Profitability (RROE) (Hou et al., 2015), Betting Against Beta (BAB) and Quality Minus Junk (QMJ) (Asness et al., 2019).
Statistic  MktRF  SMB  HML  RMW  CMA  RIA  RROE  BAB  HMLdev  UMD  QMJ
Avg Comp. Ret. (%)  7.68  0.67  2.10  3.89  2.14  2.23  5.35  9.43  0.65  5.17  4.46 
Volatility (%)  17.46  9.17  9.61  7.29  6.53  6.60  7.43  11.01  10.46  13.39  7.97 
Risk Adj. Ret. (%)  0.44  0.07  0.22  0.53  0.33  0.34  0.72  0.86  0.06  0.39  0.56 
Sortino Ratio (%)  0.62  0.10  0.32  0.79  0.47  0.48  1.04  1.21  0.09  0.53  0.83 
Skew  0.15  0.22  0.43  0.26  0.44  0.72  0.11  0.34  0.52  0.23  0.20 
Kurtosis  8.34  3.91  9.05  7.71  11.44  15.92  5.84  11.72  11.47  24.57  8.70 
1st %ile (%)  2.98  1.46  1.67  1.28  1.10  1.08  1.41  2.20  1.73  2.52  1.30 
5th %ile (%)  1.72  0.91  0.83  0.66  0.57  0.57  0.68  0.97  0.88  1.20  0.71 
Min  8.95  4.71  4.39  3.02  5.94  6.88  3.96  6.29  7.00  9.46  3.74 
Max  11.35  3.78  4.83  4.49  2.53  2.75  3.26  7.94  6.35  14.53  5.04 
Publication year  1972  1993  1993  2015  2015  2014  2014  2014  2013  1997  2013 
Summary statistics are provided in Table 1. All factors display positive annualised average compounded returns over the considered period. Moreover, volatility varies significantly among factors. In particular, MktRF shows the highest value whereas the investment-related factors (CMA and RIA) display the lowest ones. Overall, according to risk-adjusted returns, the BAB and RROE factors turn out to be the highest-performing risk premia. This observation is also supported by the Sortino ratio (a metric that evaluates the risk-adjusted performance of a portfolio, discounting for its downside standard deviation). We also report the release date of each factor, and note that the majority of the factors were published after 2010.
Finally, by inspecting the cross-correlation function (CCF), we observe significant values, especially between factors which aim to capture the same anomaly in stock returns. Moreover, cross-correlation tends to peak at a single lag and then to drop significantly in almost all cases. As an example, Figure 2 depicts the CCF for the pair of previously mentioned factors, CMA and RIA. In particular, the CCF shows a pronounced peak and then plummets for higher-order lags. Thus, these observations concerning the CCF behaviour suggest that there is an important ongoing associative dynamic among factors.
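For readers who want to reproduce this kind of inspection, the sample cross-correlation at a given lag can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used for Figure 2: the function name, the sign convention for the lag (positive lags pair `x[t]` with `y[t + lag]`), and the normalisation by the full sample size are choices made here.

```python
from statistics import fmean, pstdev

def cross_correlation(x, y, lag):
    """Sample cross-correlation between x[t] and y[t + lag] (series of equal length)."""
    if lag < 0:
        # A negative lag is the same as swapping the roles of the two series.
        return cross_correlation(y, x, -lag)
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    # Covariance over the overlapping part, normalised by n (a common convention).
    cov = sum((x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(n - lag)) / n
    return cov / (pstdev(x) * pstdev(y))

# Toy series (hypothetical values, not factor returns).
series = [1.0, 2.0, 3.0, 4.0, 5.0]
ccf_lag0 = cross_correlation(series, series, 0)
ccf_lag1 = cross_correlation(series, series, 1)
```

Evaluating the function over a range of lags for a pair of return series gives the profile described above: a value close to the contemporaneous correlation at the peak and smaller values elsewhere.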
3.2. Fear and business cycle indicators
In order to relate the evolution of risk factor interactions to both stock market volatility and macroeconomic downturns, in addition to the factor data we consider two indicators defined on the VIX Index and the yield spread, respectively. The former index measures investors' 30-day-ahead expectation of US equity market volatility. In particular, the VIX Index is widely referred to as the fear index, since it is an indicator of market stress and financial turmoil. The latter spread is a macroeconomic indicator which is widely used to predict recessions (Estrella and Mishkin, 1996). In particular, a growth in the difference between the 3-month and 10-year US Treasury yields is linked to a worsening of the macroeconomic environment (Estrella and Hardouvelis, 1991). We obtain data concerning these indexes from the CBOE and FRED repositories, respectively.
Starting from these two indexes, we build fear and business cycle z-scores as follows. We first define the VIX historical expected shortfall for the $i$-th inference sample period $P_i$: $ES_i^{VIX} = E[\Delta^{VIX} \mid \Delta^{VIX} \geq q_{0.95}^{VIX}]$, defined over $P_i$, where $q_{0.95}^{VIX}$ is the 95th-percentile value of the percent change $\Delta^{VIX}$ between VIX Index closing and opening daily values. By computing $ES_i^{VIX}$ we quantify the extreme values of the daily volatility swing over the inference period $P_i$. Next, we measure the extent to which such a value is unusual with respect to past observations. Therefore, we define:
$$\text{fzscore}_i = \frac{ES_i^{VIX} - \mu^{VIX}}{\sigma^{VIX}}$$
with $\mu^{VIX}$ and $\sigma^{VIX}$ being the 10-year rolling average and standard deviation of VIX, respectively.
For the business cycle z-score, we first evaluate the extreme values of the 3M-10Y yield spread for sample $P_i$, i.e., $ES_i^{3M10Y} = E[s \mid s \geq q_{0.95}^{3M10Y}]$, defined over $P_i$, where $q_{0.95}^{3M10Y}$ is the 95th-percentile value of the difference $s$ between the 3-month and 10-year US Treasury daily rates. Thus, we define the z-score as:
$$\text{bczscore}_i = \frac{ES_i^{3M10Y} - \mu^{3M10Y}}{\sigma^{3M10Y}}$$
where $\mu^{3M10Y}$ and $\sigma^{3M10Y}$ represent the 10-year rolling average and standard deviation of 3M-10Y, respectively.
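The two-step construction above (a tail average over the inference period, then a standardisation against a long history) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the nearest-rank percentile rule, the use of the population standard deviation, and the choice of what goes into `history` (here simply a list of past values standing in for the 10-year rolling window) are assumptions made for the example.

```python
import math
from statistics import fmean, pstdev

def expected_shortfall(values, level=0.95):
    """Average of the observations at or above the empirical `level` percentile."""
    ordered = sorted(values)
    # Nearest-rank percentile (assumed convention).
    rank = max(1, math.ceil(level * len(ordered)))
    threshold = ordered[rank - 1]
    return fmean(v for v in ordered if v >= threshold)

def zscore(value, history):
    """Standardise `value` against the mean and standard deviation of `history`."""
    return (value - fmean(history)) / pstdev(history)

# Toy daily changes for one inference period (hypothetical numbers).
daily_changes = [float(v) for v in range(1, 21)]
es = expected_shortfall(daily_changes)

# Toy history of past values standing in for the 10-year rolling window.
history = [1.0, 2.0, 3.0, 4.0, 5.0]
z_mid = zscore(3.0, history)
z_high = zscore(5.0, history)
```

A positive z-score then flags a period whose tail behaviour is unusually extreme relative to the recent past, which is the sense in which the fear and business cycle indicators are used as regressors.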
3.3. Causal inference procedure and regression model
As mentioned earlier, to study the dynamics of the causal structure over time, we adopt a sliding window approach. More precisely, we divide the overall analysis period into windows of length 18 months each, with a sliding step of 3 months, obtaining 111 inference periods.
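The window count can be checked with a few lines of Python. The sketch below assumes windows aligned to calendar months, starting in January 1991 and ending no later than December 2019, with a window length of 18 months (the length of the inference period stated in this section); the function name and the month-based representation are illustrative choices.

```python
def sliding_windows(start, end, length=18, step=3):
    """List (first_month, last_month) pairs, each a (year, month) tuple, both inclusive."""
    first = start[0] * 12 + (start[1] - 1)   # months since year 0
    last = end[0] * 12 + (end[1] - 1)
    windows = []
    begin = first
    while begin + length - 1 <= last:
        finish = begin + length - 1
        windows.append(((begin // 12, begin % 12 + 1), (finish // 12, finish % 12 + 1)))
        begin += step
    return windows

windows = sliding_windows((1991, 1), (2019, 12))
n_windows = len(windows)
```

With 348 months of data, an 18-month window and a 3-month step give (348 - 18) / 3 + 1 = 111 windows, matching the number of inference periods reported above.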
As described in Section 2, we apply the VARLiNGAM algorithm to infer the causal model. In particular, the algorithm first fits a VAR model on the data, and then applies a linear non-Gaussian causal inference method, DirectLiNGAM (Shimizu et al., 2011), to the regression residuals. Other existing linear non-Gaussian approaches leverage independent component analysis (ICA (Hyvarinen, 1999)) to estimate the matrix of instantaneous causal effects. The DirectLiNGAM model was proposed to solve the potential convergence issues of ICA-based methods (Himberg et al., 2004). After fitting the VAR model on the data, suppose we regress the residuals $r_j$ associated with factor $j$ on those of factor $i$, $r_i$. Then, the residual $r_i$ is exogenous to the system if it is independent of the regression residual $r_j^{(i)}$. The algorithm starts with an empty causal ordering set and, iteratively, appends the variable which is the most independent of its residuals. The procedure stops when all factors have been inserted.
In each sample period, we apply the model by selecting the number of lags according to the BIC criterion (Schwarz and others, 1978): the resulting maximum lag is equal to 1 for every sample. Subsequently, we validate the estimated causal coefficients by running a permutation test, i.e., resampling with replacement, with a significance level equal to . In addition, the total number of permuted samples per period is 5000, the length of the generated samples equals that of the inference period (18 months) and we do not apply any thresholding to the resulting significant coefficients. Since we are interested in comparing the information provided by causal inference with that coming from correlation analysis, the same methodology is used to estimate correlation networks, by replacing the estimation of Equation 2 with the Pearson correlation coefficient.
Once we retrieve both causal and correlation network structures, we analyse their temporal evolution by means of a regression analysis. We set as dependent variable the number of network edges $y$ and as covariates the following three variables: time (measured in days), fzscore, and bczscore. Moreover, since $y$ is a non-negative count, we employ a Poisson log-linear model specified by the following GLM (Agresti, 2018):
$$\log \mu = \beta_0 + \sum_{j} \beta_j x_j \qquad (3)$$
from which we have $\mu = e^{\beta_0} \prod_{j} e^{\beta_j x_j}$, where $\mu$ is the expected value of $y$ and $x_j$ are the regressors mentioned before. Therefore, according to Equation 3, a unit increase in the independent variable $x_j$ is associated with a multiplicative effect $e^{\beta_j}$ on $\mu$. As a consequence, if $\beta_j = 0$, the growth of $x_j$ does not affect that of $\mu$. Furthermore, if $\beta_j > 0$ then $\mu$ increases as $x_j$ grows, and conversely if $\beta_j < 0$ it decreases.
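The multiplicative reading of the coefficients in a Poisson log-linear model can be illustrated numerically. The snippet below is a sketch with made-up coefficients (the values of `beta0` and `betas` are hypothetical and are not taken from the regression tables); it only shows how a coefficient translates into a percentage change of the expected count.

```python
import math

def expected_count(beta0, betas, xs):
    """Mean of a Poisson log-linear model: exp(beta0 + sum_j beta_j * x_j)."""
    return math.exp(beta0 + sum(b * x for b, x in zip(betas, xs)))

def percent_change(beta, delta=1.0):
    """Percentage change in the expected count when a regressor grows by `delta`."""
    return (math.exp(beta * delta) - 1.0) * 100.0

# Hypothetical coefficients for two regressors.
beta0, betas = 1.0, [0.5, -0.1]
before = expected_count(beta0, betas, [1.0, 2.0])
after = expected_count(beta0, betas, [2.0, 2.0])   # first regressor grows by one unit
ratio = after / before                              # equals exp(0.5) up to rounding
```

The same function applies to regressors measured on very different scales: for a coefficient expressed per day, the effect over a longer horizon is obtained by passing the number of days as `delta`.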
4. Results
In this section we provide the results of the analysis. The inferred networks for causal and correlation structures are shown in Figure 3. For readability, only three different inference samples are shown: (i) before the publication of the majority of the factor models; (ii) during the Global Financial Crisis (GFC), located between 2007 and 2009; (iii) after the publication of all factor models.
The networks are constituted by 22 nodes, split in accordance to time ordering. The factors are sorted vertically according to their publication date, from the oldest to the newest. In addition, edges associated with a positive weight are shown in grey, while those with a negative one are given in red. Thicker lines indicate a higher weight of the edge, and thus a stronger association of the factors. Finally, in order to make figures easier to read, throughout the paper results relating to correlation are shown in purple and those relating to causation in blue.
We notice a significant variability in both correlation and causal network structures across the periods. As time goes by, the number of edges decreases and the networks become sparser. This phenomenon is more pronounced for interlayer relationships. Figure 3 highlights the key role of the market factor, which impacts 9 factors out of 11 during the GFC.
As far as the stability of the inferred networks is concerned, Figure 4 shows the evolution over time of the Jaccard score between two networks estimated on adjacent periods. Such a metric is used in order to quantify the matching ratio between the edge sets $E_1$ and $E_2$ of two consecutive structures. In particular, it is defined as:
$$J(E_1, E_2) = \frac{|E_1 \cap E_2|}{|E_1 \cup E_2|}$$
and it holds that $0 \leq J(E_1, E_2) \leq 1$. In order to smooth the score, we apply a 1-year rolling average filter: as we use a step size of 3 months, the average of the last four values is shown. The Jaccard score is already normalized and is not affected by the size of the sets; however, when the numbers are very small it can be misleading. To better interpret the plot, the cardinality of the edge set is given on the right y-axis of each chart. When the correlation networks are considered, the aforementioned score is more stable over time and is much higher than the one returned by causal structures. In both cases, we observe the presence of a sparsifying trend over time. Interestingly, for causal networks, this trend breaks down during the GFC, and the Jaccard score sharply increases.
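As an illustration of the stability measure, the Jaccard score between two edge sets and the four-value rolling average can be sketched as follows. The edge sets below are hypothetical examples built from factor names, and returning 1.0 for two empty edge sets is a convention assumed here, since the ratio is undefined in that case.

```python
def jaccard(edges_a, edges_b):
    """Jaccard score |A ∩ B| / |A ∪ B| between two collections of edges."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty networks are identical
    return len(a & b) / len(union)

def rolling_mean(values, window=4):
    """Average of the last `window` values, available from the window-th value on."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

# Hypothetical directed edges (cause, effect) in two consecutive periods.
edges_t0 = {("MktRF", "SMB"), ("MktRF", "HML")}
edges_t1 = {("MktRF", "SMB"), ("SMB", "HML")}
score = jaccard(edges_t0, edges_t1)          # one shared edge out of three distinct ones
smoothed = rolling_mean([1.0, 2.0, 3.0, 4.0, 5.0])
```

With very small edge sets a single edge appearing or disappearing moves the score considerably, which is why the edge-set cardinality is worth reading alongside the score.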
Relation type  variable  coef.  std. err.  p-value
Overall  intercept  4.7419  0.024  0.000
Overall  time  7.054e-05  4.3e-06  0.000
Overall  fzscore  0.0402  0.009  0.000
Overall  bczscore  0.0039  0.010  0.687
Instantaneous  intercept  3.7993  0.035  0.000
Instantaneous  time  4.07e-06  5.72e-06  0.477
Instantaneous  fzscore  0.0340  0.012  0.004
Instantaneous  bczscore  0.0038  0.012  0.761
Lagged  intercept  4.3698  0.034  0.000
Lagged  time  0.0002  6.76e-06  0.000
Lagged  fzscore  0.0718  0.014  0.000
Lagged  bczscore  0.0229  0.015  0.131
We further analyze the temporal trend of network density by means of a regression analysis. In particular, we relate the number of edges to time, fear, and business cycle indicators through the estimation of Equation 3. Results related to correlation structures are reported in Table 2.
Considering the overall relations, both time and fzscore are statistically significant at level and are related to the sparsification of the network. It is worth noticing that time is measured in days, and thus the magnitude of the estimated coefficient is expected to be very small. Relating the value of the coefficient to the length of the considered time window, we obtain that every 18 months the number of edges in the correlation structure is reduced by approximately . With regard to investors future expectation, a growth of one standard deviation in fzscore is associated with a reduction of almost in the number of edges as well. On the contrary, the bczscore is never statistically significant. Thus, the results suggest that the correlation structure does not show a link with changes in macroeconomic conditions.
In addition to overall relations, we analyse instantaneous and lagged interactions separately. In particular, while the fzscore remains significant in both cases, time is only relevant for lagged relations. Regression results and Figure 4(a) (in which we provide the fit of the observational data) show that, even though we observe a slight decrease during stress periods, the level of instantaneous association is pretty stable over time. Thus, the overall statistical significance of time is due to the disappearance of interlayer links.
Relation type  variable  coef.  std. err.  p-value
Overall  intercept  2.8670  0.067  0.000
Overall  time  0.0002  1.29e-05  0.000
Overall  fzscore  0.1732  0.027  0.000
Overall  bczscore  0.0782  0.030  0.010
Instantaneous  intercept  10.5020  4.188  0.012
Instantaneous  time  0.0008  0.000  0.102
Instantaneous  fzscore  1.0139  0.357  0.005
Instantaneous  bczscore  0.4617  0.415  0.266
Lagged  intercept  2.8790  0.067  0.000
Lagged  time  0.0002  1.29e-05  0.000
Lagged  fzscore  0.1640  0.027  0.000
Lagged  bczscore  0.0727  0.030  0.017
As far as causal networks are concerned, Table 3 provides the results of the regression analysis. By focusing on the overall relations, we see that both time and fear index are statistically significant at level. Here, the network sparsification is faster, i.e., every 18 months the total number of connections decreases by about . On the contrary, an increase of one standard deviation in the fear index is associated with a growth of almost in the number of edges. This finding explains the break in the sparsifying trend during periods of financial stress shown in Figure 4. Regarding the business cycle indicator, it is significant at level, with the sign of the regression coefficient in accordance with that shown by the fear index. Therefore, the analysis of causal structures shows that a worsening in the macroeconomic environment is also linked to an increase of causal relations among factors. However, the strength of the effect is much smaller than that of the fear index.
By analysing instantaneous relations apart from lagged ones, we notice that the statistical significance of the fear index is preserved, whereas time is only relevant for lagged connections. The estimated coefficient indicates that the network of lagged interactions thins out with the same intensity as that of all the connections. Such a phenomenon is illustrated in Figure 4(b): the trend of the causal structure mainly consists of lagged causal interactions while instantaneous effects only appear in recent years. Interestingly, the temporal trend of network sparsification is similar to the one shown above for lagged connections in correlation networks.
Finally, we focus on the role of the market risk factor node within the network of causal relations among risk factors. As shown in Figure 6, the outdegree of the corresponding node remains in the range of 2 most of the time, then dramatically increases during periods of crisis, as observed during the GFC and more recently during 2018. In the latter case the stock market suffered heavy losses: first a volatility shock occurred in early February; subsequently the market plummeted in the last quarter due to both the US-China trade war and the slowdown in economic growth.
variable  coef.  std. err.  p-value
intercept  1.0963  0.144  0.000
time  0.0001  2.57e-05  0.000
fzscore  0.4382  0.058  0.000
bczscore  0.0790  0.063  0.210
In order to investigate the evolution over time of market node outdegree, Table 4 reports the results of a regression analysis. We set as a dependent variable the aforementioned outdegree and as covariates time, fzscore, and bczscore. We use again the Poisson loglinear model described by Equation 3. The association with the fear indicator is statistically significant at level: an increase of one standard deviation in the latter is linked to a strong growth of almost in the node outdegree. Furthermore, time turns out to be a significant feature, which is consistent with the already underlined presence of the sparsifying trend in causal relations.
5. Discussion
The results shown in the present work highlight the continuously changing nature of risk factor interactions. Taking such behaviours into account is of paramount importance for implementing an effective risk management process in multifactor investing: given the high non-stationarity of the system at hand, it is fundamental to develop robust causal structure learning models for dealing with small samples, and to analyze the causal interactions of risk premia at a finer grain.
Sparsification. As far as the change in factor relationships over time is concerned, our results support, from a causal perspective, the evidence of factor redundancy that has been provided by recent findings (Feng et al., 2020; Harvey and Liu, 2021). The analysis of the inferred causal networks suggests that, in past years, the factorial system was driven by one-lag causal interactions whereas, more recently, instantaneous relationships among those risk factors have appeared. Therefore, a proper factor causal model needs to take into account instantaneous effects as well. In addition, both causation and correlation analyses support the presence of a sparsifying trend in interlayer connections. In particular, we hypothesize that the loss of memory of the system may be due to an increase in the sophistication of the market participants, who are able to react faster to external stimuli (Brogaard et al., 2014; Chordia et al., 2018).
Factor unveiling. To better characterize the evolution over time of the system under consideration, we checked whether the phenomenon of unveiling a factorial model would affect the causal structure. The rationale behind such a test is that, after a factor becomes known, investors start betting on it, and then factor relations might be altered. However, we did not find any statistically significant evidence for the link between unveiling a factor and a change in the causal structure around it. In order to identify the drivers of the aforementioned sparsifying trend, it might be useful to analyze the system at hand by using higher frequency data and, in addition, to inspect possible links with the recent commodification of factorial strategies within the US market.
Financial and economic stress. With regard to the behavior of the factorial system during stress periods, inspecting both causation and correlation results provides a richer view of the underlying dynamics. Our findings show that, during financial turmoil, the level of instantaneous association among factors slightly decreases, and the factorial system becomes driven by the market factor. As a consequence, the causal structure becomes denser and more stable due to the increase of the outdegree of the market node, i.e., the influence that the market has on all the other factors. This result is of paramount importance for investors, since it shows that, during stress periods, the exposure to different factors reverts to a simple exposure to market risk. Our finding echoes recent ones which provide evidence for the market factor being by far the dominant one (Giglio and Xiu, 2017; Harvey and Liu, 2021). Furthermore, the reported relationship between the change in the VIX Index and both correlation and causal structures reinforces existing evidence in financial literature concerning the link between the VIX Index and factors returns (Durand et al., 2011).
Finally, we have analyzed the relation between the change in macroeconomic conditions, as measured by the spread of the yield curve, and the evolution of the factorial system. Also in this case we can appreciate the benefit of looking at causation. According to correlation analysis alone, the business cycle indicator is not associated with the dynamics of the analyzed system. Conversely, the regression analysis which relates the business cycle indicator to the causal structure of the factorial system displays a statistically significant relationship with a confidence level. Indeed, similarly to the results of the volatility analysis, the worsening of macroeconomic environment is associated with a growth in the number of network arcs. Therefore, looking at the causation allows to better inspect the evolution of the system during negative economic phases.
Future work. The results in this paper contribute to the understanding of the relationships among risk factors. However, several questions remain open. As an example, it would be interesting to study the interactions among risk factors belonging to different equity markets. Moreover, our analysis concerns only the equity asset class. Thus, enlarging the considered set of factors by including risk premia concerning other asset classes could help in taking into consideration inter-asset-class dynamics as well. Finally, by construction, causal networks enable studying the response of the system at hand under interventions. Therefore, it would be interesting to exploit the attained results to set up a suitable stress testing procedure for multifactor portfolios.
Acknowledgements.
The authors acknowledge the support from Intesa Sanpaolo Innovation Center. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
 An introduction to categorical data analysis. John Wiley & Sons. Cited by: §3.3.
 The devil in hml’s details. The Journal of Portfolio Management 39 (4), pp. 49–68. Cited by: §3.1.
 Quality minus junk. Review of Accounting Studies 24 (1), pp. 34–112. Cited by: §3.1.
 The physics of financial networks. arXiv preprint arXiv:2103.05623. Cited by: §2.
 Econometric measures of connectedness and systemic risk in the finance and insurance sectors. Journal of financial economics 104 (3), pp. 535–559. Cited by: §2.
 Generalized autoregressive conditional heteroskedasticity. Journal of econometrics 31 (3), pp. 307–327. Cited by: §2.
 High-frequency trading and price discovery. The Review of Financial Studies 27 (8), pp. 2267–2306. Cited by: §5.
 CAM: causal additive models, high-dimensional order search and penalized regression. Annals of statistics 42 (6), pp. 2526–2556. Cited by: §2.
 On persistence in mutual fund performance. The Journal of finance 52 (1), pp. 57–82. Cited by: §3.1.
 Optimal structure identification with greedy search. Journal of machine learning research 3 (Nov), pp. 507–554. Cited by: §2.
 Rent seeking by low-latency traders: evidence from trading on macroeconomic announcements. The Review of Financial Studies 31 (12), pp. 4650–4687. Cited by: §5.
 The cross-section: CAPM and multifactor models. In Asset pricing (Revised edition), pp. 435–449. Cited by: §2.
 Presidential address: discount rates. The Journal of finance 66 (4), pp. 1047–1108. Cited by: §2.
 Fear and the Fama-French factors. Financial Management 40 (2), pp. 409–426. Cited by: §5.
 The term structure as a predictor of real economic activity. The journal of Finance 46 (2), pp. 555–576. Cited by: §3.2.
 The yield curve as a predictor of US recessions. Current issues in economics and finance 2 (7). Cited by: §3.2.
 Common risk factors in the returns on stocks and bonds. Journal of financial economics 33 (1), pp. 3–56. Cited by: §2, §3.1.
 A five-factor asset pricing model. Journal of financial economics 116 (1), pp. 1–22. Cited by: §3.1.
 Taming the factor zoo: a test of new factors. The Journal of Finance 75 (3), pp. 1327–1370. Cited by: §1, §1, §2, §5.
 Inference on risk premia in the presence of omitted factors. Technical report National Bureau of Economic Research. Cited by: §5.
 … And the cross-section of expected returns. The Review of Financial Studies 29 (1), pp. 5–68. Cited by: §2.
 Lucky factors. Journal of Financial Economics. External Links: ISSN 0304-405X. Cited by: §1, §1, §2, §5, §5.
 Learning bayesian networks: the combination of knowledge and statistical data. Machine learning 20 (3), pp. 197–243. Cited by: §2.
 Validating the independent components of neuroimaging time series via clustering and visualization. Neuroimage 22 (3), pp. 1214–1222. Cited by: §3.3.
 Digesting anomalies: an investment approach. The Review of Financial Studies 28 (3), pp. 650–705. Cited by: §3.1.
 Replicating anomalies. Technical report National Bureau of Economic Research. Cited by: §2.
 Nonlinear causal discovery with additive noise models. Advances in neural information processing systems 21, pp. 689–696. Cited by: §2.
 Generalized score functions for causal discovery. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1551–1560. Cited by: §2.
 Causal discovery from heterogeneous/nonstationary data. Journal of Machine Learning Research 21 (89), pp. 1–53. Cited by: §2.
 Estimation of a structural vector autoregression model using non-Gaussianity. Journal of Machine Learning Research 11 (5). Cited by: §2.
 Fast and robust fixed-point algorithms for independent component analysis. IEEE transactions on Neural Networks 10 (3), pp. 626–634. Cited by: §3.3.
 The death of diversification has been greatly exaggerated. The Journal of Portfolio Management 38 (3), pp. 15–27. Cited by: §1.
 The capital asset pricing model: some empirical tests. In Studies in the Theory of Capital Markets, Cited by: §3.1.
 Risk minimization in multifactor portfolios: what is the best strategy? Annals of Operations Research 266 (1), pp. 255–291. Cited by: §1.
 Does academic research destroy stock return predictability?. The Journal of Finance 71 (1), pp. 5–32. Cited by: §2.
 Causal inference by independent component analysis: theory and applications. Oxford Bulletin of Economics and Statistics 75 (5), pp. 705–730. Cited by: §2.
 Causality. Cambridge university press. Cited by: §2.
 Elements of causal inference: foundations and learning algorithms. The MIT Press. External Links: ISBN 0262037319 Cited by: §1, §2.
 Causal discovery with continuous additive noise models. Journal of Machine Learning Research. Cited by: §2.
 The direction of time. University of California Press. Cited by: §1.
 Estimating the dimension of a model. Annals of statistics 6 (2), pp. 461–464. Cited by: §3.3.
 Capital asset prices: a theory of market equilibrium under conditions of risk. The journal of finance 19 (3), pp. 425–442. Cited by: §2.
 A linear non-Gaussian acyclic model for causal discovery. J. Mach. Learn. Res. 7, pp. 2003–2030. External Links: ISSN 1532-4435 Cited by: §2.
 DirectLiNGAM: a direct method for learning a linear non-Gaussian structural equation model. The Journal of Machine Learning Research 12, pp. 1225–1248. Cited by: §2, §3.3.
 Macroeconomics and reality. Econometrica: journal of the Econometric Society, pp. 1–48. Cited by: §1.
 Causation, prediction, and search. MIT press. Cited by: §2.
 D’ya like dags? a survey on structure learning and causal discovery. arXiv preprint arXiv:2103.02582. Cited by: §1.