This paper combines state-of-the-art algorithmic techniques from data science, machine learning, causality analysis and financial engineering to expose risks in financial markets via explicit adversarial scenarios constructed from historical data. The approach is more precise than traditional risk analysis via VaR and computationally more efficient than Monte Carlo simulation-based approaches. While it has been well studied how to identify “factors” that negatively affect a financial asset (or portfolio) and how those factors are correlated, algorithms that generate causally plausible adverse temporal trajectories of events have remained an insurmountable challenge. Monte Carlo simulations using Bayesian networks have found widespread use, but they produce neither an explainable framework nor attractive computational complexity. Human-expert-generated scenarios can be augmented with rational explanations, but they lack consistency and scalability. Notwithstanding these challenges, risk management has become a central part of world finance in the past decades, as financial regulators demand more quantitative risk assessments of financial entities. For example, the Basel Committee under the Bank for International Settlements recommends that all banks maintain a minimum capital reserve.
The proposed quantitative assessments are designed and implemented to mitigate the risk of insolvency: namely, the depletion of a financial entity's capital to the point that it must stop its operations. In accounting terms, any financial entity's account consists of assets, liabilities and net equity, related by the famous accounting identity: $\text{Assets} = \text{Liabilities} + \text{Net Equity}$. The task of quantitative risk management is to calculate the amount of equity that must be reserved so that net equity does not turn negative when potential risks materialize into actual losses duffie2005risk . In other words, this excess capital reserve serves as a ‘risk buffer’ that absorbs potential losses and prevents the financial entity from bankruptcy. Before the catastrophic financial crisis of 2008, risk assessments were in general statistical risk measures like Value-at-Risk manganelli2001var
. The central idea behind a statistical risk measure is this: we assess the statistical distribution of our portfolio or balance sheet, and estimate the statistically large adverse moves. If our capital reserve is enough to cover such losses, then we can safely assume that we are statistically free from insolvency. Value-at-Risk is the most widely used measure, and it usually assesses the worst 1% loss. Depending on the financial entity – hedge fund, bank or clearing house – and on the financial instruments – stocks, bonds, or derivatives – in its balance sheet, the specific methods of calculating risk measures like VaR may vary, but generally they can be reasonably estimated by methods like Monte Carlo simulation raychaudhuri2008montecarlo .
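To make this concrete, the following sketch estimates a one-day 99% Value-at-Risk by historical simulation, reading off the empirical 1% quantile of the loss distribution; the heavy-tailed simulated P&L series is a hypothetical stand-in for a real portfolio history:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily P&L history (dollars); heavy-tailed draws mimic the
# fat tails of real portfolio returns.
pnl = rng.standard_t(df=3, size=5000) * 1e4

def var(pnl_history, level=0.99):
    """Historical-simulation Value-at-Risk: the loss that the worst
    (1 - level) fraction of observed outcomes exceeds."""
    return -np.quantile(pnl_history, 1.0 - level)

print(f"99% one-day VaR: ${var(pnl):,.0f}")
```

A capital reserve at least equal to this quantity would, statistically, absorb all but the worst 1% of daily outcomes in the observed history.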
However, such conventional approaches became discredited when recent events led to major financial catastrophes. In the 2008 financial crisis, for example, reserves calculated with methods such as VaR proved painfully inadequate. In an analysis of VaR before and during the financial crisis conducted by the Federal Reserve Board, the average bank profit-and-loss (PnL) did not exceed the bank VaR from December 2003 to April 2007, while the average bank PnL exceeded VaR six times from June 2007 to March 2008 FedVaRBackTest . In other words, banks that maintained capital reserves equal to their Value-at-Risk would have faced, on average, six near-bankruptcies during the crisis. What most intrigued the economists and the political and social scientists was the sheer lack of a single plausible causal explanation of these events; proposed explanations ranged from (i) “One eyed Scottish idiot!” (Jeremy Clarkson), to (ii) “Complex financial products; undisclosed conflicts of interest; the failure of regulators, the credit rating agencies, and the market itself” Levin-CoburnReport , to interest rate spreads and emerging markets (e.g., BRICS) – and so on. Both their unusual abruptness and intuitive implausibility earned such scenarios the name, “Black Swan Events
” – and many more. It also raised the question of whether there is a theory of “causality” that can rigorously explain such events empirically from data – we suggest that the machinery of model checking for a suitably expressive logic (e.g., PCTL, Probabilistic Computation Tree Logic, a branching-time propositional modal logic) provides just the right capabilities to succinctly specify and efficiently verify statements about such scenarios. It derives its power from the way it combines logic, probability and reasoning about time.
More informally, these approaches could address the deeply felt need for better regulation (and intervention) in the form of stress testing. Stress testing refers to the analysis or simulation of the response of financial instruments or institutions to intensely stressed scenarios that may lead to a financial crisis claessensi2013crisis . For example, narrowly speaking, stress testing may model the response of a portfolio when the Dow Jones suddenly drops by 5%. The difference between stress testing and conventional risk management is that stress testing deliberately introduces an adversarial event that may be highly improbable but is not implausible – e.g., the afore-mentioned black swan events triggering an unforeseen scenario. Thus, stress testing must be capable of observing the response of financial instruments or institutions under extremely rare scenarios. Such scenarios are deemed too unlikely to be observed in conventional risk management, where a simpler system may mis-estimate a percentile of the loss distribution and subsequently support the claim that, with 99% confidence, a specific portfolio will perform well, giving a false sense of security.
1.1 Our Contribution
Our core contribution is a stress-testing method built on Suppes' causality structure and a novel algorithm to create and traverse Suppes-Bayes Causal Networks (SBCNs). Note that we had originally developed and applied this causality framework to study cancer progression, but had not explored its combination with machine learning (ML), as here, for generating rare adversarial scenarios for stress testing. The integration of causality analysis with machine learning results in a novel and practical (albeit approximate) approach to risk analysis that is currently lacking in data science.
This paper evolves from Rebonato's use of Bayesian networks rebonato2010coherent as the core modeling technique, and extends beyond his method by combining the three stress-testing scenario-generation methods. Our scenario-generation method samples from a conditioned Bayesian network, learned from historical data, that is able to capture the causality structure between risk factors and financial assets. The advantage of this method is that the clear causal structure makes interpretation of stress-testing results more intuitive. The method also incorporates machine learning tools to identify the scenarios that are most detrimental to specific portfolios and to reduce the computational complexity of sampling.
Our second contribution is in augmenting traditional factor models with causality analysis – a challenging area of data science, especially in finance and econometrics. For instance, after the 2008 sub-prime mortgage crisis, many attempts were made to explore the causes of the gigantic crisis. Many attributed the crisis to very ‘direct’ causes – as suggested earlier, these would include low-quality mortgage loans whose risk was concealed by securitization, and derivatives like credit default swaps which helped support lending. Others presented more ‘indirect’ causes: banks' capital requirements under the Basel Accord, which encouraged securitization; and long-term record-low interest rates, which encouraged reckless borrowing CrisisCause2008Dennis . People's interest in causality has grown tremendously since the crisis, since it is easier to understand cause and effect than association or correlation, just as in the natural sciences. However, causality structure is more than just cause and effect. In the explanations of the crisis mentioned above, we can already see how the different ‘direct’ and ‘indirect’ causes intertwine: capital requirements encouraged securitization, and securitization hid risk; long-term low interest rates encouraged reckless borrowing, which led to the existence of low-quality mortgages. Past attempts admitted the ‘convoluted interactions’ between causes, but failed to explore the actual complex causality structure. Nevertheless, the true discovery of the causality structure is crucial not only to the understanding of the interactions between causes and effects, but also to the generation of sound hypothetical scenarios. The algorithmic framework presented here takes us one step closer to understanding various latent causal structures at play in a complex financial market.
Our final contribution is in providing a practical and scalable implementation of financial causality analysis, building on a theoretical foundation of causality with a rich and deep philosophical history. The start of modern causality theory is the Scottish philosopher David Hume's regularity theory. The core of his theory is temporal priority, which means that causes always come before their effects; in other words, causality follows a pattern of succession in time hume1793inquiry . Following Hume, Judea Pearl's notion of intervention has laid the foundation of many modern computer algorithms for causal network inference. Intervention in Pearl's sense implies that if we manipulate $X$ and nothing happens to $Y$, then $X$ cannot be a cause of $Y$; but if a manipulation of $X$ leads to a change in $Y$, then we may conclude that $X$ is a cause of $Y$, although there might be other causes as well pearl2003causality . Independently, Patrick Suppes proposed his notion of prima facie cause, which extends the ideas of Hume and Pearl, and this paper efficiently automates the discovery of Suppes' prima facie causation to construct the causal Bayesian network of financial factors and assets.
To reiterate an earlier point, the work presented here builds on our earlier work on cancer progression, but also addresses many practical challenges – unique to financial data – where computational efficiency is paramount, but nontrivial.
This paper is organized as follows. The next section describes the background and related work. The section following immediately addresses the theoretical foundations of our method and, in particular, shows how combining the expressivity of Suppes-Bayes Causal Networks with classical classification approaches can effectively capture the dynamics of financial stress testing. The subsequent section provides results describing the accuracy (specificity and sensitivity) of our algorithm for the efficient inference and traversal of SBCNs from financial data and discusses its performance in depth; it shows on realistic simulated data how our approach is preferable to standard Bayesian methods. The final section concludes the paper.
2 Literature Review
The emerging area of financial stress testing is still an embryonic field with a relatively meager literature – traditional data science approaches are not directly applicable, and automation of the manual methods relying on domain expertise remains mostly unformalized.
2.1 Stress Testing Literature in Finance
Before the financial crisis, stress testing enjoyed interest only among advanced financial practitioners like risk managers and central bankers. Nevertheless, the severity of the global financial crisis and its unexpected nature suggested that a more extensive and rigorous use of stress-testing methodologies would be crucial to reduce the occurrence of similar catastrophes StressTestMario . Stress testing first emerged as banks' internal self-assessment of their financial soundness in the early 1990s StressTestHistoryBOE . These stress tests were small-scale tests for individual banks to assess their own trading activities and balance sheets. Later, in 1996, the Basel I Market Risk Amendment required banks to develop stress tests as part of their internal models for the calculation of capital requirements for market risk StressTestMario . In 2004, Basel II introduced requirements for credit risk stress testing by banks. Most recently, in 2011, the Federal Reserve began the Comprehensive Capital Analysis and Review (CCAR) program, which incorporates an annual bank stress test StressTestHistoryBOE . The start of CCAR marks a nation-wide implementation of stress testing as a regular financial-stability assessment. Stress testing thereby became one of the most debated topics in financial regulation and risk management.
2.2 Stress Testing Literature in Data Science
Recently, many different approaches have been developed to implement stress testing. In general, a stress-testing procedure consists of two steps: generation of stress scenarios, and stress projection. The first step generates the adversarial, albeit plausible, stress scenario. The second step projects financial portfolios or banks' balance sheets onto the stress scenario and estimates the potential loss. In terms of stress-scenario generation, the most direct method is the historical one, in which observed events from the past are used to test contemporary portfolios stress_testing_methods . Some example historical scenarios used by practitioners are Black Monday in 1987, the Asian Crisis in 1997, and the Financial Crisis in 2008 StressTestHistoryBOE . As an alternative, the event-based method has been proposed: a specific hypothetical stress scenario is quantified subjectively, by domain experts, and the possible consequence of such an event is then estimated using macroeconomic and financial models stress_testing_methods . To ensure that a scenario is damaging to the portfolio, a portfolio-based method has also been studied, in order to link scenarios directly with the portfolio stress_testing_methods . To this end, portfolio-based methods rely on Monte Carlo simulation to identify the movements of risk factors that stress the given portfolio most severely.
However, all of these scenario-generation methods have their own limitations. The historical approach is objective, since it is based on actual events, but it is not necessarily relevant under present conditions. The event-based hypothetical method is more relevant, but it relies intensively on expert judgment as to whether a hypothetical event will be severely damaging while still plausible. Sometimes such judgment becomes difficult when the relationship between the underlying risk factors and the portfolio is unknown. Hypothetical methods have been blamed for their high degree of uncertainty: practitioners sometimes find it hard to interpret the result of stress testing on hypothetical events, since the probability of occurrence of the event is uncertain rebonato2010coherent and the construction of the hypothetical events is subjective. The portfolio-based method relies heavily on Monte Carlo simulation, but brute-force Monte Carlo simulation is computationally inefficient, especially when dealing with many risk factors. Also, portfolio-based methods are difficult to implement for nation-wide inter-bank stress testing like CCAR.
To address this problem, Rebonato et al. proposed a sampling approach based on Bayesian networks in rebonato2010coherent , which naturally relied on correlation, but not causation. Our work, presented here, addresses this shortcoming.
The underlying stress-testing method builds on several ideas from a diverse set of fields – finance, machine learning, causal data science and algorithmics; we discuss these building blocks in turn.
3.1 Finance Theory
Traditionally, markets are thought to be efficient and to follow CAPM (the Capital Asset Pricing Model), which assumes that the return of an asset may be defined as follows:

$$ R_i = R_f + \beta_i \, (R_m - R_f), $$

where $R_i$ is the return of the asset, $R_f$ the risk-free return (usually measured in terms of government treasury returns) and $R_m$ the market factor (measured as the value-weighted market portfolio, similar to stock indexes). Such a model is of little interest in terms of stress testing of a portfolio of assets, as all assets are driven by their exposure to a single market factor and are affected similarly by any scenario.
For our purposes, it is more meaningful to assume that the stocks are affected differently by different econometric factors and are causally intertwined. For example, we may adopt a common stock factor model, the Fama French Five Factor Model fama1996multifactor , where the return of the asset is defined as follows:

$$ R_i - R_f = \alpha_i + \beta_i \, (R_m - R_f) + s_i \, \mathit{SMB} + h_i \, \mathit{HML} + r_i \, \mathit{RMW} + c_i \, \mathit{CMA} + \epsilon_i. $$

In the equation,
$R_i$ is the return of the asset;
$R_f$ is the risk-free return, usually measured in terms of government treasury returns;
$R_m - R_f$ stands for the market factor, measured as the value-weighted market portfolio, similar to stock indexes;
$\mathit{SMB}$ (Small Minus Big) stands for the company-size factor, measured by the return on a diversified portfolio of small stocks minus the return on a diversified portfolio of big stocks;
$\mathit{HML}$ (High Minus Low) stands for the company book-to-market-ratio factor, measured by the difference between the returns on diversified portfolios of high and low $B/M$ stocks, where $B/M$ is the ratio between the company's book value and its market value;
$\mathit{RMW}$ (Robust Minus Weak) stands for the company operating-profitability factor, measured by the difference between the returns on diversified portfolios of stocks with robust and weak profitability; and
$\mathit{CMA}$ (Conservative Minus Aggressive) stands for the company investment factor, the difference between the returns on diversified portfolios of low- and high-investment stocks, called conservative and aggressive fama1996multifactor .
Consequently, the factors may be assumed to evolve temporally following embedded causal relationships. Their effects on the portfolio of stocks may be inferred by linearly regressing historical returns onto the five factors. However, we will also need to infer from data the temporal and probability-raising relations among the pairs of factors, which would indicate potentially genuine causal relations that affect the dynamics of the financial market. These provide the key ingredients of the plausible adversarial trajectories.
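As an illustration of the regression step, the following sketch recovers the factor loadings of one asset by ordinary least squares; the factor series, true loadings, alpha and noise level are all hypothetical stand-ins, not estimates from real market data:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 250  # one year of daily observations

# Hypothetical factor return series: MKT-RF, SMB, HML, RMW, CMA.
F = rng.normal(0.0, 0.01, size=(T, 5))
true_beta = np.array([1.1, 0.4, -0.2, 0.3, 0.1])  # illustrative loadings
alpha = 0.0002

# Simulated excess return of a single asset under the five-factor model.
excess = alpha + F @ true_beta + rng.normal(0.0, 0.002, size=T)

# OLS: regress excess returns on an intercept plus the five factors.
X = np.column_stack([np.ones(T), F])
coef, *_ = np.linalg.lstsq(X, excess, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]
print("estimated loadings:", np.round(beta_hat, 2))
```

The recovered loadings then describe how each factor's movement propagates to the asset, which is what a generated stress trajectory is projected through.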
3.2 Machine Learning Theory
We start with Machine Learning using Bayesian graphical models koller2009probabilistic , popularly known as Bayesian networks, as a framework to assess stress testing, as previously done in this context by rebonato2010coherent . Bayesian networks have long been used in biological modeling, such as -omics data analysis, cancer progression or genetics beerenwinkel2007conjunctive ; loohuis2014inferring ; ramazzotti2015capri , but their application to financial data analysis has been rare. Roughly speaking, Bayesian networks attempt to exploit the conditional independence among random variables, whether the variables represent genes or financial instruments. In this paper we adopt a variation of the traditional Bayesian networks, as done in ramazzotti2016modeling ; ramazzotti2016learning . There, Mishra and his co-authors have shown how constraining the search space of valid solutions by means of a causal theory grounded in Suppes' notion of probabilistic causation suppes1970probabilistic can be exploited to devise better statistical inference algorithms. Also, by accounting for Suppes' notion of probabilistic causation, we ensure not only conditional independence but also prima facie causal relations among variables, leading us to a better definition of the actual factors leading to risk. Moreover, through a maximum-likelihood optimization scheme that makes use of a regularization score, we attempt to retain only those edges in the Bayesian network (graphically depicted as a directed acyclic graph, DAG) that correspond to genuine causation, while eliminating all the spurious causes caravagna2015algorithmic .
Given the inferred network, we can sample from it to generate plausible scenarios, though not necessarily adversarial or rare ones. In the case of stress testing, it is crucial to also account for rare configurations; for this reason, we adopt auxiliary tools from machine learning to discover random configurations that are both unexpected and undesired.
Here, we expand the concept sketched above, starting with a background discussion of our framework: we describe the adopted Bayesian models and causal theories, and we then show how classification – provided an inferred causal model like an SBCN is available – can effectively guide stress-testing simulations.
3.2.1 Traditional Bayesian networks
Informally, a Bayesian network is defined as a directed acyclic graph (DAG) $G = (V, E)$, in which each node $v \in V$ represents a random variable to which is associated a conditional probability table, and each arc $(u, v) \in E$ models a binary dependency relationship. The nodes induce an overall joint distribution that can be written as a product of the conditional distributions associated with each variable koller2009probabilistic . In this paper, without any loss of generality, we restrict our attention to Bernoulli random variables with support in $\{0, 1\}$. Specifically, we will consider as input for our analyses a dataset $D$ of $m$ observations over $n$ Bernoulli variables; we refer to the next subsections for a detailed description of the meaning of such variables. More details about Bayesian networks may be found in koller2009probabilistic .
Let us now consider as an example the Bayesian network shown in Figure 1, where $A$, $B$, $C$ and $D$ are random variables (e.g., an econometric factor's value relative to a threshold) represented by four nodes, and the dependencies among the nodes are modeled by directed arcs. A pertinent network could thus encode certain binary relations, such as correlations or causality, among Fama French factors such as the market factor (akin to stock indexes), SMB (company size), HML (company book-to-market ratio), RMW (company operating profitability) and CMA (company investment). The graph is defined by $V = \{A, B, C, D\}$ and $E = \{(A, B), (B, C), (B, D)\}$. Loosely speaking, the arc $(A, B)$ indicates that knowledge of $A$ (the parent) influences the probability of $B$ (the child), or that $A$ and $B$ are statistically dependent. Furthermore, for node $B$, node $A$ is called $B$'s parent and nodes $C$ and $D$ are called $B$'s children. More precisely, in the conditional probability tables of the afore-mentioned Bayesian network, the rows for node $B$ specify how the knowledge of $A$ affects the probability of $B$ being observed. For example, let $A$ and $B$ both be binary random variables with support over $\{0, 1\}$. Table 1 specifies the distribution of $B$ under the condition of $A$, and we can clearly see the effect of the parent on the child in this example.
|         | $A = 0$               | $A = 1$               |
| $B = 0$ | $P(B = 0 \mid A = 0)$ | $P(B = 0 \mid A = 1)$ |
| $B = 1$ | $P(B = 1 \mid A = 0)$ | $P(B = 1 \mid A = 1)$ |
One of the most significant features of Bayesian networks is the notion of conditional independence. Simply speaking, for any node $v$ in a Bayesian network, given the knowledge of $v$'s parents, $v$ is conditionally independent of all of its non-descendants koller2009probabilistic . For example, in the Bayesian network in Figure 1, node $C$ is conditionally independent of node $A$ when conditioned on node $B$. The possibility of exploiting conditional independence when computing the induced distribution of the Bayesian network is a powerful property, since it simplifies the conditional probability tables tremendously. For example, the conditional probability table of node $C$ need not contain entries for $A$, since $C$ is independent of $A$ conditioned on $B$: $P(C \mid A, B) = P(C \mid B)$.
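The factorization induced by such a four-node DAG can be made concrete with a small sketch; the conditional probability tables below are hypothetical, chosen only to illustrate the chain rule $P(A, B, C, D) = P(A)\,P(B \mid A)\,P(C \mid B)\,P(D \mid B)$ and the conditional independence of $C$ and $A$ given $B$:

```python
from itertools import product

# Hypothetical CPTs for the DAG A -> B, B -> C, B -> D.
pA = {1: 0.3, 0: 0.7}
pB_A = {0: {1: 0.2, 0: 0.8}, 1: {1: 0.9, 0: 0.1}}    # P(B | A)
pC_B = {0: {1: 0.1, 0: 0.9}, 1: {1: 0.6, 0: 0.4}}    # P(C | B)
pD_B = {0: {1: 0.5, 0: 0.5}, 1: {1: 0.25, 0: 0.75}}  # P(D | B)

def joint(a, b, c, d):
    # Chain-rule factorization implied by the DAG.
    return pA[a] * pB_A[a][b] * pC_B[b][c] * pD_B[b][d]

# The factorization defines a proper distribution ...
total = sum(joint(*x) for x in product((0, 1), repeat=4))

# ... and C is independent of A given B: P(C | A, B) equals P(C | B).
def p_c(c, a, b):
    num = sum(joint(a, b, c, d) for d in (0, 1))
    den = sum(joint(a, b, cc, d) for cc in (0, 1) for d in (0, 1))
    return num / den

print(total, p_c(1, a=0, b=1), p_c(1, a=1, b=1))
```

Note that $P(C = 1 \mid A, B = 1)$ comes out identical for both values of $A$, which is exactly the table simplification described above.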
In the context of stress testing, Rebonato rebonato2010coherent suggests a subjective approach to constructing Bayesian networks. After carefully selecting a set of random variables as the nodes of the network, Rebonato proposes to subjectively connect the variables and assign the relevant conditional probability tables with the help of risk managers or other experts. Then, with the elicited Bayesian network, reasoning about stressed events, or simulation, can be conducted; see rebonato2010coherent for details.
3.2.2 Suppes-Bayes Causal Networks & Our Approach
Our framework builds upon many of Rebonato's intuitions, but exploits our recent work on causality to address the key problems on which the subjective approach falls short. The subjective approach is handy when expert knowledge of the causal relationships of some variables is available. However, such reliance becomes unnatural when experts are confronted with random variables that are clearly beyond their expertise: for example, the relationship between unemployment and stock market performance, or, more simply, the relationship between a pair of arbitrarily chosen stocks. Therefore, instead of completely abandoning the role of data in the construction of the Bayesian network, here we adopt statistical inference algorithms that learn both the structure and the conditional probability tables of the Bayesian network from data, which, in turn, can be further augmented by expert knowledge if deemed necessary.
Thus, unlike rebonato2010coherent , our stress-testing approach builds on the foundation of Suppes-Bayes Causal Networks (SBCNs), which are not only more strictly regularized than general Bayesian networks but also enjoy many other attractive features, such as interpretability and refutability. SBCNs exploit the notion of probabilistic causation, originally proposed by Patrick Suppes suppes1970probabilistic .
In suppes1970probabilistic , Suppes described the notion of prima facie causation. A prima facie causal relation between an event $c$ and its effect $e$ is verified when the following two conditions hold: temporal priority (TP), i.e., a cause happens before its effect, and probability raising (PR), i.e., the presence of the cause raises the probability of observing its effect.
Definition 1 (Probabilistic causation, suppes1970probabilistic ).
For any two events $c$ and $e$, occurring respectively at times $t_c$ and $t_e$, under the mild assumptions that $0 < P(c), P(e) < 1$, the event $c$ is called a prima facie cause of $e$ if it occurs before $e$ and raises the probability of $e$, i.e.,

$$ t_c < t_e \quad \text{and} \quad P(e \mid c) > P(e \mid \overline{c}), $$

where $\overline{c}$ is the Boolean complement of $c$ and corresponds to the event “not $c$.” Our reformulation (note that $P(e \mid c) > P(e \mid \overline{c})$ iff $P(e \mid c) > P(e)$) follows from straightforward logic.
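Both of Suppes' conditions are directly testable on data. A minimal sketch, assuming binary event-indicator series and externally supplied occurrence times (the series below are hypothetical):

```python
import numpy as np

def prima_facie(c, e, t_c, t_e):
    """Check Suppes' two conditions on binary event series:
    temporal priority (t_c < t_e) and probability raising,
    P(e | c) > P(e | not c)."""
    c, e = np.asarray(c, dtype=bool), np.asarray(e, dtype=bool)
    return t_c < t_e and e[c].mean() > e[~c].mean()

# Hypothetical data: e tends to follow c, so c is a prima facie cause of e.
rng = np.random.default_rng(2)
c = rng.random(1000) < 0.3
e = np.where(rng.random(1000) < 0.8, c, rng.random(1000) < 0.3)
print(prima_facie(c, e, t_c=0, t_e=1))  # both conditions hold
```

Swapping the occurrence times makes the temporal-priority condition fail, so the same data would not certify the reverse causal direction.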
The mathematical underpinnings of probabilistic causation are easily expressible in the logic below, which also allows efficient model checking in general; thus, enumerating complex prima facie causes from data or probabilistic state-transition models becomes feasible. We start with a discrete-time Markov chain (DTMC) – a sequence of random variables satisfying the Markov property, i.e., the probability distribution of future states depends only upon the present state markov1954algorithm . A DTMC is a directed graph with a set of states $S$, endowed (via labeling functions) with the atomic propositions true within them. It is possible to make the labeling probabilistic, so that one may express that “high market optimism” may be false with some small probability because an adverse election result may be revealed (e.g., depending on the status of a certain investigation). The states are related pairwise by transition probabilities. We also have an initial state from which we can begin a path (trajectory) through the system. Each state has at least one transition, to itself or to another state in $S$, with non-zero probability.
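A minimal sketch of such a DTMC, with a hypothetical two-state “calm”/“stressed” market-regime labeling and an illustrative transition matrix:

```python
import numpy as np

# Hypothetical two-state market-regime chain: 0 = "calm", 1 = "stressed".
# P[i, j] is the probability of moving from state i to state j.
P = np.array([[0.95, 0.05],
              [0.30, 0.70]])

def sample_path(P, start, length, rng):
    """Simulate one trajectory (path) of a discrete-time Markov chain."""
    path = [start]
    for _ in range(length - 1):
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path

rng = np.random.default_rng(7)
print(sample_path(P, start=0, length=20, rng=rng))
```

Each sampled path is one candidate trajectory of events over which temporal-logic properties can be evaluated.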
A general framework for causality analysis is provided by model-checking algorithms for PCTL (Probabilistic Computation Tree Logic) and has been explored in detail by Mishra and his students kleinberg2009temporal . We start with a brief discussion of how Suppes' prima facie causality can be formulated in PCTL, but then develop an efficient, albeit simplified, approach to financial stress testing using factor models and SBCNs (with pairwise causality represented as edges in a graphical model) – originally introduced by Mishra and his colleagues as a simplification; see kleinberg2009temporal ; caravagna2015algorithmic ; bonchi2015exposing . More general, and computationally more expensive (though tractable), approaches using PCTL will be explored in future research.
Definition 2 (Probabilistic Computation Tree Logic, PCTL ciesinski2004probabilistic ).
The types of formulas that can be expressed in PCTL are path formulas and state formulas. State formulas express properties that must hold within a state, determined by how it is labeled with certain atomic propositions, while path formulas refer to sequences of states along which a formula must hold.
All atomic propositions are state formulas.
If $\varphi$ and $\psi$ are state formulas, so are $\neg \varphi$ and $\varphi \wedge \psi$.
If $\varphi$ and $\psi$ are state formulas, and $t$ is a nonnegative integer or $\infty$, then $\varphi \, U^{\leq t} \, \psi$ is a path formula.
If $\varphi$ is a path formula and $0 \leq p \leq 1$, then $[\varphi]_{\geq p}$ is a state formula.
The syntax and the logic build on standard propositional Boolean logic, but extend it with various modalities: the key operator is the metric “until” operator. Here, $\varphi \, U^{\leq t} \, \psi$ means that $\varphi$ must hold at every state along the path until a state where $\psi$ becomes true, which must happen within at most $t$ time units. Finally, we can add probabilities to these “until”-like path formulas to make state formulas. Path quantifiers analogous to those in CTL may be defined: $A$ [Inevitably], $E$ [Possibly], $G$ [Globally] and $F$ [Eventually]. Formal semantics of the PCTL formulæ may be found in hansson1994logic .
One can then say that event $c$ “probabilistically causes” event $e$ iff, whenever $c$ holds, $e$ follows within $t$ time units with probability at least $p$, for suitable hyper-parameters $p$ (probability) and $t$ (duration). Additional criteria (e.g., regularization) are then needed to separate spurious causality from the genuine one – as shown below. SBCNs thus provide a vastly simplified, and yet practical, approach to causality, especially when explicit time is not recorded in the data.
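The probability attached to such a bounded “until” property can be estimated by sampling trajectories of a DTMC. A sketch on a hypothetical two-state market-regime chain, where both state labels are illustrative assumptions:

```python
import numpy as np

# Hypothetical two-state market-regime chain: 0 = "calm", 1 = "stressed".
P = np.array([[0.95, 0.05],
              [0.30, 0.70]])
phi = {0, 1}  # states labeled "market open" (here: all states)
psi = {1}     # states labeled "stressed"

def p_until(P, start, phi, psi, t, n=10000, seed=3):
    """Monte Carlo estimate of the probability that 'phi U^{<=t} psi'
    holds along a random path from `start`."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n):
        s = start
        for _ in range(t + 1):
            if s in psi:          # psi reached in time: formula holds
                hits += 1
                break
            if s not in phi:      # phi violated before psi: formula fails
                break
            s = int(rng.choice(len(P), p=P[s]))
    return hits / n

# Since phi holds everywhere, this is the chance of entering the stressed
# regime within 5 steps; analytically 1 - 0.95**5, about 0.226.
print(p_until(P, start=0, phi=phi, psi=psi, t=5))
```

Comparing such an estimate against the threshold $p$ is exactly the check performed when deciding whether a candidate cause satisfies the PCTL formulation above.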
The notion of prima facie causality was fruitfully exploited for the task of modeling cancer evolution in loohuis2014inferring ; ramazzotti2015capri ; caravagna2015algorithmic , and the SBCNs were finally described for the first time in bonchi2015exposing , but many of the basic ideas are already implicit in caravagna2015algorithmic .
Definition 3 (Suppes-Bayes Causal Network).
Let us consider an input cross-sectional dataset $D$ over $n$ Bernoulli variables and $m$ samples; the Suppes-Bayes Causal Network subsumed by $D$ is a directed acyclic graph $G = (V, E)$ such that the following requirements hold:
[Suppes' constraints] each arc $(u \to v) \in E$ involves a prima facie relation between the nodes $u, v \in V$, i.e., under the mild assumptions that $0 < P(u), P(v) < 1$:

$$ t_u < t_v \quad \text{and} \quad P(v \mid u) > P(v \mid \overline{u}). $$

[Sparsification] let $E'$ be the set of arcs satisfying the Suppes' constraints as above; among all the subsets of $E'$, the set of arcs $E$ is the one whose corresponding graph maximizes the likelihood of the data together with a certain regularization function $R$:

$$ E = \arg\max_{E^* \subseteq E'} \; \big( \mathit{LL}(D \mid G^*) + R(G^*) \big), \qquad G^* = (V, E^*). $$
Intuitively, the advantage of SBCNs over general Bayesian networks is the following. First, with temporal priority, an SBCN accommodates the flow of time among the nodes. There are obvious cases where some nodes occur before others, and it is generally natural to state that nodes that happen later cannot be causes (i.e., parents) of nodes that happen earlier. Second, when learning general Bayesian networks, the arcs $a \to b$ and $b \to a$ may sometimes be equally acceptable, resulting in an undirected arc (a situation called Markov equivalence koller2009probabilistic ). For SBCNs, such a situation does not arise, because the temporal flow is irreversible. Third, because of the two constraints on the causal links, the SBCN graph is generally sparser (has fewer edges) than the graph of a general Bayesian network, with the final goal of disentangling spurious arcs, e.g., those due to spurious correlations pearson1896mathematical , from genuine causality.
3.3 Machine Learning and Classification
Even if SBCNs typically yield sparser DAGs than general Bayesian networks, the relations modeled involve both positive and negative financial scenarios, but only in the latter may financial stress arise. Thus, the extreme events which are of key relevance for stress testing are still rare in the data and unlikely to appear in stress scenarios naively generated by sampling from the SBCN directly. Therefore, in this work we improve this basic model with a key idea from classic machine learning, namely, classification. Recall that, in stress testing, we wish to target the unlikely, but risky, scenarios. Specifically, when generating random samples from an SBCN to obtain possible scenarios, each node in the SBCN can take any value in its support according to its conditional probability table, generating different branches of scenarios. To narrow down the search space, we can classify each possible branch as leading to profitable or lossy scenarios and, if the branch is classified as profitable, the random sampling is guided to very likely avoid that branch, thus focusing on events and causal relations that can be adversarial and risky, though uncommon. In this way, computation can be reduced significantly to discover the extreme events (see the next Sections for details).
3.4 An Efficient Implementation
The algorithm below, Algorithm 1, encapsulates the earlier discussions.
Algorithm 1 summarizes the inference approach adopted via SBCN. Given the above inputs, Suppes' constraints are verified (Lines -) to first construct a DAG. Then, the likelihood fit is performed by a greedy hill-climbing search (Lines -), an iterative optimization technique that starts with an arbitrary solution to the problem (in our case an empty graph) and then attempts to find a better solution by incrementally visiting the neighborhood of the current one. If a new candidate solution is better than the previous one, it replaces it. The procedure is repeated until the stopping criterion is met.
In our implementation, the Boolean variable StoppingCriterion is satisfied (Line ) in two situations: the procedure stops either when a sufficiently large number of iterations has been performed, or when none of the solutions in the neighborhood is better than the current one, where the neighborhood denotes all the solutions that are derivable from the current one by removing or adding at most one edge.
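The search loop just described can be sketched as a greedy neighborhood search over arc sets. The sketch below is a minimal illustration, not the paper's implementation: `allowed_arcs` stands in for the Suppes-filtered candidate arcs and `score` for the regularized likelihood, both supplied by the caller as placeholders.

```python
def is_acyclic(nodes, arcs):
    """Kahn's algorithm: True iff the arc set forms a DAG."""
    indeg = {v: 0 for v in nodes}
    for _, v in arcs:
        indeg[v] += 1
    queue = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for a, b in arcs:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return seen == len(nodes)

def hill_climb(nodes, allowed_arcs, score, max_iter=100):
    """Start from the empty graph; add or remove one allowed arc at a time,
    keeping the move only if the score improves (cf. Algorithm 1)."""
    current = frozenset()
    for _ in range(max_iter):
        best, best_score = current, score(current)
        for arc in allowed_arcs:
            cand = current - {arc} if arc in current else current | {arc}
            if not is_acyclic(nodes, cand):
                continue
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
        if best == current:  # no neighbour improves: stopping criterion met
            break
        current = best
    return current
```

With a score that rewards a known target arc set, the search recovers exactly that set, which mirrors how the regularized likelihood steers Algorithm 1 toward the embedded causal structure.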
3.5 Our Contribution
In this section we have shown how we integrated our earlier works on causality theory to produce an efficient implementation of a financial stress testing framework. However, since the implementation involves several hyper-parameters and different methods for regularization, the final embodiment requires additional empirical studies, which we describe next. For this purpose, we tested and optimized it rigorously with a carefully selected synthetic financial model.
Next we describe our extensive comparative studies aimed at evaluating the statistical power of the frameworks that encompass the approach of Rebonato (BNs) and the ones proposed here (SBCNs) to perform stress testing. The other, manual (expert-driven) approaches are outside the scope of these comparative studies for obvious reasons.
Thus, the primary engines for stress testing are built with the generative models, which for our purposes are chosen to be of one of two kinds: Bayes Net (BN) or Causal Net (SBCN), but are expected to behave differently based on the method of model regularization: BIC (Bayesian Information Criterion) or AIC (Akaike Information Criterion), further constrained with or without bootstrapping. Thus constructed, the resulting stress testing algorithms may be investigated for performance, while paying specific attention to the problem of false discoveries (positive or negative). These results are succinctly visualized using ROC (Receiver Operating Characteristic) curves. The data used for the analysis are simulated, as explained later.
We summarize in Figure 2 the results of this analysis by interpolating and then smoothing out some kinks (because of data sparsity, the interpolation does not always lead to smooth monotonic curves) in order to obtain an ROC space, whose $x$ axis represents the False Positive Rate and whose $y$ axis the True Positive Rate. The ROC space depicts the performance of the different methods we discussed on different sample sizes. By examining the plot, one can conclude that AIC generally yields high true positive rates but also high false positive rates, as a result of its less stringent complexity penalty. In contrast, BIC generally yields smaller false positive rates, but its true positive rates are also lower. Comparing the algorithms with and without bootstrapping, one can notice that the bootstrap procedure shifts the curves to the left. Still, the best performance is obtained on the data generated under the assumption of sparse relationships. Based on these results, we can conclude that with bootstrapping and the assumption of sparse relationships, our algorithm is capable of accurately recovering the causal relationships in the data.
In order to provide further insights into these results, we describe in greater detail: our simulation model that allows us to test the inferred results against the ground truth, the false-discovery analysis, the influence of the information criteria (AIC and BIC), and the influence of bootstrapping. Finally, we describe the effect of machine learning on trajectory generation and projection from the SBCN.
4.1 Training Data: Simulation and Evaluation with SBCN
To assess the performance of the algorithm that infers the SBCNs and the quality of the inferred Bayesian networks, a set of training data is developed with embedded causal relationships (the vacuous case of "no causality" was not explored, as it is not meaningful in the context of SBCN; this case was relegated to more general model checking approaches based on PCTL). If the algorithms, after 'learning' a model from the training data, are capable of accurately recovering the causal relationships embedded in them, then comparable accuracy is to be expected on real data. To simulate the training data, we adopt the stock factor model introduced earlier, the Fama-French five-factor model fama1996multifactor .
To simulate the training data with embedded causal relationships, we linearly regress historical returns onto the five factors, and obtain the distribution of each factor coefficient and the empirical residual. We notice that a key characterization of an SBCN is the underlying temporal model of the causal relata implicit in the network, namely the temporal priority between any pair of factors (represented by nodes of the SBCN) which are involved in a causal relationship. Therefore, the five factors described in our generative model are lagged with respect to the historical returns to comply with the temporal priority. Thus,
$r_t = \alpha + \sum_{i=1}^{5} \beta_i f_{i,t-1} + \epsilon_t,$
where $f_{i,t-1}$ is the $i$th factor's value at time $t-1$, properly "lagged."
Then, the simple training data is simulated by randomly drawing the factor coefficients and residuals from the distributions we obtained from the linear regression, and applying these coefficients and residuals to a set of new factor data. The historical data consist of daily series of the five factors and of the returns of portfolios also constructed by Fama and French. We use the first part of the series for regression and the rest for simulation.
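As an illustration of the lagged-factor generative model, the sketch below simulates returns as $r_t = \alpha + \sum_i \beta_i f_{i,t-1} + \epsilon_t$. The factor distribution (i.i.d. standard normal), the coefficient values, and the noise level are illustrative assumptions, not the estimates from the Fama-French regression.

```python
import random

def simulate_returns(n_days, betas, alpha=0.0, sigma=0.01, seed=7):
    """Generate lagged-factor returns: r_t = alpha + sum_i beta_i * f_{i,t-1} + eps_t.
    Factors are drawn i.i.d. standard normal; residuals are N(0, sigma^2)."""
    rng = random.Random(seed)
    k = len(betas)
    factors = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n_days)]
    returns = []
    for t in range(1, n_days):  # start at t = 1: factors enter with lag 1
        lagged = factors[t - 1]
        r = alpha + sum(b * f for b, f in zip(betas, lagged)) + rng.gauss(0, sigma)
        returns.append(r)
    return factors, returns
```

The one-day lag enforces the temporal priority required by the SBCN: a factor value can only influence the next day's return, never the same day's.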
In reality, many factors will present causal relationships among themselves. For example, some factors do not directly influence the asset, but affect it indirectly through their impact on other factors. Therefore, the simulated training data can be complicated by embedding causal relationships also among the factors. We linearly regress some factors on the others and simulate the training data in the same way. The choice of factors is arbitrary; in this paper, as an example, we regress the other four factors on the market factor.
Therefore, the causal relationships which are described in the simulated training data can be simplified as shown in Figure 3.
Next we show results on independent random simulations generated on networks of nodes representing stocks and factors, with the generative model discussed in the previous Section. Each node represents a Bernoulli random variable taking binary values in $\{0, 1\}$, where $1$ represents the stock or factor going up and $0$ the stock or factor going down. Specifically, the input of our learning task is a dataset $D$, a binary matrix. Starting with such an input, we experimented with our learning algorithms previously described in ramazzotti2015capri and bonchi2015exposing . In particular, as in bonchi2015exposing , we lacked explicitly observed time in the data, which are only cross-sectional. To overcome this problem, we gave as a further input to our algorithm a topological ranking providing information about the temporal priority among the nodes. In interpreting these experiments, we set this ranking as a proxy of time precedence among the factors influencing the stocks, i.e., in our model factors can cause stock moves but not the other way around. This strategy results in the removal of implausible spurious arcs going from stocks to factors, but without affecting any genuine constraints on the arcs among factors or among stocks.
4.1.1 The Problem of False Discovery
We first tested the performance of Algorithm 1 on training data consisting of portfolios, factors, and observations. In such a setting, the algorithm was capable of recovering almost the whole set of embedded causal relationships with only a few false negatives; however, the number of false positives was unacceptably large, reaching a substantial share of the total causal arcs obtained, thus requiring more attention to how the model was regularized.
The explanation for this trend can be found in how the algorithm implements the regularization via the Bayesian Information Criterion (BIC) schwarz1978estimating , that is:
$\mathrm{BIC} = -2 \ln \hat{L} + k \ln m,$
where $k$ is the number of arcs in the SBCN (i.e., the number of causal relationships), $m$ is the number of observations in the data, and $\hat{L}$ is the likelihood. The algorithm searches for the Bayesian network that minimizes the BIC.
For a large number of observations, the maximum likelihood optimization ensures that asymptotically all the embedded relationships are explored and the most likely solution is recovered. However, maximum likelihood is known to be susceptible to over-fitting koller2009probabilistic , especially when, as in our case, it deals with a small sample size in the training data. Furthermore, in the training data, all the portfolios are assumed to depend on the same five factors, although with different coefficients; very likely, some portfolios will have very similar coefficients, resulting in co-movements across the portfolios. Such co-movements will often induce correlations that affect the probability raising and thus produce spurious prima facie causal relations, making these settings an interesting, and yet very hard, test case. See Figure 4.
4.1.2 Sample Size and Information Criterion
To reduce the spurious causalities, we recall some intrinsic properties of the information criteria. The Bayesian Information Criterion not only maximizes the likelihood, but also penalizes the complexity of the model through the term $k \ln m$. For small sample sizes, BIC is generally biased towards simple models because of this penalty. However, for large sample sizes, BIC is willing to accept more complex models. For additional discussion, see the details in koller2009probabilistic .
In our simulations we adopted a sample size that is considerably large relative to the degrees of freedom of the score function, thus inducing BIC to infer a relatively complex model with a number of unnecessary spurious arcs. Counter-intuitively, we could improve the solutions by using smaller-sized data and letting the complexity penalty take a bigger effect in the BIC score. This strategy also addresses the non-stationarity in the data, an endemic problem for financial data. Following this intuition, we performed further experiments by reducing the original sample size, which describes several years of daily data, to smaller ones, and we observed a significant reduction in the number of false positives. However, at the same time, because of the smaller sample sizes, the number of false negatives inevitably increased.
To reconcile this dilemma, we next considered an alternative information criterion, the Akaike Information Criterion (AIC) akaike1998information , defined as follows:
$\mathrm{AIC} = -2 \ln \hat{L} + 2k.$
We notice that for AIC the coefficient of $k$ is set to $2$, a definitely smaller factor than the $\ln m$ of BIC when the sample size is large. For this reason, AIC supports the trend of accepting more complex models for given sample sizes than BIC. Applying AIC, the number of false negatives typically decreases, while the number of false positives gets larger.
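For concreteness, the two criteria can be computed as follows (a minimal sketch; `loglik` denotes the maximized log-likelihood $\ln \hat{L}$ and `n_arcs` the number of arcs $k$):

```python
import math

def bic(loglik, n_arcs, n_obs):
    # BIC = -2 ln L + k ln m : the per-arc penalty grows with the sample size
    return -2.0 * loglik + n_arcs * math.log(n_obs)

def aic(loglik, n_arcs):
    # AIC = -2 ln L + 2 k : a constant per-arc penalty, independent of m
    return -2.0 * loglik + 2.0 * n_arcs
```

Note the crossover: for fewer than $e^2 \approx 7.4$ observations AIC penalizes an extra arc more than BIC does, while for larger samples BIC becomes the stricter criterion, matching the trends described above.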
4.1.3 Improving Model Selection by Bootstrapping
So far we have described the different characteristics of two state-of-the-art likelihood scores while aiming to minimize the number of resulting false positive and false negative arcs in the inferred model. Specifically, we showed a trade-off where, because of their characteristics, the best results on large sample sizes are obtained using BIC, while for small sample sizes AIC is more effective; yet neither of the two regularization schemes displays a satisfactory trend. To improve their performance, we then examined a bootstrap efron1981nonparametric procedure for model selection.
The idea of the bootstrap is the following: we first learn the structure and parameters of the SBCN as before, but we subsequently perform a re-sampling procedure where we sample with replacement from the dataset in order to generate a set of bootstrapped datasets. We then calculate the relative confidence level of each arc in the originally inferred SBCN by performing the inference on each of the bootstrapped datasets and counting how many times a given arc is retrieved. In this way, we obtain a confidence level for every arc in the SBCN.
We once again tested such an approach on our simulations, and we observed empirically that the confidence levels of spurious arcs are typically smaller than the confidence levels of true causal relations. Therefore, a simple method of pruning the inferred SBCN by requiring a given minimum confidence level is applied here. Such a threshold reflects the number of false positives that we are willing to include in the model, with higher thresholds ensuring sparser models. Here, we test our approach by requiring a minimum confidence level of $0.5$, i.e., any valid arc must be retrieved at least half of the time.
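The bootstrap procedure can be sketched as follows; `infer_arcs` is a placeholder for any structural inference routine (e.g., Algorithm 1), not the paper's exact code.

```python
import random
from collections import Counter

def bootstrap_arc_confidence(dataset, infer_arcs, n_boot=50, seed=11):
    """Re-sample the dataset with replacement n_boot times, re-run the
    structural inference on each replicate, and return the fraction of
    replicates in which each arc of the original model is retrieved."""
    rng = random.Random(seed)
    original = infer_arcs(dataset)
    counts = Counter()
    n = len(dataset)
    for _ in range(n_boot):
        resample = [dataset[rng.randrange(n)] for _ in range(n)]
        counts.update(set(infer_arcs(resample)) & set(original))
    return {arc: counts[arc] / n_boot for arc in original}

def prune(confidence, threshold=0.5):
    """Keep only arcs retrieved in at least `threshold` of the replicates."""
    return {arc for arc, c in confidence.items() if c >= threshold}
```

A genuine arc, supported by most resamples, survives the threshold; a spurious arc that appears only under a particular draw of the data tends to be pruned away.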
We report the contingency tables resulting from our experiments, both for Algorithm 1 (Table 3) and for the standard likelihood-fit method to infer Bayesian networks (Table 2).
Table 3 presents the results in terms of false positives (FP) and false negatives (FN) obtained by Algorithm 1 with the various methods on the training data, for different information criteria, sample sizes, and with or without bootstrapping. The trade-off between false positive rates and false negative rates is usually case-specific. We observe that, in general, the objective of such an approach is to correctly and precisely recover the true distribution underlying the training data. For this reason, unless otherwise specified for specific uses, there is no overall preference toward either lower false positives or lower false negatives. Therefore, we evaluate our methods by considering the sum of the false positive and false negative rates; this metric favors a combination of relatively low FP and FN over, say, a combination of very low FP and high FN. By analyzing the results shown in Table 3, we can clearly observe a trend where AIC with bootstrapping on small-sample datasets and BIC with bootstrapping on large-sample datasets produce the best results, in agreement with the discussion of the previous Section. Also, we observe that both for AIC without any bootstrapping on small sample sizes and for BIC without any bootstrapping on large sample sizes, the false positive rates are reduced without a significant increase in the false negative rates.
4.1.4 Assumption of Sparse Relationships
The resulting false positive rate may still seem relatively high, but one important assumption is worth mentioning. In the training data, the high false positive rate derives from the fact that all portfolios depend on the same common factors, which very likely induces co-movements. In real data, however, such nested dependencies do not always occur; sparse relationships appear frequently, with portfolios depending on distinctively small sets of factors. This assumption of sparsity can significantly improve the performance of the algorithm.
To implement the assumption of sparsity, we deviate from the original Fama-French five-factor model. For simplicity, we generate data with sparse relationships using a random linear model with factor variables and stock variables: with a small probability, each stock variable depends linearly on any given factor variable, so, on average, each stock variable depends on only a few factor variables, which will likely be distinct from the dependent factor variables of the other stock variables. Then we sample the factor variable data from a normal distribution and compute the corresponding stock data using the linear model.
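A minimal sketch of such a sparse generative model is given below; the sizes, dependency probability, coefficient range, and noise level are illustrative assumptions, not the values used in our experiments.

```python
import random

def sparse_training_data(n_factors=10, n_stocks=20, p=0.2, n_obs=200, seed=3):
    """Each stock depends linearly on any given factor with probability p,
    so a stock has p * n_factors parent factors on average (a sparse model)."""
    rng = random.Random(seed)
    # randomly chosen, typically small, parent sets per stock
    parents = {s: [f for f in range(n_factors) if rng.random() < p]
               for s in range(n_stocks)}
    coeffs = {s: {f: rng.uniform(0.5, 1.5) for f in parents[s]}
              for s in range(n_stocks)}
    data = []
    for _ in range(n_obs):
        factors = [rng.gauss(0, 1) for _ in range(n_factors)]
        stocks = [sum(coeffs[s][f] * factors[f] for f in parents[s]) + rng.gauss(0, 0.1)
                  for s in range(n_stocks)]
        data.append((factors, stocks))
    return parents, data
```

The returned `parents` map is the ground-truth causal structure against which the inferred SBCN arcs can be scored for false positives and negatives.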
Implementing this sparsity on a new set of purely random training data, we obtain much better results with Algorithm 1: e.g., following the BIC-with-bootstrapping method mentioned above, we obtain low false positive and false negative rates on both small and large sample sizes.
4.2 Practical Stress Testing
In this Section we present how to assess stress testing scenarios given the inferred Suppes-Bayes Causal Network and we present the results on the simulated data.
4.2.1 Risk Management by Simulations
After the inference of the SBCN, we could perform Monte Carlo simulation in the same way as conventional risk management, by drawing a large number of samples to discover the worst scenarios, as with value at risk (VaR). Nevertheless, here in stress testing we are targeting the most extreme events, which have very low but nonzero probability of occurrence. Thus, they still can occur; consider, for example, the financial crisis or the more recent market reactions to Brexit. Therefore, when drawing samples from the network, we would like to reject the normal scenarios and place more importance on the extreme events. To achieve this goal, when conducting random sampling, we classify each possible branch as profitable or risky, and if the branch is classified as profitable, we avoid that branch.
Figure 5 represents a simple binary classification where, for this factor, only Factor.i with value $0$ is considered risky and, hence, this scenario is the only one to be sampled. In this way, we target the extremely risky events and reduce computation. But, unlike conventional risk management, this approach does not allow us to estimate the probability of occurrence of the sampled extreme events; therefore we cannot conclude a value at risk at a given confidence level.
Binary classification over a set of features is a standard machine learning problem. Here we explore a simple solution to this task based on decision trees safavian1990survey . A decision tree is a predictive model that maps the features of an object to a target value. Here, the features are the factors of interest, and the target value is whether the portfolio is prone to profit or loss. To perform classification, we first draw sample trajectories from the inferred SBCN. Then we construct a simple portfolio, which is long on all the stocks in the SBCN by the same amount, and calculate the Profit and Loss (P/L) of each observation. However, because the underlying SBCN depicts binary variables, exact P/L statistics cannot be obtained. Instead, since the toy portfolio is long on all stocks by the same amount, the ratio of stocks that go up is an approximate measure of risk. Of course, for a continuous Bayesian network, P/L can be calculated directly. In the next step, we sort by this measure and denote the bottom-most scenarios as risky and the rest as profitable; the 'risky' scenarios contain the largest numbers of falling stocks. We then consider the samples, each labeled as 'risky' or 'profitable.' In our experiments, we used the R 'tree' package tree_r_package .
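The labeling step can be sketched as follows, together with a one-split stand-in for the full tree learner (the R 'tree' package fits a complete decision tree; the stump below is only a simplified Python illustration of the same classification idea).

```python
def label_scenarios(samples, quantile=0.2):
    """samples: list of (factor_vector, stock_vector) with binary entries.
    The risk proxy is the fraction of stocks that go up (value 1); the
    bottom `quantile` of scenarios is labeled 'risky', the rest 'profitable'."""
    scored = sorted(samples, key=lambda s: sum(s[1]) / len(s[1]))
    cut = int(len(scored) * quantile)
    return ([(f, 'risky') for f, _ in scored[:cut]] +
            [(f, 'profitable') for f, _ in scored[cut:]])

def best_stump(labeled):
    """One-level decision tree: pick the factor whose value best separates
    'risky' from 'profitable' scenarios (a stand-in for a full tree learner)."""
    n_factors = len(labeled[0][0])
    best_f, best_acc = 0, -1.0
    for f in range(n_factors):
        # predict 'risky' when factor f is 0 (down), 'profitable' when 1
        acc = sum((lab == 'risky') == (feat[f] == 0)
                  for feat, lab in labeled) / len(labeled)
        if acc > best_acc:
            best_f, best_acc = f, acc
    return best_f, best_acc
```

When the stocks co-move with one dominant factor, the stump recovers that factor as the split variable, just as the fitted tree singles out the factors whose downward moves mark the 'Risky' paths.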
Using the SBCN learned from the simulated training data, we obtain the decision tree shown in Figure 6.
In the decision tree of Figure 6, the node labels denote the five factors of the model, including the market factor. Here we show only the left part of the entire computed decision tree; the right subtree is omitted, since it is entirely classified as 'Profitable,' which is not of interest for stress testing. In the tree, we identify two paths that are classified as 'Risky.' These paths are intuitive, since our example portfolio is long with an equal amount invested in every stock: as stocks are generally positively dependent on the factors, most factors taking value $0$ will likely induce a 'Risky' path. For more complicated portfolios and real factors, such intuition cannot be easily found, so we have to rely on the result of the classification.
4.2.2 Scenario Generation and Results
Given the tree of Figure 6, we then used the bnlearn R package scutari2009learning to sample from the SBCN. Given the network, we can simulate random scenarios; however, we do not wish to simulate all of them, which would prove inefficient. Instead, following the information provided by the classification tree, we choose the configurations which are likely to indicate risk to drive our sampling. For instance, we may pick the first path in the tree, which sets all five factors to value $0$, and constrain the distribution induced by the SBCN accordingly. In order to avoid sampling scenarios which are not in accordance with the path, we adjust the conditional probability table of the SBCN: since we want paths with all five factors taking value $0$, we set the conditional probability of these five factors taking value $0$ to $1$, and the conditional probability of their taking value $1$ to $0$. In this way, the undesirable paths will be unlikely to be simulated, while the intrinsic distribution of how factors affect the stocks is still modeled. More sophisticated implementations based on this intuition are possible, e.g., using branch-and-bound, policy valuation, tree search, etc., but we leave this to future research.
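The conditional-probability-table adjustment can be sketched as ancestral sampling with clamped factor nodes; the network encoding below is a simplified stand-in for the bnlearn machinery, with a toy two-node net rather than the learned SBCN.

```python
import random

def sample_scenario(nodes, parents, cpt, clamp, rng):
    """Ancestral sampling over a binary Bayesian network whose nodes are
    given in topological order. `cpt[node]` maps a tuple of parent values
    to P(node = 1); nodes in `clamp` are forced to the given value (all
    probability mass moved onto it), mimicking the conditional probability
    table adjustment described in the text."""
    values = {}
    for node in nodes:  # nodes assumed topologically sorted
        if node in clamp:
            values[node] = clamp[node]
            continue
        key = tuple(values[p] for p in parents.get(node, ()))
        values[node] = 1 if rng.random() < cpt[node][key] else 0
    return values
```

Clamping the factor nodes to $0$ suppresses the non-stressed branches, while the stock nodes are still drawn from their original conditional distributions, so the factor-to-stock mechanism is preserved.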
Comparing the results of the simulations using the original SBCN and the one taking the decision tree into account, we show in Figure 7 the distribution of the risk measure, i.e., the number of stocks that go up.
The number of stocks going up in the samples generated by the original SBCN is roughly evenly distributed, while the samples generated by the modified SBCN contain no scenarios with more than a few stocks going up, and most of the samples have at most one stock going up. We can clearly see that the modified SBCN places far more importance on the stressed scenarios, which in turn confirms the result of the classification by the decision tree. In this way, the computational complexity involved in generating stressed scenarios can be improved tremendously. Such computational efficiency issues become even more critical when we move from simple Bernoulli random variables to multi-categorical or continuous random variables. Therefore, with the same computing power, the modified SBCN makes it possible to generate more stressed scenarios and observe how portfolios or other assets respond to stressed factors.
In summary, in this paper we develop a novel framework to perform stress testing combining Suppes-Bayes Causal Networks and machine learning classification. We learn SBCNs from data using Algorithm 1 and assess the quality of the learned model by switching information criteria based on sample size and by bootstrapping. We then simulate stress scenarios using SBCNs, but reduce computation by classifying each branch of nodes in the network as 'profitable' or 'risky' using classification trees. For simplicity, the paper implements SBCNs with Bernoulli variables and simulates data using the Fama-French five-factor model, but the logic of the approach is easily extended to more practical situations. First of all, the SBCNs can accommodate more complicated variables (nodes). In addition to the factor-based portfolios considered here, other factor models, or directly other financial and economic factors like foreign exchange rates, can also be included, and the accuracy of the model can ensure that the true causal relationships among the factors are discovered. In practice, variables like stock prices are continuous; one can easily extend to these situations by adopting a hybrid SBCN, where the variables can take either discrete or continuous values, making it possible to represent precisely the values of the variables we are interested in.
To use the model, the role of experts is still important. After learning the SBCN from data and applying classification, we can identify a number of stressed scenarios. However, we expect some of these to be unacceptable for various unforeseen reasons, e.g., those known to domain experts. Such scenarios may be highly stressed with respect to the corresponding portfolio but could prove less useful in practice. Therefore, experts can select from the identified stressed scenarios only the plausible ones and discard those deemed to be flawed. Even in this case, we can perform simulations following the selected stressed trajectories in the SBCN, observe the reactions of the portfolios in these stressed scenarios of interest, and thus adjust the portfolios based on the reactions. Another direct usage of our approach is when experts have a particular candidate stress scenario in mind, which can be justified a priori; in this case one can skip the classification process and directly adjust the SBCN mutatis mutandis. Simulations of the adjusted SBCN will then also reveal the reactions of the portfolio to this particular stressed scenario.
We believe, based on our empirical analysis, that we have devised an efficient automated stress testing method using machine learning and causality analysis in order to solve a critical regulatory problem, as demonstrated by the algorithm’s ability to recover the causal relationships in the data, as well as its efficiency, in terms of computation and data usage. We plan to test our algorithms on real data to compare against human experts in a commercial setting, and based on our promising results with the simulated data, we are confident that the resulting platform will find a significant fraction (if not all) of the adversarial scenarios.
- (1) G. Gao, B. Mishra, D. Ramazzotti, Efficient simulation of financial stress testing scenarios with suppes-bayes causal networks, in: International Conference on Computational Science, ICCS 2017, 12-14 June 2017, Zurich, Switzerland, 2017, pp. 272–284.
- (2) A. J. McNeil, R. Frey, P. Embrechts, Quantitative Risk Management: Concepts, Techniques and Tools, Princeton University Press, 2010.
- (3) S. Manganelli, R. F. Engle, Value at risk models in finance, European Central Bank Working Paper Series.
- (4) S. Raychaudhuri, S. J. Mason, R. R. Hill, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler, Introduction to monte carlo simulation, in: 2008 Winter Simulation Conference, 2008.
- (5) J. O’Brien, P. J. Szerszen, An evaluation of bank var measures for market risk during and before the financial crisis, Finance and Economics Discussion Series.
- (6) C. Levin, T. Coburn, Wall Street and the Financial Crisis: Anatomy of a Financial Collapse, United States Senate Permanent Subcommittee on Investigations, 2011.
- (7) S. Claessens, M. A. Kose, Financial crises: Explanations, types and implications, IMF Working Paper Series.
- (8) R. Rebonato, Coherent Stress Testing: a Bayesian approach to the analysis of financial stress, John Wiley & Sons, 2010.
- (9) D. McCuistion, D. Grantham, Causes of the 2008 financial crisis.
- (10) D. Hume, An inquiry concerning human understanding, Vol. 3, 1793.
- (11) J. Pearl, Causality: models, reasoning and inference, Econometric Theory 19 (675-685) (2003) 46.
- (12) M. Quagliariello, Stress-testing the Banking System : Methodologies and Applications, Cambridge University Press, 2009.
- (13) K. Dent, B. Westwood, Stress testing of banks: An introduction, Bank of England Quarterly Bulletin 2016 Q3.
- (14) Committee on the Global Financial System, Stress testing at major financial institutions: survey results and practice (2005).
- (15) E. F. Fama, K. R. French, Multifactor explanations of asset pricing anomalies, The journal of finance 51 (1) (1996) 55–84.
- (16) D. Koller, N. Friedman, Probabilistic graphical models: principles and techniques, MIT press, 2009.
- (17) N. Beerenwinkel, N. Eriksson, B. Sturmfels, Conjunctive bayesian networks, Bernoulli (2007) 893–909.
- (18) L. O. Loohuis, G. Caravagna, A. Graudenzi, D. Ramazzotti, G. Mauri, M. Antoniotti, B. Mishra, Inferring tree causal models of cancer progression with probability raising, PloS one 9 (10) (2014) e108358.
- (19) D. Ramazzotti, G. Caravagna, L. Olde Loohuis, A. Graudenzi, I. Korsunsky, G. Mauri, M. Antoniotti, B. Mishra, Capri: efficient inference of cancer progression models from cross-sectional data, Bioinformatics 31 (18) (2015) 3016–3026.
- (20) D. Ramazzotti, A. Graudenzi, G. Caravagna, M. Antoniotti, Modeling cumulative biological phenomena with suppes-bayes causal networks, arXiv preprint arXiv:1602.07857.
- (21) D. Ramazzotti, M. S. Nobile, M. Antoniotti, A. Graudenzi, Learning the probabilistic structure of cumulative phenomena with suppes-bayes causal networks, arXiv preprint arXiv:1703.03074.
- (22) P. Suppes, A probabilistic theory of causality, North-Holland Publishing Company Amsterdam, 1970.
- (23) G. Caravagna, A. Graudenzi, D. Ramazzotti, R. Sanz-Pamplona, L. De Sano, G. Mauri, V. Moreno, M. Antoniotti, B. Mishra, Algorithmic methods to infer the evolutionary trajectories in cancer progression, Proceedings of the National Academy of Sciences 113 (28) (2016) E4025–E4034.
- (24) A. Markov, N. Nagorny, Algorithm theory, Trudy Mat. Inst. Akad. Nauk SSSR 42 (1954) 1–376.
- (25) S. Kleinberg, B. Mishra, The temporal logic of causal structures, in: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, AUAI Press, 2009, pp. 303–312.
- (26) F. Bonchi, S. Hajian, B. Mishra, D. Ramazzotti, Exposing the probabilistic causal structure of discrimination, International Journal of Data Science and Analytics 3 (1) (2017) 1–21.
- (27) F. Ciesinski, M. Größer, On probabilistic computation tree logic, in: Validation of Stochastic Systems, Springer, 2004, pp. 147–188.
- (28) H. Hansson, B. Jonsson, A logic for reasoning about time and reliability, Formal aspects of computing 6 (5) (1994) 512–535.
- (29) K. Pearson, Mathematical contributions to the theory of evolution.–on a form of spurious correlation which may arise when indices are used in the measurement of organs, Proceedings of the royal society of london 60 (359-367) (1896) 489–498.
- (30) G. Schwarz, et al., Estimating the dimension of a model, The annals of statistics 6 (2) (1978) 461–464.
- (31) H. Akaike, Information theory and an extension of the maximum likelihood principle, in: Selected Papers of Hirotugu Akaike, Springer, 1998, pp. 199–213.
- (32) B. Efron, Nonparametric estimates of standard error: the jackknife, the bootstrap and other methods, Biometrika 68 (3) (1981) 589–599.
- (33) S. R. Safavian, D. Landgrebe, A survey of decision tree classifier methodology.
- (34) B. Ripley, Tree: Classification and Regression Trees, R package version 1.0-37 (2016).
- (35) M. Scutari, Learning bayesian networks with the bnlearn r package, arXiv preprint arXiv:0908.3817.