Machine Learning Algorithms for Financial Asset Price Forecasting

by Philip Ndikum, et al.

This research paper explores the performance of Machine Learning (ML) algorithms and techniques that can be used for financial asset price forecasting. The prediction and forecasting of asset prices and returns remains one of the most challenging and exciting problems for quantitative finance researchers and practitioners alike. The massive increase in data generated and captured in recent years presents an opportunity to leverage Machine Learning algorithms. This study directly compares state-of-the-art implementations of modern Machine Learning algorithms on high performance computing (HPC) infrastructures with the traditional and highly popular Capital Asset Pricing Model (CAPM) on U.S equities data. The implemented Machine Learning models, trained on time series data for an entire stock universe (in addition to exogenous macroeconomic variables), significantly outperform the CAPM on out-of-sample (OOS) test data.




1 Introduction - Motivations

The prediction and forecasting of asset prices in international financial markets remains one of the most challenging and exciting problems for quantitative finance practitioners and academics alike [henrique_bruno_2019, heaton_polson_2016]. Driven by a seismic increase in computing power and data, researchers and investment firms have turned their attention to techniques from Computer Science, namely the promising fields of Data Science, Artificial Intelligence (AI) and Machine Learning (ML). Each day humans generate and capture more than 2.5 quintillion bytes of data. A large share of the data generated in recorded human history was created in the last few years, and the total is projected to reach the scale of zettabytes (trillions of gigabytes) [Dobre_2014, Sivarajah_2016].

This increase in data presents an enormous opportunity to leverage techniques and algorithms developed in the field of Machine Learning, especially its sub-field Deep Learning. Machine and Deep Learning prediction algorithms have been specifically designed to deal with large volume, high dimensionality and unstructured data, making them ideal candidates to solve problems in an enormous number of fields [chen_big_2014, Najafabadi_2015]. Companies around the world have made breakthroughs successfully commercializing Machine Learning R&D (Research and Development), and notable progress has been made in the fields of medicine, computer vision (in autonomous driving systems) and robotics [li2017deep, Obermeyer_2016]. Studies estimate that the annual potential value of Machine Learning applied in banking and the financial services sector is a significant share of global revenues [Buchanan_2019, Mckinsey_2018]. When contrasted with traditional finance models (which are largely linear), Machine Learning algorithms present the promise of accurate prediction utilising new data sources previously not available. Investment professionals often refer to this non-traditional data as “alternative data” [monk_2018]. Examples of alternative data include the following:

  • Satellite imagery to monitor economic activity. Example applications: analysis of spatial car park traffic to aid the forecasting of sales and future cash flows of commercial retailers; classifying the movement of shipment containers and oil spills for commodity price forecasting [Orfanidis_2018]; forecasting real estate prices directly from satellite imagery [bency_spatial_2017].

  • Social-media data streams to forecast equity prices [bollen2011twitter, zhang2011predicting] and potential company acquisitions [Xiang_2012].

  • E-commerce and credit card transaction data [2016_butaru] to forecast retail stock prices [Darroch_2017].

  • ML algorithms for patent analysis to support the prediction of Merger and Acquisitions (M&A) [Wei_2009].

A performant Machine Learning algorithm should be able to capture non-linear relationships from a wide variety of data sources. Heaton and Polson [heaton_polson_2016] state the following about Machine Learning algorithms applied to finance:

“Problems in the financial markets may be different from typical deep learning [footnote 1] applications in many respects…In theory [however] a [deep neural network] can find the relationship for a return, no matter how complex and non-linear…This is a far cry from both the simplistic linear factor models of traditional financial economics and from the relatively crude, ad hoc methods of statistical arbitrage and other quantitative asset management techniques” [heaton_polson_2016, p. 1]

Footnote 1: For the purposes of simplicity this paper will use the terms Deep Learning and Neural Networks interchangeably.

In recent years a greater number of researchers have demonstrated the impressive empirical performance of Machine Learning algorithms for asset price forecasting when compared with models developed in traditional statistics and finance [gu_2018, chen_2019, RePEc:grz:wpaper:2019-06, RePEc:arx:papers:1909.04497, heaton_2016]. Whilst this paper tests results on U.S equities data, the theory and concepts can be applied to any financial asset class - from real estate, fixed-income [Martn2018MachineLM] and commodities to more exotic derivatives such as weather, energy [Cramer_2017] or cryptocurrency derivatives [2018_Alessandretti, Lahmiri_2019]. The ability of an individual or firm to more accurately estimate the expected price of any asset has enormous value to practitioners in the fields of corporate finance, strategy, private equity in addition to those in the fields of trading and investments. In recent years central banks including Federal Reserve Banks in the U.S [Lemieux_2018] and the Bank of England [Andreas_2019, Chakraborty_2017] have also attempted to leverage Machine Learning techniques for policy analysis and macroeconomic decision making. Before we delve into the world of Machine Learning we shall first lay the ground work of traditional financial theories using the highly popular Capital Asset Pricing Model (CAPM) focusing on the theory from a practitioner’s perspective.

2 The Capital Asset Pricing Model (CAPM)

2.1 Introduction

The Capital Asset Pricing Model (CAPM) was independently developed in the 1960s by William Sharpe [Sharpe_1964], John Lintner [Lintner_1965], Jack Treynor [Treynor_1961] and Jan Mossin [Mossin_1966], building on the earlier work of Markowitz and his mean-variance and market portfolio models [Markowitz_1959]. In 1990 Sharpe received a Nobel Prize for his work on the CAPM, and the theory remains one of the most widely taught on MBA (Master of Business Administration) and graduate finance and economics courses [Womack_2001]. The CAPM is a one-factor model that assumes the market risk or beta is the only relevant metric required to determine a theoretical expected rate of return for any asset or project.

2.2 Assumptions and Model

The CAPM holds the following main assumptions:

  1. One-period investment model: All investors invest over the same one-period time horizon.

  2. Risk averse investors: This assumption was initially developed by Markowitz and asserts that all investors are rational and risk averse actors in the sense that when choosing between financial portfolios investors aim to optimize the following:

    1. Minimize the variance of the portfolio returns.

    2. Maximize the expected returns given the variance.

  3. Zero transaction costs: There are no taxes or transactional costs.

  4. Homogeneous information: All investors have homogeneous views and information regarding the probability distributions of all security returns.

  5. Risk free rate of interest: All investors can lend and borrow at a specified risk free rate of interest $r_f$.

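The CAPM provides an equilibrium relationship [Mossin_1966, Nielsen_1990] between investment $i$'s expected return $E[R_i]$ and its market beta $\beta_i$ and is mathematically defined as follows:

$$E[R_i] = r_f + \beta_i \left( E[R_m] - r_f \right) \qquad (1)$$

If the subscript $m$ denotes the market portfolio then we have:

$$\beta_i = \frac{\mathrm{Cov}(R_i, R_m)}{\mathrm{Var}(R_m)}$$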

A survey [Graham_2001] of international companies demonstrated that the CAPM is one of the most popular models used by company CFOs (Chief Financial Officers) to calculate the cost of equity. The cost of equity is one of the inputs for the classical Weighted Average Cost of Capital (WACC) of a firm. The pre-tax WACC of a levered firm [Farber_2005] is given by

$$\mathrm{WACC} = r_E \frac{E}{V} + r_D \frac{D}{V} \qquad (2)$$

where $r_E$ and $r_D$ are the cost of equity and cost of debt respectively, and $D$, $E$ and $V$ are the respective values of debt, equity and the net value of a respective firm or project. The WACC has a broad array of applications and is often used by practitioners as the discount rate in present value calculations that estimate the value of firms in M&A [Arzac_2004], individual company projects, as well as options [Arnold_2004] based on forecasted future cash flows. Let us take the example of valuing a company using Discounted Cash Flow (DCF) analysis. The discrete form of the DCF model to value a company can be defined as follows [footnote 2]:

$$\mathrm{DCF} = \sum_{t=1}^{n} \frac{CF_t}{(1 + r)^t} \qquad (3)$$

where $CF_t$ denotes the forecasted future cash flow at year $t$ and $r$ denotes the discount rate, which we can let be equivalent to the WACC (let $r = \mathrm{WACC}$). When using DCF models the computed cost of equity from the CAPM and the subsequent WACC calculation will therefore have a significant impact on the estimated company value.

Footnote 2: The discrete DCF model can be found in most popular graduate finance textbooks such as [1990_Copeland, berk2017corporate].
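The chain from CAPM cost of equity to WACC to a DCF valuation can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation: the function names are hypothetical, the inputs are made-up numbers, and the sketch assumes $V = E + D$ and cash flows received at the end of years $1, \dots, n$.

```python
def capm_expected_return(risk_free, beta, market_return):
    """CAPM: E[R_i] = r_f + beta_i * (E[R_m] - r_f)."""
    return risk_free + beta * (market_return - risk_free)


def pre_tax_wacc(cost_equity, cost_debt, equity_value, debt_value):
    """Pre-tax WACC: r_E * E/V + r_D * D/V, assuming V = E + D."""
    total_value = equity_value + debt_value
    return (cost_equity * equity_value / total_value
            + cost_debt * debt_value / total_value)


def discounted_cash_flow(cash_flows, discount_rate):
    """Present value of cash flows received at the end of years 1..n."""
    return sum(cf / (1.0 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


# Illustrative inputs only (assumptions, not values from the paper).
r_e = capm_expected_return(risk_free=0.02, beta=1.2, market_return=0.08)
wacc = pre_tax_wacc(cost_equity=r_e, cost_debt=0.04,
                    equity_value=600.0, debt_value=400.0)
value = discounted_cash_flow([100.0, 110.0, 120.0], discount_rate=wacc)
print(f"cost of equity={r_e:.4f}, WACC={wacc:.4f}, DCF value={value:.2f}")
```

Because the discount rate sits in the denominator of every term, a change in the estimated beta propagates through the cost of equity and the WACC into the final valuation.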

2.3 Criticism - Empirical Performance

The CAPM does not provide the practitioner with any insight on how to correctly apply the model and thus has led to a broad array of empirical results and corresponding criticisms from researchers such as Fama and French [Fama_2004] in addition to Dempsey [Dempsey_2013], who argue that the model has poor empirical performance and should be abandoned entirely. In relation to technical implementations, many researchers (including those in the early literature) relaxed some of the restrictive assumptions of the CAPM to produce impressive empirical results relative to the model's simplicity [Blume_1970, Brennan_1971, Jagannathan_1994].

In [Brown_2013], Brown and Terry argue that given the broad range of CAPM model implementations, it is invalid to make arguments based on the computational evidence, and they firmly assert that the CAPM or at least one of its variant models will remain in the core finance literature for many years to come. Additionally, Partington [Partington_2013] states the following in response to modern researchers criticizing the empirical results of the CAPM: “Empirical tests of the CAPM probably don’t tell us much one way or another about the validity of the CAPM, but they have revealed quite a lot about correlations between variables in relation to the cross-section of realized returns” [Partington_2013]. Full technical details of model implementation will be explored in subsection 4.2. Unlike the Data Science and Machine Learning algorithms we will explore in the next section, the implementation of the Capital Asset Pricing Model has historically been as much an art form as a science. The ML algorithms that are implemented can theoretically be applied to the same use cases as the CAPM, whether that be in expected returns prediction or corporate valuation.

3 Machine Learning Algorithms

Whilst the terms Machine Learning and Artificial Intelligence are ill-defined in the current literature [Ryll2019EvaluatingTP], we shall use the classic definition provided by [marr1977artificial], which defines Artificial Intelligence as the “isolation of a particular information processing problem, the formulation of a computational theory for it, the construction of an algorithm that implements it, and a practical demonstration that the algorithm is successful”.

In the context of financial asset price forecasting, the information processing problem we are trying to solve is the prediction of an asset price a fixed number of time steps in the future: we are effectively trying to solve a non-linear multivariate time series problem. Our Machine Learning algorithms and techniques should extract patterns learned from historical data in a process known as “training” and subsequently make accurate predictions on new data. The assessment of the accuracy of our algorithms is known as “testing” or “performance evaluation”. Whilst there exist a large number of types and classes of Machine Learning algorithms [Ayodele_2010], a high percentage of the research papers in the current academic literature frame the problem of financial asset price forecasting as a “supervised learning” problem [yoo_2005, krollner_2010, henrique_bruno_2019, Ryll2019EvaluatingTP]. Given the practical and empirical focus of this paper, strict algorithm definitions and mathematical proofs of algorithms will be omitted; instead a simple theoretical framework of supervised learning is provided in the next subsection.

3.1 Supervised Learning - theoretical framework

Definition 3.1 (Supervised Learning).

Given a set of $N$ example input-output pairs of data

$$\mathcal{D} = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$$

we first assume that each $y$ was generated by some unknown function $f(x) = y$ which can be modelled by our supervised learning algorithm (the “learner”). Here $x$ denotes a vector containing the explanatory input variables and $y$ is the corresponding response. If we are doing a batch prediction on multiple input vectors (denoted by the matrix $X$) we can denote our mapping using $f(X) = Y$, where $Y$ collects the corresponding responses. In general we decompose our data into a training set and a test set. In the training phase our algorithm will learn to approximate our function $f$ to produce a prediction; we will denote the approximated function as $\hat{f}$. In Machine Learning we evaluate the performance of an algorithm using an accuracy measure which is usually a function of the error terms in the test set; we will denote this performance metric as $P$. The standard formulations of supervised learning tasks are known as “classification” and “regression”:

  • Classification: The learner is required to estimate the probability of an event or multiple events occurring. Thus we have the mapping $\hat{f}: X \to \hat{Y}$ such that $\hat{y} \in [0, 1]$. Examples: Classifying the probability of an economic recession happening for a target nation or classifying the probability of a target company being merged or acquired (M&A prediction).

  • Regression: The learner is required to predict a real number as the output. Thus we have the mapping $\hat{f}: X \to \hat{Y}$ such that $\hat{y} \in \mathbb{R}$. Examples: Predicting the annual returns of a financial asset a fixed number of time periods in the future (which will be implemented in this paper). Another example is the forecasting of interest rates or yield curves.

Independent of the algorithm or class of algorithms that are selected and implemented, the function $\hat{f}$ is usually estimated using techniques and algorithms from the fields of Applied Mathematics known as Numerical Analysis and Optimization. Bennett and Parrado-Hernandez [bennett_2006, p. 1266] make the important observation that “optimization lies at the heart of machine learning. Most machine learning problems reduce to optimization problems”. In supervised learning we will determine our function $\hat{f}$ by iteratively minimizing a loss function $L$ which measures the error on the parsed training data [ng_2011]. Mathematically, in a general form over the training pairs $(x_i, y_i)$, $i = 1, \dots, N$, we have

$$\hat{f} = \arg\min_{f} \sum_{i=1}^{N} L\big(y_i, f(x_i)\big) \qquad (4)$$

which is an optimization problem. Independent of the algorithm or loss function choice, the ML literature is heavily concerned with the addition of a term to equation 4 known as a regularization term [zaremba_2014, 2004_ng]. This is a mathematical term used to reduce the problem of overfitting [2004_hawkins]. Backtest overfitting in finance refers to models which perform well on historical data but ultimately do not perform well “in the wild” (on new live data streams). An additional benefit of ML techniques for financial asset pricing is that modern researchers are heavily focussed on designing algorithms and techniques that systematically reduce the problem of overfitting to increase the probability of accurate forecasts on new data streams [Salehipour_2016, bailey_2017].
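To make the optimization view concrete, the following is a minimal sketch of iterative loss minimization with a regularization term. It is an illustration under stated assumptions, not one of the paper's models: the learner is a one-variable linear function, the loss is mean squared error, the regularizer is an L2 penalty on the slope, and the function name `fit_ridge_1d` and all numeric values are hypothetical.

```python
def fit_ridge_1d(xs, ys, l2=0.0, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by gradient descent on mean squared error + l2 * w**2."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the regularized loss with respect to w and b.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_w += 2.0 * l2 * w
        grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data lying exactly on y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w_plain, b_plain = fit_ridge_1d(xs, ys, l2=0.0)  # unregularized fit
w_reg, b_reg = fit_ridge_1d(xs, ys, l2=1.0)      # penalized fit shrinks the slope
print(w_plain, b_plain, w_reg, b_reg)
```

The penalized fit trades a slightly worse training error for a smaller slope, which is the mechanism by which regularization discourages a model from fitting noise in the training set.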

[Figure 1: Example of Gradient Boosted Tree architecture. Flow shown in the diagram: company and macroeconomic financial data; sample and feature pre-processing; Tree 1, Tree 2, ..., and a final tree; averaging of the predictions of the ensemble of simple regression trees; asset price or return prediction.]

Gradient boosting tree algorithms are greedy algorithms that first pre-process the data in a process known as sample and feature bagging [Friedman_2001]. In contrast to the CAPM, which is a single model, gradient boosted tree algorithms train an ensemble of weak base learners (simple regression trees). These base learners are aggregated through a function such as an average to produce the final asset price prediction. The final function is estimated through the minimization of the pre-defined loss function described in 3.1.
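The idea of an ensemble of weak regression trees can be sketched with one-split "stumps" in plain Python. This is a simplified illustration and not any of the libraries tested in the paper: it assumes a single input feature and squared-error loss, omits sample and feature bagging, and uses the usual additive form of boosting (each stump is fitted to the current residuals and added with a learning rate) rather than a plain average. All function names and data values are hypothetical.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimising squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # drop the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted_stumps(xs, ys, n_trees=50, learning_rate=0.1):
    """Greedily add stumps, each fitted to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy step-shaped data (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

model = fit_boosted_stumps(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [boosted_predict(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

Each stump alone is a poor predictor, but the sequence of small corrections drives the training error well below that of the constant baseline.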
[Figure 2: Example Feed-Forward Neural Network Architectures. (a) Shallow Feed-Forward Neural Network Architecture: an input layer, a single hidden layer and an output layer producing the prediction. (b) Deep Feed-Forward Neural Network Architecture: an input layer, five hidden layers (hidden layers 1 to 5) and an output layer producing the prediction.]

3.2 Incorporating regulatory and financial constraints into ML algorithms

In financial asset pricing (and quantitative finance more broadly) professionals are not only concerned with accurate forecasts but are also constrained by investor risk appetite, traditional econometric and financial theory (such as the CAPM) and, increasingly, international regulatory concerns in North America and Europe [2017_Cath]. Many in the investment world have been sceptical about Machine Learning due to misconceptions that the algorithms are closed-source black boxes [1997_benitez, wang_2007]. One can argue that this scepticism is warranted: [chiu_2016] notes that the global financial crisis caused regulators to move away from an excessively laissez-faire approach to financial regulation towards an aggressive, forward-looking and proactive approach. Additionally, from a psychological perspective, Dietvorst et al. [Dietvorst_2014] have shown that even though algorithmic forecasts outperform human forecasters across multiple domains, humans have a low tolerance for the errors of machines and algorithms, a phenomenon coined “algorithm aversion”. As new technologies and algorithms develop, regulators have therefore been quick to introduce new policies and laws to ensure adequate protections for society and the general public [Kirilenko_2013]. Regulators from the European Union (E.U) have acted swiftly in their implementation of a large number of laws such as the E.U General Data Protection Regulation (GDPR) and the Markets in Financial Instruments Directive (MiFID) II, which both came into effect in 2018. Sheridan and Iain [Shredian_2017, p. 420] state the following:

“Under Article 17(2) of MiFID II, a competent authority may require the investment firm to provide, on a regular or ad hoc basis, a description of the nature of its algorithmic strategies, details of its trading parameters or limits to which the system is subject" [Shredian_2017, p. 420].

Additionally Recital 71 of the GDPR affords consumers the rights to ask private institutions to explain ML algorithms [goodman_2016, kush_2016]. Much of the recent innovation in the field of ML for asset pricing directly relates to creating more transparent algorithms that can be explained to both regulators and investors [kou_2019]. Table I provides a high level overview of regulatory constraints and recent ML literature that attempts to solve regulatory problems. We implemented modern statistical techniques to provide explainability to our Machine Learning models. As we move into an increasingly regulated world, state-of-the-art financial asset price forecasting performance and adoption will require the collaboration of experts in the legal, financial and scientific communities.

Regulatory constraint: Transparency and Explainability [goodman_2016, kush_2016, kou_2019, Johnson_2019, Citron_2014].
Relevant literature solutions:

  • Finance practitioners and academics have focussed on modifying ML algorithm loss functions to incorporate traditional finance theory to allow for greater transparency. [chen_2019, pelger_2019] for example demonstrate an improvement of algorithm performance when they incorporate no-arbitrage pricing theory constraints from traditional finance [ross_1976]. Additionally [Feng_2017, kelly_2017] systematically utilise modern ML and statistical algorithms to reduce the number of features or “factors” used in asset pricing; this ultimately serves to improve model transparency in the face of high dimensionality and large unstructured data.

  • In the ML literature there has been a drive to develop domain agnostic software packages and tools to explain trained ML models [2018_Adadi]. Most notably, techniques such as LIME [ribeiro_2016] and SHAP [Lundberg_2017] have been implemented in multiple popular programming languages to allow for detailed explanations of any supervised learning model.

Regulatory constraint: Risk Management [Schmaltz_2013, Johnson_2018, Ang_2011, Shredian_2017].
Relevant literature solutions:

  • There has been a broad array of solutions relating to risk management for ML in the finance literature: [Chandrinos_2018, Aziz_2018] explore how we can develop and incorporate supplementary ML models to statistically account for financial risk in our investment and trading systems. Recent papers such as [kou_2019, Coulombe2019HowIM, Berge_2013] explore how ML can be used to directly forecast macroeconomic variables to identify systematic risk and economic recessions. [Alberg_2017] also demonstrates how transparent models can be built by forecasting company fundamentals (cash flow and balance sheet line items) rather than asset prices directly.

  • In the ML literature there has also been a movement away from the development of point forecast models to Bayesian techniques which allow for probabilistic forecasts [gal_2015, Zhu_2017, Duan2019NGBoostNG, Ghahramani_2015]. These Bayesian algorithms inherently allow the end user to account for risk through posterior probability distributions: [Spiegeleer_2018, Ruf_2019] review how these Bayesian techniques have been utilized for option pricing and derivatives hedging strategies.

Table I: Regulatory constraints relevant to ML for financial asset price forecasting

4 Empirical performance on U.S equities

4.1 Machine Learning algorithms

Empirical studies on the best supervised learning algorithms tend to suggest that decision tree and neural network algorithms perform the best across multiple domains and data-sets [caruana_2008, caruana2006empirical]. We will evaluate our algorithms on publicly traded U.S equities data. Researchers evaluating Machine Learning algorithms empirically in the domain of equity price and return forecasting have stressed the importance of both Artificial Neural Network (ANN) architectures and ensembles of decision trees (such as random forests and gradient boosted trees) [dongdog_2019, krollner_2010, chen_2019, kelly_2017]. For this reason algorithm testing was focussed on Python implementations of neural networks and gradient boosted trees. The gradient boosting tree and neural network architectures are illustrated in Figures 1 and 2 respectively. Modern packages such as NGBoost, developed by Duan et al. (2019) [Duan2019NGBoostNG], which allows for probabilistic forecasting, were also implemented and tested.

Each of our Machine Learning algorithms has a unique set of initial parameters known as hyperparameters. These hyperparameters are initial model conditions which determine the performance of our models. The ideal hyperparameters for optimal model performance cannot be known a priori and thus traditionally would require extensive manual tuning and configuration. In a relatively new sub-field of Machine Learning known as Automated Machine Learning (AutoML) [Yao_2018], researchers have worked hard to develop software packages that automate the process of manual hyperparameter optimization using advanced Bayesian and biologically-inspired algorithms [Claesen_2015]. The researcher or practitioner need only define the search space for each of the model parameters and allow the algorithm to run for a number of trials or simulations; the algorithm will then attempt to search for the hyperparameters that produce the best model performance. Since we are attempting to determine whether Machine Learning algorithms outperform the CAPM model, a combination of grid-search and Bayesian hyperparameter optimization was used to determine the best neural network and gradient boosting models. The models were tested using the University of Oxford’s Advanced Research Computing (ARC) multi-GPU (Graphics Processing Unit) clusters. Table II shows a high level overview of the implemented models and their respective optimized hyperparameters. A single model was built for each algorithm to predict the annual returns of our entire stock universe described in the next sub-section. Neural networks were implemented using Keras.


ML algorithm: NGBoost [Duan2019NGBoostNG].
  • Optimization algorithm: Grid search.
  • Optimized hyperparameters: Number of tree estimators (weak base learners).

ML algorithm: XGBoost.
  • Optimization algorithm: HyperOpt implementation of the Tree of Parzen Estimators (TPE) algorithm, run for a fixed number of trials [Bergstra_2013].
  • Optimized hyperparameters: Number of tree estimators, maximum depth for each tree, learning rate, regularization parameters, data sampling parameters.

ML algorithm: CatBoost [Prokhorenkova_2018].
  • Optimization algorithm: HyperOpt implementation of the Tree of Parzen Estimators (TPE) algorithm, run for a fixed number of trials [Bergstra_2013].
  • Optimized hyperparameters: Number of tree estimators, maximum depth for each tree, learning rate, regularization parameters, data sampling parameters.

ML algorithm: LightGBM [XGboost_2016].
  • Optimization algorithm: HyperOpt implementation of the Tree of Parzen Estimators (TPE) algorithm, run for a fixed number of trials [Bergstra_2013].
  • Optimized hyperparameters: Number of tree estimators, maximum depth for each tree, learning rate, regularization parameters, data sampling parameters.

ML algorithm: Shallow Feed-Forward Neural Network (Shallow FNN) [chollet2015keras].
  • Optimization algorithm: HyperOpt implementation of the Tree of Parzen Estimators (TPE) algorithm, run for a fixed number of trials [Bergstra_2013].
  • Optimized hyperparameters: Number of hidden layers and number of nodes per hidden layer (each searched within a bounded range), batch normalization configurations for each hidden layer, regularization parameters, activation function configurations for each hidden layer.

ML algorithm: Deep Feed-Forward Neural Network (Deep FNN) [chollet2015keras].
  • Optimization algorithm: HyperOpt implementation of the Tree of Parzen Estimators (TPE) algorithm, run for a fixed number of trials [Bergstra_2013].
  • Optimized hyperparameters: Number of hidden layers and number of nodes per hidden layer (each searched within a bounded range), batch normalization configurations for each hidden layer, regularization parameters, activation function configurations for each hidden layer.

Table II: Overview of the implemented Machine Learning models and optimized hyperparameters
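The workflow of defining a search space and running a fixed number of trials can be sketched as follows. Note the substitution: this sketch uses plain random search from the Python standard library, not the HyperOpt TPE algorithm listed in Table II, and the objective `validation_loss` is a hypothetical stand-in for training a model and scoring it on held-out data. The parameter names and ranges are assumptions chosen for illustration.

```python
import random


def validation_loss(params):
    """Hypothetical stand-in for 'train a model, return its validation error'."""
    return ((params["learning_rate"] - 0.1) ** 2
            + 0.001 * (params["max_depth"] - 6) ** 2)


# Search space: one sampling function per hyperparameter (assumed ranges).
search_space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform on [0.001, 1]
    "max_depth": lambda rng: rng.randint(2, 10),            # integer in [2, 10]
}


def random_search(objective, space, n_trials, seed=0):
    """Sample n_trials configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(validation_loss, search_space, n_trials=200)
print(best_params, best_loss)
```

A Bayesian method such as TPE follows the same outer loop but uses the losses of earlier trials to decide where to sample next, which typically needs fewer trials than uniform sampling.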

4.2 Data and experimental details

We conduct a large scale empirical analysis of all publicly traded U.S stocks available on the Wharton Research Data Services (WRDS) cloud [WRDS_1993] that existed and survived throughout the full sample period. The study attempts to predict the annual returns of this stock universe using the Machine Learning algorithms shown in Table II versus the CAPM. All data was pulled from the WRDS cloud, specifically from the Center for Research in Security Prices (CRSP) and the Standard & Poor’s (S&P) Global Market Intelligence Compustat databases. Data was extracted for monthly and annual asset prices, accounting financial statements (balance sheet, income statement, cash flow statement) and macroeconomic factors. Figure 3 shows a sample of the U.S macroeconomic time series features, which included monthly data on consumer price indices, bond rates, gross domestic product (GDP) and other features.

Figure 3: Example macroeconomic time-series data used in Machine Learning models

A proprietary Python software package was built on top of the official WRDS Python APIs to automate the extraction and transformation of asset price data from WRDS, in addition to the training and testing of both the Machine Learning and CAPM models, to allow for replicable results. In our computation of the CAPM model shown in equation 1 we follow the recommendations of [2014_Plyakha, Pae_2015] and use a value-weighted (VW) U.S S&P 500 index to represent the market portfolio versus an equally weighted (EW) index. Based on empirical studies of CAPM applied to U.S markets year U.S Treasury Bill returns were used as a proxy for the risk free rate [Mukherji_2011]. Additionally [Chervany_1980, Daves_2000] recommend using a time horizon of roughly three to eight years to estimate the asset beta - a time series window of three years was used for all our historical return calculations (beta was computed from 3-year monthly stock and market returns). Based on the literature [Cooper_1996, Jacquier_2003] a simple arithmetic mean was used versus the geometric mean to compute the annualized average rate of return from monthly returns data over the three year time horizon. Modern researchers such as Liu et al [Liu_2019] in addition to Hasler and Martineau [Hasler_2019] have argued based on empirical evidence that it is better to compute the annualized monthly return from the arithmetic mean of daily returns; to avoid data sparsity issues over the long time horizon a more conventional approach was followed here.
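The CAPM inputs described above can be illustrated with a minimal sketch. It is not the proprietary package used in this study. It assumes beta is estimated as the sample covariance of stock and market returns divided by the sample variance of market returns, and that monthly means are annualized by multiplying by 12. All function names and the example return series are hypothetical.

```python
from math import prod
from statistics import mean


def sample_covariance(xs, ys):
    """Sample covariance of two equal-length return series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def estimate_beta(stock_returns, market_returns):
    """Assumed estimator: Cov(stock, market) / Var(market)."""
    market_variance = sample_covariance(market_returns, market_returns)
    return sample_covariance(stock_returns, market_returns) / market_variance


def annualize_arithmetic(monthly_returns):
    """Arithmetic annualization: 12 times the mean monthly return."""
    return 12 * mean(monthly_returns)


def annualize_geometric(monthly_returns):
    """Geometric annualization, shown only for contrast."""
    growth = prod(1 + r for r in monthly_returns)
    return growth ** (12 / len(monthly_returns)) - 1


def capm_expected_return(beta, risk_free_rate, market_return):
    """CAPM: risk free rate plus beta times the market risk premium."""
    return risk_free_rate + beta * (market_return - risk_free_rate)


# Hypothetical monthly returns for illustration only.
market = [0.01, -0.02, 0.03, 0.00]
stock = [0.02, -0.03, 0.05, 0.01]
beta = estimate_beta(stock, market)
forecast = capm_expected_return(beta, 0.02, annualize_arithmetic(market))
```

In practice the window would hold 36 monthly observations rather than the four shown here.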

For our Machine Learning models, after data extraction and pre-processing are completed we must generate the training and test sets described in 3.1. In time series problems one must maintain the temporal order in the training and test sets to prevent what is known as feature leakage in the Machine Learning literature, or equivalently look-ahead bias in finance and econometrics. Assuming we maintain the temporal order of our time series data, the canonical approach taken in the literature is to use an OOS (out-of-sample) evaluation whereby a sub-sample at the end of the time-series is held out for validation. We will use the phrase sequential evaluation to denote this type of validation.

It is important to note that recent researchers [Bergmeir_2012, Roberts_2016] have focused their attention on new forms of blocked cross-validation specifically for time-series problems to ensure robust models which do not over-fit datasets. Given the seasonality of annual international stock returns demonstrated by researchers such as Heston and Sadka [Heston_2010], these new methods of evaluating time series forecasts may be of significant interest to practitioners and industrial researchers. However, recent empirical studies conducted by Cerqueira et al [Cerqueira_2019] and Mozetic et al [Mozetic_2018] have not demonstrated a significant improvement from these new cross-validation techniques for non-stationary time series data, and thus a conventional sequential evaluation approach was followed here. 30% of the data at the end of the time-series was held out for testing reflecting approximately years of unseen data from . The final Machine Learning model training and test data-sets contained approximately features relating to company financial performance in addition to exogenous U.S macroeconomic variables for the three years prior to the prediction year.
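A sequential evaluation of this kind can be sketched as follows. This is an illustrative sketch rather than the code used in the study. It assumes each record carries a hypothetical `year` key and that the hold-out is taken over distinct periods, so that no single year is shared between the training and test sets. The years and tickers in the example are placeholders.

```python
def sequential_split(records, test_fraction=0.30, period_key="year"):
    """Hold out the most recent periods for testing, preserving temporal order.

    Every training record precedes every test record in time, which is
    what prevents look-ahead bias.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    periods = sorted({record[period_key] for record in records})
    n_test = max(1, round(len(periods) * test_fraction))
    if n_test >= len(periods):
        raise ValueError("not enough distinct periods to form a training set")
    first_test_period = periods[-n_test]
    train = [r for r in records if r[period_key] < first_test_period]
    test = [r for r in records if r[period_key] >= first_test_period]
    return train, test


# Hypothetical panel: two placeholder stocks observed over ten placeholder years.
panel = [
    {"year": year, "ticker": ticker, "annual_return": 0.0}
    for year in range(2001, 2011)
    for ticker in ("AAA", "BBB")
]
train_set, test_set = sequential_split(panel, test_fraction=0.30)
```

Splitting on distinct periods rather than on row counts is a design choice of this sketch: with a panel of many stocks per year, a row-based cut could place part of one year in each set.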

4.3 Results

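For the performance metric we used the Mean Squared Error (MSE). Given the predicted annual return $\hat{y}_i$ and the actual annual return $y_i$ for each of the $n$ observations, the MSE is defined as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

The ideal model is that which minimizes the MSE on OOS (out-of-sample) data.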
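The metric can be computed directly from paired actual and predicted returns. The following is a minimal sketch; the function name and the example values are hypothetical and are not results from this study.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted returns."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical out-of-sample annual returns and two sets of forecasts.
actual_returns = [0.10, -0.05, 0.20]
model_a_forecasts = [0.08, -0.02, 0.15]
model_b_forecasts = [0.00, 0.00, 0.00]

# The model with the lower out-of-sample MSE is preferred.
mse_a = mean_squared_error(actual_returns, model_a_forecasts)
mse_b = mean_squared_error(actual_returns, model_b_forecasts)
```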


The model results are summarized in Table III. The results demonstrate that the Machine Learning techniques result in a significant performance improvement over the classical Capital Asset Pricing Model. This demonstrates the power and flexibility of Machine Learning techniques for econometric and financial markets prediction. In line with the literature the gradient boosting tree models performed similarly to the neural network models. Given that the Deep FNN performed better than the Shallow FNN it may be worth exploring convolutional neural network and recurrent neural network architectures in future work (these architectures have been shown to be highly performant on complex and high dimensionality financial forecasting problems). As computing power, availability of datasets, and scientific innovation continue to increase, new model architectures and algorithms will be developed which will ultimately allow the performance of these models to improve with time.

Optimized Model Mean Squared Error (MSE)
Shallow FNN
Deep FNN
Table III: Model Results

These results demonstrate the benefit of Machine Learning for financial institutions and practitioners around the world. Whilst we focussed on U.S. equities the same techniques and algorithms could be applied to any asset class. In sub-section 3.2 we explained the development of packages such as SHAP and LIME that enable researchers to explain the results of Machine Learning models to investors and regulators. Figures 4(a) and 4(b) provide the top features (full feature descriptions can be found on the WRDS cloud) produced by a Python implementation of SHAP (SHapley Additive exPlanations) for the Catboost and XGBoost models. The feature importance plots demonstrate the significance of macroeconomic variables in the prediction of annual returns for U.S equities. As one would expect, the U.S GDP, in addition to wholesale and industrial price indices, had a significant impact on the predicted returns. Interestingly, outside of stock price and stock volume these macroeconomic variables seemed to be far more important than individual stock accounting fundamentals (cash flow statement and balance sheet line items) for the prediction of annual returns in the years . These trained universal Machine Learning models should generalize to predict annual returns for all U.S equities on future data.


(a) Catboost SHAP Feature Importance Plot

(b) XGBoost SHAP Feature Importance Plot
Figure 4: Example Top 10 feature importance plots for model interpretability created using SHAP (SHapley Additive exPlanations) by Lundberg and Lee (2017) [Lundberg_2017].
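Feature importance rankings of this kind are commonly obtained by averaging the absolute SHAP value of each feature across all observations. The sketch below illustrates only that aggregation step. It assumes the per-observation SHAP values have already been computed elsewhere by the SHAP package, and the feature names and values shown are hypothetical rather than taken from the fitted models.

```python
def rank_features_by_mean_abs_shap(shap_values, feature_names, top_k=10):
    """Rank features by mean absolute SHAP value, largest first.

    shap_values holds one row per observation and one column per feature.
    """
    n_features = len(feature_names)
    if not shap_values:
        raise ValueError("shap_values must contain at least one row")
    totals = [0.0] * n_features
    for row in shap_values:
        if len(row) != n_features:
            raise ValueError("each row needs one value per feature")
        for j, value in enumerate(row):
            totals[j] += abs(value)
    n_rows = len(shap_values)
    importance = [(name, total / n_rows) for name, total in zip(feature_names, totals)]
    importance.sort(key=lambda item: item[1], reverse=True)
    return importance[:top_k]


# Hypothetical SHAP values for two observations and three features.
example_names = ["gdp", "stock_price", "cash_flow_item"]
example_values = [
    [0.10, -0.50, 0.00],
    [-0.30, 0.40, 0.05],
]
ranking = rank_features_by_mean_abs_shap(example_values, example_names)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks as important.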

4.4 Conclusion

In conclusion, this paper first provided explanations and theoretical frameworks for the CAPM and supervised learning used in Machine Learning. Secondly, this paper explored the regulatory and financial constraints placed on those applying Data Science and Machine Learning techniques in the field of quantitative finance. It was shown that practitioners face numerous regulatory challenges and hurdles in ensuring that scientific innovations can be lawfully adopted and implemented. Finally, we provided a large scale empirical analysis of Machine Learning algorithms versus the CAPM when predicting annual returns of U.S equities using state-of-the-art Machine Learning techniques and high performance computing (HPC) infrastructures. The results demonstrated the superior performance and power of Machine Learning techniques to forecast annual returns. In contrast to traditional finance theories such as the CAPM, the Machine Learning algorithms had the flexibility to incorporate approximately time series features to predict the returns for each target U.S equity. As we move further into the age of big data, Data Science and Machine Learning algorithms will increasingly dominate the world of economics and finance.