As record amounts of funds are allocated to private equity (PE) investments – a record-breaking $671 billion in 2017 – the main issue PE investors struggle with has not changed in decades: the absence of transparent, easily accessible valuation-related information. PE investors looking at possible investments in privately held companies lack the quantitative information needed to build their investment case, and resort to methods such as portfolio diversification to compensate for this lack of company-specific information. PE investors critically need structured approaches for inferring basic future performance measures, such as the prospect for a company to IPO in the future or the value of a private company as a potential acquisition target. The early capital-raising experience of a private company provides some of the first and richest information available. Typically, early-stage companies go through successive investment rounds (seed round, series A, series B, etc.) in which various types of investors can participate: angel investors, venture capitalists and private equity funds, each with their own focus, expertise and history. Early investors typically play a significant role in the future development of each company, often taking significant stakes and receiving one or several board seats. It is therefore reasonable to believe that parameters such as the nature and composition of a private company's early investors contain significant information about the future of that company.
In this paper, three models are developed and used to forecast the future performance of a private company, using as prognostic factors qualitative information available on the company's first three investment rounds, such as the time lag between the creation of the company and each of the investment rounds. Performance measures are high-level measures, such as the future decision to IPO or not, the potential for a future acquisition, or the risk of a potential bankruptcy. In addition to this company-specific information, the models also use an indicator of market sentiment at the time of each investment, namely the VIX index level.
Automatic classification algorithms are plausible candidates for offering a solution to the problem of predicting private company future performance. This family of algorithms can process large quantities of both qualitative and quantitative data for calibration purposes. They have also proved efficient in similar situations, producing accurate forecasts in areas as diverse as bio-statistics [hardle2006statistical] or corporate finance [hua2007predicting].
In our framework, due to the sporadic and qualitative nature of the data available at this stage, it is difficult to expect high accuracy from any forecast. However, contrary to many conventional market strategies, successful investments in PE typically have extremely high rates of return: Peter Thiel's investment in Facebook in 2004 ($500,000) appreciated 693.3% by the time of the Facebook IPO [facebook_worth]. SoftBank's investment in Alibaba in 2009 appreciated 290.0% by the time of the Alibaba IPO [alibaba_worth]. As a result, the combination of information scarcity and potential outsized returns makes any performance forecasting indicator, even one with marginal forecasting power, extremely valuable for private equity investors.
In this work we make an initial attempt at PE exit prediction, by focusing on the IPO/Acquired vs Private/Bankrupt classification, which we shall from now on denote for brevity as IPO and NonIPO, respectively. The model we propose is a composite decision model based on three components. Namely,
Logistic regression (LR), which models the log odds of the IPO and NonIPO outcomes via a linear function of the features [walker1967estimation].
Random Forest (RF), which is a tree-based ensemble method. Using training data, it grows a number of decision trees, each built according to a greedy optimization algorithm. RF then casts the classification label for a new data point according to a voting model across the trees (the forest) [breiman2001random].
Support Vector Machine (SVM), which attempts a geometric separation of the IPO and NonIPO samples via a hyperplane in a high-dimensional space.
Since each of these methods has its own strengths and weaknesses, we observed that better experimental results are obtained by fusing them into a combined model whose output is given by the majority of the component outcomes, i.e., the IPO or NonIPO prediction is made on the basis of the agreement of at least 2 of the 3 component models. The algorithms tested in this paper have already been used in market and corporate finance applications, see, e.g., [hua2007predicting], [kumar2006forecasting]. The capabilities of a Random Forest model have previously been exploited for the classification of private equity data in [bhat2011predicting]. Moreover, SVMs and blends of SVMs and classification trees have been used for the estimation of financial distress in [chen2011predicting]. In general, ensemble methods, from which the fused model descends, have proved to have strong predictive power [friedman2001elements].
The remainder of the paper is organized as follows. In Sections II and III we describe the input data set and discuss the descriptive statistics and prediction metrics used throughout the study. In Section IV we briefly describe the component models and the fused, majority-based, model. Section V reports the results of the numerical experiments and a discussion of the performance of the proposed prediction model. Conclusions are finally drawn in Section VI.
II Input data set
The data used for this research was extracted from Thomson Reuters Eikon. The full data set contains information on US and European companies between 1996 and 2018, and is composed of 83544 companies belonging to 9 different industry sectors. The data is qualitative: it contains the names of the investing firms, the dates of the investment rounds, the company foundation year, and a public market sentiment indicator, given by the VIX index. Only information about the first 3 rounds was retained. The classification output variable (label) was the exit status of the company: IPO, Bankrupt, Merger and Acquisition (M&A), Leveraged Buyout (LBO), or Private. For this paper's purposes, we aggregated the output into two classes, namely IPO (including actual IPOs as well as acquisitions) and NonIPO. The distribution of data for both classes (IPO, NonIPO) across time is shown in Fig. 1.
II-A Descriptive statistics
A preliminary analysis of the data set reveals a progressive inconsistency in the most recent period (2011-2018). This issue is likely caused by data censoring, i.e., events such as acquisitions and IPOs have not yet occurred within the observed time frame. To circumvent this problem, we restricted our analysis to investments in the period between 1996 and 2011, for which the corresponding outcomes are more reliable. By doing so, the number of companies considered decreases from 83544 to 54697. The exit distribution for every industry sector is reported in Table I.
III Prediction metrics
The predictive performance of the model is quantified by means of five indicators, obtained by comparing the labels cast by the algorithm with the real ones on the test data set. Considering the “IPO” class to be the positive class, we use the standard indicators: positive and negative precision, positive and negative recall, and overall accuracy.
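Written in terms of the confusion-matrix counts (TP, FP, TN, FN, with IPO as the positive class), a standard reconstruction of these five indicators, consistent with the discussion that follows, is:

```latex
\begin{align*}
\text{Precision}^{+} &= \frac{TP}{TP+FP}, &
\text{Precision}^{-} &= \frac{TN}{TN+FN},\\[2pt]
\text{Recall}^{+} &= \frac{TP}{TP+FN}, &
\text{Recall}^{-} &= \frac{TN}{TN+FP},\\[2pt]
\text{Accuracy} &= \frac{TP+TN}{TP+FP+TN+FN}. &&
\end{align*}
```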
Each indicator is a conditional probability; for example, the positive precision expresses the probability that, given that a company is classified as IPO by our model, it will actually be publicly listed or acquired. For each company the algorithm outputs the probability of an IPO exit. A threshold parameter is then used to decide the class label, i.e., a company is labeled as IPO if its IPO probability is greater than the selected threshold, and NonIPO otherwise. The threshold level is tuned via cross validation, in order to optimize the predictive performance of the model.
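The thresholding step can be sketched as follows. This is an illustrative Python sketch (the paper's experiments used R); the probabilities and the 0.5 threshold are hypothetical, not values from the study.

```python
# Sketch of threshold-based labeling from predicted IPO probabilities.
# A company is labeled IPO if its predicted probability exceeds the threshold.

def label_from_probability(ipo_probs, threshold):
    """Map per-company IPO probabilities to class labels."""
    return ["IPO" if p > threshold else "NonIPO" for p in ipo_probs]

# Hypothetical model outputs for four companies:
probs = [0.81, 0.42, 0.55, 0.12]
print(label_from_probability(probs, 0.5))  # ['IPO', 'NonIPO', 'IPO', 'NonIPO']
```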
IV Component models and fused model
The three standard prediction models used in this study are briefly described next. For all models, the training variables consist of a transformation of the data described in Section II. Namely, for each company we consider (i) a ranking indicator of the importance of the firms that invested in the company, (ii) the dates of the investment rounds, (iii) the company foundation year, and (iv) the VIX volatility public index value. Input investment dates are expressed as time intervals relative to the company's foundation year.
IV-A Logistic Regression
Logistic Regression (LR) assigns an output probability to a vector of features $x \in \mathbb{R}^d$ according to the model

$$p(\mathrm{IPO} \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d)}},$$

where $\beta_0, \beta_1, \dots, \beta_d$ are the logistic regression coefficients, to be estimated in the training phase of the model; see [walker1967estimation], [christensen2006log].
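A minimal sketch of the logistic model and its training, assuming plain gradient descent on the log-loss and synthetic two-feature data (the paper fit LR with R's glm on the features of Section IV; everything below is illustrative only):

```python
# Logistic regression sketch: sigmoid link, coefficients fitted by
# gradient descent on synthetic, separable toy data.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit beta_0 and beta on (X, y) by stochastic gradient descent."""
    d = len(X[0])
    b0, b = 0.0, [0.0] * d
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi)))
            g = p - yi  # gradient of the log-loss w.r.t. the linear score
            b0 -= lr * g
            b = [bj - lr * g * xj for bj, xj in zip(b, xi)]
    return b0, b

# Toy data: label 1 ("IPO") when features are large, 0 otherwise.
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 1, 1]
b0, b = train_logistic(X, y)
probs = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) for xi in X]
print([round(p, 2) for p in probs])
```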
IV-B Random Forest
The Random Forest (RF) model is composed of an ensemble (forest) of classification trees, see, e.g., [breiman2001random]. Each of the trees is built using training data. A classification tree takes samples as input and progressively divides them using binary splits, made on a number of variables and using a greedy algorithm [venables2002tree]. Each split is based on thresholds on a subset of the sample's features, which are selected randomly. Once a number of trees is grown, the RF predicts a new sample's class based on the response of the majority of trees. RF outputs are class probabilities, one for each observation. Setting a threshold for a label means that at least that proportion of the trees must agree on the label in order for it to be cast as the output.
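The RF voting rule just described can be sketched as follows; the per-tree votes are stubbed rather than produced by actual trees (the paper used R's randomForest), and the vote counts and thresholds are illustrative:

```python
# Sketch of the RF voting rule: the forest's IPO probability for a sample
# is the fraction of trees voting IPO, and the IPO label is cast only if
# that fraction clears the chosen threshold.

def forest_predict(tree_votes, threshold):
    """tree_votes: list of 'IPO'/'NonIPO' votes, one per tree."""
    p_ipo = sum(v == "IPO" for v in tree_votes) / len(tree_votes)
    label = "IPO" if p_ipo >= threshold else "NonIPO"
    return label, p_ipo

votes = ["IPO"] * 7 + ["NonIPO"] * 3   # 7 of 10 trees vote IPO
print(forest_predict(votes, 0.5))       # ('IPO', 0.7)
print(forest_predict(votes, 0.8))       # ('NonIPO', 0.7)
```

Raising the threshold trades positive recall for positive precision, which is exactly the tuning knob discussed in Section III.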
IV-C Support Vector Machine

The Support Vector Machine (SVM) is a classifier that attempts a geometric separation of the data points in a high-dimensional feature space, see, e.g., [cristianini2000introduction]. SVM builds a surface that (softly) separates the samples belonging to the two classes. In order to adapt to the data space structure, different kernels can be selected, see, e.g., [amari1999improving], [suykens1999least]. In this paper we used a radial kernel $K(x, x') = \exp(-\gamma \|x - x'\|^2)$, where $\gamma > 0$ is the kernel width parameter.
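The radial kernel itself is a one-liner; here is a sketch, with an illustrative gamma value (the paper's tuned values are in its Table VI):

```python
# Radial (RBF) kernel: K(x, x') = exp(-gamma * ||x - x'||^2).
# The kernel equals 1 for identical points and decays with squared distance.
import math

def rbf_kernel(x, xp, gamma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1.0, 0.0], [1.0, 0.0], gamma=0.5))  # 1.0 (identical points)
print(rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=0.5))  # < 1, decays with distance
```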
IV-D Fused Model
A fused model is constructed by feeding the same input to the three models described above, and then selecting as output the majority label. This model extends the idea of ensemble learning, averaging a response across an ensemble of very different classifiers. In order to describe the voting dynamics among the three component models, we computed the following experimental quantities: the “Agree Ratio” (AR), i.e., the probability (empirical frequency) with which all three models agree on the same outcome; the “True Agree Ratio IPO” (TARI), i.e., the probability of correct IPO classification conditional on the three models agreeing; the “True Agree Ratio NonIPO” (TARNI), i.e., the probability of correct NonIPO classification conditional on the three models agreeing; and the probability that one of the methods (LR, RF, SVM) issues the correct classification while being in the minority (TLR.MIN, TRF.MIN, TSVM.MIN, respectively). Experimental values of these quantities are reported in Section V-D.
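The majority vote and the Agree Ratio can be sketched as below; the component predictions are stubbed triples, not outputs of real LR/RF/SVM models:

```python
# Sketch of the fused model's majority vote over the (LR, RF, SVM) labels,
# and of the Agree Ratio (AR): the empirical frequency with which all
# three component models cast the same label.
from collections import Counter

def majority_vote(lr, rf, svm):
    """Return the label cast by at least 2 of the 3 component models."""
    return Counter([lr, rf, svm]).most_common(1)[0][0]

def agree_ratio(pred_triples):
    """pred_triples: list of (lr, rf, svm) label triples."""
    return sum(len(set(t)) == 1 for t in pred_triples) / len(pred_triples)

preds = [("IPO", "IPO", "NonIPO"), ("IPO", "IPO", "IPO"),
         ("NonIPO", "NonIPO", "NonIPO"), ("NonIPO", "IPO", "NonIPO")]
print([majority_vote(*t) for t in preds])  # ['IPO', 'IPO', 'NonIPO', 'NonIPO']
print(agree_ratio(preds))                  # 0.5
```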
V Experimental results
Experiments have been conducted using the available data in the period 1996-2011. In order to adjust for the slight positive/negative class imbalance shown in Figure 1, a randomized balancing resampling was implemented. The positive class was sampled from the training set with replacement until the cardinality of the negative class was reached; the negative class was sampled without replacement.
Every component algorithm was trained and tested using a 10-fold cross-validation approach. The whole data set was split into 10 randomly sampled, equally sized sets (folds). The algorithm is then run 10 times, each time using 9 of the 10 folds as the training set. The predictive performance indicators are then averaged across the 10 runs. For each component algorithm, the optimal threshold value was determined by analyzing the Receiver Operating Characteristic (ROC) plots (Figures 2, 3, 5), searching for the “knee” of the curve that maximizes the true positive rate (TPR) without unduly increasing the false positive rate (FPR). Each new data point is then presented to the three algorithms, and the three classification labels cast are compared. The final classification label cast by the fused model is the one representing the majority of the three algorithms.
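The 10-fold split can be sketched as follows (the paper performed this in R; the data set size here is illustrative):

```python
# Sketch of a k-fold cross-validation split: shuffle the indices, cut
# them into k equally sized folds, and let each fold serve once as the
# test set while the remaining k-1 folds form the training set.
import random

def k_fold_indices(n, k=10, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(100, k=10))
print(len(splits))                            # 10
print(len(splits[0][0]), len(splits[0][1]))   # 90 10
```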
We used the R environment for the whole experimental procedure. Logistic Regression was provided by the default R library, while we used the randomForest package for RF, [rfRpackage], and the e1071 package for SVM, [svmRpackage]. Training was performed by the glm, randomForest and svm functions respectively. Tuning of SVM was provided by the tune function within the e1071 package.
V-A LR Results
All sectors | 0.615 | 0.510 | 0.622 | 0.717 | 0.620
All sectors | 0.563 | 0.698 | 0.659 | 0.518 | 0.603
V-B Random Forest Results
All sectors | 0.638 | 0.512 | 0.631 | 0.743 | 0.634
All sectors | 0.518 | 0.920 | 0.771 | 0.239 | 0.559
V-C SVM Results
Since the SVM turned out to be the most computationally expensive method, we applied it to a reduced number of features, obtained via Principal Component Analysis (PCA) [wold1987principal], which was performed using the princomp function in R. Figure 4 shows that the seven largest principal components explain more than 90% of the variance of the whole original 19-dimensional data space. For the SVM tuning and analyses we therefore selected these first 7 variables.
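The component-count selection can be sketched as follows. The eigenvalue spectrum below is illustrative, chosen only so that the 90% cutoff falls at seven components as in the paper's Figure 4; it is not the study's actual spectrum.

```python
# Sketch of choosing how many principal components to keep so that the
# cumulative explained variance exceeds a target fraction (here 90%).

def components_for_variance(eigenvalues, target=0.90):
    """Smallest k such that the top-k eigenvalues explain >= target variance."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev / total
        if cumulative >= target:
            return k
    return len(eigenvalues)

# Hypothetical eigenvalue spectrum of a 10-dimensional covariance matrix:
spectrum = [5.0, 3.0, 2.0, 1.5, 1.0, 0.8, 0.7, 0.5, 0.3, 0.2]
print(components_for_variance(spectrum))  # 7
```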
In the SVM model, we chose a radial kernel with an empirically optimized radial parameter and misclassification cost (a parameter of the SVM model which accounts for the tradeoff between separation margin and misclassification errors). Tuning was computationally quite demanding, since it involved training multiple SVMs with different parameters. We alleviated this problem by performing the tuning on a reduced subsample of 1600 data points, along with the PCA dimensional reduction described above. We repeated the tuning session 200 times, with a 1600-point sample per session. We then computed the median and the most frequent value of the tuning parameters found in these 200 sessions (see Table VI), and selected the resulting values of the radial parameter and the misclassification cost. The algorithm was eventually run on the entire data set, using the optimal parameters determined in the tuning phase.
The results of the SVM model are reported in Table VII for the baseline threshold and in Table VIII for the optimized threshold value found for each sector. Contrary to the other two algorithms, SVM needs an optimal tuning of the threshold across the different industry sectors. Tuning was performed using histograms of IPO probability, searching for a trade-off between positive recall increase and negative recall decrease. For the all-sectors analysis we relied on the SVM ROC plot in Figure 5 for the choice of the optimal threshold.
All sectors | 0.619 | 0.611 | 0.655 | 0.662 | 0.638
All sectors | 0.513 | 0.873 | 0.701 | 0.265 | 0.551 | 65%
V-D Fused Model Results
Table IX shows the performance of the fused, majority based, model.
The results of the fused model shown in Table IX are satisfactory, and in line with the single-model results for the all-industrial-sectors classification. As stated in Section I, for PE investors any prediction method providing an accuracy that is sensibly better than a fair coin toss is potentially valuable.
We also experimented with transforming the fused model from a majority to a unanimity model, that is, we issue an IPO label only when all component models agree on that label. The results, reported in Table XI, improve with respect to the optimized single models: the negative recall is higher and the positive precision is strengthened. This is due to the fact that the unanimity model focuses more on the quality of positive class classification.
VI Conclusions

We presented in this paper an innovative application of machine learning classification models to forecasting the type of exit event for private companies, using some of the rare qualitative data available. Performance forecasting is indeed the biggest challenge facing private company investors. Contrary to public companies, where investors have access to a plethora of information, the information available on private companies is scarce, often inaccurate, and most of the time difficult to access. Therefore, any additional means of acquiring better insight into the future performance of private companies is potentially very valuable, particularly considering the very high returns on investment provided by early participation in successful ventures.
The analysis showed that standard classifiers (LR, RF, SVM) can provide such an insight, although performance indicators can vary across individual algorithms. A fused model based on these component models offers the advantage of balancing the performance of the individual component algorithms, providing more stable and equalized results. Using a “unanimity” version of the fused model provides further improvements, in particular at the level of positive precision and negative recalls, as shown in Table XI. This line of research offers numerous opportunities for further developments, many already underway in collaboration with financial firms active in the private equity market.
This research was supported by Eurostep Digital.