1 Introduction
In the last few decades, accounting fraud has drawn a great deal of attention amongst researchers and practitioners, since it is becoming increasingly frequent and diverse. Accounting fraud is one of the most harmful financial crimes, as it often results in massive corporate collapses, commonly silenced by powerful high-status executives and managers (Mokhiber and Weissman, 2005). Given their hidden and dynamic characteristics, ‘book cooking’ accounting practices are particularly hard to detect, hence the need for more sophisticated tools to assist in exposing complex fraudulent schemes and identifying warning signs of manipulated financial reports.
The catastrophic consequences of accounting fraud expose how vulnerable and unprotected the community is in this regard, since most of the damage is inflicted on investors, employees and government. Several accounting scandals reflect this reality, the infamous Enron case being one of the most controversial. The giant energy company was engaged in a massive fraudulent scheme that culminated abruptly towards the end of 2001 with its spectacular collapse and subsequent bankruptcy. Consequently, Enron’s investors and stakeholders lost nearly $74 billion, and 4,500 employees lost their jobs and pensions without proper notice (Swartz, 2003). Even though general opinion describes Enron’s failure as unpredictable, Schilit and Perler (2010) affirm that the disaster could have been avoided had the public documents of the years preceding the debacle been carefully examined. The impressive revenue growth from $9.2 billion in 1995 to $100.8 billion in 2000 should have warned the public, especially considering that profits did not increase at such a spectacular rate. They conclude that the use of relevant indicators could help alert the public before a disaster occurs.
In the framework of this study, accounting fraud is defined as the calculated misrepresentation of the financial statement information that companies publicly disclose. The intention is to mislead stakeholders regarding the firm’s true financial position by overstating assets or understating exposure to liabilities, thereby artificially inflating earnings and return on equity. Accounting fraud may take the form of either direct manipulation of financial items or creative accounting methods (Schilit and Perler, 2010). Several synonyms of accounting fraud exist in the literature, including the so-called financial statement fraud, corporate fraud and management fraud.
Perpetrators of accounting fraud may be motivated by personal benefit (e.g. maximisation of compensation packages) or by explicit or implied contractual obligations, such as debt covenants and the need to meet market projections and expected economic growth. The greatest harm is inflicted on the long-run reputation of the organisation itself, on investors through the destruction of value, and on the public’s trust in the capital market (Ngai et al., 2011). Other victims often include suppliers, partners, customers, regulatory institutions, enforcement agencies, taxation authorities, the stock exchange, creditors and financial analysts (Pai et al., 2011).
Standard auditing procedures are often insufficient to identify fraudulent accounting reports, since most managers recognise the limitations of audits; hence the need for additional dynamic and comprehensive analytical methods to detect accounting fraud accurately and at an early stage (Kaminski et al., 2004). Accordingly, the present study aims to improve the detection rate of accounting fraud offences through the implementation of several machine learning methods and the assessment of industry-specific risk indicators, in order to assist the design of an innovative, flexible and responsive corporate regulation tool.
To achieve this objective, a thorough forensic data analytic approach is implemented that includes all pertinent steps of a data-driven methodology. The study contributes to the improvement of accounting fraud detection in several ways, including the collection of a comprehensive sample of fraud and non-fraud firms covering all industries, including financial services; an extensive analysis of financial information and of significant differences between genuine and fraudulent reporting; the selection of relevant predictors of accounting fraud; contingent analytical modelling of the phenomenon to better recognise fraudulent cases; and the identification of financial red flags as indicators of falsified records.
The rest of the article is organised as follows. A critical review of the accounting fraud detection literature is performed in Section 2 to summarise commonly used techniques and the results achieved in previous studies. Section 3 presents a detailed description of the proposed methodology, including the studied dataset, sample selection process, explanatory variables examined, variable selection process and machine learning models considered. Section 4 illustrates the empirical results of the proposed algorithms and discusses key findings. Finally, Section 5 concludes the paper and gives directions for future research.
2 Accounting Fraud Detection Literature
Part of the fraudulent financial reporting literature has focused primarily on the examination of qualitative characteristics related to the board of directors and principal executives, including information on corporate governance structure (Beasley, 1996; Hansen et al., 1996; Bell and Carcello, 2000) and insider trading data (Summers and Sweeney, 1998). Studies using this kind of information show promising results. However, access to such data is very difficult and sometimes even prohibited for most individuals.
On the other hand, studies using publicly available financial statement information are less common and usually rely on small samples. Generally, the selection of fraud cases is limited to certain conditions, and cases are then manually matched with non-fraud observations on the basis of business fundamentals such as industry, size, maturity and period. Undoubtedly, there is an interesting gap in this area of the literature, where the selection process of a more representative sample has the potential to be explored and expanded.
With regard to the employed techniques, discriminant analysis and logistic regression are by far the most popular. Such algorithms are commonly considered a benchmark framework due to their simplicity and low computational cost, and because they have been proven to efficiently detect falsified accounting reporting in relatively small samples (Fanning and Cogger, 1998; Spathis et al., 2002; Kaminski et al., 2004; Pai et al., 2011). Better results have been achieved by decision trees, a popular machine learning method often used to predict fraudulent accounting records, mainly due to their modest data preparation requirements and intuitive interpretation (Kotsiantis et al., 2006; Kirkos et al., 2007; Pai et al., 2011; Gupta and Gill, 2012; Song et al., 2014).
Alternative and more advanced approaches have also been adopted to detect accounting fraud. Neural networks are in high demand for accounting fraud detection, as they have shown promising results when predicting fraudulent reporting practices (Kwon and Feroz, 1996; Choi and Green, 1997; Fanning and Cogger, 1998; Feroz et al., 2000; Ravisankar et al., 2011). A similar situation holds for more complex settings, such as support vector machines (Kotsiantis et al., 2006; Kirkos et al., 2007; Hoogs et al., 2007; Ravisankar et al., 2011; Pai et al., 2011; Song et al., 2014) and hybrid methods (Kotsiantis et al., 2006; Song et al., 2014). Nevertheless, the performance of these methodologies is counteracted by considerable drawbacks, including substantial computational costs, proneness to overfitting and difficulty in interpreting results (Tu, 1996; Abe, 2005).
A list of prior studies using machine learning techniques for accounting fraud detection is summarised in Table 1, along with additional methodological details such as sample size, number of fraud cases, methods employed and overall accuracy, when available.
Many contributions can be attributed to prior studies, as all accounting fraud research enhances awareness and knowledge of this phenomenon. Furthermore, forensic accounting strongly supports accounting fraud detection and promotes the design of relevant anti-fraud preventive measures.
However, a great deal of work remains to be done to improve detection strategies. First, the sample sizes of previous studies are fairly small and, in general, samples are manually selected. The latter is a highly problematic practice, as it is inherently biased, and so results cannot be extrapolated to the population. Therefore, increasing the amount of data used to train, validate and test the models is a noticeable enhancement, as is attempting to collect as many fraudulent cases as possible, not only the most convenient for the sake of research results.
Moreover, most prior studies focus their analysis on specific industries defined by the Standard Industrial Classification (SIC) system. After careful review, it is surprising to observe that no studies investigate accounting fraud within financial services firms, a situation depicted in Table 2. The main reason for the exclusion of these entities is that they are structurally different, and an alternative set of variables may be required since certain financial statement items, such as accounts receivable and inventory, are not available for these companies. Hence “research to find the variables most useful in the specific industries would be of great value”, especially in the poorly examined area of financial services (Fanning and Cogger, 1998). A substantial improvement is therefore achieved in the present study, as cases from all industries are included.
Additional improvements in the area of accounting fraud detection can be attained by considering more relevant machine learning methods and performance evaluation metrics. As previously mentioned, complex techniques have been implemented in prior studies, most of them achieving superior performance compared to more basic methods, but the cost of this improvement is relatively high given the considerable drawbacks these algorithms entail in terms of computational cost and interpretability. In addition, most studies focus only on maximising overall accuracy, without further consideration of more suitable assessment measures.
Consequently, machine learning methods based on decision trees and boosting techniques are implemented in this paper, since their outcome can be very useful when detecting accounting fraud: straightforward classification rules can be extracted, and easily interpreted and replicated by auditors and regulatory agencies. Furthermore, alternative metrics that account for the difference between the misclassification costs associated with fraud and non-fraud cases are proposed to properly measure the predictive ability of the suggested models.
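The point about evaluation metrics can be illustrated with a short sketch: a boosted-tree classifier on simulated imbalanced data, assessed with per-class precision and recall rather than overall accuracy. The data, class ratio and library choice (scikit-learn) are illustrative only, not the study's actual sample or implementation.

```python
# Sketch: tree-based boosting evaluated with class-sensitive metrics
# instead of raw accuracy. Simulated data, ~10% "fraud" (class 1).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Per-class precision/recall expose what overall accuracy hides:
# a model predicting "non-fraud" everywhere scores ~90% accuracy
# here yet has zero recall on the fraud class.
print(classification_report(y_te, pred, target_names=["non-fraud", "fraud"]))
```

On such an imbalanced sample, recall on the minority (fraud) class is the figure of merit; overall accuracy alone would look excellent even for a useless classifier.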
Study  Industry
Persons (1995)  Manufacturing and services
Kwon & Feroz (1996)  n/a
Choi & Green (1997)  n/a
Fanning & Cogger (1998)  Financial companies excluded
Lee et al. (1999)  Financial companies excluded
Feroz et al. (2000)  Banking companies excluded
Spathis (2002)  Manufacturing firms
Spathis et al. (2002)  Manufacturing firms
Lin et al. (2003)  n/a
Kaminski et al. (2004)  Banking and insurance firms excluded
Kotsiantis et al. (2006)  Manufacturing firms
Kirkos et al. (2007)  Manufacturing firms
Hoogs et al. (2007)  Financial companies excluded
Lenard et al. (2007)  Service-based computer and technology firms
Ravisankar et al. (2011)  n/a
Pai et al. (2011)  n/a
Gupta & Gill (2012)  n/a
Danial et al. (2014)  Financial and insurance sectors excluded
Song et al. (2014)  Financial companies excluded
In brief, although the techniques proposed in previous studies have increased the detection rate of accounting fraud offences, they remain limited and often insufficient to uncover complex fraudulent schemes. There is, then, a clear need for improved methodologies that assist the fraud detection task in discovering hidden patterns of falsified financial reports, so that they can be exposed as early as possible and recovery strategies can be rapidly deployed to attenuate potential losses.
3 Methodology
3.1 Forensic Analytics
According to Van Vlasselaer et al. (2015), fraud offences are not crimes that happen fortuitously, but are carefully planned, concealed and committed. Accounting fraud perpetrators continuously conceive new ways to commit their offences and consequently keep transforming their fraudulent behaviour, hence the complexity of the accounting fraud phenomenon. This deliberate managerial wrongdoing is particularly hard to detect and predict, since it involves deep knowledge of accounting and legal tricks that are intentionally employed to make documents look genuine and error-free.
Forensic data analysis is concerned with the treatment and examination of financial crime offences, hence the relevance of its use in developing an adequate technique for accounting fraud detection. Therefore, a forensic accounting approach is proposed to overcome potential auditing failures and further improve the examination of public documents through the recommendation of meaningful analysis of accounting items.
3.2 Data
The data collection task is critical in financial crime-related research, since it is very difficult to find sufficient and accurate data for analysis. In addition, given the highly sensitive nature of the topic, there is a limited number of relevant journal articles related to accounting fraud detection, and the publication of controversial results may be censored or even prohibited (Bolton and Hand, 2002). Therefore, the compilation of an exhaustive and representative database containing relevant accounting fraud instances is imperative in order to design an adequate and integral fraud-detection method.
In this study, accounting fraud cases are identified considering all Accounting Series Releases (ASR) and Accounting and Auditing Enforcement Releases (AAER) issued by the U.S. Securities and Exchange Commission (SEC) between 1990 and 2012. In particular, all public litigation releases involving deceptive reporting were first hand-collected from the SEC’s website (SEC Sanctions Database: https://www.secwhistlebloweradvocate.com/program/secenforcement/sanctionsdatabase/) and then cross-validated against an official accounting fraud database provided by the Securities Class Action Clearinghouse (SCAC) at Stanford Law School. Non-public companies were excluded from this study, since the SEC only has jurisdiction over publicly traded companies.
The selection of the studied period is justified by data availability and practicality considerations. On the one hand, discovered fraud cases published by the SEC include successful enforcement actions with monetary sanctions exceeding $1 million announced between July 29, 2002 and the present, and the accounting fraud cases released by the SEC date from 1990 onwards, hence the selection of 1990 as the beginning of the studied period. On the other hand, this study began in the middle of 2013, so including that year would have been premature, considering that further fraud cases could still be discovered in the remainder of the year. As such, 2012 is selected as the end of the studied period.
The resulting fraud database consists of 1,594 fraud-year observations identified by company I.D. and fiscal year of the offence. Table 3 summarises the number of fraudulent observations obtained after splitting fraud cases into the corresponding years of occurrence, arranged by industry.
SIC Codes  Standard Industrial Classification (SIC)  Fraud Cases  Perc (%)
0100-0999  Agriculture, Forestry and Fishing  11  0.69
1000-1799  Mining and Construction  52  3.26
2000-3999  Manufacturing  609  38.21
4000-4999  Transportation, Communications, Electric and Gas  106  6.65
5000-5999  Wholesale Trade and Retail Trade  169  10.60
6000-6799  Finance, Insurance and Real Estate  236  14.81
7000-8999  Services  375  23.53
9100-9729  Public Administration  36  2.26
Total    1,594  100
3.3 Sample Selection
One of the main characteristics that defines the fraud phenomenon so uniquely is that it is an uncommon activity (Van Vlasselaer et al., 2015), particularly in the context of accounting fraud, since only a minority of the recorded cases are actually classified as fraudulent. Learning from these rare events is very challenging given the small number of observations available to train predictive models, which makes it especially difficult to discriminate between fraudulent and non-fraudulent instances. As Cerullo and Cerullo (1999) express in this regard, “unrepresentative sample data or too few data observations will result in a model that poorly estimates or predicts future values”.
The class-imbalance problem fully emerges when statistical learning models are applied, because they tend to adopt the naive strategy of classifying all firms as non-fraudulent. As a consequence, accuracy measures show excellent average performance that merely reflects the underlying uneven class distribution, while the methods remain totally ineffective in detecting positive cases (Chawla et al., 2004). Therefore, the selection of a more proportionate sample in terms of positive and negative cases is required, both to address the imbalance problem encountered in this study and to enhance the discriminatory power of the proposed statistical models.
A stratification exercise is conducted according to the target variable Fraud, in which each fraud observation is paired with a non-fraud observation on the basis of industry and fiscal period. The sample selection process thus occurs in two phases: first dividing the dataset by SIC industry and fiscal year, and then randomly selecting non-fraud instances from each subgroup.
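The two-phase matching can be sketched in pandas as follows; the column names (`sic`, `fyear`, `fraud`) and the one-control-per-fraud policy are assumptions for illustration, not the study's actual code.

```python
# Sketch of the two-phase matched sampling: group fraud cases by SIC
# industry and fiscal year, then draw random non-fraud controls from
# the same stratum.
import pandas as pd

def matched_sample(df, seed=0):
    frauds = df[df["fraud"] == 1]
    pieces = [frauds]
    for (sic, year), grp in frauds.groupby(["sic", "fyear"]):
        pool = df[(df["fraud"] == 0) & (df["sic"] == sic) & (df["fyear"] == year)]
        # draw as many controls as fraud cases in this stratum (if available)
        n = min(len(grp), len(pool))
        pieces.append(pool.sample(n=n, random_state=seed))
    return pd.concat(pieces, ignore_index=True)
```

The stratified draw keeps the industry and fiscal-year composition of the fraud sample intact, so differences found later between the two groups are less likely to be artefacts of sector or period mix.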
A variety of sampling methods can be employed when dealing with imbalanced datasets, individually or in combination, hence an extensive and interesting analysis could be done to select suitable samples of fraud and nonfraud cases. A more detailed discussion about this topic is addressed in Section 5.2.
3.4 Variables
A great deal of research includes subjective judgment and/or qualitative, non-public information in the models, which is only available to auditors and insiders of the sampled firms. Accounting data, on the other hand, is publicly available to external interested parties, hence whether it can be used to detect falsified reporting is an intriguing question (Persons, 1995).
The literature suggests that financial statement information is useful for accounting fraud detection. In particular, ratio analysis is very popular for this purpose, suggesting that a careful reading of financial ratios can reasonably expose symptoms of fraudulent behaviour. Ratios are calculated to quantify the relation between two financial items and to subsequently define acceptable legitimate values. Therefore, if fraudulent activity is taking place, the financial ratios associated with the manipulated accounts will deviate from normal behaviour and thereby exhibit signs of accounting fraud.
There has been an interesting debate about which features should be used to detect falsified reports, but still no agreement on which are best for this purpose. An in-depth analysis of the most severe accounting scandals that occurred in the U.S. in the last few decades (Schilit and Perler, 2010) shows that the most frequent tricks managers employ to hide debilitated businesses are commonly associated with the manipulation of earnings and cash flow items.
In this manner, and considering relevant and significant variables resulting from prior research work on the topic, this study identifies 20 financial statement ratios that measure the majority of aspects of a firm’s financial performance, including leverage, profitability, liquidity and efficiency.
Leverage
One of the most important aspects of a firm is leverage, since it represents the potential return of an investment given the debt structure of the company. Using debt to purchase assets can be advantageous as long as the return on those assets exceeds the borrowing cost, particularly because debt interest is tax deductible.
However, this practice comes with greater risks for investors, considering that sometimes firms are not able to pay their debt obligations. In consequence, companies having trouble paying their debts may be tempted to manipulate financial statements in order to meet debt covenants. Therefore, high levels of debt should increase the likelihood of accounting fraud, since it transfers the risk from the firm and its managers to shareholders.
This aspect is measured by the ratios TLTA (total liabilities to total assets), TLTE (total liabilities to total equity) and LTDTA (long-term debt to total assets).
Profitability
Profitability measures estimate the ability of a firm to generate earnings relative to its costs, hence the importance of keeping these metrics in line with market projections. As a consequence, executives may be willing to manipulate earnings-related financial statement items in order to cover up profitability problems when companies are not performing as expected.
To test whether firms with poorer financial condition are more likely to engage in fraudulent financial reporting, relevant ratios associated with income, expenses and retained earnings will be considered. These ratios are: NITA (net income to total assets), RETA (retained earnings to total assets) and EBITTA (earnings before interest and tax to total assets).
Liquidity
Liquidity refers to the ease with which an asset can be converted from an investment into cash. This concept is highly important for businesses and investors, since liquid assets reduce investment risk to some extent by ensuring the capacity of a firm to pay off debts as they come due. Consequently, liquidity problems may provide an incentive for managers to commit accounting fraud, hence the need to investigate financial ratios related to the liquid composition of assets, as is the case for working capital and current assets. This aspect is evaluated by the following ratios: WCTA (working capital to total assets), CATA (current assets to total assets), CACL (current assets to current liabilities) and CHNI (cash to net income).
Many investors have alternatively focused their attention on a company’s capability to generate cash from its actual business operations. This aspect, however, is often manipulated, since “companies can exert a great deal of discretion when presenting cash flows” (Schilit and Perler, 2010). Hence the importance of thoroughly analysing cash flow from operations and, in particular, evaluating its relationship with reported earnings. Therefore, the CFFONI ratio (cash flow from operations to net income) is also considered.
Efficiency
Financial efficiency refers to the capacity to produce as much as possible using as few resources as possible. Inefficiency usually involves higher costs and hence poorer firm performance, which may motivate managers to misstate financial statement items that allow subjective estimation and are therefore easier to manipulate. Such is the case for accounts receivable, accounts payable, inventory and cost of goods sold, so financial ratios related to these accounts are selected. This aspect is evaluated by ratios involving the aforementioned items, including RVSA (accounts receivable to total sales), RVTA (accounts receivable to total assets), IVTA (inventory to total assets), IVSA (inventory to total sales), IVCA (inventory to current assets), IVCOGS (inventory to cost of goods sold) and PYCOGS (accounts payable to cost of goods sold).
Efficiency is also linked to capital turnover, which represents the sales-generating power of a firm’s assets. In order to maintain the appearance of consistent growth, fraudulent managers may be tempted to manipulate sales-related financial items when facing competitive situations. Accordingly, two sales ratios are considered in order to identify possible fictitious growth trends: SATA (total sales to total assets) and SATE (total sales to total equity).
A summary of the aforementioned financial ratios is presented in Table 4, along with the category to which they belong and their respective calculations.
Category  Financial Ratio  Calculation
Leverage  TLTA  Total Liabilities / Total Assets
Leverage  TLTE  Total Liabilities / Total Equity
Leverage  LTDTA  Long-Term Debt / Total Assets
Profitability  NITA  Net Income / Total Assets
Profitability  RETA  Retained Earnings / Total Assets
Profitability  EBITTA  Earnings Before Interest and Tax / Total Assets
Liquidity  WCTA  Working Capital / Total Assets
Liquidity  CATA  Current Assets / Total Assets
Liquidity  CACL  Current Assets / Current Liabilities
Liquidity  CHNI  Cash / Net Income
Liquidity  CFFONI  Cash Flow From Operations / Net Income
Efficiency  RVSA  Accounts Receivable / Total Sales
Efficiency  RVTA  Accounts Receivable / Total Assets
Efficiency  IVSA  Inventory / Total Sales
Efficiency  IVTA  Inventory / Total Assets
Efficiency  IVCA  Inventory / Current Assets
Efficiency  IVCOGS  Inventory / Cost of Goods Sold
Efficiency  PYCOGS  Accounts Payable / Cost of Goods Sold
Efficiency  SATA  Total Sales / Total Assets
Efficiency  SATE  Total Sales / Total Equity
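A handful of the ratios in Table 4 can be computed directly from raw statement items, as sketched below; the input column names follow no particular data vendor's schema and are assumptions for illustration.

```python
# Illustrative computation of a few Table 4 ratios from raw
# financial statement items.
import pandas as pd

def add_ratios(df):
    out = df.copy()
    out["TLTA"] = out["total_liabilities"] / out["total_assets"]
    out["NITA"] = out["net_income"] / out["total_assets"]
    # working capital = current assets - current liabilities
    out["WCTA"] = (out["current_assets"] - out["current_liabilities"]) / out["total_assets"]
    out["RVSA"] = out["receivables"] / out["total_sales"]
    out["SATA"] = out["total_sales"] / out["total_assets"]
    return out
```

The remaining ratios follow the same pattern of dividing one statement item by another, per the calculations listed in the table.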
3.5 Variable Selection
Most analytical models implemented to detect fraudulent financial reporting start with numerous variables, of which only a minority actually contribute to classification power (Baesens et al., 2015). Thus, a question of interest is whether fewer explanatory variables can achieve accuracy rates similar to those accomplished with more predictors.
A simple yet very informative univariate analysis is performed in this study to evaluate potential differences between financial accounts related to fraudulent and genuine reports, and to select significant financial ratios that may suggest that accounting fraud has been, or is being, committed.
The so-called Mann-Whitney test is a non-parametric method commonly employed for this purpose due to its ease of use and availability in most advanced statistical software. In simple terms, non-parametric methods are statistical techniques that make no assumptions about the data distribution, hence they are also called distribution-free tests (Hollander et al., 2013). These methods are particularly useful when there are clear outliers or extreme observations in the data, as is the case in the studied database.
What is sought with this non-parametric hypothesis testing technique is to determine whether the distribution of the fraud data differs significantly from that of the non-fraud data. The Mann-Whitney test is therefore performed using the ranks of the data, that is, the position of each observation within the sample rather than its value per se. In light of this, it is easy to see that outliers have a minimal effect on the test, which makes it very robust to extreme values (Sheskin, 2003).
The following hypotheses are specified for the Mann-Whitney test:

H_0: the fraud and non-fraud samples come from the same distribution
H_1: the fraud and non-fraud samples come from different distributions    (1)

If the p-value is lower than the 0.05 significance level considered in this paper, then the null hypothesis H_0 can be rejected, so the evidence favours the alternative, H_1. Therefore, it can be said that there is a significant difference between non-fraudulent and fraudulent firms with regard to the financial ratio of interest.
It is worth mentioning that when the suggested tests were conducted considering all observations regardless of the SIC industry they belong to, almost all variables proved significant, which does not contribute substantially to the analysis. Moreover, assuming fraudsters behave the same across all sectors is fairly naive, so a more elaborate domain-specific examination is reasonably required.
Consequently, twenty Mann-Whitney tests are performed per industry, one per selected financial ratio. Table 5 lists industry-specific significant predictors and their relationship with the dependent variable Fraud. Interesting differences between sectors emerge from this analysis, as some ratios are significant or not depending on the industry the observations belong to.
On one hand, inventory and retained earnings are relevant predictors in the industries of transportation, communications, electric, gas and sanitary services; wholesale trade and retail trade; and services. This may be because inventory volumes and retained earnings are easily falsified within these sectors. On the other hand, manufacturing companies may be tempted to modify items related to liabilities as well as current assets, while finance, insurance and real estate firms manipulate liabilities and cash flow from operations figures.
3.6 Correlation Analysis
A very popular technique, often applied in data analytics, is correlation analysis. This method is used to evaluate possible relationships between numerical variables, which is particularly useful when working with accounting items that inevitably interact with each other due to the composition of a financial statement report.
The correlation coefficient quantifies the direction and strength of the implicit relationship between two variables of interest; it expresses only the association between them, not causality. Nonetheless, if correlation is found between two variables, it can serve as an indicator of a potential causal relation.
Kendall correlation coefficients are used to assess monotonic relationships, which may or may not be linear, based on rank similarity (Kendall, 1955). A monotonic relationship occurs when one variable increases as the other increases, or when one variable increases as the other decreases. The increase or decrease of the analysed variables may happen at the same rate, which is the case for linear relations, or in dissimilar proportions, which is the case for non-linear associations.
The Kendall correlation is a non-parametric correlation metric; that is, it makes no assumptions about the distribution of the data. It is a measure of rank correlation in the sense that it calculates the relative position of all observations within one variable (rank position) and then compares these with the ranks obtained within the second variable. If observations from both variables have similar ranks (concordant observations), a high positive correlation is obtained. Conversely, if the ranks are dissimilar (discordant observations), negative correlations are expected.
Kendall coefficients always range between −1 and +1, and they can be calculated using the Tau-A statistic defined as follows:

τ_A = (n_c − n_d) / [n(n − 1)/2]    (2)

where n_c is the number of concordant pairs of observations, n_d is the number of discordant pairs of observations, and n is the sample size.
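The Tau-A statistic can be computed directly from the concordant and discordant pair counts; the following pure-Python sketch is illustrative (in practice a library routine such as `scipy.stats.kendalltau`, which computes the tie-adjusted Tau-B variant, would be used).

```python
# Tau-A from pair counts: compare every pair of observations and
# classify it as concordant (same ordering in x and y) or discordant.
from itertools import combinations

def kendall_tau_a(x, y):
    n = len(x)
    nc = nd = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            nc += 1      # concordant pair
        elif s < 0:
            nd += 1      # discordant pair
    # n(n-1)/2 is the total number of pairs
    return (nc - nd) / (n * (n - 1) / 2)
```

For perfectly agreeing rankings every pair is concordant and τ_A = 1; for fully reversed rankings every pair is discordant and τ_A = −1.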
The resulting Kendall correlation matrix is presented below (Figure 1), summarising the correlation coefficients between all financial ratios. A colour legend is used to facilitate visualisation, where intense red boxes indicate positive relationships and intense blue boxes indicate negative associations.
Ratio*  Agriculture, Forestry and Fishing  Mining and Construction  Manufacturing  Transportation, Communications, Electric, Gas and Sanitary Service  Wholesale Trade and Retail Trade  Finance, Insurance and Real Estate  Services  Public Administration
TLTA  +    +  
TLTE  +  +  +  
LTDTA  +  +  +  
NITA    
RETA  +    +  +  +  +  +  
EBITTA  +  +  +  +  +  
WCTA  
CATA    +    
CACL            
CHNI    
CFFONI    
RVSA          
RVTA  +  
IVSA      +  
IVTA  +    +    +  
IVCA  +  +    +    +  
IVCOGS  +  +    +  
PYCOGS            
SATA  +          
SATE  +  
Notes:  
+ represents a positive association with the target variable, Fraud  
 represents a negative association with the target variable, Fraud  
* Twotailed test at the 0.05 significance level 
In addition, a summary of the most relevant correlations is shown in Table 6.
Financial Ratios     Correlation Coefficient
IVSA – IVCOGS        0.8693
IVTA – IVCA          0.8167
WCTA – CACL          0.7732
IVSA – IVTA          0.7485
NITA – RETA          0.7275
NITA – EBITTA        0.7275
TLTA – TLTE          0.7145
IVSA – IVCA          0.7039
IVCA – IVCOGS        0.6523
WCTA – CATA          0.5684
SATA – SATE          0.5504
It can be clearly seen that all inventory-related ratios are strongly positively correlated: IVSA, IVTA, IVCA and IVCOGS. Although this situation is completely expected, it entails an important issue when implementing regression models. If two or more variables are highly correlated then multicollinearity emerges, which means some predictors are redundant. As such, the estimated coefficients of the regression model may be inaccurate, and therefore not very reliable.
Furthermore, a strong positive association has also been found between CACL and WCTA. This is not surprising considering that WC is actually the difference between CA and CL, hence a direct relation between these three financial items results from mathematical construction. In addition, and as expected, strong positive correlations between ratios related to profitability have been exposed, both between NITA and RETA, and between NITA and EBITTA. A moderate positive relation between TLTA and TLTE can also be observed, which is completely expected since total assets are calculated as the sum of total liabilities and total equity. Finally, moderate positive correlations have also been found between the ratios WCTA and CATA, as well as between SATA and SATE. This latter association makes perfect sense as both ratios are related to sales figures.
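A multicollinearity screen of this kind can be sketched with pandas; the sample data and the 0.7 cut-off below are illustrative assumptions, not the study's figures:

```python
import pandas as pd

# Hypothetical ratio observations; column names follow the study's ratios.
df = pd.DataFrame({
    "NITA":   [0.05, 0.02, -0.01, 0.08, 0.03, -0.04],
    "RETA":   [0.30, 0.10, -0.05, 0.40, 0.20, -0.10],
    "EBITTA": [0.07, 0.03,  0.00, 0.10, 0.04, -0.02],
})

corr = df.corr(method="kendall")  # Kendall correlation matrix

# Flag pairs whose |tau| exceeds a screening threshold (0.7 here, an assumption).
threshold = 0.7
flagged = [
    (a, b, round(corr.loc[a, b], 4))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) >= threshold
]
```

Highly correlated pairs flagged this way are candidates for removal before fitting a regression model.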
Results obtained from the detailed financial ratio analysis, which includes nonparametric hypothesis testing and correlation analysis, support the selection of a smaller and meaningful subset of industry-specific explanatory variables. As such, Table 7 below lists the selected financial ratios by SIC industry that will be further utilised for modelling purposes.
Industry                                                            No. of Ratios   Selected Ratios
Agriculture, Forestry and Fishing                                   4               RETA, CATA, IVSA, PYCOGS
Mining and Construction                                             10              TLTA, TLTE, LTDTA, RETA, CACL, RVSA, IVTA, IVCOGS, PYCOGS, SATA
Manufacturing                                                       6               TLTA, TLTE, RETA, CATA, CACL, RVSA
Transportation, Communications, Electric, Gas and Sanitary Service  5               RETA, IVSA, IVTA, SATA, PYCOGS
Wholesale Trade and Retail Trade                                    3               RETA, CATA, IVSA
Finance, Insurance and Real Estate                                  8               TLTA, TLTE, LTDTA, RETA, CFFONI, IVCOGS, PYCOGS, SATA
Services                                                            6               RETA, CACL, IVSA, IVCOGS, PYCOGS, SATA
Public Administration                                               8               LTDTA, RETA, CATA, CACL, IVSA, IVTA, IVCOGS, SATA
3.7 Machine Learning Methods
The binary outcome model is considered to be the foundational scheme for detecting accounting fraud, since the aim is to classify future observations into only two possible values: fraud or non-fraud.
Accordingly, this study assesses the effectiveness of several machine learning models in the identification of fraudulent reporting. First, discriminant analysis and logistic regression are employed as a benchmark framework, followed by the implementation of more advanced but easy-to-interpret algorithms, including AdaBoost, decision trees, boosted trees and random forests.
The motivation for using boosting techniques and tree-based methods is supported in part by the poor detection accuracy of basic models and in part by the excessive complexity of more sophisticated approaches, such as neural networks and support vector machines.
In order to achieve a consistent notation throughout the Section, the following conventions are used for mathematical equations:

- A superscript $\top$ denotes the transpose of a matrix or vector.
- $y = 1$: fraudulent observation.
- $y = 0$: non-fraudulent observation.
- $P(y = 1 \mid x)$: posterior probability of fraud.
- $P(y = 0 \mid x)$: posterior probability of non-fraud.

It is worth noting that given there are only two possible outcomes, then it holds that:

$$P(y = 1 \mid x) + P(y = 0 \mid x) = 1 \qquad (3)$$
The models were employed as implemented in the Scikit-Learn library (Pedregosa et al., 2011), and a detailed explanation of each algorithm is given in what follows.
3.7.1 Discriminant Analysis (DA)
Discriminant analysis is a supervised method used in statistics to address classification problems and to make predictions of a categorical dependent variable. The main idea is to classify an observation into one of the predefined classes using a combination of one or more continuous independent variables, generating a discriminant function which best differentiates between the groups.
Subsequently, a decision boundary is generated by fitting class conditional densities to the data using Bayes' rule:

$$P(y = k \mid x) = \frac{P(x \mid y = k)\, P(y = k)}{P(x)} \qquad (4)$$

The class which maximises this conditional probability is selected. In the case of accounting fraud, only two classes are of interest; therefore:

$$P(y = 1 \mid x) = \frac{P(x \mid y = 1)\, P(y = 1)}{P(x)} \qquad (5)$$

$$P(y = 0 \mid x) = \frac{P(x \mid y = 0)\, P(y = 0)}{P(x)} \qquad (6)$$

The optimisation task is ultimately achieved using the training data to estimate the class priors $P(y = 1)$ and $P(y = 0)$, the class means $\mu_1$ and $\mu_0$, and the covariance matrices. In particular, class priors are estimated as the proportion of instances in each class, that is, the number of fraudulent (or non-fraudulent) observations divided by the total number of observations. Class means are estimated using the empirical sample class means. Similarly, covariance matrices are estimated using the empirical sample class covariance matrices.
In accordance with the aforementioned, the following assumptions are made:

- Predictors are all statistically independent.
- $x \mid y = k$ follows a multivariate Gaussian distribution, with a class-specific mean and covariance matrix.

Different assumptions associated with the covariance matrix will lead to different decision boundaries, one defined by a linear combination of the predictors and another one by a quadratic form.
In both cases, however, the predicted class will be determined using a classification threshold of 0.5. As such, if the estimated probability of fraud occurrence, $P(y = 1 \mid x)$, is equal to or higher than 0.5, then the observation will be classified as fraudulent. On the contrary, if $P(y = 1 \mid x)$ is lower than 0.5, or equivalently $P(y = 0 \mid x)$ is higher than 0.5, then the observation will be classified as non-fraudulent.
Linear Discriminant Analysis (LDA)
In the particular case of linear discriminant analysis, a multivariate normal distribution of the predictors is presumed, with a distinct mean for each class and a covariance matrix that is common to all classes. For accounting fraud detection, this means that both fraud and non-fraud classes share the same covariance matrix $\Sigma$. The advantage of a common covariance matrix is that it simplifies the problem by reducing the computational cost of estimating a large number of parameters when the number of predictors is relatively large. Taking this into consideration, then it is true that:

$$P(x \mid y = 1) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_1)^\top \Sigma^{-1}(x - \mu_1)\right) \qquad (7)$$

$$P(x \mid y = 0) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_0)^\top \Sigma^{-1}(x - \mu_0)\right) \qquad (8)$$

where $p$ is the number of predictors.
Quadratic Discriminant Analysis (QDA)
Furthermore, quadratic discriminant analysis provides a similar approach, yet now it is assumed that the covariance matrix is class-specific, i.e. $\Sigma_k$ for the $k$th class. Therefore:

$$P(x \mid y = 1) = \frac{1}{(2\pi)^{p/2} |\Sigma_1|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_1)^\top \Sigma_1^{-1}(x - \mu_1)\right) \qquad (9)$$

$$P(x \mid y = 0) = \frac{1}{(2\pi)^{p/2} |\Sigma_0|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_0)^\top \Sigma_0^{-1}(x - \mu_0)\right) \qquad (10)$$
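Both variants are available in Scikit-Learn; the following is a minimal sketch on synthetic fraud/non-fraud data (the sample sizes and the mean shift are arbitrary assumptions):

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(0)
# Synthetic ratios: non-fraud (y = 0) and fraud (y = 1) with shifted class means.
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 4)),
    rng.normal(loc=1.5, scale=1.0, size=(100, 4)),
])
y = np.array([0] * 100 + [1] * 100)

lda = LinearDiscriminantAnalysis().fit(X, y)     # shared covariance matrix
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # class-specific covariances

# Classify with the 0.5 threshold on the posterior P(y = 1 | x), as in the text.
pred = (lda.predict_proba(X)[:, 1] >= 0.5).astype(int)
```

With well-separated class means, both fits recover the labels with high accuracy; the QDA boundary only differs materially when the class covariances differ.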
3.7.2 Logistic Regression (LR)
Similar to discriminant analysis, logistic regression is commonly used for performing binary classification. This time the goal is to fit a regression model that estimates the accounting fraud likelihood by applying a logistic function that is linear in its argument:

$$\sigma(a) = \frac{1}{1 + e^{-a}} \qquad (11)$$
In order to obtain the best classification possible, the posterior probability of belonging to one of the two categories is calculated by maximising the likelihood function. Likewise, let $P(y = 1 \mid x)$ be the posterior probability of fraud (Bishop, 2006); then:

$$P(y = 1 \mid x) = \sigma(w^\top x) = \frac{1}{1 + e^{-w^\top x}} \qquad (12)$$
For a dataset $\{x_n, y_n\}$, where $y_n \in \{0, 1\}$ and $n = 1, \dots, N$, the likelihood of any specific outcome is given by:

$$p(\mathbf{y} \mid w) = \prod_{n=1}^{N} \hat{y}_n^{\,y_n} \left(1 - \hat{y}_n\right)^{1 - y_n} \qquad (13)$$

where $\mathbf{y} = (y_1, \dots, y_N)^\top$ and $\hat{y}_n = P(y = 1 \mid x_n) = \sigma(w^\top x_n)$.
As mentioned before, the maximum likelihood estimates of $w$ are obtained by minimising the cross-entropy error function, defined as the negative logarithm of the likelihood, and then taking its gradient with respect to $w$:

$$E(w) = -\ln p(\mathbf{y} \mid w) = -\sum_{n=1}^{N} \left[ y_n \ln \hat{y}_n + (1 - y_n) \ln(1 - \hat{y}_n) \right] \qquad (14)$$

$$\nabla E(w) = \sum_{n=1}^{N} \left( \hat{y}_n - y_n \right) x_n \qquad (15)$$
To finally decide whether an observation is classified as fraudulent or non-fraudulent, a threshold of 0.5 is considered. Consequently, if $P(y = 1 \mid x)$ is estimated to be equal to or greater than 0.5, then the observation will be classified as fraudulent. Otherwise, it will be classified as non-fraudulent.
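Equations (14) and (15) translate directly into a gradient-descent fit; this sketch uses synthetic data, and the learning rate and "true" coefficients are arbitrary assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])  # hypothetical true coefficients
y = (sigmoid(X @ w_true) > rng.uniform(size=200)).astype(float)

# Minimise the cross-entropy error via its gradient: sum_n (yhat_n - y_n) x_n.
w = np.zeros(3)
learning_rate = 0.5
for _ in range(1000):
    y_hat = sigmoid(X @ w)
    gradient = X.T @ (y_hat - y) / len(y)  # averaged for a stable step size
    w -= learning_rate * gradient

# Classify with the 0.5 threshold on the estimated posterior of fraud.
pred = (sigmoid(X @ w) >= 0.5).astype(float)
```

The recovered weights share the signs of `w_true`, and the 0.5 threshold turns the fitted posteriors into fraud/non-fraud labels.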
3.7.3 AdaBoost (AB)
Adaptive boosting, widely known as AdaBoost, is a machine learning technique used for classification and regression problems that combines multiple ’weak learner’ classifiers in order to produce a better boosted classifier. In this context, a weak learner is a function that is only weakly correlated with the response.
The basic idea is to weight observations by how easy or difficult they are to categorise, giving more importance to those that are harder to predict in order to learn from them and construct better subsequent classifiers. Accordingly, each individual classifier generates an output $y_m(x)$, $m = 1, \dots, M$, for every observation of the training set. These classifiers are trained in sequence on a weighted form of the data and combined using classifier coefficients $\alpha_m$. As mentioned before, misclassified instances will be given greater weight when used to train the subsequent classifier (Bishop, 2006).
The goal is to minimise a weighted error function in every iteration, taking into account the information and performance of previous classifiers. Ultimately, after the last iteration $M$, a final boosted classifier is constructed as an additive combination of all trained weak learner classifiers $y_m(x)$:

$$Y_M(x) = \operatorname{sign}\!\left(\sum_{m=1}^{M} \alpha_m\, y_m(x)\right) \qquad (16)$$
In this case, a classification threshold of 0.5 has been adopted on the estimated probability of fraud. As such, an observation will be classified as fraudulent when this probability is equal to or greater than 0.5, and classified as non-fraudulent when it is lower than 0.5.
The AdaBoost pseudo code is shown in Algorithm 1 (Scharth, M. (2017). Statistical Learning and Data Mining, Module 15 [PowerPoint presentation]. Discipline of Business Analytics, The University of Sydney Business School, QBUS6810).
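A sketch with Scikit-Learn's AdaBoostClassifier, whose default weak learners are decision stumps; the synthetic labelling rule below is an assumption for illustration:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)  # a nonlinear labelling rule

# Each boosting round reweights the data so misclassified cases count more.
ab = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

# Classify with the 0.5 threshold on the estimated posterior of fraud.
pred = (ab.predict_proba(X)[:, 1] >= 0.5).astype(int)
```

Even though each stump is only weakly predictive on its own, the weighted combination of 100 of them fits the nonlinear boundary well.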
3.7.4 Decision Trees (DT)
Decision trees are a nonparametric supervised learning method that classifies observations based on the values of one or more predictors. The advantage of decision trees lies in the straightforward extraction of if-then classification rules easily replicable by auditors and regulatory authorities. Also, no assumptions on the structure of the data are needed, which is very convenient in this case considering the asymmetrical distribution of some explanatory variables.
The structure of a DT consists of nodes, representing a test on a particular attribute, and branches, representing an outcome of the test. The idea is to divide observations into mutually exclusive classes in order to build the smallest set of rules that is consistent with the training data. To identify the attribute that best separates the sample, information gain and entropy reduction are used as estimation criteria.
There are several tree algorithms, such as ID3, C4.5, C5.0 and CART, among others. The method used in this study is Classification and Regression Trees (CART), characterised by the construction of binary trees based on the feature and threshold selection that provides the largest information gain at each node. This algorithm recursively partitions the space in order to minimise the error or impurity of each node, resulting in terminal nodes that represent homogeneous groups that differ substantially from the others.
Accordingly, let the information at node $m$ be $Q_m$; the binary partition of the data is then defined by a candidate split $\theta = (j, t_m)$, consisting of a feature $j$ and a threshold $t_m$, that divides the space into two subsets: $Q_{\text{left}}(\theta)$ and $Q_{\text{right}}(\theta)$.
The error at node $m$ is calculated using an impurity function $H$ evaluated on both partitions, which is then minimised in order to estimate the parameters:

$$G(Q_m, \theta) = \frac{n_{\text{left}}}{N_m} H\!\left(Q_{\text{left}}(\theta)\right) + \frac{n_{\text{right}}}{N_m} H\!\left(Q_{\text{right}}(\theta)\right) \qquad (17)$$

$$\theta^{*} = \operatorname*{arg\,min}_{\theta}\; G(Q_m, \theta) \qquad (18)$$

The impurity function implemented in this study corresponds to the Gini function:

$$H(Q_m) = \sum_{k} p_{mk}\,(1 - p_{mk}) \qquad (19)$$

where $p_{mk}$ is the proportion of class $k$ observations in node $m$.
It is worth noting that the partitions of the predictor space are based on a greedy algorithm called recursive binary splitting. The technique is greedy because the best split is made at each step of the tree-building process without taking into account the consequences further down the tree. Consequently, in some cases very complex trees are generated as a result of this approach. However, a couple of mechanisms can be used to avoid this situation, such as setting the minimum number of required observations at a leaf node or setting the maximum depth of the tree.
The tree size is therefore a tuning parameter determining the complexity of the model, and it should be selected adaptively from the data. As such, the maximum number of node splits in the current study is set to 5, the optimal value obtained by cross-validation.
Decision trees are markedly superior to the first two methods used as benchmark (logistic regression and discriminant analysis) considering how easy they are to explain, implement and visualise. Unfortunately, they show some drawbacks that should be mentioned, such as their inherent instability, which emerges when small changes in the data cause a large change in the structure of the estimated tree, as well as their lower predictive accuracy when compared to more advanced techniques.
Decision trees can be used as the basic component of powerful prediction methods. Therefore, two additional models that employ decision trees as their foundation will be introduced in what follows.
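The if-then rules mentioned above can be read directly off a fitted CART model; a sketch in which the feature names and the 0.6 cut-off rule are hypothetical:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
X = rng.uniform(size=(400, 2))
y = (X[:, 0] > 0.6).astype(int)  # hypothetical red-flag rule to recover

# CART with Gini impurity; max_depth = 5 as in the study.
tree = DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=0)
tree.fit(X, y)

# Readable if-then rules, in the form auditors would consume them.
rules = export_text(tree, feature_names=["IVTA", "RVSA"])
```

Printing `rules` shows the threshold the tree recovered on the first feature, mirroring the red-flag cut-offs reported later in the results.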
3.7.5 Boosted Trees (BT)
Similar to AdaBoost, the boosted trees method is an ensemble of weak learners, but now explicitly in the form of fixed-size decision trees as base classifiers.
Accordingly, an iterative process takes place in which a decision tree $h_m(x)$ is fitted at every iteration $m$ to improve the previous model $F_{m-1}(x)$, constructing a new model that adds this new information:

$$F_m(x) = F_{m-1}(x) + h_m(x) \qquad (20)$$

The main idea is to minimise an error function defined on the residual, that is, the difference between the observed response and the previous model $F_{m-1}(x)$, through a gradient boosting algorithm that is much like the gradient descent method used in the logistic regression approach.
In this case, a classification threshold of 0.5 has also been adopted. In this regard, an observation will be classified as fraudulent when the estimated probability of fraud is equal to or greater than 0.5, and classified as non-fraudulent when it is lower than 0.5.
As in the decision tree methodology, and for consistency, the maximum depth of the fitted trees is set to 5.
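Scikit-Learn's GradientBoostingClassifier implements this residual-fitting scheme; a sketch on synthetic data, where the interaction rule is an illustrative assumption:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # interaction a single split cannot capture

# Each depth-5 tree is fitted to the residual of the current ensemble.
bt = GradientBoostingClassifier(n_estimators=100, max_depth=5, random_state=0)
bt.fit(X, y)

# Classify with the 0.5 threshold on the estimated posterior of fraud.
pred = (bt.predict_proba(X)[:, 1] >= 0.5).astype(int)
```

Because each new tree corrects what the running ensemble still gets wrong, the boosted model captures the sign-interaction that no individual shallow tree could.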
3.7.6 Random Forests (RF)
A further enhancement of boosted trees is provided by the random forests approach, one of the most popular bagging techniques. Bootstrap aggregation, or bagging, averages many noisy but approximately unbiased models, which results in a reduction of the variance.
The idea is to fit a classification model to the training data to obtain the prediction $\hat{G}(x)$. Bagging averages this prediction over a collection of bootstrap samples (in statistics, bootstrapping is any test or metric that relies on random sampling with replacement). For each bootstrap sample $b$, $b = 1, \dots, B$, the selected classification model is fitted to obtain a prediction $\hat{G}_b(x)$. The bagged classifier selects the class (fraud or non-fraud) with the most “votes” from the $B$ classifiers:

$$\hat{G}(x) = \operatorname{majority\ vote}\left\{ \hat{G}_b(x) \right\}_{b=1}^{B} \qquad (21)$$
Decision trees are ideal candidates for bagging as they capture complex interaction structures in the data, which leads to relatively low bias but high variance. Consequently, classification trees are adopted next for bagging to further construct random forests.
Random forests improve over bagging by adding an adjustment that helps decorrelate the trees. In this context, instead of using all predictors, random forests only select a random subset of the features as split candidates at each step. The rationale behind this methodology is that restricting each split to a smaller, fixed number of predictors allows more variation in the structure of the fitted trees, which diminishes the correlation between them. Interestingly, this condition makes the average of the fitted trees less variable and therefore more reliable (James et al., 2013).
In building a random forest, a random subset of the independent variables out of all possible predictors is selected at each node, and the best split on the considered variables is then found. As a last step, all trees are averaged to obtain a final prediction.
The random forests pseudo code is shown in Algorithm 2 (Scharth, M. (2017). Statistical Learning and Data Mining, Module 13 [PowerPoint presentation]. Discipline of Business Analytics, The University of Sydney Business School, QBUS6810).
In order to be consistent with the previous methodologies, the maximum depth of the estimated trees is set to 5.
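A corresponding random forest sketch; the number of trees and the "sqrt" feature subset below are common defaults assumed here, not the study's settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 8))
y = (X[:, 0] + X[:, 2] > 0).astype(int)

# Each tree sees a bootstrap sample; each split tries only sqrt(p) predictors,
# which decorrelates the trees before their votes are aggregated.
rf = RandomForestClassifier(
    n_estimators=200, max_depth=5, max_features="sqrt", random_state=0)
rf.fit(X, y)
pred = rf.predict(X)  # majority vote across the 200 trees
```

Limiting `max_features` is the decorrelating adjustment described above: individual trees differ more, so their averaged vote is more stable.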
3.8 Models Assessment
An interesting issue related to fraudulent reporting is the difference in misclassification costs. As mentioned previously, most studies only seek to maximise overall accuracy without further analysing more suitable assessment measures.
The cost of misclassification differs when dealing with accounting fraud, since a false negative error, which occurs when a fraud observation is classified as non-fraud, is usually considered more expensive than a false positive error, which occurs when a non-fraud observation is classified as fraud. The reasoning behind this is that a misclassification of a non-fraud firm may cause a significant misuse of resources and time, but a misclassification of a fraudulent company may result in incorrect decisions and economic damage.
Accordingly, the overall accuracy rate is no longer sufficient to assess model performance. Other metrics, such as specificity, sensitivity and precision, are now taken into consideration, as well as G-Mean, F-Measure and AUC, which are calculated using combinations of these metrics. All mentioned indicators are based on the confusion matrix shown in Table 8.

                  Predicted Positives   Predicted Negatives
Real Positives    TP                    FN
Real Negatives    FP                    TN
Model assessment metrics are described next, including the formula used to calculate them when appropriate.

Overall Accuracy: it measures the ability to differentiate both fraudulent and genuine observations correctly. It is calculated as the proportion of true positive and true negative cases compared to the total number of observations.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (22)$$

Specificity: it evaluates the ability to determine non-fraudulent cases correctly. As such, it is computed as the proportion of true negative cases compared to all legitimate negative observations.

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (23)$$

Sensitivity: it assesses the capacity to classify fraudulent cases correctly. It is then calculated as the proportion of true positive cases compared to all legitimate positive observations.

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (24)$$

Precision: it measures the proportion of true positive cases compared to all predicted positive observations.

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (25)$$

G-Mean: it is the geometric mean of the sensitivity and specificity measures. As such, it takes into account the ability to correctly classify both fraudulent and non-fraudulent observations.

$$\text{G-Mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}} \qquad (26)$$

F-Measure: it is a metric that integrates both precision and sensitivity as their harmonic mean.

$$\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \qquad (27)$$
AUC: the Area Under the Curve (AUC) summarises the Receiver Operating Characteristic (ROC) curve, which evaluates the diagnostic ability of a binary classifier as its decision threshold is varied. As such, it assesses both true positive and false positive rates across different threshold settings. The AUC is the probability that the binary classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one. Accordingly, the AUC always ranges between 0 and 1, and the closer it is to one, the better the model, as it is correctly separating instances into the non-fraud and fraud groups. The AUC is computed using the trapezoidal rule, a commonly used technique for approximating a definite integral.
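Given the confusion-matrix counts, the formulas above reduce to a few lines of arithmetic; the counts below are hypothetical:

```python
import math

# Hypothetical confusion-matrix counts (TP, FN, FP, TN, as in Table 8).
TP, FN, FP, TN = 40, 10, 15, 35

accuracy    = (TP + TN) / (TP + TN + FP + FN)
specificity = TN / (TN + FP)
sensitivity = TP / (TP + FN)
precision   = TP / (TP + FP)
g_mean      = math.sqrt(sensitivity * specificity)
f_measure   = 2 * precision * sensitivity / (precision + sensitivity)
```

For these counts, accuracy is 0.75 while the G-Mean, at about 0.748, also reflects the balance between the 0.8 sensitivity and the 0.7 specificity.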
Regulatory authorities face critical limitations in terms of human resources, budget support and time constraints, thus a detailed investigation of all records and companies is infeasible or too expensive to undertake. Investigations should concentrate on those firms that are more likely to perpetrate accounting fraud. Therefore, it is preferable to focus on models that correctly classify fraudulent observations rather than non-fraudulent cases.
For this reason, G-Mean, F-Measure and AUC will be used as model assessment criteria, since they properly capture both false positive and false negative errors, and mitigate the misclassification issue inherent in detecting accounting fraud offences.
It is worth mentioning, before further presentation and discussion of results, that all classification accuracy metrics are calculated using out-of-sample data, that is, considering all the data points not belonging to the training sample. The considered model learns the parameters of a prediction function from a subset of the available data and is then tested in a different scenario in order to generalise the results. A standard practice in statistics is to hold out part of the dataset, commonly called the testing set, and use it later to assess the performance of the model.
Therefore, a stratified 10-fold cross-validation approach is implemented before running the proposed variable selection technique. As such, the studied dataset is divided into 10 folds, each one containing the same proportion of fraud and non-fraud cases. For each fold, the model is trained using the remaining nine folds and then validated using the held-out fold. Finally, model performance is calculated as the average performance across all testing folds (Kirkos et al., 2007).
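The stratified 10-fold procedure can be sketched as follows; the data are synthetic and LogisticRegression stands in for any of the models above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 4))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# Each held-out fold preserves the overall fraud/non-fraud proportion.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
aucs = []
for train_idx, test_idx in skf.split(X, y):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores = model.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], scores))

mean_auc = float(np.mean(aucs))  # performance averaged over the 10 test folds
```

Stratification matters with imbalanced classes: it guarantees every test fold contains both fraud and non-fraud cases, so fold-level metrics such as the AUC remain well defined.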
4 Results and Discussions
Table 9 reports the results of the proposed models by SIC industry.
Accuracy  Specificity  Sensitivity  Precision  GMean  FMeasure  AUC  
Agriculture, Forestry and Fishing (n = 22, p = 4)  
LDA  0.714  0.500  1.000  0.600  0.707  0.750  0.750 
QDA  0.857  0.750  1.000  0.750  0.866  0.857  0.875 
LR  0.714  0.500  1.000  0.600  0.707  0.750  0.750 
AB  0.857  0.750  1.000  0.750  0.866  0.857  0.875 
DT  0.571  0.750  0.333  0.500  0.500  0.400  0.542 
BT  0.571  0.750  0.333  0.500  0.500  0.400  0.542 
RF  0.714  0.500  1.000  0.600  0.707  0.750  0.750 
Mining and Construction (n = 104, p = 10)  
LDA  0.656  0.917  0.500  0.909  0.677  0.645  0.708 
QDA  0.812  0.917  0.750  0.938  0.829  0.833  0.833 
LR  0.688  0.917  0.550  0.917  0.710  0.687  0.733 
AB  0.625  0.667  0.600  0.750  0.632  0.667  0.633 
DT  0.812  0.833  0.800  0.889  0.816  0.842  0.817 
BT  0.750  0.833  0.700  0.875  0.764  0.778  0.767 
RF  0.781  1.000  0.650  1.000  0.806  0.788  0.825 
Manufacturing (n = 1,218, p = 6)  
LDA  0.530  0.460  0.594  0.548  0.522  0.570  0.527 
QDA  0.546  0.109  0.943  0.539  0.321  0.686  0.526 
LR  0.530  0.425  0.625  0.545  0.516  0.583  0.525 
AB  0.585  0.557  0.609  0.603  0.583  0.606  0.583 
DT  0.555  0.259  0.823  0.551  0.461  0.660  0.541 
BT  0.574  0.621  0.531  0.607  0.574  0.567  0.576 
RF  0.503  0.460  0.542  0.525  0.499  0.533  0.501 
Transportation, Communications, Electric, Gas and Sanitary Service  
(n = 212, p = 5)  
LDA  0.562  0.625  0.500  0.571  0.559  0.533  0.562 
QDA  0.562  0.969  0.156  0.833  0.389  0.263  0.562 
LR  0.578  0.594  0.562  0.581  0.578  0.571  0.578 
AB  0.609  0.719  0.500  0.640  0.599  0.561  0.609 
DT  0.531  0.625  0.438  0.538  0.523  0.483  0.531 
BT  0.672  0.719  0.625  0.690  0.670  0.656  0.672 
RF  0.656  0.562  0.750  0.632  0.650  0.686  0.656 
Wholesale and Retail Trade (n = 338, p = 3)  
LDA  0.559  0.521  0.593  0.582  0.556  0.587  0.557 
QDA  0.500  0.042  0.907  0.516  0.194  0.658  0.475 
LR  0.549  0.521  0.574  0.574  0.547  0.574  0.547 
AB  0.608  0.542  0.667  0.621  0.601  0.643  0.604 
DT  0.637  0.479  0.778  0.627  0.610  0.694  0.628 
BT  0.745  0.771  0.722  0.780  0.746  0.750  0.747 
RF  0.637  0.625  0.648  0.660  0.636  0.654  0.637 
Finance, Insurance and Real Estate (n = 472, p = 8)  
LDA  0.570  0.621  0.526  0.615  0.572  0.567  0.574 
QDA  0.592  0.273  0.868  0.579  0.487  0.695  0.571 
LR  0.570  0.591  0.553  0.609  0.571  0.579  0.572 
AB  0.648  0.621  0.671  0.671  0.646  0.671  0.646 
DT  0.627  0.561  0.684  0.642  0.619  0.662  0.622 
BT  0.655  0.682  0.632  0.696  0.656  0.662  0.657 
RF  0.627  0.652  0.605  0.667  0.628  0.634  0.628 
Services (n = 750, p = 6)  
LDA  0.587  0.468  0.698  0.583  0.572  0.635  0.583 
QDA  0.587  0.229  0.922  0.560  0.460  0.697  0.576 
LR  0.587  0.495  0.672  0.586  0.577  0.627  0.584 
AB  0.627  0.550  0.698  0.623  0.620  0.659  0.624 
DT  0.631  0.615  0.647  0.641  0.630  0.644  0.631 
BT  0.631  0.550  0.707  0.626  0.624  0.664  0.629 
RF  0.618  0.477  0.750  0.604  0.598  0.669  0.614 
Public Administration (n = 72, p = 8)  
LDA  0.636  0.400  0.833  0.625  0.577  0.714  0.617 
QDA  0.818  0.900  0.750  0.900  0.822  0.818  0.825 
LR  0.727  0.600  0.833  0.714  0.707  0.769  0.717 
AB  0.727  0.700  0.750  0.750  0.725  0.750  0.725 
DT  0.773  0.700  0.833  0.769  0.764  0.800  0.767 
BT  0.773  0.700  0.833  0.769  0.764  0.800  0.767 
RF  0.864  0.900  0.833  0.909  0.866  0.870  0.867 
It can be seen that results differ across industries and machine learning techniques. The best performance of the proposed models is obtained for firms belonging to the Agriculture, Forestry and Fishing, Mining and Construction, and Public Administration industries. Moderate predictive accuracy is achieved in the Wholesale and Retail Trade, Transportation and Communications, and Finance industries. Inferior accuracy can be observed for the Manufacturing and Services industries.
Agriculture, Forestry and Fishing
In particular, good classification performance is achieved in the Agriculture, Forestry and Fishing industry, probably due to the relatively small size of the sample at issue. Four financial ratios have been considered for modelling purposes: RETA, CATA, IVSA and PYCOGS. The results indicate that quadratic discriminant analysis and AdaBoost are the most accurate models, as both achieved an AUC of 0.875. In both cases, 75% of non-fraud cases are correctly identified, as well as 100% of fraud cases.
Again, special care must be taken when generalising these results, as a fairly small sample is being considered. It is worth mentioning that no relevant patterns were found within this industry when constructing a decision tree. Because of the small amount of available data, it was infeasible to find significant red flags in this domain.
Mining and Construction
Good results can also be observed for the Mining and Construction industry, where ten financial ratios were considered as predictors and a relatively large sample was available. In this case, superior performance was achieved by QDA and random forests, mainly because of their remarkable accuracy when predicting negative cases, that is, their high specificity. Nevertheless, good specificity and sensitivity rates are attained when using decision trees, which correctly classify 83.3% of non-fraud cases and 80% of fraud cases.
More interesting results can be seen when using all observations to construct a decision tree model. As depicted in Figure 2, two main red flags, associated with the items of inventory and accounts receivable, can be used to detect fraudulent companies in the Mining and Construction industry. The first one is IVTA, as the evidence suggests that fraud is more likely to be present when this ratio is greater than 0.0118, which indicates that fraudulent firms tend to exaggerate inventory levels in this particular industry. Hence, fraud alarms should be activated when inventories represent more than 1.2% of total assets in mining and construction firms.
The second indicator that can be used to expose falsified reports is RVSA. When inventory levels compared to assets (IVTA) are within the non-fraudulent range (i.e. lower than 0.0118), auditors should check whether RVSA exceeds 0.234: the probability of accounting fraud is greater when receivables represent more than 23.4% of total sales.
Manufacturing
Inferior performance of all predictive models is achieved when dealing with manufacturing firms. Relatively better results are obtained by boosting techniques. In particular, AdaBoost correctly classifies 55.7% of non-fraud cases and 60.9% of fraud cases, which is only a small improvement over random guessing. This is somewhat surprising, as the sample considered is relatively large and the predictors have shown significant differences between the groups.
The reason for the poor predictive performance may be the complexity of the fraud schemes perpetrated within this industry. Although the models show poor performance in general, interesting patterns emerge when implementing a decision tree method using all observations, as can be seen in Figure 3. Falsified reports in this case usually involve the manipulation of three financial items: retained earnings, current assets and total liabilities.
Moreover, the decision tree results indicate that auditors should be more sceptical if RETA is higher than 0.292, CATA lower than 0.347 and TLTE higher than 1.132, since these three red flags together are often seen when fraud is being committed in manufacturing firms. In other words, a high probability of accounting fraud will be present when (i) retained earnings represent more than 29.2% of total assets; (ii) the proportion of current assets in relation to total assets is lower than 0.347; and (iii) total liabilities exceed shareholders' equity by 13.2% or more.
Transportation, Communication, Electric, Gas and Sanitary Service
Moderate accuracy is achieved by the proposed methods in this case, with boosted trees and random forests showing the best results. Random forests perform well when predicting fraud cases (75%), whereas boosted trees perform better when predicting non-fraud cases (71.9%).
More relevant results can be observed in Figure 4. The most significant predictors of accounting fraud committed in this industry are IVSA and PYCOGS. As such, fraudulent reporting is more likely to occur as a result of misstatement of inventory levels and/or accounts payable figures.
As for the case of inventory manipulation, the warning sign is triggered when IVSA is less than or equal to zero. From basic accounting, it is known that figures of inventory and total sales cannot be negative, as that would lack economic meaning. The only possibility in this case, then, is that inventories are zero. Consequently, auditors should be cautious when null inventories appear in financial statements, as they may be a sign of accounting fraud.
On the other hand, if inventory levels are not null, then the fraud alarm should be activated when accounts payable represent more than 28.2% of cost of goods sold, as this may indicate fraudulent activities of firms belonging to the industry at issue.
Wholesale Trade and Retail Trade
Moderate accuracy is achieved in the case of trading firms. It can be observed that boosted trees show superior performance when detecting both fraud and non-fraud cases. Decision trees, on the other hand, achieve strong results when predicting fraud instances, but poor performance when dealing with non-fraud cases.
Furthermore, the decision tree results suggest that fraudulent trading companies mainly manipulate two financial items simultaneously: retained earnings and inventories. Two clear patterns can be identified when accounting fraud is being committed, as shown in Figure 5.
The first pattern is found when the RETA ratio is between 0 and 0.186 and the IVSA ratio is higher than 0.189. That is, moderate positive values of retained earnings occurring together with large values of inventory represent a clear sign of falsified reports.
The second pattern of fraudulent activity is identified when the RETA ratio is higher than 0.186 and, at the same time, the IVSA ratio is higher than 0.335. That is, exaggerated valuation of earnings relative to assets, and of inventory relative to sales, is considered irregular in this industry, hence more attention should be paid when facing this situation.
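The two patterns can be encoded directly from the decision-tree thresholds reported above; the function name is illustrative:

```python
def trade_red_flag(reta: float, ivsa: float) -> bool:
    """Red-flag check for wholesale/retail trade firms (illustrative).

    reta: retained earnings / total assets
    ivsa: inventory / total sales
    Encodes the two fraud patterns read off the decision tree.
    """
    pattern_1 = 0.0 < reta <= 0.186 and ivsa > 0.189
    pattern_2 = reta > 0.186 and ivsa > 0.335
    return pattern_1 or pattern_2
```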
Finance, Insurance and Real Estate
Moderate prediction accuracy is obtained again, now in the industry of Finance, Insurance and Real Estate. In general, the more advanced models achieve slightly better performance, with boosting techniques performing best. In particular, boosted trees correctly classify 68.2% of non-fraud cases and 63.2% of fraud cases.
Moreover, as can be seen in Figure 6, fraudulent reporting within financial firms is more likely to occur as a result of manipulation of accounts payable and debt-specific figures. On the one hand, if accounts payable are less than or equal to zero while long-term debt is greater than zero, then more attention must be paid, as this may be a sign of accounting fraud.
On the other hand, if the ratio of accounts payable to cost of goods sold is higher than 22.82 and, simultaneously, total liabilities are more than 19.05 times shareholders' equity, then a warning should be raised, as such irregular patterns suggest fraudulent activity.
Services
Poor performance is achieved by the machine learning methods when detecting accounting fraud within the service industry. Relatively better performance is attained by tree-based methods, with decision trees showing the most balanced performance between correct positive and negative classifications, that is, between sensitivity and specificity.
In addition, as depicted in Figure 7, a fairly straightforward trick is usually performed by fraudulent companies in the service industry: understatement of sales figures together with artificial exaggeration of inventory. More scrutiny should be applied when total sales represent less than 25.6% of total assets, as well as when the ratio of inventory to cost of goods sold is higher than 0.032, as these may indicate that accounting fraud is being conducted.
Public Administration
Exceptional results are obtained in the industry of public administration. Particularly superior performance is accomplished by random forests, which correctly classify 90% of non-fraudulent cases and 83.8% of fraudulent cases.
Accounting fraud in the public administration industry is highly related to large values of inventory compared to sales, as can be seen in Figure 8. Special attention should be paid when inventories represent 6.3% or more of total sales, as this is a clear sign of manipulated financial reports.
5 Conclusions
5.1 Conclusions
This study aims to identify signs of accounting fraud occurrence that can be used, first, to identify companies that are more likely to be manipulating financial statement reports and, second, to assist the task of examination within the riskier firms by evaluating relevant financial red flags, so as to efficiently recognise irregular accounting malpractices.
To achieve this, a thorough forensic data analytics approach is proposed that includes all pertinent steps of a data-driven methodology. First, data collection and preparation are required to assemble pertinent information on fraud offences and financial statements. Then, an in-depth financial ratio analysis is performed in order to analyse the collected data and preserve only meaningful variables. Finally, statistical modelling of fraudulent and non-fraudulent instances is performed by implementing several machine learning methods, followed by the extraction of distinctive fraud-risk indicators for each economic sector.
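The three-stage methodology can be sketched with scikit-learn, the toolkit cited in the references; the data, feature set and hyperparameters below are illustrative stand-ins, not the study's actual configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stage 1 (data collection): assume financial ratios such as IVSA, RETA
# and PYCOGS have been gathered into a feature matrix X with labels y
# (1 = fraud); synthetic data stands in for the real sample here.
X = rng.normal(size=(400, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 1).astype(int)

# Stage 2 (ratio analysis): keep only meaningful variables; all three
# columns are retained here for brevity.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Stage 3 (statistical modelling): fit a tree-based classifier and read
# off per-ratio importances as candidate fraud-risk indicators.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
print(clf.feature_importances_)
```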
This study contributes to the improvement of accounting fraud detection in several ways: the collection of a comprehensive sample of fraud and non-fraud firms across all financial industries; an extensive analysis of financial information and of significant differences between genuine and fraudulent reporting; the selection of relevant predictors of accounting fraud; contingent analytical modelling to better differentiate between non-fraud and fraud cases; and the identification of industry-specific indicators of falsified records.
The results of the current research suggest there is great potential in detecting falsified accounting records through statistical modelling and analysis of publicly available accounting information. The basic models used as benchmarks, discriminant analysis and logistic regression, showed good performance, and the more advanced methods, including AdaBoost, decision trees, boosted trees and random forests, performed better still. The results support the usefulness of machine learning models, as they appropriately meet the criteria of accuracy, interpretability and cost-efficiency required for a successful detection system.
The proposed methodology can easily be used by public auditors and regulatory agencies to assess the likelihood of accounting fraud, and can be adopted in combination with the experience and knowledge of experts to enable better examination of accounting reports. In addition, the proposed methodological framework could be of assistance to many other interested parties, including investors, creditors, and financial and economic analysts, amongst others.
5.2 Limitations and Future Work
The collected sample of accounting fraud offences is considered to be only a fragment of the population of companies issuing fraudulent financial statements, as there is no guarantee that supposedly non-fraudulent firms are in fact legitimate observations until proven otherwise. Also, non-public companies are excluded from this study, as the SEC only has jurisdiction over publicly traded companies.
It is worth noting that accounting fraud is very versatile and, as such, will always evolve in terms of deceptive tricks. Managers will adapt their fraudulent schemes in order to commit fraud successfully; hence the results obtained in this study are exclusively a consequence of the investigation of the collected data, and different conclusions may be reached when considering an alternative source of information.
Lastly, model performance is not ideal in some scenarios, mainly due to sample size and omitted predictive variables. The inclusion of additional information is strongly suggested to help better understand the accounting fraud phenomenon; this may consist of qualitative variables, including corporate governance information and insider trading data, as well as time-evolving features and industry-trending benchmarks. It would not be surprising to discover interesting temporal patterns in the stock prices or asset returns of fraudulent corporations, or to find extraordinary economic performance of dishonest companies compared to the industry average.
Further work can be done on classification threshold selection. When modelling the accounting fraud phenomenon, a specific classification threshold was used to determine fraud and non-fraud categories in several machine learning techniques. Evaluation of different thresholds would be of much interest, as it may improve classification accuracy in a cost-sensitive environment such as the one at issue.
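A minimal sketch of such a threshold evaluation, assuming predicted fraud probabilities are available and using illustrative misclassification costs (a missed fraud taken as ten times the cost of a false alarm; neither figure comes from the study):

```python
import numpy as np

def expected_cost(y_true, fraud_prob, threshold,
                  cost_fn=10.0, cost_fp=1.0):
    """Average misclassification cost at a given threshold.

    A false negative (missed fraud) is assumed 10x as costly as a
    false positive (false alarm); both costs are illustrative.
    """
    y_pred = (fraud_prob >= threshold).astype(int)
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return (cost_fn * fn + cost_fp * fp) / len(y_true)

# Toy labels and predicted fraud probabilities.
y_true = np.array([0, 0, 0, 0, 1, 1])
prob = np.array([0.1, 0.2, 0.4, 0.6, 0.55, 0.9])

# Sweep candidate thresholds and keep the cheapest one.
thresholds = np.linspace(0.05, 0.95, 19)
costs = [expected_cost(y_true, prob, t) for t in thresholds]
best = thresholds[int(np.argmin(costs))]  # 0.45 in this toy example
```

In a cost-sensitive setting like fraud detection, the cheapest threshold is typically well below the conventional 0.5, since missed frauds dominate the expected cost.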
In addition, different methodologies are suggested to tackle the class imbalance challenge. The method adopted in the present study was based on random undersampling, but other techniques may improve this part of the process, such as random oversampling, bootstrap models, cost-modifying methods and algorithm-level approaches, to name a few.
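The random undersampling step can be sketched as follows; the helper name and the choice to balance classes exactly to parity are illustrative assumptions:

```python
import numpy as np

def undersample(X, y, seed=0):
    """Randomly discard majority-class (non-fraud) rows until the
    classes are balanced. Assumes y uses 1 = fraud, 0 = non-fraud."""
    rng = np.random.default_rng(seed)
    fraud_idx = np.flatnonzero(y == 1)
    nonfraud_idx = np.flatnonzero(y == 0)
    keep = rng.choice(nonfraud_idx, size=len(fraud_idx), replace=False)
    idx = np.concatenate([fraud_idx, keep])
    rng.shuffle(idx)
    return X[idx], y[idx]

X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])  # 7 non-fraud, 3 fraud
Xb, yb = undersample(X, y)
# yb now holds 3 fraud and 3 non-fraud labels
```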
More advanced machine learning techniques are also recommended. It would be very interesting to implement alternative and more advanced methods, such as support vector machines, neural networks and Bayesian models, as they may be helpful to correctly identify fraudulent firms.
Finally, it is suggested to replicate the proposed methodology in specific economic domains, such as the pharmaceutical industry, health care industry and financial industry, amongst others. The more specialised the industry, the more interesting patterns are likely to be found and, therefore, to be explored and analysed.
Acknowledgements
The authors would like to thank the Securities Class Action Clearinghouse, Stanford Law School, for providing access to the collection of fraud cases considered in this study.
References
 Abe [2005] S. Abe. Support Vector Machines for Pattern Classification. Springer-Verlag, New York, NY, 2005.

 Baesens et al. [2015] B. Baesens, V. V. Vlasselaer, and W. Verbeke. Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection. John Wiley and Sons, Inc., 2015.
 Beasley [1996] M. Beasley. An Empirical Analysis of the Relation between the Board of Director Composition and Financial Statement Fraud. The Accounting Review, 1996.
 Bell and Carcello [2000] T. Bell and J. Carcello. A Decision Aid for Assessing the Likelihood of Fraudulent Financial Reporting. Auditing: A Journal of Practice & Theory, 2000.
 Bishop [2006] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
 Bolton and Hand [2002] R. Bolton and D. Hand. Statistical Fraud Detection: A Review. Statistical Science, 2002.
 Cerullo and Cerullo [1999] M. Cerullo and V. Cerullo. Using Neural Networks to Predict Financial Reporting Fraud. Computer Fraud and Security, 1999.
 Chawla et al. [2004] N. V. Chawla, N. Japkowicz, and A. Kotcz. Editorial: Special Issue on Learning from Imbalanced Data Sets. ACM SIGKDD Explorations Newsletter, 2004.
 Choi and Green [1997] J. Choi and B. Green. Assessing the Risk of Management Fraud Through Neural Network Technology. Auditing, 1997.
 Fanning and Cogger [1998] K. Fanning and K. Cogger. Neural Network Detection of Management Fraud Using Published Financial Data. International Journal of Intelligent Systems in Accounting, Finance & Management, 1998.
 Feroz et al. [2000] E. H. Feroz, T. M. Kwon, V. Pastena, and K. Park. The Efficacy of Red Flags in Predicting the SEC’s Targets: An Artificial Neural Networks Approach. International Journal of Intelligent Systems in Accounting, Finance & Management, 2000.
 Gupta and Gill [2012] R. Gupta and N. S. Gill. Prevention and Detection of Financial Statement Fraud: An Implementation of Data Mining Framework. International Journal of Advanced Computer Science and Applications, 2012.
 Hansen et al. [1996] J. V. Hansen, J. B. McDonald, J. W. F. Messier, and T. B. Bell. A Generalized Qualitative-Response Model and the Analysis of Management Fraud. Management Science, 1996.
 Hollander et al. [2013] M. Hollander, D. A. Wolfe, and E. Chicken. Nonparametric Statistical Methods: Third Edition. Wiley Series in Probability and Statistics. Wiley, 2013.

 Hoogs et al. [2007] B. Hoogs, T. Kiehl, C. Lacomb, and D. Senturk. A Genetic Algorithm Approach to Detecting Temporal Patterns Indicative of Financial Statement Fraud. Intelligent Systems in Accounting, Finance and Management, 2007.
 James et al. [2013] G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning. Springer-Verlag New York, 2013.
 Kaminski et al. [2004] K. A. Kaminski, T. S. Wetzel, and L. Guan. Can Financial Ratios Detect Fraudulent Financial Reporting? Managerial Auditing Journal, 2004.
 Kendall [1955] M. G. Kendall. Rank Correlation Methods. Hafner Publishing Co, 1955.
 Kirkos et al. [2007] E. Kirkos, C. Spathis, and Y. Manolopoulos. Data Mining Techniques for the Detection of Fraudulent Financial Statements. Expert Systems with Applications, 2007.
 Kotsiantis et al. [2006] S. Kotsiantis, E. Koumanakos, D. Tzelepis, and V. Tampakas. Forecasting Fraudulent Financial Statements Using Data Mining. International Journal of Computational Intelligence, 2006.

 Kwon and Feroz [1996] T. M. Kwon and E. Feroz. A Multilayered Perceptron Approach to Prediction of the SEC’s Investigation Targets. IEEE Transactions on Neural Networks, 1996.
 Mokhiber and Weissman [2005] R. Mokhiber and R. Weissman. On The Rampage: Corporate Power and the Destruction of Democracy. Corporate Focus Series. Common Courage Press, 2005.
 Ngai et al. [2011] E. W. Ngai, Y. Hu, Y. Wong, Y. Chen, and X. Sun. The Application of Data Mining Techniques in Financial Fraud Detection: A Classification Framework and an Academic Review of Literature. Decision Support Systems, 2011.
 Pai et al. [2011] P. F. Pai, M. F. Hsu, and M. C. Wang. A Support Vector Machine-Based Model for Detecting Top Management Fraud. Knowledge-Based Systems, 2011.
 Pedregosa et al. [2011] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 2011.
 Persons [1995] O. S. Persons. Using Financial Statement Data to Identify Factors Associated with Fraudulent Financial Reporting. Journal of Applied Business Research, 1995.

 Ravisankar et al. [2011] P. Ravisankar, V. Ravi, G. R. Rao, and I. Bose. Detection of Financial Statement Fraud and Feature Selection Using Data Mining Techniques. Decision Support Systems, 2011.
 Schilit and Perler [2010] H. M. Schilit and J. Perler. Financial Shenanigans. McGraw-Hill, 2010.
 Sheskin [2003] D. J. Sheskin. Handbook of Parametric and Nonparametric Statistical Procedures: Third Edition. CRC Press, 2003.
 Song et al. [2014] X. P. Song, Z. H. Hu, J. G. Du, and Z. H. Sheng. Application of Machine Learning Methods to Risk Assessment of Financial Statement Fraud: Evidence from China. Journal of Forecasting, 2014.
 Spathis et al. [2002] C. Spathis, M. Doumpos, and C. Zopounidis. Detecting Falsified Financial Statements: A Comparative Study Using Multicriteria Analysis and Multivariate Statistical Techniques. The European Accounting Review, 2002.
 Summers and Sweeney [1998] S. Summers and J. Sweeney. Fraudulently Misstated Financial Statements and Insider Trading: An Empirical Analysis. The Accounting Review, 1998.
 Swartz [2003] M. Swartz. Power Failure: The Inside Story of the Collapse of Enron. Doubleday, 2003.
 Tu [1996] J. Tu. Advantages and Disadvantages of Using Artificial Neural Networks Versus Logistic Regression for Predicting Medical Outcomes. Journal of Clinical Epidemiology, 1996.
 Van Vlasselaer et al. [2015] V. Van Vlasselaer, T. Eliassi-Rad, L. Akoglu, M. Snoeck, and B. Baesens. GOTCHA! Network-based Fraud Detection for Social Security Fraud. Management Science, 2015.