Fighting Accounting Fraud Through Forensic Data Analytics

05/08/2018 ∙ by Maria Jofre, et al. ∙ The University of Sydney

Accounting fraud is a global concern and a significant threat to the stability of the financial system, because it diminishes market confidence and the trust of regulatory authorities. Several tricks can be used to commit accounting fraud, hence the need for non-static regulatory interventions that take into account different fraudulent patterns. Accordingly, this study aims to improve the detection of accounting fraud by implementing several machine learning methods to better differentiate between fraud and non-fraud companies, and to further assist the examination of riskier firms by evaluating relevant financial indicators. Out-of-sample results suggest there is great potential in detecting falsified financial statements through statistical modelling and analysis of publicly available accounting information. The proposed methodology can assist public auditors and regulatory agencies, as it facilitates auditing processes and supports more targeted and effective examinations of accounting reports.

1 Introduction

In the last few decades, accounting fraud has drawn a great deal of attention amongst researchers and practitioners, since it is becoming increasingly frequent and diverse. Accounting fraud is one of the most harmful financial crimes as it often results in massive corporate collapses, commonly silenced by powerful high-status executives and managers (Mokhiber and Weissman, 2005). Given their hidden dynamic characteristics, ‘book cooking’ accounting practices are particularly hard to detect, hence the need for more sophisticated tools to assist the exposure of complex fraudulent schemes and the identification of warning signs of manipulated financial reports.

The catastrophic consequences of accounting fraud expose how vulnerable and unprotected the community is with regard to this matter, since most damage is inflicted on investors, employees and government. Several accounting scandals reflect this reality, the infamous Enron case being one of the most controversial. The giant energy company was engaged in a massive fraudulent scheme that culminated abruptly towards the end of 2001 with its impressive collapse and subsequent bankruptcy. Consequently, Enron’s investors and stakeholders lost nearly $74 billion, and 4,500 employees lost their jobs and pensions without proper notice (Swartz, 2003). Even though the general opinion describes Enron’s failure as unpredictable, Schilit and Perler (2010) affirm that the disaster could have been avoided if the public documents of the years preceding the debacle had been carefully examined. The impressive revenue growth from $9.2 billion in 1995 to $100.8 billion in 2000 should have warned the public, especially when considering that profits did not increase at such a spectacular rate. They conclude that the use of relevant indicators could help to alert the public before a disaster occurs.

In the framework of this study, accounting fraud is defined as the calculated misrepresentation of the financial statement information that is publicly disclosed by companies. The intention is to mislead stakeholders regarding the firm’s true financial position by overstating its expectations on assets or understating its exposure to liabilities, thereby artificially inflating earnings as well as return on equity. Accounting fraud may take the form of either direct manipulation of financial items or creative methods of accounting (Schilit and Perler, 2010). Several synonyms of accounting fraud exist in the literature, including financial statement fraud, corporate fraud and management fraud.

Perpetrators of accounting fraud can be motivated by personal benefit (e.g. maximisation of compensation packages), or by explicit or implied contractual obligations, such as debt covenants and the need to meet market projections and expected economic growth. The most harm is inflicted on the long-run reputation of the organisation itself, the value held by investors and the public’s trust in the capital market (Ngai et al., 2011). Other victims often include suppliers, partners, customers, regulatory institutions, enforcement agencies, taxation authorities, the stock exchange, creditors and financial analysts (Pai et al., 2011).

Standard auditing procedures are often insufficient to identify fraudulent accounting reports since most managers recognise the limitations of audits, hence the need for additional dynamic and comprehensive analytical methods to detect accounting fraud accurately and at an early stage (Kaminski et al., 2004). Accordingly, the present study aims to improve the detection rate of accounting fraud offences through the implementation of several machine learning methods and the assessment of industry-specific risk indicators, in order to assist the design of an innovative, flexible and responsive corporate regulation tool.

In order to achieve the proposed objective, a thorough forensic data analytic approach is implemented that includes all pertinent steps of a data-driven methodology. The study contributes to the improvement of accounting fraud detection in several ways, including the collection of a comprehensive sample of fraud and non-fraud firms concerning all financial industries, an extensive analysis of financial information and significant differences between genuine and fraudulent reporting, selection of relevant predictors of accounting fraud, contingent analytical modelling of the phenomenon to better recognise fraudulent cases, and identification of financial red-flags as indicators of falsified records.

The rest of the article is organised as follows. Section 2 critically reviews the accounting fraud detection literature to summarise commonly used techniques and the results achieved in previous studies. Section 3 presents a detailed description of the proposed methodology including the studied dataset, sample selection process, explanatory variables examined, variable selection process and machine learning models considered. Section 4 presents the empirical results of the proposed algorithms and discusses key findings. Finally, Section 5 concludes this paper and gives directions for future research.

2 Accounting Fraud Detection Literature

Part of the fraudulent financial reporting literature has focused primarily on the examination of qualitative characteristics related to the board of directors and principal executives, including information on corporate governance structure (Beasley, 1996; Hansen et al., 1996; Bell and Carcello, 2000) and insider trading data (Summers and Sweeney, 1998). Studies using this kind of information show promising results. However, getting access to such data is very difficult and sometimes even prohibited for most individuals.

On the other hand, studies using publicly available financial statement information are less common and usually rely on small samples. Generally, the selection of fraud cases is limited to certain conditions, and the selected cases are then manually matched with non-fraud observations on the basis of business fundamentals, such as industry, size, maturity and period. There is therefore a gap in this area of the literature, where the selection of a more representative sample has the potential to be explored and expanded.

With regard to the employed techniques, discriminant analysis and logistic regression are by far the most popular. Such algorithms are commonly considered as a benchmark framework due to their simplicity and low computational cost, and because they have been proven to efficiently detect falsified accounting reporting in relatively small samples (Fanning and Cogger, 1998; Spathis et al., 2002; Kaminski et al., 2004; Pai et al., 2011). Better results have been achieved by the implementation of decision trees, a popular machine learning method often used to predict fraudulent accounting records mainly due to their fewer data preparation requirements and intuitive interpretation (Kotsiantis et al., 2006; Kirkos et al., 2007; Pai et al., 2011; Gupta and Gill, 2012; Song et al., 2014).

Alternative and more advanced approaches have also been adopted in order to detect accounting fraud. Neural networks are in high demand for accounting fraud detection as they have shown promising results when predicting fraudulent reporting practices (Kwon and Feroz, 1996; Choi and Green, 1997; Fanning and Cogger, 1998; Feroz et al., 2000; Ravisankar et al., 2011). A similar situation is experienced when considering more complex settings, such as support vector machines (Kotsiantis et al., 2006; Ravisankar et al., 2011; Pai et al., 2011; Song et al., 2014), Bayesian networks (Kirkos et al., 2007), genetic programming (Hoogs et al., 2007) and hybrid methods (Kotsiantis et al., 2006; Song et al., 2014). Nevertheless, the performance achieved by the aforementioned methodologies is offset by the considerable drawbacks that these methods entail, including high computational costs, proneness to overfitting and results that are difficult to interpret (Tu, 1996; Abe, 2005).

A list of prior studies using machine learning techniques for accounting fraud detection is summarised in Table 1. Additional methodological details are also provided, such as the size of the chosen samples, number of fraud cases, methods employed and overall accuracy, when available.

Study | Sample Size | Fraud Cases | Method(s) (Overall Accuracy, %)
Persons (1995) | 206 | 103 | Logistic Regression (n/a)
Kwon & Feroz (1996) | 70 | 35 | Neural Networks (88); Logistic Regression (47)
Choi and Green (1997) | 172 | 86 | Neural Networks (n/a)
Fanning & Cogger (1998) | 204 | 102 | Logistic Regression (50); Discriminant Analysis (52); Neural Networks (63)
Lee et al. (1999) | 620 | 56 | Logistic Regression (n/a)
Feroz et al. (2000) | 132 | 42 | Neural Networks (81); Logistic Regression (70)
Spathis (2002) | 76 | 38 | Logistic Regression (84)
Spathis et al. (2002) | 76 | 38 | Multicriteria Decision Aid Method (88); Discriminant Analysis (84); Logistic Regression (81)
Lin et al. (2003) | 200 | 40 | Neural Networks (76); Logistic Regression (79)
Kaminski et al. (2004) | 158 | 79 | Discriminant Analysis (n/a)
Kotsiantis et al. (2006) | 164 | 41 | Decision Trees (91); Neural Networks (80); Bayesian Networks (74); Logistic Regression (75); Support Vector Machines (79); Hybrid Decision Support System (95)
Kirkos et al. (2007) | 76 | 38 | Decision Trees (74); Neural Networks (80); Bayesian Networks (90)
Hoogs et al. (2007) | 390 | 51 | Genetic Programming (n/a)
Lenard et al. (2007) | 30 | 15 | Logistic Regression (77)
Ravisankar et al. (2011) | 202 | 101 | Support Vector Machines (72); Genetic Programming (89); Logistic Regression (71); Neural Networks (91)
Pai et al. (2011) | 75 | 25 | Support Vector Machines (92); Discriminant Analysis (81); Logistic Regression (79); Decision Trees (84); Neural Networks (83)
Gupta & Singh (2012) | 114 | 29 | Decision Trees (95); Genetic Programming (88)
Danial et al. (2014) | 130 | 65 | Logistic Regression (75)
Song et al. (2014) | 550 | 110 | Logistic Regression (78); Decision Trees (79); Neural Networks (85); Support Vector Machines (86)

Table 1: Prior studies in detecting accounting fraud

Many contributions can be attributed to prior studies, as all accounting fraud research enhances awareness and knowledge of this phenomenon. Furthermore, it can be said that forensic accounting strongly supports accounting fraud detection and promotes the design of relevant anti-fraud preventive measures.

However, a great deal of work can be further done to improve detection strategies in many ways. First, it can be observed that sample sizes of previous studies are fairly small and that, in general, samples are manually selected. The latter is a highly problematic practice as it is inherently biased and so results cannot be extrapolated to the population. Therefore, increasing the amount of data used to train, validate and test the models is a noticeable enhancement, as well as attempting to collect as many fraudulent cases as possible, and not only the most convenient for the sake of research results.

Moreover, most prior studies focus their analysis on specific industries defined by the Standard Industrial Classification (SIC) system. A careful review shows that, surprisingly, no studies investigate accounting fraud within financial services firms, a situation summarised in Table 2. The main reason for the exclusion of these entities is that they are structurally different and an alternative set of variables may be required, since certain financial statement items, such as accounts receivable and inventory, are not available for these companies. Hence “research to find the variables most useful in the specific industries would be of great value”, especially in the poorly examined area of financial services (Fanning and Cogger, 1998). As such, a substantial improvement is achieved in the present study as cases from all industries are included.

Additional improvements in the area of accounting fraud detection can be attained when considering more relevant machine learning methods and performance evaluation metrics. As previously mentioned, complex techniques have been implemented in prior studies, most of them achieving superior performance compared to more basic methods, but the cost of this improvement is relatively high when taking into account the considerable drawbacks that these algorithms entail in terms of computational costs and interpretability. Also, most studies only focus on maximising overall accuracy without further consideration of more suitable assessment measurements.

Consequently, machine learning methods based on decision trees and boosting techniques are implemented in this paper. Their outcome can be very useful when detecting accounting fraud, since straightforward classification rules can be extracted and then easily interpreted and replicated by auditors and regulatory agencies. Furthermore, alternative metrics that account for the difference between the misclassification costs associated with fraud and non-fraud cases are proposed to properly measure the predictive ability of the suggested models.

Study Industry
Persons (1995) Manufacturing and services
Kwon & Feroz (1996) n/a
Choi & Green (1997) n/a
Fanning & Cogger (1998) Financial companies excluded
Lee et al. (1999) Financial companies excluded
Feroz et al. (2000) Banking companies excluded
Spathis (2002) Manufacturing firms
Spathis et al. (2002) Manufacturing firms
Lin et al. (2003) n/a
Kaminski et al. (2004) Banking and insurance firms excluded
Kotsiantis et al. (2006) Manufacturing firms
Kirkos et al. (2007) Manufacturing firms
Hoogs et al. (2007) Financial companies excluded
Lenard et al. (2007) Service-based computer and technology firms
Ravisankar et al. (2011) n/a
Pai et al. (2011) n/a
Gupta & Singh (2012) n/a
Danial et al. (2014) Financial and insurance sectors excluded
Song et al. (2014) Financial companies excluded
Table 2: SIC Industries included in prior studies

In brief, although the techniques proposed in previous studies have increased the detection rate of accounting fraud offences, they are very limited and often not sufficient to uncover complex fraudulent schemes. There is, then, a clear need for improved methodologies that assist the fraud detection task by discovering hidden patterns of falsified financial reports, in order to expose them as soon as possible and, therefore, rapidly address recovery strategies and attenuate potential losses.

3 Methodology

3.1 Forensic Analytics

According to Van Vlasselaer et al. (2015), fraud offences are not crimes that happen fortuitously but are carefully planned, concealed and committed. Accounting fraud perpetrators are continuously conceiving new ways to commit their offences and, in consequence, constantly transforming their fraudulent behaviour, which explains the complexity of the accounting fraud phenomenon. This deliberate managerial wrongdoing is particularly hard to detect and predict, since it involves deep knowledge of accounting and legal tricks that are intentionally employed to make documents look genuine and error-free.

Forensic data analysis is concerned with the treatment and examination of financial crime offences, hence the relevance of its use to develop an adequate technique for accounting fraud detection. Therefore, a forensic accounting approach is proposed in order to overcome potential auditing failure and further improve examination of public documents through the recommendation of meaningful analysis of accounting items.

3.2 Data

The data collection task is critical in financial crime-related research, since it is very difficult to find sufficient and accurate data for analysis. In addition, and given the highly sensitive nature of the topic, there is a limited amount of relevant journal articles related to accounting fraud detection, and publication of controversial results may be censored or even prohibited (Bolton and Hand, 2002). Therefore, a compilation of an exhaustive and representative database containing relevant cases of accounting fraud instances is imperative to further design an adequate and integral fraud-detection method.

In this study, accounting fraud cases are identified considering all Accounting Series Releases (ASR) and Accounting and Auditing Enforcement Releases (AAER) issued by the U.S. Securities and Exchange Commission (SEC) between 1990 and 2012. In particular, all public litigation releases involving deceptive reporting were first hand-collected from the SEC’s website (SEC Sanctions Database: https://www.secwhistlebloweradvocate.com/program/sec-enforcement/sanctions-database/) and then cross-validated with an official accounting fraud database provided by the Securities Class Action Clearinghouse (SCAC), Stanford Law School. Non-public companies were excluded from this study, since the SEC only has jurisdiction over publicly traded companies.

The selection of the studied period is justified based on data availability and practicality considerations. On the one hand, discovered fraud cases published by the SEC include successful enforcement actions with monetary sanctions exceeding $1 million announced between July 29, 2002 and present. Accounting fraud cases released by the SEC date from 1990 onwards, hence the selection of the year 1990 as the beginning of the studied period. On the other hand, this study began in the middle of 2013, so including this year would have been erroneous considering that many cases of fraud could have been discovered in the remainder of the year. As such, 2012 is selected as the end year of the studied period.

The resulting fraud database consists of 1,594 fraud-year observations identified by company I.D. and fiscal year of the offence. Table 3 summarises the number of fraudulent observations obtained after splitting fraud cases into the corresponding years of occurrence, particularly arranged by industry.

SIC Codes | Standard Industrial Classification (SIC) | Fraud Cases | Perc (%)
0100 - 0999 Agriculture, Forestry and Fishing 11 0.69
1000 - 1799 Mining and Construction 52 3.26
2000 - 3999 Manufacturing 609 38.21
4000 - 4999 Transportation, Communications, Electric and Gas 106 6.65
5000 - 5999 Wholesale Trade and Retail Trade 169 10.60
6000 - 6799 Finance, Insurance and Real Estate 236 14.81
7000 - 8999 Services 375 23.53
9100 - 9729 Public Administration 36 2.26
Total 1,594 100
Table 3: Fraud cases by industry

3.3 Sample Selection

One of the main characteristics that defines the fraud phenomenon so uniquely is that it is an uncommon activity (Van Vlasselaer et al., 2015), particularly in the context of accounting fraud, since only a minority of the recorded cases are actually classified as fraudulent. Learning from these rare events is a very challenging task given the small number of observations available to train predictive models, which makes it especially difficult to discriminate between fraudulent and non-fraudulent instances. As Cerullo and Cerullo (1999) express with regard to this matter, “unrepresentative sample data or too few data observations will result in a model that poorly estimates or predicts future values”.

The class-imbalance problem fully emerges when statistical learning models are applied, because they all opt for a naive strategy of classifying all firms as non-fraudulent. As a consequence, accuracy measures show excellent average performance that only reflects the underlying uneven class distribution. Nevertheless, the methods are totally ineffective in detecting positive cases (Chawla et al., 2004). Therefore, the selection of a more proportionate sample in terms of positive and negative cases is required in order to solve the imbalance problem encountered in this study, and also to enhance the discriminatory power of the proposed statistical models.
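
To make the effect of class imbalance concrete, the short Python sketch below evaluates the naive strategy on a hypothetical sample. The class proportions (20 fraud cases among 1,000 firm-years) and the function name `accuracy_and_recall` are illustrative assumptions, not figures or code from this study.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall on the positive class) for 0/1 labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    positives = sum(y_true)
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    accuracy = correct / len(y_true)
    recall = true_positives / positives if positives else float("nan")
    return accuracy, recall


# Hypothetical imbalanced sample: 20 fraud cases among 1,000 firm-years.
y_true = [1] * 20 + [0] * 980

# Naive strategy: label every firm-year as non-fraudulent.
y_naive = [0] * len(y_true)

naive_accuracy, naive_recall = accuracy_and_recall(y_true, y_naive)
print(f"accuracy = {naive_accuracy:.2f}, fraud recall = {naive_recall:.2f}")
```

Under these assumed proportions the naive rule is correct for 98% of observations while detecting none of the fraud cases, which is why overall accuracy alone is a misleading criterion in this setting.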

A stratifying exercise is conducted according to the target variable Fraud, where a pairing exercise is performed to match each fraud observation with a non-fraud observation on the basis of industry and fiscal period. Consequently, the sample selection process occurs in two phases, first dividing the dataset by SIC industry and fiscal year, and then randomly selecting non-fraud instances from each subgroup.
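
The two-phase matching can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`sic_division`, `fiscal_year`, `fraud`), the function name `match_non_fraud`, one-to-one matching without replacement, and the choice to skip fraud cases that have no candidate match are all hypothetical and not taken from the study.

```python
import random
from collections import defaultdict


def match_non_fraud(observations, seed=0):
    """Pair each fraud firm-year with one randomly chosen non-fraud
    firm-year from the same (industry, fiscal year) stratum."""
    rng = random.Random(seed)
    pools = defaultdict(list)  # stratum -> candidate non-fraud observations
    frauds = []

    # Phase 1: divide the dataset by industry and fiscal year.
    for obs in observations:
        if obs["fraud"] == 1:
            frauds.append(obs)
        else:
            pools[(obs["sic_division"], obs["fiscal_year"])].append(obs)

    # Phase 2: randomly select non-fraud instances within each stratum.
    for pool in pools.values():
        rng.shuffle(pool)

    sample = []
    for obs in frauds:
        pool = pools[(obs["sic_division"], obs["fiscal_year"])]
        if pool:  # assumption: fraud cases without a candidate are skipped
            sample.append(obs)
            sample.append(pool.pop())  # sampling without replacement
    return sample


# Toy data with hypothetical firms.
observations = [
    {"firm": "A", "sic_division": "Manufacturing", "fiscal_year": 2001, "fraud": 1},
    {"firm": "B", "sic_division": "Manufacturing", "fiscal_year": 2001, "fraud": 0},
    {"firm": "C", "sic_division": "Manufacturing", "fiscal_year": 2001, "fraud": 0},
    {"firm": "D", "sic_division": "Services", "fiscal_year": 2005, "fraud": 1},
    {"firm": "E", "sic_division": "Services", "fiscal_year": 2005, "fraud": 0},
    {"firm": "F", "sic_division": "Services", "fiscal_year": 2006, "fraud": 0},
]
matched = match_non_fraud(observations)
for obs in matched:
    print(obs["firm"], obs["sic_division"], obs["fiscal_year"], obs["fraud"])
```

The resulting sample is balanced by construction: every retained fraud observation is followed by a non-fraud observation from the same industry and fiscal year.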

A variety of sampling methods can be employed when dealing with imbalanced datasets, individually or in combination, hence an extensive and interesting analysis could be done to select suitable samples of fraud and non-fraud cases. A more detailed discussion about this topic is addressed in Section 5.2.

3.4 Variables

Many research studies include subjective judgment and/or qualitative and non-public information in their models, information that is only available to auditors and insiders of the sampled firms. Accounting data, on the other hand, is publicly available to external interested parties, hence whether it can be used to detect falsified reporting is an intriguing question (Persons, 1995).

The literature suggests that financial statement information is useful for accounting fraud detection. In particular, it can be seen that ratio analysis is very popular for this end suggesting that a careful reading of financial ratios can reasonably expose symptoms of fraudulent behaviour. As such, ratios are calculated to quantify the relation between two financial items and to subsequently define acceptable legitimate values. Therefore, if a fraudulent activity is taking place, financial ratios associated with manipulated accounts will deviate from the normal behaviour and conveniently exhibit signs of accounting fraud.

There has been an interesting debate about which features should be used for detecting falsified reports, but there is still no agreement on which ones are best for this end. An in-depth analysis of the most severe accounting scandals that occurred in the U.S. in the last few decades (Schilit and Perler, 2010) shows that the most frequent tricks managers employ in order to hide debilitated businesses are commonly associated with the manipulation of earnings and cash flow items.

In this manner, and considering relevant and significant variables resulting from prior research work on the topic, this study identifies 20 financial statement ratios that measure the majority of aspects of a firm’s financial performance, including leverage, profitability, liquidity and efficiency.

Leverage

One of the most important aspects of a firm is leverage, since it represents the potential return of an investment based on the debt structure of the company. When debt is used to purchase assets, then the value of assets exceeds the borrowing cost, basically because debt interest is tax deductible.

However, this practice comes with greater risks for investors, considering that sometimes firms are not able to pay their debt obligations. In consequence, companies having trouble paying their debts may be tempted to manipulate financial statements in order to meet debt covenants. Therefore, high levels of debt should increase the likelihood of accounting fraud, since it transfers the risk from the firm and its managers to shareholders.

This aspect is measured by the ratios of TLTA (total liabilities to total assets), TLTE (total liabilities to total equity) and LTDTA (long-term debt to total assets).

Profitability

Profitability measures are used to estimate the ability of a firm to generate earnings compared to its costs, hence the importance of maintaining these metrics in line with market projections. As a consequence, executives may be willing to manipulate earnings-related financial statements in order to cover profitability problems when companies are not performing as expected.

To test whether firms with poorer financial condition are more likely to engage in fraudulent financial reporting, relevant ratios associated with income, expenses and retained earnings will be considered. These ratios are: NITA (net income to total assets), RETA (retained earnings to total assets) and EBITTA (earnings before interest and tax to total assets).

Liquidity

Liquidity refers to the ease with which an asset can be converted into cash. This concept is highly important for businesses and investors, since liquid assets reduce investing risks to some extent by ensuring the capacity of a firm to pay off debts as they come due. Consequently, problems involving liquidity may provide an incentive for managers to commit accounting fraud, hence the need to investigate financial ratios related to the liquid composition of assets, as is the case of working capital and current assets. This aspect is evaluated by the following ratios: WCTA (working capital to total assets), CATA (current assets to total assets), CACL (current assets to current liabilities) and CHNI (cash to net income).

Many investors have alternatively focused their attention on the company’s capability to generate cash from its actual business operations. This aspect, however, is usually manipulated since “companies can exert a great deal of discretion when presenting cash flows” (Schilit and Perler, 2010). Hence the importance of thoroughly analysing cash flow from operations and, in particular, evaluating its relationship with reported earnings. Therefore, the CFFONI ratio (cash flow from operations to net income) is also considered.

Efficiency

Financial efficiency refers to the capacity to produce as much as possible using as few resources as possible. Inefficiency usually involves higher costs and hence poorer firm performance, which may motivate managers to misstate financial statement items that allow subjective estimations and are therefore easier to manipulate. Such is the case of accounts receivable, accounts payable, inventory and cost of goods sold, so financial ratios related to these accounts are selected. This aspect is evaluated by ratios involving the aforementioned items, including RVSA (accounts receivable to total sales), RVTA (accounts receivable to total assets), IVTA (inventory to total assets), IVSA (inventory to total sales), IVCA (inventory to current assets), IVCOGS (inventory to cost of goods sold) and PYCOGS (accounts payable to cost of goods sold).

Efficiency is also linked to capital turnover, which represents the sales-generating power of a firm’s assets. In order to maintain the appearance of consistent growth, fraudulent managers may be tempted to manipulate sales-related financial items when dealing with competitive situations. Accordingly, two sales ratios are considered in order to identify possible fictitious trends in growth: SATA (total sales to total assets) and SATE (total sales to total equity).

A summary of the aforementioned financial ratios is presented in Table 4, along with the category to which they belong and their respective calculations.

Category Financial Ratio Calculation
Leverage TLTA Total Liabilities / Total Assets
Leverage TLTE Total Liabilities / Total Equity
Leverage LTDTA Long-Term Debt / Total Assets
Profitability NITA Net Income / Total Assets
Profitability RETA Retained Earnings / Total Assets
Profitability EBITTA Earnings Before Interest and Tax / Total Assets
Liquidity WCTA Working Capital / Total Assets
Liquidity CATA Current Assets / Total Assets
Liquidity CACL Current Assets / Current Liabilities
Liquidity CHNI Cash / Net Income
Liquidity CFFONI Cash Flow From Operations / Net Income
Efficiency RVSA Accounts Receivable / Total Sales
Efficiency RVTA Accounts Receivable / Total Assets
Efficiency IVSA Inventory / Total Sales
Efficiency IVTA Inventory / Total Assets
Efficiency IVCA Inventory / Current Assets
Efficiency IVCOGS Inventory / Cost of Goods Sold
Efficiency PYCOGS Accounts Payable / Cost of Goods Sold
Efficiency SATA Total Sales / Total Assets
Efficiency SATE Total Sales / Total Equity
Table 4: Summary of considered financial ratios and calculation
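
As an illustration of the calculations in Table 4, the following Python sketch derives the twenty ratios from a single set of financial statement items. The dictionary keys, the helper `safe_div`, the treatment of zero denominators as undefined, and the example figures are hypothetical; working capital is assumed here to be current assets minus current liabilities, which is the usual definition but is not stated in the text.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    if numerator is None or denominator in (None, 0):
        return None
    return numerator / denominator


def financial_ratios(s):
    """Compute the ratios of Table 4 from a dict of statement items."""
    # Assumption: working capital = current assets - current liabilities.
    working_capital = s["current_assets"] - s["current_liabilities"]
    return {
        # Leverage
        "TLTA": safe_div(s["total_liabilities"], s["total_assets"]),
        "TLTE": safe_div(s["total_liabilities"], s["total_equity"]),
        "LTDTA": safe_div(s["long_term_debt"], s["total_assets"]),
        # Profitability
        "NITA": safe_div(s["net_income"], s["total_assets"]),
        "RETA": safe_div(s["retained_earnings"], s["total_assets"]),
        "EBITTA": safe_div(s["ebit"], s["total_assets"]),
        # Liquidity
        "WCTA": safe_div(working_capital, s["total_assets"]),
        "CATA": safe_div(s["current_assets"], s["total_assets"]),
        "CACL": safe_div(s["current_assets"], s["current_liabilities"]),
        "CHNI": safe_div(s["cash"], s["net_income"]),
        "CFFONI": safe_div(s["cash_flow_operations"], s["net_income"]),
        # Efficiency
        "RVSA": safe_div(s["accounts_receivable"], s["sales"]),
        "RVTA": safe_div(s["accounts_receivable"], s["total_assets"]),
        "IVSA": safe_div(s["inventory"], s["sales"]),
        "IVTA": safe_div(s["inventory"], s["total_assets"]),
        "IVCA": safe_div(s["inventory"], s["current_assets"]),
        "IVCOGS": safe_div(s["inventory"], s["cogs"]),
        "PYCOGS": safe_div(s["accounts_payable"], s["cogs"]),
        "SATA": safe_div(s["sales"], s["total_assets"]),
        "SATE": safe_div(s["sales"], s["total_equity"]),
    }


# Hypothetical firm-year, figures in millions.
statement = {
    "total_assets": 1000, "total_liabilities": 600, "total_equity": 400,
    "long_term_debt": 250, "net_income": 50, "retained_earnings": 120,
    "ebit": 90, "current_assets": 400, "current_liabilities": 200,
    "cash": 80, "cash_flow_operations": 60, "accounts_receivable": 150,
    "sales": 800, "inventory": 100, "cogs": 500, "accounts_payable": 75,
}
ratios = financial_ratios(statement)
for name, value in ratios.items():
    print(name, value)
```

Returning `None` for undefined ratios keeps missing items (for example inventory in financial services firms) distinguishable from genuine zeros; how such cases are handled in the study itself is not specified here.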

3.5 Variable Selection

Most analytical models implemented to detect fraudulent financial reporting start with numerous variables, out of which only a minority actually contribute to their classification power (Baesens et al., 2015). Thereby, a question of interest to the public is whether fewer explanatory variables can be used in order to achieve similar accuracy rates as those accomplished when using more predictors.

A simple yet very informative univariate analysis is performed in this study in order to evaluate potential differences between financial accounts related to fraudulent and genuine reports, and to further select significant financial ratios that may be suggesting that accounting fraud has been or is being committed.

The so-called Mann-Whitney test is a non-parametric method that is commonly employed for this end due to its ease of use and availability in several statistical software packages. In simple terms, non-parametric methods refer to statistical techniques that do not make assumptions on the data distribution, hence the reason they are also called distribution-free tests (Hollander et al., 2013). These methods are particularly useful when there are clear outliers or extreme observations in the data, as is the case of the studied database.

What is sought with this non-parametric hypothesis testing technique is to test whether the distribution of fraud data differs significantly from that of non-fraud data. The Mann-Whitney test is performed using the ranks of the data, that is, the position of each observation within the sample rather than the value per se. Outliers therefore have a minimal effect on the test, which makes it very robust to extreme values (Sheskin, 2003).

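The following hypotheses are specified for the Mann-Whitney test:

$$H_0: F_{\text{fraud}} = F_{\text{non-fraud}} \qquad \text{versus} \qquad H_1: F_{\text{fraud}} \neq F_{\text{non-fraud}} \tag{1}$$

where $F_{\text{fraud}}$ and $F_{\text{non-fraud}}$ denote the distributions of the financial ratio of interest among fraudulent and non-fraudulent firms, respectively. If the p-value is lower than the 0.05 significance level considered in this paper, then the null hypothesis $H_0$ can be rejected, so the evidence favours the alternative, $H_1$. Therefore, it can be said that there is a significant difference between the non-fraudulent firms and fraudulent firms with regard to the financial ratio of interest.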
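
The rank-based test described above can be sketched in plain Python as follows. This is a minimal illustration rather than the procedure used in the study: it relies on the large-sample normal approximation with a tie correction and no continuity correction, whereas statistical packages often use exact p-values for small samples. The function names and the sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney test via the normal approximation.

    Returns (U statistic of x, z score, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all observations identical: no evidence of a difference
        return u1, 0.0, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return u1, z, p_value


# Hypothetical values of one ratio; the fraud group contains an outlier.
fraud = [0.82, 0.91, 0.77, 0.95, 0.88, 3.40]
non_fraud = [0.45, 0.52, 0.61, 0.49, 0.58, 0.66]

u_stat, z_score, p_value = mann_whitney_u(fraud, non_fraud)
print(f"U = {u_stat}, z = {z_score:.3f}, p = {p_value:.4f}")

# Making the outlier 100 times larger leaves the ranks, and so the test, unchanged.
fraud_extreme = [0.82, 0.91, 0.77, 0.95, 0.88, 340.0]
print(mann_whitney_u(fraud_extreme, non_fraud) == (u_stat, z_score, p_value))
```

Because only ranks enter the statistic, the extreme observation influences the result no more than any other value that is the largest in the pooled sample.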

It is worth mentioning that when the suggested tests were conducted considering all observations regardless of the SIC industry they belong to, almost all variables proved to be significant, which contributes little to the analysis. Moreover, assuming that fraudsters behave the same across all sectors is fairly naive, so a more elaborate domain-specific examination is required.

Consequently, twenty Mann-Whitney tests are performed per industry, one per selected financial ratio. Table 5 lists industry-specific significant predictors and their relationship with the dependent variable Fraud. Interesting differences between sectors emerge from the analysis, as some ratios are significant or not depending on the industry the observations belong to.

On the one hand, inventory and retained earnings are relevant predictors in the industries of transportation, communication, electric, gas and sanitary service, wholesale trade and retail trade, and services. This may be because inventory volumes and retained earnings are easily falsified within these sectors. On the other hand, manufacturing companies may be tempted to modify items related to liabilities as well as current assets, while finance, insurance and real estate firms may manipulate liabilities and cash flow from operations figures.

3.6 Correlation Analysis

A very popular technique, often applied in data analytics, is correlation analysis. This method is used to evaluate possible relationships between numerical variables, which is particularly useful when working with accounting items that inevitably interact with each other due to the composition of a financial statement report.

The correlation coefficient quantifies the direction and strength of the implicit relationship between two variables of interest, and only expresses the association between them, not the causality. Nonetheless, if a correlation is found between two variables, it can be used as an indicator of a potential causal relation.

Kendall correlation coefficients are used to assess monotonic relationships, which may or may not be linear, based on rank similarity (Kendall, 1955). Monotonic relationships occur when both variables increase together, or when one variable increases while the other decreases. The increase/decrease of the analysed variables could happen at the same rate, which is the case of linear relations, or in a dissimilar proportion, which is the case of non-linear associations.

The Kendall correlation is a non-parametric correlation metric, that is, it makes no assumptions on the distribution of the data. It is said to be a measure of rank correlation in the sense that it calculates the relative position of all observations within one variable (rank position), and then compares them with the ranks obtained within the second variable. If observations from both variables have a similar rank (concordant observations), then a high positive correlation will be obtained. Conversely, if ranks are dissimilar (discordant observations), then negative correlations are expected.

Kendall coefficients always range between $-1$ and $1$, and they can be calculated using the Tau-A statistic defined as follows:

$$\tau_A = \frac{n_c - n_d}{n(n-1)/2} \tag{2}$$

where $n_c$ is the number of concordant pairs of observations, $n_d$ is the number of discordant pairs of observations, and $n$ is the sample size.
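The Tau-A statistic can be sketched directly from its definition by counting concordant and discordant pairs. The short Python function below is a minimal illustration; the function name and the two ratio vectors are hypothetical, and pairs tied on either variable count as neither concordant nor discordant.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall Tau-A: (concordant - discordant) / (n * (n - 1) / 2)."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:      # both variables move in the same direction
            concordant += 1
        elif direction < 0:    # the variables move in opposite directions
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical values of two inventory ratios for four firms.
ratio_a = [0.10, 0.20, 0.30, 0.40]
ratio_b = [0.05, 0.25, 0.15, 0.60]
tau = kendall_tau_a(ratio_a, ratio_b)  # 5 concordant pairs, 1 discordant pair
```

A monotonic but non-linear relation, such as a variable and its square over positive values, still yields a coefficient of one, since only the ordering of the observations matters.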

The resulting Kendall correlation matrix is presented below (Figure 1), summarising the correlation coefficients between all financial ratios. A friendly coloured legend is utilised to facilitate visualisation, where intense red boxes indicate positive relationships and intense blue boxes indicate negative associations.

Figure 1: Kendall Correlation Matrix
Ratio* | Agriculture, Forestry and Fishing | Mining and Construction | Manufacturing | Transportation, Communications, Electric, Gas and Sanitary Service | Wholesale Trade and Retail Trade | Finance, Insurance and Real Estate | Services | Public Administration
TLTA + - +
TLTE + + +
LTDTA + + +
NITA -
RETA + - + + + + +
EBITTA + + + + +
WCTA
CATA - + -
CACL - - - - -
CHNI -
CFFONI -
RVSA - - - -
RVTA +
IVSA - - +
IVTA + - + - +
IVCA + + - + - +
IVCOGS + + - +
PYCOGS - - - - -
SATA + - - - -
SATE +
Notes:
+ represents a positive association with the target variable, Fraud
- represents a negative association with the target variable, Fraud
* Two-tailed test at the 0.05 significance level
Table 5: Significant financial ratios by industry

In addition, a summary of most relevant correlations is shown in Table 6.

Financial Ratios Correlation Coefficient
IVSA IVCOGS 0.8693
IVTA IVCA 0.8167
WCTA CACL 0.7732
IVSA IVTA 0.7485
NITA RETA 0.7275
NITA EBITTA 0.7275
TLTA TLTE 0.7145
IVSA IVCA 0.7039
IVCA IVCOGS 0.6523
WCTA CATA 0.5684
SATA SATE 0.5504
Table 6: Most relevant Kendall correlation coefficients

It can be clearly seen that all inventory-related ratios are strongly positively correlated: IVSA, IVTA, IVCA and IVCOGS. Although this situation is completely expected, it entails an important issue when implementing regression models. If two or more variables are highly correlated then multicollinearity emerges, which means some predictors are redundant. As such, the estimated coefficients of the regression model may be inaccurate, and therefore, not very reliable.

Furthermore, a strong positive association has also been found between CACL and WCTA. This is not surprising considering that working capital (WC) is current assets (CA) minus current liabilities (CL), hence a direct relation between these three financial items results from mathematical construction. In addition, and as expected, strong positive correlations have been found between ratios related to profitability, namely between NITA and RETA, as well as between NITA and EBITTA. A moderate positive relation between TLTA and TLTE can also be observed, which is completely expected since total assets are calculated as the sum of total liabilities and total equity. Finally, moderate positive correlations have also been found between the ratios WCTA and CATA, as well as between SATA and SATE. This latter association makes perfect sense as both ratios are related to sales figures.

Results obtained from the detailed financial ratio analysis performed, which includes non-parametric hypothesis testing and correlation analysis, support the selection of a smaller and meaningful subset of industry-specific explanatory variables. As such, Table 7 provided below lists selected financial ratios by SIC industry that will be further utilised for modelling purposes.

Industry | No. of Ratios | Selected Ratios
Agriculture, Forestry and Fishing | 4 | RETA, CATA, IVSA, PYCOGS
Mining and Construction | 10 | TLTA, TLTE, LTDTA, RETA, CACL, RVSA, IVTA, IVCOGS, PYCOGS, SATA
Manufacturing | 6 | TLTA, TLTE, RETA, CATA, CACL, RVSA
Transportation, Communications, Electric, Gas and Sanitary Service | 5 | RETA, IVSA, IVTA, SATA, PYCOGS
Wholesale Trade and Retail Trade | 3 | RETA, CATA, IVSA
Finance, Insurance and Real Estate | 8 | TLTA, TLTE, LTDTA, RETA, CFFONI, IVCOGS, PYCOGS, SATA
Services | 6 | RETA, CACL, IVSA, IVCOGS, PYCOGS, SATA
Public Administration | 8 | LTDTA, RETA, CATA, CACL, IVSA, IVTA, IVCOGS, SATA
Table 7: Summary of selected financial ratios by industry domain

3.7 Machine Learning Methods

The binary outcome model is considered to be the foundational scheme for detecting accounting fraud since the aim is to classify future observations into only two possible values: fraud or non-fraud.

Accordingly, this study assesses the effectiveness of several machine learning models in the identification of fraudulent reporting. First, discriminant analysis and logistic regression are employed as a benchmark framework, followed by the implementation of more advanced but easy-to-interpret algorithms, including AdaBoost, decision trees, boosted trees and random forests.

The motivation for using boosting techniques and tree-based methods is supported in part by the poor detection accuracy of basic models and in part by the excessive complexity of more sophisticated approaches, such as neural networks and support vector machines.

In order to achieve a consistent notation throughout the section, the following conventions are used for mathematical equations:

  • A superscript $\mathrm{T}$ denotes the transpose of a matrix or vector.

  • $C_1$: fraudulent observation.

  • $C_2$: non-fraudulent observation.

  • $p(C_1 \mid \mathbf{x})$: posterior probability of fraud.

  • $p(C_2 \mid \mathbf{x})$: posterior probability of non-fraud.

It is worth noting that, given there are only two possible outcomes, it holds that:

$$p(C_1 \mid \mathbf{x}) + p(C_2 \mid \mathbf{x}) = 1 \tag{3}$$

The models were employed as implemented in the Scikit-Learn library (Pedregosa et al., 2011) and an exhaustive explanation of each algorithm is given in what follows.

3.7.1 Discriminant Analysis (DA)

Discriminant analysis is a supervised method used in statistics to address classification problems and to make predictions of a categorical dependent variable. The main idea is to classify an observation into one of the predefined classes using a combination of one or more continuous independent variables in order to generate a discriminant function which best differentiate between the groups.

Subsequently, a decision boundary is generated by fitting class conditional densities to the data using Bayes’ rule:

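$$p(C_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_k)\, p(C_k)}{p(\mathbf{x})}, \qquad k = 1, 2 \tag{4}$$

The class that maximises these conditional probabilities is selected. In the case of accounting fraud, only two classes are of interest; therefore:

$$p(C_1 \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_1)\, p(C_1)}{p(\mathbf{x} \mid C_1)\, p(C_1) + p(\mathbf{x} \mid C_2)\, p(C_2)} \tag{5}$$
$$p(C_2 \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_2)\, p(C_2)}{p(\mathbf{x} \mid C_1)\, p(C_1) + p(\mathbf{x} \mid C_2)\, p(C_2)} \tag{6}$$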

The optimisation task is ultimately achieved using the training data to estimate the class priors, both $p(C_1)$ and $p(C_2)$, the class means and the covariance matrices. In particular, class priors are estimated as the proportion of instances in each class, that is, the number of fraudulent (or non-fraudulent) observations divided by the total number of observations. Class means are estimated using the empirical sample class means. Similarly, covariance matrices are estimated using the empirical sample class covariance matrices.

In accordance with the aforementioned, the following assumptions are made:

  1. Predictors are all statistically independent.

  2. $p(\mathbf{x} \mid C_k)$ follows a multivariate Gaussian distribution, with a class-specific mean and covariance matrix.

Different assumptions associated with the covariance matrix will lead to different decision boundaries, one defined by a linear combination of the predictors and another one by a quadratic form.

In both cases, however, the predicted class will be determined using a classification threshold of $0.5$. As such, if the estimated probability of fraud occurrence ($p(C_1 \mid \mathbf{x})$) is equal to or higher than $0.5$, then the observation will be classified as fraudulent. On the contrary, if $p(C_1 \mid \mathbf{x})$ is lower than $0.5$, or equivalently $p(C_2 \mid \mathbf{x}) > 0.5$, then the observation will be classified as non-fraudulent.

Linear Discriminant Analysis (LDA)

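In the particular case of linear discriminant analysis, a multivariate normal distribution of the predictors is presumed with a distinct mean for each class and a covariance matrix that is common to all classes. For accounting fraud detection, this means that both fraud and non-fraud classes share the same covariance matrix $\boldsymbol{\Sigma}$.

The advantage of a common covariance matrix is that it simplifies the problem by reducing the computational cost of estimating a large number of parameters when the number of predictors is relatively large. Taking this into consideration, then it is true that:

$$p(\mathbf{x} \mid C_1) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_1)^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}_1)\right\} \tag{7}$$
$$p(\mathbf{x} \mid C_2) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_2)^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}_2)\right\} \tag{8}$$

where $\boldsymbol{\mu}_k$ is the mean of class $C_k$ and $D$ is the number of predictors.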

Quadratic Discriminant Analysis (QDA)

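Furthermore, quadratic discriminant analysis provides a similar approach, yet now it is assumed that the covariance matrix is class-specific, i.e. $\boldsymbol{\Sigma}_k$ for the $k$th class. Therefore:

$$p(\mathbf{x} \mid C_1) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}_1|^{1/2}} \exp\left\{-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_1)^{\mathrm{T}} \boldsymbol{\Sigma}_1^{-1} (\mathbf{x} - \boldsymbol{\mu}_1)\right\} \tag{9}$$
$$p(\mathbf{x} \mid C_2) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}_2|^{1/2}} \exp\left\{-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_2)^{\mathrm{T}} \boldsymbol{\Sigma}_2^{-1} (\mathbf{x} - \boldsymbol{\mu}_2)\right\} \tag{10}$$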
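To make the difference between the two variants concrete, the following minimal Python sketch applies Bayes' rule with Gaussian class densities to a single predictor, so that covariance matrices reduce to variances. It is only an illustration under simplifying assumptions: the function names, the data and the 0.5 cut-off are hypothetical choices, and the study itself relied on the Scikit-Learn implementations with several predictors.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_discriminant(fraud, non_fraud, shared_variance):
    """Estimate class priors, class means and variances from one predictor."""
    n1, n2 = len(fraud), len(non_fraud)
    mu1, mu2 = sum(fraud) / n1, sum(non_fraud) / n2
    ss1 = sum((v - mu1) ** 2 for v in fraud)
    ss2 = sum((v - mu2) ** 2 for v in non_fraud)
    if shared_variance:
        var1 = var2 = (ss1 + ss2) / (n1 + n2 - 2)    # linear case: pooled variance
    else:
        var1, var2 = ss1 / (n1 - 1), ss2 / (n2 - 1)  # quadratic case: one per class
    return {"prior": (n1 / (n1 + n2), n2 / (n1 + n2)),
            "mean": (mu1, mu2), "var": (var1, var2)}

def posterior_fraud(model, x):
    """Posterior probability of fraud obtained from Bayes' rule."""
    joint = [gaussian_pdf(x, m, v) * p
             for m, v, p in zip(model["mean"], model["var"], model["prior"])]
    return joint[0] / (joint[0] + joint[1])

def classify(model, x, threshold=0.5):
    """Label an observation as fraud when the posterior reaches the threshold."""
    return "fraud" if posterior_fraud(model, x) >= threshold else "non-fraud"

# Hypothetical ratio values: fraud firms are far more dispersed than the rest.
fraud = [0.0, 0.5, 1.0, 1.5, 2.0]
non_fraud = [0.40, 0.45, 0.50, 0.55, 0.60]
lda = fit_discriminant(fraud, non_fraud, shared_variance=True)
qda = fit_discriminant(fraud, non_fraud, shared_variance=False)
```

With these values, an observation at -1.0 is labelled non-fraud by the linear variant, because it lies on the non-fraud side of the single boundary, but fraud by the quadratic variant, whose class-specific variances flag values far from the tight non-fraud cluster on either side.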

3.7.2 Logistic Regression (LR)

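Similar to discriminant analysis, logistic regression is commonly used for performing binary classification. This time the goal is to fit a regression model that estimates the likelihood of accounting fraud by applying a logistic function to an argument that is linear in the predictors:

$$\sigma(a) = \frac{1}{1 + \exp(-a)}, \qquad a = \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi} \tag{11}$$

where $\mathbf{w}$ is the vector of coefficients and $\boldsymbol{\phi}$ is the vector of predictors.

In order to obtain the best classification possible, the posterior probability of belonging to each of the two categories is estimated by maximising the likelihood function. Accordingly, let $y(\boldsymbol{\phi})$ be the posterior probability of fraud (Bishop, 2006), then:

$$p(C_1 \mid \boldsymbol{\phi}) = y(\boldsymbol{\phi}) = \sigma(\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}) \tag{12}$$

For a dataset $\{\boldsymbol{\phi}_n, t_n\}$, where $t_n \in \{0, 1\}$ and $\boldsymbol{\phi}_n = \boldsymbol{\phi}(\mathbf{x}_n)$ for $n = 1, \dots, N$, the likelihood of any specific outcome is given by:

$$p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n} (1 - y_n)^{1 - t_n} \tag{13}$$

where $\mathbf{t} = (t_1, \dots, t_N)^{\mathrm{T}}$ and $y_n = p(C_1 \mid \boldsymbol{\phi}_n)$.

As mentioned before, the maximum likelihood estimates of $\mathbf{w}$ are obtained by minimising the cross-entropy error function, defined as the negative logarithm of the likelihood, and then taking its gradient with respect to $\mathbf{w}$:

$$E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}) = -\sum_{n=1}^{N} \left\{ t_n \ln y_n + (1 - t_n) \ln (1 - y_n) \right\} \tag{14}$$
$$\nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n)\, \boldsymbol{\phi}_n \tag{15}$$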

To finally decide whether an observation is classified as fraudulent or non-fraudulent, a threshold of $0.5$ will be considered. Consequently, if $p(C_1 \mid \boldsymbol{\phi})$ is estimated to be equal to or greater than $0.5$, then the observation will be classified as fraudulent. Otherwise, it will be classified as non-fraudulent.
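The estimation procedure can be sketched as plain gradient descent on the cross-entropy error, where each update moves the coefficients against the gradient given by the sum of prediction errors times predictors. The Python code below is a minimal sketch rather than the solver used in this study: the function names, learning rate, number of iterations and data are all assumptions, and an intercept is added by prepending a constant 1 to each predictor vector.

```python
import math

def sigmoid(a):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-a))

def predict_proba(w, x):
    """Posterior probability of fraud for predictors x, with an intercept term."""
    phi = [1.0] + list(x)
    return sigmoid(sum(wi * pi for wi, pi in zip(w, phi)))

def cross_entropy(w, X, t):
    """Negative log-likelihood of the observed labels."""
    total = 0.0
    for x, tn in zip(X, t):
        y = predict_proba(w, x)
        total -= tn * math.log(y) + (1 - tn) * math.log(1 - y)
    return total

def fit_logistic(X, t, learning_rate=0.5, n_iter=2000):
    """Gradient descent on the cross-entropy error (gradient averaged over N)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for x, tn in zip(X, t):
            phi = [1.0] + list(x)
            error = predict_proba(w, x) - tn  # y_n - t_n
            for k, pk in enumerate(phi):
                grad[k] += error * pk
        w = [wi - learning_rate * g / len(X) for wi, g in zip(w, grad)]
    return w

# Hypothetical single ratio; the two classes overlap, so the fit stays finite.
X = [[0.1], [0.2], [0.3], [0.6], [0.4], [0.7], [0.8], [0.9]]
t = [0, 0, 0, 0, 1, 1, 1, 1]
w = fit_logistic(X, t)
flagged = predict_proba(w, [0.9]) >= 0.5  # classified as fraudulent
```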

3.7.3 AdaBoost (AB)

Adaptive boosting, widely known as AdaBoost, is a machine learning technique used for classification and regression problems that combines multiple ‘weak learner’ classifiers in order to produce a better boosted classifier. In this context, a weak learner is a function that is only weakly correlated with the response.

The basic idea is to weight observations by how easy or difficult they are to categorise, giving more importance to those that are harder to predict in order to learn from them and further construct better subsequent classifiers. Accordingly, each individual classifier generates an output $G_m(\mathbf{x})$, $m = 1, \dots, M$, for every observation of the training set. Then, these classifiers are trained on a weighted form of the data, using $\alpha_m$ as classifier coefficients. As mentioned before, misclassified instances will be given greater weight when used to train the subsequent classifier (Bishop, 2006).

The goal is to minimise a weighted error function in every iteration, taking into account the information and performance of previous classifiers. Ultimately, and after the last iteration $M$, a final boosted classifier $F_M(\mathbf{x})$ is constructed as an additive combination of all trained weak learner classifiers $G_m(\mathbf{x})$:

$$F_M(\mathbf{x}) = \sum_{m=1}^{M} \alpha_m G_m(\mathbf{x}) \tag{16}$$

In this case, a classification threshold of has been adopted. As such, an observation will be classified as fraudulent when is equal or greater than , and classified as non-fraudulent when is lower than .

The AdaBoost pseudo code² is shown in Algorithm 1.

² Scharth, M. (2017). Statistical Learning and Data Mining, Module 15 [PowerPoint presentation]. Discipline of Business Analytics, The University of Sydney Business School, QBUS6810.

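1:  Initialise the observation weights $w_i = 1/N$, $i = 1, \dots, N$.
2:  for $m = 1$ to $M$ do
3:     Fit a classifier $G_m(\mathbf{x})$ to the training data using weights $w_i$.
4:     Compute the weighted error rate $\mathrm{err}_m = \sum_{i=1}^{N} w_i\, I(y_i \neq G_m(\mathbf{x}_i)) \,/\, \sum_{i=1}^{N} w_i$.
5:     Compute $\alpha_m = \log\left((1 - \mathrm{err}_m) / \mathrm{err}_m\right)$.
6:     Update the weights, $w_i \leftarrow w_i \exp\left[\alpha_m\, I(y_i \neq G_m(\mathbf{x}_i))\right]$, $i = 1, \dots, N$.
7:  end for
8:  Output the classification sign[$\sum_{m=1}^{M} \alpha_m G_m(\mathbf{x})$].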
Algorithm 1 AdaBoost
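The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, assuming labels coded as +1 (fraud) and -1 (non-fraud) and one-split decision stumps as weak learners; the function names and the toy data are hypothetical, and the weights are rescaled after every round, which leaves the weighted error rate unchanged.

```python
import math

def stump_predict(stump, row):
    """A stump predicts `sign` above the threshold and `-sign` otherwise."""
    feature, threshold, sign = stump
    return sign if row[feature] > threshold else -sign

def fit_stump(X, y, w):
    """Weak learner: the single split with the lowest weighted error."""
    best_err, best_stump = None, None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(stump, row) != yi)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_stump

def adaboost(X, y, n_rounds):
    n = len(X)
    w = [1.0 / n] * n                                  # step 1: equal weights
    ensemble = []
    for _ in range(n_rounds):                          # step 2
        stump = fit_stump(X, y, w)                     # step 3
        miss = [stump_predict(stump, row) != yi for row, yi in zip(X, y)]
        err = sum(wi for wi, m in zip(w, miss) if m) / sum(w)  # step 4
        err = min(max(err, 1e-10), 1 - 1e-10)          # guard against log(0)
        alpha = math.log((1 - err) / err)              # step 5
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]  # step 6
        total = sum(w)
        w = [wi / total for wi in w]                   # rescale for stability
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, row):
    """Step 8: sign of the weighted vote of all weak learners."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can classify correctly.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
boosted = adaboost(X, y, n_rounds=3)
```

On this toy sample a single stump misclassifies two observations, whereas the weighted vote of three stumps recovers every label, which illustrates how reweighting the hard cases yields a stronger combined classifier.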

3.7.4 Decision Trees (DT)

Decision trees are a non-parametric supervised learning method that classifies observations based on the values of one or more predictors. The advantage of decision trees lies in the straightforward extraction of if-then classification rules easily replicable by auditors and regulatory authorities. Also, no assumptions on the structure of the data are needed, which is very convenient in this case considering the asymmetrical distribution of some explanatory variables.

The structure of a DT consists of nodes representing a test on a particular attribute and branches representing an outcome of the test. The idea is to divide observations into mutually exclusive classes in order to build the smallest set of rules that is consistent with the training data. To identify the attribute that best separates the sample, information gain and entropy reduction are used as estimation criteria.

There are several tree algorithms, such as ID3, C4.5, C5.0 and CART, among others. The method chosen in this study is Classification and Regression Trees (CART), characterised by the construction of binary trees based on the feature and threshold selection that provide the largest information gain in each node. This algorithm recursively partitions the space in order to minimise the error or impurity of each node, resulting in terminal nodes that represent homogeneous groups that differ substantially from the others.

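Accordingly, let the information at node $m$ be $Q$, then the binary partition of the data is defined by a candidate split $\theta = (j, t_m)$, consisting of a predictor $j$ and a threshold $t_m$, that divides the space into two subsets: $Q_{\text{left}}(\theta)$ and $Q_{\text{right}}(\theta)$.

The error at node $m$ is calculated using an impurity function $H(\cdot)$ evaluated in both partitions, which is later minimised in order to estimate the parameters:

$$G(Q, \theta) = \frac{n_{\text{left}}}{N_m} H(Q_{\text{left}}(\theta)) + \frac{n_{\text{right}}}{N_m} H(Q_{\text{right}}(\theta)) \tag{17}$$
$$\theta^{*} = \arg\min_{\theta} G(Q, \theta) \tag{18}$$

where $N_m$ is the number of observations at node $m$, and $n_{\text{left}}$ and $n_{\text{right}}$ are the numbers of observations in each subset.

The impurity function implemented in this study corresponds to the Gini function:

$$H(Q) = \sum_{k} p_{mk} (1 - p_{mk}) \tag{19}$$

where $p_{mk}$ is the proportion of class $k$ observations in node $m$.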
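The search for the best split at a node can be sketched as follows: every candidate threshold is scored by the weighted Gini impurity of the two resulting subsets, and the threshold with the lowest score is retained. This Python snippet is a minimal sketch for a single predictor; the function names and the ratio values are hypothetical.

```python
def gini(labels):
    """Gini impurity: sum over classes of p_k * (1 - p_k)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((labels.count(k) / n) * (1 - labels.count(k) / n)
               for k in set(labels))

def split_quality(values, labels, threshold):
    """Weighted impurity of the two subsets created by the threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(values, labels):
    """Candidate threshold with the lowest weighted impurity."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split
    return min(candidates, key=lambda t: split_quality(values, labels, t))

# Hypothetical values of one ratio, with 1 denoting fraud and 0 non-fraud.
ratio = [0.004, 0.009, 0.011, 0.015, 0.030, 0.052]
fraud = [0, 0, 0, 1, 1, 1]
threshold = best_threshold(ratio, fraud)
```

In this toy sample the chosen threshold separates the two classes perfectly, so both subsets are pure and the weighted impurity drops from 0.5 at the parent node to zero.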

It is worth noting that the partitions of the predictor space are based on a greedy algorithm called recursive binary splitting. The technique is greedy because the best split is made at each step of the tree-building process without taking into account the consequences further down the tree. Consequently, in some cases very complex trees are generated as a result of this approach. However, a couple of mechanisms can be used in order to avoid this situation, such as setting the minimum number of required observations at a leaf node or setting the maximum depth of the tree.

The tree size is therefore a tuning parameter determining the complexity of the model, and it should be selected adaptively from the data. As such, the maximum number of node splits in the current study is set to 5, the optimal value obtained by cross-validation.

Decision trees are remarkably superior to the first two methods used as benchmark - logistic regression and discriminant analysis - in terms of how easy they are to explain, implement and visualise. Unfortunately, they show some drawbacks that should be mentioned, such as their inherent instability, which emerges when small changes in the data cause a large change in the structure of the estimated tree, as well as their lower predictive accuracy when compared to more advanced techniques.

Decision trees can be used as the basic component of powerful prediction methods. Therefore, two additional models that employ decision trees as their foundation are introduced in what follows.

3.7.5 Boosted Trees (BT)

Similar to AdaBoost, the boosted trees method is an ensemble of weak learners but now in the explicit form of fixed size decision trees as base classifiers.

Accordingly, an iterative process takes place in order to fit a decision tree output $h_m(\mathbf{x})$ in every iteration $m$ to improve the previous model $F_{m-1}(\mathbf{x})$ by constructing a new model $F_m(\mathbf{x})$ that adds this new information:

$$F_m(\mathbf{x}) = F_{m-1}(\mathbf{x}) + h_m(\mathbf{x}) \tag{20}$$

The main idea is to minimise an error function defined by the difference between the old model $F_{m-1}(\mathbf{x})$ and the new one $F_m(\mathbf{x})$, which is called the residual $h_m(\mathbf{x}) = F_m(\mathbf{x}) - F_{m-1}(\mathbf{x})$, through a gradient boosting algorithm that is much like the gradient descent method used in the logistic regression approach.

In this case, a classification threshold of has also been adopted. In this regard, an observation will be classified as fraudulent when is equal or greater than , and classified as non-fraudulent when is lower than .

Same as in the decision tree methodology and for consistency, the maximum depth of the fitted trees is established to be 5.
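The additive scheme described in this section can be sketched as follows. The Python code below is an illustration only and rests on several assumptions that differ from the study: it uses a squared-error loss, so each new tree is fitted to the difference between the observed label and the current prediction; the trees are one-split regression stumps rather than trees of depth 5; and the learning rate, the 0.5 cut-off, the function names and the data are hypothetical.

```python
def fit_regression_stump(x, residuals):
    """One-split regression tree minimising the squared error of the residuals."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [r for v, r in zip(x, residuals) if v <= t]
        right = [r for v, r in zip(x, residuals) if v > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]

def boost(x, y, n_rounds, learning_rate=0.5):
    """Start from the mean label, then add one stump per round."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, left_mean, right_mean = fit_regression_stump(x, residuals)
        stumps.append((t, left_mean, right_mean))
        # New model = previous model + (shrunken) output of the new tree.
        pred = [p + learning_rate * (left_mean if v <= t else right_mean)
                for v, p in zip(x, pred)]
    return base, stumps, learning_rate

def score(model, v):
    """Output of the additive model for a single predictor value."""
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (lm if v <= t else rm)
                      for t, lm, rm in stumps)

# Toy data with 1 denoting fraud; no single split can classify it.
x = [1, 2, 3, 4, 5, 6]
y = [1, 1, 0, 0, 1, 1]
model = boost(x, y, n_rounds=20)
labels = [1 if score(model, v) >= 0.5 else 0 for v in x]
```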

3.7.6 Random Forests (RF)

A further enhancement of boosted trees is provided by the random forests approach, one of the most popular bagging techniques. Bootstrap aggregation, or bagging, averages many noisy but approximately unbiased models, which results in a reduction of the variance.

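The idea is to fit a classification model to the training data $\mathbf{Z} = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)\}$ to obtain the prediction $\hat{f}(\mathbf{x})$. Bagging averages this prediction over a collection of bootstrap samples³. For each bootstrap sample $\mathbf{Z}^{*b}$, $b = 1, \dots, B$, the selected classification model is fitted to obtain a prediction $\hat{f}^{*b}(\mathbf{x})$. The bagged classifier selects the class (fraud or non-fraud) with the most “votes” from the $B$ classifiers:

$$\hat{G}_{\text{bag}}(\mathbf{x}) = \arg\max_{k} \sum_{b=1}^{B} I\left(\hat{f}^{*b}(\mathbf{x}) = k\right) \tag{21}$$

³ In statistics, bootstrapping is any test or metric that relies on random sampling with replacement.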

Decision trees are ideal candidates for bagging as they capture complex interaction structures in the data, which leads to relatively low bias but high variance. Consequently, classification trees are adopted next for bagging to further construct random forests.

Random forests improve over bagging by adding an adjustment that helps decorrelate the trees. In this context, instead of using all predictors, random forests only select a random subset of the features as split candidates in each step. The rationale behind this methodology is that when establishing a fewer and fixed number of predictors, then more variation in the structure of the model is allowed, which diminishes the correlation between the resulting trees. Interestingly, this new condition makes the average of the fitted trees less variable and therefore more reliable (James et al., 2013).

In building a random forest, $m$ independent variables out of all $p$ possible predictors are randomly selected at each node, and later the best split on the considered variables is found. As a last step, all trees are averaged to obtain a final prediction.

The random forests pseudo code⁴ is shown in Algorithm 2.

⁴ Scharth, M. (2017). Statistical Learning and Data Mining, Module 13 [PowerPoint presentation]. Discipline of Business Analytics, The University of Sydney Business School, QBUS6810.

1:  for $b = 1$ to $B$ do
2:     Sample $N$ observations with replacement from the training data to obtain the bootstrap sample $\mathbf{Z}^{*b}$.
3:     Grow a random forest tree $T_b$ to $\mathbf{Z}^{*b}$ by repeating the following steps for each terminal node of the tree, until the minimum node size $n_{\min}$ is reached:
4:     (i) Select $m$ variables at random from the $p$ variables.
5:     (ii) Pick the best variable and split point among the $m$ candidates.
6:     (iii) Split the node into two daughter nodes.
7:  end for
8:  Output the ensemble of trees $\{T_b\}_{b=1}^{B}$.
Algorithm 2 Random Forests
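Algorithm 2 can be sketched in Python as shown below. To keep the example short, each tree is grown for a single split only (a depth-1 tree), which is an assumption of this illustration and not the depth used in the study; the function names, the random seed and the data are likewise hypothetical. Each tree is fitted to a bootstrap sample using a random subset of the predictors, and the forest predicts by majority vote.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def grow_stump(rows, labels, feature_ids):
    """Best single split among the randomly selected candidate features."""
    best = (gini(labels), None, None, majority(labels), majority(labels))
    for j in feature_ids:
        values = sorted({row[j] for row in rows})
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2  # split halfway between neighbouring values
            left = [l for row, l in zip(rows, labels) if row[j] <= t]
            right = [l for row, l in zip(rows, labels) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    return best[1:]

def tree_predict(tree, row):
    j, t, left_label, right_label = tree
    if j is None:  # no useful split was found in the bootstrap sample
        return left_label
    return left_label if row[j] <= t else right_label

def random_forest(rows, labels, n_trees, m, seed=0):
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        features = rng.sample(range(p), m)          # m of the p predictors
        forest.append(grow_stump([rows[i] for i in idx],
                                 [labels[i] for i in idx], features))
    return forest

def forest_predict(forest, row):
    """Majority vote over all trees."""
    return majority([tree_predict(tree, row) for tree in forest])

# Hypothetical firms described by two ratios; 1 denotes fraud.
rows = [[0.05, 0.10], [0.10, 0.30], [0.20, 0.05], [0.25, 0.20],
        [0.30, 0.15], [0.15, 0.25], [0.70, 0.90], [0.80, 0.75],
        [0.90, 0.70], [0.95, 1.00], [0.75, 0.85], [0.85, 0.95]]
labels = [0] * 6 + [1] * 6
forest = random_forest(rows, labels, n_trees=25, m=1)
```

Since every tree sees a different bootstrap sample and a different predictor, the individual splits vary from tree to tree, while the vote over the ensemble remains stable.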

In order to be consistent with the previous methodologies, the maximum depth of the estimated trees is established to be 5.

3.8 Models Assessment

An interesting issue related to fraudulent reporting is the difference of misclassification costs. As mentioned previously, most studies only seek to maximise overall accuracy without further analysing more suitable assessment measurements.

The cost of misclassification differs when dealing with accounting fraud, since a false negative error, which is when a fraud observation is classified as non-fraud, is usually considered more expensive than a false positive error, which is when a non-fraud observation is classified as fraud. The reasoning behind this is that a misclassification of a non-fraud firm may cause an important misuse of resources and time, but a misclassification of a fraudulent company may result in incorrect decisions and economic damage.

Accordingly, the overall accuracy rate is no longer sufficient to assess model performance. Other metrics, such as specificity, sensitivity and precision, are now taken into consideration, as well as G-measure, F-measure and AUC, that are calculated using combinations of these metrics. All mentioned indicators are based on the confusion matrix shown in Table 8.

Predicted Positives Predicted Negatives
Real Positives TP FN
Real Negatives FP TN
Table 8: Confusion matrix

Model assessment metrics are described next, including the formula used to calculate them when appropriate.

  1. Overall Accuracy: it measures the ability to classify both fraudulent and genuine observations correctly. It is calculated as the proportion of true positive and true negative cases relative to the total number of observations.

    $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{22}$$
  2. Specificity: it evaluates the ability to identify non-fraudulent cases correctly. As such, it is computed as the proportion of true negative cases relative to all actual negative observations.

    $$\text{Specificity} = \frac{TN}{TN + FP} \tag{23}$$
  3. Sensitivity: it assesses the capacity to classify fraudulent cases correctly. It is calculated as the proportion of true positive cases relative to all actual positive observations.

    $$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{24}$$
  4. Precision: it measures the proportion of true positive cases relative to all predicted positive observations.

    $$\text{Precision} = \frac{TP}{TP + FP} \tag{25}$$
  5. G-Mean: it is the geometric mean of the sensitivity and specificity measures. As such, it takes into account the ability to correctly classify both fraudulent and non-fraudulent observations.

    $$\text{G-Mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}} \tag{26}$$
  6. F-Measure: it is a metric that integrates both precision and sensitivity, computed as their harmonic mean.

    $$\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{27}$$
  7. AUC: The Area Under the Curve (AUC) summarises the Receiver Operating Characteristic (ROC) curve, which evaluates the diagnostic ability of a binary classifier as its decision threshold is varied. As such, it assesses both true positive and false positive rates across different threshold settings. The AUC is the probability that the binary classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one. It therefore always ranges between 0 and 1, and the closer it is to one, the better the model separates instances into the non-fraud and fraud groups. The AUC is computed using the trapezoidal rule, which is a commonly used technique for approximating a definite integral.
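The metrics listed above can be sketched in a few lines of Python. The function names are hypothetical, and the confusion matrix counts in the example are assumed values chosen so that the resulting figures agree with the LDA row reported for Agriculture, Forestry and Fishing in Table 9; the actual counts are not given in the text. The second function illustrates the ranking interpretation of the AUC by direct pairwise comparison, with ties counted as one half, rather than the trapezoidal rule.

```python
import math

def classification_metrics(tp, fn, fp, tn):
    """Metrics derived from the four cells of the confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "specificity": specificity,
        "sensitivity": sensitivity,
        "precision": precision,
        "g_mean": math.sqrt(sensitivity * specificity),
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
    }

def auc_by_ranking(fraud_scores, non_fraud_scores):
    """Share of (fraud, non-fraud) pairs in which the fraud case scores higher."""
    wins = 0.0
    for s_pos in fraud_scores:
        for s_neg in non_fraud_scores:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(fraud_scores) * len(non_fraud_scores))

# Assumed counts for a test sample of seven firms.
metrics = classification_metrics(tp=3, fn=0, fp=2, tn=2)
# Hypothetical fraud scores produced by some classifier.
auc = auc_by_ranking([0.9, 0.8, 0.4], [0.7, 0.3])
```

In this example all fraud cases are detected, yet accuracy is only 0.714 because half of the non-fraud firms are flagged, which shows why no single metric is sufficient on its own.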

Regulatory authorities face critical limitations in terms of human resources, budget support and time constraints, thus a detailed investigation of all records and companies is infeasible or too expensive to undertake. Investigations should concentrate on those firms that are more likely to perpetrate accounting fraud. Therefore, it is preferable to focus on models that correctly classify fraudulent observations rather than non-fraudulent cases.

For this reason, G-Mean, F-Measure and AUC will be used as model assessment criteria, since they properly capture both false positive and false negative errors, and mitigate the misclassification issue inherent when detecting accounting fraud offences.

It is worth mentioning, before further presentation and discussion of results, that all classification accuracy metrics are calculated using out-of-sample data, that is, considering only data points that do not belong to the training sample. In other words, each model learns the parameters of a prediction function from a subset of the available data and is then tested on different data in order to generalise the results. A standard practice in statistics is to hold out part of the dataset, commonly called the testing set, and use it later to assess the performance of the model.

Therefore, a stratified 10-fold cross-validation approach is implemented before running the proposed variable selection technique. As such, the studied dataset is divided into 10 folds, each one containing an equal number of fraud and non-fraud cases. For each fold, the model is trained using the remaining nine folds and then validated using the held-out fold. Finally, model performance is calculated as the average performance over all testing folds (Kirkos et al., 2007).
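The construction of stratified folds can be sketched as follows: the observations of each class are shuffled separately and then dealt in turn to the folds, so that every fold preserves the class balance of the full sample. The function names, the seed and the sample of 20 fraud and 20 non-fraud labels are hypothetical.

```python
import random

def stratified_folds(labels, n_folds, seed=0):
    """Assign observation indices to folds, one class at a time."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        members = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % n_folds].append(index)
    return folds

def cross_validation_splits(labels, n_folds, seed=0):
    """Yield (training indices, testing indices) for each held-out fold."""
    folds = stratified_folds(labels, n_folds, seed)
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx

# Hypothetical balanced sample: 1 denotes fraud and 0 non-fraud.
labels = [1] * 20 + [0] * 20
splits = list(cross_validation_splits(labels, n_folds=10))
```

Each of the ten testing folds then contains two fraud and two non-fraud observations, and the performance metrics would be averaged over the ten held-out folds.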

4 Results and Discussions

Table 9 reports the results of the proposed models by SIC industry.

Accuracy Specificity Sensitivity Precision G-Mean F-Measure AUC
Agriculture, Forestry and Fishing (n = 22, p = 4)
LDA 0.714 0.500 1.000 0.600 0.707 0.750 0.750
QDA 0.857 0.750 1.000 0.750 0.866 0.857 0.875
LR 0.714 0.500 1.000 0.600 0.707 0.750 0.750
AB 0.857 0.750 1.000 0.750 0.866 0.857 0.875
DT 0.571 0.750 0.333 0.500 0.500 0.400 0.542
BT 0.571 0.750 0.333 0.500 0.500 0.400 0.542
RF 0.714 0.500 1.000 0.600 0.707 0.750 0.750
Mining and Construction (n = 104, p = 10)
LDA 0.656 0.917 0.500 0.909 0.677 0.645 0.708
QDA 0.812 0.917 0.750 0.938 0.829 0.833 0.833
LR 0.688 0.917 0.550 0.917 0.710 0.687 0.733
AB 0.625 0.667 0.600 0.750 0.632 0.667 0.633
DT 0.812 0.833 0.800 0.889 0.816 0.842 0.817
BT 0.750 0.833 0.700 0.875 0.764 0.778 0.767
RF 0.781 1.000 0.650 1.000 0.806 0.788 0.825
Manufacturing (n = 1,218, p = 6)
LDA 0.530 0.460 0.594 0.548 0.522 0.570 0.527
QDA 0.546 0.109 0.943 0.539 0.321 0.686 0.526
LR 0.530 0.425 0.625 0.545 0.516 0.583 0.525
AB 0.585 0.557 0.609 0.603 0.583 0.606 0.583
DT 0.555 0.259 0.823 0.551 0.461 0.660 0.541
BT 0.574 0.621 0.531 0.607 0.574 0.567 0.576
RF 0.503 0.460 0.542 0.525 0.499 0.533 0.501
Transportation, Communications, Electric, Gas and Sanitary Service (n = 212, p = 5)
LDA 0.562 0.625 0.500 0.571 0.559 0.533 0.562
QDA 0.562 0.969 0.156 0.833 0.389 0.263 0.562
LR 0.578 0.594 0.562 0.581 0.578 0.571 0.578
AB 0.609 0.719 0.500 0.640 0.599 0.561 0.609
DT 0.531 0.625 0.438 0.538 0.523 0.483 0.531
BT 0.672 0.719 0.625 0.690 0.670 0.656 0.672
RF 0.656 0.562 0.750 0.632 0.650 0.686 0.656
Wholesale and Retail Trade (n = 338, p = 3)
LDA 0.559 0.521 0.593 0.582 0.556 0.587 0.557
QDA 0.500 0.042 0.907 0.516 0.194 0.658 0.475
LR 0.549 0.521 0.574 0.574 0.547 0.574 0.547
AB 0.608 0.542 0.667 0.621 0.601 0.643 0.604
DT 0.637 0.479 0.778 0.627 0.610 0.694 0.628
BT 0.745 0.771 0.722 0.780 0.746 0.750 0.747
RF 0.637 0.625 0.648 0.660 0.636 0.654 0.637
Table 9: Prediction accuracy by industry
Accuracy Specificity Sensitivity Precision G-Mean F-Measure AUC
Finance, Insurance and Real Estate (n = 472, p = 8)
LDA 0.570 0.621 0.526 0.615 0.572 0.567 0.574
QDA 0.592 0.273 0.868 0.579 0.487 0.695 0.571
LR 0.570 0.591 0.553 0.609 0.571 0.579 0.572
AB 0.648 0.621 0.671 0.671 0.646 0.671 0.646
DT 0.627 0.561 0.684 0.642 0.619 0.662 0.622
BT 0.655 0.682 0.632 0.696 0.656 0.662 0.657
RF 0.627 0.652 0.605 0.667 0.628 0.634 0.628
Services (n = 750, p = 6)
LDA 0.587 0.468 0.698 0.583 0.572 0.635 0.583
QDA 0.587 0.229 0.922 0.560 0.460 0.697 0.576
LR 0.587 0.495 0.672 0.586 0.577 0.627 0.584
AB 0.627 0.550 0.698 0.623 0.620 0.659 0.624
DT 0.631 0.615 0.647 0.641 0.630 0.644 0.631
BT 0.631 0.550 0.707 0.626 0.624 0.664 0.629
RF 0.618 0.477 0.750 0.604 0.598 0.669 0.614
Public Administration (n = 72, p = 8)
LDA 0.636 0.400 0.833 0.625 0.577 0.714 0.617
QDA 0.818 0.900 0.750 0.900 0.822 0.818 0.825
LR 0.727 0.600 0.833 0.714 0.707 0.769 0.717
AB 0.727 0.700 0.750 0.750 0.725 0.750 0.725
DT 0.773 0.700 0.833 0.769 0.764 0.800 0.767
BT 0.773 0.700 0.833 0.769 0.764 0.800 0.767
RF 0.864 0.900 0.833 0.909 0.866 0.870 0.867

It can be seen that results are dissimilar across different industries and machine learning techniques. Best performance of the proposed models is obtained for firms belonging to the Agriculture, Forestry and Fishing industry, Mining and Construction, and Public Administration. Moderate predictive accuracy is achieved in the industries of Wholesale and Retail Trade, Transportation and Communications, and Financials. Inferior accuracy can be observed for Manufacturing and Services industries.

Agriculture, Forestry and Fishing

In particular, good classification performance is achieved in the industry of Agriculture, Forestry and Fishing, probably due to the relatively small size of the sample at issue. Four financial ratios have been considered for modelling purposes, namely RETA, CATA, IVSA and PYCOGS. The results indicate that quadratic discriminant analysis and AdaBoost are the most accurate models, as both achieved an AUC of 0.875. In both cases, 75% of non-fraud cases are correctly identified, as well as 100% of fraud cases.

Again, special care must be taken when generalising these results, as a fairly small sample is being considered. It is worth mentioning that no relevant patterns have been found within this industry when constructing a decision tree. Because of the small amount of available data, it was infeasible to find significant red-flags in this domain.

Mining and Construction

Good results can also be observed in the case of the Mining and Construction industry, where ten financial ratios were considered as predictors and a relatively big sample has been considered. In this case, superior performance has been achieved by QDA and random forests, mainly because of their remarkable accuracy when predicting negative cases, that is, high values of specificity. Nevertheless, good specificity and sensitivity rates are attained when using decision trees as they correctly classify 83.3% of non-fraud cases and 80% of fraud cases.

More interesting results can be seen when using all observations to construct a decision tree model. As depicted in Figure 2, two main red-flags, associated with the items of inventory and accounts receivable, can be used to detect fraudulent companies in the Mining and Construction industry. The first one is IVTA, as the evidence suggests that fraud is more likely to be present when this ratio is bigger than 0.0118, which indicates that fraudulent firms tend to exaggerate inventory levels in this particular industry. Hence, fraud alarms should be activated when inventories represent more than 1.2% of total assets in mining and construction firms.

The second indicator that can be used to expose falsified reports is RVSA. As such, when inventory levels compared to assets (IVTA) are within the non-fraudulent range (i.e. lower than 0.0118), auditors should check whether RVSA levels are higher than 0.234. Therefore, the probability of accounting fraud is greater when receivables represent more than 23.4% of total sales.
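The two red-flags described for this industry translate directly into an if-then rule of the kind an auditor could apply. The short Python function below is a simplified reading of the thresholds quoted in the text, not an export of the fitted tree; the function name is hypothetical and the handling of values exactly at a threshold is an assumption.

```python
def mining_construction_red_flag(ivta, rvsa):
    """Return True when either red-flag for Mining and Construction is raised."""
    if ivta > 0.0118:    # inventories above roughly 1.2% of total assets
        return True
    return rvsa > 0.234  # receivables above 23.4% of total sales

# Hypothetical firms described by their IVTA and RVSA ratios.
firm_a = mining_construction_red_flag(ivta=0.0300, rvsa=0.100)  # inventory flag
firm_b = mining_construction_red_flag(ivta=0.0050, rvsa=0.300)  # receivables flag
firm_c = mining_construction_red_flag(ivta=0.0050, rvsa=0.100)  # no flag
```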

Figure 2: Decision Tree Visualisation
Industry: Mining and Construction

Manufacturing

Inferior performance of all predictive models is achieved when dealing with manufacturing firms. Relatively better results are obtained by boosting techniques. In particular, AdaBoost correctly classifies 55.7% of non-fraud cases and 60.9% of fraud cases, which is only a small improvement over random guessing. This is somewhat surprising, as the size of the sample considered is relatively big and the predictors have shown significant differences between the groups.

The poor predictive performance may be associated with the complexity of the fraud schemes perpetrated within this industry. Although models show bad performance in general, interesting patterns emerge when implementing a decision tree method using all observations, as can be seen in Figure 3. Falsifying reports in this case usually involves the manipulation of three financial items, that is, retained earnings, current assets and total liabilities.

Moreover, decision tree results indicate that auditors should be more sceptical if RETA is higher than -0.292, CATA is lower than 0.347 and TLTE is higher than 1.132, since these three red-flags together are often seen when fraud is being committed in manufacturing firms. In other words, a high probability of accounting fraud is present when (i) retained earnings are higher than -29.2% of total assets; (ii) the proportion of current assets in relation to total assets is lower than 0.347; and (iii) total liabilities exceed shareholders’ equity by more than 13.2%.
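As a sketch of how the three manufacturing red-flags combine, the snippet below computes the ratios from raw statement items and applies the thresholds jointly. The `Statement` class, its field names and the example figures are hypothetical; the ratio definitions (retained earnings, current assets and total liabilities over total assets or equity) are inferred from the items named in the text.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """Hypothetical container for the statement items the rules need."""
    retained_earnings: float
    current_assets: float
    total_assets: float
    total_liabilities: float
    total_equity: float


def manufacturing_red_flag(s: Statement) -> bool:
    """True only when all three conditions described for Figure 3 hold."""
    reta = s.retained_earnings / s.total_assets
    cata = s.current_assets / s.total_assets
    tlte = s.total_liabilities / s.total_equity
    return reta > -0.292 and cata < 0.347 and tlte > 1.132


# Illustrative figures only: RETA = -0.10, CATA = 0.30, TLTE = 1.50.
flagged = Statement(-10.0, 30.0, 100.0, 60.0, 40.0)
# Same firm with more current assets: CATA = 0.50, so the rule is not met.
not_flagged = Statement(-10.0, 50.0, 100.0, 60.0, 40.0)
print(manufacturing_red_flag(flagged), manufacturing_red_flag(not_flagged))
```

Given the weak out-of-sample results reported for this industry, such a rule is better read as a prompt for closer examination than as a classifier.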

Transportation, Communication, Electric, Gas and Sanitary Service

Moderate accuracy is achieved by the proposed methods in this case, with boosted trees and random forests showing the best results. On the one hand, random forests perform well when predicting fraud cases (75%); on the other hand, boosted trees perform better when predicting non-fraud cases (71.9%).

More relevant results can be observed from Figure 4. The most significant predictors of accounting fraud committed in this industry are IVSA and PYCOGS. As such, fraudulent reporting is more likely to be occurring as a result of misstatement of inventory levels and/or accounts payable figures.

As for the case of inventory manipulation, the warning sign is triggered when IVSA is lower than or equal to zero. From basic accounting, it is known that figures of inventory and total sales cannot be negative, as negative values lack economic meaning. The only possibility in this case is therefore that inventories are zero. Consequently, auditors should be cautious when null inventories are part of financial statements, as this may be a sign of accounting fraud.

On the other hand, if inventory levels are not null, then a fraud alarm should be activated when accounts payable represent 28.2% of cost of goods sold, as this may indicate fraudulent activities by firms belonging to the industry at issue.

Figure 3: Decision Tree Visualisation
Industry: Manufacturing
Figure 4: Decision Tree Visualisation
Industry: Transp., Comm., Electric, Gas and Sanitary Serv.

Wholesale Trade and Retail Trade

Moderate accuracy is achieved in the case of trading firms. It can be observed that boosted trees show superior performance when detecting both fraud and non-fraud cases. Decision trees, on the other hand, achieved exceptional results when predicting fraud instances, but poor performance when dealing with non-fraud cases.

Furthermore, decision tree results suggest that fraudulent trading companies mainly manipulate two financial items simultaneously, that is, retained earnings and inventories. Two clear patterns can be identified when accounting fraud is being committed, as shown in Figure 5.

The first pattern is found when the RETA ratio is between 0 and 0.186, and the IVSA ratio is higher than 0.189. That is, moderate positive values of retained earnings and large values of inventory occurring together represent a clear sign of falsified reports.

The second pattern of fraudulent activity is identified when the RETA ratio is higher than 0.186 and, at the same time, the IVSA ratio is higher than 0.335. That is, an exaggerated valuation of earnings compared to assets, together with an exaggerated valuation of inventory compared to sales, is considered irregular in this industry, hence more attention should be paid when facing this situation.
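The two patterns for trading firms differ only in the IVSA cut-off that applies to each RETA range, which a small lookup-style function makes explicit. This is an illustrative sketch: the function name and return labels are hypothetical, and treating the interval bounds 0 and 0.186 as inclusive for the first pattern is an assumption, since the text does not say which side the boundaries fall on.

```python
from typing import Optional


def trade_fraud_pattern(reta: float, ivsa: float) -> Optional[str]:
    """Name the pattern described for Figure 5 that a firm matches, if any.

    reta: retained earnings divided by total assets
    ivsa: inventory divided by total sales
    """
    if 0.0 <= reta <= 0.186 and ivsa > 0.189:
        # Moderate positive retained earnings with large inventory.
        return "pattern 1"
    if reta > 0.186 and ivsa > 0.335:
        # High retained earnings with very large inventory.
        return "pattern 2"
    return None


# Hypothetical ratio values, for illustration only.
for reta, ivsa in [(0.10, 0.20), (0.30, 0.40), (0.30, 0.20), (-0.10, 0.50)]:
    print(reta, ivsa, trade_fraud_pattern(reta, ivsa))
```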

Figure 5: Decision Tree Visualisation
Industry: Wholesale Trade and Retail Trade

Finance, Insurance and Real Estate

Moderate prediction accuracy is obtained again, now in the industry of Finance, Insurance and Real Estate. In general, more advanced models achieved slightly better performance, out of which boosting techniques perform the best. In particular, it can be seen that boosted trees correctly classify 68.2% of non-fraud cases and 63.2% of fraud cases.

Moreover, and as can be seen in Figure 6, fraudulent reporting within financial firms is more likely to occur as a result of manipulation of accounts payable and debt-specific figures. On the one hand, if accounts payable are lower than or equal to zero while long-term debt is higher than zero, then more attention must be paid, as this may be a sign of accounting fraud.

On the other hand, if the ratio of accounts payable to cost of goods sold is higher than 22.82 and, simultaneously, total liabilities are more than 19.05 times shareholders’ equity, then a warning alarm should be activated, as these irregular patterns suggest fraudulent activities.

Services

Poor performance is achieved by machine learning methods when detecting accounting fraud within the service industry. Relatively better performance is attained by tree-based methods, with decision trees showing the most balanced performance between correct positive and negative classifications, that is, between sensitivity and specificity.

In addition, and as depicted in Figure 7, a fairly straightforward trick is usually performed by fraudulent companies in the service industry, namely the understatement of sales figures together with the artificial exaggeration of inventory. More scrutiny is warranted when total sales represent less than 25.6% of total assets, as well as when the proportion of inventory in terms of cost of goods sold is higher than 0.032, as these may indicate that accounting fraud is being conducted.

Public Administration

Exceptional results are obtained in the industry of public administration. Particularly superior performance was accomplished by random forests, as 90% of non-fraudulent cases are correctly classified, as well as 83.8% of fraudulent cases.

Accounting fraud in the industry of public administration is highly related to large values of inventory compared to sales, as can be seen in Figure 8. In particular, special attention should be paid when inventories represent 6.3% or more of total sales, as this is a clear sign of manipulated financial reports.

Figure 6: Decision Tree Visualisation
Industry: Finance, Insurance and Real Estate
Figure 7: Decision Tree Visualisation
Industry: Services
Figure 8: Decision Tree Visualisation
Industry: Public Administration

5 Conclusions

5.1 Conclusions

This study aims to identify signs of accounting fraud that can be used, first, to identify companies that are more likely to be manipulating financial statement reports and, second, to assist the task of examination within the riskier firms by evaluating relevant financial red-flags, so as to efficiently recognise accounting malpractices.

To achieve this, a thorough forensic data analytic approach is proposed that includes all pertinent steps of a data-driven methodology. First, data collection and preparation is required to present pertinent information related to fraud offences and financial statements. Then, an in-depth financial ratio analysis is performed in order to analyse the collected data and to preserve only meaningful variables. Finally, statistical modelling of fraudulent and non-fraudulent instances is performed by implementing several machine learning methods, followed by the extraction of distinctive fraud-risk indicators related to each economic sector.

This study contributes to the improvement of accounting fraud detection in several ways, including the collection of a comprehensive sample of fraud and non-fraud firms concerning all financial industries, an extensive analysis of financial information and significant differences between genuine and fraudulent reporting, the selection of relevant predictors of accounting fraud, contingent analytical modelling to better differentiate between non-fraud and fraud cases, and the identification of industry-specific indicators of falsified records.

The results of the current research suggest there is great potential in detecting falsified accounting records through statistical modelling and analysis of publicly available accounting information. Good performance has been shown by the basic models used as benchmarks (discriminant analysis and logistic regression), and better performance by more advanced methods, including AdaBoost, decision trees, boosted trees and random forests. Results support the usefulness of machine learning models, as they appropriately meet the criteria of accuracy, interpretability and cost-efficiency required for a successful detection system.

The proposed methodology can be easily used by public auditors and regulatory agencies to assess the likelihood of accounting fraud, and can also be adopted in combination with the experience and knowledge of experts to support better examination of accounting reports. In addition, the proposed methodological framework could be of assistance to many other interested parties, including investors, creditors, and financial and economic analysts, amongst others.

5.2 Limitations and Future Work

The collected sample of accounting fraud offences is considered to be only a fragment of the population of companies issuing fraudulent financial statements, as there is no guarantee that firms labelled as non-fraudulent are in fact legitimate observations; they are treated as such until proven otherwise. Also, non-public companies are excluded from this study, as the SEC only has jurisdiction over publicly traded companies.

It is worth noting that accounting fraud is very versatile and, as such, will always evolve in terms of deceptive tricks. Managers will adapt their fraudulent schemes in order to successfully commit fraud, hence the results obtained in this study are exclusively a consequence of the investigation of the collected data, and different conclusions may be reached when considering an alternative source of information.

Lastly, model performance is not ideal in some scenarios, mainly due to sample size and omitted predictive variables. The inclusion of additional information is strongly suggested to help better understand the accounting fraud phenomenon; this may consist of qualitative variables, including corporate governance information and insider trading data, as well as time-evolving features and industry-trending benchmarks. It would not be surprising to discover interesting temporal patterns of stock prices or asset returns when dealing with fraudulent corporations, or to find an extraordinary economic performance of dishonest companies compared to the industry average.

Further work can be done for classification threshold selection. When modelling the accounting fraud phenomenon, it was mentioned that a specific classification threshold was considered to determine fraud and non-fraud categories in several machine learning techniques. Evaluation of different thresholds would be of much interest as it may improve classification accuracy in a cost-sensitive environment, such as the one at issue.
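One way to evaluate alternative thresholds in a cost-sensitive setting is to score each candidate by its total misclassification cost and keep the cheapest. The sketch below is illustrative only: the labels, scores, candidate thresholds and cost values are all hypothetical, and the study does not specify a cost structure.

```python
def total_cost(y_true, scores, threshold, cost_fn=5.0, cost_fp=1.0):
    """Misclassification cost when scores >= threshold are flagged as fraud.

    cost_fn: assumed cost of missing a fraud case (false negative)
    cost_fp: assumed cost of examining a legitimate firm (false positive)
    """
    cost = 0.0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 0:
            cost += cost_fn
        elif label == 0 and predicted == 1:
            cost += cost_fp
    return cost


def best_threshold(y_true, scores, candidates, **costs):
    """Candidate threshold with the lowest total cost (first one on ties)."""
    return min(candidates, key=lambda t: total_cost(y_true, scores, t, **costs))


# Hypothetical labels (1 = fraud) and predicted fraud probabilities.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.10, 0.20, 0.35, 0.45, 0.40, 0.60, 0.55, 0.90]
candidates = [0.3, 0.4, 0.5, 0.6]

costly_misses = best_threshold(y_true, scores, candidates)
equal_costs = best_threshold(y_true, scores, candidates, cost_fn=1.0, cost_fp=1.0)
print(costly_misses, equal_costs)
```

In this toy data the preferred threshold is lower when missed fraud is assumed to be more expensive than an unnecessary examination, which is the trade-off a regulator would need to quantify.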

In addition, different methodologies are suggested to tackle the class imbalance challenge. The method adopted in the present study was based on random under-sampling, but other techniques may improve this part of the process, such as random over-sampling, bootstrap models, cost-modifying methods and algorithm-level approaches, to name a few.
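For reference, random under-sampling in its simplest form discards majority-class observations at random until the classes are the same size. The following is a minimal sketch of that idea and not necessarily the exact procedure used in the study; the function name, the fixed seed and the toy data are assumptions.

```python
import random


def random_under_sample(rows, labels, seed=0):
    """Down-sample every class to the size of the smallest class."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group observations by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    n_min = min(len(group) for group in by_class.values())

    sampled_rows, sampled_labels = [], []
    for label, group in by_class.items():
        # Draw n_min observations without replacement from each class.
        for row in rng.sample(group, n_min):
            sampled_rows.append(row)
            sampled_labels.append(label)
    return sampled_rows, sampled_labels


# Toy data: eight non-fraud firms (label 0) and two fraud firms (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_under_sample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

The cost of this simplicity is that most non-fraud observations are thrown away, which is one reason the alternatives listed above may be worth exploring.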

More advanced machine learning techniques are also recommended. It would be very interesting to implement alternative and more advanced methods, such as support vector machines, neural networks and Bayesian models, as they may be helpful to correctly identify fraudulent firms.

Finally, it is suggested to replicate the proposed methodology in specific economic domains, such as the pharmaceutical industry, health care industry and financial industry, amongst others. The more specialised the industry, the more interesting patterns are likely to be found and, therefore, to be explored and analysed.

Acknowledgements

The authors would like to thank the Securities Class Action Clearinghouse, Stanford Law School, for providing access to the collection of fraud cases considered in this study.

References

  • Abe [2005] S. Abe. Support Vector Machines for Pattern Classification. Springer-Verlag, New York, NY, 2005.
  • Baesens et al. [2015] B. Baesens, V. V. Vlasselaer, and W. Verbeke. Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection. John Wiley and Sons, Inc., 2015.
  • Beasley [1996] M. Beasley. An Empirical Analysis of the Relation between the Board of Director Composition and Financial Statement Fraud. The Accounting Review, 1996.
  • Bell and Carcello [2000] T. Bell and J. Carcello. A Decision Aid for Assessing the Likelihood of Fraudulent Financial Reporting. Auditing: A Journal of Practice & Theory, 2000.
  • Bishop [2006] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
  • Bolton and Hand [2002] R. Bolton and D. Hand. Statistical Fraud Detection: A Review. Statistical Science, 2002.
  • Cerullo and Cerullo [1999] M. Cerullo and V. Cerullo. Using Neural Networks to Predict Financial Reporting Fraud. Computer Fraud and Security, 1999.
  • Chawla et al. [2004] N. V. Chawla, N. Japkowicz, and A. Kotcz. Editorial: Special Issue on Learning from Imbalanced Data Sets. ACM SIGKDD Explorations Newsletter, 2004.
  • Choi and Green [1997] J. Choi and B. Green. Assessing the Risk of Management Fraud Through Neural Network Technology. Auditing, 1997.
  • Fanning and Cogger [1998] K. Fanning and K. Cogger. Neural Network Detection of Management Fraud Using Published Financial Data. International Journal of Intelligent Systems in Accounting, Finance & Management, 1998.
  • Feroz et al. [2000] E. H. Feroz, T. M. Kwon, V. Pastena, and K. Park. The Efficacy of Red Flags in Predicting the SEC’s Targets: An Artificial Neural Networks Approach. International Journal of Intelligent Systems in Accounting, Finance & Management, 2000.
  • Gupta and Gill [2012] R. Gupta and N. S. Gill. Prevention and Detection of Financial Statement Fraud - An Implementation of Data Mining Framework. International Journal of Advanced Computer Science and Applications, 2012.
  • Hansen et al. [1996] J. V. Hansen, J. B. McDonald, J. W. F. Messier, and T. B. Bell. A Generalized Qualitative-Response Model and the Analysis of Management Fraud. Management Science, 1996.
  • Hollander et al. [2013] M. Hollander, D. A. Wolfe, and E. Chicken. Nonparametric Statistical Methods: Third Edition. Wiley Series in Probability and Statistics. Wiley, 2013.
  • Hoogs et al. [2007] B. Hoogs, T. Kiehl, C. Lacomb, and D. Senturk. A Genetic Algorithm Approach to Detecting Temporal Patterns Indicative of Financial Statement Fraud. Intelligent Systems in Accounting, Finance and Management, 2007.
  • James et al. [2013] G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning. Springer-Verlag New York, 2013.
  • Kaminski et al. [2004] K. A. Kaminski, T. S. Wetzel, and L. Guan. Can Financial Ratios Detect Fraudulent Financial Reporting? Managerial Auditing Journal, 2004.
  • Kendall [1955] M. G. Kendall. Rank Correlation Methods. Hafner Publishing Co, 1955.
  • Kirkos et al. [2007] E. Kirkos, C. Spathis, and Y. Manolopoulos. Data Mining Techniques for the Detection of Fraudulent Financial Statements. Expert Systems with Applications, 2007.
  • Kotsiantis et al. [2006] S. Kotsiantis, E. Koumanakos, D. Tzelepis, and V. Tampakas. Forecasting Fraudulent Financial Statements Using Data Mining. International Journal of Computational Intelligence, 2006.
  • Kwon and Feroz [1996] T. M. Kwon and E. Feroz. A Multilayered Perceptron Approach to Prediction of the SEC’s Investigation Targets. IEEE Transactions on Neural Networks, 1996.
  • Mokhiber and Weissman [2005] R. Mokhiber and R. Weissman. On The Rampage: Corporate Power and the Destruction of Democracy. Corporate Focus Series. Common Courage Press, 2005.
  • Ngai et al. [2011] E. W. Ngai, Y. Hu, Y. Wong, Y. Chen, and X. Sun. The Application of Data Mining Techniques in Financial Fraud Detection: A Classification Framework and an Academic Review of Literature. Decision Support Systems, 2011.
  • Pai et al. [2011] P. F. Pai, M. F. Hsu, and M. C. Wang. A Support Vector Machine-Based Model for Detecting Top Management Fraud. Knowledge-Based Systems, 2011.
  • Pedregosa et al. [2011] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 2011.
  • Persons [1995] O. S. Persons. Using Financial Statement Data to Identify Factors Associated with Fraudulent Financial Reporting. Journal of Applied Business Research, 1995.
  • Ravisankar et al. [2011] P. Ravisankar, V. Ravi, G. R. Rao, and I. Bose. Detection of Financial Statement Fraud and Feature Selection Using Data Mining Techniques. Decision Support Systems, 2011.
  • Schilit and Perler [2010] H. M. Schilit and J. Perler. Financial Shenanigans. McGraw-Hill, 2010.
  • Sheskin [2003] D. J. Sheskin. Handbook of Parametric and Nonparametric Statistical Procedures: Third Edition. CRC Press, 2003.
  • Song et al. [2014] X. P. Song, Z. H. Hu, J. G. Du, and Z. H. Sheng. Application of Machine Learning Methods to Risk Assessment of Financial Statement Fraud: Evidence from China. Journal of Forecasting, 2014.
  • Spathis et al. [2002] C. Spathis, M. Doumpos, and C. Zopounidis. Detecting Falsified Financial Statements: A Comparative Study Using Multicriteria Analysis and Multivariate Statistical Techniques. The European Accounting Review, 2002.
  • Summers and Sweeney [1998] S. Summers and J. Sweeney. Fraudulently Misstated Financial Statements and Insider Trading: An Empirical Analysis. The Accounting Review, 1998.
  • Swartz [2003] M. Swartz. Power Failure: The Inside Story of the Collapse of Enron. Doubleday, 2003.
  • Tu [1996] J. Tu. Advantages and Disadvantages of Using Artificial Neural Networks Versus Logistic Regression for Predicting Medical Outcomes. Journal of Clinical Epidemiology, 1996.
  • Van Vlasselaer et al. [2015] V. Van Vlasselaer, T. Eliassi-Rad, L. Akoglu, M. Snoeck, and B. Baesens. GOTCHA! Network-based Fraud Detection for Social Security Fraud. Management Science, 2015.