
Federated Continual Learning to Detect Accounting Anomalies in Financial Auditing

by Marco Schreyer, et al.
Rutgers University; University of St. Gallen

The International Standards on Auditing require auditors to collect reasonable assurance that financial statements are free of material misstatement. At the same time, a central objective of Continuous Assurance is the real-time assessment of digital accounting journal entries. Recently, driven by the advances in artificial intelligence, Deep Learning techniques have emerged in financial auditing to examine vast quantities of accounting data. However, learning highly adaptive audit models in decentralised and dynamic settings remains challenging. It requires the study of data distribution shifts over multiple clients and time periods. In this work, we propose a Federated Continual Learning framework enabling auditors to learn audit models from decentral clients continuously. We evaluate the framework's ability to detect accounting anomalies in common scenarios of organizational activity. Our empirical results, using real-world datasets and combined federated continual learning strategies, demonstrate the learned model's ability to detect anomalies in audit settings of data distribution shifts.





1 Introduction

The International Standards on Auditing (ISA) require auditors to obtain reasonable assurance that financial statements are free from material misstatements, whether caused by error or fraud AICPA (2002); IFAC (2009). The term fraud refers to ‘the abuse of one’s occupation for personal enrichment through the deliberate misuse of an organisation’s resources or assets’ Wells (2017). According to the Association of Certified Fraud Examiners (ACFE), organisations lose 5% of their annual revenues due to fraud ACFE (2020) (the ACFE study encompasses an analysis of 2,110 cases of occupational fraud surveyed between January 2020 and September 2021 in 133 countries). The ACFE highlights that respondents experienced a median loss of USD 100K in the first 7-12 months after a fraud scheme begins. Detecting fraud and abuse quickly is critical since the longer a scheme remains undetected, the more severe its financial, reputational, and organisational impact PwC (2022).

In the last decade, the ongoing digital transformation has fundamentally changed the nature, recording, and volume of audit evidence Yoon et al. (2015). Nowadays, organizations record vast quantities of digital accounting records, referred to as Journal Entries (JEs), in Enterprise Resource Planning (ERP) systems Grabski et al. (2011). For the audit profession, this unprecedented exposure to large volumes of accounting records offers new opportunities to obtain audit-relevant insights Appelbaum (2016). Lately, accounting firms also foster the development of Deep Learning (DL) capabilities LeCun et al. (2015) to learn advanced models to audit digital journal entry data. To augment a human auditor’s capabilities, DL models are applied in various audit tasks, such as accounting anomaly detection Schultz and Tropmann-Frick (2020); Zupan et al. (2020); Nonnenmacher et al. (2021), illustrated in Fig. 1, audit sampling Schreyer et al. (2020, 2021), or notes analysis Sifa et al. (2019); Ramamurthy et al. (2021). However, learning DL-enabled models in auditing is still in its infancy, exhibiting two main limitations. First, most of today’s audit models are trained from scratch on stationary client data of the in-scope audit period, e.g., a financial quarter or year Sun (2019). This disregards that organizations operate in environments in which activities rapidly and dynamically evolve Vasarhelyi and Halper (1991); Vasarhelyi et al. (2018); Hemati et al. (2021): new business processes, models, or departments are constantly introduced, while current ones are redesigned or discontinued. Second, audit models are often trained centrally on data of a single client, e.g., the organization ‘in-scope’ of the audit Hemati et al. (2021), although large audit firms audit multiple organisations operating in the same industry Hoitash et al. (2006). Such ‘peer audit clients’ Kogan and Yin (2021) are affected by similar economic and societal factors, e.g., supply chains, market cycles, or fiscal policy Chan et al. (2004).

Figure 1: Overview of the autoencoder (AEN) based accounting anomaly detection setting Schreyer et al. (2017); Schultz and Tropmann-Frick (2020). Journal entries corresponding to a high reconstruction error are selected for detailed audit procedures.

In 1995, Thrun proposed a machine learning setting in which a model incrementally learns from a stream of experiences Thrun (1995). The underlying idea of Continual Learning (CL) refers to a progressive model adaptation once new information becomes available Parisi et al. (2019). Furthermore, in 2017, McMahan et al. proposed Federated Learning (FL), a learning setting enabling distributed clients to collaboratively train models under the orchestration of a central server McMahan et al. (2017a). A key idea of FL is creating learning synergies while preserving data privacy Dwork et al. (2006). Ultimately, the Federated Continual Learning (FCL) of highly adaptive audit models can be viewed as a central objective of Continuous Assurance Vasarhelyi and Halper (1991); Vasarhelyi et al. (2018). However, learning such models in dynamic and decentralised audit settings remains challenging due to the stability-plasticity dilemma Grossberg (1988). It requires coping with two types of data distribution shifts Gama et al. (2004); Casado et al. (2022); Jothimurugesan et al. (2022), as described in the following:


  • A temporal distribution shift between time steps $t$ and $t{+}1$ denotes a change in distributions $P_t(x) \neq P_{t+1}(x)$. In the CL setting, such shifts potentially result in the loss of a model’s ability to perform on $\mathcal{D}_t$ after learning from $\mathcal{D}_{t+1}$, referred to as catastrophic forgetting Kirkpatrick et al. (2017).

  • A client distribution shift between clients $k$ and $k'$ denotes a divergence of the distributions $P^{k}(x) \neq P^{k'}(x)$. In the FL setting, such shifts cause the divergence of client models that eventually results in non-convergence of the server model, referred to as model interference Karimireddy et al. (2020).
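As an illustration of the two shift types above, the following minimal Python sketch (the attribute names and toy data are our own, not from the paper) measures the total variation distance between the empirical distributions of a categorical journal-entry attribute in two experiences; the same measure applies between two clients for a client distribution shift:

```python
from collections import Counter

def attribute_distribution(entries, attribute):
    """Empirical distribution of a categorical journal-entry attribute."""
    counts = Counter(e[attribute] for e in entries)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

# Temporal shift: the same client, two consecutive experiences with
# reversed ledger usage (toy data for illustration only).
experience_t  = [{"ledger": "cash"}] * 90 + [{"ledger": "payables"}] * 10
experience_t1 = [{"ledger": "cash"}] * 10 + [{"ledger": "payables"}] * 90

shift = total_variation(
    attribute_distribution(experience_t, "ledger"),
    attribute_distribution(experience_t1, "ledger"),
)
```

A value near 0 indicates stable posting behaviour; a value near 1 indicates a strong shift of the kind that triggers forgetting or interference.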

In this work, we investigate the practical application of FCL in financial auditing to improve the assurance of financial statements. In summary, we present the following contributions:


  • We propose a novel federated continual learning framework that enables auditors to incrementally learn industry-specific models from distributed data of multiple audit clients.

  • We demonstrate that the framework enables audit models to retain previously learned knowledge that is still relevant and replace obsolete knowledge with novel information.

  • We conduct an extensive evaluation of scenarios of common organizational activity patterns using real-world datasets to illustrate the setting’s benefit in detecting accounting anomalies.

Ultimately, we view FCL as a promising, under-explored learning setting in contemporary auditing. The remainder of this work is structured as follows: In Section 2, we provide an overview of related work. Section 3 follows with a description of the proposed framework to learn federated and privacy-preserving models from vast quantities of JE data. The experimental setup and results are outlined in Section 4 and Section 5. In Section 6, the work concludes with a summary.

Figure 2: Schematic overview of the proposed Federated Continual Learning (FCL) setting Yoon et al. (2021) (left) to detect accounting anomalies and simulated organisational activity scenarios (right).

2 Related Work

The application of ML in financial audits triggered a sizable body of research by academia (Appelbaum, 2016; Sun, 2019) and practitioners (Sun and Vasarhelyi, 2017; Dickey et al., 2019). This section presents our literature study focusing on the unsupervised, federated, and continual learning of accounting data representations.

Representation Learning denotes a machine learning technique that allows systems to discover relevant data features to solve a given task Bengio et al. (2013). Nowadays, most ML methods used in financial audits depend on ‘human’ engineered data representations Cho et al. (2020). Such techniques encompass Naive Bayes classification Bay et al. (2006), network analysis McGlohon et al. (2009), univariate and multivariate attribute analysis Argyrou (2012), cluster analysis Thiprungsri and Vasarhelyi (2011), transaction log mining Khan et al. (2009), and business process mining Jans et al. (2011); Werner and Gehrke (2015). With the advent of Deep Learning (DL), representation learning techniques have emerged in financial audits Sun (2019); Nonnenmacher and Gómez (2021); Ramamurthy et al. (2021). Nowadays, the application of DL in auditing JEs encompasses autoencoder neural networks Schultz and Tropmann-Frick (2020), adversarial autoencoders Schreyer et al. (2019a), and variational autoencoders Zupan et al. (2020). Lately, self-supervised learning techniques have been proposed to learn representations for multiple audit tasks Schreyer et al. (2021).

Continual Learning (CL) describes a setting where a model continuously trains on a sequence of experiences Thrun (1995). The challenge of forgetting in model learning has been studied extensively French (1999); Díaz-Rodríguez et al. (2018); Toneva et al. (2018); Lesort et al. (2021). Rehearsal techniques replay samples from past experiences Rolnick et al. (2018); Isele and Cosgun (2018); Chaudhry et al. (2019) to conduct knowledge distillation Rebuffi et al. (2017), while generative rehearsal techniques replay synthesised samples Shin et al. (2017). Recent methods use gradients of previous tasks to mitigate gradient interference Lopez-Paz and Ranzato (2017); Chaudhry et al. (2018). Regularisation techniques consolidate previously acquired knowledge and restrict parameter updates in model optimization: LwF Li and Hoiem (2017) uses knowledge distillation to regularize parameter updates; EWC Kirkpatrick et al. (2017) directly regularises the parameters based on their previous task importance; in SI Zenke et al. (2017), regularization is applied in a separate parameter optimization step. Dynamic architecture techniques prevent forgetting by parameter reuse or increase Rusu et al. (2016); Mendez and Eaton (2021). Lately, CL has been deployed in a variety of application scenarios, such as healthcare Lenga et al. (2020), machine translation Barrault et al. (2020), and financial auditing Hemati et al. (2021).
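As an example of the regularisation family discussed above, the following is a minimal sketch of the EWC penalty term in plain Python (the parameter and Fisher values are hypothetical; real implementations operate on network weight tensors and estimate the Fisher information from gradients):

```python
def ewc_penalty(params, anchor_params, fisher, lam=1.0):
    """EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    `anchor_params` are the parameters learned on the previous task and
    `fisher` encodes each parameter's importance for that task.
    """
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )

# A parameter that was important on the previous task (high Fisher value)
# is penalised more strongly for drifting away from its anchor.
theta        = [1.0, 2.0]
theta_anchor = [0.0, 0.0]
fisher       = [10.0, 0.1]  # first parameter mattered on the previous task
penalty = ewc_penalty(theta, theta_anchor, fisher, lam=1.0)
```

During continual training, this penalty is simply added to the current task's loss, pulling important parameters back toward their previous-task values while leaving unimportant ones free to adapt.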

Federated Learning (FL) enables entities to collaboratively train ML models under the orchestration of a central trusted entity under differential privacy Kairouz et al. (2019); Dwork et al. (2006). FedAvg McMahan et al. (2017a) aggregates the clients’ model parameters by computing a weighted average based on the number of training samples. PATE Papernot et al. (2016) aggregates knowledge transferred from an ensemble of teacher models, trained on data disjoint from the student model, to the student. FedAdagrad, FedYogi, and FedAdam Reddi et al. (2020) use adaptive optimizer variations to improve FL convergence. FedProx Li et al. (2020) trains the local model with a proximal term that restricts the updates to the central model. FedCurv Shoham et al. (2019) aims to minimise the model divergence across clients by adopting a modified version of EWC Kirkpatrick et al. (2017). Recent works Yurochkin et al. (2019); Wang et al. (2020) also introduce Bayesian non-parametric aggregation policies. Nowadays, FL is applied in sensitive application scenarios, such as healthcare Dankar and El Emam (2012), financial risk modeling Zheng et al. (2020), and financial auditing Schreyer et al. (2022b).
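The FedAvg aggregation rule mentioned above reduces to a sample-count-weighted average of client parameters. A minimal sketch in plain Python (toy parameter vectors of our own; real implementations average per-layer weight tensors):

```python
def fedavg(client_params, client_sizes):
    """FedAvg: weighted average of client parameter vectors by sample count."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * size / total
            for params, size in zip(client_params, client_sizes))
        for i in range(dim)
    ]

# A client holding 3x the data pulls the central model 3x as strongly.
central = fedavg([[1.0, 0.0], [5.0, 4.0]], [30, 10])
```

Variants such as FedProx or FedYogi keep this server-side average but change how the decentral client updates are computed or applied.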

Federated Continual Learning (FCL) addresses the challenge of distributed distribution shifts over time and has only been preliminarily studied Casado et al. (2022). Scaffold Karimireddy et al. (2020) applies variance reduction in the local updates to correct for a client’s concept drifts. FedWeIT Yoon et al. (2021) decomposes a model into globally shared and sparse task-specific parameters to prevent inter-client interference. CDA-FedAvg Casado et al. (2022) extends FedAvg by distribution-based drift detection and a long- and short-term memory for rehearsal. VER Park et al. (2021) conducts an additional server-side rehearsal-based training using a VAE’s Kingma and Welling (2013) embedding statistics or actual embeddings. FedDrift Jothimurugesan et al. (2022) uses local drift detection and hierarchical clustering to learn multiple global concept models. To the best of our knowledge, this work presents the first step towards the federated continual learning of highly adaptive audit models in financial auditing.

3 Methodology

We consider an unsupervised audit learning setting where a dataset $\mathcal{D} = \{x_1, x_2, \dots, x_N\}$ formally defines a population of $N$ JEs. Each entry, denoted by $x_i$, consists of categorical accounting attributes and numerical accounting attributes. The individual attributes encompass the journal entry details, such as posting date, amount, or general ledger. Each JE is generated by a particular organizational activity $a$, e.g., an organizational unit, entity, or business process. Furthermore, $\mathcal{A}$ denotes the set of all activities. In our audit setting, we distinguish two types of ‘anomalous’ JEs Breunig et al. (2000) that auditors aim to detect Schultz and Tropmann-Frick (2020); Zupan et al. (2020); Nonnenmacher et al. (2021):


  • Global Anomalies correspond to entries that exhibit unusual or rare individual attribute values, e.g., rarely used ledgers or unusual posting times. Such anomalies often correspond to unintentional mistakes, are comparably simple to detect, and possess a high error risk.

  • Local Anomalies correspond to entries that exhibit unusual attribute value correlations, e.g., rare co-occurrences of ledgers and posting types. Such anomalies might correspond to intentional deviations, are comparably difficult to detect, and possess a high fraud risk.
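A frequency-based toy example (our own illustration, not the paper's detection method) makes the distinction concrete: a global anomaly is visible from a single rare attribute value, whereas a local anomaly only shows up in a rare combination of individually common values:

```python
from collections import Counter

# Toy population of (general ledger, posting type) journal-entry pairs.
entries = (
    [("revenue", "invoice")] * 48
    + [("cash", "payment")] * 48
    + [("revenue", "payment")] * 3   # local: common values, rare combination
    + [("suspense", "invoice")] * 1  # global: rare individual ledger value
)

ledger_freq = Counter(ledger for ledger, _ in entries)
type_freq   = Counter(ptype for _, ptype in entries)
pair_freq   = Counter(entries)

def is_global_anomaly(entry, min_count=2):
    """Some individual attribute value is rare on its own."""
    ledger, ptype = entry
    return ledger_freq[ledger] < min_count or type_freq[ptype] < min_count

def is_local_anomaly(entry, min_count=5):
    """All attribute values are common, but their combination is rare."""
    return not is_global_anomaly(entry) and pair_freq[entry] < min_count
```

Counting single attribute values suffices for global anomalies; local anomalies require modelling attribute co-occurrences, which is why they carry the higher fraud risk and motivate the learned AEN model below.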

Inspired by a human auditor’s learning process, we introduce an audit framework comprising three interacting learning settings, namely (i) representation, (ii) continual, and (iii) federated learning. Figure 2 illustrates the interactions of the settings, which are described in the following.

First, in the representation learning setting, Autoencoder Networks (AENs) Hinton and Salakhutdinov (2006) are trained to learn a comprehensive model of a given data distribution. In general, the AEN architecture comprises two non-linear functions, usually neural networks, referred to as encoder and decoder. The encoder $f_\phi$, with parameters $\phi$, learns a representation $z = f_\phi(x)$ of a given input $x$, where $\dim(z) < \dim(x)$. In an attempt to achieve $\hat{x} \approx x$, the decoder $g_\psi$, with parameters $\psi$, learns a reconstruction $\hat{x} = g_\psi(z)$ of the original input. Throughout the training process, the AEN model is optimized to learn a set of encoder and decoder parameters $\phi, \psi$, as defined by Hawkins et al. (2002):

$$\phi^{*}, \psi^{*} = \arg\min_{\phi, \psi} \sum_{i=1}^{N} \big\| x_i - g_\psi(f_\phi(x_i)) \big\|,$$

where $\phi^{*}, \psi^{*}$ denote the learned optimal model parameters and $x_i$ a single data observation. Upon successful training, the learning success is quantified by the model’s reconstruction error $\varepsilon_i = \| x_i - \hat{x}_i \|$ given $x_i$ and its reconstruction $\hat{x}_i$. In our audit setting, similar to Hawkins et al. Hawkins et al. (2002), we interpret a JE’s reconstruction error magnitude as its deviation from regular posting patterns. Journal entries that exhibit a high $\varepsilon_i$ are selected for a detailed audit Schultz and Tropmann-Frick (2020); Schreyer et al. (2022a), as shown in Fig. 1.
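The reconstruction-error screening step can be sketched in a few lines of plain Python (the encoded entries and reconstructions are toy vectors of our own; in the paper's setting they come from the trained AEN):

```python
def reconstruction_errors(inputs, reconstructions):
    """Squared reconstruction error per journal entry."""
    return [
        sum((x - r) ** 2 for x, r in zip(entry, recon))
        for entry, recon in zip(inputs, reconstructions)
    ]

def select_for_audit(errors, threshold):
    """Indices of entries whose reconstruction error exceeds the threshold."""
    return [i for i, e in enumerate(errors) if e > threshold]

inputs          = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reconstructions = [[0.9, 0.1], [0.1, 0.9], [0.2, 0.1]]  # last entry poorly reconstructed
errors  = reconstruction_errors(inputs, reconstructions)
flagged = select_for_audit(errors, threshold=0.5)
```

Entries the model reconstructs poorly deviate from the regular posting patterns it has learned, so exactly those indices are handed to the auditor for detailed procedures.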

Second, in the continual learning setting, an AEN model learns from data that is observed as a stream of disjoint experiences $\mathcal{E}_1, \dots, \mathcal{E}_T$ Thrun (1995). The data $\mathcal{D}_t$ of the $t$-th experience encompasses data instances of different activities, where an individual activity $a$ corresponds to data instances $\mathcal{D}_t^{a}$. With progressing experiences, the parameters of a given AEN model are then continuously optimized, as defined by French (1999):

$$\phi_t^{*}, \psi_t^{*} = \arg\min_{\phi, \psi} \sum_{x_i \in \mathcal{D}_t} \big\| x_i - g_\psi(f_\phi(x_i)) \big\|, \quad \text{initialised at } \phi_{t-1}^{*}, \psi_{t-1}^{*},$$

where $\phi_{t-1}^{*}, \psi_{t-1}^{*}$ denote the optimal parameters of the previous experience $\mathcal{E}_{t-1}$. In our audit setting, each experience dataset $\mathcal{D}_t$ exhibits JEs of activities $a \in \mathcal{A}_t$, where $\mathcal{A}_t \subseteq \mathcal{A}$, and the experience datasets may be non-iid. Throughout the learning process, a model has access to the data of an activity only during the time of the $t$-th experience, although data instances of an activity generated by a particular organizational process might occur in multiple experiences.

Third, in the federated learning setting, a central AEN model is collaboratively learned by $K$ decentral clients and coordinated by a server Kairouz et al. (2019). In each experience, a single client $k$ has access to its private data subset $\mathcal{D}_t^{k}$. To initiate the learning, a synchronous model optimisation protocol is established that proceeds in rounds $r = 1, \dots, R$. Each round, the server broadcasts its central AEN model $\theta_r$, where $\theta = (\phi, \psi)$ collects the AEN parameters, to a selection of available clients. Subsequently, each client conducts a decentral training of the central model on its data subset $\mathcal{D}_t^{k}$. Upon training completion, the clients send their updated models $\theta_{r+1}^{k}$ to the server. The server then aggregates the parameters of the decentral models to create an updated central AEN model $\theta_{r+1}$, as defined by McMahan et al. (2017a):

$$\theta_{r+1} = \sum_{k=1}^{K} \frac{n_k}{n} \, \theta_{r+1}^{k},$$

where $n_k$ denotes the number of data observations privately held by client $k$ and $n = \sum_{k} n_k$. In our audit setting, a central audit firm coordinates the learning, as illustrated in Fig. 2. We distinguish two categories of decentral FCL clients: first, an audit client ‘in-scope’ or subject of the audit; second, a set of collaborating ‘peer’ clients contributing to the learning of a central audit model. In each experience, the central audit model is applied to detect anomalies in the audit client’s data.
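One synchronous round of the protocol above (broadcast, decentral training, weighted aggregation) can be sketched as follows; the local training step is a deliberately simplified stub of our own, standing in for the decentral AEN optimisation:

```python
def local_update(central_params, client_data_mean, lr=0.5):
    """Stub for decentral training: nudge parameters toward the client's data."""
    return [p + lr * (client_data_mean - p) for p in central_params]

def federated_round(central_params, clients):
    """One synchronous round: broadcast, local training, weighted aggregation.

    `clients` is a list of (data_mean, n_k) pairs, where n_k is the number
    of data observations privately held by client k.
    """
    updates, sizes = [], []
    for data_mean, n_k in clients:
        updates.append(local_update(central_params, data_mean))  # broadcast + train
        sizes.append(n_k)
    total = sum(sizes)
    return [                                                     # server aggregation
        sum(u[i] * n / total for u, n in zip(updates, sizes))
        for i in range(len(central_params))
    ]

# An audit client (10 samples) and two peer clients with shifted data.
central = federated_round([0.0], [(1.0, 10), (2.0, 20), (3.0, 10)])
```

The private data never leaves a client; only parameter updates are exchanged, which is what preserves data privacy in the audit setting.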

4 Experimental Setup

Figure 3: Example [Scenario 3] audit client and collaborating clients configuration (left). Activity (city department) reconstruction errors per strategy and progressing experiences (right).

In this section, we describe the (i) dataset, (ii) learning setting, and (iii) experimental details to detect accounting anomalies in an FCL setting. We provide additional experimental details in Appendix A.

City Payments Datasets: To evaluate the FCL capabilities and allow for reproducibility, we use three publicly available datasets of real-world payment data: the City of Philadelphia, the City of Chicago, and the City of York payment records. The datasets exhibit high similarity to ERP accounting data, e.g., typical manual payments or payment runs (SAP T-Codes: F-35, F-110). Each tabular dataset comprises categorical and numerical attributes described in the Appendix.

Federated Continual Learning: We establish a systematic FCL setting encompassing a central coordinating audit firm, a single decentral audit client, and three decentral collaborating clients. The client models are learned in parallel, each over a different data stream of continual experiences Park et al. (2021). Each experience exhibits several learning rounds, while each round encompasses 1,000 training iterations. The data of an individual client experience exhibits payments of multiple activities. A single experience activity corresponds to 1,000 randomly sampled payments generated by a particular city department. The experience streams simulate data distribution shifts according to three common client activity scenarios. Each scenario corresponds to a different sparse client activity, where payment data of individual activities is randomly observed. The scenarios, illustrated in Fig. 2, are described in the following:

  • [Scenario 1] simulates a situation where an audit client exhibits sparse payment activities, e.g., due to discontinued or periodic business processes. The resulting distribution shifts eventually yield catastrophic forgetting. The activities of the collaborating clients remain constant.

  • [Scenario 2] simulates a situation where collaborating clients exhibit sparse payment activities, e.g., due to a carve-out or merger. The resulting distribution shifts eventually yield severe client model interference affecting the audit client. The activities of the audit client remain constant.

  • [Scenario 3] simulates a situation where both the audit client and the collaborating clients exhibit sparse payment activities. The scenario simulates a common setting in which client activities opt in and opt out with progressing experiences, causing model interference and forgetting.
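The sparse-activity experience streams described above can be sketched as follows (the activity labels, schedule, and payload placeholders are our own illustrative choices, not the paper's exact configuration):

```python
def build_stream(activities, schedule):
    """Experience stream where each experience observes a sparse activity subset.

    `schedule[t]` is the set of activities active during experience t; the
    payload string stands in for that activity's sampled payments.
    """
    return [
        {a: f"payments[{a}]" for a in activities if a in active}
        for active in schedule
    ]

# Scenario 1 sketch: the audit client's activity "B" is discontinued after
# the first experience and re-appears later (a periodic business process),
# so a naively fine-tuned model risks forgetting how to reconstruct "B".
audit_stream = build_stream(
    activities=["A", "B"],
    schedule=[{"A", "B"}, {"A"}, {"A"}, {"A", "B"}],
)
```

Scenarios 2 and 3 follow the same pattern, with the sparse schedules applied to the collaborating clients, or to the audit client and collaborating clients simultaneously.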

Accounting Anomaly Detection: In each experience, we inject 20 global and 20 local anomalies into each payment activity of the audit client (approximately 4% of each payment activity); to sample both anomaly classes, we use the Faker library Faraglia and Other Contributors. To quantitatively assess the anomaly detection capability of the FCL audit setting, we determine the audit client’s average anomaly detection precision over its stream of learning experiences Schreyer et al. (2022a).
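The injection step can be sketched with the standard library alone (this is our own simplified stand-in for the Faker-based sampling; the ledger names and amount ranges are hypothetical):

```python
import random

def inject_anomalies(payments, n_global, n_local, seed=42):
    """Inject synthetic anomalies into a list of (ledger, amount) payments.

    Global anomalies use a ledger value unseen in the regular data; local
    anomalies recombine observed ledgers with amounts far outside the
    regular range, i.e., a rare combination of otherwise common values.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    ledgers = sorted({ledger for ledger, _ in payments})
    anomalies = []
    for _ in range(n_global):  # rare individual attribute value
        anomalies.append(("UNSEEN_LEDGER", rng.uniform(10.0, 100.0)))
    for _ in range(n_local):   # rare attribute value combination
        anomalies.append((rng.choice(ledgers), rng.uniform(1e6, 1e7)))
    return payments + anomalies

regular  = [("cash", 25.0), ("payables", 40.0)] * 10
injected = inject_anomalies(regular, n_global=2, n_local=2)
```

Detection precision is then measured by how many of the entries flagged via high reconstruction error belong to the injected set.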

5 Experimental Results

In this section, we present the experimental results of scenarios 1-3. Our evaluation encompasses several federated (FL) and continual (CL) learning strategies. For each strategy, we report the audit client’s average global anomaly and local anomaly detection precision.

Figure 4: Audit client [Scenario 1] anomaly detection results comparing the CL techniques Sequential Fine-Tuning, Replay Rebuffi et al. (2017), LwF Li and Hoiem (2017), and EWC Kirkpatrick et al. (2017) to mitigate catastrophic forgetting.
Figure 5: Audit client [Scenario 2] anomaly detection results comparing the FL techniques FedAvg McMahan et al. (2017b), FedYogi Reddi et al. (2020), FedProx Li et al. (2020), and Scaffold Karimireddy et al. (2020) to mitigate the client’s model interference.

In Fig. 4, we report the [Scenario 1] audit client anomaly detection results using FedAvg McMahan et al. (2017b) model aggregation. For all datasets, the CL strategies Replay Rebuffi et al. (2017), LwF Li and Hoiem (2017), and EWC Kirkpatrick et al. (2017) outperform the from-scratch Hemati et al. (2021) and sequential fine-tuning baselines. The results show the ability of the CL strategies to mitigate catastrophic forgetting in a sparse audit client setting.

In Fig. 5, we report the [Scenario 2] audit client anomaly detection results conducting sequential model fine-tuning. For all datasets, the FL strategies FedYogi Reddi et al. (2020), FedProx Li et al. (2020), and Scaffold Karimireddy et al. (2020) outperform the FedAvg McMahan et al. (2017b) baseline. In addition, the FL strategies mitigate model interference, closing the gap to Single Schreyer et al. (2022b) client learning in a sparse collaborating client setting.

Learning Strategies                                       City of Philadelphia   City of Chicago   City of York
FedAvg McMahan et al. (2017b)                             -                      -                 -
FedProx Li et al. (2020) + Replay Rebuffi et al. (2017)   -                      -                 -
FedProx Li et al. (2020) + LwF Li and Hoiem (2017)        -                      -                 -
FedProx Li et al. (2020) + EWC Kirkpatrick et al. (2017)  -                      -                 -
Scaffold Karimireddy et al. (2020)                        -                      -                 -

*Variances originate from training using five distinct random seeds of model parameter initialisation and anomaly sampling.

Table 1: Audit client [Scenario 3] anomaly detection results comparing the combination of CL and FL techniques to mitigate catastrophic forgetting and the client’s model interference.

In Tab. 1, we report the [Scenario 3] audit client anomaly detection results of combining FL and CL strategies. The strategies yield a superior precision for both anomaly classes when compared to the FedAvg-Sequential baseline. The obtained results also suggest that certain strategies are advantageous in detecting particular anomaly classes. For smaller (larger) AEN models, we observe that FedProx (Scaffold) learning successfully regulates dissimilar model convergence. The results provide initial empirical evidence, as illustrated in Fig. 3, that the strategies mitigate distributed-continual distribution shifts in a ‘real-world’ audit setting.

6 Conclusion

In this work, we proposed financial auditing as a novel application area of Federated Continual Learning (FCL). We demonstrated the benefits of such a learning setting by detecting accounting anomalies under different data distribution shifts usually observable in organizational activities. We believe that FCL will enable the audit profession to enhance its assurance services, thereby contributing to the integrity of financial markets. In future work, we aim to investigate (i) novel learning strategies, (ii) potential attack scenarios, and (iii) the addition of differential privacy.

The research conducted by Marco Schreyer was funded by the Mobi.Doc mobility grant for doctoral students of the University of St.Gallen (HSG) under project no. 1031606.


  • [1] ACFE (2020) Report to the Nations - Global Study on Occupational Fraud and Abuse. Association of Certified Fraud Examiners (ACFE). Cited by: §1.
  • [2] AICPA (2002) Statement on Auditing Standards No. 99.: Consideration of Fraud in a Financial Statement Audit. American Institute of Certified Public Accountants (AICPA). Cited by: §1.
  • [3] D. Appelbaum (2016) Securing Big Data Provenance for Auditors: The Big Data Provenance Black Box as Reliable Evidence. Journal of Emerging Technologies in Accounting 13 (1), pp. 17–36. Cited by: §1, §2.
  • [4] A. Argyrou (2012) Auditing Journal Entries Using Self-Organizing Map. In Proceedings of the Eighteenth Americas Conference on Information Systems. Cited by: §2.
  • [5] L. Barrault, M. M. Biesialska, M. Ruiz Costa-Jussà, F. Bougares, and O. Galibert (2020) Findings of the first shared task on lifelong learning machine translation. In EMNLP 2020, Fifth Conference on Machine Translation: November 19-20, 2020, online: proceedings of the conference, pp. 56–64. Cited by: §2.
  • [6] S. Bay, K. Kumaraswamy, M. G. Anderle, R. Kumar, D. M. Steier, A. Blvd, and S. Jose (2006) Large Scale Detection of Irregularities in Accounting Data. In Sixth International Conference on Data Mining, pp. 75–86. Cited by: §2.
  • [7] Y. Bengio, A. Courville, and P. Vincent (2013) Representation learning: a review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35 (8), pp. 1798–1828. Cited by: §2.
  • [8] D. J. Beutel, T. Topal, A. Mathur, X. Qiu, T. Parcollet, P. P. de Gusmão, and N. D. Lane (2020) Flower: a friendly federated learning research framework. arXiv preprint arXiv:2007.14390. Cited by: §A.5.
  • [9] M. M. Breunig, H. Kriegel, R. T. Ng, and J. Sander (2000) LOF: Identifying Density-based Local Outliers. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93–104. Cited by: §3.
  • [10] F. E. Casado, D. Lema, M. F. Criado, R. Iglesias, C. V. Regueiro, and S. Barro (2022) Concept drift detection and adaptation for federated and continual learning. Multimedia Tools and Applications 81 (3), pp. 3397–3419. Cited by: §1, §2.
  • [11] D. Chan, A. Ferguson, D. Simunic, and D. Stokes (2004) A spatial analysis and test of oligopolistic competition in the market for audit services. Technical report Working paper, University of British Columbia. Cited by: §1.
  • [12] A. Chaudhry, M. Ranzato, M. Rohrbach, and M. Elhoseiny (2018) Efficient Lifelong Learning with A-GEM. arXiv preprint arXiv:1812.00420. Cited by: §2.
  • [13] A. Chaudhry, M. Rohrbach, M. Elhoseiny, T. Ajanthan, P. K. Dokania, P. H. Torr, and M. Ranzato (2019) Continual learning with tiny episodic memories. ICML Workshop on Multi-Task and Lifelong Reinforcement Learning. Cited by: §2.
  • [14] S. Cho, M. A. Vasarhelyi, T. Sun, and C. Zhang (2020) Learning from machine learning in accounting and assurance. Journal of Emerging Technologies in Accounting 17 (1), pp. 1–10. Cited by: §2.
  • [15] F. K. Dankar and K. El Emam (2012) The application of differential privacy to health data. In Proceedings of the 2012 Joint EDBT/ICDT Workshops, pp. 158–166. Cited by: §2.
  • [16] N. Díaz-Rodríguez, V. Lomonaco, D. Filliat, and D. Maltoni (2018) Don’t Forget, There is More than Forgetting: New Metrics for Continual Learning. arXiv preprint arXiv:1810.13166. Cited by: §2.
  • [17] G. Dickey, S. Blanke, and L. Seaton (2019) Machine learning in auditing. The CPA Journal, pp. 16–21. Cited by: §2.
  • [18] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor (2006) Our data, ourselves: Privacy via Distributed Noise Generation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 486–503. Cited by: §1, §2.
  • [19] Faker. Python library. Cited by: §A.6, footnote 5.
  • [20] R. M. French (1999) Catastrophic Forgetting in Connectionist Networks. Trends in Cognitive Sciences 3 (4), pp. 128–135. Cited by: §2, §3.
  • [21] J. Gama, P. Medas, G. Castillo, and P. Rodrigues (2004) Learning with drift detection. In Brazilian symposium on artificial intelligence, pp. 286–295. Cited by: §1.
  • [22] S. V. Grabski, S. A. Leech, and P. J. Schmidt (2011) A review of erp research: a future agenda for accounting information systems. Journal of information systems 25 (1), pp. 37–78. Cited by: §1.
  • [23] S. Grossberg (1988) Nonlinear neural networks: principles, mechanisms, and architectures. Neural networks 1 (1), pp. 17–61. Cited by: §1.
  • [24] S. Hawkins, H. He, G. Williams, and R. Baxter (2002) Outlier Detection using Replicator Neural Networks. In International Conference on Data Warehousing and Knowledge Discovery, pp. 170–180. Cited by: §3, §3.
  • [25] H. Hemati, M. Schreyer, and D. Borth (2021) Continual learning for unsupervised anomaly detection in continuous auditing of financial accounting data. arXiv preprint arXiv:2112.13215. Cited by: §1, §2, §5.
  • [26] G. E. Hinton and R. R. Salakhutdinov (2006) Reducing the Dimensionality of Data with Neural Networks. Science 313 (5786), pp. 504–507. Cited by: §3.
  • [27] R. Hoitash, A. Kogan, and M. A. Vasarhelyi (2006) Peer-based approach for analytical procedures. Auditing: A Journal of Practice & Theory 25 (2), pp. 53–84. Cited by: §1.
  • [28] IFAC (2009) International Standards on Auditing 240, The Auditor’s Responsibilities Relating to Fraud in an Audit of Financial Statements. International Federation of Accountants (IFAC). Cited by: §1.
  • [29] D. Isele and A. Cosgun (2018) Selective Experience Replay for Lifelong Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32. Cited by: §2.
  • [30] M. Jans, J. M. Van Der Werf, N. Lybaert, and K. Vanhoof (2011) A Business Process Mining Application for Internal Transaction Fraud Mitigation. Expert Systems with Applications 38 (10), pp. 13351–13359. Cited by: §2.
  • [31] E. Jothimurugesan, K. Hsieh, J. Wang, G. Joshi, and P. B. Gibbons (2022) Federated learning under distributed concept drift. arXiv preprint arXiv:2206.00799. Cited by: §1, §2.
  • [32] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al. (2019) Advances and Open Problems in Federated Learning. arXiv preprint arXiv:1912.04977. Cited by: §2, §3.
  • [33] S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh (2020) Scaffold: stochastic controlled averaging for federated learning. In International Conference on Machine Learning, pp. 5132–5143. Cited by: §A.5, 2nd item, §2, Figure 5, Table 1, §5.
  • [34] R. Khan, M. Corney, A. Clark, and G. Mohay (2009) A role mining inspired approach to representing user behaviour in ERP systems. In Asia Pacific Industrial Engineering & Management Systems Conference 2009. Cited by: §2.
  • [35] D. P. Kingma and M. Welling (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. Cited by: §2.
  • [36] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al. (2017) Overcoming Catastrophic Forgetting in Neural Networks. Proceedings of the National Academy of Sciences 114 (13), pp. 3521–3526. Cited by: 1st item, §2, §2, Figure 4, Table 1, §5.
  • [37] A. Kogan and C. Yin (2021) Privacy-preserving information sharing within an audit firm. Journal of Information Systems 35 (2), pp. 243–268. Cited by: §1.
  • [38] Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature 521 (7553). Cited by: §1.
  • [39] M. Lenga, H. Schulz, and A. Saalbach (2020) Continual learning for domain adaptation in chest x-ray classification. In Medical Imaging with Deep Learning, pp. 413–423. Cited by: §2.
  • [40] T. Lesort, M. Caccia, and I. Rish (2021) Understanding continual learning settings with data distribution drift analysis. arXiv preprint arXiv:2104.01678. Cited by: §2.
  • [41] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith (2020) Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems 2, pp. 429–450. Cited by: §2, Figure 5, Table 1, §5.
  • [42] Z. Li and D. Hoiem (2017) Learning Without Forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (12), pp. 2935–2947. Cited by: §2, Figure 4, Table 1, §5.
  • [43] V. Lomonaco, L. Pellegrini, A. Cossu, A. Carta, G. Graffieti, T. L. Hayes, M. D. Lange, M. Masana, J. Pomponi, G. van de Ven, M. Mundt, Q. She, K. Cooper, J. Forest, E. Belouadah, S. Calderara, G. I. Parisi, F. Cuzzolin, A. Tolias, S. Scardapane, L. Antiga, S. Amhad, A. Popescu, C. Kanan, J. van de Weijer, T. Tuytelaars, D. Bacciu, and D. Maltoni (2021) Avalanche: an End-to-End Library for Continual Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2nd Continual Learning in Computer Vision Workshop. Cited by: §A.4.
  • [44] D. Lopez-Paz and M. Ranzato (2017) Gradient Episodic Memory for Continual Learning. Advances in Neural Information Processing Systems 30, pp. 6467–6476. Cited by: §2.
  • [45] M. McGlohon, S. Bay, M. G. M. Anderle, D. M. Steier, and C. Faloutsos (2009) SNARE: A Link Analytic System for Graph Labeling and Risk Detection. In KDD’09: 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Cited by: §2.
  • [46] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas (2017) Communication-Efficient Learning of Deep Networks from Decentralized Data. In Artificial Intelligence and Statistics, pp. 1273–1282. Cited by: §1, §2, §3.
  • [47] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang (2017) Learning differentially private recurrent language models. arXiv preprint arXiv:1710.06963. Cited by: Figure 5, Table 1, §5, §5.
  • [48] J. A. Mendez and E. Eaton (2021) Lifelong Learning of Compositional Structures. In International Conference on Learning Representations, Cited by: §2.
  • [49] J. Nonnenmacher and J. M. Gómez (2021) Unsupervised anomaly detection for internal auditing: literature review and research agenda. The International Journal of Digital Accounting Research 21 (27), pp. 1–22. Cited by: §2.
  • [50] J. Nonnenmacher, F. Kruse, G. Schumann, and J. Marx Gómez (2021) Using autoencoders for data-driven analysis in internal auditing. In Proceedings of the 54th Hawaii International Conference on System Sciences, pp. 5748. Cited by: §1, §3.
  • [51] N. Papernot, M. Abadi, U. Erlingsson, I. Goodfellow, and K. Talwar (2016) Semi-supervised knowledge transfer for deep learning from private training data. arXiv preprint arXiv:1610.05755. Cited by: §2.
  • [52] G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter (2019) Continual lifelong learning with neural networks: a review. Neural Networks 113, pp. 54–71. Cited by: §1.
  • [53] T. J. Park, K. Kumatani, and D. Dimitriadis (2021) Tackling dynamics in federated incremental learning with variational embedding rehearsal. arXiv preprint arXiv:2110.09695. Cited by: §2, §4.
  • [54] PwC (2022) Protecting the Perimeter: The rise of external Fraud, The Global Economic Crime and Fraud Survey 2022. PricewaterhouseCoopers LLP. Cited by: §1.
  • [55] R. Ramamurthy, M. Pielka, R. Stenzel, C. Bauckhage, R. Sifa, T. D. Khameneh, U. Warning, B. Kliem, and R. Loitz (2021) ALiBERT: improved automated list inspection (ali) with bert. In 21st ACM Symposium on Document Engineering, Cited by: §1, §2.
  • [56] S. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert (2017) iCaRL: Incremental Classifier and Representation Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2001–2010. Cited by: §2, Figure 4, §5.
  • [57] S. Reddi, Z. Charles, M. Zaheer, Z. Garrett, K. Rush, J. Konečnỳ, S. Kumar, and H. B. McMahan (2020) Adaptive federated optimization. arXiv preprint arXiv:2003.00295. Cited by: §2, Figure 5, §5.
  • [58] D. Rolnick, A. Ahuja, J. Schwarz, T. P. Lillicrap, and G. Wayne (2018) Experience Replay for Continual Learning. arXiv preprint arXiv:1811.11682. Cited by: §2.
  • [59] A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell (2016) Progressive Neural Networks. arXiv preprint arXiv:1606.04671. Cited by: §2.
  • [60] M. Schreyer, M. Baumgartner, F. Ruud, and D. Borth (2022) Artificial Intelligence in Internal Audit as a Contribution to Effective Governance. Expert Focus 01. Cited by: §3, §4.
  • [61] M. Schreyer, T. Sattarov, D. Borth, A. Dengel, and B. Reimer (2017) Detection of anomalies in large scale accounting data using deep autoencoder networks. arXiv preprint arXiv:1709.05254. Cited by: Figure 1.
  • [62] M. Schreyer, T. Sattarov, and D. Borth (2021) Multi-view contrastive self-supervised learning of accounting data representations for downstream audit tasks. In International Conference on Artificial Intelligence, Cited by: §1, §2.
  • [63] M. Schreyer, T. Sattarov, and D. Borth (2022) Federated and privacy-preserving learning of accounting data in financial statement audits. arXiv preprint arXiv:2208.12708. Cited by: §2, §5.
  • [64] M. Schreyer, T. Sattarov, A. S. Gierbl, B. Reimer, and D. Borth (2020) Learning sampling in financial statement audits using vector quantised variational autoencoder neural networks. In International Conference on Artificial Intelligence. Cited by: §1.
  • [65] M. Schreyer, T. Sattarov, B. Reimer, and D. Borth (2019) Adversarial learning of deepfakes in accounting. NeurIPS 2019 Workshop on Robust AI in Financial Services, Vancouver, BC, Canada. Cited by: §2.
  • [66] M. Schreyer, T. Sattarov, C. Schulze, B. Reimer, and D. Borth (2019) Detection of accounting anomalies in the latent space using adversarial autoencoder neural networks. 2nd KDD Workshop on Anomaly Detection in Finance, USA. Cited by: §A.3.
  • [67] M. Schultz and M. Tropmann-Frick (2020) Autoencoder neural networks versus external auditors: detecting unusual journal entries in financial statement audits. In Proceedings of the 53rd Hawaii International Conference on System Sciences, Cited by: Figure 1, §1, §2, §3, §3.
  • [68] H. Shin, J. K. Lee, J. Kim, and J. Kim (2017) Continual Learning with Deep Generative Replay. arXiv preprint arXiv:1705.08690. Cited by: §2.
  • [69] N. Shoham, T. Avidor, A. Keren, N. Israel, D. Benditkis, L. Mor-Yosef, and I. Zeitak (2019) Overcoming forgetting in federated learning on non-iid data. arXiv preprint arXiv:1910.07796. Cited by: §2.
  • [70] R. Sifa, A. Ladi, M. Pielka, R. Ramamurthy, L. Hillebrand, B. Kirsch, D. Biesner, R. Stenzel, T. Bell, M. Lübbering, et al. (2019) Towards automated auditing with machine learning. In ACM Symposium on Document Engineering 2019, Cited by: §1.
  • [71] T. Sun and M. A. Vasarhelyi (2017) Deep learning and the future of auditing: how an evolving technology could transform analysis and improve judgment.. CPA Journal 87 (6). Cited by: §2.
  • [72] T. Sun (2019) Applying Deep Learning to Audit Procedures: An Illustrative Framework. Accounting Horizons 33 (3), pp. 89–109. Cited by: §1, §2, §2.
  • [73] S. Thiprungsri and M. A. Vasarhelyi (2011) Cluster Analysis for Anomaly Detection in Accounting Data: An Audit Approach. International Journal of Digital Accounting Research 11. Cited by: §2.
  • [74] S. Thrun (1995) A lifelong learning perspective for mobile robot control. In Intelligent robots and systems, pp. 201–214. Cited by: §1, §2, §3.
  • [75] M. Toneva, A. Sordoni, R. T. d. Combes, A. Trischler, Y. Bengio, and G. J. Gordon (2018) An Empirical Study of Example Forgetting During Deep Neural Network Learning. arXiv preprint arXiv:1812.05159. Cited by: §2.
  • [76] M. A. Vasarhelyi, M. G. Alles, and A. Kogan (2018) Principles of analytic monitoring for continuous assurance. In Continuous Auditing, Cited by: §1, §1.
  • [77] M. A. Vasarhelyi and F. B. Halper (1991) The continuous audit of online systems. In Auditing: A Journal of Practice and Theory, Cited by: §1, §1.
  • [78] H. Wang, M. Yurochkin, Y. Sun, D. Papailiopoulos, and Y. Khazaeni (2020) Federated learning with matched averaging. arXiv preprint arXiv:2002.06440. Cited by: §2.
  • [79] J. T. Wells (2017) Corporate Fraud Handbook: Prevention and Detection. John Wiley & Sons. Cited by: §1.
  • [80] M. Werner and N. Gehrke (2015) Multilevel Process Mining for Financial Audits. IEEE Transactions on Services Computing 8 (6), pp. 820–832. Cited by: §2.
  • [81] J. Yoon, W. Jeong, G. Lee, E. Yang, and S. J. Hwang (2021) Federated continual learning with weighted inter-client transfer. In International Conference on Machine Learning, pp. 12073–12086. Cited by: Figure 2, §2.
  • [82] K. Yoon, L. Hoogduin, and L. Zhang (2015) Big Data as Complementary Audit Evidence. Accounting Horizons 29 (2), pp. 431–438. Cited by: §1.
  • [83] M. Yurochkin, M. Agarwal, S. Ghosh, K. Greenewald, N. Hoang, and Y. Khazaeni (2019) Bayesian nonparametric federated learning of neural networks. In International Conference on Machine Learning, pp. 7252–7261. Cited by: §2.
  • [84] F. Zenke, B. Poole, and S. Ganguli (2017) Continual Learning through Synaptic Intelligence. In International Conference on Machine Learning, pp. 3987–3995. Cited by: §2.
  • [85] Y. Zheng, Z. Wu, Y. Yuan, T. Chen, and Z. Wang (2020) PCAL: a privacy-preserving intelligent credit risk modeling framework based on adversarial learning. arXiv preprint arXiv:2010.02529. Cited by: §2.
  • [86] M. Zupan, V. Budimir, and S. Letinic (2020) Journal entry anomaly detection model. Intelligent Systems in Accounting, Finance and Management 27 (4), pp. 197–209. Cited by: §1, §2, §3.

Appendix A Appendix

In the Appendix, we provide additional details of the datasets, data preprocessing, federated continual learning scenarios, and experimental setup applied to detect accounting anomalies in an FCL setting.

A.1 Datasets and Data Preprocessing

We use three publicly available datasets of real-world financial city payment data. The datasets exhibit high similarity to ERP accounting data, e.g., typical manual payments (SAP T-Code: F-53) or payment runs (SAP T-Code: F-110):

For each dataset, we pre-process the original payment line-item attributes to (i) remove semantically redundant attributes and (ii) obtain an encoded representation of each payment. The following pre-processing is applied to the distinct payment attributes of the datasets:


  • The categorical attribute values are converted into one-hot encoded numerical tuples of bits, where the number of bits corresponds to the number of unique values of the respective attribute.

  • The numerical attribute values are min-max scaled according to $\tilde{x} = (x - x_{\min}) / (x_{\max} - x_{\min})$, where $x_{\min}$ ($x_{\max}$) denotes the minimum (maximum) value of the respective attribute.

In the following, a pre-processed journal entry is denoted as $x$ and its corresponding individual attributes as $x^{j}$.
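To make the two pre-processing steps concrete, the following sketch (a minimal illustration in plain NumPy, not the paper's actual pipeline) shows one-hot encoding of a categorical attribute and min-max scaling of a numerical attribute:

```python
import numpy as np

def one_hot(values):
    """One-hot encode a categorical attribute column into as many binary
    dimensions as there are unique attribute values."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, value in enumerate(values):
        encoded[row, index[value]] = 1.0
    return encoded

def min_max_scale(values):
    """Scale numerical attribute values into [0, 1] via
    (x - x_min) / (x_max - x_min)."""
    x = np.asarray(values, dtype=float)
    return (x - x.min()) / (x.max() - x.min())
```

Concatenating the one-hot tuples of all categorical attributes with the scaled numerical attributes yields the encoded representation of a payment.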

A.2 Federated Continual Learning (FCL) Scenarios

For each federated audit client, we create a stream of continual payment experiences. A single experience encompasses several payment activities, each generated by a different city department. In our experiments, we selected the following city departments to sample from each dataset, respectively:


  • The selected City of Philadelphia departments are (i) ‘42 Commerce’, (ii) ‘52 Free Library’, (iii) ‘10 Managing Director’, (iv) ‘11 Police’, and (v) ‘14 Health’.

  • The selected City of Chicago departments are (i) ‘Dept. of Family and Support Services’, (ii) ‘Dept. of Aviation’, (iii) ‘Chicago Department of Transportation’, (iv) ‘Department of Health’, and (v) ‘Department of Water Management’.

  • The selected City of York departments are (i) ‘Adult Social Care’, (ii) ‘Economy Regeneration and Housing’, (iii) ‘Housing and Community Safety’, (iv) ‘Transport Highways and Environ.’, and (v) ‘School Funding and Assets’.

The departments exhibit a high volume of payment posting activity in each dataset. Furthermore, the departments correspond to different municipal duties, establishing a certain degree of a non-iid setting across departments. Subsequently, three continual learning scenarios are generated as described in Sec. 4 and illustrated in Fig. 2. For each payment activity and experience, the payments are sampled according to three pre-defined scenario configurations, denoted as [Scenario 1-3] in the following. Figure 6 shows the configuration bar charts of the payment sampling for each scenario and FCL client. A bar at a particular client, experience, and city department indicates that 1,000 payments have been sampled in the FCL setting.

Figure 6: Experimental [Scenario 1-3] configurations of the evaluated FCL settings. The distinct configurations illustrate the payment activity per department at each of the federated clients: [Scenario 1] top-row, [Scenario 2] middle-row, and [Scenario 3] bottom-row.
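The per-client sampling of department payments into an experience stream can be sketched as follows; the schedule format and department names are illustrative assumptions, not the exact configuration used in the experiments:

```python
import random

def build_experience_stream(payments_by_department, schedule, n=1000, seed=0):
    """Create a stream of continual payment experiences for one client.

    payments_by_department: dict mapping a department name to its payments.
    schedule: one list of active department names per experience; n payments
    are sampled (with replacement) per active department.
    """
    rng = random.Random(seed)
    stream = []
    for active_departments in schedule:
        experience = []
        for department in active_departments:
            experience.extend(rng.choices(payments_by_department[department], k=n))
        stream.append(experience)
    return stream
```

Varying which departments are active in which experience at which client yields the distinct [Scenario 1-3] configurations.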

A.3 Representation Learning Setting Details

We use a symmetrical encoder and decoder architecture in all our experiments. Furthermore, we use two architectural setups, each designed to detect a different class of injected anomalies in the datasets: a ‘shallow’ architecture, shown in Tab. 2, designed to detect global anomalies, and a ‘deep’ architecture, shown in Tab. 3, designed to detect local anomalies.

Layer        1     2     3     4     5     6     7     8
Encoder    |x|   128    64    32    16     8     4     2
Decoder      2     4     8    16    32    64   128   |x|

Table 2: Number of neurons per layer of the encoder and decoder networks that constitute the AEN architecture used to detect global anomalies in our experiments, where |x| denotes the dimensionality of the encoded journal entries.
Layer        1      2      3     4     5     6     7     8     9    10    11    12
Encoder    |x|   2048   1024   512   256   128    64    32    16     8     4     2
Decoder      2      4      8    16    32    64   128   256   512  1024  2048   |x|

Table 3: Number of neurons per layer of the encoder and decoder networks that constitute the AEN architecture used to detect local anomalies in our experiments, where |x| denotes the dimensionality of the encoded journal entries.
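The symmetric layer dimensions of both architectures can be generated programmatically; the helper below is an illustrative sketch, where input_dim stands for the dimensionality |x| of the encoded journal entries:

```python
def aen_layer_sizes(input_dim, hidden=(128, 64, 32, 16, 8, 4, 2)):
    """Return the neurons per layer of the symmetric encoder and decoder.

    The default corresponds to the 'shallow' architecture (Tab. 2); passing
    hidden=(2048, 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2) yields the
    'deep' architecture (Tab. 3)."""
    encoder = [input_dim, *hidden]
    decoder = [*reversed(hidden), input_dim]
    return encoder, decoder
```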

In all architectural setups, we apply Leaky-ReLU non-linear activations with a non-zero negative-slope scaling factor, except in the encoder’s bottleneck and the decoder’s final layer, which comprise Tanh activations. We use a batch size of 16 journal entries per training iteration, apply Adam optimization, and use early stopping once the reconstruction loss converges. Given an encoded journal entry $x$ and its reconstruction $\hat{x}$, we compute a combined loss $\mathcal{L}$ of a binary cross-entropy error loss $\mathcal{L}_{BCE}$ and a mean squared error loss $\mathcal{L}_{MSE}$, as defined by [66]:

$$\mathcal{L}(x, \hat{x}) = \lambda \cdot \sum_{j=1}^{M} \mathcal{L}_{BCE}\big(x^{j}, \hat{x}^{j}\big) + (1 - \lambda) \cdot \sum_{k=1}^{N} \mathcal{L}_{MSE}\big(x^{k}, \hat{x}^{k}\big),$$

where $M$ denotes the number of categorical attributes, $N$ the number of numerical attributes, and $\lambda \in [0, 1]$ balances the categorical and numerical attribute losses. For each categorical attribute $x^{j}$, we compute the $\mathcal{L}_{BCE}$, as defined by:

$$\mathcal{L}_{BCE}\big(x^{j}, \hat{x}^{j}\big) = -\frac{1}{\tau_{j}} \sum_{d=1}^{\tau_{j}} \Big[ x^{j}_{d} \log \hat{x}^{j}_{d} + \big(1 - x^{j}_{d}\big) \log \big(1 - \hat{x}^{j}_{d}\big) \Big],$$

where $\tau_{j}$ corresponds to the number of one-hot encoded attribute dimensions of the categorical attribute $x^{j}$. For each numerical attribute $x^{k}$, we compute the $\mathcal{L}_{MSE}$, as defined by:

$$\mathcal{L}_{MSE}\big(x^{k}, \hat{x}^{k}\big) = \big(x^{k} - \hat{x}^{k}\big)^{2}.$$

We balance both loss terms in $\mathcal{L}$ via $\lambda$ to account for the high number of categorical attributes in each city payment dataset and keep $\lambda$ fixed in all experiments.
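A minimal NumPy sketch of such a combined reconstruction loss follows; the balancing weight lam below is an illustrative value, not the one used in the experiments:

```python
import numpy as np

def reconstruction_loss(x_cat, xhat_cat, x_num, xhat_num, lam=0.5, eps=1e-7):
    """Combined loss: binary cross-entropy over the one-hot categorical
    dimensions plus mean squared error over the scaled numerical
    attributes, balanced by lam."""
    xhat_cat = np.clip(xhat_cat, eps, 1.0 - eps)  # numerical stability
    bce = -np.mean(x_cat * np.log(xhat_cat) + (1.0 - x_cat) * np.log(1.0 - xhat_cat))
    mse = np.mean((np.asarray(x_num) - np.asarray(xhat_num)) ** 2)
    return lam * bce + (1.0 - lam) * mse
```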

A.4 Continual Learning Setting Details

We use a fixed number of CL experiences per client in all our experiments. In ER learning, we set a replay buffer size that seemed sufficient for all datasets. The buffer keeps a stratified sample of all departments observed until the current experience. For EWC, we tuned the regularization strength to preserve an optimal degree of plasticity, and for LwF, we determined a distillation weight that yields a good knowledge distillation. We reset the optimizer parameters upon each experience to avoid past-experience information transfer through the optimizer state. We adapted the different CL strategies from the algorithmic implementations of ER, LwF, and EWC available in the Avalanche v0.2.1 CL library [43].
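The department-stratified replay buffer described above might be sketched as follows; the capacity bookkeeping and eviction policy are illustrative assumptions, not Avalanche's exact implementation:

```python
import random
from collections import defaultdict

class StratifiedReplayBuffer:
    """Keeps a department-stratified sample of payments observed
    across all experiences seen so far."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.by_department = defaultdict(list)

    def add_experience(self, payments):
        """payments: iterable of (department, payment) pairs."""
        for department, payment in payments:
            self.by_department[department].append(payment)
        self._rebalance()

    def _rebalance(self):
        # Keep roughly capacity // num_departments payments per department.
        per_department = max(1, self.capacity // len(self.by_department))
        for department, items in self.by_department.items():
            if len(items) > per_department:
                self.by_department[department] = self.rng.sample(items, per_department)

    def sample(self):
        return [p for items in self.by_department.values() for p in items]
```

During ER training, the buffer contents are replayed alongside the payments of the current experience.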

A.5 Federated Learning Setting Details

The models are trained for 5 communication rounds per experience, each round encompassing a fixed number of training iterations. In each experiment, we set the number of participating clients to 4, assuming a real-world peer audit client setting. In FedProx learning, we found that setting the proximal term to 1.2 yields a good regularization of the local model updates. For the SCAFFOLD algorithm, we use the Option II implementation proposed in [33], re-using previously computed gradients to update the control variate. In all our experiments, we built upon the FedAvg strategy and the FL framework implementation available in the Flower v0.19.0 FL library [8].
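For illustration, the FedAvg aggregation underlying these strategies reduces to a sample-weighted average of the client model parameters; the sketch below is a simplified stand-in, not Flower's API:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate per-client parameter lists into global parameters,
    weighting each client by its number of local training samples.

    client_weights: list (one per client) of lists of np.ndarray parameters.
    client_sizes: number of local training samples per client.
    """
    total = float(sum(client_sizes))
    n_params = len(client_weights[0])
    return [
        sum(weights[i] * (size / total)
            for weights, size in zip(client_weights, client_sizes))
        for i in range(n_params)
    ]
```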

A.6 Anomaly Detection Details

In each experience, we randomly inject 20 global and 20 local anomalies into each payment activity of the in-scope audit client (≈4% of each payment activity). To create both classes of anomalies, we use the Faker v14.2.0 library [19], applying five distinct random seed initialisations of the random anomaly sampling mechanism. To quantitatively assess the anomaly detection capability of the FCL audit setting, we determine the in-scope audit client’s average precision ($AP$) over the sorted payment reconstruction errors. The $AP$ summarises the precision-recall curve, as formally defined by:

$$AP = \sum_{i} \big(R_{i} - R_{i-1}\big) \cdot P_{i},$$

where $P_{i}$ denotes the detection precision and $R_{i}$ the detection recall at the $i$-th reconstruction error threshold.
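An AP score over sorted reconstruction errors can be computed as in the following illustrative sketch, where anomalies are labelled 1 and every rank serves as a threshold:

```python
import numpy as np

def average_precision(errors, labels):
    """AP = sum_i (R_i - R_{i-1}) * P_i, where payments are ranked by
    descending reconstruction error and precision/recall are evaluated
    at every rank i."""
    order = np.argsort(-np.asarray(errors, dtype=float))
    labels = np.asarray(labels)[order]
    true_positives = np.cumsum(labels)
    precision = true_positives / np.arange(1, len(labels) + 1)
    recall = true_positives / labels.sum()
    previous_recall = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - previous_recall) * precision))
```

A perfect ranking, placing all injected anomalies ahead of all regular payments, yields an AP of 1.0.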