Financial institutions that process payments have to fight a tremendous amount of fraudulent activity. In 2018, total card fraud losses amounted to 1.8 billion euros in the Single Euro Payments Area (SEPA) [ecb2018]. At the global level, losses are expected to grow to more than 34 billion by 2022. Payment processors also face indirect financial repercussions, as fraud jeopardizes the trust of their customers, i.e. banks and cardholders. It is therefore natural for them to resort to Fraud Detection Systems (FDS). Most FDS are rule-based (e.g. “raise an alert if the previous transaction from the cardholder occurred less than 5 seconds before”) [kou2004survey]. These rules are designed by fraud experts and can be specific (low number of false positives), but they are neither very sensitive nor flexible. In particular, the cat-and-mouse game played by fraudsters makes the rules quickly obsolete, and updating them requires a lot of manual work. Adding an incremental ML-based system can provide complementarity (from experience, combining expert rules and ML models provides from 20% to 100% better results than either approach separately) and the adaptability required to build a powerful overall FDS [dal2014learned, lebichot2020incremental]. ML also opens the door to related techniques like Transfer Learning (TL) [lebichot2019deep]. TL consists in using an existing model to perform well on new domains (e.g. new clients) with little or no supervision. Our work focuses on this topic as it provides a very competitive business advantage (customer prospecting). In particular, it helps bring new customers on board by promising a less perilous start.
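As an illustration, the expert rule quoted above can be sketched in a few lines of Python. This is a minimal sketch and not the rule engine of an actual FDS: the `Transaction` type, its field names, and the function `velocity_alerts` are hypothetical, and timestamps are assumed to be expressed in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    card_id: str      # hypothetical cardholder/card identifier
    timestamp: float  # assumed to be in seconds


def velocity_alerts(transactions, min_gap_seconds=5.0):
    """Return the transactions that occur less than `min_gap_seconds`
    after the previous transaction of the same card."""
    last_seen = {}  # card_id -> timestamp of the latest transaction seen
    alerts = []
    # Process transactions in chronological order.
    for tx in sorted(transactions, key=lambda t: t.timestamp):
        previous = last_seen.get(tx.card_id)
        if previous is not None and tx.timestamp - previous < min_gap_seconds:
            alerts.append(tx)
        last_seen[tx.card_id] = tx.timestamp
    return alerts
```

Such a rule is cheap to evaluate and easy to explain, but the threshold is fixed by hand, which is the lack of flexibility discussed above.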
Rather than going into the details of our research findings, we describe here our back-and-forth journey between business and production: from the formulation of practical business constraints and challenges on FDS and TL, to the definition and conclusions of the research study, and finally to the transformation of prototypes into concrete assets for deployment. We give a short tour of the reproducibility issues that can arise along the process and share the lessons we learned on how to anticipate them.
II The Business Constraints and Challenges
Most machine learning research studies for fraud detection consist in obtaining an appropriate dataset and proposing an algorithm to optimize selected performance metrics. But actually putting ML results into production usually starts with open questions (e.g. how to discover new fraud patterns quickly without heavy human intervention?) and imposes many additional constraints: (1) Data collection: putting a dataset together for fraud detection must be done with caution (e.g. proper train-test temporal split with verification gap [dal2014learned], denoised annotations, etc.). Data-centric considerations are now acknowledged by experts to be key in the overall ML process; [alazizi2019anomaly] gives a non-exhaustive list of those related to the fraud detection problem. (2) Metrics: apart from detection rate, metrics like memory and time consumption are essential for efficient real-time fraud detection [tran2018real]. Contracts can also impose a minimal level of performance (e.g. a precision above 1/3). Finally, interpretable indicators are required for continuous improvement [siblini2020master]. (3) Compliance and interoperability: the ML pipeline should comply with regulations (e.g. limits on data storage and usage), be integrable into the payment process, and interact properly with the production platform and services. (4) Transfer learning: specific constraints can arise, such as having to homogenize the variable encoding and the processing pipeline between source and target domains.
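Two of these constraints can be made concrete with a short sketch: a temporal train-test split separated by a verification gap, and the choice of an alert threshold under a contractual precision constraint. The function names, the `day` field, and the record layout are assumptions made for illustration; they do not describe the pipeline used in this work.

```python
from fractions import Fraction


def temporal_split(records, train_end, gap, test_length, time_key="day"):
    """Split records chronologically, leaving `gap` time units unused
    between the end of training and the start of testing (the period
    during which fraud labels are assumed not yet verified)."""
    test_start = train_end + gap
    train = [r for r in records if r[time_key] < train_end]
    test = [r for r in records
            if test_start <= r[time_key] < test_start + test_length]
    return train, test


def lowest_threshold_above_precision(scores, labels,
                                     min_precision=Fraction(1, 3)):
    """Return the lowest score threshold whose alerts (score >= threshold)
    have a precision strictly above `min_precision`, or None if no
    threshold qualifies."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best = None
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Only evaluate once all transactions sharing this score are counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and Fraction(tp, tp + fp) > min_precision:
            best = score  # scores are descending, so later hits are lower
    return best
```

Exact fractions are used so that a precision of exactly 1/3 is not accepted by accident through floating-point rounding. Since precision is not monotonic in the threshold, every candidate threshold is scanned rather than stopping at the first violation.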
III The Research Problem and Its Conclusions
To formalize Transfer Learning, let us introduce the concept of domains $\mathcal{D}$ (described by variables $X$ and distributions $P(X)$) and tasks $\mathcal{T}$ (e.g. predict fraud $Y$ by modeling $P(Y|X)$ with a function $f$). Given a source domain $\mathcal{D}_S$, task $\mathcal{T}_S$ and dataset $D_S$ (usually rich in annotations), TL uses the solution $f_S$ to compute a solution $f_T$ for a target domain $\mathcal{D}_T$, task $\mathcal{T}_T$ and dataset $D_T$ (usually poor in annotations). In our project, we focused on two use-case scenarios for TL: reusing fraud detection models built for online payments on payments at physical terminals, and reusing models built for payments from bank A on payments from bank B. These two cases fall into a category called “homogeneous domain adaptation”: equivalent tasks and the same or a close set of variables, but different distributions, since customers and fraudsters do not have the same habits and techniques from one domain to another. To tackle our problem, our methodology was to adapt, implement and evaluate a large number of approaches from the literature. For the sake of conciseness, we refer the reader to [lebichot2019deep] for details on the algorithms used and the results obtained. Our main conclusion is that models trained mostly on source data perform better when the target has a very low quantity of annotations. Then, when target labels become less scarce, semi-supervised domain adaptation methods take over. Overall, a combination of a model trained only on source data and a model trained on both target and source data performs well in most cases. We therefore decided to integrate this combination in production.
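One simple way to realize such a combination is a weighted average of the fraud scores of the two models. The text does not specify how the two models are combined, so the averaging rule, the `source_weight` parameter, and the function names below are assumptions for illustration only.

```python
from typing import Callable, Sequence

# A scorer maps the features of one transaction to a fraud score in [0, 1].
Scorer = Callable[[Sequence[float]], float]


def make_combined_scorer(source_only: Scorer,
                         source_and_target: Scorer,
                         source_weight: float = 0.5) -> Scorer:
    """Build a scorer that averages a model trained only on source data
    with a model trained on both source and target data.

    The weighted average is an assumed combination rule, not the one
    used by the authors."""
    if not 0.0 <= source_weight <= 1.0:
        raise ValueError("source_weight must lie in [0, 1]")

    def combined(features: Sequence[float]) -> float:
        return (source_weight * source_only(features)
                + (1.0 - source_weight) * source_and_target(features))

    return combined
```

Under this assumption, a high `source_weight` would suit a target domain with very few annotations, and the weight could be lowered as target labels accumulate, in line with the conclusions above.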
IV The Integration Methodology: Make It Happen
Figure 1 shows the steps to industrialize findings. Research is followed by Pre-industrialization, during which (1) the work is adapted to be integrated into production-compatible tools (libraries, versioning tools) and (2) results are confirmed on larger datasets. Then comes Industrialization, where methods are adapted and deployed to the production environment.
As the steps are usually carried out one after another (with delays of several months) and have their own requirements, several reproducibility issues arise. Some are related to the data. Usually, a rather small subset of data is selected for the research part. Meanwhile, new data is collected, so when the study is over, one has to check that the results still hold. Sometimes, it is even necessary to change the data format and/or integrate new features. Other issues are related to the code. Research and pre-industrialization usually use different environments (also different from the actual production environment) due to (i) their different objectives (for instance, research can require a flexible environment like Jupyter) and (ii) the collaboration between different entities (e.g. academic and industrial partners). As a consequence, library versions can require adaptation either in the delivered code or in the platform hosting it. Additionally, a standalone research implementation might contain code snippets that are already implemented in the libraries of the next stages. In this case, the code needs to be refactored accordingly. Lastly, versioning tools (Git, DVC, wandb, …) for code, data and results are essential in later steps. To apply them properly, the code needs adaptations as well.
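The library-version issue can be detected early with a small check that compares the versions pinned by the target platform against those installed in the current environment. This is a minimal sketch based on the standard `importlib.metadata` module; the function name `find_version_mismatches` and the package names and versions in the usage note are hypothetical.

```python
from importlib import metadata


def find_version_mismatches(pinned, lookup=metadata.version):
    """Compare pinned versions (package name -> version string) against the
    current environment.

    Returns a dict mapping each mismatching package to a tuple
    (expected_version, installed_version), where installed_version is
    None when the package is not installed."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

For example, calling `find_version_mismatches({"numpy": "1.19.5"})` in the research environment would list the packages whose versions differ from the (hypothetical) production pins, so that the adaptation effort is known before the code is handed over.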
To prevent reproducibility challenges, our main lesson learned is that the steps from business motivation to research study and then industrialization should not be seen as linear and successive. It is important (1) to anticipate: providing pieces of the production environment as a basis for the research implementation allows an easier later integration (savings counted in months) and even a cleaner formulation of the problem; and (2) to do several quick rounds through all the steps instead of a single long one: integrating code early makes it possible to catch hidden issues, and actively revising the problem formulation helps reorient the study. However, it remains important to maintain flexibility, because if research gets too restricted by business constraints, its findings can remain limited to incremental innovations with moderate added value [assink2006inhibitors].
Research and development is important for competitiveness in the industry. It demonstrates expertise and creates value. It is also at the core of the ML revolution. But we usually tend to communicate only on the research part of the process, while the complete path from emergence to deployment hides many other challenges. Good practice is to anticipate them in order to go further, faster. Promising formalized directions are starting to emerge, for instance with MLOps [van2020model], the set of recipes and tools for ML deployment. This, however, does not mean that business should constrain research: doing so allows efficient integration, of course, but can prevent disruptive innovations [assink2006inhibitors].