Synthesizing Credit Card Transactions

10/04/2019
by Erik R. Altman, et al.

Two elements have been essential to AI's recent boom: (1) deep neural nets and the theory and practice behind them; and (2) cloud computing with its abundant labeled data and large computing resources. Abundant labeled data is available for key domains such as images, speech, natural language processing, and recommendation engines. However, there are many other domains where such data is not available, or where access to it is highly restricted for privacy reasons, as with health and financial data. Even when abundant data is available, it is often not labeled, and labeling is labor-intensive and does not scale. As a result, to the best of our knowledge, key domains still lack labeled data or have at most toy data, or existing synthetic data generators need access to real data to mimic. This paper outlines work to generate realistic synthetic data for an important domain: credit card transactions. The challenges are several: real purchases exhibit many patterns and correlations; there are millions of merchants and innumerable locations; and those merchants offer a wide variety of goods. Who shops where and when? How much do people pay? What is a realistic fraudulent transaction? We use a mixture of technical approaches and domain knowledge, including the mechanics of credit card processing and a broad set of consumer domains such as electronics, clothing, and hair styling. Connecting everything is a virtual world. This paper outlines some of our key techniques and provides evidence that the data generated is indeed realistic. Two topics are beyond the scope of this paper: (1) use of our data to develop and train models to predict fraud; and (2) coupling models and the synthetic dataset to assess performance in designing accelerators such as GPUs and TPUs.


