Airline Passenger Name Record Generation using Generative Adversarial Networks

07/17/2018
by   Alejandro Mottini, et al.

Passenger Name Records (PNRs) are at the heart of the travel industry. Created when an itinerary is booked, they contain travel and passenger information. Airlines and other actors in the industry routinely exchange and access each other's PNRs, which creates the challenge of using them without infringing data ownership laws. To address this difficulty, we propose a method to generate realistic synthetic PNRs using Generative Adversarial Networks (GANs). Unlike the data in most other GAN applications, PNRs consist of categorical and numerical features with missing/NaN values, which makes the use of GANs challenging. We propose a solution based on Cramér GANs, categorical feature embedding and a Cross-Net architecture. The method was tested on a real PNR dataset and evaluated in terms of distribution matching, memorization, and the performance of predictive models for two real business problems: client segmentation and passenger nationality prediction. Results show that the generated data closely matches the real PNRs without memorizing them, and that it can be used to train models for real business applications.
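
To make the ingredients named in the abstract more concrete, the following is a minimal NumPy sketch of how mixed PNR-like features might be fed to a network: categorical codes are mapped to embedding vectors, numerical features with NaN values are zero-filled and paired with a missing-value indicator, and the result passes through one cross layer of the form x_{l+1} = x_0 (x_l · w) + b + x_l. The feature layout, the NaN handling, the function names and the toy values are all assumptions made for illustration; the abstract does not specify these details, and the Cramér GAN training itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_categorical(codes, table):
    """Look up one embedding vector per integer category code."""
    return table[codes]

def encode_numerical(values):
    """Replace NaN with 0 and append a binary 'is missing' indicator per feature.

    This NaN treatment is an assumption for the sketch, not taken from the paper.
    """
    missing = np.isnan(values)
    filled = np.where(missing, 0.0, values)
    return np.concatenate([filled, missing.astype(float)], axis=1)

def cross_layer(x0, xl, w, b):
    """One cross layer: x_{l+1} = x0 * (xl . w) + b + xl, applied row by row."""
    return x0 * (xl @ w)[:, None] + b + xl

# Hypothetical toy batch: 3 records, one categorical feature with 4 levels,
# and two numerical features, some of which are missing.
codes = np.array([0, 2, 3])
nums = np.array([[1.0, np.nan], [0.5, 2.0], [np.nan, np.nan]])
table = rng.normal(size=(4, 3))  # 4 categories, embedding dimension 3

# Concatenate embedded categorical and encoded numerical features.
x0 = np.concatenate([embed_categorical(codes, table), encode_numerical(nums)], axis=1)

# Randomly initialised cross-layer parameters (they would be learned in practice).
w = rng.normal(size=x0.shape[1])
b = np.zeros(x0.shape[1])
x1 = cross_layer(x0, x0, w, b)
```

In this sketch the cross layer adds explicit feature interactions on top of its input, so with a zero weight vector and zero bias it reduces to the identity on `xl`.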


research
07/03/2018

Generating Multi-Categorical Samples with Generative Adversarial Networks

We propose a method to train generative adversarial networks on mutivari...
research
11/11/2017

Disease Prediction from Electronic Health Records Using Generative Adversarial Networks

Electronic health records (EHRs) have contributed to the computerization...
research
03/01/2021

On the Fairness of Generative Adversarial Networks (GANs)

Generative adversarial networks (GANs) are one of the greatest advances ...
research
08/13/2020

Synthesizing Property Casualty Ratemaking Datasets using Generative Adversarial Networks

Due to confidentiality issues, it can be difficult to access or share in...
research
08/20/2020

Conditional Wasserstein GAN-based Oversampling of Tabular Data for Imbalanced Learning

Class imbalance is a common problem in supervised learning and impedes t...
research
07/04/2018

Generating Synthetic but Plausible Healthcare Record Datasets

Generating datasets that "look like" given real ones is an interesting t...
research
05/03/2021

Synthesizing time-series wound prognosis factors from electronic medical records using generative adversarial networks

Wound prognostic models not only provide an estimate of wound healing ti...
