Customs Import Declaration Datasets

08/04/2022
by Chaeyoon Jeong, et al.

Given the huge volume of cross-border flows, effective and efficient trade control is increasingly crucial for protecting people and society from illicit trade while facilitating legitimate trade. However, limited access to transaction-level trade datasets hinders open research, and many customs administrations have not benefited from recent progress in data-driven risk management. In this paper, we introduce an import declarations dataset to facilitate collaboration between domain experts in customs administrations and data science researchers. The dataset contains 54,000 artificially generated trades with 22 key attributes, synthesized with CTGAN while preserving the correlations among features. Synthetic data has several advantages. First, the dataset can be released free of the restrictions that prohibit disclosing the original import data. Second, the fabrication step minimizes the identity risk that may exist in trade statistics. Lastly, the published data follow a distribution similar to that of the source data, so they can be used in various downstream tasks. Along with the data and its generation process, we release baseline code for fraud detection tasks and show empirically that more advanced algorithms detect fraud better.
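The claim that the published data follow a distribution similar to the source data can be spot-checked per column. The sketch below is one simple way to do that for a categorical attribute, using the total variation distance between empirical distributions. It is not the authors' evaluation procedure; the function name, the column name, and the toy values are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def total_variation_distance(real_values, synthetic_values):
    """Total variation distance between the empirical distributions of two
    categorical columns: 0.0 means identical, 1.0 means no shared categories."""
    if not real_values or not synthetic_values:
        raise ValueError("both columns must be non-empty")
    real_counts = Counter(real_values)
    synth_counts = Counter(synthetic_values)
    categories = set(real_counts) | set(synth_counts)
    n_real, n_synth = len(real_values), len(synthetic_values)
    # Counter returns 0 for categories missing from one side.
    return 0.5 * sum(
        abs(real_counts[c] / n_real - synth_counts[c] / n_synth)
        for c in categories
    )


# Hypothetical toy columns (not taken from the released dataset).
real_country = ["CN", "CN", "US", "JP", "US", "CN", "DE", "JP"]
synthetic_country = ["CN", "US", "US", "JP", "CN", "CN", "DE", "CN"]

tvd = total_variation_distance(real_country, synthetic_country)
print(f"TVD for the country column: {tvd:.3f}")
```

A value near 0 suggests the synthetic column reproduces the marginal distribution well. This check covers marginals only, so correlations between attributes would need a separate comparison.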

