A Framework for Large Scale Synthetic Graph Dataset Generation

10/04/2022
by Sajad Darabi, et al.

Recently, there has been increasing interest in developing and deploying deep graph learning algorithms for graph analysis tasks such as node and edge classification, link prediction, and clustering, with numerous practical applications such as fraud detection, drug discovery, and recommender systems. However, only a limited number of graph-structured datasets are publicly available, and most of them are tiny compared to production-sized applications with trillions of edges and billions of nodes. Further, new algorithms and models are benchmarked on the same few datasets, which share similar properties. In this work, we tackle this shortcoming by proposing a scalable synthetic graph generation tool that can mimic the original data distribution of real-world graphs and scale them to arbitrary sizes. The tool can then be used to learn a set of parametric models from proprietary datasets; these models can subsequently be released to researchers, who can study various graph methods on the synthetic data, accelerating prototype development and enabling novel applications. Finally, because the performance of graph learning algorithms depends not only on the size of a dataset but also on its structure, we show how our framework generalizes across a set of datasets, mimicking both structural and feature distributions, and we demonstrate its scalability across varying dataset sizes.
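The fit-then-scale workflow described above can be illustrated with a deliberately simple sketch. It is not the framework proposed in the paper: it assumes an undirected edge list, models only the degree distribution (no node or edge features), and uses Chung-Lu-style endpoint sampling as a stand-in generator. The function names `fit_degree_model` and `generate_scaled_graph` are hypothetical and chosen for illustration only.

```python
import random
from collections import Counter


def fit_degree_model(edges):
    """Summarize an undirected edge list as a small, shareable model.

    Assumption for this sketch: the "model" is just the empirical degree
    values plus the edge-to-node ratio, not a learned parametric model.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    degrees = list(degree.values())
    return {"degrees": degrees, "edges_per_node": len(edges) / len(degrees)}


def generate_scaled_graph(model, num_nodes, seed=0):
    """Sample a synthetic edge list with num_nodes nodes.

    Each synthetic node gets a weight resampled from the fitted degrees,
    and both endpoints of every edge are drawn proportionally to those
    weights (Chung-Lu style). Self-loops and duplicate edges may occur.
    """
    rng = random.Random(seed)
    weights = [rng.choice(model["degrees"]) for _ in range(num_nodes)]
    # Keep the edge density of the source graph when scaling up or down.
    num_edges = round(model["edges_per_node"] * num_nodes)
    nodes = range(num_nodes)
    src = rng.choices(nodes, weights=weights, k=num_edges)
    dst = rng.choices(nodes, weights=weights, k=num_edges)
    return list(zip(src, dst))


# Usage: fit on a tiny graph, then generate a graph 25 times larger.
source_edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
model = fit_degree_model(source_edges)
synthetic_edges = generate_scaled_graph(model, num_nodes=100, seed=1)
```

In this sketch only the fitted summary, not the original edge list, is needed at generation time, which mirrors the idea of releasing fitted models in place of proprietary data. Note that a raw list of degrees is still derived directly from the source graph, so a real release would need a more compact or privacy-aware parameterization.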
