A Large-Scale Database for Graph Representation Learning

11/16/2020
by Scott Freitas, et al.

With the rapid emergence of graph representation learning, new large-scale datasets are needed to distinguish model capabilities and to accurately assess the strengths and weaknesses of each technique. By carefully analyzing existing graph databases, we identify three components critical to advancing the field of graph representation learning: (1) large graphs, (2) many graphs, and (3) class diversity. To date, no single graph database offers all of these properties. We introduce MalNet, the largest public graph database ever constructed, representing a large-scale ontology of software function call graphs. MalNet contains over 1.2 million graphs, averaging over 17k nodes and 39k edges per graph, across a hierarchy of 47 types and 696 families. Compared to the popular REDDIT-12K database, MalNet offers 105x more graphs, 44x larger graphs on average, and 63x more classes. We provide a detailed analysis of MalNet, discussing its properties and provenance. The unprecedented scale and diversity of MalNet offer exciting opportunities to advance the frontiers of graph representation learning, enabling new discoveries and research into imbalanced classification, explainability, and the impact of class hardness. The database is publicly available at www.mal-net.org.
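The abstract summarizes MalNet by its number of graphs, average nodes and edges per graph, and a two-level type/family label hierarchy. As an illustration only, the sketch below shows how such summary statistics might be computed over a local collection of call graphs. It assumes, hypothetically, that each graph is stored as a whitespace-separated edge list at `<root>/<type>/<family>/<graph>.edgelist`; this layout and the helper names (`load_edgelist`, `summarize`, `make_toy_dataset`) are illustrative and are not the official MalNet file format or loader.

```python
from __future__ import annotations

import tempfile
from collections import Counter
from pathlib import Path


def load_edgelist(path: Path) -> tuple[int, int]:
    """Return (num_nodes, num_edges) for a whitespace-separated edge list.

    Assumption: one "src dst" pair per line; blank lines and lines
    starting with '#' are skipped.
    """
    nodes: set[str] = set()
    edges = 0
    with path.open() as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            src, dst = line.split()[:2]
            nodes.update((src, dst))
            edges += 1
    return len(nodes), edges


def summarize(root: Path) -> dict:
    """Aggregate statistics over the hypothetical layout root/<type>/<family>/<graph>.edgelist."""
    types: Counter = Counter()
    families: Counter = Counter()  # keyed by (type, family) pairs
    total_nodes = total_edges = n_graphs = 0
    for path in sorted(root.glob("*/*/*.edgelist")):
        graph_type, family = path.parent.parent.name, path.parent.name
        types[graph_type] += 1
        families[(graph_type, family)] += 1
        n, e = load_edgelist(path)
        total_nodes += n
        total_edges += e
        n_graphs += 1
    return {
        "graphs": n_graphs,
        "types": len(types),
        "families": len(families),
        "avg_nodes": total_nodes / n_graphs if n_graphs else 0.0,
        "avg_edges": total_edges / n_graphs if n_graphs else 0.0,
        "family_counts": families,  # per-class sizes, useful for spotting imbalance
    }


def make_toy_dataset(root: Path) -> None:
    """Write three tiny illustrative graphs (made-up data, not MalNet samples)."""
    toy = {
        ("trojan", "familyA", "g1"): "0 1\n1 2\n",
        ("trojan", "familyB", "g2"): "0 1\n",
        ("adware", "familyC", "g3"): "# comment\n0 1\n1 2\n2 0\n",
    }
    for (graph_type, family, name), text in toy.items():
        folder = root / graph_type / family
        folder.mkdir(parents=True, exist_ok=True)
        (folder / f"{name}.edgelist").write_text(text)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_toy_dataset(root)
        print(summarize(root))
```

Counting families as (type, family) pairs keeps the hierarchy explicit, and the per-family counts make class imbalance visible before any model is trained.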
