Computing Graph Descriptors on Edge Streams

09/02/2021
by Zohair Raza Hassan, et al.

Graph feature extraction is a fundamental task in graph analytics. Using feature vectors (graph descriptors) in tandem with data mining algorithms that operate on Euclidean data, one can solve problems such as classification, clustering, and anomaly detection on graph-structured data. This idea has proved fruitful in the past, with spectral-based graph descriptors providing state-of-the-art classification accuracy on benchmark datasets. However, these algorithms do not scale to large graphs since: 1) they require storing the entire graph in memory, and 2) the end-user has no control over the algorithm's runtime. In this paper, we present single-pass streaming algorithms to approximate structural features of graphs (counts of subgraphs of order k ≥ 4). Operating on edge streams allows us to avoid keeping the entire graph in memory, and controlling the sample size enables us to control the time taken by the algorithm. We demonstrate the efficacy of our descriptors by analyzing the approximation error, classification accuracy, and scalability to massive graphs. Our experiments showcase the effect of the sample size on approximation error and predictive accuracy. The proposed descriptors can be computed within minutes on graphs with millions of edges, and they outperform state-of-the-art descriptors in classification accuracy.
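
To illustrate how a fixed sample size can bound memory in a single pass over an edge stream, here is a minimal sketch of classic reservoir sampling. It is a generic technique and is not claimed to be the estimator used in the paper; the function name `reservoir_sample_edges` and its parameters are hypothetical, chosen only for this illustration.

```python
import random
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


def reservoir_sample_edges(
    stream: Iterable[Edge], sample_size: int, seed: int = 0
) -> List[Edge]:
    """Keep a uniform random sample of at most `sample_size` edges.

    Illustrative assumption: plain reservoir sampling, not the paper's
    own algorithm. Memory use is O(sample_size) regardless of how many
    edges the stream contains, and each edge is inspected exactly once.
    """
    rng = random.Random(seed)
    reservoir: List[Edge] = []
    for t, edge in enumerate(stream, start=1):
        if len(reservoir) < sample_size:
            # Fill the reservoir with the first `sample_size` edges.
            reservoir.append(edge)
        else:
            # Keep the t-th edge with probability sample_size / t,
            # evicting a uniformly chosen edge already in the reservoir.
            j = rng.randrange(t)
            if j < sample_size:
                reservoir[j] = edge
    return reservoir
```

A larger `sample_size` costs more memory and more work per pass, but leaves more of the graph available for any downstream estimate, which mirrors the trade-off between sample size and approximation error described in the abstract.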

research
01/28/2020

Estimating Descriptors for Large Graphs

Embedding networks into a fixed dimensional Euclidean feature space, whi...
research
03/03/2020

Just SLaQ When You Approximate: Accurate Spectral Distances for Web-Scale Graphs

Graph comparison is a fundamental operation in data mining and informati...
research
05/30/2018

Anonymous Walk Embeddings

The task of representing entire graphs has seen a surge of prominent res...
research
07/30/2015

When VLAD met Hilbert

Vectors of Locally Aggregated Descriptors (VLAD) have emerged as powerfu...
research
03/28/2022

GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph Streams

Finding the connected components of a graph is a fundamental problem wit...
research
09/02/2018

Mining Frequent Patterns in Evolving Graphs

Given a labeled graph, the frequent-subgraph mining (FSM) problem asks t...
research
09/29/2020

Efficient SVDD Sampling with Approximation Guarantees for the Decision Boundary

Support Vector Data Description (SVDD) is a popular one-class classifier...
