Effective and Scalable Clustering on Massive Attributed Graphs

02/07/2021
by Renchi Yang, et al.

Given a graph G where each node is associated with a set of attributes, and a parameter k specifying the number of output clusters, k-attributed graph clustering (k-AGC) groups the nodes in G into k disjoint clusters, such that nodes within the same cluster share similar topological and attribute characteristics, while those in different clusters are dissimilar. This problem is challenging on massive graphs, e.g., those with millions of nodes and billions of edges. For such graphs, existing solutions either incur prohibitively high costs or produce clustering results of compromised quality. In this paper, we propose ACMin, an effective approach to k-AGC that yields high-quality clusters at a cost linear in the size of the input graph G. The main contributions of ACMin are twofold: (i) a novel formulation of the k-AGC problem based on an attributed multi-hop conductance quality measure custom-made for this problem setting, which effectively captures cluster coherence in terms of both topological proximities and attribute similarities, and (ii) a linear-time optimization solver that obtains high-quality clusters iteratively, based on efficient matrix operations such as orthogonal iterations, an alternative optimization approach, and an initialization technique that significantly speeds up the convergence of ACMin in practice. Extensive experiments, comparing ACMin against 11 competitors on 6 real datasets, demonstrate that ACMin consistently outperforms all competitors in terms of result quality measured against ground-truth labels, while being up to orders of magnitude faster. In particular, on the Microsoft Academic Knowledge Graph dataset with 265.2 million edges and 1.1 billion attribute values, ACMin outputs high-quality results for 5-AGC within 1.68 hours using a single CPU core, while none of the 11 competitors finishes within 3 days.
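
For readers unfamiliar with conductance, the sketch below computes the classic single-hop, topology-only conductance of one cluster in a small undirected graph. It is an illustration only, not the attributed multi-hop measure that ACMin defines; the function name `conductance`, the use of the smaller side's volume as the denominator, and the toy graph are all assumptions made for this example. A lower value indicates a cluster that is better separated from the rest of the graph.

```python
from collections import defaultdict


def conductance(edges, cluster):
    """Classic conductance of one cluster in an undirected, unweighted graph.

    Assumed definition (not the paper's attributed multi-hop variant):
    number of edges leaving the cluster, divided by the smaller of the
    total degree inside the cluster and the total degree outside it.
    """
    cluster = set(cluster)
    degree = defaultdict(int)
    cut = 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        # An edge is cut when exactly one endpoint lies in the cluster.
        if (u in cluster) != (v in cluster):
            cut += 1
    vol_in = sum(d for node, d in degree.items() if node in cluster)
    vol_out = sum(degree.values()) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 0.0


# Hypothetical toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

good = conductance(edges, {0, 1, 2})  # one whole triangle: only the bridge is cut
bad = conductance(edges, {0, 3})      # one node from each triangle: many cut edges
print(good, bad)
```

On this toy graph, the triangle {0, 1, 2} scores far lower than the mixed pair {0, 3}, which matches the intuition that a good cluster has few outgoing edges relative to its internal connectivity.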


