Network structure, metadata and the prediction of missing nodes and annotations

04/01/2016
by   Darko Hric, et al.

The empirical validation of community detection methods is often based on available annotations on the nodes that serve as putative indicators of the large-scale network structure. Most often, the suitability of the annotations themselves as topological descriptors is not assessed, and without this it is not possible to distinguish between actual shortcomings of the community detection algorithms on the one hand, and the incompleteness, inaccuracy or structured nature of the data annotations on the other. In this work we present a principled method to assess both aspects simultaneously. We construct a joint generative model for the data and metadata, and a nonparametric Bayesian framework to infer its parameters from annotated datasets. We assess the quality of the metadata not by its direct alignment with the network communities, but by its capacity to predict the placement of edges in the network. We also show how this feature can be used to predict the connections of missing nodes when only their metadata is available, as well as to predict missing metadata. By investigating a wide range of datasets, we show that while metadata tokens and the inferred data groups seldom agree exactly, the metadata is nevertheless often informative of the network structure and can improve the prediction of missing nodes. This shows that the method uncovers meaningful patterns in both the data and metadata, without requiring or expecting a perfect agreement between the two.

research
11/09/2021

Metadata-informed community detection with lazy encoding using absorbing random walks

Integrating structural information and metadata, such as gender, social ...
research
10/17/2022

Implicit models, latent compression, intrinsic biases, and cheap lunches in community detection

The task of community detection, which aims to partition a network into ...
research
06/05/2021

IM-META: Influence Maximization Using Node Metadata in Networks With Unknown Topology

In real-world applications of influence maximization (IM), the network s...
research
04/25/2011

Bayesian approach for near-duplicate image detection

In this paper we propose a Bayesian approach for near-duplicate image de...
research
10/24/2018

A Map Equation with Metadata: Varying the Role of Attributes in Community Detection

As the No Free Lunch theorem formally states [1], algorithms for detecti...
research
10/03/2020

Joint Inference of Structure and Diffusion in Partially Observed Social Networks

Access to complete data in large scale networks is often infeasible. The...
research
09/12/2018

Evaluation of Semantic Metadata Pair Modelling Using Data Clustering

Metadata presents a medium for connection, elaboration, examination, and...
