Efficient Centrality Maximization with Rademacher Averages

06/06/2023
by Leonardo Pellegrina, et al.

The identification of the set of k most central nodes of a graph, or centrality maximization, is a key task in network analysis, with various applications ranging from finding communities in social and biological networks to understanding which seed nodes are important to diffuse information in a graph. As the exact computation of centrality measures does not scale to modern-sized networks, the most practical solution is to resort to rigorous, but efficiently computable, randomized approximations. In this work we present CentRA, the first algorithm based on progressive sampling to compute high-quality approximations of the set of k most central nodes. CentRA is based on a novel approach to efficiently estimate Monte Carlo Rademacher Averages, a powerful tool from statistical learning theory to compute sharp data-dependent approximation bounds. Then, we study the sample complexity of centrality maximization using the VC-dimension, a key concept from statistical learning theory. We show that the number of random samples required to compute high-quality approximations scales with finer characteristics of the graph, such as its vertex diameter, or of the centrality of interest, significantly improving looser bounds derived from standard techniques. We apply CentRA to analyze large real-world networks, showing that it significantly outperforms the state-of-the-art approximation algorithm in terms of number of samples, running times, and accuracy.
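The abstract names Monte Carlo Rademacher Averages as the tool behind CentRA's data-dependent bounds. The following is a minimal sketch of the generic Monte Carlo estimate of an empirical Rademacher average, not CentRA's own procedure. It assumes each candidate (for example, a node set) is given as a row of values, one per random sample. The function name `mc_rademacher_average`, the `values` layout, and the toy data are hypothetical and chosen only for illustration.

```python
import random


def mc_rademacher_average(values, trials, rng):
    """Monte Carlo estimate of the empirical Rademacher average.

    values[f][i] is the value of candidate function f on sample i
    (assumed layout, e.g. 1 if a node set "covers" sample i, else 0).
    trials is the number of independent Rademacher sign vectors drawn.
    """
    m = len(values[0])  # number of samples
    total = 0.0
    for _ in range(trials):
        # One vector of independent, uniform +1/-1 signs, one per sample.
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # Supremum over the (finite) family of the signed sample mean.
        total += max(
            sum(s * v for s, v in zip(sigma, row)) / m for row in values
        )
    return total / trials


# Toy family: three candidates evaluated on six samples (made-up values).
toy_values = [
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 1],
]
estimate = mc_rademacher_average(toy_values, trials=100, rng=random.Random(0))
```

This sketch enumerates every candidate explicitly, which is only feasible for small families. The abstract states that CentRA relies on a novel approach to estimate this quantity efficiently, and that approach is not reproduced here.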


research
06/07/2021

SILVAN: Estimating Betweenness Centralities with Progressive Sampling and Non-uniform Rademacher Bounds

Betweenness centrality is a popular centrality measure with applications...
research
03/01/2022

ONBRA: Rigorous Estimation of the Temporal Betweenness Centrality in Temporal Networks

In network analysis, the betweenness centrality of a node informally cap...
research
10/03/2019

Importance Sample-based Approximation Algorithm for Cost-aware Targeted Viral Marketing

Cost-aware Targeted Viral Marketing (CTVM), a generalization of Influenc...
research
04/17/2023

On approximating the temporal betweenness centrality through sampling

We present a collection of sampling-based algorithms for approximating t...
research
01/18/2021

PRESTO: Simple and Scalable Sampling Techniques for the Rigorous Approximation of Temporal Motif Counts

The identification and counting of small graph patterns, called network ...
research
12/27/2019

Combinatorial Trace Method for Network Immunization

Immunizing a subset of nodes in a network - enabling them to identify an...
research
06/01/2023

Scaling Expected Force: Efficient Identification of Key Nodes in Network-based Epidemic Models

Centrality measures are fundamental tools of network analysis as they hi...
