Data-adaptive Active Sampling for Efficient Graph-Cognizant Classification

05/19/2017
by   Dimitris Berberidis, et al.

The present work deals with active sampling of graph nodes representing training data for binary classification. The graph may be given, or constructed using similarity measures among nodal features. Leveraging the graph for classification builds on the premise that labels across neighboring nodes are correlated according to a categorical Markov random field (MRF). This model is further relaxed to a Gaussian MRF (GMRF) with labels taking continuous values, an approximation that not only mitigates the combinatorial complexity of the categorical model, but also offers optimal unbiased soft predictors of the unlabeled nodes. The proposed sampling strategy is based on querying the node whose label disclosure is expected to inflict the largest change on the GMRF, and in this sense it is the most informative on average. Such a strategy subsumes several measures of expected model change, including uncertainty sampling, variance minimization, and sampling based on the Σ-optimality criterion. A simple yet effective heuristic is also introduced for increasing the exploration capabilities of the sampler, and reducing bias of the resultant classifier, by taking into account the confidence in the model's label predictions. The novel sampling strategies are based on quantities that are readily available without the need for model retraining, rendering them computationally efficient and scalable to large graphs. Numerical tests using synthetic and real data demonstrate that the proposed methods achieve accuracy that is comparable or superior to that of the state of the art, even at reduced runtime.
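
To make the GMRF relaxation concrete, the following is a minimal sketch rather than the authors' algorithm. It assumes the GMRF precision matrix is the graph Laplacian plus a small ridge term `delta` (an illustrative choice not stated in the abstract), encodes labels as +1/-1, and uses plain uncertainty sampling, one of the criteria the abstract says the proposed strategy subsumes. The function names `gmrf_posterior` and `pick_most_uncertain` are hypothetical.

```python
import numpy as np

def gmrf_posterior(adjacency, labeled, y_labeled, delta=1e-2):
    """Conditional mean/covariance of unlabeled nodes under a GMRF.

    Assumption: precision = graph Laplacian + delta * I.
    """
    n = adjacency.shape[0]
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    precision = laplacian + delta * np.eye(n)
    unlabeled = np.setdiff1d(np.arange(n), labeled)
    q_uu = precision[np.ix_(unlabeled, unlabeled)]
    q_ul = precision[np.ix_(unlabeled, labeled)]
    cov = np.linalg.inv(q_uu)  # conditional covariance of unlabeled nodes
    mean = -cov @ q_ul @ np.asarray(y_labeled, dtype=float)  # soft predictions
    return unlabeled, mean, cov

def pick_most_uncertain(unlabeled, mean):
    """Uncertainty sampling: query the node whose soft label is closest to 0."""
    return int(unlabeled[np.argmin(np.abs(mean))])

# Toy example: a 5-node path graph 0-1-2-3-4 with the two end nodes labeled.
adjacency = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
unlabeled, mean, cov = gmrf_posterior(adjacency, [0, 4], [1.0, -1.0])
query = pick_most_uncertain(unlabeled, mean)
```

On this toy graph the end nodes carry opposite labels, so the middle node sits exactly between them and is the one selected. Both the conditional mean and covariance come from a single matrix inverse, which illustrates why criteria built on these quantities need no model retraining per candidate node.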


research
05/16/2014

Active Semi-Supervised Learning Using Sampling Theory for Graph Signals

We consider the problem of offline, pool-based active semi-supervised le...
research
09/18/2022

Bivariate Causal Discovery for Categorical Data via Classification with Optimal Label Permutation

Causal discovery for quantitative data has been extensively studied but ...
research
07/31/2023

DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification

Node classification is one of the core tasks on attributed graphs, but s...
research
06/27/2023

Large Language Models as Annotators: Enhancing Generalization of NLP Models at Minimal Cost

State-of-the-art supervised NLP models achieve high accuracy but are als...
research
08/30/2021

Adaptive Label Smoothing To Regularize Large-Scale Graph Training

Graph neural networks (GNNs), which learn the node representations by re...
research
06/23/2017

A Variance Maximization Criterion for Active Learning

Active learning aims to train a classifier as fast as possible with as f...
research
03/13/2021

SMOTE-ENC: A novel SMOTE-based method to generate synthetic data for nominal and continuous features

Real world datasets are heavily skewed where some classes are significan...
