Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation

12/06/2015
by Sujith Ravi, et al.

Traditional graph-based semi-supervised learning (SSL) approaches, even though widely applied, are not suited for massive data and large label scenarios since they scale linearly with the number of edges |E| and distinct labels m. To deal with the large label size problem, recent works propose sketch-based methods to approximate the distribution on labels per node thereby achieving a space reduction from O(m) to O(log m), under certain conditions. In this paper, we present a novel streaming graph-based SSL approximation that captures the sparsity of the label distribution and ensures the algorithm propagates labels accurately, and further reduces the space complexity per node to O(1). We also provide a distributed version of the algorithm that scales well to large data sizes. Experiments on real-world datasets demonstrate that the new method achieves better performance than existing state-of-the-art algorithms with significant reduction in memory footprint. We also study different graph construction mechanisms for natural language applications and propose a robust graph augmentation strategy trained using state-of-the-art unsupervised deep learning architectures that yields further significant quality gains.
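
To make the O(1)-per-node idea concrete, here is a minimal sketch of label propagation in which every node keeps only its k highest-scoring labels, so storage per node is bounded by the constant k rather than by the number of labels m. This is an illustration under stated assumptions, not the authors' algorithm: the function name `propagate_top_k`, the edge-list input format, the clamping of seed nodes, and the simple weighted-sum update are all hypothetical choices made for the example.

```python
from collections import defaultdict
import heapq


def propagate_top_k(edges, seeds, k=2, iterations=5):
    """Propagate labels over an undirected weighted graph.

    Assumed (hypothetical) input format:
      edges: list of (u, v, weight) tuples
      seeds: dict mapping a labeled node to its {label: probability} dict
    Each node stores at most k labels, so per-node space does not grow with m.
    """
    # Build an undirected adjacency list.
    neighbors = defaultdict(list)
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    # Only seed nodes start with a label distribution.
    dist = {node: dict(labels) for node, labels in seeds.items()}
    nodes = set(neighbors) | set(seeds)

    for _ in range(iterations):
        updated = {}
        for node in nodes:
            if node in seeds:
                # Assumption: seed labels are clamped and never overwritten.
                updated[node] = dict(seeds[node])
                continue
            # Sum neighbor label scores, weighted by edge weight.
            scores = defaultdict(float)
            for nbr, w in neighbors[node]:
                for label, p in dist.get(nbr, {}).items():
                    scores[label] += w * p
            # Truncate to the k best labels, then renormalize.
            top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
            total = sum(score for _, score in top)
            updated[node] = (
                {label: score / total for label, score in top} if total > 0 else {}
            )
        dist = updated
    return dist


if __name__ == "__main__":
    toy_edges = [("a", "b", 2.0), ("b", "c", 1.0)]
    toy_seeds = {"a": {"x": 1.0}, "c": {"y": 1.0}}
    print(propagate_top_k(toy_edges, toy_seeds, k=1, iterations=3))
```

On the toy chain a-b-c, the unlabeled node b hears label x through an edge of weight 2 and label y through an edge of weight 1, so with k=1 it retains only x. Truncation trades some accuracy for the fixed memory bound; the abstract's claim is that label distributions are sparse enough in practice for this kind of approximation to work well.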


