Confidence May Cheat: Self-Training on Graph Neural Networks under Distribution Shift

01/27/2022
by Hongrui Liu, et al.

Graph Convolutional Networks (GCNs) have recently attracted vast interest and achieved state-of-the-art performance on graphs, but their success typically hinges on careful training with large amounts of expensive and time-consuming labeled data. To alleviate the scarcity of labeled data, self-training methods have been widely adopted on graphs: high-confidence unlabeled nodes are pseudo-labeled and then added to the training set. Along this line, we conduct a thorough empirical study of current self-training methods on graphs. Surprisingly, we find that high-confidence unlabeled nodes are not always useful and can even introduce a distribution shift between the original labeled dataset and the dataset augmented by self-training, severely hindering the capability of self-training on graphs. To this end, we propose a novel Distribution Recovered Graph Self-Training framework (DR-GST), which recovers the distribution of the original labeled dataset. Specifically, we first prove that the loss function of the self-training framework under distribution shift equals that under the population distribution if each pseudo-labeled node is weighted by a proper coefficient. Since this coefficient is intractable, we propose to replace it with the information gain, after observing that the two share the same changing trend; the information gain is estimated in DR-GST via both dropout variational inference and dropedge variational inference. However, such a weighted loss function amplifies the impact of incorrect pseudo labels, so we further apply a loss correction method to improve the quality of the pseudo labels. Both our theoretical analysis and extensive experiments on five benchmark datasets demonstrate the effectiveness of the proposed DR-GST, as well as of each well-designed component in DR-GST.
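To make the weighting scheme concrete, below is a minimal PyTorch sketch (not the authors' implementation) of the two pieces the abstract describes: estimating per-node information gain with Monte Carlo dropout, and using it to weight the cross-entropy on pseudo-labeled nodes. All names (TinyGCN, information_gain, weighted_pseudo_label_loss) and hyperparameters are illustrative assumptions; DR-GST additionally estimates information gain via DropEdge sampling and applies loss correction, both omitted here.

```python
# Minimal sketch of information-gain-weighted graph self-training,
# assuming a dense, symmetrically normalized adjacency matrix with self-loops.
import torch
import torch.nn.functional as F

class TinyGCN(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, n_classes, p_drop=0.5):
        super().__init__()
        self.lin1 = torch.nn.Linear(in_dim, hid_dim)
        self.lin2 = torch.nn.Linear(hid_dim, n_classes)
        self.p_drop = p_drop

    def forward(self, adj_norm, x):
        # Two-layer GCN: propagate, transform, apply dropout between layers.
        h = F.relu(adj_norm @ self.lin1(x))
        h = F.dropout(h, self.p_drop, training=self.training)
        return adj_norm @ self.lin2(h)

@torch.no_grad()
def information_gain(model, adj_norm, x, n_samples=20):
    """BALD-style mutual information per node via MC dropout:
    IG = H(E[p]) - E[H(p)], averaged over stochastic forward passes.
    (DR-GST also estimates this with DropEdge, i.e., resampling edges
    of adj_norm per pass; that variant is omitted here.)"""
    model.train()  # keep dropout active at inference time
    probs = torch.stack([F.softmax(model(adj_norm, x), dim=-1)
                         for _ in range(n_samples)])        # (S, N, C)
    mean_p = probs.mean(0)
    h_of_mean = -(mean_p * mean_p.clamp_min(1e-12).log()).sum(-1)
    mean_of_h = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean(0)
    return h_of_mean - mean_of_h                            # (N,)

def weighted_pseudo_label_loss(logits, pseudo_y, weights, mask):
    # Per-node cross-entropy on pseudo-labeled nodes, scaled by the
    # information-gain weights that stand in for the intractable coefficient.
    ce = F.cross_entropy(logits[mask], pseudo_y[mask], reduction="none")
    return (weights[mask] * ce).mean()
```

In each self-training round, one would pseudo-label high-confidence unlabeled nodes, compute `information_gain` for them, and add `weighted_pseudo_label_loss` to the supervised loss on the original labeled nodes before retraining.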


Related research

01/20/2022 · Informative Pseudo-Labeling for Graph Neural Networks with Few Labels
Graph Neural Networks (GNNs) have achieved state-of-the-art results for ...

06/13/2022 · EnergyMatch: Energy-based Pseudo-Labeling for Semi-Supervised Learning
Recent state-of-the-art methods in semi-supervised learning (SSL) combin...

02/05/2022 · LST: Lexicon-Guided Self-Training for Few-Shot Text Classification
Self-training provides an effective means of using an extremely small am...

09/18/2023 · Towards Self-Adaptive Pseudo-Label Filtering for Semi-Supervised Learning
Recent semi-supervised learning (SSL) methods typically include a filter...

02/15/2022 · Debiased Pseudo Labeling in Self-Training
Deep neural networks achieve remarkable performances on a wide range of ...

07/13/2023 · Intent-calibrated Self-training for Answer Selection in Open-domain Dialogues
Answer selection in open-domain dialogues aims to select an accurate ans...
