Data-Centric Learning from Unlabeled Graphs with Diffusion Model

03/17/2023
by Gang Liu, et al.

Graph property prediction tasks are important and numerous. While each task offers only a small number of labeled examples, unlabeled graphs have been collected from various sources and at a large scale. A conventional approach is to train a model on the unlabeled graphs with self-supervised tasks and then fine-tune it on the prediction tasks. However, the knowledge learned from the self-supervised tasks may not align with, and sometimes conflicts with, what the predictions need. In this paper, we propose to extract the knowledge underlying the large set of unlabeled graphs as a specific set of useful data points to augment each property prediction model. We use a diffusion model to fully utilize the unlabeled graphs and design two new objectives that guide the model's denoising process with each task's labeled data to generate task-specific graph examples and their labels. Experiments demonstrate that our data-centric approach performs significantly better than fourteen existing methods on fifteen tasks. Unlike in self-supervised learning, the performance improvement brought by the unlabeled data is visible in the form of the generated labeled examples.

research
06/17/2020

Big Self-Supervised Models are Strong Semi-Supervised Learners

One paradigm for learning from few labeled examples while making best us...
research
03/11/2021

Self-supervised Text-to-SQL Learning with Header Alignment Training

Since we can leverage a large amount of unlabeled data without any human...
research
05/04/2022

Crystal Twins: Self-supervised Learning for Crystalline Material Property Prediction

Machine learning (ML) models have been widely successful in the predicti...
research
11/10/2020

UmBERTo-MTSA @ AcCompl-It: Improving Complexity and Acceptability Prediction with Multi-task Learning on Self-Supervised Annotations

This work describes a self-supervised data augmentation approach used to...
research
07/04/2022

Masked Self-Supervision for Remaining Useful Lifetime Prediction in Machine Tools

Prediction of Remaining Useful Lifetime(RUL) in the modern manufacturing...
research
01/18/2022

Deep Cervix Model Development from Heterogeneous and Partially Labeled Image Datasets

Cervical cancer is the fourth most common cancer in women worldwide. The...
research
04/22/2021

Self-Supervised Learning from Semantically Imprecise Data

Learning from imprecise labels such as "animal" or "bird", but making pr...
