Neural Data-to-Text Generation Based on Small Datasets: Comparing the Added Value of Two Semi-Supervised Learning Approaches on Top of a Large Language Model

07/14/2022
by Chris van der Lee, et al.

This study examines the effect of semi-supervised learning in combination with pretrained language models for data-to-text generation. It is not known whether semi-supervised learning still helps when a large-scale pretrained language model is already in use. This study aims to answer this question by comparing a data-to-text system that only uses a pretrained language model with two systems that are additionally enriched by a data augmentation or a pseudo-labeling semi-supervised learning approach. Results show that semi-supervised learning yields higher scores on diversity metrics. In terms of output quality, extending the training set via the pseudo-labeling approach increased text quality scores, whereas the data augmentation approach yielded scores similar to the system without training set extension. These results indicate that semi-supervised learning approaches can bolster output quality and diversity even when a pretrained language model is also present.
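The pseudo-labeling approach can be sketched generically: a seq2seq model fine-tuned on the small labeled set generates target texts for unlabeled data inputs, and the resulting (input, pseudo-text) pairs are added to the training set for another round of fine-tuning. The sketch below is a minimal illustration of this general idea, not the authors' exact pipeline; the model name ("t5-small"), the linearized string input format, and the generation settings are assumptions.

```python
# Minimal pseudo-labeling sketch for data-to-text (illustrative only).
# Assumes a seq2seq LM already fine-tuned on the small labeled set.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # assumed fine-tuned

def pseudo_label(unlabeled_inputs):
    """Generate pseudo-target texts for linearized data inputs.

    Returns (input, pseudo_text) pairs that can be appended to the
    labeled training set before a further fine-tuning round.
    """
    pseudo_pairs = []
    for data in unlabeled_inputs:
        ids = tokenizer(data, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=64, num_beams=4)
        text = tokenizer.decode(out[0], skip_special_tokens=True)
        pseudo_pairs.append((data, text))
    return pseudo_pairs
```

In practice, pseudo-labeled pairs are often filtered (e.g. by model confidence) before being mixed into the training set, though the filtering criterion here would be an implementation choice rather than something specified in the abstract.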

Related research:
- 06/16/2023 · Semi-supervised Relation Extraction via Data Augmentation and Consistency-training
- 09/01/2022 · Attack Tactic Identification by Transfer Learning of Language Model
- 11/17/2020 · Neural Semi-supervised Learning for Text Classification Under Large-Scale Pretraining
- 10/17/2016 · The Peaking Phenomenon in Semi-supervised Learning
- 10/22/2020 · slimIPL: Language-Model-Free Iterative Pseudo-Labeling
- 06/17/2023 · KEST: Kernel Distance Based Efficient Self-Training for Improving Controllable Text Generation
- 01/18/2021 · On Data-Augmentation and Consistency-Based Semi-Supervised Learning
