GPT Self-Supervision for a Better Data Annotator

06/07/2023
by Xiaohuan Pei, et al.

Annotating data into concise summaries poses a significant challenge across various domains, frequently demanding substantial time and specialized knowledge from human experts. Despite existing efforts to use large language models for annotation, problems such as limited applicability to unlabeled data, the absence of self-supervised methods, and the lack of focus on complex structured data still persist. In this work, we propose a GPT self-supervision annotation method, which embodies a generating-recovering paradigm that leverages the one-shot learning capabilities of the Generative Pretrained Transformer (GPT). The proposed approach comprises a one-shot tuning phase followed by a generation phase. In the one-shot tuning phase, we sample a data instance from the support set as part of the prompt for GPT to generate a textual summary, which is then used to recover the original data. The alignment score between the recovered and original data serves as a self-supervision navigator to refine the process. In the generation phase, the optimally selected one-shot sample serves as a template in the prompt and is applied to generate summaries from challenging datasets. Annotation performance is evaluated by tuning several human-feedback reward networks and by calculating alignment scores between original and recovered data at both the sentence and structure levels. Our self-supervised annotation method consistently achieves competitive scores, demonstrating its strength across various data-to-summary annotation tasks.
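To make the generating-recovering paradigm concrete, the sketch below illustrates the one-shot tuning loop described in the abstract: each support-set sample is summarized by GPT, the summary is used to recover the original data, and the recovery alignment score selects the best one-shot template. All names here (call_gpt, support_set, select_one_shot_template) are hypothetical placeholders, not the authors' implementation, and the alignment score is approximated with difflib purely for illustration.

```python
# Minimal sketch of the one-shot tuning phase (assumed interface, not the paper's code).
from difflib import SequenceMatcher


def call_gpt(prompt: str) -> str:
    """Placeholder for a call to a GPT-style completion endpoint (assumption)."""
    raise NotImplementedError("Wire this to your preferred LLM API.")


def alignment_score(original: str, recovered: str) -> float:
    """Sentence-level similarity between original and recovered data, in [0, 1]."""
    return SequenceMatcher(None, original, recovered).ratio()


def select_one_shot_template(support_set: list[str]) -> tuple[str, float]:
    """Pick the support sample whose generated summary best recovers the original data."""
    best_sample, best_score = "", -1.0
    for sample in support_set:
        # Generate a textual summary of the sampled data instance.
        summary = call_gpt(f"Summarize the following data concisely:\n{sample}")
        # Ask the model to recover the original data from its own summary.
        recovered = call_gpt(f"Reconstruct the original data from this summary:\n{summary}")
        # The recovery quality acts as the self-supervision signal.
        score = alignment_score(sample, recovered)
        if score > best_score:
            best_sample, best_score = sample, score
    return best_sample, best_score
```

In the subsequent generation phase, the selected sample would be embedded in the prompt as the one-shot template when summarizing new, unlabeled data.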
