To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging

10/27/2020
by Kasturi Bhattacharjee, et al.

Leveraging large amounts of unlabeled data with Transformer-like architectures such as BERT has gained popularity in recent times, owing to their effectiveness in learning general representations that can then be fine-tuned for downstream tasks with much success. However, training these models can be costly from both an economic and an environmental standpoint. In this work, we investigate how to effectively use unlabeled data by exploring the task-specific semi-supervised approach, Cross-View Training (CVT), and comparing it with task-agnostic BERT in multiple settings that include domain- and task-relevant English data. CVT uses a much lighter model architecture, and we show that it achieves similar performance to BERT on a set of sequence tagging tasks, with less financial and environmental impact.
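The core idea behind CVT, as summarized in the abstract, is sketched below in a minimal PyTorch-style example: a shared, lightweight Bi-LSTM encoder feeds a primary prediction module trained on labeled data, while auxiliary modules that see restricted views of the input are trained on unlabeled data to match the primary module's predictions. All class names, dimensions, and the particular choice of restricted views are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of Cross-View Training (CVT) for sequence tagging.
# Hypothetical names and sizes; a simplification of the approach, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVTTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=256, num_tags=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared Bi-LSTM encoder: much lighter than a Transformer like BERT.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Primary prediction module sees the full bidirectional view.
        self.primary = nn.Linear(2 * hidden_dim, num_tags)
        # Auxiliary modules see restricted views (forward-only / backward-only states).
        self.aux_fwd = nn.Linear(hidden_dim, num_tags)
        self.aux_bwd = nn.Linear(hidden_dim, num_tags)

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))   # (B, T, 2H)
        h_fwd, h_bwd = h.chunk(2, dim=-1)          # forward / backward halves
        return self.primary(h), self.aux_fwd(h_fwd), self.aux_bwd(h_bwd)

def supervised_step(model, tokens, tags):
    # Labeled data: standard cross-entropy on the primary module only.
    logits, _, _ = model(tokens)
    return F.cross_entropy(logits.transpose(1, 2), tags)

def cvt_step(model, tokens):
    # Unlabeled data: auxiliary modules are trained to match the primary
    # module's predictions, which are treated as fixed targets.
    logits, fwd_logits, bwd_logits = model(tokens)
    target = F.softmax(logits, dim=-1).detach()
    loss = 0.0
    for aux_logits in (fwd_logits, bwd_logits):
        loss = loss + F.kl_div(F.log_softmax(aux_logits, dim=-1), target,
                               reduction="batchmean")
    return loss
```

In training, supervised and CVT steps would typically be alternated over labeled and unlabeled batches, so the shared encoder improves from both signals.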


