To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging

by Kasturi Bhattacharjee et al.

Leveraging large amounts of unlabeled data with Transformer-based architectures such as BERT has gained popularity in recent times, owing to their effectiveness in learning general representations that can then be fine-tuned for downstream tasks with great success. However, training these models can be costly from both an economic and an environmental standpoint. In this work, we investigate how to use unlabeled data effectively by exploring a task-specific semi-supervised approach, Cross-View Training (CVT), and comparing it with task-agnostic BERT in multiple settings that include domain- and task-relevant English data. CVT uses a much lighter model architecture, and we show that it achieves similar performance to BERT on a set of sequence tagging tasks, with lower financial and environmental impact.
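The core idea behind CVT can be illustrated with a toy sketch: on unlabeled data, a primary prediction module sees the full input and its (frozen) predictions serve as soft targets for auxiliary modules that each see only a restricted view of the input. The snippet below is a minimal NumPy illustration of that consistency loss; the function names (`cvt_consistency_loss`) and shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the label (last) axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) per token, summed over the label axis
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def cvt_consistency_loss(primary_logits, aux_logits_list):
    """Mean KL between the primary module's predictions on the full
    input (treated as fixed soft targets) and each auxiliary module's
    predictions on a restricted view, averaged over tokens and views."""
    teacher = softmax(primary_logits)  # gradients would not flow here
    losses = [kl_divergence(teacher, softmax(a)).mean()
              for a in aux_logits_list]
    return float(np.mean(losses))

# toy example: a 5-token sentence tagged with 3 labels, two auxiliary views
rng = np.random.default_rng(0)
primary = rng.normal(size=(5, 3))
aux_views = [rng.normal(size=(5, 3)), rng.normal(size=(5, 3))]
loss = cvt_consistency_loss(primary, aux_views)
```

Minimizing this loss on unlabeled text pushes the view-restricted modules toward the full-view predictions, which in turn improves the shared representation they all build on.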



