The merits of Universal Language Model Fine-tuning for Small Datasets – a case with Dutch book reviews

10/02/2019
by Benjamin van der Burgh, et al.

We evaluated the effectiveness of using language models that were pre-trained in one domain as the basis for a classification model in another domain: Dutch book reviews. Pre-trained language models have opened up new possibilities for classification tasks with limited labelled data, because representations can be learned in an unsupervised fashion. In our experiments we studied the effects of training set size (100-1600 items) on the prediction accuracy of a ULMFiT classifier based on a language model that we pre-trained on the Dutch Wikipedia. We also compared ULMFiT to Support Vector Machines, which are traditionally considered suitable for small collections. We found that ULMFiT outperforms SVM for all training set sizes and that satisfactory results (~90%) can be achieved using training sets that can be manually annotated within a few hours. We deliver both our new benchmark collection of Dutch book reviews for sentiment classification and the pre-trained Dutch language model to the community.
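The experimental setup described above — training a classifier on subsets of increasing size and measuring accuracy — can be illustrated for the SVM baseline. The sketch below is not the authors' code: it uses scikit-learn's `TfidfVectorizer` and `LinearSVC` as a stand-in for their SVM pipeline, and a toy two-review dataset in place of the actual Dutch book review collection.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for labelled Dutch book reviews: (text, sentiment) pairs.
# The real benchmark collection is much larger and more varied.
reviews = [("prachtig boek", 1), ("saai verhaal", 0)] * 800
texts, labels = zip(*reviews)

# Train on subsets of increasing size, mirroring the paper's
# 100-1600 item range, and score each model on the full set.
accuracies = {}
for n in (100, 400, 1600):
    clf = make_pipeline(TfidfVectorizer(), LinearSVC())
    clf.fit(texts[:n], labels[:n])
    accuracies[n] = clf.score(texts, labels)
```

On this trivially separable toy data every subset size reaches perfect accuracy; with real reviews, the accuracy-versus-size curve is what the paper compares against ULMFiT.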


research
03/11/2021

The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models

In this paper, we explore the effects of language variants, data sizes, ...
research
06/16/2023

Data Selection for Fine-tuning Large Language Models Using Transferred Shapley Values

Although Shapley values have been shown to be highly effective for ident...
research
04/19/2023

Catch Me If You Can: Identifying Fraudulent Physician Reviews with Large Language Models Using Generative Pre-Trained Transformers

The proliferation of fake reviews of doctors has potentially detrimental...
research
01/01/2023

Is word segmentation necessary for Vietnamese sentiment classification?

To the best of our knowledge, this paper made the first attempt to answe...
research
06/01/2023

Prompt Algebra for Task Composition

We investigate whether prompts learned independently for different tasks...
research
09/13/2023

Unsupervised Contrast-Consistent Ranking with Language Models

Language models contain ranking-based knowledge and are powerful solvers...
research
10/26/2020

Probing Task-Oriented Dialogue Representation from Language Models

This paper investigates pre-trained language models to find out which mo...
