Universal Language Model Fine-Tuning with Subword Tokenization for Polish

10/24/2018
by Piotr Czapla, et al.

Universal Language Model Fine-tuning (ULMFiT) [arXiv:1801.06146] is one of the first NLP methods for efficient inductive transfer learning. Its unsupervised pretraining yields improvements on many NLP tasks for English. In this paper, we describe a new method that uses subword tokenization to adapt ULMFiT to highly inflected languages. Our approach results in a new state of the art for Polish, taking first place in Task 3 of PolEval'18. After further training, our final model outperformed the second-best model by 35%.
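The key idea is that in a highly inflected language like Polish, a word-level vocabulary explodes with case and gender endings, while subword units share a common stem across inflections. As an illustration only (the paper's actual tokenizer is not reproduced here), a minimal byte-pair-encoding-style sketch in plain Python shows how shared stems emerge as single subword symbols; all function names below are hypothetical:

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn subword merge rules from a word list (toy BPE sketch)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for sym, freq in vocab.items():
            for a, b in zip(sym, sym[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with a merged symbol.
        new_vocab = Counter()
        for sym, freq in vocab.items():
            out, i = [], 0
            while i < len(sym):
                if i < len(sym) - 1 and (sym[i], sym[i + 1]) == best:
                    out.append(sym[i] + sym[i + 1])
                    i += 2
                else:
                    out.append(sym[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

def segment(word, merges):
    """Segment a word into subwords by replaying the learned merges."""
    sym = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(sym):
            if i < len(sym) - 1 and sym[i] == a and sym[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(sym[i])
                i += 1
        sym = out
    return sym
```

On a toy set of Polish inflections such as `kot`, `kota`, `kotem`, two merges already fuse the shared stem into a single symbol, so `segment("kota", merges)` yields `["kot", "a", "</w>"]` instead of five separate characters. Production systems typically use SentencePiece or a similar trained tokenizer rather than this sketch.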


