Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning

04/29/2020
by   Alexandre Tamborrino, et al.

Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks. Most of the existing approaches rely on a randomly initialized classifier on top of such networks. We argue that this fine-tuning procedure is sub-optimal as the pre-trained model has no prior on the specific classifier labels, while it might have already learned an intrinsic textual representation of the task. In this paper, we introduce a new scoring method that casts a plausibility ranking task in a full-text format and leverages the masked language modeling head tuned during the pre-training phase. We study commonsense reasoning tasks where the model must rank a set of hypotheses given a premise, focusing on the COPA, Swag, HellaSwag and CommonsenseQA datasets. By exploiting our scoring method without fine-tuning, we are able to produce strong baselines (e.g. 80% on COPA) that are comparable to supervised approaches. Moreover, when fine-tuning directly on the proposed scoring function, we show that our method provides a much more stable training phase across random restarts (e.g. ×10 standard deviation reduction on COPA test accuracy) and requires less annotated data than the standard classifier approach to reach equivalent performance.
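The abstract describes ranking hypotheses with the pre-trained masked language modeling head instead of a randomly initialized classifier. Below is a minimal sketch of that idea: each candidate hypothesis is scored by its pseudo-log-likelihood under the MLM head, given the premise in a full-text sequence. The choice of model checkpoint (roberta-large), the decision to mask the hypothesis tokens rather than the premise tokens, and the lack of length normalization are illustrative assumptions, not the paper's exact formulation.

    # Sketch: rank candidate hypotheses by masked-LM pseudo-log-likelihood.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("roberta-large")
    model = AutoModelForMaskedLM.from_pretrained("roberta-large")
    model.eval()

    def mlm_score(premise: str, hypothesis: str) -> float:
        """Sum of masked-token log-probabilities over the hypothesis tokens,
        conditioned on the full premise + hypothesis text."""
        enc = tokenizer(premise, hypothesis, return_tensors="pt")
        input_ids = enc["input_ids"][0]
        # Positions belonging to the second segment (the hypothesis).
        hyp_positions = [i for i, sid in enumerate(enc.sequence_ids(0)) if sid == 1]
        total = 0.0
        for pos in hyp_positions:
            masked = input_ids.clone()
            true_id = masked[pos].item()
            masked[pos] = tokenizer.mask_token_id  # mask one token at a time
            with torch.no_grad():
                logits = model(masked.unsqueeze(0)).logits[0, pos]
            total += torch.log_softmax(logits, dim=-1)[true_id].item()
        return total

    # COPA-style ranking: pick the more plausible candidate for a premise.
    premise = "The man broke his toe."
    candidates = ["He dropped a hammer on his foot.", "He got a hole in his sock."]
    print(max(candidates, key=lambda h: mlm_score(premise, h)))

In practice, candidates of different lengths would typically have their scores normalized by the number of masked tokens so that shorter hypotheses are not favored by construction.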
