Does the Objective Matter? Comparing Training Objectives for Pronoun Resolution

10/06/2020
by Yordan Yordanov, et al.

Hard cases of pronoun resolution have long served as a benchmark for commonsense reasoning. In the recent literature, pre-trained language models have been used to obtain state-of-the-art results on pronoun resolution, and four categories of training and evaluation objectives have been introduced. Because these works differ in both training datasets and pre-trained language models, it is unclear whether the choice of training objective is critical. In this work, we make a fair comparison of the performance and seed-wise stability of four models, one representing each category of objective. Our experiments show that the sequence-ranking objective performs best in-domain, while the objective based on semantic similarity between the candidates and the pronoun performs best out-of-domain. We also observe seed-wise instability in the model trained with sequence ranking, which does not occur with the other objectives.
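To make the semantic-similarity objective concrete, the minimal sketch below shows how such a model can be read at inference time: each candidate antecedent is scored by the cosine similarity between its span embedding and the contextual embedding of the pronoun, and the highest-scoring candidate is taken as the answer. This is an illustrative assumption, not the paper's implementation; the `encode` function, the span indices, and the example sentence are hypothetical stand-ins.

```python
# Minimal sketch (not the paper's code): resolving a Winograd-style pronoun
# by semantic similarity between the pronoun's contextual embedding and the
# embedding of each candidate antecedent. `encode` is a stand-in for a
# pre-trained language model encoder; here it returns random vectors so the
# example stays self-contained and runnable.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def encode(tokens):
    # Stand-in encoder: one 768-dim vector per token. In practice these
    # would be the hidden states of a pre-trained language model.
    return torch.randn(len(tokens), 768)

def rank_candidates(tokens, pronoun_idx, candidate_spans):
    """Score each candidate span by cosine similarity to the pronoun vector."""
    hidden = encode(tokens)                       # (seq_len, dim)
    pronoun_vec = hidden[pronoun_idx]             # contextual pronoun embedding
    scores = []
    for start, end in candidate_spans:
        cand_vec = hidden[start:end].mean(dim=0)  # mean-pool the candidate span
        scores.append(F.cosine_similarity(pronoun_vec, cand_vec, dim=0).item())
    return scores

tokens = "The trophy does not fit in the suitcase because it is too big".split()
# Candidates: "trophy" (span 1..2) and "suitcase" (span 7..8); pronoun "it" at index 9.
print(rank_candidates(tokens, pronoun_idx=9, candidate_spans=[(1, 2), (7, 8)]))
```

With the random stand-in encoder the printed scores are meaningless; with a real pre-trained model, the candidate with the higher similarity would be selected as the resolved antecedent.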
