Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change

09/19/2023
by Paulo Pirozelli, et al.

Pirá is a reading comprehension dataset focused on the ocean, the Brazilian coast, and climate change, built from a collection of scientific abstracts and reports on these topics. It is a versatile language resource, particularly useful for testing whether current machine learning models can acquire expert scientific knowledge. Despite this potential, no detailed set of baselines has yet been developed for Pirá. Such baselines make it easier for researchers to use Pirá as a resource for testing machine learning models across a wide range of question answering tasks. In this paper, we define six benchmarks over the Pirá dataset, covering closed generative question answering, machine reading comprehension, information retrieval, open question answering, answer triggering, and multiple choice question answering. As part of this effort, we have also produced a curated version of the original dataset, in which we fixed a number of grammar issues, repetitions, and other shortcomings. Furthermore, we extended the dataset in several new directions to support these benchmarks: translation of supporting texts from English into Portuguese, classification labels for answerability, automatic paraphrases of questions and answers, and multiple choice candidates. The results described in this paper provide several points of reference for researchers interested in exploring the challenges posed by the Pirá dataset.
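To make the extended dataset more concrete, here is a minimal sketch of how one entry carrying the new annotations (Portuguese translation of the supporting text, answerability label, paraphrases, and multiple choice candidates) might be modeled, and how inputs for two of the benchmarks, answer triggering and multiple choice question answering, could be derived from it. This is an illustration under stated assumptions, not the authors' code: the class `PiraEntry`, its field names, and the helper functions are hypothetical and do not reflect the dataset's actual file format. The sketch also assumes that the multiple choice candidates are distractors stored separately from the correct answer.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical record for one curated Pirá 2.0 entry. Field names are
# illustrative assumptions, not the dataset's real column names.
@dataclass
class PiraEntry:
    support_text_en: str   # original supporting text (English)
    support_text_pt: str   # supporting text translated into Portuguese
    question: str
    answer: str
    answerable: bool       # answerability label used for answer triggering
    question_paraphrases: List[str] = field(default_factory=list)
    answer_paraphrases: List[str] = field(default_factory=list)
    candidates: List[str] = field(default_factory=list)  # assumed: distractors only


def answer_triggering_example(entry: PiraEntry, lang: str = "en") -> Tuple[str, str, int]:
    """Return (context, question, label): label 1 means the question is
    answerable from the supporting text, 0 means it is not."""
    if lang == "en":
        context = entry.support_text_en
    elif lang == "pt":
        context = entry.support_text_pt
    else:
        raise ValueError(f"unsupported language: {lang!r}")
    return context, entry.question, int(entry.answerable)


def multiple_choice_example(entry: PiraEntry, seed: int = 0) -> Tuple[str, List[str], int]:
    """Mix the correct answer with the distractors in a reproducible order and
    return (question, options, index_of_correct_option)."""
    options = [entry.answer] + list(entry.candidates)
    random.Random(seed).shuffle(options)
    return entry.question, options, options.index(entry.answer)


if __name__ == "__main__":
    # Placeholder strings only; not real Pirá content.
    entry = PiraEntry(
        support_text_en="<English abstract>",
        support_text_pt="<Portuguese translation>",
        question="<question>",
        answer="<correct answer>",
        answerable=True,
        candidates=["<distractor 1>", "<distractor 2>", "<distractor 3>"],
    )
    print(answer_triggering_example(entry, lang="pt"))
    question, options, correct = multiple_choice_example(entry)
    print(options[correct])  # always the correct answer
```

Keeping the distractors apart from the gold answer and shuffling with a fixed seed keeps the multiple choice options reproducible across runs; a real loader would populate these fields from whatever format the dataset is distributed in.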

research
05/03/2023

NorQuAD: Norwegian Question Answering Dataset

In this paper we present NorQuAD: the first Norwegian question answering...
research
12/19/2017

The NarrativeQA Reading Comprehension Challenge

Reading comprehension (RC)---in contrast to information retrieval---requ...
research
11/02/2021

UQuAD1.0: Development of an Urdu Question Answering Training Data for Machine Reading Comprehension

In recent years, low-resource Machine Reading Comprehension (MRC) has ma...
research
02/01/2021

Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question Answering Data

In spite of much recent research in the area, it is still unclear whethe...
research
08/29/2019

Ellipsis and Coreference Resolution as Question Answering

Coreference and many forms of ellipsis are similar to reading comprehens...
research
05/20/2023

VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models

The VNHSGE (VietNamese High School Graduation Examination) dataset, deve...
research
05/11/2020

A Self-Training Method for Machine Reading Comprehension with Soft Evidence Extraction

Neural models have achieved great success on machine reading comprehensi...
