Masked Measurement Prediction: Learning to Jointly Predict Quantities and Units from Textual Context

12/16/2021
by   Daniel Spokoyny, et al.
0

Physical measurements constitute a large portion of numbers in academic papers, engineering reports, and web tables. Current benchmarks fall short of properly evaluating numeracy of pretrained language models on measurements, hindering research on developing new methods and applying them to numerical tasks. To that end, we introduce a novel task, Masked Measurement Prediction (MMP), where a model learns to reconstruct a number together with its associated unit given masked text. MMP is useful for both training new numerically informed models as well as evaluating numeracy of existing systems. In order to address this task, we introduce a new Generative Masked Measurement (GeMM) model that jointly learns to predict numbers along with their units. We perform fine-grained analyses comparing our model with various ablations and baselines. We use linear probing of traditional pretrained transformer models (RoBERTa) to show that they significantly underperform jointly trained number-unit models, highlighting the difficulty of this new task and the benefits of our proposed pretraining approach. We hope this framework accelerates the progress towards building more robust numerical reasoning systems in the future.

READ FULL TEXT

page 7

page 11

research
05/13/2022

Improving the Numerical Reasoning Skills of Pretrained Language Models

State-of-the-art pretrained language models tend to perform below their ...
research
05/12/2023

Measuring Progress in Fine-grained Vision-and-Language Understanding

While pretraining on large-scale image-text data from the Web has facili...
research
10/23/2022

Do Language Models Understand Measurements?

Recent success of pre-trained language models (PLMs) has stimulated inte...
research
05/17/2020

TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data

Recent years have witnessed the burgeoning of pretrained language models...
research
08/29/2023

Measurement Tampering Detection Benchmark

When training powerful AI systems to perform complex tasks, it may be ch...
research
04/21/2023

KitchenScale: Learning to predict ingredient quantities from recipe contexts

Determining proper quantities for ingredients is an essential part of co...
research
07/05/2023

LOAF-M2L: Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation

Despite previous efforts in melody-to-lyric generation research, there i...

Please sign up or login with your details

Forgot password? Click here to reset