Dataset and Baseline System for Multi-lingual Extraction and Normalization of Temporal and Numerical Expressions

03/31/2023
by Sanxing Chen, et al.

Temporal and numerical expression understanding is of great importance in many downstream Natural Language Processing (NLP) and Information Retrieval (IR) tasks. However, much previous work covers only a few sub-types and focuses only on entity extraction, which severely limits the usability of the identified mentions. For such entities to be useful in downstream scenarios, broad coverage and fine granularity of sub-types are important, and it is even more important to resolve them into concrete values that can be manipulated. Furthermore, most previous work addresses only a handful of languages. Here we describe NTX, a multi-lingual evaluation dataset of diverse temporal and numerical expressions across 14 languages that covers extraction, normalization, and resolution. Along with the dataset, we provide a robust rule-based system as a strong baseline against which other models can be compared on this dataset. Data and code are available at <https://aka.ms/NTX>.
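
To make the distinction between extraction, normalization, and resolution concrete, here is a minimal, hypothetical Python sketch for a single English pattern ("next <weekday>"). It is not the NTX baseline system, and it does not reproduce the dataset's annotation format: the functions `extract`, `normalize`, and `resolve`, the dictionary schema, and the rule that "next Monday" means the first Monday strictly after the reference date are all illustrative assumptions.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical toy vocabulary; Python's date.weekday() uses Monday == 0.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Toy rule (assumption): matches only "next <weekday>" in English, case-insensitively.
NEXT_WEEKDAY = re.compile(r"\bnext (" + "|".join(WEEKDAYS) + r")\b", re.IGNORECASE)


@dataclass
class Mention:
    text: str   # surface string as it appears in the input
    start: int  # character offset of the first character
    end: int    # character offset one past the last character


def extract(text: str) -> list[Mention]:
    """Extraction: locate the span of each mention in the raw text."""
    return [Mention(m.group(0), m.start(), m.end())
            for m in NEXT_WEEKDAY.finditer(text)]


def normalize(mention: Mention) -> dict:
    """Normalization: map the surface form to a language-independent structure.

    The {"type", "weekday"} schema is a hypothetical illustration.
    """
    weekday_name = mention.text.split()[-1].lower()
    return {"type": "relative_weekday", "weekday": WEEKDAYS.index(weekday_name)}


def resolve(value: dict, reference: date) -> date:
    """Resolution: anchor the normalized value to a concrete reference date.

    Assumption: "next X" is the first X strictly after the reference date.
    """
    days_ahead = (value["weekday"] - reference.weekday()) % 7
    if days_ahead == 0:
        days_ahead = 7  # same weekday as the reference: jump a full week
    return reference + timedelta(days=days_ahead)


if __name__ == "__main__":
    ref = date(2023, 3, 31)  # a Friday
    for m in extract("Let's meet next Monday, or next Friday at the latest."):
        print(m.text, normalize(m), resolve(normalize(m), ref))
```

Run as a script, this prints `next Monday {'type': 'relative_weekday', 'weekday': 0} 2023-04-03` followed by `next Friday {'type': 'relative_weekday', 'weekday': 4} 2023-04-07`. Keeping the stages separate means one normalized value can be resolved against different reference dates, which is what turns a mention into a value that can be manipulated downstream.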
