Jamp: Controlled Japanese Temporal Inference Dataset for Evaluating Generalization Capacity of Language Models

06/19/2023
by Tomoki Sugimoto, et al.

Natural Language Inference (NLI) tasks involving temporal inference remain challenging for pre-trained language models (LMs). Although various datasets have been created for this task, they focus primarily on English and do not address the need for resources in other languages. It is unclear whether current LMs can generalize temporal inference across languages. In this paper, we present Jamp, a Japanese NLI benchmark focused on temporal inference. Our dataset covers a range of temporal inference patterns, which enables fine-grained analysis. To begin the data annotation process, we create diverse inference templates based on formal semantics test suites. We then automatically generate diverse NLI examples from these templates using a Japanese case frame dictionary, while controlling the distribution of inference patterns and gold labels. We evaluate the generalization capacities of monolingual and multilingual LMs by splitting our dataset based on tense fragments (i.e., temporal inference patterns). Our findings demonstrate that LMs struggle with specific linguistic phenomena, such as habituality, indicating that there is room to develop more effective NLI models across languages.
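The template-based generation and fragment-based split described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from Jamp: the `Template` class, the fragment names, the English-gloss templates, the filler names, and the helper functions `generate` and `split_by_fragment` are all hypothetical. The actual dataset uses Japanese templates and a case frame dictionary.

```python
import itertools
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    fragment: str    # hypothetical tense-fragment (inference pattern) name
    premise: str     # premise pattern with {a}/{b} placeholders
    hypothesis: str  # hypothesis pattern with {a}/{b} placeholders
    label: str       # gold label fixed by the template


# Hypothetical English-gloss templates; the real templates are Japanese.
TEMPLATES = [
    Template("order", "{a} arrived before {b}.", "{b} arrived after {a}.", "entailment"),
    Template("order", "{a} arrived before {b}.", "{b} arrived before {a}.", "contradiction"),
    Template("habituality", "{a} runs every morning.", "{a} never runs.", "contradiction"),
    Template("habituality", "{a} ran yesterday.", "{a} runs every morning.", "neutral"),
]

# Hypothetical fillers standing in for entries drawn from a case frame dictionary.
AGENTS = ["Taro", "Hanako", "Ken"]


def generate(templates, agents):
    """Fill every template with every ordered pair of distinct agents."""
    seen = {}
    for t in templates:
        for a, b in itertools.permutations(agents, 2):
            premise = t.premise.format(a=a, b=b)
            hypothesis = t.hypothesis.format(a=a, b=b)
            # Templates that ignore {b} yield duplicates; keep one copy.
            seen[(premise, hypothesis)] = {
                "fragment": t.fragment,
                "premise": premise,
                "hypothesis": hypothesis,
                "label": t.label,
            }
    return list(seen.values())


def split_by_fragment(examples, held_out):
    """Train on all fragments except `held_out`; test only on `held_out`."""
    train = [e for e in examples if e["fragment"] != held_out]
    test = [e for e in examples if e["fragment"] == held_out]
    return train, test


examples = generate(TEMPLATES, AGENTS)
train, test = split_by_fragment(examples, held_out="habituality")
label_counts = Counter(e["label"] for e in examples)
```

Because each template fixes both its fragment and its gold label, `label_counts` (or a similar count per fragment) is enough to inspect the distribution and adjust how many fillers each template receives. Holding out one fragment entirely tests whether a model generalizes to an inference pattern it never saw in training.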


