Can current NLI systems handle German word order? Investigating language model performance on a new German challenge set of minimal pairs

06/07/2023
by Ines Reinig, et al.

Compared to English, German word order is freer and therefore poses additional challenges for natural language inference (NLI). We create WOGLI (Word Order in German Language Inference), the first adversarial NLI dataset for German word order, which has the following properties: (i) each premise has an entailed and a non-entailed hypothesis; (ii) premise and hypotheses differ only in word order and the morphological changes necessary to mark case and number. In particular, each premise and its two hypotheses contain exactly the same lemmata. Our adversarial examples require the model to use morphological markers in order to recognise or reject entailment. We show that current German autoencoding models fine-tuned on translated NLI data can struggle on this challenge set, reflecting the fact that translated NLI datasets will not mirror all necessary language phenomena in the target language. We also examine performance after data augmentation as well as on related word order phenomena derived from WOGLI. Our datasets are publicly available at https://github.com/ireinig/wogli.
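To illustrate why such minimal pairs defeat lexical-overlap cues, here is a small sketch. It is not the authors' code and the sentences are not taken from WOGLI. It uses a hand-built premise ("Der Hund sieht den Mann", the dog sees the man), an entailed hypothesis that only moves the object to the front, and a non-entailed hypothesis in which the case-marked articles (der/den) swap the subject and object roles. The lemma table and the helpers `lemmatize`, `same_lemmata` and `lemma_overlap` are hypothetical stand-ins for a real German lemmatizer and a bag-of-lemmata baseline. Because all three sentences share the same lemma multiset, both hypotheses receive the same overlap score, so only the morphological case markers can separate them.

```python
from collections import Counter
import string

# Hypothetical, hand-built lemma table for this toy example only;
# a real pipeline would use a German lemmatizer instead.
LEMMA_TABLE = {
    "der": "der", "den": "der",  # nominative and accusative article share one lemma
    "hund": "Hund", "mann": "Mann",
    "sieht": "sehen",
}


def lemmatize(sentence: str) -> Counter:
    """Return the multiset of lemmata in a sentence (punctuation stripped, tokens lowercased)."""
    cleaned = sentence.translate(str.maketrans("", "", string.punctuation))
    return Counter(LEMMA_TABLE[tok] for tok in cleaned.lower().split())


def same_lemmata(premise: str, hypothesis: str) -> bool:
    """True if both sentences contain exactly the same lemmata, counting repeats."""
    return lemmatize(premise) == lemmatize(hypothesis)


def lemma_overlap(premise: str, hypothesis: str) -> float:
    """Fraction of hypothesis lemmata also present in the premise (a bag-of-lemmata score)."""
    p, h = lemmatize(premise), lemmatize(hypothesis)
    return sum((p & h).values()) / sum(h.values())


premise = "Der Hund sieht den Mann."        # the dog sees the man
entailed = "Den Mann sieht der Hund."       # object fronted, roles unchanged
non_entailed = "Der Mann sieht den Hund."   # case marking swaps roles: the man sees the dog

for hyp in (entailed, non_entailed):
    print(hyp, same_lemmata(premise, hyp), lemma_overlap(premise, hyp))
```

Both hypotheses print `True 1.0`: a score computed only from lemmata cannot tell the entailed hypothesis from the non-entailed one, which is the property that forces a model to attend to case morphology.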


