InDEX: Indonesian Idiom and Expression Dataset for Cloze Test

11/24/2022
by Xinying Qiu, et al.

We propose InDEX, an Indonesian Idiom and Expression dataset for cloze tests. The dataset contains 10,438 unique sentences covering 289 idioms and expressions, for which we generate 15 different types of distractors, resulting in a large cloze-style corpus. Many baseline models for cloze-test reading comprehension apply BERT with random initialization to learn embedding representations. Idioms and fixed expressions are different, however, in that the literal meaning of a phrase may or may not be consistent with its contextual meaning. We therefore explore different ways of combining static and contextual representations to obtain a stronger baseline model. Experiments show that combining definitions with random initialization better supports cloze-test model performance on idioms, whether they are evaluated independently or mixed with fixed expressions. For fixed expressions with no special meaning, a static embedding with random initialization is sufficient for the cloze-test model.
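To make the cloze format concrete, the following is a minimal sketch of how one item could be assembled from a sentence, its target expression, and a set of distractors. It is an illustration only, not the authors' pipeline: the `ClozeItem` class, the `make_cloze_item` helper, the blank marker, and the example sentence and distractors are all assumptions introduced here, and the sketch does not model the 15 distractor types mentioned in the abstract.

```python
import random
from dataclasses import dataclass
from typing import List

BLANK = "____"  # hypothetical blank marker, not taken from the dataset


@dataclass
class ClozeItem:
    question: str          # sentence with the target expression blanked out
    candidates: List[str]  # shuffled options: the target plus its distractors
    answer_index: int      # position of the correct option in `candidates`


def make_cloze_item(sentence: str, target: str,
                    distractors: List[str], seed: int = 0) -> ClozeItem:
    """Blank out `target` in `sentence` and shuffle it in with the distractors."""
    if target not in sentence:
        raise ValueError("target expression not found in sentence")
    # Replace only the first occurrence so the item has exactly one blank.
    question = sentence.replace(target, BLANK, 1)
    candidates = [target] + list(distractors)
    # A seeded generator keeps the option order reproducible.
    random.Random(seed).shuffle(candidates)
    return ClozeItem(question, candidates, candidates.index(target))


# Illustrative example: "kambing hitam" (literally "black goat") means "scapegoat".
item = make_cloze_item(
    "Dia dijadikan kambing hitam dalam kasus itu.",
    "kambing hitam",
    ["buah tangan", "kepala batu", "meja hijau"],
)
print(item.question)
print(item.candidates, item.answer_index)
```

A model for this task would score each candidate against the blanked sentence and pick the highest-scoring one; how the candidates are embedded (random initialization, static embeddings, definitions) is the design question the abstract explores.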


