MICE: Mining Idioms with Contextual Embeddings

08/13/2020
by Tadej Škvorc, et al.

Idiomatic expressions can be problematic for natural language processing applications because their meaning cannot be inferred from their constituent words. A lack of successful methodological approaches and sufficiently large datasets has prevented the development of machine learning approaches for detecting idioms, especially for expressions that do not occur in the training set. We present an approach, called MICE, that uses contextual embeddings for this purpose. We introduce a new dataset of multi-word expressions with literal and idiomatic meanings and use it to train a classifier based on two state-of-the-art contextual word embeddings: ELMo and BERT. We show that deep neural networks using both embeddings perform much better than existing approaches and can detect idiomatic word use even for expressions that were not present in the training set. We also demonstrate cross-lingual transfer of the developed models and analyze the size of the required dataset.
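
The claim about expressions "not present in the training set" implies an evaluation split made by expression rather than by sentence. The following is a minimal sketch of such a split, not the authors' actual protocol: the function name `split_by_expression`, the `test_fraction` parameter, and the example sentences are all hypothetical and serve only to illustrate the idea.

```python
import random
from collections import defaultdict


def split_by_expression(examples, test_fraction=0.25, seed=0):
    """Split (expression, sentence, label) tuples so that no expression
    occurs in both the training and the test portion.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Group all sentences under the expression they contain.
    by_expr = defaultdict(list)
    for example in examples:
        by_expr[example[0]].append(example)

    # Shuffle the expressions (not the sentences) reproducibly.
    expressions = sorted(by_expr)
    random.Random(seed).shuffle(expressions)

    # Hold out a fraction of the expressions, at least one.
    n_test = max(1, round(len(expressions) * test_fraction))
    test = [ex for e in expressions[:n_test] for ex in by_expr[e]]
    train = [ex for e in expressions[n_test:] for ex in by_expr[e]]
    return train, test


# Made-up examples: each expression has an idiomatic and a literal use.
examples = [
    ("break the ice", "A joke helped break the ice at the meeting.", "idiomatic"),
    ("break the ice", "The ship had to break the ice to reach port.", "literal"),
    ("spill the beans", "She finally spilled the beans about the surprise.", "idiomatic"),
    ("spill the beans", "He spilled the beans all over the kitchen floor.", "literal"),
    ("kick the bucket", "The old radio finally kicked the bucket.", "idiomatic"),
    ("kick the bucket", "The child kicked the bucket across the yard.", "literal"),
    ("hit the road", "We should hit the road before it gets dark.", "idiomatic"),
    ("hit the road", "The falling branch hit the road with a thud.", "literal"),
]

train, test = split_by_expression(examples)
train_expressions = {expression for expression, _, _ in train}
test_expressions = {expression for expression, _, _ in test}
```

With a split like this, a classifier's score on `test` reflects how well it generalizes to expressions it has never seen, instead of how well it memorized particular idioms.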

research
03/16/2020

A Survey on Contextual Embeddings

Contextual embeddings, such as ELMo and BERT, move beyond global word re...
research
05/18/2020

Contextual Embeddings: When Are They Worth It?

We study the settings for which deep contextual embeddings (e.g., BERT) ...
research
01/31/2019

Decomposing Generalization: Models of Generic, Habitual, and Episodic Statements

We present a novel semantic framework for modeling linguistic expression...
research
11/05/2021

On the Impact of Temporal Representations on Metaphor Detection

State-of-the-art approaches for metaphor detection compare their literal...
research
11/22/2019

High Quality ELMo Embeddings for Seven Less-Resourced Languages

Recent results show that deep neural networks using contextual embedding...
research
04/22/2019

Understanding Roles and Entities: Datasets and Models for Natural Language Inference

We present two new datasets and a novel attention mechanism for Natural ...
research
11/24/2022

InDEX: Indonesian Idiom and Expression Dataset for Cloze Test

We propose InDEX, an Indonesian Idiom and Expression dataset for cloze t...
