
A Novel Challenge Set for Hebrew Morphological Disambiguation and Diacritics Restoration

by Avi Shmidman, et al.

One of the primary tasks of morphological parsers is the disambiguation of homographs. Cases of unbalanced ambiguity, where one of the possible analyses is far more frequent than the others, are particularly difficult. In such cases, there may be too few examples of the minority analyses either to evaluate performance properly or to train effective classifiers. In this paper we address the issue of unbalanced morphological ambiguities in Hebrew. We offer a challenge set for Hebrew homographs, the first of its kind, containing substantial attestation of each analysis of 21 Hebrew homographs. We show that the current state of the art (SOTA) in Hebrew disambiguation performs poorly on cases of unbalanced ambiguity. Leveraging our new dataset, we achieve a new state of the art for all 21 words, improving the overall average F1 score from 0.67 to 0.95. Our resulting annotated datasets are made publicly available for further research.
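To illustrate why unbalanced ambiguity is hard to evaluate, the following minimal sketch scores a baseline that always predicts the frequent analysis of a single homograph. The label names, the 95/5 split, and the choice of macro-averaging are assumptions made for illustration only; the abstract does not state how the reported F1 scores were averaged, and none of this data comes from the paper.

```python
def f1_per_label(gold, pred):
    """Return {label: F1} computed from parallel gold/predicted label lists."""
    scores = {}
    for label in set(gold) | set(pred):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical homograph: 95 occurrences of the frequent analysis, 5 of the rare one.
gold = ["majority"] * 95 + ["minority"] * 5
# Baseline that ignores context and always picks the frequent analysis.
pred = ["majority"] * 100

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
scores = f1_per_label(gold, pred)
macro_f1 = sum(scores.values()) / len(scores)

print(f"accuracy = {accuracy:.2f}")
for label in sorted(scores):
    print(f"F1[{label}] = {scores[label]:.2f}")
print(f"macro F1 = {macro_f1:.2f}")
```

The baseline's accuracy looks strong, yet its F1 on the minority analysis is zero. With only a handful of minority examples, that score is also statistically fragile, which is the gap a challenge set with substantial attestation of each analysis is meant to close.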

