User-Generated Text Corpus for Evaluating Japanese Morphological Analysis and Lexical Normalization

04/08/2021
by Shohei Higashiyama, et al.

Morphological analysis (MA) and lexical normalization (LN) are both important tasks for Japanese user-generated text (UGT). To evaluate and compare different MA/LN systems, we have constructed a publicly available Japanese UGT corpus. Our corpus comprises 929 sentences annotated with morphological and normalization information, along with category labels for frequent UGT-specific phenomena. Experiments on the corpus showed that existing MA/LN methods perform poorly on non-general words and non-standard forms, indicating that the corpus can serve as a challenging benchmark for further research on UGT.

