On the Definition of Japanese Word

06/24/2019
by Yugo Murawaki, et al.

The annotation guidelines for Universal Dependencies (UD) stipulate that the basic units of dependency annotation are syntactic words, but it is not clear what counts as a syntactic word in Japanese. Departing from the long tradition of using phrasal units called bunsetsu for dependency parsing, the current UD Japanese treebanks adopt Short Unit Words. However, we argue that these units are not syntactic words as specified by the annotation guidelines. Although we find non-mainstream attempts to define Japanese words linguistically, such definitions have never been applied to corpus annotation. We discuss the costs and benefits of adopting these rather unfamiliar criteria.


