Resources for Turkish Dependency Parsing: Introducing the BOUN Treebank and the BoAT Annotation Tool

02/24/2020
by Utku Türk et al.

In this paper, we describe our contributions to the development of Turkish resources: a new treebank of novel sentences (the BOUN Treebank), the annotation guidelines we adopted, and a new annotation tool we developed (BoAT). The manual annotation process was shaped and implemented by a team of four linguists and five NLP specialists. Annotation decisions for the BOUN Treebank follow the Universal Dependencies (UD) framework, which originated from the work of De Marneffe et al. (2014) and Nivre et al. (2016). We also took into account the recent unifying efforts based on the re-annotation of other Turkish treebanks in the UD framework (Türk et al., 2019). The BOUN Treebank introduces a total of 9,757 sentences drawn from a variety of text types: biographical texts, national newspapers, instructional texts, popular culture articles, and essays. In addition, we report the parsing results of a graph-based dependency parser on each text type, on the BOUN Treebank as a whole, and on all Turkish treebanks that we either re-annotated or introduced. We show that a state-of-the-art dependency parser achieves improved scores both for identifying the proper head and for identifying the syntactic relationship between heads and their dependents. In light of these results, we observe that unifying the Turkish annotation scheme and introducing a more comprehensive treebank improve dependency parsing performance.
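The two kinds of scores mentioned above, identifying the proper head and identifying the relationship between head and dependent, are conventionally reported as unlabeled and labeled attachment scores (UAS and LAS). The abstract does not name its metrics, so the sketch below is only an illustration of how such scores are commonly computed; the function name `attachment_scores` and the toy gold and predicted analyses are hypothetical and are not taken from the paper or the treebank.

```python
from typing import List, Tuple

# Each token is (head_index, dependency_label); head 0 denotes the root.
# This representation is an assumption made for illustration only.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one sentence or a concatenation of sentences.

    UAS counts tokens whose predicted head matches the gold head.
    LAS counts tokens whose predicted head and label both match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty analysis")

    head_hits = sum(1 for g, p in zip(gold, pred) if g[0] == p[0])
    label_hits = sum(1 for g, p in zip(gold, pred) if g == p)
    total = len(gold)
    return head_hits / total, label_hits / total


# Hypothetical four-token sentence: the third token gets the wrong label,
# the fourth token gets the wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

A token counts toward LAS only if it also counts toward UAS, so LAS can never exceed UAS on the same data.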


research
09/21/2020

The Persian Dependency Treebank Made Universal

We describe an automatic method for converting the Persian Dependency Tr...
research
12/31/2020

UCCA's Foundational Layer: Annotation Guidelines v2.1

This is the annotation manual for Universal Conceptual Cognitive Annotat...
research
01/15/2022

Automatic Correction of Syntactic Dependency Annotation Differences

Annotation inconsistencies between data sets can cause problems for low-...
research
03/17/2019

Technical notes: Syntax-aware Representation Learning With Pointer Networks

This is a work-in-progress report, which aims to share preliminary resul...
research
07/24/2022

Enhancements to the BOUN Treebank Reflecting the Agglutinative Nature of Turkish

In this study, we aim to offer linguistically motivated solutions to res...
research
04/23/2018

Parsing Tweets into Universal Dependencies

We study the problem of analyzing tweets with Universal Dependencies. We...
research
04/22/2022

Out-of-Domain Evaluation of Finnish Dependency Parsing

The prevailing practice in the academia is to evaluate the model perform...
