ThamizhiUDp: A Dependency Parser for Tamil

This paper describes the development of ThamizhiUDp, a neural dependency parser that provides a complete pipeline for dependency parsing of Tamil text using the Universal Dependencies formalism. We considered each phase of the dependency parsing pipeline and identified tools and resources for each phase that improve accuracy and help tackle data scarcity. ThamizhiUDp uses Stanza for tokenisation and lemmatisation, ThamizhiPOSt and ThamizhiMorph for generating Part of Speech (POS) and morphological annotations, and uuparser with multilingual training for dependency parsing. ThamizhiPOSt is our POS tagger, which is based on Stanza and trained on the Amrita POS-tagged corpus. It is the current state of the art in Tamil POS tagging, with an F1 score of 93.27. Our morphological analyser, ThamizhiMorph, is a rule-based system with very good coverage of Tamil. Our dependency parser, ThamizhiUDp, was trained using multilingual data. It achieves a Labelled Attachment Score (LAS) of 62.39, 4 points higher than the current best reported for Tamil dependency parsing. We thus show that breaking up the dependency parsing pipeline to accommodate existing tools and resources is a viable approach for low-resource languages.
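To make the reported metric concrete, the following is a minimal sketch of how a Labelled Attachment Score can be computed, alongside the unlabelled variant (UAS). It is not the evaluation code used for ThamizhiUDp: the function name `attachment_scores`, the `Arc` type, and the toy sentence are illustrative assumptions, and the sketch assumes gold and predicted analyses share the same tokenisation. Official shared-task evaluation scripts may treat tokenisation mismatches and relation subtypes differently.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency relation label).
# Head index 0 denotes the artificial root, as in CoNLL-U.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over identically tokenised input.

    UAS counts tokens whose predicted head matches the gold head.
    LAS counts tokens whose predicted head AND relation label both match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty analysis")

    total = len(gold)
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    labelled_hits = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * head_hits / total, 100.0 * labelled_hits / total


# Hypothetical four-token sentence, used only for illustration.
gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "amod"), (2, "obl")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

In this toy case the third token has the wrong head and the fourth has the right head but the wrong label, so three of four heads are correct while only two of four tokens are fully correct. LAS is therefore never higher than UAS on the same data.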
