ThamizhiUDp: A Dependency Parser for Tamil

12/24/2020
by   Kengatharaiyer Sarveswaran, et al.
0

This paper describes how we developed a neural-based dependency parser, namely ThamizhiUDp, which provides a complete pipeline for the dependency parsing of the Tamil language text using Universal Dependency formalism. We have considered the phases of the dependency parsing pipeline and identified tools and resources in each of these phases to improve the accuracy and to tackle data scarcity. ThamizhiUDp uses Stanza for tokenisation and lemmatisation, ThamizhiPOSt and ThamizhiMorph for generating Part of Speech (POS) and Morphological annotations, and uuparser with multilingual training for dependency parsing. ThamizhiPOSt is our POS tagger, which is based on the Stanza, trained with Amrita POS-tagged corpus. It is the current state-of-the-art in Tamil POS tagging with an F1 score of 93.27. Our morphological analyzer, ThamizhiMorph is a rule-based system with a very good coverage of Tamil. Our dependency parser ThamizhiUDp was trained using multilingual data. It shows a Labelled Assigned Score (LAS) of 62.39, 4 points higher than the current best achieved for Tamil dependency parsing. Therefore, we show that breaking up the dependency parsing pipeline to accommodate existing tools and resources is a viable approach for low-resource languages.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

11/12/2020

Exploiting Cross-Dialectal Gold Syntax for Low-Resource Historical Languages: Towards a Generic Parser for Pre-Modern Slavic

This paper explores the possibility of improving the performance of spec...
02/24/2020

A Hybrid Approach to Dependency Parsing: Combining Rules and Morphology with Deep Learning

Fully data-driven, deep learning-based models are usually designed as la...
04/26/2020

Semi-Supervised Neural System for Tagging, Parsing and Lematization

This paper describes the ICS PAS system which took part in CoNLL 2018 sh...
07/16/2021

POS tagging, lemmatization and dependency parsing of West Frisian

We present a lemmatizer/POS-tagger/dependency parser for West Frisian us...
03/25/2018

Scene Graph Parsing as Dependency Parsing

In this paper, we study the problem of parsing structured knowledge grap...
01/29/2019

Universal Dependency Parsing from Scratch

This paper describes Stanford's system at the CoNLL 2018 UD Shared Task....
05/01/2020

Spatial Dependency Parsing for 2D Document Understanding

Information Extraction (IE) for document images is often approached as a...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.