Capturing divergence in dependency trees to improve syntactic projection

05/14/2016
by Ryan Georgi, et al.

Obtaining syntactic parses is a crucial part of many NLP pipelines. However, most of the world's languages lack the large syntactically annotated corpora needed to build parsers. Syntactic projection techniques attempt to address this issue with parallel corpora that pair a resource-poor language with a resource-rich one: a parser for the resource-rich language and word alignments between the two languages are used to project parses onto the resource-poor side. These projection methods can perform poorly, however, when the two languages are structurally divergent. In this paper, we investigate the possibility of using small, parallel, annotated corpora to automatically detect divergent structural patterns between two languages. These patterns can then be used to improve structural projection algorithms, yielding better-performing NLP tools for resource-poor languages, particularly those that lack the large amounts of annotated data required by traditional, fully supervised methods. While this detection process is not exhaustive, we demonstrate that common patterns of divergence can be identified automatically without prior knowledge of a given language pair, and that these patterns can be used to improve the performance of projection algorithms.
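The basic projection step described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' algorithm: dependency trees are assumed to be stored as 1-based head indices (0 for the root), word alignments as (source, target) index pairs, and only one-to-one alignments are used, a simplification in the spirit of a direct-correspondence assumption. The helper names `project_heads` and `tally_mismatches` are hypothetical. Comparing projected heads with gold heads from a small annotated parallel corpus, as `tally_mismatches` does, is just one simple way disagreements could be surfaced; the paper's actual divergence-detection method may differ.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: (source index, target index), both 1-based.
Alignment = Set[Tuple[int, int]]


def project_heads(src_heads: List[int], alignment: Alignment,
                  tgt_len: int) -> List[Optional[int]]:
    """Project a source dependency tree onto a target sentence.

    src_heads[i - 1] is the head of source word i (0 = root).
    Only one-to-one alignment links are used; target words that
    receive no projected head are left as None.
    """
    src_to_tgt: Dict[int, List[int]] = {}
    tgt_to_src: Dict[int, List[int]] = {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, []).append(t)
        tgt_to_src.setdefault(t, []).append(s)

    # Keep a link only if both its source and target word are aligned exactly once.
    one_to_one = {s: ts[0] for s, ts in src_to_tgt.items()
                  if len(ts) == 1 and len(tgt_to_src[ts[0]]) == 1}

    tgt_heads: List[Optional[int]] = [None] * tgt_len
    for s, t in one_to_one.items():
        h = src_heads[s - 1]
        if h == 0:
            tgt_heads[t - 1] = 0  # the source root becomes the target root
        elif h in one_to_one:
            tgt_heads[t - 1] = one_to_one[h]  # copy the edge through the alignment
    return tgt_heads


def tally_mismatches(pairs) -> Counter:
    """Count where projected heads disagree with gold target heads.

    pairs is an iterable of (projected_heads, gold_heads) lists.
    """
    counts: Counter = Counter()
    for projected, gold in pairs:
        for p, g in zip(projected, gold):
            if p is None:
                counts["unprojected"] += 1  # no one-to-one path for this word
            elif p != g:
                counts["wrong_head"] += 1  # projection disagrees with the annotation
    return counts


# Toy example: a three-word source clause whose verb (word 2) is the root,
# aligned to a target with verb-final order.
print(project_heads([2, 0, 2], {(1, 1), (2, 3), (3, 2)}, 3))
```

In this sketch, reordering alone (the verb moving to the end) does not break projection, whereas target words with no one-to-one counterpart show up as `unprojected` and structurally different attachments show up as `wrong_head`. Tallies like these only hint at where two languages diverge; turning them into reusable patterns requires more than this illustration provides.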
