Parsing Tweets into Universal Dependencies

04/23/2018
by Yijia Liu, et al.

We study the problem of analyzing tweets with Universal Dependencies. We extend the UD guidelines to cover special constructions in tweets that affect tokenization, part-of-speech tagging, and labeled dependencies. Using the extended guidelines, we create a new tweet treebank for English (Tweebank v2) that is four times larger than the (unlabeled) Tweebank v1 introduced by Kong et al. (2014). We characterize the disagreements between our annotators and show that it is challenging to deliver consistent annotation due to ambiguity in understanding and explaining tweets. Nonetheless, using the new treebank, we build a pipeline system to parse raw tweets into UD. To overcome annotation noise without sacrificing computational efficiency, we propose a new method to distill an ensemble of 20 transition-based parsers into a single one. Our parser achieves an improvement of 2.2 in LAS over the un-ensembled baseline and outperforms parsers that are state-of-the-art on other treebanks in both accuracy and speed.
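
The abstract does not spell out how the ensemble is distilled, so the following is only a minimal sketch of generic ensemble distillation, assuming the student is trained to match the averaged action distribution of the ensemble at each parser state. The three-action inventory, the toy probabilities, and the function names are illustrative assumptions, not the authors' implementation.

```python
import math

def average_distributions(ensemble_probs):
    """Average the action probabilities predicted by each ensemble member
    for a single parser state (assumed distillation target)."""
    n_models = len(ensemble_probs)
    n_actions = len(ensemble_probs[0])
    return [sum(p[a] for p in ensemble_probs) / n_models for a in range(n_actions)]

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, soft_target):
    """Cross-entropy between the ensemble's soft target and the student's
    predicted distribution: -sum_a q(a) * log p(a)."""
    p = softmax(student_logits)
    return -sum(q * math.log(pi) for q, pi in zip(soft_target, p) if q > 0)

# Toy example: three models stand in for the 20 parsers, and the three
# columns stand in for hypothetical actions SHIFT, LEFT-ARC, RIGHT-ARC.
ensemble_probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
]
soft_target = average_distributions(ensemble_probs)

# Hypothetical student scores for the same parser state.
loss = distillation_loss([2.0, 0.5, 0.1], soft_target)
```

Training against the averaged distribution rather than a single gold action is one common way to let a student absorb the ensemble's uncertainty, which is plausibly useful when the gold annotation itself is noisy; at test time only the single student model is run, so the cost of the ensemble is paid once during training.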


