Universal Dependencies for Learner English

05/13/2016
by Yevgeni Berzak, et al.

We introduce the Treebank of Learner English (TLE), the first publicly available syntactic treebank for English as a Second Language (ESL). The TLE provides manually annotated POS tags and Universal Dependency (UD) trees for 5,124 sentences from the Cambridge First Certificate in English (FCE) corpus. The UD annotations are tied to a pre-existing error annotation of the FCE, so that full syntactic analyses are provided for both the original and the error-corrected version of each sentence. We also delineate ESL annotation guidelines that allow for consistent syntactic treatment of ungrammatical English. Finally, we benchmark POS tagging and dependency parsing performance on the TLE dataset and measure the effect of grammatical errors on parsing accuracy. We envision the treebank supporting a wide range of linguistic and computational research on second language acquisition, as well as on automatic processing of ungrammatical language. The treebank is available at universaldependencies.org. The annotation manual used in this project and a graphical query engine are available at esltreebank.org.
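To make the benchmarking step concrete, the following is a minimal sketch of how POS tagging and dependency parsing accuracy might be scored against a UD treebank. It assumes the data is in the tab-separated CoNLL-U layout used by UD releases. The example sentence, its annotations, and the helper names `read_tokens` and `score` are hypothetical illustrations and are not taken from the TLE. The abstract does not name its metrics, so the unlabeled and labeled attachment scores (UAS, LAS) used here are an assumption.

```python
from typing import List, Tuple

Token = Tuple[str, str, int, str]  # (FORM, UPOS, HEAD, DEPREL)

# Hypothetical gold analysis of an ungrammatical learner sentence (not from the TLE).
GOLD = "\n".join([
    "# text = I goes home",
    "1\tI\tI\tPRON\tPRP\t_\t2\tnsubj\t_\t_",
    "2\tgoes\tgo\tVERB\tVBZ\t_\t0\troot\t_\t_",
    "3\thome\thome\tADV\tRB\t_\t2\tadvmod\t_\t_",
])

# Hypothetical parser output: token 3 gets the wrong tag and the wrong relation label.
PRED = "\n".join([
    "# text = I goes home",
    "1\tI\tI\tPRON\tPRP\t_\t2\tnsubj\t_\t_",
    "2\tgoes\tgo\tVERB\tVBZ\t_\t0\troot\t_\t_",
    "3\thome\thome\tNOUN\tNN\t_\t2\tobj\t_\t_",
])


def read_tokens(conllu_sentence: str) -> List[Token]:
    """Extract (FORM, UPOS, HEAD, DEPREL) for each syntactic word of one sentence."""
    tokens: List[Token] = []
    for line in conllu_sentence.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        tokens.append((cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


def score(gold: List[Token], pred: List[Token]) -> Tuple[float, float, float]:
    """Return (POS accuracy, UAS, LAS) over two aligned token sequences."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and predicted sentences must be non-empty and aligned")
    n = len(gold)
    pos_acc = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    uas = sum(g[2] == p[2] for g, p in zip(gold, pred)) / n
    las = sum(g[2] == p[2] and g[3] == p[3] for g, p in zip(gold, pred)) / n
    return pos_acc, uas, las


pos_acc, uas, las = score(read_tokens(GOLD), read_tokens(PRED))
print(f"POS accuracy={pos_acc:.2f}  UAS={uas:.2f}  LAS={las:.2f}")
```

In this toy case all three heads match but one tag and one relation label differ, so the sketch reports a POS accuracy of 0.67, a UAS of 1.00, and a LAS of 0.67. Scoring the original and the error-corrected version of each sentence separately is one way the effect of grammatical errors on parsing accuracy could be measured.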


