Cross-linguistically Consistent Semantic and Syntactic Annotation of Child-directed Speech

09/22/2021
by Ida Szubert, et al.

While corpora of child speech and child-directed speech (CDS) have enabled major contributions to the study of child language acquisition, semantic annotation for such corpora is still scarce and lacks a uniform standard. We compile two CDS corpora with sentential logical forms, one in English and the other in Hebrew. In compiling the corpora we employ a methodology that enforces a cross-linguistically consistent representation, building on recent advances in dependency representation and semantic parsing. The corpora are based on a sizable portion of Brown's Adam corpus from CHILDES (about 80% of its child-directed utterances), and on all child-directed utterances from Berman's Hebrew CHILDES corpus Hagar. We begin by annotating the corpora with the Universal Dependencies (UD) scheme for syntactic annotation, motivated by its applicability to a wide variety of domains and languages. We then apply an automatic method for transducing sentential logical forms (LFs) from UD structures. The two representations have complementary strengths: UD structures are language-neutral and support direct annotation, whereas LFs are neutral as to the interface between syntax and semantics, and transparently encode semantic distinctions. We verify the quality of the UD annotation using an inter-annotator agreement study. We then demonstrate the utility of the compiled corpora through a longitudinal corpus study of the prevalence of different syntactic and semantic phenomena.
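To make the idea of deriving a logical form from a UD structure concrete, here is a minimal toy sketch in Python. It is not the transduction method used for the corpora: the example utterance, the `Token` type, and the `to_logical_form` helper are all hypothetical, and the only thing taken from UD itself is the set of relation labels (`nsubj`, `obj`, `root`).

```python
# Toy illustration only: a hand-written UD-style parse and a naive conversion
# to a predicate-argument logical form. This is an assumption-laden sketch,
# not the paper's UD-to-LF transducer.
from typing import List, NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based position in the utterance
    form: str    # surface word
    head: int    # index of the syntactic head (0 marks the root)
    deprel: str  # UD dependency relation label


# Hypothetical utterance "Adam eats cookies", annotated by hand.
PARSE = [
    Token(1, "Adam", 2, "nsubj"),
    Token(2, "eats", 0, "root"),
    Token(3, "cookies", 2, "obj"),
]


def to_logical_form(tokens: List[Token]) -> str:
    """Build pred(subject, object) from the root and its nsubj/obj dependents."""
    root = next(t for t in tokens if t.head == 0)
    args = []
    # Collect arguments in a fixed order: subject first, then object.
    for rel in ("nsubj", "obj"):
        for t in tokens:
            if t.head == root.idx and t.deprel == rel:
                args.append(t.form.lower())
    return f"{root.form.lower()}({', '.join(args)})"


print(to_logical_form(PARSE))  # prints: eats(adam, cookies)
```

The sketch handles only a single transitive clause. Quantifiers, questions, modifiers, and the other phenomena a real transducer must cover are deliberately left out.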
