SemEval 2019 Shared Task: Cross-lingual Semantic Parsing with UCCA - Call for Participation

05/31/2018 ∙ by Daniel Hershcovich, et al. ∙ Hebrew University of Jerusalem

We announce a shared task on UCCA parsing in English, German and French, and call for participants to submit their systems. UCCA is a cross-linguistically applicable framework for semantic representation, which builds on extensive typological work and supports rapid annotation. UCCA poses a challenge for existing parsing techniques, as it exhibits reentrancy (resulting in DAG structures), discontinuous structures and non-terminal nodes corresponding to complex semantic units. Given the success of recent semantic parsing shared tasks (on SDP and AMR), we expect the task to contribute significantly to the advancement of UCCA parsing in particular, and semantic parsing in general. Furthermore, existing applications for semantic evaluation that are based on UCCA will greatly benefit from better automatic methods for UCCA parsing. The competition website is https://competitions.codalab.org/competitions/19160


1 Overview

Semantic representation has been receiving growing attention in NLP in the past few years, and many proposals for semantic schemes have recently been put forth. Examples include Abstract Meaning Representation (AMR; Banarescu et al., 2013), Broad-coverage Semantic Dependencies (SDP; Oepen et al., 2016), Universal Decompositional Semantics (UDS; White et al., 2016), the Parallel Meaning Bank (Abzianidze et al., 2017), and Universal Conceptual Cognitive Annotation (UCCA; Abend and Rappoport, 2013). These advances in semantic representation, along with corresponding advances in semantic parsing, hold promise to benefit essentially all text understanding tasks, and have already demonstrated applicability to summarization (Liu et al., 2015; Dohare and Karnick, 2017), paraphrase detection (Issa et al., 2018), and semantic evaluation (using UCCA; see below).

In addition to their potential applicative value, work on semantic parsing poses interesting algorithmic and modeling challenges, which are often different from those tackled in syntactic parsing, including reentrancy (e.g., for sharing arguments across predicates) and the modeling of the interface with lexical semantics. Semantic parsing into such schemes has been much advanced by recent SemEval workshops, including two tasks on Broad-coverage Semantic Dependency Parsing (Oepen et al., 2014, 2015) and two tasks on AMR parsing (May, 2016; May and Priyadarshi, 2017). We expect a SemEval task on UCCA parsing to have a similar effect. Moreover, given the conceptual similarity between the different semantic representations (Abend and Rappoport, 2017), it is likely that work on UCCA parsing will directly contribute to the development of other semantic parsing technology. Furthermore, conversion scripts are available between UCCA and the SDP and AMR formats (https://github.com/huji-nlp/semstr/tree/master/semstr/convert.py). We encourage teams that participated in past shared tasks on AMR and SDP to participate using similar systems and a conversion-based protocol.

UCCA is a cross-linguistically applicable semantic representation scheme, building on the established Basic Linguistic Theory typological framework (Dixon, 2010a, 2010b, 2012). It has demonstrated applicability to multiple languages, including English, French and German (with pilot annotation projects on Czech, Russian and Hebrew), and stability under translation (Sulem et al., 2015). It has proven useful for defining semantic evaluation measures for text-to-text generation tasks, including machine translation (Birch et al., 2016), text simplification (Sulem et al., 2018) and grammatical error correction (Choshen and Abend, 2018) (see §3).

UCCA supports rapid annotation by non-experts, assisted by an accessible annotation interface (Abend et al., 2017). The interface is powered by an open-source, flexible web application for syntactic and semantic phrase-based annotation in general, and for UCCA annotation in particular (https://github.com/omriabnd/UCCA-App).

2 Task Definition

UCCA represents the semantics of linguistic utterances as directed acyclic graphs (DAGs), where terminal (childless) nodes correspond to the text tokens, and non-terminal nodes to semantic units that participate in some super-ordinate relation. Edges are labeled, indicating the role of a child in the relation the parent represents. Nodes and edges belong to one of several layers, each corresponding to a “module” of semantic distinctions.

UCCA’s foundational layer covers the predicate-argument structure evoked by predicates of all grammatical categories (verbal, nominal, adjectival and others), the inter-relations between them, and other major linguistic phenomena such as semantic heads and multi-word expressions. It is the only layer for which annotated corpora exist at the moment, and will thus be the target of this shared task. The layer’s basic notion is the Scene, describing a state, action, movement or some other relation that evolves in time. Each Scene contains one main relation (marked as either a Process or a State), as well as one or more Participants. For example, the sentence “After graduation, John moved to Paris” (Figure 1) contains two Scenes, whose main relations are “graduation” and “moved”. “John” is a Participant in both Scenes, while “Paris” is a Participant only in the latter. Further categories account for inter-Scene relations and the internal structure of complex arguments and relations (e.g., coordination, multi-word expressions and modification).

UCCA distinguishes primary edges, corresponding to explicit relations, from remote edges (shown dashed in Figure 1), which allow a unit to participate in several super-ordinate relations. Primary edges form a tree in each layer, whereas remote edges enable reentrancy, forming a DAG.

Figure 1: An example UCCA graph for the sentence “After graduation, John moved to Paris”. The dashed edge is a remote edge. Pre-terminal nodes and edges are omitted for brevity. Edge labels: P process, A participant, H linked scene, C center, R relator, N connector, L scene linker, U punctuation, F function unit.

UCCA graphs may contain implicit units with no correspondent in the text. Figure 2 shows the annotation for the sentence “A similar technique is almost impossible to apply to other crops, such as cotton, soybeans and rice.”. The sentence was used by Oepen et al. (2015) to compare different semantic dependency schemes. It includes a single Scene, whose main relation is “apply”, a secondary relation “almost impossible”, as well as two complex arguments: “a similar technique” and the coordinated argument “such as cotton, soybeans, and rice.” In addition, the Scene includes an implicit argument, which represents the agent of the “apply” relation.

Figure 2: UCCA example with an implicit unit.

While parsing technology is well-established for syntactic parsing, UCCA has several distinct properties that distinguish it from syntactic representations, most notably its tendency to abstract away from syntactic details that do not affect argument structure. For instance, consider the following examples, where the concept of a Scene diverges from the syntactic concept of a clause. First, non-verbal predicates in UCCA are represented like verbal ones, such as when they appear in copula clauses or noun phrases. Indeed, in Figure 1, “graduation” and “moved” are considered separate Scenes, despite appearing in the same clause. Second, in the same example, “John” is marked as a (remote) Participant in the graduation Scene, despite not being explicitly mentioned in it. Third, consider the possessive construction in “John’s trip home”. While in UCCA “trip” evokes a Scene in which “John” is a Participant, a syntactic scheme would analyze this phrase similarly to “John’s shoes”.

The differences between the challenges posed by syntactic parsing and those posed by UCCA parsing, and semantic parsing more generally, motivate the development of targeted parsing technology to tackle them.

The only existing parser for UCCA is TUPA (Hershcovich et al., 2017), a neural transition-based parser (see §5), which will serve as a baseline for the proposed task. Hershcovich et al. (2017) received considerable attention when presented at ACL last year, including an Outstanding Paper Award, underscoring the timeliness of this proposal.

3 Existing Application: Text-to-text Generation Evaluation

UCCA has shown applicability to text-to-text generation evaluation in three recently published works. The HUME (Human UCCA-based MT Evaluation) measure (Birch et al., 2016) is a human evaluation measure, which uses UCCA to decompose the source sentence into semantic units that are then individually evaluated manually. It thus addresses the decline in inter-annotator agreement that human ranking measures suffer as sentence length grows, and also indicates which parts of the source are incorrectly translated. UCCA is an appealing analysis for such ends due to its cross-linguistic applicability. HUME was evaluated on four language pairs, demonstrating good inter-annotator agreement rates and correlation with human adequacy scores.

The SAMSA measure (Simplification Automatic evaluation Measure through Semantic Annotation; Sulem et al., 2018) for text simplification is the first measure to address structural aspects of text simplification. It uses UCCA on the source side to compare the input and the output of a simplification system. The use of UCCA allows measuring the extent to which the meaning of the input is preserved in the output, and whether the input sentence was split into semantic units of the right granularity. Experiments show that SAMSA correlates with human system-level ranking, unlike existing measures for text simplification, which seem to focus on lexical simplification.

For grammatical error correction, USim (UCCA similarity; Choshen and Abend, 2018) is the first measure to provide a reference-less complement to grammaticality measures; together they form a reference-less metric that scores both the meaning preservation and the grammaticality of corrections. Similarly to SAMSA, the use of UCCA provides the means to measure the extent to which the meaning of the input is preserved in the output. It was shown that UCCA structures are hardly affected by human corrections, but are greatly affected by corrections from systems that score low on reference-based measures.

4 Data & Resources

Competitions will be carried out in three languages: English, German and French. For English, we will use the Wikipedia UCCA corpus (henceforth Wiki) and the UCCA Twenty Thousand Leagues Under the Sea English-French-German parallel corpus (henceforth 20K Leagues), which includes manual UCCA annotation for the entire book on the German side, and manual UCCA annotation for the first five chapters on the French and English sides. The statistics for the corpora are given in the table below.

              train/trial         dev                 test                total
corpus        sentences  tokens   sentences  tokens   sentences  tokens   passages  sentences  tokens
English-Wiki  4113       124935   514        17784    515        15854    367       5142       158573
English-20K   0          0        0          0        492        12574    154       492        12574
French-20K    15         618      238        6374     239        5962     154       492        12954
German-20K    5211       119872   651        12334    652        12325    367       6514       144531

The corpora were manually annotated and reviewed by a second annotator; additionally, they passed automatic validation and normalization scripts. All UCCA corpora are freely available (https://github.com/UniversalConceptualCognitiveAnnotation) under the Creative Commons Attribution-ShareAlike 3.0 Unported license (http://creativecommons.org/licenses/by-sa/3.0).

In addition to the in-domain test set, the English part of the 20K Leagues corpus (12K tokens; Sulem et al., 2015) will serve as an out-of-domain test set. In German, the train, development and test sets will all be taken from the 20K Leagues corpus, jointly consisting of 6004 sentences, corresponding to about 136K tokens. Given the small amount of annotated data available for French, we will only provide development and test sets for this setting. We expect systems for French to use semi-supervised approaches, such as cross-lingual learning or structure projection using the parallel corpus, or to rely on datasets for related formalisms such as Universal Dependencies (Nivre et al., 2016). We will also release the full unannotated 20K Leagues corpus, tokenized and (automatically) aligned in the three languages, to facilitate cross-lingual approaches.

We provide a validation script that can be used on system outputs (https://github.com/huji-nlp/ucca/blob/master/scripts/validate.py). Its goal is to rule out cases that are inconsistent with the UCCA annotation guidelines. For example, a Scene, defined by the presence of a Process or a State, should include at least one Participant. We will provide full documentation of the validation rules, to allow participants to integrate them as constraints into their parsers.
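
To illustrate, here is a minimal sketch of the kind of constraint the script enforces. It is not the official validator; the edge-triple representation and the helper name are hypothetical.

from collections import defaultdict

def find_scene_violations(edges):
    """Return parents of Scenes (units with a P or S child) that lack a
    Participant (A) child. edges: iterable of (parent, child, label) triples."""
    labels_by_parent = defaultdict(set)
    for parent, _child, label in edges:
        labels_by_parent[parent].add(label)
    return [parent for parent, labels in labels_by_parent.items()
            if ("P" in labels or "S" in labels) and "A" not in labels]

# The "moved" Scene of Figure 1 has a Process and two Participants: valid.
assert find_scene_violations([("1.6", "1.8", "P"), ("1.6", "1.7", "A"),
                              ("1.6", "1.9", "A")]) == []
# A Scene with a Process but no Participant is flagged.
assert find_scene_violations([("1.3", "1.4", "P")]) == ["1.3"]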

Data sets are released in XML format (documented at https://github.com/UniversalConceptualCognitiveAnnotation/docs/blob/master/FORMAT.md), including tokenized text automatically pre-processed using spaCy (see §6), and gold-standard UCCA annotation for the train and development sets. An example XML file is given in Figure 3 (available at https://github.com/UniversalConceptualCognitiveAnnotation/docs/blob/master/toy.xml). These representations can be read and manipulated using the UCCA toolkit (https://github.com/huji-nlp/ucca). Test sets will follow the same format, but without layer 1 (i.e., the element <layer layerID="1">).

<root annotationID="0" passageID="504">
  <attributes />
  <layer layerID="0">
    <attributes />
    <node ID="0.1" type="Word">
      <attributes paragraph="1"
        paragraph_position="1"
        text="After" />
      <extra dep="prep" ent_iob="2"
        ent_type="" head="4"
        lemma="after" pos="ADP"
        shape="Xxxxx" tag="IN" />
    </node>
    <node ID="0.2" type="Word">
      <attributes paragraph="1"
        paragraph_position="2"
        text="graduation" />
      <extra dep="pobj" ent_iob="2"
        ent_type="" head="-1"
        lemma="graduation" pos="NOUN"
        shape="xxxx" tag="NN" />
    </node>
    <node ID="0.3" type="Punctuation">
      <attributes paragraph="1"
        paragraph_position="3" text="," />
      <extra dep="punct" ent_iob="2"
        ent_type="" head="2" lemma=","
        pos="PUNCT" shape=","  tag="," />
    </node>
    <node ID="0.4" type="Word">
      <attributes paragraph="1"
        paragraph_position="4"
        text="Mary" />
      <extra dep="nsubj" ent_iob="3"
        ent_type="PERSON" head="1"
        lemma="mary" pos="PROPN"
        shape="Xxxx" tag="NNP" />
    </node>
    <node ID="0.5" type="Word">
      <attributes paragraph="1"
        paragraph_position="5"
        text="moved" />
      <extra dep="ROOT" ent_iob="2"
        ent_type="" head="0"
        lemma="move" pos="VERB"
        shape="xxxx" tag="VBD" />
    </node>
    <node ID="0.6" type="Word">
      <attributes paragraph="1"
        paragraph_position="6" text="to" />
      <extra dep="prep" ent_iob="2"
        ent_type="" head="-1" lemma="to"
        pos="ADP" shape="xx"  tag="IN" />
    </node>
    <node ID="0.7" type="Word">
      <attributes paragraph="1"
        paragraph_position="7"
        text="New" />
      <extra dep="compound" ent_iob="3"
        ent_type="GPE" head="1"
        lemma="new" pos="PROPN"
        shape="Xxx" tag="NNP" />
    </node>
    <node ID="0.8" type="Word">
      <attributes paragraph="1"
        paragraph_position="8"
        text="York" />
      <extra dep="compound" ent_iob="1"
        ent_type="GPE" head="1"
        lemma="york" pos="PROPN"
        shape="Xxxx" tag="NNP" />
    </node>
    <node ID="0.9" type="Word">
      <attributes paragraph="1"
        paragraph_position="9"
        text="City" />
      <extra dep="pobj" ent_iob="1"
        ent_type="GPE" head="-3"
        lemma="city" pos="PROPN"
        shape="Xxxx" tag="NNP" />
    </node>
    <node ID="0.10" type="Punctuation">
      <attributes paragraph="1"
        paragraph_position="10" text="." />
      <extra dep="punct" ent_iob="2"
        ent_type="" head="-5" lemma="."
        pos="PUNCT" shape="."  tag="." />
    </node>
  </layer>
  <layer layerID="1">
    <attributes />
    <node ID="1.1" type="FN">
      <attributes />
      <edge toID="1.2" type="L">
        <attributes />
      </edge>
      <edge toID="1.3" type="H">
        <attributes />
      </edge>
      <edge toID="1.5" type="U">
        <attributes />
      </edge>
      <edge toID="1.6" type="H">
        <attributes />
      </edge>
      <edge toID="1.12" type="U">
        <attributes />
      </edge>
    </node>
    <node ID="1.2" type="FN">
      <attributes />
      <edge toID="0.1" type="Terminal">
        <attributes />
      </edge>
    </node>
    <node ID="1.3" type="FN">
      <attributes />
      <edge toID="1.4" type="P">
        <attributes />
      </edge>
      <edge toID="1.7" type="A">
        <attributes remote="True" />
      </edge>
    </node>
    <node ID="1.4" type="FN">
      <attributes />
      <edge toID="0.2" type="Terminal">
        <attributes />
      </edge>
    </node>
    <node ID="1.5" type="PNCT">
      <attributes />
      <edge toID="0.3" type="Terminal">
        <attributes />
      </edge>
    </node>
    <node ID="1.6" type="FN">
      <attributes />
      <edge toID="1.7" type="A">
        <attributes />
      </edge>
      <edge toID="1.8" type="P">
        <attributes />
      </edge>
      <edge toID="1.9" type="A">
        <attributes />
      </edge>
    </node>
    <node ID="1.7" type="FN">
      <attributes />
      <edge toID="0.4" type="Terminal">
        <attributes />
      </edge>
    </node>
    <node ID="1.8" type="FN">
      <attributes />
      <edge toID="0.5" type="Terminal">
        <attributes />
      </edge>
    </node>
    <node ID="1.9" type="FN">
      <attributes />
      <edge toID="1.10" type="R">
        <attributes />
      </edge>
      <edge toID="1.11" type="C">
        <attributes />
      </edge>
    </node>
    <node ID="1.10" type="FN">
      <attributes />
      <edge toID="0.6" type="Terminal">
        <attributes />
      </edge>
    </node>
    <node ID="1.11" type="FN">
      <attributes />
      <edge toID="0.7" type="Terminal">
        <attributes />
      </edge>
      <edge toID="0.8" type="Terminal">
        <attributes />
      </edge>
      <edge toID="0.9" type="Terminal">
        <attributes />
      </edge>
    </node>
    <node ID="1.12" type="PNCT">
      <attributes />
      <edge toID="0.10" type="Terminal">
        <attributes />
      </edge>
    </node>
  </layer>
</root>
Figure 3: Example XML representation of a UCCA passage in the format described above.
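
To inspect such files programmatically without the toolkit, here is a minimal sketch using only the Python standard library; the file name is a placeholder, and the element layout is the one shown in Figure 3.

import xml.etree.ElementTree as ET

root = ET.parse("passage.xml").getroot()  # placeholder path to a file like Figure 3

# Layer 0 holds the terminals (tokens): map node IDs to token text.
layer0 = next(l for l in root.findall("layer") if l.get("layerID") == "0")
tokens = {node.get("ID"): node.find("attributes").get("text")
          for node in layer0.findall("node")}

# Layer 1 holds the UCCA graph: labeled edges between units and terminals.
layer1 = next(l for l in root.findall("layer") if l.get("layerID") == "1")
edges = [(node.get("ID"), edge.get("toID"), edge.get("type"),
          edge.find("attributes").get("remote") == "True")
         for node in layer1.findall("node")
         for edge in node.findall("edge")]

print(tokens["0.4"])               # "Mary"
print([e for e in edges if e[3]])  # the remote edge: ("1.3", "1.7", "A", True)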

5 Pilot Task

Two works have been published on UCCA parsing (Hershcovich et al., 2017, 2018), presenting TUPA, a transition-based DAG parser based on a BiLSTM classifier (https://github.com/huji-nlp/tupa).

Several baselines have been proposed, using different classifiers (a sparse perceptron or a feedforward neural network), as well as conversion-based approaches that use existing parsers for other formalisms to parse UCCA by constructing a two-way conversion protocol between the formalisms. TUPA has shown superior performance over all such approaches, and will thus serve as a strong baseline for system submissions to the shared task.

Experiments were done in several settings:

  1. English in-domain, training on the Wiki training set and testing on the Wiki test set,

  2. English out-of-domain, training on the Wiki training set but testing on the 20K Leagues test set,

  3. German in-domain, training on the 20K Leagues training set and testing on its test set,

  4. French in-domain, training on the 20K Leagues training set and testing on its test set.

The pilot task results are summarized in Table 1. The pilot was carried out on version 1.2 of the Wiki dataset and version 1.0 of the English 20K Leagues dataset. An unpublished experiment using a perceptron with randomly-initialized embedding inputs, instead of sparse features, yielded poor results (TUPADense). Another unpublished experiment with an ensemble of three BiLSTM models (TUPABiLSTM PoE) yielded the best results on primary edges in the in-domain setting (75% F1), but no improvement on remote edges (48.7% F1). An improvement was also observed in the out-of-domain setting, obtaining 69.6% F1 on primary edges and 28% F1 on remote edges. For ensembling, during inference we used a Product of Experts (PoE; Hinton, 2002) to combine the predictions of three models trained in the same setting, but with different random seeds.
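
As an illustration of the ensembling scheme, here is a minimal sketch of a Product of Experts combination, assuming each model outputs a probability distribution over candidate transitions; this is not TUPA's actual code.

import numpy as np

def product_of_experts(distributions):
    """Combine expert distributions by element-wise product, then renormalize."""
    product = np.prod(np.asarray(distributions), axis=0)
    return product / product.sum()

# Three models' distributions over four candidate transitions:
experts = [np.array([0.7, 0.1, 0.1, 0.1]),
           np.array([0.6, 0.2, 0.1, 0.1]),
           np.array([0.5, 0.3, 0.1, 0.1])]
print(product_of_experts(experts))  # sharper than any single expert: ~[0.96, 0.03, 0.005, 0.005]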

Taking a conversion-based approach using existing parsers yielded varying performance. The best performing approach used a conversion to bilexical trees, followed by a stack-LSTM dependency tree parser (Dyer et al., 2015), which after conversion reached 69.9% F1 on primary edges in the in-domain setting. However, since the parser produces trees rather than DAGs, it was unable to produce any remote edges.

                  Wiki (in-domain)                        20K Leagues (out-of-domain)
                  Primary            Remote               Primary            Remote
                  LP   LR   LF       LP    LR    LF       LP   LR   LF       LP     LR    LF
TUPASparse        64.5 63.7 64.1     19.8  13.4  16       59.6 59.9 59.8     22.2   7.7   11.5
TUPADense         59.1 58.9 59       17.4  12.4  14.5     57.0 57.9 57.4     10.8   4.2   6.0
TUPAMLP           65.2 64.6 64.9     23.7  13.2  16.9     62.3 62.6 62.5     20.9   6.3   9.7
TUPABiLSTM        74.4 72.9 73.6     53    50    51.5     69   69   69       41.2   19.8  26.7
TUPABiLSTM PoE    75.9 74.1 75       50.7  46.8  48.7     69.7 69.5 69.6     50.5   19.4  28
TUPABiLSTM MTL    75.6 73.1 74.4     50.9  53.2  52       71.2 70.9 71       45.1   22    29.6

Bilexical Approximation (Dependency DAG Parsers)
Upper Bound (LF)            91             58.3                    91.3             43.4
DAGParser         61.8 55.8 58.6     9.5   0.5   1        56.4 50.6 53.4     –      0     0
TurboParser       57.7 46   51.2     77.8  1.8   3.7      50.3 37.7 43.1     100    0.4   0.8

Tree Approximation (Constituency Tree Parser)
Upper Bound (LF)            100                                    100
uparse            60.9 61.2 61.1     –     –     –        52.7 52.8 52.8     –      –     –

Bilexical Tree Approximation (Dependency Tree Parsers)
Upper Bound (LF)            91                                     91.3
MaltParser        62.8 57.7 60.2     –     –     –        57.8 53   55.3     –      –     –
LSTM Parser       73.2 66.9 69.9     –     –     –        66.1 61.1 63.5     –      –     –

Table 1: Pilot task results, in percent, on v1.2 of the English Wiki test set (left) and v1.0 of the 20K Leagues set (right). Columns correspond to labeled precision, recall and F1 (LP, LR, LF), for both primary and remote edges. LF upper bounds are reported for the conversions. TUPASparse and TUPAMLP are from Hershcovich et al. (2017), as are the conversion-based baselines: DAGParser (Ribeyre et al., 2014), TurboParser (Almeida and Martins, 2015), uparse (Maier and Lichte, 2016), MaltParser (Nivre et al., 2007), and the LSTM parser (Dyer et al., 2015). TUPADense and TUPABiLSTM PoE are unpublished results. TUPABiLSTM and TUPABiLSTM MTL are from Hershcovich et al. (2018).
                  Primary            Remote
                  LP   LR   LF       LP    LR    LF
French (in-domain)
TUPABiLSTM        68.2 67   67.6     26    9.4   13.9
TUPABiLSTM MTL    70.3 70   70.1     43.8  13.2  20.3
German (in-domain)
TUPABiLSTM        73.3 71.7 72.5     57.1  17.7  27.1
TUPABiLSTM MTL    73.7 72.6 73.2     61.8  24.9  35.5

Table 2: Results on v1.0 of the French 20K Leagues test set and v0.9 of the German 20K Leagues test set, from Hershcovich et al. (2018). Columns correspond to labeled precision, recall and F1 (LP, LR, LF), for both primary and remote edges.

Finally, Hershcovich et al. (2018) showed improvements to UCCA parsing from multitask learning, using AMR (Banarescu et al., 2013), SDP (Oepen et al., 2016) and UD (Nivre et al., 2016) as auxiliary tasks (TUPABiLSTM MTL). They also carried out experiments on version 1.0 of the French 20K Leagues dataset (splitting it into training, development and test sets, despite its small size), and on version 0.9 (pre-release) of the German 20K Leagues dataset. For French and German, only UD was used as an auxiliary task in multitask learning (TUPABiLSTM MTL).

TUPA will be used as a baseline for the French and German settings as well. Since the task will include no training data for French, we will use a version trained on English and German, with a delexicalized feature set (not included in the pilot task).

6 Evaluation

Submission conditions.

Participants in the task will be evaluated in four settings:

  1. English in-domain setting, using the Wiki corpus.

  2. English out-of-domain setting, using the Wiki corpus as training and development data, and 20K Leagues as test data.

  3. German in-domain setting, using the 20K Leagues corpus.

  4. French setting with no training data, using the 20K Leagues as development and test data.

In order to allow both an even-ground comparison between systems and the use of hitherto untried resources, we will hold both an open and a closed track for submissions in the English and German settings. Closed track submissions will only be allowed to use the gold-standard UCCA annotation distributed for the task in the target language, and will be limited in their use of additional resources. Concretely, the additional data they will be allowed to use will only consist of that used by TUPA, namely automatic annotations provided by spaCy (Honnibal and Montani, 2018; http://spacy.io): POS tags, syntactic dependency relations, and named entity types and spans. In addition, the closed track will allow the use of word embeddings provided by fastText (Bojanowski et al., 2017; http://fasttext.cc) for all languages.

Systems in the open track, on the other hand, will be allowed to use any additional resource, such as UCCA annotation in other languages, dictionaries or datasets for other tasks, provided that they do not use any additional gold-standard annotation over the same texts used in the UCCA corpora (we are not aware of any such annotation, but include this restriction for completeness). In both tracks, we will require that submitted systems not be trained on the development data. Due to the absence of an established pilot study for French, we will only hold an open track for this setting.

The four settings and two tracks result in a total of 7 competitions, where a team may participate in anywhere between 1 and 7 of them. We will encourage submissions in each track to use their systems to produce results in all settings. In addition, we will encourage closed-track submissions to also submit to the open track.

Scoring.

In order to evaluate how similar an output graph is to a gold one, we use DAG F1. Formally, consider two UCCA annotations G1 and G2 of the same text, sharing their set of leaves (tokens) W. For a node v in G1 or G2, define its yield, yield(v) ⊆ W, as its set of leaf descendants. Define a pair of edges (v1, u1) ∈ G1 and (v2, u2) ∈ G2 to be matching if yield(u1) = yield(u2) and they have the same label. Labeled precision and recall are defined by dividing the number of matching edges in G1 and G2 by |E1| and |E2|, respectively. DAG F1, their harmonic mean, collapses to the common parsing F1 if G1 and G2 are trees.

This measure disregards implicit nodes. We aim to extend it to include them, defining an implicit unit u1 in G1 and an implicit unit u2 in G2 to be matching if and only if they have the same label and yield(p(u1)) = yield(p(u2)), where p(u) denotes the parent of a unit u.
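
To make the definition concrete, below is a simplified sketch of DAG F1 over edge lists. It is not the official evaluation script (linked at the end of this section); the (parent, child, label) triple representation is a hypothetical illustration, and childless nodes are taken to be the shared terminals.

from collections import Counter, defaultdict

def signatures(edges):
    """Map a graph, given as (parent, child, label) triples, to the multiset
    of (yield-of-child, label) signatures used for edge matching."""
    children = defaultdict(list)
    for parent, child, _label in edges:
        children[parent].append(child)

    def yield_of(node):
        if not children[node]:  # childless nodes are the terminals (tokens)
            return frozenset([node])
        return frozenset.union(*(yield_of(c) for c in children[node]))

    return Counter((yield_of(child), label) for _parent, child, label in edges)

def dag_f1(pred_edges, gold_edges):
    """Labeled DAG F1: edges match if their child yields and labels coincide."""
    pred, gold = signatures(pred_edges), signatures(gold_edges)
    matching = sum((pred & gold).values())  # multiset intersection
    if matching == 0:
        return 0.0
    precision = matching / sum(pred.values())
    recall = matching / sum(gold.values())
    return 2 * precision * recall / (precision + recall)

# Toy graphs over terminals t1..t3: all edges agree except one label (P vs. S).
gold = [("r", "u", "A"), ("u", "t1", "T"), ("u", "t2", "T"), ("r", "t3", "P")]
pred = [("r", "u", "A"), ("u", "t1", "T"), ("u", "t2", "T"), ("r", "t3", "S")]
print(dag_f1(pred, gold))  # 0.75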

For a more fine-grained evaluation, precision, recall and F1 for specific sets of categories (edge labels) will also be reported. UCCA labels can be divided into categories that correspond to Scene elements (States, Processes, Participants, Adverbials), non-Scene elements (Elaborators, Connectors, Centers), and inter-Scene linkage (Parallel Scenes, Linkage, Ground). We will report performance for each of these sets separately, leaving out Function and Relator units, which do not belong to any particular group.
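
The fine-grained scores can be derived from the same machinery by filtering edge signatures by label after computing yields on the full graphs (filtering the raw edge lists first would corrupt the yields). A sketch, reusing signatures(), pred and gold from the example above:

from collections import Counter

def dag_f1_for_labels(pred_edges, gold_edges, labels):
    """DAG F1 restricted to a category set, e.g. Scene elements."""
    def restrict(counts):
        return Counter({k: c for k, c in counts.items() if k[1] in labels})
    pred = restrict(signatures(pred_edges))
    gold = restrict(signatures(gold_edges))
    matching = sum((pred & gold).values())
    if matching == 0:
        return 0.0
    precision = matching / sum(pred.values())
    recall = matching / sum(gold.values())
    return 2 * precision * recall / (precision + recall)

# Scene-element score for the toy graphs above: only the A edge matches.
print(dag_f1_for_labels(pred, gold, {"P", "S", "A", "D"}))  # 0.5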

An official evaluation script providing both coarse-grained and fine-grained scores is available at https://github.com/huji-nlp/ucca/blob/master/scripts/evaluate_standard.py.

7 Task organizers

  1. Daniel Hershcovich. PhD candidate at the Hebrew University of Jerusalem. danielh@cs.huji.ac.il. Research interests: semantic parsing, structure prediction, transition-based parsing. Relevant experience: first author of the TUPA papers (Hershcovich et al., 2017, 2018); development and maintenance of the UCCA toolkit Python codebase.

  2. Leshem Choshen. PhD candidate at the Hebrew University of Jerusalem. leshem.choshen@mail.huji.ac.il. Research interests: multi-lingual text-to-text generation, multi-modal semantics. Relevant experience: first author of the paper presenting the UCCA-based GEC evaluation measure USim (Choshen and Abend, 2018).

  3. Elior Sulem. PhD candidate at the Hebrew University of Jerusalem. eliors@cs.huji.ac.il. Research interests: sentence-level semantics, cross-linguistic divergences, text simplification, machine translation. Relevant experience: first author of a cross-linguistic divergence study with UCCA annotation (Sulem et al., 2015) and of the paper presenting SAMSA, the UCCA-based structural semantic evaluation measure for text simplification (Sulem et al., 2018).

  4. Zohar Aizenbud. MSc student at the Hebrew University of Jerusalem. zohara@cs.huji.ac.il. Research interests: machine translation, computational semantics.

  5. Ari Rappoport. Associate Professor of Computer Science at The Hebrew University of Jerusalem. arir@cs.huji.ac.il. Research interests: computational semantics, semantic parsing, language in the brain. Relevant experience: co-developer of the UCCA scheme. Published about 60 papers in NLP conferences (ACL, NAACL, EMNLP etc.) in the last 11 years. Supervised 25 graduate student theses in NLP.

  6. Omri Abend. Senior Lecturer (Assistant Professor) of Computer Science and Cognitive Science at the Hebrew University of Jerusalem. oabend@cs.huji.ac.il. Research interests: computational semantics, and specifically cross-linguistically applicable semantic and grammatical representation, semantic parsing, corpus annotation and evaluation. Relevant experience: co-developer of the UCCA scheme, partner in all annotation, parsing and application efforts related to UCCA, publishes regularly in NLP conferences (ACL, NAACL, EMNLP etc.).
