A survey of cross-lingual features for zero-shot cross-lingual semantic parsing

08/27/2019 ∙ by Jingfeng Yang, et al. ∙ Georgia Institute of Technology ∙ Samsung

The availability of corpora to train semantic parsers in English has led to significant advances in the field. Unfortunately, for languages other than English, annotation is scarce and so are developed parsers. We then ask: could a parser trained in English be applied to a language that it hasn't been trained on? To answer this question we explore zero-shot cross-lingual semantic parsing, where we train an available coarse-to-fine semantic parser (Liu et al., 2018) using cross-lingual word embeddings and universal dependencies in English and test it on Italian, German and Dutch. Results on the Parallel Meaning Bank, a multilingual semantic graphbank, show that Universal Dependency features significantly boost performance when used in conjunction with other lexical features, but modelling the UD structure directly when encoding the input does not.

1 Introduction

Semantic parsing is the task of transducing natural language to meaning representations, which in turn can be expressed through many different semantic formalisms including lambda calculus Zettlemoyer and Collins (2012), DCS Liang et al. (2013), Discourse Representation Theory (DRT) Kamp and Reyle (2013), AMR Banarescu et al. (2013) and so on. The availability of annotated data in English has translated into the development of a plethora of models, including encoder-decoders Dong and Lapata (2016); Jia and Liang (2016) as well as tree or graph-structured decoders Dong and Lapata (2016, 2018); Liu et al. (2018); Yin and Neubig (2017).

*Work done when Jingfeng Yang was an intern and Federico Fancellu a post-doc at the University of Edinburgh.

Whereas the majority of semantic banks focus on English, recent effort has focussed on building multilingual representations, e.g. PMB Abzianidze et al. (2017), MRS Copestake et al. (1995) and FrameNet Padó and Lapata (2005). However, manually annotating meaning representations in a new language is a painstaking process, which explains why there are only a few datasets available for different formalisms in languages other than English. As a consequence, whereas the field has made great advances for English, little work has been done in other languages.

b1: [ e1 | sit_down(e1), Agent(e1, “speaker”) ]
b2: [ e2, x1 | open(e2), laptop(x1), Owner(x1, “speaker”), Agent(e2, “speaker”), Theme(e2, x1) ]
CONTINUATION(b1, b2)

Figure 1: The Discourse Representation Structure (DRS) for “I sat down and opened my laptop”. For simplicity, we have omitted any time reference.

We ask: can we learn a semantic parser for English and test it in another language where annotations are not available? What would that require?

To answer this question, previous work has leveraged machine translation techniques to map the semantics from one language to another (e.g. Damonte and Cohen, 2018). However, these methods require parallel corpora to extract automatic alignments, which are often noisy or not available at all.

In this paper we explore parameter-shared models instead, where a model is trained on English using language-independent features and tested in a target language.

To show how this approach performs, we focus on the Parallel Meaning Bank (PMB, Abzianidze et al., 2017), a multilingual semantic bank where sentences in English, German, Italian and Dutch have been annotated with their meaning representations. The annotations in the PMB are based on Discourse Representation Theory (DRT, Kamp and Reyle, 2013), a popular theory of meaning representation designed to account for intra- and inter-sentential phenomena, like temporal expressions and anaphora. Figure 1 shows an example Discourse Representation Structure (DRS) for the sentence ‘I sat down and opened my laptop’ in its canonical ‘box’ representation. A DRS is a nested structure with the top part containing the discourse referents and the bottom part containing unary and binary predicates, as well as semantic constants (e.g. ‘speaker’). DRSs can be linked to each other via logic operators (such as negation) or, as in this case, discourse relations (e.g. CONTINUATION, RESULT, ELABORATION, etc.).

To test our approach we leverage the DRT parser of Liu et al. (2018), an encoder-decoder architecture where the meaning representation is reconstructed in three stages, coarse-to-fine, by first building the DRS skeleton (i.e. the ‘box’ structures) and then filling each DRS with predicates and variables. Whereas the original parser utilizes a sequential Bi-LSTM encoder with monolingual lexical features, we experiment with language-independent features in the form of cross-lingual word embeddings, universal PoS tags and universal dependencies. In particular, we also make use of tree encoders to assess whether modelling syntax can be beneficial in cross-lingual settings, as shown for other semantic tasks (e.g. negation scope detection Fancellu et al. (2018)).

Results show that language-independent features are a valid alternative to projection methods for cross-lingual semantic parsing. We show that adding dependency relations as features is beneficial, even when they are the only feature used during encoding. However, we also show that modelling the dependency structure directly via tree encoders does not outperform a sequential BiLSTM architecture for the three languages we have experimented with.

2 Methods

2.1 Model

In this section, we describe the modifications to the coarse-to-fine encoder-decoder architecture of Liu et al. (2018); for more detail, we refer the reader to the original paper.

2.1.1 Encoder

BiLSTM. We use Liu et al. (2018)’s Bi-LSTM as baseline. However, whereas the original model represents each token in the input sentence as the concatenation of word (WE) and lemma embeddings, we discard the latter and add a POS tag embedding (PE) and a dependency relation embedding (DE) feature. These embeddings are concatenated to represent the input token. The final encoder representation is obtained by concatenating the final forward and backward hidden states.
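The token representation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the table sizes, embedding dimensions and the name `token_features` are assumptions, and random matrices stand in for the real embedding tables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
WORD_DIM, POS_DIM, DEP_DIM = 300, 32, 32
word_table = rng.normal(size=(1000, WORD_DIM))  # stands in for the word embeddings (WE)
pos_table = rng.normal(size=(17, POS_DIM))      # stands in for the POS tag embeddings (PE)
dep_table = rng.normal(size=(37, DEP_DIM))      # stands in for the dependency relation embeddings (DE)


def token_features(word_ids, pos_ids, dep_ids):
    """Look up WE, PE and DE for each token and concatenate them feature-wise."""
    return np.concatenate(
        [word_table[word_ids], pos_table[pos_ids], dep_table[dep_ids]], axis=-1
    )


# A three-token sentence encoded as integer ids (ids are arbitrary here).
x = token_features(np.array([4, 10, 7]), np.array([1, 5, 3]), np.array([0, 12, 8]))
print(x.shape)
```

Each row of `x` would then be one time step of the BiLSTM input; dropping one of the three tables from the concatenation corresponds to the feature ablations reported later.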

TreeLSTM. To model the dependency structure directly, we use a child-sum tree-LSTM Tai et al. (2015), where each word in the input sentence corresponds to a node in the dependency tree. In particular, summing across children is advantageous for cross-lingual tasks since languages might display different word orders. Computation follows Equation (1).

(1)
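For reference, a single node update of a child-sum tree-LSTM can be sketched as below. The formulation follows the standard one in Tai et al. (2015), which may differ in notation from Equation (1); the class name, dimensions and random initialization are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ChildSumTreeLSTMCell:
    """One node update of a child-sum tree-LSTM (standard formulation, toy weights)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One (W, U, b) triple per gate: input (i), forget (f), output (o), update (u).
        self.W = {g: rng.normal(scale=0.1, size=(hidden_dim, input_dim)) for g in "ifou"}
        self.U = {g: rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)) for g in "ifou"}
        self.b = {g: np.zeros(hidden_dim) for g in "ifou"}
        self.hidden_dim = hidden_dim

    def __call__(self, x, children):
        """x: input vector of the node; children: list of (h_k, c_k) pairs."""
        # Children are summed, so their left-to-right order does not matter.
        h_sum = sum((h for h, _ in children), np.zeros(self.hidden_dim))
        i = sigmoid(self.W["i"] @ x + self.U["i"] @ h_sum + self.b["i"])
        o = sigmoid(self.W["o"] @ x + self.U["o"] @ h_sum + self.b["o"])
        u = np.tanh(self.W["u"] @ x + self.U["u"] @ h_sum + self.b["u"])
        c = i * u
        for h_k, c_k in children:
            # A separate forget gate is computed for every child.
            f_k = sigmoid(self.W["f"] @ x + self.U["f"] @ h_k + self.b["f"])
            c = c + f_k * c_k
        h = o * np.tanh(c)
        return h, c


cell = ChildSumTreeLSTMCell(input_dim=4, hidden_dim=3)
rng = np.random.default_rng(1)
leaf_a = cell(rng.normal(size=4), [])
leaf_b = cell(rng.normal(size=4), [])
x_root = rng.normal(size=4)
h_ab, c_ab = cell(x_root, [leaf_a, leaf_b])
h_ba, c_ba = cell(x_root, [leaf_b, leaf_a])
```

Swapping the two children leaves the parent state unchanged up to floating-point error, which is the order-invariance property the paragraph above appeals to for languages with different word orders.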

Po/treeLSTM. Completely discarding word order might hurt performance for related languages, where a soft notion of positioning can help. To this end, we add positional embeddings Vaswani et al. (2017) that help the child-sum tree-LSTM discriminate between the left and right children of a parent node. These are computed following Equation (2), where pos is the position of the word and i is the dimension index out of d total dimensions.

(2)
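As an illustration, the sinusoidal positional embeddings of Vaswani et al. (2017) can be computed as in the sketch below. It assumes the standard formulation (sine on even dimensions, cosine on odd ones, base 10000); the function name is hypothetical, and how the result is combined with the token representation is not specified here.

```python
import numpy as np


def sinusoidal_positions(num_positions, dim):
    """Return a (num_positions, dim) table of sinusoidal position encodings; dim must be even."""
    positions = np.arange(num_positions)[:, None]   # word positions, shape (num_positions, 1)
    pair_index = np.arange(dim // 2)[None, :]       # index of each sin/cos pair, shape (1, dim / 2)
    angles = positions / np.power(10000.0, 2 * pair_index / dim)
    table = np.zeros((num_positions, dim))
    table[:, 0::2] = np.sin(angles)  # even dimensions
    table[:, 1::2] = np.cos(angles)  # odd dimensions
    return table


encodings = sinusoidal_positions(num_positions=5, dim=8)
print(encodings.shape)
```

Because every position receives a distinct vector, two children of the same parent can be told apart even though the child-sum operation itself ignores their order.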

Bi/treeLSTM. Finally, similarly to Chen et al. (2017), we combine tree-LSTM and Bi-LSTM, where a tree-LSTM is initialized using the last layer of a Bi-LSTM, which encodes order information. Computation is shown in Equation (3).

(3)

2.1.2 Decoder

The decoder of Liu et al. (2018) reconstructs the DRS in three steps, by first predicting the overall structure (the ‘boxes’), then the predicates and finally the referents, with each subsequent step being conditioned on the output of the previous one. During predicate prediction, the decoder uses a copying mechanism to predict those unary predicates that are also lemmas in the input sentence (e.g. ‘eat’). For those that are not, soft attention is used instead. No modifications were made to the decoder; for more detail, we refer the reader to the original paper.

2.2 Data

We use the PMB v.2.1.0 for the experiments. The dataset consists of 4405 English sentences, 1173 German sentences, 633 Italian sentences and 583 Dutch sentences. We divide the English sentences into 3072 training sentences, 663 development and 670 testing sentences. We consider all the sentences in other languages as test set.

In order to be used as input to the parser, Liu et al. (2018) first convert the DRS into tree-based representations, which are subsequently linearized into PTB-style bracketed sequences. This transformation is lossless in that re-entrancies are duplicated to fit in the tree structure. We use the same conversion in this work; for further detail we refer the reader to the original paper.
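The linearization step can be pictured with the toy sketch below, which turns a nested tree into a bracketed sequence. The tree encoding, the node labels and the function name are hypothetical and do not reproduce the actual output vocabulary of the conversion in Liu et al. (2018).

```python
def linearize(tree):
    """Turn a nested (label, children) tuple into a PTB-style bracketed string."""
    label, children = tree
    if not children:
        return label
    return "(" + label + " " + " ".join(linearize(child) for child in children) + ")"


# Hypothetical tree for illustration; labels are not the parser's real symbols.
toy_tree = ("CONTINUATION", [
    ("DRS", [("sit_down", []), ("Agent", [])]),
    ("DRS", [("open", []), ("laptop", []), ("Theme", [])]),
])
bracketed = linearize(toy_tree)
print(bracketed)  # (CONTINUATION (DRS sit_down Agent) (DRS open laptop Theme))
```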

Finally, it is worth noting that lexical predicates in PMB are in English, even for non-English languages. Since this is not compatible with our copy mechanism, we revert predicates to their original language by substituting them with the lemmas of the tokens they are aligned to (since gold alignment information is included in the PMB).
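A rough sketch of this substitution is given below. The data layout (a list of predicates, a predicate-to-token alignment dictionary and a list of lemmas) is an assumption made for illustration and is not the PMB file format.

```python
def localize_predicates(predicates, alignment, lemmas):
    """Replace each aligned predicate with the lemma of the token it is aligned to.

    predicates: predicate names as stored in the meaning bank (English)
    alignment:  dict mapping a predicate index to a token index (gold alignment)
    lemmas:     lemmas of the target-language sentence, one per token
    """
    return [
        lemmas[alignment[i]] if i in alignment else pred
        for i, pred in enumerate(predicates)
    ]


# Toy Italian example: "Ho aperto il mio portatile" (lemmatized by hand).
lemmas = ["avere", "aprire", "il", "mio", "portatile"]
predicates = ["open", "laptop", "Agent"]
alignment = {0: 1, 1: 4}  # "open" -> "aprire", "laptop" -> "portatile"
localized = localize_predicates(predicates, alignment, lemmas)
print(localized)  # ['aprire', 'portatile', 'Agent']
```

Predicates without an alignment, such as the role `Agent` in this toy example, are left untouched, so only lexical material that the copy mechanism could find in the input is rewritten.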

2.3 Cross-lingual features

In order to make the model directly transferable to the German, Italian and Dutch test data, we use the following language-independent features.

Multilingual word embeddings. We use the MUSE Conneau et al. (2017) pre-trained multilingual word embeddings and keep them fixed during training.

UD relations and structure. We use UDPipe Straka and Straková (2017) to obtain parses for English, German, Italian and Dutch. UD relation embeddings are randomly initialized and updated during training.

Universal POS tags. We use the Universal POS tags Petrov et al. (2011) obtained with the UDPipe parser. Universal POS tag embeddings are randomly initialized and updated during training.

2.4 Model comparison

We use the BiLSTM model as baseline (Bi) and compare it to the child-sum tree-LSTM (tree), to the same model with positional information added (Po/tree), and to a treeLSTM initialized with the hidden states of the BiLSTM (Bi/tree). We also conduct an ablation study on the features used, where WE, PE and DE are the word embedding, PoS embedding and dependency relation embedding respectively. For completeness, along with the results for the cross-lingual task, we also report results for monolingual English semantic parsing, where word embedding features are randomly initialized.

2.5 Evaluation

We use Counter Van Noord et al. (2018) to evaluate the performance of our models. Counter looks for the best alignment between the predicted and gold DRS and computes precision, recall and F1. For further details about Counter, the reader is referred to Van Noord et al. (2018). It is worth noting that unlike other work on the PMB (e.g. van Noord et al., 2018), Liu et al. (2018) do not deal with presupposition. In the PMB, presupposed variables are extracted from a main box and included in a separate one. In our work, we revert this process so as to ignore presupposed boxes. Similarly, we also do not deal with sense tags, which we aim to include in future work.
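The final scoring step can be sketched as follows. This is only the arithmetic on matched clause counts; it does not implement Counter's search for the best alignment between predicted and gold DRSs, and the function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(num_matched, num_predicted, num_gold):
    """Scores from clause counts, given an already chosen alignment."""
    precision = num_matched / num_predicted if num_predicted else 0.0
    recall = num_matched / num_gold if num_gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Toy counts: 3 matching clauses, 4 predicted clauses, 5 gold clauses.
p, r, f = precision_recall_f1(num_matched=3, num_predicted=4, num_gold=5)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.75 0.6 0.6667
```

As a sanity check, `f1_score(0.4996, 0.4614)` rounds to 0.4797, which agrees with the F value reported next to that precision and recall in the first German row of Table 1.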

3 Results and Analysis

Model    German                      Italian                     Dutch
         P       R       F           P       R       F           P       R       F
         0.4996  0.4614  0.4797      0.5102  0.5319  0.5208      0.4219  0.4780  0.4482
         0.4457  0.375   0.4075      0.5088  0.4257  0.4636      0.4627  0.3592  0.4044
         0.5911  0.4546  0.5139      0.5955  0.4894  0.5373      0.5027  0.4296  0.4633
         0.5482  0.4587  0.4995      0.4986  0.5498  0.5229      0.4627  0.4943  0.4780
         0.6763  0.6060  0.6392      0.7129  0.6669  0.6891      0.6286  0.5381  0.5798
         0.6767  0.6080  0.6405      0.6885  0.6429  0.6649      0.5926  0.5437  0.5690
         0.6750  0.5280  0.5925      0.6724  0.5637  0.6133      0.6096  0.4728  0.5360
         0.6496  0.5950  0.6211      0.6534  0.6393  0.6463      0.5722  0.5369  0.5540
         0.6532  0.6290  0.6409      0.6926  0.6749  0.6836      0.5792  0.5318  0.5545
         0.6695  0.5822  0.6228      0.6965  0.6133  0.6523      0.6048  0.5609  0.5820
         0.6453  0.6250  0.6350      0.6896  0.6622  0.6756      0.5915  0.5671  0.5790
         0.6708  0.5921  0.6290      0.6997  0.7002  0.6999      0.6202  0.5919  0.6057
         0.6466  0.6335  0.6400      0.7072  0.6902  0.6986      0.6070  0.5729  0.5895
         0.6520  0.6294  0.6405      0.7079  0.6793  0.6933      0.6209  0.5828  0.6012
         0.6750  0.6169  0.6446      0.7110  0.6622  0.6857      0.6175  0.5481  0.5807
Table 1: Results of zero-shot cross-lingual semantic parsing for models trained in English and tested in German, Italian and Dutch.
Footnote 3: Given that we need either word or PoS tag embeddings to initialize the BiLSTM, a Bi/tree model cannot be used when testing with dependency features only (DE).

Model    P       R       F
         0.8825  0.8453  0.8635
         0.8512  0.8154  0.8329
         0.8592  0.8296  0.8441
         0.8670  0.8433  0.8550
         0.8919  0.8584  0.8748
         0.8590  0.8362  0.8474
         0.8503  0.8305  0.8403
         0.8602  0.8369  0.8484
         0.6629  0.6417  0.6521
         0.6550  0.6589  0.6569
         0.6522  0.6591  0.6556
         0.8764  0.8593  0.8678
         0.8569  0.8356  0.8461
         0.8540  0.8396  0.8467
         0.8655  0.8369  0.8510
Table 2: Results for monolingual semantic parsing (i.e. trained and tested in English).

                         German                      Italian                     Dutch
                         P       R       F           P       R       F           P       R       F
operators                0.7158  0.3778  0.4945      0.9302  0.3846  0.5442      0.5833  0.1892  0.2857
non-lexical predicate    0.6507  0.5887  0.6182      0.6625  0.6848  0.6735      0.6468  0.5970  0.6209
  unary                  0.7700  0.6641  0.7131      0.7974  0.7730  0.7850      0.7615  0.6645  0.7097
  binary                 0.5626  0.5281  0.5448      0.5627  0.6117  0.5862      0.5640  0.5433  0.5535
lexical predicate        0.7286  0.7326  0.7306      0.6622  0.7705  0.7123      0.4833  0.6070  0.5381
Table 3: Error analysis.

Table 1 shows the performance of our cross-lingual models in German, Italian and Dutch. We summarize the results as follows:

Dependency features are crucial for zero-shot cross-lingual semantic parsing. Adding dependency features dramatically improves the performance in all three languages, when compared to using multilingual word embeddings and universal PoS embeddings alone. We hypothesize that the quality of the multilingual word embeddings is poor, given that models using embeddings for the dependency relations alone outperform those using the other two features.

TreeLSTMs slightly improve performance only for German. TreeLSTMs do not outperform a baseline BiLSTM for Italian and Dutch and they show little improvement in performance for German. This might be due to different factors that deserve more analysis including the performance of the parsers and syntactic similarity between these languages. When only dependency features are available, we found treeLSTM to boost performance only for Dutch.

BiLSTMs are still state-of-the-art for monolingual semantic parsing for English. Table 2 shows the results for the models trained and tested in English. Dependency features in conjunction with word and PoS embeddings lead to the best performance; however, in all settings explored treeLSTMs do not outperform a BiLSTM.

3.1 Error Analysis

We perform an error analysis to assess the quality of the prediction for operators (i.e. logic operators like “Not” as well as discourse relations like “Contrast”), for non-lexical predicates, such as binary predicates (e.g. Agent(e,x)) and unary predicates (e.g. time(t), entity(x), etc.), and for lexical predicates (e.g. open(e)). Results in Table 3 show that predicting operators and binary predicates across languages is hard, compared to the other two categories. Prediction of lexical predicates is relatively good even though most tokens in the test set were never seen during training; this can be attributed to the copy mechanism, which is able to transfer tokens from the input directly during prediction.

4 Related work

Previous work has explored two main methods for cross-lingual semantic parsing. One method requires parallel corpora to extract alignments between source and target languages using machine translation Padó and Lapata (2005); Damonte and Cohen (2017); Zhang et al. (2018). The other method is to use parameter-shared models in the target language and the source language by leveraging language-independent features such as multilingual word embeddings, Universal POS tags and UD Reddy et al. (2017); Duong et al. (2017); Susanto and Lu (2017); Mulcaire et al. (2018).

For semantic parsing, encoder-decoder models have achieved great success. Amongst these, tree or graph-structured decoders have recently been shown to be state-of-the-art Dong and Lapata (2016, 2018); Liu et al. (2018); Cheng et al. (2017); Yin and Neubig (2017).

5 Conclusions

We go back to the questions in the introduction:

Can we train a semantic parser in a language where annotation is available and test it in another where it is not? In this paper we show that this is indeed possible and we propose a zero-shot cross-lingual semantic parsing method based on language-independent features, where a parser trained in English, where labelled data is available, is used to parse sentences in three languages: Italian, German and Dutch.

What would that require? We show that universal dependency features can dramatically improve the performance of a cross-lingual semantic parser, but modelling the tree structure directly does not outperform sequential BiLSTM architectures, not even when the two are combined.

We are planning to extend this initial survey to other DRS parsers that do not exclude presupposition and sense, as well as to other semantic formalisms (e.g. AMR, MRS) where data sets annotated in languages other than English are available. Finally, we want to understand whether adding bidirectionality to the treeLSTM will help improve the performance of modelling the dependency structure directly.

Acknowledgements

This work was done while Federico Fancellu was a post-doctoral researcher at the University of Edinburgh. The views expressed are his own and do not necessarily represent the views of Samsung Research.

References

  • L. Abzianidze, J. Bjerva, K. Evang, H. Haagsma, R. Van Noord, P. Ludmann, D. Nguyen, and J. Bos (2017) The parallel meaning bank: towards a multilingual corpus of translations annotated with compositional meaning representations. arXiv preprint arXiv:1702.03964. Cited by: §1, §1.
  • L. Banarescu, C. Bonial, S. Cai, M. Georgescu, K. Griffitt, U. Hermjakob, K. Knight, P. Koehn, M. Palmer, and N. Schneider (2013) Abstract meaning representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 178–186. Cited by: §1.
  • J. Cheng, S. Reddy, V. Saraswat, and M. Lapata (2017) Learning structured natural language representations for semantic parsing. arXiv preprint arXiv:1704.08387. Cited by: §4.
  • A. Conneau, G. Lample, M. Ranzato, L. Denoyer, and H. Jégou (2017) Word translation without parallel data. arXiv preprint arXiv:1710.04087. Cited by: §2.3.
  • A. Copestake, D. Flickinger, R. Malouf, S. Riehemann, and I. Sag (1995) Translation using minimal recursion semantics. In Proceedings of the Sixth International Conference on Theoretical and Methodological Issues in Machine Translation, pp. 15–32. Cited by: §1.
  • M. Damonte and S. B. Cohen (2017) Cross-lingual abstract meaning representation parsing. arXiv preprint arXiv:1704.04539. Cited by: §4.
  • M. Damonte and S. B. Cohen (2018) Cross-lingual abstract meaning representation parsing. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1146–1155. Cited by: §1.
  • L. Dong and M. Lapata (2016) Language to logical form with neural attention. arXiv preprint arXiv:1601.01280. Cited by: §1, §4.
  • L. Dong and M. Lapata (2018) Coarse-to-fine decoding for neural semantic parsing. arXiv preprint arXiv:1805.04793. Cited by: §1, §4.
  • L. Duong, H. Afshar, D. Estival, G. Pink, P. Cohen, and M. Johnson (2017) Multilingual semantic parsing and code-switching. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pp. 379–389. Cited by: §4.
  • F. Fancellu, A. Lopez, and B. Webber (2018) Neural networks for cross-lingual negation scope detection. arXiv preprint arXiv:1810.02156. Cited by: §1.
  • R. Jia and P. Liang (2016) Data recombination for neural semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Cited by: §1.
  • H. Kamp and U. Reyle (2013) From discourse to logic: introduction to modeltheoretic semantics of natural language, formal logic and discourse representation theory. Vol. 42, Springer Science & Business Media. Cited by: §1, §1.
  • P. Liang, M. I. Jordan, and D. Klein (2013) Learning dependency-based compositional semantics. Computational Linguistics 39 (2), pp. 389–446. Cited by: §1.
  • J. Liu, S. B. Cohen, and M. Lapata (2018) Discourse representation structure parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1, pp. 429–439. Cited by: A survey of cross-lingual features for zero-shot cross-lingual semantic parsing, §1, §2.1.1, §2.1, §2.5, §4.
  • P. Mulcaire, S. Swayamdipta, and N. Smith (2018) Polyglot semantic role labeling. arXiv preprint arXiv:1805.11598. Cited by: §4.
  • S. Padó and M. Lapata (2005) Cross-linguistic projection of role-semantic information. In Proceedings of the conference on human language technology and empirical methods in natural language processing, pp. 859–866. Cited by: §1, §4.
  • S. Petrov, D. Das, and R. McDonald (2011) A universal part-of-speech tagset. arXiv preprint arXiv:1104.2086. Cited by: §2.3.
  • S. Reddy, O. Täckström, S. Petrov, M. Steedman, and M. Lapata (2017) Universal semantic parsing. arXiv preprint arXiv:1702.03196. Cited by: §4.
  • M. Straka and J. Straková (2017) Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Vancouver, Canada, pp. 88–99. Cited by: §2.3.
  • R. H. Susanto and W. Lu (2017) Neural architectures for multilingual semantic parsing. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Vol. 2, pp. 38–44. Cited by: §4.
  • K. S. Tai, R. Socher, and C. D. Manning (2015) Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075. Cited by: §2.1.1.
  • R. Van Noord, L. Abzianidze, H. Haagsma, and J. Bos (2018) Evaluating scoped meaning representations. arXiv preprint arXiv:1802.08599. Cited by: §2.5.
  • R. van Noord, L. Abzianidze, A. Toral, and J. Bos (2018) Exploring neural methods for parsing discourse representation structures. Transactions of the Association for Computational Linguistics 6, pp. 619–633. Cited by: §2.5.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008. Cited by: §2.1.1.
  • P. Yin and G. Neubig (2017) A syntactic neural model for general-purpose code generation. arXiv preprint arXiv:1704.01696. Cited by: §1, §4.
  • L. S. Zettlemoyer and M. Collins (2012) Learning to map sentences to logical form: structured classification with probabilistic categorial grammars. arXiv preprint arXiv:1207.1420. Cited by: §1.
  • S. Zhang, K. Duh, and B. Van Durme (2018) Cross-lingual semantic parsing. arXiv preprint arXiv:1804.08037. Cited by: §4.