Entity-Augmented Distributional Semantics for Discourse Relations

by Yangfeng Ji, et al.
Georgia Institute of Technology

Discourse relations bind smaller linguistic elements into coherent texts. However, automatically identifying discourse relations is difficult, because it requires understanding the semantics of the linked sentences. A more subtle challenge is that it is not enough to represent the meaning of each sentence of a discourse relation, because the relation may depend on links between lower-level elements, such as entity mentions. Our solution computes distributional meaning representations by composition up the syntactic parse tree. A key difference from previous work on compositional distributional semantics is that we also compute representations for entity mentions, using a novel downward compositional pass. Discourse relations are predicted not only from the distributional representations of the sentences, but also of their coreferent entity mentions. The resulting system obtains substantial improvements over the previous state-of-the-art in predicting implicit discourse relations in the Penn Discourse Treebank.




1 Introduction

The high-level organization of text can be characterized in terms of discourse relations between adjacent spans of text (Knott, 1996; Mann, 1984; Webber et al., 1999). Identifying these relations has been shown to be relevant to tasks such as summarization (Louis et al., 2010; Yoshida et al., 2014), sentiment analysis (Somasundaran et al., 2009), and coherence evaluation (Lin et al., 2011). While the Penn Discourse Treebank (PDTB) now provides a large dataset annotated for discourse relations (Prasad et al., 2008), the automatic identification of implicit discourse relations is a difficult task, with state-of-the-art performance at roughly 40% (Lin et al., 2009).

One reason for this poor performance is that predicting implicit discourse relations is a fundamentally semantic task, and the relevant semantics may be difficult to recover from surface-level features. For example, consider the two sentences in Example (1), where a discourse connective such as “because” would be appropriate to indicate the relationship. Without such a connective, however, there is little surface information to signal the relationship. We address this issue by applying a discriminatively-trained model of compositional distributional semantics to discourse relation classification (Socher et al., 2013; Baroni et al., 2014). The meaning of each sentence is represented as a vector (Turney et al., 2010), which is computed through a series of compositional operations over the parse tree.

Example (1): Bob gave Tina the burger. She was hungry.
Example (2): Bob gave Tina the burger. He was hungry.

We further argue that purely vector-based representations of sentences are insufficiently expressive to capture discourse relations. To see why, consider Example (2), which makes a tiny change to Example (1). After the subject of the second sentence is changed to Bob, the original discourse relation no longer seems to hold. But despite the radical difference in meaning, the distributional representation of the second sentence will be almost unchanged: the syntactic structure remains identical, and the words “he” and “she” have very similar word representations. We address this issue by computing vector representations not only for each sentence, but also for each coreferent entity mention within the sentences. These representations are meant to capture the role played by the entity in the text. We compute entity-role representations using a novel feed-forward compositional model, which combines upward and downward passes through the syntactic structure. Representations for these coreferent mentions are then combined in a classification model, where they help to predict the implicit discourse relation. In combination, our approach achieves a 3% improvement in accuracy over the best previous work (Lin et al., 2009) on second-level discourse relation identification in the PDTB. For more details, please refer to the long version of this paper (Ji & Eisenstein, 2015).

Our model requires a syntactic parse tree, which is produced automatically by the Stanford CoreNLP parser (Klein & Manning, 2003). A reviewer asked whether it might be better to employ a left-to-right recurrent neural network, which would obviate the need for this language-specific resource. While it would clearly be preferable to avoid the use of language-specific resources whenever possible, we think this approach is unlikely to succeed in this case. A key difference between language and other types of data is that language has inherent recursive structure. A rich literature in both linguistics and natural language processing elaborates on the close relationship between (recursively-structured) syntax and semantics. Therefore, we see strong theoretical evidence, as well as practical evidence from the history of natural language processing, that syntactic parse structures are central to capturing the meaning in text.

Regarding the multilingual question, there are now accurate parsers and annotated treebanks for dozens of languages (see http://universaldependencies.github.io/docs/#language-other), and training accurate parsers for “low resource” languages is a hot research topic, with substantial interest from both industry and academia. Languages differ substantially in the importance of word ordering, with English emphasizing word order more than most other languages (Bender, 2013). To our knowledge, it is an open question whether left-to-right recurrent neural networks will successfully extract meaning in languages where word order is more free.

2 Entity-augmented distributional semantics for relation identification

We briefly describe our approach to entity-augmented distributional semantics and to discourse relation identification. We name our relation identification model disco2, since it is a distributional compositional approach to discourse relations.

2.1 Entity-augmented distributional semantics

The entity-augmented distributional semantics uses two passes in the composition procedure: an upward pass computes the distributional representation of each sentence, and a downward pass computes distributional representations of the entities shared between sentences.

Upward pass

Distributional representations for sentences are computed in a feed-forward upward pass: each non-terminal in the binarized syntactic parse tree has a $K$-dimensional distributional representation that is computed from the distributional representations of its children, bottoming out in representations of individual words. We follow the Recursive Neural Network (RNN) model proposed by Socher et al. (2011). Specifically, for a given parent node $i$, we denote the left child as $\ell(i)$ and the right child as $r(i)$. We compose their representations to obtain $u_i = \tanh(U[u_{\ell(i)}; u_{r(i)}])$, where $\tanh(\cdot)$ is the element-wise hyperbolic tangent function and $U \in \mathbb{R}^{K \times 2K}$ is the upward composition matrix. We apply this compositional procedure from the bottom up, ultimately obtaining the sentence-level representation $u_0$.
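
As an illustration only, the upward pass can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the names `compose` and `upward`, the nested-tuple tree encoding, the toy dimension, the random (untrained) parameters, and the bracketing of the example sentences are all hypothetical. Plain lists are used instead of an array library so that the sketch has no dependencies.

```python
import math
import random

K = 4  # toy dimension chosen for readability (an assumption of this sketch)


def compose(M, a, b):
    """Return tanh(M [a; b]) for a K x 2K matrix M and two K-vectors a, b."""
    z = a + b  # list concatenation plays the role of stacking [a; b]
    return [math.tanh(sum(m * x for m, x in zip(row, z))) for row in M]


def upward(tree, U, embed):
    """Compute the upward vector of a binarized tree.

    A tree is either a word (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return embed[tree]  # base case: word representation
    left, right = tree
    return compose(U, upward(left, U, embed), upward(right, U, embed))


# Random, untrained parameters: for illustration only.
rng = random.Random(0)
U = [[rng.uniform(-0.5, 0.5) for _ in range(2 * K)] for _ in range(K)]
embed = {w: [rng.uniform(-1, 1) for _ in range(K)]
         for w in ["she", "he", "was", "hungry"]}

# Hypothetical binarized parses of the second sentences of Examples (1) and (2).
u_she = upward(("she", ("was", "hungry")), U, embed)
u_he = upward(("he", ("was", "hungry")), U, embed)
```

In a trained model the word vectors for “she” and “he” would be close, so the two sentence vectors would be close as well; with the random vectors used here they simply differ.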

Downward pass

As seen in the contrast between Examples (1) and (2), a model that uses a single vector representation for each sentence would find little to distinguish between “she was hungry” and “he was hungry”. It would therefore almost certainly fail to identify the correct discourse relation for at least one of these cases, which requires tracking the roles played by the entities that are coreferent in each pair of sentences. To address this issue, we augment the representation of each sentence with additional vectors, representing the semantics of the role played by each coreferent entity in each sentence. Rather than represent this information in a logical form (which would require robust parsing to a logical representation), we represent it through additional distributional vectors. The role of a constituent $i$ can be viewed as a combination of information from two neighboring nodes in the parse tree: its parent $\rho(i)$ and its sibling $s(i)$. We can make a downward pass, computing the downward vector $d_i$ from the downward vector of the parent $d_{\rho(i)}$ and the upward vector of the sibling $u_{s(i)}$: $d_i = \tanh(V[d_{\rho(i)}; u_{s(i)}])$, where $V \in \mathbb{R}^{K \times 2K}$ is the downward composition matrix. The base case of this recursive procedure occurs at the root of the parse tree, where the downward vector is set equal to the upward representation, $d_0 = u_0$.
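
The downward pass described above can be sketched in the same spirit. Again this is a hedged illustration rather than the authors' code: the `Node` class, the function names, the toy dimension, the random parameters, and the bracketing of “Bob gave Tina the burger” are assumptions made for the example. Each non-root node receives a downward vector built from its parent's downward vector and its sibling's upward vector, and the root's downward vector is copied from its upward vector.

```python
import math
import random

K = 4  # toy dimension (an assumption of this sketch)


def compose(M, a, b):
    """Return tanh(M [a; b]) for a K x 2K matrix M and two K-vectors a, b."""
    z = a + b  # list concatenation plays the role of stacking [a; b]
    return [math.tanh(sum(m * x for m, x in zip(row, z))) for row in M]


class Node:
    """Binarized parse-tree node: a word leaf or an internal (left, right) node."""

    def __init__(self, word=None, left=None, right=None):
        self.word, self.left, self.right = word, left, right
        self.up = None    # upward vector, filled in by upward()
        self.down = None  # downward vector, filled in by downward()


def upward(node, U, embed):
    """Fill in upward vectors bottom-up and return the vector of `node`."""
    if node.word is not None:
        node.up = embed[node.word]
    else:
        node.up = compose(U, upward(node.left, U, embed),
                          upward(node.right, U, embed))
    return node.up


def downward(node, V, parent=None, sibling=None):
    """Fill in downward vectors top-down; upward() must have run first."""
    if parent is None:
        node.down = list(node.up)  # base case: root downward = root upward
    else:
        node.down = compose(V, parent.down, sibling.up)
    if node.word is None:
        downward(node.left, V, node, node.right)
        downward(node.right, V, node, node.left)


# Random, untrained parameters: for illustration only.
rng = random.Random(0)
U = [[rng.uniform(-0.5, 0.5) for _ in range(2 * K)] for _ in range(K)]
V = [[rng.uniform(-0.5, 0.5) for _ in range(2 * K)] for _ in range(K)]
embed = {w: [rng.uniform(-1, 1) for _ in range(K)]
         for w in ["Bob", "gave", "Tina", "the", "burger"]}

# Hypothetical bracketing: (Bob ((gave Tina) (the burger)))
bob, tina = Node("Bob"), Node("Tina")
vp = Node(left=Node(left=Node("gave"), right=tina),
          right=Node(left=Node("the"), right=Node("burger")))
root = Node(left=bob, right=vp)

upward(root, U, embed)
downward(root, V)
```

After both passes, `bob.down` and `tina.down` are distinct vectors even though both mentions sit in the same sentence, because each is computed from a different parent and sibling. That position-dependent context is what lets the model represent the role of an entity rather than only the sentence as a whole.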

2.2 Relation identification model

To predict the discourse relation between a sentence pair $(m, n)$, the decision function is a sum of bilinear products,

$$\psi(y) = (u_0^{(m)})^\top A_y u_0^{(n)} + \sum_{i,j \in \mathcal{A}(m,n)} (d_i^{(m)})^\top B_y d_j^{(n)} + b_y,$$

where the predicted relation is given by $\hat{y} = \arg\max_y \psi(y)$, and $A_y$ and $B_y$ are the classification parameters for relation $y$. A scalar $b_y$ is used as the bias term for relation $y$, and $\mathcal{A}(m,n)$ is the set of coreferent entity mentions shared among the sentence pair $(m, n)$. For the cases where there are no coreferent entity mentions between two sentences, $\mathcal{A}(m,n) = \emptyset$, the classification model considers only the upward vectors at the root. We also use a surface feature vector in the decision function, because we find that this approach outperforms prior work on the classification of implicit discourse relations in the PDTB when combined with a small number of surface features.
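
To make the structure of this decision function concrete, the following sketch scores each relation as one bilinear product over the root upward vectors, plus one bilinear product per coreferent mention pair over downward vectors, plus a bias. It is a toy under stated assumptions: the function names, the parameter names `A`, `B`, and `b`, the two-dimensional vectors, and the hand-picked parameter values are all hypothetical, and the surface-feature contribution is left out for brevity.

```python
def bilinear(x, M, y):
    """Return the bilinear product x^T M y for list-based vectors and matrix."""
    return sum(xi * sum(mij * yj for mij, yj in zip(row, y))
               for xi, row in zip(x, M))


def score(params, u_m, u_n, mention_pairs):
    """Score one relation for a sentence pair.

    params: (A, B, b) for this relation (hypothetical names).
    u_m, u_n: root upward vectors of the two sentences.
    mention_pairs: (d_i, d_j) downward vectors of coreferent mentions;
                   an empty list means only the root vectors are used.
    """
    A, B, b = params
    total = bilinear(u_m, A, u_n) + b
    for d_i, d_j in mention_pairs:
        total += bilinear(d_i, B, d_j)
    return total


def predict(all_params, u_m, u_n, mention_pairs):
    """Return the relation label with the highest score."""
    return max(all_params,
               key=lambda y: score(all_params[y], u_m, u_n, mention_pairs))


# Hand-picked toy parameters for two relation labels (not learned values).
I2 = [[1.0, 0.0], [0.0, 1.0]]
NEG_I2 = [[-1.0, 0.0], [0.0, -1.0]]
all_params = {
    "Cause": (I2, I2, 0.0),
    "Contrast": (NEG_I2, NEG_I2, 0.1),
}

u_m, u_n = [1.0, 0.0], [1.0, 0.0]
no_entities = predict(all_params, u_m, u_n, [])
with_entities = predict(all_params, u_m, u_n, [([0.0, 1.0], [0.0, -1.0])])
```

In this toy setting the two sentences have identical root vectors, so the prediction without entity vectors is "Cause". Adding a single mention pair whose downward vectors point in opposite directions is enough to flip the prediction to "Contrast", which mirrors the motivation given by Examples (1) and (2).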

3 Experiments

Model                      +Entity semantics   +Surface features   Dimension K   Accuracy (%)
Prior work
1. Lin et al. (2009)       -                   Yes                 -             40.2
Our work
2. Surface feature model   -                   Yes                 -             39.69
3. disco2                  No                  No                  50            36.98
4. disco2                  Yes                 No                  50            37.63
5. disco2                  No                  Yes                 50            42.53
6. disco2                  Yes                 Yes                 50            43.56
significantly better than Lin et al. (2009) with
Table 1: Experimental results on multiclass classification of second-level discourse relations. The results of Lin et al. (2009) are shown in line 1; the results for our reimplementation of this system are shown in line 2.

We evaluate our approach on implicit discourse relation identification in the Penn Discourse Treebank (PDTB). PDTB relations may be explicit, meaning that they are signaled by discourse connectives (e.g., because); alternatively, they may be implicit, meaning that the connective is absent. We focus on the more challenging problem of classifying implicit discourse relations. With the aim of building a discourse parser in the future, we follow the experimental setting proposed by Lin et al. (2009), and evaluate our relation identification model on the second-level relation types.

We run the Stanford parser (Klein & Manning, 2003) and the Berkeley coreference system (Durrett & Klein, 2013) to obtain syntactic trees and coreference results, respectively. In the PDTB, each discourse relation is annotated between two argument spans. For an argument span that is not a complete sentence, we identify the syntactic subtrees within the span and construct a right-branching superstructure to unify them into a single tree.
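
One plausible reading of the right-branching superstructure is sketched below. The function name and the nested-tuple tree encoding are assumptions of this example, and the original system may differ in detail: given the subtrees covering a span, in left-to-right order, each subtree becomes the left child of a new node whose right child holds everything to its right.

```python
def right_branching(subtrees):
    """Join subtrees, in left-to-right order, into one right-branching binary tree.

    Each subtree may be a word (str) or an already-built (left, right) pair.
    """
    if not subtrees:
        raise ValueError("an argument span must contain at least one subtree")
    tree = subtrees[-1]
    # Attach the remaining subtrees from right to left.
    for sub in reversed(subtrees[:-1]):
        tree = (sub, tree)
    return tree


# Three hypothetical subtrees covering a non-sentence argument span.
span = right_branching([("the", "burger"), "was", ("very", "good")])
```

The result is a binary tree, so the same upward and downward composition can be applied to it as to a full sentence parse.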

Table 1 presents results for multiclass identification of second-level PDTB relations. As shown in lines 5 and 6, disco2 outperforms the prior state-of-the-art (line 1). The strongest performance is obtained by including the entity distributional semantics, with a 3.4% improvement over the accuracy reported by Lin et al. (2009). The improvement over our reimplementation of this work (line 2) is even greater, which shows how the distributional representation provides additional value over the surface features. The contribution of entity semantics is shown in Table 1 by the accuracy differences between lines 3 and 4, and between lines 5 and 6.
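
The differences discussed in this paragraph can be checked directly from the accuracies in Table 1. The short script below is only a convenience for that arithmetic; the dictionary keys are the table's line numbers, the variable names are ours, and the gains are in percentage points.

```python
# Accuracy (%) by line number, copied from Table 1.
accuracy = {1: 40.2, 2: 39.69, 3: 36.98, 4: 37.63, 5: 42.53, 6: 43.56}


def gain(better, baseline):
    """Accuracy difference in percentage points, rounded to two decimals."""
    return round(accuracy[better] - accuracy[baseline], 2)


over_lin = gain(6, 1)             # full model vs. Lin et al. (2009)
over_reimpl = gain(6, 2)          # full model vs. our surface feature model
entity_no_surface = gain(4, 3)    # effect of entity semantics, no surface features
entity_with_surface = gain(6, 5)  # effect of entity semantics, with surface features
```

The gain over line 1 is 3.36 points, which rounds to the 3.4 quoted above; the gain over the reimplementation in line 2 is 3.87 points, and entity semantics adds 0.65 points without surface features and 1.03 points with them.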

4 Conclusion

Discourse relations are determined by the meaning of their arguments, and progress on discourse parsing therefore requires computing representations of the argument semantics. We present a compositional method for inducing distributed representations not only of discourse arguments, but also of the entities that thread through the discourse. By jointly learning the relation classification weights and the compositional operators, this approach outperforms prior work based on hand-engineered surface features. More discussion and experimental results can be found in a forthcoming journal paper (Ji & Eisenstein, 2015).


  • Baroni et al. (2014) Baroni, Marco, Bernardi, Raffaella, and Zamparelli, Roberto. Frege in space: A program for compositional distributional semantics. Linguistic Issues in Language Technologies, 2014.
  • Bender (2013) Bender, Emily M. Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax, volume 6 of Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, June 2013. doi: 10.2200/s00493ed1v01y201303hlt020. URL http://dx.doi.org/10.2200/s00493ed1v01y201303hlt020.
  • Durrett & Klein (2013) Durrett, Greg and Klein, Dan. Easy Victories and Uphill Battles in Coreference Resolution. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, October 2013. Association for Computational Linguistics.
  • Ji & Eisenstein (2015) Ji, Yangfeng and Eisenstein, Jacob. One vector is not enough: Entity-augmented distributional semantics for discourse relations. Conditionally accepted to Transactions of the Association for Computational Linguistics (TACL), 2015.
  • Klein & Manning (2003) Klein, Dan and Manning, Christopher D. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1, pp. 423–430. Association for Computational Linguistics, 2003.
  • Knott (1996) Knott, Alistair. A data-driven methodology for motivating a set of coherence relations. PhD thesis, The University of Edinburgh, 1996.
  • Lin et al. (2009) Lin, Ziheng, Kan, Min-Yen, and Ng, Hwee Tou. Recognizing implicit discourse relations in the Penn Discourse Treebank. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1, pp. 343–351. Association for Computational Linguistics, 2009.
  • Lin et al. (2011) Lin, Ziheng, Ng, Hwee Tou, and Kan, Min-Yen. Automatically Evaluating Text Coherence Using Discourse Relations. In Proceedings of ACL, pp. 997–1006, Portland, Oregon, USA, June 2011. Association for Computational Linguistics.
  • Louis et al. (2010) Louis, Annie, Joshi, Aravind, and Nenkova, Ani. Discourse indicators for content selection in summarization. In Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 147–156. Association for Computational Linguistics, 2010.
  • Mann (1984) Mann, William. Discourse structures for text generation. In Proceedings of the 10th International Conference on Computational Linguistics and 22nd annual meeting on Association for Computational Linguistics, pp. 367–375. Association for Computational Linguistics, 1984.
  • Prasad et al. (2008) Prasad, Rashmi, Dinesh, Nikhil, Lee, Alan, Miltsakaki, Eleni, Robaldo, Livio, Joshi, Aravind, and Webber, Bonnie. The Penn Discourse Treebank 2.0. In LREC, 2008.
  • Socher et al. (2011) Socher, Richard, Lin, Cliff C, Manning, Chris, and Ng, Andrew Y. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th International Conference on Machine Learning, pp. 129–136, 2011.
  • Socher et al. (2013) Socher, Richard, Perelygin, Alex, Wu, Jean Y, Chuang, Jason, Manning, Christopher D, Ng, Andrew Y, and Potts, Christopher. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of Empirical Methods for Natural Language Processing (EMNLP), 2013.
  • Somasundaran et al. (2009) Somasundaran, Swapna, Namata, Galileo, Wiebe, Janyce, and Getoor, Lise. Supervised and unsupervised methods in employing discourse relations for improving opinion polarity classification. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 170–179. Association for Computational Linguistics, 2009.
  • Turney et al. (2010) Turney, Peter D, Pantel, Patrick, et al. From frequency to meaning: Vector space models of semantics. Journal of artificial intelligence research, 37(1):141–188, 2010.
  • Webber et al. (1999) Webber, Bonnie, Knott, Alistair, Stone, Matthew, and Joshi, Aravind. Discourse relations: A structural and presuppositional account using lexicalised tag. In Proceedings of the Association for Computational Linguistics (ACL), pp. 41–48, 1999.
  • Yoshida et al. (2014) Yoshida, Yasuhisa, Suzuki, Jun, Hirao, Tsutomu, and Nagata, Masaaki. Dependency-based Discourse Parser for Single-Document Summarization. In EMNLP, 2014.