Using Semantic Role Knowledge for Relevance Ranking of Key Phrases in Documents: An Unsupervised Approach

08/09/2019 ∙ by Prateeti Mohapatra, et al. ∙ IBM

In this paper, we investigate the integration of sentence position and semantic role of words in a PageRank system to build a key phrase ranking method. We present the evaluation results of our approach on three datasets of scientific articles. We show that semantic role information, when integrated with a PageRank system, can become a new lexical feature. Our approach achieved an overall improvement on all the datasets over the state-of-the-art baseline approaches.




1 Introduction

Graph-based approaches are currently the state of the art for unsupervised key phrase ranking Hasan and Ng (2014). TextRank Mihalcea and Tarau (2004) and SingleRank Wan and Xiao (2008) build a graph from the document and rank its nodes according to their importance. PositionRank uses word position information to rank key phrases Florescu and Caragea (2017). More recently, Boudin (2018) used a multipartite graph structure to represent documents as tightly connected sets of topic-related candidates. For intra-topic key phrase ranking, they used features such as the position of phrases, where key phrases are promoted if they occur at the beginning of the document.

The above-cited papers make the assumption that words found early in a document are more important than words occurring later. However, this assumption is not always true. Consider the following example from a document:
Qualified planning of a power supply concept is the key to the efficiency of electric power supply.
Here, word position would give higher relevance to qualified planning; intuitively, however, electric power supply should receive the higher relevance score for this sentence.

There can be other scenarios where sentences can be written or expressed in different ways for the same intention or meaning. Take, for example, the following two sentences:
Sent1: Folic Acid can improve anxiety.
Sent2: Anxiety can be controlled by Folic Acid.

For Sent2, using word position, Anxiety gets a higher relevance than Folic Acid, since it appears earlier. However, in the first sentence Folic Acid gets a higher relevance than Anxiety, even though both sentences convey the same meaning.

Hence, the lack of semantic information in current systems may yield incorrect results. In order to identify the relevance of a key phrase in a sentence, one needs to understand the event representations embedded in the text and their lexical semantic relations within the sentence. Motivated by this, we investigate the effect of integrating the sentence position of words and semantic role information for key phrase relevance ranking.

2 Related Work

TextRank Mihalcea and Tarau (2004), SingleRank Wan and Xiao (2008) and TopicRank Liu et al. (2010) algorithms, to name a few, are the state-of-the-art techniques being used for graph-based key phrase extraction. Others have incorporated word position information as node weights to score key phrases  Florescu and Caragea (2017); Danesh et al. (2015). Other approaches have also used word clustering and topic modeling techniques and then extracted key phrases from each topic  Bougouin et al. (2013); Boudin (2018).

None of the above approaches considers the semantic role of a word in a sentence. Our heuristic is that the relevance of key phrases should be based on their corresponding roles in the sentence.

3 Semantic Role Labeling (SRL)

Semantic role labeling Pradhan et al. (2004); Gildea and Jurafsky (2002) is a shallow semantic parsing task describing who did what to whom, where, when, etc. For each predicate in a sentence, SRL identifies constituents which either play a semantic role (agent, patient, instrument, etc.) or act as an adjunct (location, manner, temporal, etc.). Our SRL realization is based on the implementation by Roth and Woodsend (2014).

There are six semantic roles numbered from Arg0 to Arg5, and a secondary agent tag (ArgA) for proto-agents. ArgA is used with respect to a predicate where the verb indicates a secondary agent (proto-agent). For example, in a sentence such as "John walked his dog", the predicate walk refers to the agent dog rather than the primary agent John. This proto-agent role is very rarely seen with other verbs; hence, we give lower priority to ArgA. As per PropBank guidelines, the numbered arguments are ranked inverse to their suffix numbers.

There are eighteen adjunct labels, ranging from Cause and Purpose to Manner and Locative. It is rare for an adjunct to be a key phrase, so the rank given to an adjunct is lower than that of the numbered arguments. We show in Figure 1 the semantic roles with numbered arguments and adjuncts (e.g. A0, A1, A2) for the example mentioned in Section 1. As shown, the semantic roles of Folic Acid and anxiety are the same irrespective of how the sentence is expressed.

Figure 1: Semantic Role Labels for the examples
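
The role ordering described in this section can be illustrated with a small lookup table. The following is a minimal sketch, not the authors' implementation: the numeric rank values, the placement of ArgA between the numbered arguments and the adjuncts, the "ArgM" label prefix for adjuncts, and the names build_role_ranks and role_rank are all assumptions made for illustration. Only the ordering constraints come from the text.

```python
# Numbered PropBank-style roles, in order of decreasing importance.
NUMBERED_ROLES = ["Arg0", "Arg1", "Arg2", "Arg3", "Arg4", "Arg5"]


def build_role_ranks(arga_rank=2, adjunct_rank=1):
    """Map each role label to a rank; a larger rank means a more important role.

    The concrete numbers are hypothetical. They only encode the ordering:
    Arg0 > Arg1 > ... > Arg5 > ArgA > adjuncts (the position of ArgA is assumed).
    """
    ranks = {}
    highest = arga_rank + len(NUMBERED_ROLES)
    for suffix, role in enumerate(NUMBERED_ROLES):
        ranks[role] = highest - suffix  # rank falls as the suffix number grows
    ranks["ArgA"] = arga_rank
    ranks["ADJUNCT"] = adjunct_rank
    return ranks


def role_rank(label, ranks):
    """Look up the rank of a label; any 'ArgM...' label is treated as an adjunct."""
    if label.startswith("ArgM"):
        return ranks["ADJUNCT"]
    return ranks[label]
```

With this table, a word labeled Arg0 in one sentence and Arg1 in a paraphrase keeps a high rank in both, which is the property the Folic Acid example relies on.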

4 SemanticRank

This section describes our method for key phrase extraction and ranking by using position information of key phrases in a sentence, their frequency in the document, and their semantic roles.

4.1 Graph Generation

Given a document, we first perform sentence tokenization and use the Stanford parser to apply a part-of-speech filter, selecting only the nouns and adjectives as candidate words. We then use the Porter stemmer to stem all the words by removing their suffixes.

Similar to previous work, a graph is generated for the document using the filtered words as nodes. An edge between two nodes is formed if they co-occur within a fixed-size window of contiguous words in the document. The edge weight is the number of co-occurrences of the two nodes. As previous work has established that the direction of the edges does not impact performance Mihalcea and Tarau (2004), we construct an undirected graph for simplicity.
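
The graph construction above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the window size of 2 is a placeholder, the window is applied to the already filtered and stemmed token sequence, and the function name build_cooccurrence_graph is hypothetical.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Build an undirected, weighted co-occurrence graph.

    tokens: candidate words (already filtered and stemmed) in document order.
    window: assumed window size; two words are linked when both fall
            inside a span of `window` consecutive tokens.
    Returns {node: {neighbor: co-occurrence count}}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        # Pair the word with the tokens that follow it inside the window.
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other == word:
                continue  # skip self-loops
            # Undirected graph: store the weight in both directions.
            graph[word][other] += 1
            graph[other][word] += 1
    return {node: dict(neighbors) for node, neighbors in graph.items()}
```

For example, `build_cooccurrence_graph(["power", "supply", "concept", "power", "supply"])` links power and supply with weight 2, because the pair is adjacent twice.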

4.2 PageRank

PageRank is used to identify the importance of a node based on the vertex degree Page et al. (1999). The PageRank (PR) of a given node ($v_i$) is calculated recursively using:

$$PR(v_i) = \frac{1-d}{N} + d \sum_{v_j : e_{ji} \in E} \frac{PR(v_j)}{Out(v_j)}$$

where $d$ is the damping factor that ensures the process does not enter an infinite loop, $N$ is the number of nodes in the graph, $e_{ji}$ represents an edge from node $v_j$ to node $v_i$ in the edge set $E$, and $Out(v_j)$ represents the number of (outgoing) edges from the node $v_j$. The PR of each vertex is initialized with the node weights calculated using the formulae defined in Section 4.3. The PageRank algorithm is iterated until the change of scores between two successive iterations is zero or negligible (below a fixed threshold). In our case, we fixed the damping factor and the threshold, and set the number of iterations to 100 in the interest of time.
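
The iteration described above can be sketched as follows. This is a minimal illustration of standard PageRank on an undirected graph, not the authors' code: the default values of d and tol are placeholders rather than the values used in the experiments, and the function name pagerank and its parameters are hypothetical.

```python
def pagerank(graph, initial=None, d=0.85, tol=1e-4, max_iter=100):
    """Iterate PR(v) = (1 - d) / N + d * sum(PR(u) / Out(u)) over neighbors u of v.

    graph:   {node: {neighbor: weight}}, undirected (every edge stored both ways).
    initial: optional {node: starting score}, e.g. the node weights;
             a uniform 1/N start is used when it is omitted.
    d, tol:  damping factor and convergence threshold (placeholder values).
    """
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    scores = {v: (initial[v] if initial else 1.0 / n) for v in nodes}
    for _ in range(max_iter):
        new_scores = {}
        for v in nodes:
            # Each neighbor u shares its score equally among its own edges.
            incoming = sum(scores[u] / len(graph[u]) for u in graph[v])
            new_scores[v] = (1 - d) / n + d * incoming
        delta = max(abs(new_scores[v] - scores[v]) for v in nodes)
        scores = new_scores
        if delta < tol:
            break  # scores changed negligibly between two iterations
    return scores
```

Passing the node weights through `initial` mirrors the initialization described in the text; the sketch ignores edge weights because the description counts edges rather than summing weights.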

4.3 Node Weight Assignment

We first weigh each candidate word with its inverse sentence position in the document. The position scores are then multiplied with the frequency score to get the sentence score of a word. Mathematically, for a document, let $W$ denote the set of candidate words, and $f_{w,s_j}$ be the frequency of $w \in W$ in the $s_j$ sentence position. The sentence position based score of the candidate word $w$ is

$$S_{pos}(w) = \sum_{j} \frac{f_{w,s_j}}{s_j}$$

where $s_j$ is the position of the $j$th sentence where $w$ appears.
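
One reading of the sentence position score described above (the frequency of a word in a sentence, weighted by the inverse of that sentence's position, summed over the sentences containing the word) can be sketched as follows. The summation, the 1-based sentence positions, and the function name sentence_position_scores are assumptions of this sketch.

```python
from collections import Counter


def sentence_position_scores(sentences):
    """Score each candidate word by frequency / sentence position.

    sentences: list of sentences in document order, each a list of
               candidate words. Positions are assumed to start at 1.
    Returns {word: sum over sentences of (frequency in sentence / position)}.
    """
    scores = {}
    for position, words in enumerate(sentences, start=1):
        for word, freq in Counter(words).items():
            scores[word] = scores.get(word, 0.0) + freq / position
    return scores
```

Under this reading, a word that occurs once in the first sentence scores 1.0, while a word that occurs twice in the second sentence also contributes 2 / 2 = 1.0 from that sentence.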

We incorporate the semantic roles of the candidate words by parsing the candidate set of sentences using SRL to get the constituents. The SRL-based score of a word in a sentence is computed from the rank of each Arg role that the word fills, the total number of predicates for the word with respect to the sentence, and the highest predicate rank that is assigned. The SRL scores of a word in each sentence are then combined, using the number of sentences in the document where the word appeared, to get the final SRL score of the candidate word.

The sentence position scores and the SRL-based scores of a word are divided by the number of sentences in the document where the word appeared to get normalized scores. The two scores are then multiplied to get the final relevance score of the word, which is the final score of the node (candidate word) in the graph.

4.4 Candidate Phrase Generation

As in Florescu and Caragea (2017), phrases that match a regular expression pattern are considered as potential key phrases. The length of the phrases is restricted to three, allowing only unigrams, bigrams, and trigrams, to increase precision while keeping recall constant. The overall score of a key phrase is the sum of the PageRank scores of the individual words in the phrase. The overall score of a given phrase $p$ is

$$Score(p) = \sum_{i=1}^{n} PR(w_i)$$

where $n$ is the total number of words in the phrase, and $PR(w_i)$ is the PageRank score of the node for the $i$th word.
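
The phrase scoring step can be illustrated with a short sketch. It assumes the candidate phrases have already been extracted as tuples of words; the function names phrase_score and rank_phrases are hypothetical, and words missing from the PageRank results are scored as 0 for convenience.

```python
def phrase_score(phrase_words, pagerank_scores):
    """Sum the PageRank scores of the individual words in a phrase."""
    return sum(pagerank_scores.get(word, 0.0) for word in phrase_words)


def rank_phrases(candidates, pagerank_scores, max_len=3):
    """Keep unigrams, bigrams and trigrams and sort them by descending score.

    candidates: iterable of phrases, each a tuple of words.
    """
    kept = [p for p in candidates if 1 <= len(p) <= max_len]
    return sorted(kept, key=lambda p: phrase_score(p, pagerank_scores), reverse=True)
```

Because the score is a plain sum, longer phrases tend to outrank their own sub-phrases; the length limit of three bounds that effect.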

5 Evaluation

Here, we evaluate our relevance ranking method based on its key phrase ranking efficacy on various datasets and baselines.

5.1 Data

The experimental datasets for key phrase ranking consisted of titles and abstracts from three datasets of scientific articles: KDD Florescu and Caragea (2017), WWW Florescu and Caragea (2017), and Inspec Hulth (2003). Inspec has both controlled and uncontrolled sets of annotated key phrases; we used both for our analysis. Table 1 summarizes all the datasets, where Ct_Docs is the number of documents in each dataset, Ct_CandPhrases is the number of candidate phrases in each dataset, Avg_CandPhrases is the average number of candidate phrases per document, Ct_KeyPhrases is the number of key phrases in each dataset, Avg_KeyPhrases is the average number of key phrases per document, Ct_Sents is the number of sentences in each dataset, and Avg_Sents is the average number of sentences per document.

Dataset          KDD     WWW     Inspec
Ct_Docs          834     1350    2000
Ct_CandPhrases   53642   80519   61234
Avg_CandPhrases  64.3    59.6    30.6
Ct_KeyPhrases    3093    6405    28222
Avg_KeyPhrases   3.7     4.7     14.1
Ct_Sents         6665    9896    13888
Avg_Sents        7.9     7.3     6.9
Table 1: Datasets used in our Experiments

5.2 Experimental Setups

For each experimental setup, we compute precision (P), recall (R), and F1 measure (F1). Precision is the percentage of extracted key phrases that are correct, recall is the percentage of author-labeled key phrases that are correctly extracted, and F1 is the harmonic mean of precision and recall. We also compute the Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) scores between the ground truth key phrases and the ranked key phrases Boudin (2018); Liu et al. (2010).
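
The measures above follow their standard definitions, which can be sketched as follows. The function names precision_recall_f1 and mean_reciprocal_rank are hypothetical, exact string matching between predicted and gold phrases is an assumption, and MAP is omitted for brevity.

```python
def precision_recall_f1(ranked, gold, k):
    """Precision, recall and F1 for the top-k phrases of one document.

    ranked: predicted key phrases, best first.
    gold:   set of ground truth key phrases.
    """
    top_k = ranked[:k]
    correct = sum(1 for phrase in top_k if phrase in gold)
    precision = correct / len(top_k) if top_k else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mean_reciprocal_rank(ranked_lists, gold_sets):
    """Average, over documents, of 1 / rank of the first correct phrase."""
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_sets):
        for rank, phrase in enumerate(ranked, start=1):
            if phrase in gold:
                total += 1.0 / rank
                break  # only the first correct phrase counts
    return total / len(ranked_lists) if ranked_lists else 0.0
```

These functions return fractions between 0 and 1; multiplying by 100 gives percentages.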

Figure 2: MRR and MAP comparison for SemanticRank and baselines on the three datasets.
Dataset  Method             Top2 (%)          Top4 (%)          Top6 (%)          Top8 (%)
                            P     R     F1    P     R     F1    P     R     F1    P     R     F1
KDD      TextRank           8.1   4.0   5.3   8.3   8.5   8.1   8.1   12.3  9.4   7.6   15.3  9.8
KDD      SingleRank         9.1   4.6   6.0   9.3   9.4   9.0   8.7   13.1  10.1  8.1   16.4  10.6
KDD      TopicRank          9.3   4.8   6.2   9.1   9.3   8.9   8.8   13.4  10.3  8.0   16.2  10.4
KDD      PositionRank       11.1  5.6   7.3   10.8  11.1  10.6  9.8   15.3  11.6  9.2   18.9  12.1
KDD      Multipartite Rank  15.2  7.7   10.0  11.3  11.4  11    8.8   13.3  10.2  7.2   14.6  9.4
KDD      SemanticRank       13.1  6.7   8.6   11.5  11.8  11.3  10.4  15.9  12.2  9.6   19.2  12.8
WWW      TextRank           7.7   3.7   4.8   8.6   7.9   8.0   8.1   12.3  9.8   8.2   15.2  10.2
WWW      SingleRank         9.1   4.2   5.6   9.6   8.9   8.9   9.3   13.0  10.5  8.8   16.3  11.0
WWW      TopicRank          8.8   4.2   5.5   9.6   8.9   8.9   9.5   13.2  10.7  9.0   16.5  11.2
WWW      PositionRank       11.3  5.3   7.0   11.3  10.5  10.5  10.8  14.9  12.1  9.9   18.1  12.3
WWW      Multipartite Rank  22.6  7.9   11.3  17    11.8  13.3  13.6  13.9  13.1  11.1  15    12.2
WWW      SemanticRank       18.7  6.5   9.3   17    11.8  13.3  15.5  16    15    14.1  19.5  15.6
Inspec   TextRank           18.7  3.6   6.0   16.1  5.3   8.0   16.3  5.7   8.5   17.5  9.5   12.3
Inspec   SingleRank         20.1  4.3   7.1   18.5  6.0   9.1   18.2  9.8   12.7  17.0  10.5  13.0
Inspec   TopicRank          25.9  4.4   7.3   22.6  7.4   10.7  20    9.7   12.5  18.3  11.7  13.6
Inspec   PositionRank       36.5  6.2   10.2  32.5  10.6  15.4  29.3  14.1  18.1  26.6  16.8  19.6
Inspec   Multipartite Rank  27.7  4.6   7.7   23.7  7.8   11.2  21    10.2  13.1  19    12.2  14.1
Inspec   SemanticRank       36.5  6.2   10.3  32.5  10.6  15.4  29.4  14.1  18.2  26.9  17.1  19.8
Table 2: Performance Comparison

5.3 Results

We compare the performance of our model against five baselines: TextRank Mihalcea and Tarau (2004), SingleRank Wan and Xiao (2008), TopicRank Liu et al. (2010), PositionRank Florescu and Caragea (2017) and MultipartiteRank Boudin (2018). Table 2 gives the performance of the different methods in terms of P, R and F1 for the top k predicted key phrases. On both the KDD and WWW datasets, for Top6 and Top8, our method improves the F1 score over all baseline methods. For the WWW dataset, the improvement in F1 score over all the baseline methods is statistically significant. For the KDD dataset, the p-values obtained for these two settings indicate that semantic roles influence the ranking of key phrases, but significantly only at a more lenient level of significance. For Top4, our method performs on par with MultipartiteRank on both KDD and WWW. Finally, on the Inspec dataset, our method achieves results comparable with PositionRank.

Figure 2 shows, as a bar chart, the MRR and MAP values of all the methods on the three datasets. As shown in the figure, for both the KDD and WWW datasets, SemanticRank achieved a better MRR score than the baseline methods. For Inspec, SemanticRank's performance is comparable to PositionRank. The MAP scores for TopicRank and Multipartite Rank (MPR) are high, as they filter out and present only representative key phrases for each topic of candidate phrases, whereas the other techniques try to assign scores to each candidate key phrase.

We assigned different ranks to different roles in order of their importance, with 'Arg0' having the maximum rank, 'Arg1' having the next highest rank, and all other roles assigned ranks in decreasing order. We then changed the ranks of these roles to test the sensitivity of the method. In the first setting (SentSRL1), we assigned the same rank to roles 'Arg0' and 'Arg1' and kept the ranks of the other roles as before. In the second setting (SentSRL2), 'Arg0' and 'Arg1' have the same high rank, 'Arg2', 'Arg3' and 'Arg4' have the same low rank, and the rest of the roles are assigned a very small rank. We compared the precision, recall and F1 measure of the different combinations of semantic roles on the KDD, WWW and Storwize datasets. For the different combinations of rank assignment to the semantic roles, the precision, recall, and F1 measures are practically the same for KDD and WWW. This indicates that the key phrase ranking method is rather insensitive to variation in the rank values of the semantic roles.

6 Conclusions and Future Work

This paper introduced a linguistically motivated approach to key phrase relevance ranking based on semantic roles of words. The method considered the candidate phrase sentence position and its semantic roles for relevance ranking of phrases. Experimental results show that using semantic role knowledge can effectively improve the quality of key phrases extracted from documents.

Limitations of our approach include: (1) The SRL implementation that we used for our experiments has a precision of 87% for in-domain documents and 77% for out-of-domain documents. (2) SRL works well for well-formed and grammatically correct sentences, and gives inferior results for texts that are not grammatically correct. In scientific articles, the title holds key information but is not grammatically complete or correct most of the time. In spite of these limitations in the realization of SRL, sentence-based position information along with SRL gave improved results for well-structured datasets. In future work we plan to investigate approaches to integrate semantic roles with the other features, possibly using neural networks to combine them.


References

  • F. Boudin (2018) Unsupervised keyphrase extraction with multipartite graphs. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pp. 667–672.
  • A. Bougouin, F. Boudin, and B. Daille (2013) TopicRank: graph-based topic ranking for keyphrase extraction. In International Joint Conference on Natural Language Processing, pp. 543–551.
  • S. Danesh, T. Sumner, and J. H. Martin (2015) SGRank: combining statistical and graphical methods to improve the state of the art in unsupervised keyphrase extraction. Lexical and Computational Semantics.
  • C. Florescu and C. Caragea (2017) PositionRank: an unsupervised approach to keyphrase extraction from scholarly documents. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1105–1115.
  • D. Gildea and D. Jurafsky (2002) Automatic labeling of semantic roles. Comput. Linguist. 28 (3), pp. 245–288.
  • K. S. Hasan and V. Ng (2014) Automatic keyphrase extraction: a survey of the state of the art. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, Maryland, pp. 1262–1273.
  • A. Hulth (2003) Improved automatic keyword extraction given more linguistic knowledge. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, EMNLP '03, Stroudsburg, PA, USA, pp. 216–223.
  • Z. Liu, W. Huang, Y. Zheng, and M. Sun (2010) Automatic keyphrase extraction via topic decomposition. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP '10, Stroudsburg, PA, USA, pp. 366–376.
  • R. Mihalcea and P. Tarau (2004) TextRank: bringing order into texts. In Proceedings of EMNLP-04 and the 2004 Conference on Empirical Methods in Natural Language Processing.
  • L. Page, S. Brin, R. Motwani, and T. Winograd (1999) The PageRank citation ranking: bringing order to the web. Technical report, Stanford InfoLab.
  • S. S. Pradhan, W. H. Ward, K. Hacioglu, J. H. Martin, and D. Jurafsky (2004) Shallow semantic parsing using support vector machines. In HLT-NAACL, pp. 233–240.
  • M. Roth and K. Woodsend (2014) Composition of word representations improves semantic role labelling. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 407–413.
  • X. Wan and J. Xiao (2008) Single document keyphrase extraction using neighborhood knowledge. In Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 2.