Keyphrase generation (KG), a fundamental task in natural language processing (NLP), refers to generating a set of keyphrases that express the crucial semantic meaning of a document. These keyphrases can be further categorized into present keyphrases, which appear in the document, and absent keyphrases, which do not. Current KG approaches generally adopt an encoder-decoder framework Sutskever et al. (2014) with an attention mechanism Bahdanau et al. (2015); Luong et al. (2015) and a copy mechanism Gu et al. (2016); See et al. (2017) to simultaneously predict present and absent keyphrases Meng et al. (2017); Chen et al. (2018); Chan et al. (2019); Chen et al. (2019b, a); Yuan et al. (2020).
Although the proposed methods for keyphrase generation have shown promising results on present keyphrase prediction, they often generate uncontrollable and inaccurate predictions for the absent ones. The main reason is that there are numerous absent keyphrase candidates that have implicit relationships (e.g., technology hypernyms or task hypernyms) with the concepts in the document. For instance, for a document discussing “LSTM”, technology hypernyms such as “Neural Network”, “RNN” and “Recurrent Neural Network” are all candidate absent keyphrases. When training data are scarce or the model size is limited, it is non-trivial for the model to summarize and memorize all the candidates accurately. Thus, one can expect the generated absent keyphrases to be sub-optimal when the candidate set the model has internalized is relatively small or inaccurate. This problem is crucial because absent keyphrases account for a large proportion of all ground-truth keyphrases: as shown in Figure 1, up to 50% of the keyphrases in some datasets are absent.
To address this problem, we propose a novel graph-based method to capture explicit knowledge from related references. Each reference is a retrieved document-keyphrases pair from a predefined index (e.g., the training set) that is similar to the source document. This is motivated by the fact that related references often contain candidate or even ground-truth absent keyphrases of the source document. Empirically, we find that three retrieved references cover up to 27% of the ground-truth absent keyphrases on average (see Section 4.3 for details).
Our heterogeneous graph is designed to incorporate knowledge from the related references. It contains source document, reference and keyword nodes, and has the following advantages: (a) different reference nodes can interact with the source document through their explicitly shared keyword information, which enriches the semantic representation of the source document; (b) a powerful structural prior is introduced, as the keywords highly overlap with the ground-truth keyphrases. Statistically, collecting the top five keywords from each document on the validation set, we find that these keywords contain 68% of the tokens in the ground-truth keyphrases. On the decoder side, since a portion of the absent keyphrases appear directly in the references, we propose a hierarchical attention and copy mechanism to copy appropriate words from both the source document and its references based on their relevance and significance.
The main contributions of this paper can be summarized as follows: (1) we design a heterogeneous graph network for keyphrase generation, which enriches the source document node through keyword nodes and retrieved reference nodes; (2) we propose a hierarchical attention and copy mechanism to facilitate the decoding process, which can copy appropriate words from both the source document and the retrieved references; and (3) our proposed method outperforms other state-of-the-art methods on multiple benchmarks, and especially excels in absent keyphrase prediction. Our code is publicly available at https://github.com/jiacheng-ye/kg_gater.
In this work, we propose a heterogeneous Graph ATtention network basEd on References (Gater) for keyphrase generation, as shown in Figure 2. Given a source document, we first retrieve related documents from a predefined index (we use the training set as our reference index in our experiments, which can easily be extended to an open corpus) and concatenate each retrieved document with its keyphrases to serve as a reference. Then we construct a heterogeneous graph that contains document nodes (the source document and the references are the two specific kinds of document node) and keyword nodes based on the source document and its references. The graph is updated iteratively to enhance the representation of the source document node. Finally, the source document node is extracted to decode the keyphrase sequence. To facilitate the decoding process, we also introduce a hierarchical attention and copy mechanism, with which the model directly attends to and copies from both the source document and its references. The hierarchical arrangement ensures that more semantically relevant words, and words in more relevant references, are given larger weights for the current decision.
2.1 Reference Retriever
Given a source document, we first use a reference retriever to output several related references from the training set. To make full use of both the retrieved documents and their keyphrases, we denote a reference as the concatenation of the two. We find that a term frequency–inverse document frequency (TF-IDF) based retriever provides a simple but efficient means of accomplishing the retrieval task. Specifically, we first represent the source document and all reference candidates as TF-IDF weighted uni/bi-gram vectors. Then, the most similar references are retrieved by comparing the cosine similarities between the vector of the source document and those of all the references.
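This retrieval step can be sketched in pure Python as below. The function names and the toy index are our own illustration, not code from the paper, and the IDF statistics are computed over the small candidate pool rather than a full corpus for brevity:

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    # uni/bi-gram terms of a whitespace-tokenized document
    toks = text.lower().split()
    grams = list(toks)
    grams += [" ".join(toks[i:i + 2]) for i in range(len(toks) - 1)]
    return grams

def tfidf_vectors(docs):
    # one TF-IDF weighted sparse vector (term -> weight dict) per document
    tfs = [Counter(ngrams(d)) for d in docs]
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())
    n = len(docs)
    return [{t: c * math.log(n / df[t]) for t, c in tf.items()} for tf in tfs]

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(source, index_docs, k=3):
    # rank reference candidates by cosine similarity to the source document
    vecs = tfidf_vectors([source] + index_docs)
    src, cands = vecs[0], vecs[1:]
    ranked = sorted(range(len(cands)), key=lambda i: -cosine(src, cands[i]))
    return ranked[:k]
```

A real implementation would precompute the index vectors once and use an optimized sparse-vector library.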
2.2 Heterogeneous Graph Encoder
2.2.1 Graph Construction
Given the source document and its references, we select the top-$n$ unique words as keywords from the source document and each reference based on their TF-IDF weights. The additional keyword nodes can enrich the semantic representation of the source document through message passing, and introduce prior knowledge for generating keyphrases, since keywords and keyphrases highly overlap. We then build a heterogeneous graph based on the source document, the references, and the keywords.
Formally, our undirected heterogeneous graph can be defined as $G=(V,E)$, where $V=V_w\cup V_d$ and $E=E_{dd}\cup E_{wd}$. Specifically, $V_w=\{w_1,\dots,w_m\}$ denotes the $m$ unique keyword nodes of the source document and the references, $V_d=\{d_0,d_1,\dots,d_k\}$ corresponds to the source document node $d_0$ and the $k$ reference nodes, $e^{dd}_{i}\in E_{dd}$ represents the edge weight between the $i$-th reference and the source document, and $e^{wd}_{ij}\in E_{wd}$ indicates the edge weight between the $i$-th keyword and the $j$-th document.
2.2.2 Graph Initializers
There are two types of nodes in our heterogeneous graph (i.e., document nodes and keyword nodes). For each document node, following previous works Meng et al. (2017); Chen et al. (2019a), an embedding lookup table is first applied to each word, and then a bidirectional Gated Recurrent Unit (GRU) Cho et al. (2014) is used to obtain a context-aware representation of each word. The representations of the document and of each word are defined as the concatenation of the forward and backward hidden states. For each keyword node, since the same keyword may appear in multiple documents, we simply use its word embedding as the initial node representation.
There are two types of edges in our heterogeneous graph (i.e., document-to-document edges $E_{dd}$ and document-to-keyword edges $E_{wd}$). To encode the significance of the relationships between keyword and document nodes, we infuse TF-IDF values into the edge weights of $E_{wd}$. Similarly, we infuse TF-IDF values into the edge weights of $E_{dd}$ as a prior statistical $n$-gram similarity between documents. The two types of floating-point TF-IDF weights are then discretized into integers and mapped to dense vectors using embedding matrices.
2.2.3 Graph Aggregating and Updating
Graph attention networks (GAT) Velickovic et al. (2018) are used to aggregate information for each node. We denote the hidden states of the input nodes as $\mathbf{H}=\{\mathbf{h}_1,\dots,\mathbf{h}_n\}$. With the additional edge feature, the aggregator is defined as follows:

$$z_{ij}=\mathrm{LeakyReLU}\big(\mathbf{w}_a^{\top}[\mathbf{W}_q\mathbf{h}_i;\mathbf{W}_k\mathbf{h}_j;\mathbf{e}_{ij}]\big),\qquad \alpha_{ij}=\frac{\exp(z_{ij})}{\sum_{l\in\mathcal{N}_i}\exp(z_{il})},\qquad \mathbf{u}_i=\sigma\Big(\sum_{j\in\mathcal{N}_i}\alpha_{ij}\mathbf{W}_v\mathbf{h}_j\Big),$$

where $\mathbf{e}_{ij}$ is the embedding of the edge feature, $\alpha_{ij}$ is the attention weight between $\mathbf{h}_i$ and $\mathbf{h}_j$, and $\mathbf{u}_i$ is the aggregated feature. For simplicity, we write $\mathrm{GAT}(\mathbf{H}_q,\mathbf{H}_{kv},\mathbf{E})$ for the GAT aggregating layer, where $\mathbf{H}_q$ is used for the query, $\mathbf{H}_{kv}$ for the key and value, and $\mathbf{E}$ is used as edge features.
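A single aggregation step of such an edge-aware GAT layer can be sketched with NumPy as follows. The parameter shapes, the choice of tanh as the activation, and all names are illustrative assumptions rather than the paper's actual implementation:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_aggregate(H_q, H_kv, E, W_q, W_k, W_v, w_a):
    """One GAT aggregation step with edge features.
    H_q: (n_q, d) query nodes; H_kv: (n_kv, d) key/value nodes;
    E: (n_q, n_kv, d_e) edge-feature embeddings; all nodes are neighbors here."""
    n_q, n_kv = H_q.shape[0], H_kv.shape[0]
    scores = np.empty((n_q, n_kv))
    for i in range(n_q):
        for j in range(n_kv):
            # score from projected query, projected key, and edge embedding
            feat = np.concatenate([W_q @ H_q[i], W_k @ H_kv[j], E[i, j]])
            scores[i, j] = leaky_relu(w_a @ feat)
    # softmax over the neighborhood dimension j
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    # weighted sum of projected values, followed by a nonlinearity
    return np.tanh(alpha @ (H_kv @ W_v.T))  # (n_q, d)
```

In the full model this aggregation is applied per node type (keyword-to-document, document-to-keyword, document-to-document) with the corresponding edge embeddings.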
To update the node states, similar to the approach used in the Transformer Vaswani et al. (2017), with node features $\mathbf{H}$ and edge features $\mathbf{E}$, we update each type of node separately as follows:

$$\mathbf{H}'_w=\mathrm{FFN}\big(\mathrm{GAT}(\mathbf{H}_w,\mathbf{H}_d,\mathbf{E}_{wd})+\mathbf{H}_w\big),$$
$$\mathbf{H}'_d=\mathrm{FFN}\big(\mathrm{GAT}(\mathbf{H}_d,\mathbf{H}'_w,\mathbf{E}_{wd})+\mathbf{H}_d\big),$$
$$\mathbf{H}''_d=\mathrm{FFN}\big(\mathrm{GAT}(\mathbf{H}'_d,\mathbf{H}'_d,\mathbf{E}_{dd})+\mathbf{H}'_d\big),$$

where the word nodes are updated first by aggregating document-level information from the document nodes, the document nodes are then updated from the updated word nodes, and finally the document nodes are updated again from the updated document nodes. The above process is executed iteratively for $T$ steps to obtain better document representations.
After the heterogeneous graph encoder finishes, we separate the document-node states into the representations of the source document and of each reference. We denote $\mathbf{h}^x_i$ as the encoder hidden state of each word in the source document, and $\mathbf{h}^{r_j}_i$ as the encoder hidden state of the $i$-th word of the $j$-th reference. All the features described above will be used in the reference-aware decoder.
2.3 Reference-aware Decoder
After encoding the document into a reference-aware representation, we propose a hierarchical attention and copy mechanism to further incorporate the reference information by attending to and copying words from both the source document and the references.
We use the source document representation as the initial hidden state $\mathbf{s}_0$ of a GRU decoder, and the decoding process at time step $t$ is described as follows:

$$\mathbf{s}_t=\mathrm{GRU}\big(\mathbf{s}_{t-1},[\mathbf{e}(y_{t-1});\mathbf{c}_t]\big),$$

where $\mathbf{c}_t$ is the context vector and the hierarchical attention mechanism is defined as follows:

$$\mathbf{c}_t=\lambda_t\,\mathbf{c}^x_t+(1-\lambda_t)\,\mathbf{c}^r_t,\qquad \mathbf{c}^x_t=\sum_i\alpha^x_{ti}\mathbf{h}^x_i,\qquad \mathbf{c}^r_t=\sum_j\beta_{tj}\sum_i\alpha^{r_j}_{ti}\mathbf{h}^{r_j}_i,$$

where $\alpha^x_t$ is a word-level attention distribution over the words of the source document, $\beta_t$ is an attention distribution over the references, which gives larger weights to more relevant references, $\alpha^{r_j}_t$ is a word-level attention distribution over the words of the $j$-th reference, which can be considered the importance of each word in the $j$-th reference, and $\lambda_t$ is a soft gate that determines the relative importance of the context vectors from the source document and the references. All the attention distributions described above are computed as in Bahdanau et al. (2015).
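The hierarchical attention described above can be sketched as follows. To keep the sketch minimal, we replace the additive attention of Bahdanau et al. (2015) with dot-product scores; the scoring functions and parameter names are our own simplifications:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hierarchical_context(s_t, H_x, H_refs, v_gate):
    """Combine source and reference context vectors.
    s_t: (d,) decoder state; H_x: (n, d) source word states;
    H_refs: list of (m_j, d) per-reference word states; v_gate: (3d,)."""
    # word-level attention over the source document
    a_x = softmax(H_x @ s_t)
    c_x = a_x @ H_x
    # word-level attention inside each reference, plus reference-level scores
    ref_ctx, ref_score = [], []
    for H_r in H_refs:
        a_r = softmax(H_r @ s_t)
        ref_ctx.append(a_r @ H_r)
        ref_score.append(H_r.mean(axis=0) @ s_t)
    beta = softmax(np.array(ref_score))          # weights over references
    c_r = beta @ np.stack(ref_ctx)               # reference context vector
    # soft gate between source context and reference context
    lam = 1.0 / (1.0 + np.exp(-(v_gate @ np.concatenate([s_t, c_x, c_r]))))
    return lam * c_x + (1.0 - lam) * c_r
```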
To alleviate the out-of-vocabulary (OOV) problem, a copy mechanism See et al. (2017) is generally adopted. To further guide the decoding process by copying appropriate words from the references based on their relevance and significance, we propose a hierarchical copy mechanism. Specifically, a dynamic vocabulary is constructed by merging the predefined vocabulary $\mathcal{V}$, the words in the source document, and all the words in the references. Thus, the probability of predicting a word $y_t$ is computed as follows:

$$P(y_t)=\gamma^g_t\,P_{\mathrm{gen}}(y_t)+\gamma^x_t\,P^x_{\mathrm{copy}}(y_t)+\gamma^r_t\,P^r_{\mathrm{copy}}(y_t),$$

where $P_{\mathrm{gen}}$ is the generative probability over the predefined vocabulary $\mathcal{V}$, $P^x_{\mathrm{copy}}$ is the copy probability from the source document, $P^r_{\mathrm{copy}}$ is the copy probability from all the references, and $[\gamma^g_t,\gamma^x_t,\gamma^r_t]$ serves as a soft switcher that determines the preference for selecting the word from the predefined vocabulary, the source document, or the references.
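The three-way soft switcher can be illustrated as below; the dictionary-based distributions and function names are hypothetical stand-ins for the model's learned distributions:

```python
import math

def softmax(xs):
    # numerically stable softmax over a plain list of logits
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def mix_probabilities(p_gen, p_copy_src, p_copy_ref, switch_logits, dyn_vocab):
    """Mix the generative and two copy distributions over a dynamic vocabulary.
    Each p_* maps words to probabilities over its own support; words outside
    a support contribute zero. The switcher weights form a softmax."""
    g = softmax(switch_logits)  # [vocab, source, references] preferences
    out = {}
    for w in dyn_vocab:
        out[w] = (g[0] * p_gen.get(w, 0.0)
                  + g[1] * p_copy_src.get(w, 0.0)
                  + g[2] * p_copy_ref.get(w, 0.0))
    return out
```

Because each component distribution sums to one over its support, the convex mixture also sums to one over the dynamic vocabulary.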
2.4 Training

The proposed Gater model is independent of any specific training method, so we can use either the One2One training paradigm Meng et al. (2017), where the target keyphrase set is split into multiple separate training targets for a source document, or the One2Seq paradigm Yuan et al. (2020), where the target is the concatenation of all the keyphrases in the set, joined by a delimiter.
| CopyRNN Meng et al. (2017) | 0.311 | 0.266 | 0.058 | 0.116 | 0.293 | 0.304 | 0.043 | 0.067 | 0.333 | 0.262 | 0.125 | 0.211 |
| CorrRNN Chen et al. (2018) | 0.318 | 0.278 | 0.059 | – | 0.320 | 0.320 | 0.041 | – | – | – | – | – |
| TG-Net Chen et al. (2019b) | 0.349 | 0.295 | 0.075 | 0.137 | 0.318 | 0.322 | 0.045 | 0.076 | 0.372 | 0.315 | 0.156 | 0.268 |
| KG-KE-KR-M Chen et al. (2019a) | 0.344 | 0.287 | 0.123 | 0.193 | 0.329 | 0.327 | 0.049 | 0.090 | 0.400 | 0.327 | 0.177 | 0.278 |
Table 2: Results of models trained under the One2Seq paradigm. The best results are in bold. The subscripts are the corresponding standard deviations (e.g., $0.285_1$ means $0.285\pm0.001$).
| catSeq Yuan et al. (2020) | 0.323 | 0.397 | 0.016 | 0.028 | 0.242 | 0.283 | 0.020 | 0.028 | 0.291 | 0.367 | 0.015 | 0.032 |
| catSeqD Yuan et al. (2020) | 0.321 | 0.394 | 0.014 | 0.024 | 0.233 | 0.274 | 0.016 | 0.024 | 0.285 | 0.363 | 0.015 | 0.031 |
| catSeqCorr Chan et al. (2019) | 0.319 | 0.390 | 0.014 | 0.024 | 0.246 | 0.290 | 0.018 | 0.026 | 0.289 | 0.365 | 0.015 | 0.032 |
| catSeqTG Chan et al. (2019) | 0.325 | 0.393 | 0.011 | 0.018 | 0.246 | 0.290 | 0.019 | 0.027 | 0.292 | 0.366 | 0.015 | 0.032 |
| SenSeNet Luo et al. (2020) | 0.348 | 0.403 | 0.018 | 0.032 | 0.255 | 0.299 | 0.024 | 0.032 | 0.296 | 0.370 | 0.017 | 0.036 |
3 Experimental Setup
3.1 Datasets

We conduct our experiments on four scientific article datasets: NUS Nguyen and Kan (2007), Krapivin Krapivin et al. (2009), SemEval Kim et al. (2010) and KP20k Meng et al. (2017). Each sample consists of a title, an abstract, and keyphrases assigned by the authors of the paper. Following previous works Meng et al. (2017); Chen et al. (2019b, a); Yuan et al. (2020), we concatenate the title and abstract as the source document. We use the largest dataset (i.e., KP20k) for model training, and the testing sets of all four datasets for evaluation. After preprocessing (i.e., lowercasing, replacing all digits with a special token, and removing duplicated data), the final KP20k dataset contains 509,818 samples for training, 20,000 for validation, and 20,000 for testing. The numbers of test samples in NUS, Krapivin and SemEval are 211, 400 and 100, respectively.
3.2 Baselines

For a comprehensive evaluation, we verify our method under both training paradigms (i.e., One2One and One2Seq) and compare it with the following methods (we do not compare with Chen et al. (2020), since they use a different preprocessing method from the others; see the discussion on GitHub for details):
3.3 Implementation Details
Following previous works Chan et al. (2019); Yuan et al. (2020), when training under the One2Seq paradigm, the target keyphrase sequence is the concatenation of present and absent keyphrases, with the present keyphrases sorted by the order of their first occurrence in the document and the absent keyphrases kept in their original order.
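The target construction described above can be sketched as follows; the delimiter token and helper names are our own illustration (the actual delimiter token may differ):

```python
def build_one2seq_target(doc_tokens, keyphrases, sep="<sep>"):
    """Order keyphrases for One2Seq training: present phrases sorted by first
    occurrence in the document, absent phrases kept in their original order."""
    def first_pos(kp):
        # position of the first exact token-sequence match, or None if absent
        toks = kp.split()
        for i in range(len(doc_tokens) - len(toks) + 1):
            if doc_tokens[i:i + len(toks)] == toks:
                return i
        return None

    present = [(first_pos(kp), kp) for kp in keyphrases if first_pos(kp) is not None]
    absent = [kp for kp in keyphrases if first_pos(kp) is None]
    ordered = [kp for _, kp in sorted(present, key=lambda x: x[0])] + absent
    return f" {sep} ".join(ordered)
```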
We keep all the parameters the same as those reported in Chan et al. (2019); hence, we only report the parameters of the additional graph module. We retrieve 3 references and extract the top 20 keywords from the source document and each reference to construct the graph. We set the number of attention heads to 5 and the number of iterations to 2, based on the validation set. During training, we use a dropout rate of 0.3 for the graph layers, and a batch size of 12 and 64 for the One2Seq and One2One training paradigms, respectively. During testing, we use greedy search for One2Seq, and beam search with a maximum depth of 6 and a beam size of 200 for One2One. We repeat the experiments three times with different random seeds and report the averaged results.
3.4 Evaluation Metrics
For the models trained under the One2One paradigm, as in previous works Meng et al. (2017); Chen et al. (2018, 2019b), we use macro-averaged F1@5 and F1@10 for present keyphrase predictions, and recall@10 and recall@50 for absent keyphrase predictions. For the models trained under the One2Seq paradigm, we follow Chan et al. (2019) and use F1@5 and F1@M for both present and absent keyphrase predictions, where F1@M compares all the keyphrases predicted by the model with the ground-truth keyphrases, which means it considers the number of predictions. We apply the Porter Stemmer before determining whether two keyphrases are identical, and remove all duplicated keyphrases after stemming.
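The F1@M computation can be sketched as below; note that the paper applies the Porter Stemmer before matching, which we stand in with simple lowercasing here (an assumption for brevity):

```python
def f1_at_m(predictions, ground_truth, normalize=str.lower):
    """F1@M: compare ALL deduplicated predictions against the ground truth,
    so the number of predictions directly affects precision."""
    preds, seen = [], set()
    for p in predictions:
        key = normalize(p)
        if key not in seen:          # drop duplicates after normalization
            seen.add(key)
            preds.append(key)
    gold = {normalize(g) for g in ground_truth}
    correct = sum(1 for p in preds if p in gold)
    precision = correct / len(preds) if preds else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```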
4 Results and Analysis
4.1 Present and Absent Keyphrase Predictions
Table 1 and Table 2 show the performance evaluations of the present and absent keyphrases predicted by the models trained under the One2One and One2Seq paradigms, respectively (due to space limitations, the results on the Krapivin dataset can be found in Appendix A). Although previous works Chan et al. (2019); Yuan et al. (2020) have noted that predicting absent keyphrases for a document is an extremely challenging task, the proposed Gater model still outperforms the state-of-the-art baseline models on all the metrics under both training paradigms, which demonstrates the effectiveness of incorporating the knowledge of references. Compared to KG-KE-KR-M, CopyRNN-Gater achieves the same or better results on all the datasets. This suggests that both the retrieved documents and the retrieved keyphrases are useful for predicting absent keyphrases.
For present keyphrase prediction, we find that Gater outperforms most of the baseline methods on both training paradigms, which indicates that the related references also help the model to understand the source document and to predict more accurate present keyphrases.
4.2 Ablation Study
To examine the contribution of each component of Gater, we conduct ablation experiments on the largest dataset, KP20k; the results are presented in Table 3. For the input references, the model's performance degrades if either the retrieved documents or the retrieved keyphrases are removed, which indicates that both are useful for keyphrase prediction. For the heterogeneous graph encoder, the graph becomes a heterogeneous bipartite graph when the document-to-document edges are removed, and a homogeneous graph when the document-to-keyword edges are removed. Both result in degraded performance due to the lack of interaction. Removing both types of edges means that the reference information is used only on the decoder side by the reference-aware decoder, which further degrades the results. For the reference-aware decoder, we find the hierarchical attention and copy mechanism to be essential to the performance of Gater, which indicates the importance of integrating knowledge from references on the decoder side.
| – retrieved documents | 0.293 | 0.377 | 0.026 | 0.052 |
| – retrieved keyphrases | 0.291 | 0.369 | 0.018 | 0.037 |
| Heterogeneous Graph Encoder | | | | |
| – hierarchical copy | 0.293 | 0.373 | 0.022 | 0.042 |
| – hierarchical attention | 0.291 | 0.368 | 0.018 | 0.036 |
4.3 Quality and Influence of References
As our graph is built from the retrieved references, we also investigate their quality and influence. We define the quality of the retrieved references as the transforming rate of absent keyphrases (i.e., the proportion of ground-truth absent keyphrases that appear in the retrieved references). Intuitively, references that contain more absent keyphrases provide more explicit knowledge for generation. As shown in the left part of Figure 3, the simple sparse retriever based on TF-IDF outperforms a random retriever by a large margin in reference quality. We also use a dense retriever, Specter Cohan et al. (2020) (https://github.com/allenai/specter), a BERT-based model pretrained on scientific documents, and find that it further improves the transforming rate of absent keyphrases. The right part of Figure 3 shows the influence of the references: random references degrade the model performance, as they contain a lot of noise. Surprisingly, we obtain a 2.6% performance boost on absent keyphrase prediction by considering only the most similar references with a sparse or dense retriever, and introducing more than three references does not further improve performance. One possible explanation is that although more references lead to a higher transforming rate of absent keyphrases, they also introduce more irrelevant information, which interferes with the judgment of the model.
4.4 Incorporating Baselines with Gater
Our proposed Gater can be considered an extra plugin for incorporating knowledge from references on both the encoder and decoder sides, and it can easily be applied to other models. We investigate the effects of adding Gater to other baseline models in Table 4. We note that Gater enhances the performance of all the baseline models in predicting both present and absent keyphrases. This further demonstrates the effectiveness and portability of the proposed method.
4.5 Case Study
We display a prediction example from the baseline models and CopyRNN-Gater in Figure 4. Our model generates more accurate present and absent keyphrases compared to the baselines. For instance, CopyRNN-Gater successfully predicts the absent keyphrase “porous medium”, which appears in the retrieved documents, while both CopyRNN and KG-KE-KR-M fail. This demonstrates that using both the retrieved documents and keyphrases as references provides more knowledge (e.g., candidates for the ground-truth absent keyphrases) than using keyphrases alone, as in KG-KE-KR-M.
5 Related Work
5.1 Keyphrase Extraction and Generation
Existing approaches to keyphrase prediction can be broadly divided into extraction and generation methods. Early works mostly used a two-step approach for keyphrase extraction: first, a large set of candidate phrases is extracted by hand-crafted rules Mihalcea and Tarau (2004); Medelyan et al. (2009); Liu et al. (2011); then, these candidates are scored and reranked by unsupervised Mihalcea and Tarau (2004); Wan and Xiao (2008) or supervised Hulth (2003); Nguyen and Kan (2007) methods. Other extractive approaches employ neural sequence labeling Zhang et al. (2016); Gollapalli et al. (2017).
Keyphrase generation extends keyphrase extraction by also considering absent keyphrase prediction. Meng et al. (2017) proposed a generative model, CopyRNN, based on the encoder-decoder framework Sutskever et al. (2014). They employed a One2One paradigm that uses a single keyphrase as the target sequence. Since CopyRNN uses beam search to make each prediction independently, it lacks dependency among the generated keyphrases, which results in many duplicates. CorrRNN Chen et al. (2018) introduced a review mechanism that considers the hidden states of the previously generated keyphrases. Ye and Wang (2018) proposed using a separator to concatenate all keyphrases into one sequence during training. With this setup, the seq2seq model can generate all possible keyphrases in a single sequence and capture the contextual information between them. However, it still uses beam search to generate multiple keyphrase sequences with a fixed beam depth, and then performs keyphrase ranking to select the top-k keyphrases as output. Yuan et al. (2020) proposed catSeq with the One2Seq paradigm, adding a special token at the end to terminate the decoding process. They further introduced catSeqD, which maximizes the mutual information between all the keyphrases and the source text and uses orthogonal constraints Bousmalis et al. (2016) to ensure the coverage and diversity of the generated keyphrases. Many works have since been conducted under the One2Seq paradigm Chen et al. (2019a); Chan et al. (2019); Chen et al. (2020); Meng et al. (2021); Luo et al. (2020). Chen et al. (2019a) proposed using the keyphrases of retrieved documents as an external input. However, the keyphrases alone lack semantic information, and the potential knowledge in the retrieved documents is ignored. In contrast, our method makes full use of both retrieved documents and keyphrases as references. Since catSeq tends to generate shorter sequences, Chan et al. (2019) introduced a reinforcement learning approach that encourages the model to generate the correct number of keyphrases with an adaptive reward. More recently, Luo et al. (2021) introduced a two-stage reinforcement learning-based fine-tuning approach with a fine-grained reward score, which also considers the semantic similarities between predictions and targets. Ye et al. (2021) proposed a One2Set paradigm that predicts the keyphrases as a set, eliminating the bias caused by the predefined order in the One2Seq paradigm. Our method can also be integrated into these methods to further improve performance, as shown in Section 4.4.
5.2 Heterogeneous Graph for NLP
Different from homogeneous graphs, which consider only a single type of node or link, heterogeneous graphs can model multiple types of nodes and links Shi et al. (2016). Linmei et al. (2019) constructed a topic-entity heterogeneous neural graph for semi-supervised short text classification. Tu et al. (2019) introduced a heterogeneous graph neural network to encode documents, entities, and candidates together for multi-hop reading comprehension. Wang et al. (2020) presented a heterogeneous graph neural network with word, sentence, and document nodes for extractive summarization. In this paper, we study a keyword-document heterogeneous graph network for keyphrase generation, which has not been explored before.
6 Conclusion

In this paper, we propose a graph-based method that captures explicit knowledge from related references. Our model consists of a heterogeneous graph encoder that models different granularities of relations among the source document and its references, and a hierarchical attention and copy mechanism that guides the decoding process. Extensive experiments demonstrate the effectiveness and portability of our method on both present and absent keyphrase prediction.
Acknowledgments

The authors wish to thank the anonymous reviewers for their helpful comments. This work was partially funded by the China National Key R&D Program (No. 2017YFB1002104), the National Natural Science Foundation of China (No. 61976056, 62076069), and the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103).
References

- Neural machine translation by jointly learning to align and translate. In ICLR 2015.
- Domain separation networks. In NeurIPS 2016, pp. 343–351.
- Neural keyphrase generation via reinforcement learning with adaptive rewards. In ACL 2019, pp. 2163–2174.
- Keyphrase generation with correlation constraints. In EMNLP 2018, pp. 4057–4066.
- An integrated approach for keyphrase generation via exploring the power of retrieval and extraction. In NAACL-HLT 2019, pp. 2846–2856.
- Exclusive hierarchical decoding for deep keyphrase generation. In ACL 2020, pp. 1095–1105.
- Title-guided encoding for keyphrase generation. In AAAI 2019, pp. 6268–6275.
- Learning phrase representations using RNN encoder–decoder for statistical machine translation. In EMNLP 2014, pp. 1724–1734.
- SPECTER: Document-level representation learning using citation-informed transformers. In ACL 2020, pp. 2270–2282.
- Incorporating expert knowledge into keyphrase extraction. In AAAI 2017, pp. 3180–3187.
- Incorporating copying mechanism in sequence-to-sequence learning. In ACL 2016, pp. 1631–1640.
- Improved automatic keyword extraction given more linguistic knowledge. In EMNLP 2003, pp. 216–223.
- SemEval-2010 task 5: Automatic keyphrase extraction from scientific articles. In SemEval 2010, pp. 21–26.
- Large dataset for keyphrases extraction. Technical report, University of Trento, 2009.
- Heterogeneous graph attention networks for semi-supervised short text classification. In EMNLP-IJCNLP 2019, pp. 4821–4830.
- Automatic keyphrase extraction by bridging vocabulary gap. In CoNLL 2011, pp. 135–144.
- SenSeNet: Neural keyphrase generation with document structure. arXiv preprint, 2020.
- Keyphrase generation with fine-grained evaluation-guided reinforcement learning. arXiv preprint arXiv:2104.08799, 2021.
- Effective approaches to attention-based neural machine translation. In EMNLP 2015, pp. 1412–1421.
- Human-competitive tagging using automatic keyphrase extraction. In EMNLP 2009, pp. 1318–1327.
- An empirical study on neural keyphrase generation. In NAACL-HLT 2021, pp. 4985–5007.
- Deep keyphrase generation. In ACL 2017, pp. 582–592.
- TextRank: Bringing order into text. In EMNLP 2004, pp. 404–411.
- Keyphrase extraction in scientific publications. In International Conference on Asian Digital Libraries, 2007.
- Get to the point: Summarization with pointer-generator networks. In ACL 2017, pp. 1073–1083.
- A survey of heterogeneous information network analysis. TKDE, 2016.
- Sequence to sequence learning with neural networks. In NeurIPS 2014, pp. 3104–3112.
- Multi-hop reading comprehension across multiple documents by reasoning over heterogeneous graphs. In ACL 2019, pp. 2704–2713.
- Attention is all you need. In NeurIPS 2017, pp. 5998–6008.
- Graph attention networks. In ICLR 2018.
- Single document keyphrase extraction using neighborhood knowledge. In AAAI 2008.
- Heterogeneous graph neural networks for extractive document summarization. In ACL 2020, pp. 6209–6219.
- Semi-supervised learning for neural keyphrase generation. In EMNLP 2018, pp. 4142–4153.
- One2Set: Generating diverse keyphrases as a set. In ACL-IJCNLP 2021, pp. 4598–4608.
- One size does not fit all: Generating and evaluating variable number of keyphrases. In ACL 2020, pp. 7961–7975.
- Keyphrase extraction using deep recurrent neural networks on Twitter. In EMNLP 2016, pp. 836–845.
Appendix A Results on Krapivin Dataset
| CopyRNN (Meng et al., 2017) | 0.334 | 0.326 | 0.113 | 0.202 |
| CorrRNN (Chen et al., 2018) | 0.358 | 0.330 | 0.108 | – |
| TG-Net (Chen et al., 2019b) | 0.406 | 0.370 | 0.146 | 0.253 |
| KG-KE-KR-M (Chen et al., 2019a) | 0.431 | 0.378 | 0.153 | 0.251 |
| catSeq (Yuan et al., 2020) | 0.269 | 0.354 | 0.018 | 0.036 |
| catSeqD (Yuan et al., 2020) | 0.264 | 0.349 | 0.018 | 0.037 |
| catSeqCorr (Chan et al., 2019) | 0.265 | 0.349 | 0.020 | 0.038 |
| catSeqTG (Chan et al., 2019) | 0.282 | 0.366 | 0.018 | 0.034 |
| SenSeNet (Luo et al., 2020) | 0.279 | 0.354 | 0.024 | 0.046 |