Learning to Summarize Passages: Mining Passage-Summary Pairs from Wikipedia Revision Histories

by Qingyu Zhou, et al.

In this paper, we propose a method for automatically constructing a passage-to-summary dataset by mining the Wikipedia page revision histories. In particular, the method mines the main body passages and the introduction sentences which are added to the pages simultaneously. The constructed dataset contains more than one hundred thousand passage-summary pairs. The quality analysis shows that it is promising that the dataset can be used as a training and validation set for passage summarization. We validate and analyze the performance of various summarization systems on the proposed dataset. The dataset will be available online at https://res.qyzhou.me.




1 Introduction

The area of automatic text summarization has received a lot of attention recently Rush et al. (2015); Cheng and Lapata (2016); Nallapati et al. (2017); See et al. (2017); Zhou et al. (2017); Tan et al. (2017). Many recent summarization models work on one of two types of input: sentence-level summarization Rush et al. (2015); Chopra et al. (2016); Nallapati et al. (2016a); Zhou et al. (2017) and single-document summarization Cheng and Lapata (2016); See et al. (2017); Nallapati et al. (2017). The development of these neural summarization systems requires relatively large datasets Rush et al. (2015); Hermann et al. (2015).

The sentence-level summarization dataset is constructed automatically by pairing the title and the first sentence of a news article Rush et al. (2015). The input sentence and the output title are extracted and cleaned heuristically from Annotated English Gigawords Napoles et al. (2012). The document-level datasets in frequent use are newswire datasets such as CNN, Daily Mail and NY Times, where the summary usually consists of several sentences. However, to the best of our knowledge, no prior work has discussed summarizing a text passage, which has potential uses in long-document summarization, slide highlight generation Wang et al. (2017), language teaching Huang (2015) and so on. The above-mentioned datasets target either sentence or document summarization and ignore the passage granularity.

In this paper, we introduce a new summarization dataset which aims to explore the passage-to-summary granularity of the text summarization task. We make the key observation that, in two temporally adjacent Wikipedia page revisions, a passage in the article body and a sentence in the introduction that are added to the page simultaneously possibly form a passage-summary pair. Based on this assumption, we mine the English Wikipedia history dump to extract possible pairs. By cleaning and filtering the extracted data, we create a new passage-to-summary (Psg2Sum) dataset which contains 100,118 examples. Quality analysis and a comparison with other summarization datasets suggest that Psg2Sum is a promising training and evaluation dataset.

Passage: The collision between trains 608 and 653 happened on kilometer 8.055 at 17:42 (some sources says at 17:44). The speed of the steam train 608 was about 55 km/h, train 653 about 60 km/h. Both drivers tried to slow in the loose , but it was too late.
Summary: A passenger steam train 608 at speed 55 km/h abreast collided with a diesel railcar 653 at speed 60 km/h.
Table 1: A passage-to-summary example in the Psg2Sum dataset. The passage (top) in the article and the sentence (bottom) in the lead section were added to the Wikipedia page simultaneously. The key information in the passage is highlighted.

Our primary contributions are:

  • We propose a scalable, language-agnostic method to create a passage-to-summary dataset from Wikipedia revision history.

  • We fill the granularity gap in summarization datasets by presenting the first open-domain, passage-to-summary corpus.

  • We publicly release the English version of the Psg2Sum dataset, which will be available online at https://res.qyzhou.me/.

  • We validate the performance of various summarization methods on Psg2Sum.

2 The Psg2Sum Dataset

2.1 Dataset Creation

Wikipedia maintains the history of its pages, which contains a list of each page's previous revisions (https://en.wikipedia.org/wiki/Help:Page_history). The page revisions have been exploited for several NLP tasks, such as sentence splitting Botha et al. (2018), sentence compression Yamangil and Nelken (2008) and sentence simplification Woodsend and Lapata (2011); Yatskar et al. (2010).

Most Wikipedia articles have a lead section (https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Lead_section), also known as the lead or introduction; a screenshot is available in Appendix A. It serves as an introduction to the article and a summary of its most important contents. Therefore, we pair the passages in the main body with the sentences in the lead section to construct the Psg2Sum corpus. We make the assumption that, in a page revision, a sentence added to the lead section is possibly the summary of one passage added to the article at the same time. Based on this assumption, we compare two temporally adjacent revisions of a page to extract the additions.

We first extract and clean text by stripping the Wikipedia markup language (wikicode) using mwparserfromhell (https://github.com/earwig/mwparserfromhell). Then the text in the lead section is split into sentences using the sentence splitting algorithm in Moses Koehn et al. (2007); we use a Python implementation (https://github.com/berkmancenter/mediacloud-sentence-splitter). The sentences are then tokenized with the spaCy tokenizer. We compare the two processed page revisions using Python's difflib to extract the added sentences and passages.
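
The revision comparison step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each revision has already been stripped of wikicode and split into a list of sentences, and the function name `added_lines` and the sample revisions are hypothetical.

```python
import difflib

def added_lines(old_lines, new_lines):
    """Return the lines that appear in the new revision but not in the old one."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # Both 'insert' and 'replace' opcodes introduce new text on the new-revision side.
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    return added

# Hypothetical lead sections of two temporally adjacent revisions.
old_lead = ["Foo is a town in Bar."]
new_lead = ["Foo is a town in Bar.", "It was the site of a railway collision."]
print(added_lines(old_lead, new_lead))
```

Applying the same function to the lead section and to the article body of the two revisions yields the added sentences and the added passages, respectively.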

Given all the added sentences in the lead section and the added passages in the article, we use heuristics to mine passage-summary pairs from them. Similar to Rush et al. (2015), we find possible candidates by calculating the unigram overlap to ensure the passage-summary relationship. Specifically, for a candidate passage-summary pair $(p, s)$, we first remove the stopwords from both the sentence and the passage to get $(p', s')$. The candidate score $\mathrm{OL}(p, s)$ is defined as the unigram overlap rate between $s'$ and $p'$.

For each candidate sentence $s$ in the lead section, we choose the passage $p$ with the maximum overlap score $\mathrm{OL}(p, s)$. To filter out misaligned passage-summary pairs, we set a minimum overlap rate threshold $\theta$. Specifically, if $\mathrm{OL}(p, s)$ is less than $\theta$, we discard the candidate pair $(p, s)$.
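
The candidate scoring and filtering can be illustrated with a short sketch. Several details here are assumptions rather than the authors' exact procedure: the overlap rate is normalized by the number of distinct non-stopword unigrams in the summary sentence, unigrams are compared as lowercase sets, and the tiny `STOPWORDS` list is a stand-in for a real stopword list. The function names are hypothetical.

```python
# Hypothetical, deliberately small stopword list used only for illustration.
STOPWORDS = {"a", "an", "the", "of", "at", "in", "on", "with", "and", "was"}

def content_words(tokens):
    """Lowercase the tokens and drop stopwords, returning a set of unigrams."""
    return {t.lower() for t in tokens if t.lower() not in STOPWORDS}

def overlap_rate(passage_tokens, summary_tokens):
    """Share of the summary's content unigrams that also occur in the passage.

    Assumption: the score is normalized by the summary side.
    """
    s = content_words(summary_tokens)
    p = content_words(passage_tokens)
    if not s:
        return 0.0
    return len(s & p) / len(s)

def best_passage(summary_tokens, candidate_passages, threshold=0.6):
    """Return the passage with the highest overlap, or None if it is below the threshold."""
    if not candidate_passages:
        return None
    scored = [(overlap_rate(p, summary_tokens), i)
              for i, p in enumerate(candidate_passages)]
    score, idx = max(scored, key=lambda pair: pair[0])
    return None if score < threshold else candidate_passages[idx]
```

With this normalization, a threshold of 0.6 keeps a pair only if at least 60% of the summary's content words can be found in the chosen passage.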

Dataset           Granularity  Domain  Corpus Size  Avg. Input Length     Avg. Output Length    Reference
                                                    sentences  words      sentences  words      Number
DUC2002 (task 1)  Doc          News    567          27.37      629.64     10.19      215.09     1.96
Gigawords         Sentence     News    3.8m         1          31.35      1          8.23       1
CNN               Doc          News    92,579       33.98      760.50     3.59       45.70      1
Daily Mail        Doc          News    219,506      29.33      653.33     3.86       54.65      1
NY Times          Doc          News    654,759      35.55      800.04     2.44       45.54      1
Psg2Sum           Passage      Open    100,118      4.83       118.26     1          22.20      1
Table 2: A comparison of current summarization datasets and Psg2Sum.

2.2 Quality and Statistics of Psg2Sum

As the heuristic method cannot guarantee that all the pairs are true passage-summary pairs, we manually check the quality of the constructed dataset. We randomly sample 50 examples and label them as follows:

  • Good: The sentence is a summary of the given passage.

  • Unsupported: The sentence is irrelevant to the passage, or some important content of the sentence, such as dates and places, cannot be found in the passage, so the sentence cannot be derived from the passage alone.

Furthermore, we do the same labeling on 50 random examples from the English Gigawords sentence summarization dataset Rush et al. (2015), which was also created automatically and cleaned with heuristics.

Thresh.    Good (%)  Unsup.  Size
0.5        27 (54%)  23      117,026
0.6        33 (66%)  17      100,118
0.7        34 (68%)  16      68,070
Gigawords  28 (56%)  22      3.8m
Table 3: Quality vs. corpus size trade-off for different values of the minimum overlap threshold. The Good and Unsupported counts are taken from a randomly sampled subset of 50 examples. The last row gives the same counts for the English Gigawords dataset.

As shown in Table 3, an overlap threshold of 0.6 is a good trade-off between the Good rate and the corpus size. For the 50 random examples, increasing the threshold from 0.5 to 0.6 leads to a 12-point absolute improvement in the Good rate at the cost of 16,908 filtered examples (from 117,026 down to 100,118). When increasing the threshold from 0.6 to 0.7, we observe only a 2-point improvement in the Good rate, but the corpus size shrinks drastically to 68,070. Since the resulting 66% Good rate also compares favorably with the 56% Good rate of the successful English Gigawords dataset, we choose the threshold value 0.6.

After filtering and cleaning, the final Psg2Sum dataset contains 100,118 passage-summary pairs. We randomly split the dataset into training, validation and testing sets, which have 92,118, 4,000 and 4,000 passage-summary pairs respectively.
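
A random split with these sizes could be produced as in the sketch below. The function name `split_dataset` and the fixed seed are assumptions made for reproducibility of the illustration; the actual split procedure is not specified beyond being random.

```python
import random

def split_dataset(pairs, n_valid=4000, n_test=4000, seed=0):
    """Shuffle a copy of the pairs, cut off validation and test sets, keep the rest for training."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    valid = shuffled[:n_valid]
    test = shuffled[n_valid:n_valid + n_test]
    train = shuffled[n_valid + n_test:]
    return train, valid, test

# Integers stand in for the 100,118 passage-summary pairs.
train, valid, test = split_dataset(range(100118))
print(len(train), len(valid), len(test))  # 92118 4000 4000
```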

2.3 Comparison to Other Datasets

Starting in 2001, NIST organized the DUC summarization tasks Over et al. (2007), which provided high-quality, human-created single-document and multi-document summarization datasets. However, the DUC datasets are too small to train an abstractive summarization system based on artificial neural networks. For example, DUC 2002 task 1 only contains 567 documents, each associated with 1.96 reference summaries on average. Therefore, large-scale datasets are necessary for training neural abstractive summarization systems.

Abstractive sentence summarization has attracted research focus in recent years Rush et al. (2015); Toutanova et al. (2016); Chopra et al. (2016); Nallapati et al. (2016a). Rush et al. (2015) propose constructing a sentence summarization (or headline generation) dataset by pairing the first sentence and the title of a news article. They use the Annotated English Gigawords Napoles et al. (2012) as the article source. As shown in Table 3, although the Gigawords corpus contains some noise, it is still useful as a training and evaluation dataset. Considering that the Good rate of Psg2Sum is about 10 percentage points higher than that of the English Gigawords dataset, it is promising that Psg2Sum can serve the same purpose.

Recently, newswire websites such as CNN, Daily Mail and NY Times have been used as sources for single-document summarization. The NY Times is currently the largest summarization dataset, as shown in Table 2. However, it is biased toward extractive strategies, and limited work has used this dataset for summarization Grusky et al. (2018). CNN and Daily Mail Hermann et al. (2015) have been frequently used in recent document summarization research. These datasets have been used for summarization either as is See et al. (2017) or after pre-processing for entity anonymization Nallapati et al. (2017). Additionally, some systems mix CNN and Daily Mail as training data Nallapati et al. (2017); See et al. (2017); Paulus et al. (2017), whereas others use only Daily Mail articles Cheng and Lapata (2016); Nallapati et al. (2016b). It is therefore difficult to compare systems, since previous works use different versions of the datasets.

Models    Rouge-1                                Rouge-2                                Rouge-L
          R            P            F1           R            P            F1           R            P            F1
s2s       36.31±0.66   35.24±0.71   33.35±0.60   18.08±0.66   17.95±0.68   16.78±0.60   31.57±0.66   30.64±0.70   29.00±0.60
s2s+copy  35.51±0.73   37.38±0.79*  33.78±0.66   18.68±0.67   19.87±0.76*  17.80±0.65   30.84±0.69   32.64±0.75*  29.43±0.65
PGN       36.27±0.76   36.57±0.77   33.99±0.66   19.05±0.72   19.56±0.75   17.99±0.67   31.34±0.73   31.82±0.73   29.49±0.64
LEAD1     42.97±0.77   35.35±0.73   35.97±0.66   22.76±0.72   18.71±0.71   19.01±0.65   36.06±0.73   29.88±0.70   30.29±0.63
TextRank  39.95±0.77   33.56±0.73   33.74±0.64   20.01±0.75   16.95±0.71   16.92±0.65   33.29±0.73   28.13±0.69   28.18±0.62
NN-SE     43.76±0.80*  35.11±0.74   36.21±0.67*  23.19±0.75*  19.14±0.73   19.40±0.68*  36.59±0.74*  29.61±0.73   30.40±0.63*
Table 4: Rouge evaluation results of various summarization models on Psg2Sum. The scores with 95% confidence intervals are given by the official Rouge script. The best result in each column is marked with *.

All the above-mentioned datasets, including both the sentence-level and the document-level summarization datasets, are constructed or labeled from newswire sources, so they are all in the news domain. The proposed Psg2Sum is constructed from the open-domain Wikipedia Chen et al. (2017); Yang et al. (2015). As far as we know, this is the first open-domain text summarization dataset. Table 2 summarizes the key features of existing summarization datasets and Psg2Sum. To the best of our knowledge, Psg2Sum is the first passage-to-summary dataset, and it is of the same order of magnitude as the frequently used CNN and Daily Mail datasets. The average input length of Psg2Sum is 4.83 sentences (118.26 words), compared with 33.98 sentences (760.50 words) for CNN and 29.33 sentences (653.33 words) for the Daily Mail corpus.

3 Experiments

3.1 Models

We evaluate several summarization models on the Psg2Sum dataset; the detailed model configurations can be found in Appendix B:

  • s2s (sequence-to-sequence) is a basic neural text generation model proposed by Sutskever et al. (2014). In this work, we use the RNN-based s2s model with attention mechanism Bahdanau et al. (2015).

  • s2s+copy is an extension of s2s incorporated with copying mechanism Gu et al. (2016); Gulcehre et al. (2016).

  • PGN (Pointer-Generator Network) See et al. (2017) is an extension of s2s with copying and coverage Tu et al. (2016) mechanisms.

  • LEAD1 extracts the first sentence as the summary. The leading sentences baseline is also a strong baseline on newswire datasets such as CNN, Daily Mail and NY Times.

  • TextRank Mihalcea and Tarau (2004) is an unsupervised extractive method. We use the implementation in the Gensim package Řehůřek and Sojka (2010).

  • NN-SE Cheng and Lapata (2016) is an extractive neural model with a hierarchical architecture. It predicts the probability of being extracted for each sentence.
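
The two simplest extractive decisions in the list above can be sketched in a few lines. This is only an illustration: `lead1` and `select_top_sentence` are hypothetical names, and the per-sentence scores are assumed to come from some extractive scorer such as NN-SE rather than being computed here.

```python
def lead1(passage_sentences):
    """LEAD1 baseline: return the first sentence of the passage (empty string if none)."""
    return passage_sentences[0] if passage_sentences else ""

def select_top_sentence(passage_sentences, extraction_scores):
    """Return the sentence with the highest extraction score.

    The scores are assumed to be given, one per sentence, by an extractive model.
    """
    if len(passage_sentences) != len(extraction_scores):
        raise ValueError("need exactly one score per sentence")
    if not passage_sentences:
        return ""
    best = max(range(len(passage_sentences)), key=lambda i: extraction_scores[i])
    return passage_sentences[best]
```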

3.2 Evaluation Metric

We use Rouge (version 1.5.5) (Lin, 2004) as our evaluation metric. Rouge measures the quality of a summary by computing overlapping lexical units, such as unigrams, bigrams, trigrams, and the longest common subsequence (LCS). Following previous works, we report Rouge-1, Rouge-2 and Rouge-L metrics in the experiments.
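
To make the three reported quantities concrete, the sketch below computes recall, precision and F1 for n-gram overlap and for the longest common subsequence on a single reference-candidate pair. It is a simplified illustration and not the official Rouge 1.5.5 script: it does no stemming, handles only single-sentence summaries, and does not compute confidence intervals. All function names are hypothetical.

```python
from collections import Counter

def _prf(match, ref_total, cand_total):
    """Turn a match count into (recall, precision, F1)."""
    recall = match / ref_total if ref_total else 0.0
    precision = match / cand_total if cand_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(reference, candidate, n=1):
    """N-gram overlap with clipped counts."""
    ref, cand = ngrams(reference, n), ngrams(candidate, n)
    match = sum((ref & cand).values())  # multiset intersection clips repeated n-grams
    return _prf(match, sum(ref.values()), sum(cand.values()))

def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l(reference, candidate):
    """LCS-based overlap between one reference and one candidate sentence."""
    return _prf(lcs_length(reference, candidate), len(reference), len(candidate))
```

Recall divides the matches by the reference length and precision by the candidate length, which is why longer extracted sentences tend to raise recall while shorter generated outputs tend to raise precision.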

3.3 Results

We validate various models on the Psg2Sum dataset, including abstractive models (s2s), extractive models (LEAD1, TextRank, NN-SE) and mixed models (s2s+copy, PGN). Table 4 shows the Rouge evaluation results. We observe that extractive methods perform better in terms of Rouge recall. For example, the NN-SE model achieves the best recall among all the baseline models, i.e., 43.76 Rouge-1 recall and 23.19 Rouge-2 recall. Meanwhile, the abstractive models achieve better Rouge precision scores. The s2s+copy model has the best precision in Rouge-1, -2 and -L. Surprisingly, we find that using the coverage mechanism (PGN) leads to a drop in precision but a higher recall (with longer output), although s2s+copy and PGN are statistically indistinguishable in terms of F1 score.

4 Conclusion

In this paper, we present a heuristic approach to automatically constructing a passage-to-summary dataset, Psg2Sum, by mining the Wikipedia page revision histories. The quality analysis shows that it can serve as a training and evaluation corpus even though it contains some noise. Experiments on Psg2Sum show that extractive models tend to select longer sentences and achieve higher recall scores, whereas abstractive and mixed models tend to generate high-precision outputs.

References


  • D. Bahdanau, K. Cho, and Y. Bengio (2015) Neural machine translation by jointly learning to align and translate. In Proceedings of 3rd International Conference for Learning Representations, San Diego. Cited by: §B.1, item s2s.
  • J. A. Botha, M. Faruqui, J. Alex, J. Baldridge, and D. Das (2018) Learning to split and rephrase from wikipedia edit history. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 732–737. Cited by: §2.1.
  • D. Chen, A. Fisch, J. Weston, and A. Bordes (2017) Reading wikipedia to answer open-domain questions. arXiv preprint arXiv:1704.00051. Cited by: §2.3.
  • J. Cheng and M. Lapata (2016) Neural summarization by extracting sentences and words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 484–494. Cited by: §B.5, §1, §2.3, item NN-SE.
  • K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning phrase representations using rnn encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1724–1734. Cited by: §B.1.
  • S. Chopra, M. Auli, and A. M. Rush (2016) Abstractive sentence summarization with attentive recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, pp. 93–98. Cited by: §1, §2.3.
  • M. Grusky, M. Naaman, and Y. Artzi (2018) NEWSROOM: a dataset of 1.3 million summaries with diverse extractive strategies. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, Louisiana, pp. 708–719. Cited by: §2.3.
  • J. Gu, Z. Lu, H. Li, and V. O.K. Li (2016) Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 1631–1640. Cited by: §B.2, item s2s+copy.
  • C. Gulcehre, S. Ahn, R. Nallapati, B. Zhou, and Y. Bengio (2016) Pointing the unknown words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 140–149. Cited by: §B.2, item s2s+copy.
  • K. M. Hermann, T. Kocisky, E. Grefenstette, L. Espeholt, W. Kay, M. Suleyman, and P. Blunsom (2015) Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pp. 1693–1701. Cited by: §1, §2.3.
  • D. Huang (2015) A study on the application of task-based language teaching method in a comprehensive english class in china. Journal of Language Teaching and Research 7 (1), pp. 118–127. Cited by: §1.
  • D. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. In Proceedings of 3rd International Conference for Learning Representations, San Diego. Cited by: §B.1.
  • P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, et al. (2007) Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of the demo and poster sessions, pp. 177–180. Cited by: §2.1.
  • C. Lin (2004) Rouge: a package for automatic evaluation of summaries. In Text summarization branches out: Proceedings of the ACL-04 workshop, Vol. 8. Cited by: §3.2.
  • R. Mihalcea and P. Tarau (2004) Textrank: bringing order into text. In Proceedings of the 2004 conference on empirical methods in natural language processing, Cited by: item TextRank.
  • R. Nallapati, F. Zhai, and B. Zhou (2017) SummaRuNNer: a recurrent neural network based sequence model for extractive summarization of documents.. In AAAI, pp. 3075–3081. Cited by: §1, §2.3.
  • R. Nallapati, B. Zhou, Ç. Gülçehre, and B. Xiang (2016a) Abstractive text summarization using sequence-to-sequence rnns and beyond. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, Cited by: §1, §2.3.
  • R. Nallapati, B. Zhou, and M. Ma (2016b) Classify or select: neural architectures for extractive document summarization. arXiv preprint arXiv:1611.04244. Cited by: §2.3.
  • C. Napoles, M. Gormley, and B. Van Durme (2012) Annotated gigaword. In Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction, AKBC-WEKEX ’12, Stroudsburg, PA, USA, pp. 95–100. Cited by: §1, §2.3.
  • P. Over, H. Dang, and D. Harman (2007) DUC in context. Information Processing & Management 43 (6), pp. 1506–1520. Cited by: §2.3.
  • R. Paulus, C. Xiong, and R. Socher (2017) A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304. Cited by: §2.3.
  • R. Řehůřek and P. Sojka (2010) Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta, pp. 45–50 (English). Cited by: §B.4, item TextRank.
  • A. M. Rush, S. Chopra, and J. Weston (2015) A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 379–389. Cited by: §1, §1, §2.1, §2.2, §2.3.
  • A. See, P. J. Liu, and C. D. Manning (2017) Get to the point: summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, pp. 1073–1083. Cited by: §B.3, §1, §2.3, item PGN.
  • N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014) Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15 (1), pp. 1929–1958. Cited by: §B.1.
  • J. Tan, X. Wan, and J. Xiao (2017) Abstractive document summarization with a graph-based attentional neural model. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1, pp. 1171–1181. Cited by: §1.
  • K. Toutanova, C. Brockett, K. M. Tran, and S. Amershi (2016) A dataset and evaluation metrics for abstractive compression of sentences and short paragraphs. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, pp. 340–350. Cited by: §2.3.
  • Z. Tu, Z. Lu, Y. Liu, X. Liu, and H. Li (2016) Modeling coverage for neural machine translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 76–85. Cited by: item PGN.
  • S. Wang, X. Wan, and S. Du (2017) Phrase-based presentation slides generation for academic papers. In Thirty-First AAAI Conference on Artificial Intelligence. Cited by: §1.
  • K. Woodsend and M. Lapata (2011) Learning to simplify sentences with quasi-synchronous grammar and integer programming. In Proceedings of the conference on empirical methods in natural language processing, pp. 409–420. Cited by: §2.1.
  • E. Yamangil and R. Nelken (2008) Mining wikipedia revision histories for improving sentence compression. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pp. 137–140. Cited by: §2.1.
  • Y. Yang, W. Yih, and C. Meek (2015) Wikiqa: a challenge dataset for open-domain question answering. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2013–2018. Cited by: §2.3.
  • M. Yatskar, B. Pang, C. Danescu-Niculescu-Mizil, and L. Lee (2010) For the sake of simplicity: unsupervised extraction of lexical simplifications from wikipedia. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 365–368. Cited by: §2.1.
  • Q. Zhou, N. Yang, F. Wei, and M. Zhou (2017) Selective encoding for abstractive sentence summarization. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, pp. 1095–1104. Cited by: §1.

Appendix A Lead Section of Wikipedia

Figure 1 shows a screenshot example of the lead section of a Wikipedia article about Wikipedia.

Figure 1: A screenshot example of the lead section of a Wikipedia article about Wikipedia.

Appendix B Model Configurations

B.1 s2s

We use the model architecture introduced in Bahdanau et al. (2015). The encoder and decoder are built with Gated Recurrent Units (GRU) Cho et al. (2014). The encoder is bidirectional, with a 256-dimensional forward GRU and a 256-dimensional backward GRU. The decoder's hidden size is 512. The word vector size of the encoder and decoder is 300. We use dropout Srivastava et al. (2014) with rate 0.5 to prevent model overfitting. During training, we use the Adam Kingma and Ba (2015) optimizer with its default hyper-parameters. The mini-batch size is set to 64. During testing, we use beam search with the beam size set to 5.
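
For reference, the hyper-parameters listed above can be collected in a small configuration object. This is only a convenience sketch: the class name `S2SConfig` and its field names are hypothetical and do not correspond to any released code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class S2SConfig:
    """Hyper-parameters of the s2s baseline as stated in the text."""
    encoder_hidden_per_direction: int = 256  # bidirectional GRU: 256 forward + 256 backward
    decoder_hidden: int = 512
    embedding_size: int = 300
    dropout: float = 0.5
    batch_size: int = 64
    beam_size: int = 5
    optimizer: str = "adam"  # used with its default hyper-parameters

    @property
    def encoder_output_size(self) -> int:
        """Size of the concatenated forward and backward encoder states."""
        return 2 * self.encoder_hidden_per_direction

config = S2SConfig()
print(config.encoder_output_size)  # 512
```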

B.2 s2s+copy

The s2s+copy model is based on s2s and augmented with a copying mechanism Gu et al. (2016); Gulcehre et al. (2016). The training and testing configurations are identical to those of the s2s model.

B.3 PGN

We implement Pointer-Generator Networks (PGN) See et al. (2017) on top of the s2s+copy model by adding the coverage loss function following See et al. (2017). The other configurations are identical to those of the s2s+copy model.

B.4 TextRank

We use the open-source implementation of TextRank in the Gensim Řehůřek and Sojka (2010) toolkit. It refuses to summarize passages with fewer than three sentences. Therefore, we randomly select one sentence as the summary for passages shorter than three sentences.
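
The fallback for short passages can be sketched as a thin wrapper. The `summarizer` argument is a hypothetical callable standing in for the TextRank call, which is not reproduced here; the function name, the default seed and the handling of empty passages are assumptions of this sketch.

```python
import random

def summarize_with_fallback(sentences, summarizer, min_sentences=3, rng=None):
    """Apply `summarizer` to long-enough passages, otherwise pick one sentence at random."""
    rng = rng or random.Random(0)
    if not sentences:
        return ""
    if len(sentences) < min_sentences:
        # Too short for the summarizer: fall back to a random sentence.
        return rng.choice(sentences)
    return summarizer(sentences)
```

Passing the random generator explicitly keeps the fallback reproducible across evaluation runs.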

B.5 NN-SE

We implement the NN-SE model as described in Cheng and Lapata (2016). During testing, we select the sentence with the highest extraction score as the passage summary.

Appendix C Psg2Sum Data Samples

Table 5 shows 5 random examples in the Psg2Sum dataset.

Example 1
PSG A recording of the musical with 19 tracks was issued in the U.S. on Scepter Records in 1971 . It was a reissue of the 1969 Decca UK album , capitalizing on the success of 1970 ’s Jesus Christ Superstar in the U.S. It featured David Daltrey as Joseph , Tim Rice as Pharaoh , Dr. William S. Lloyd Webber on the Hammond organ , Alan Doggett conducting , various solo vocalists and instrumentalists , and the Colet Court choir as the chorus.”Joseph And The Amazing Technicolor Dreamcoat Listing , Scepter Records , SPS-588X , 1971 ” discogs.com , accessed March 17 , 2011Q&A regarding the original Decca and Scepter albums
SUM Joseph and the Amazing Technicolor Dreamcoat is a musical with lyrics by Tim Rice and music by Andrew Lloyd Webber .
Example 2
PSG In 1994 , Bush took a leave of absence from the Rangers to run for Governor of Texas against the popular incumbent , Democrat Ann Richards . On November 8 , 1994 , he defeated Richards , 53 % to 46 % . As Governor , Bush forged a legislative alliance with powerful Texas Lt . Governor Bob Bullock , a longtime Democrat . In 1998 Bush went on to win re - election in a landslide victory with nearly 69 % of the vote , becoming the first Texas governor to be elected for two consecutive four - year terms . During Bush ’s governorship , he undertook significant legislative changes in criminal justice , tort law , and school financing . Bush took a hard line on capital punishment and received much criticism from advocates wanting to abolish the death penalty . Under Bush , Texas ’ incarceration rate was 1014 inmates per 100,000 state population in 1999 , the second highest in the nation , owing mainly to strict enforcement of drug laws . In September 1999 , Bush signed the Texas Futile Care Law . Bush ’s transformative agenda and family pedigree now provided an opportunity to advance his political career to the national level .
SUM Bush was elected 46th Governor of Texas in 1994 and re - elected in 1998 .
Example 3
PSG The group ’s first single , ” Saturday Night Party ( Read My Lips ) ” , was an immediate success , and became an Ibiza anthem during the summer of 1993 . It became their first Top 40 hit in the United Kingdom , peaking at # 29 . After introducing a singer to the group ( Shanie Campbell ) , they released the single ” Do n’t Give Me Your Life ” in 1994 , being an extended remix to the original ” Alex Party ” track . It reached # 2 in both Ireland and the United Kingdom ( their highest charting hit in those countries ) and # 13 in Australia , plus it topped the Club Record category at Music Week ’s 1995 Awards . It was included in many compilation albums all over the world , and remains their most famous release .
SUM Their most famous single to date is ” Do n’t Give Me Your Life ” , a # 2 hit in both Ireland and the United Kingdom in early 1995.
Example 4
PSG Throughout the existence of medieval Livonia there was a constant struggle for superiority in the rule over the lands by the Church , the order , the secular nobles of German descent who ruled the fiefs and the citizens of the Hanseatic town of Riga . Two major civil wars were fought in 1296 - 1330 , 1313 - 1330 , and in 1343 - 1345 the Estonian revolt resulted in the annexation of the Danish Duchy of Estonia within the Teutonic Ordensstaat .
SUM Throughout the existence of medieval Livonia there was a constant struggle over the supremacy of ruling the lands by the Church , the Order , the secular German nobility and the citizens of the Hanseatic towns of Riga and Reval .
Example 5
PSG Along with Matsumoto Castle and Kumamoto Castle , Himeji Castle is considered one of Japan ’s three premier castles . It is the most visited castle in Japan , receiving over 820,000 visitors annually . Starting in April 2010 , Himeji Castle underwent restoration work to preserve the castle buildings , and reopened to the public on 27 March 2015 .
SUM In order to preserve the castle buildings , it underwent restoration work for several years and reopened to the public on March 27 , 2015 .
Table 5: 5 random examples from the Psg2Sum dataset.