Text Summarization Techniques: A Brief Survey

07/07/2017 ∙ by Mehdi Allahyari, et al. ∙ University of Georgia

In recent years, there has been an explosion in the amount of text data from a variety of sources. This volume of text is an invaluable source of information and knowledge which needs to be effectively summarized to be useful. In this review, the main approaches to automatic text summarization are described. We review the different processes for summarization and describe the effectiveness and shortcomings of the different methods.

1. Introduction

With the dramatic growth of the Internet, people are overwhelmed by the tremendous amount of online information and documents. This expanding availability of documents has demanded exhaustive research in the area of automatic text summarization. Radev et al. (Radev et al., 2002) define a summary as “a text that is produced from one or more texts, that conveys important information in the original text(s), and that is no longer than half of the original text(s) and usually, significantly less than that”.

Automatic text summarization is the task of producing a concise and fluent summary while preserving key information content and overall meaning. In recent years, numerous approaches have been developed for automatic text summarization and applied widely in various domains. For example, search engines generate snippets as previews of the documents (Turpin et al., 2007). Other examples include news websites, which produce condensed descriptions of news topics, usually as headlines, to facilitate browsing, and knowledge extraction approaches (Allahyari et al., 2017; Savova et al., 2010; Trippe et al., 2017).

Automatic text summarization is very challenging, because when we as humans summarize a piece of text, we usually read it entirely to develop our understanding, and then write a summary highlighting its main points. Because computers lack human knowledge and language capability, automatic text summarization is a difficult and non-trivial task.

Automatic text summarization gained attention as early as the 1950s. An important early work was (Luhn, 1958), which addressed the summarization of scientific documents. Luhn (Luhn, 1958) introduced a method to extract salient sentences from the text using features such as word and phrase frequency. He proposed weighting the sentences of a document as a function of high-frequency words, ignoring very high-frequency common words. Edmundson (Edmundson, 1969) described a paradigm based on key phrases which, in addition to standard frequency-dependent weights, used the following three methods to determine the sentence weight:

  1. Cue Method: The relevance of a sentence is calculated based on the presence or absence of certain cue words in the cue dictionary.

  2. Title Method: The weight of a sentence is computed as the sum of all the content words appearing in the title and headings of a text.

  3. Location Method: This method assumes that sentences appearing at the beginning of the document, as well as at the beginning of individual paragraphs, have a higher probability of being relevant.
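The three methods above can be combined into a single sentence weight. The following is only a minimal sketch under stated assumptions: the function name `edmundson_score`, the linear combination, the default weights, and the whitespace tokenization are all hypothetical choices made for illustration, not details taken from (Edmundson, 1969). The location term is simplified to reward only the first sentence of the document.

```python
def edmundson_score(sentence, position, title_words, cue_words,
                    w_cue=1.0, w_title=1.0, w_loc=1.0):
    """Hypothetical linear combination of cue, title and location evidence."""
    tokens = sentence.lower().split()
    # Cue method: count tokens found in the cue dictionary.
    cue = sum(1 for t in tokens if t in cue_words)
    # Title method: count tokens that also appear in the title or headings.
    title = sum(1 for t in tokens if t in title_words)
    # Location method (simplified): only the first sentence gets credit.
    loc = 1.0 if position == 0 else 0.0
    return w_cue * cue + w_title * title + w_loc * loc
```

In practice the three weights would be tuned on data; equal weights are used here only to keep the sketch short.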

Since then, many works have been published to address the problem of automatic text summarization (see (Gupta and Lehal, 2010; Erkan and Radev, 2004) for more information about more advanced techniques up to the 2000s).

In general, there are two different approaches for automatic summarization: extraction and abstraction. Extractive summarization methods work by identifying important sections of the text and reproducing them verbatim; thus, they depend only on extraction of sentences from the original text. In contrast, abstractive summarization methods aim at producing important material in a new way. In other words, they interpret and examine the text using advanced natural language techniques in order to generate a new shorter text that conveys the most critical information from the original text. Even though summaries created by humans are usually not extractive, most of the summarization research today has focused on extractive summarization. Purely extractive summaries often give better results than automatic abstractive summaries (Erkan and Radev, 2004). This is because abstractive summarization methods must cope with problems such as semantic representation, inference and natural language generation, which are relatively harder than data-driven approaches such as sentence extraction. As a matter of fact, there is no completely abstractive summarization system today. Existing abstractive summarizers often rely on an extractive preprocessing component to produce the abstract of the text (Knight and Marcu, 2000; Berg-Kirkpatrick et al., 2011).

Consequently, in this paper we focus on extractive summarization methods and provide an overview of some of the most dominant approaches in this category. There are a number of papers that provide extensive overviews of text summarization techniques and systems (Spärck Jones, 2007; Lloret and Palomar, 2012; Nenkova and McKeown, 2012; Saggion and Poibeau, 2013).

2. Extractive Summarization

As mentioned before, extractive summarization techniques produce summaries by choosing a subset of the sentences in the original text. These summaries contain the most important sentences of the input. Input can be a single document or multiple documents.

In order to better understand how summarization systems work, we describe three fairly independent tasks which all summarizers perform (Nenkova and McKeown, 2012): 1) Construct an intermediate representation of the input text which expresses the main aspects of the text. 2) Score the sentences based on the representation. 3) Select a summary comprising a number of sentences.

2.1. Intermediate Representation

Every summarization system creates some intermediate representation of the text it intends to summarize and finds salient content based on this representation. There are two types of approaches based on the representation: topic representation and indicator representation. Topic representation approaches transform the text into an intermediate representation and interpret the topic(s) discussed in the text. Topic representation-based summarization techniques differ in terms of their complexity and representation model, and are divided into frequency-driven approaches, topic word approaches, latent semantic analysis and Bayesian topic models (Nenkova and McKeown, 2012). We elaborate on topic representation approaches in the following sections. Indicator representation approaches describe every sentence as a list of features (indicators) of importance such as sentence length, position in the document, presence of certain phrases, etc.

2.2. Sentence Score

When the intermediate representation is generated, we assign an importance score to each sentence. In topic representation approaches, the score of a sentence represents how well the sentence explains some of the most important topics of the text. In most of the indicator representation methods, the score is computed by aggregating the evidence from different indicators. Machine learning techniques are often used to find indicator weights.

2.3. Summary Sentences Selection

Eventually, the summarizer system selects the top $k$ most important sentences to produce a summary. Some approaches use greedy algorithms to select the important sentences, while others convert the selection of sentences into an optimization problem in which a collection of sentences is chosen so as to maximize overall importance and coherence and minimize redundancy. There are other factors that should be taken into consideration while selecting the important sentences. For example, the context in which the summary is created may be helpful in deciding the importance. The type of the document (e.g. news article, email, scientific paper) is another factor which may affect the selection of sentences.
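As an illustration of the greedy strategy, the sketch below picks sentences in decreasing score order and skips any sentence that overlaps too much with one already chosen. The function name `greedy_select`, the Jaccard word-overlap test and the `max_overlap` threshold are assumptions made for this example; real systems use a variety of redundancy measures.

```python
def greedy_select(sentences, scores, k, max_overlap=0.5):
    """Greedy selection sketch. sentences: list of token lists; scores: one float each.

    Returns the indices of up to k chosen sentences, in document order.
    """
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in order:
        if len(chosen) == k:
            break
        words = set(sentences[i])
        if not words:
            continue  # skip empty sentences
        # Hypothetical redundancy test: Jaccard overlap with any chosen sentence.
        redundant = any(
            len(words & set(sentences[j])) / len(words | set(sentences[j])) > max_overlap
            for j in chosen
        )
        if not redundant:
            chosen.append(i)
    return sorted(chosen)
```

The optimization-based alternatives mentioned above replace this one-pass loop with a global objective over the whole candidate summary.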

3. Topic Representation Approaches

In this section we describe some of the most widely used topic representation approaches.

3.1. Topic Words

The topic words technique is one of the common topic representation approaches, which aims to identify words that describe the topic of the input document. (Luhn, 1958) was one of the earliest works that leveraged this method, using frequency thresholds to locate the descriptive words in the document and represent the topic of the document. A more advanced version of Luhn’s idea was presented in (Dunning, 1993), which used the log-likelihood ratio test to identify explanatory words; in the summarization literature these are called the “topic signature”. Utilizing topic signature words as topic representation was very effective and increased the accuracy of multi-document summarization in the news domain (Harabagiu and Lacatusu, 2005). For more information about the log-likelihood ratio test, see (Nenkova and McKeown, 2012).

There are two ways to compute the importance of a sentence: as a function of the number of topic signatures it contains, or as the proportion of the topic signatures in the sentence. Both sentence scoring functions relate to the same topic representation; however, they might assign different scores to sentences. The first method may assign higher scores to longer sentences, because they have more words. The second approach measures the density of the topic words.
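The difference between the two scoring functions is easy to see on a small example. The sketch below assumes the topic signature words have already been identified; the function name `topic_signature_scores` and the token-list input format are illustrative choices.

```python
def topic_signature_scores(sentence_tokens, topic_words):
    """Return (count score, density score) for one tokenized sentence."""
    hits = sum(1 for t in sentence_tokens if t in topic_words)
    # First method: number of topic signature words (favours long sentences).
    count_score = hits
    # Second method: proportion of topic signature words in the sentence.
    density_score = hits / len(sentence_tokens) if sentence_tokens else 0.0
    return count_score, density_score


topic = {"storm", "flood"}
long_sentence = ["the", "storm", "caused", "a", "flood", "in", "the", "valley"]
short_sentence = ["storm", "warning"]
print(topic_signature_scores(long_sentence, topic))   # higher count, lower density
print(topic_signature_scores(short_sentence, topic))  # lower count, higher density
```

The longer sentence wins under the count score while the shorter one wins under the density score, which is the disagreement described above.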

3.2. Frequency-driven Approaches

When assigning weights of words in topic representations, we can think of binary (0 or 1) or real-value (continuous) weights and decide which words are more correlated to the topic. The two most common techniques in this category are: word probability and TFIDF (Term Frequency Inverse Document Frequency).

3.2.1. Word Probability

The simplest method to use frequency of words as indicators of importance is word probability. The probability of a word $w$ is determined as the number of occurrences of the word, $f(w)$, divided by the number of all words $N$ in the input (which can be a single document or multiple documents):

$$P(w) = \frac{f(w)}{N} \qquad (1)$$

Vanderwende et al. (Vanderwende et al., 2007) proposed the SumBasic system which uses only the word probability approach to determine sentence importance. For each sentence, $S_j$, in the input, it assigns a weight equal to the average probability of the words in the sentence:

$$g(S_j) = \frac{\sum_{w_i \in S_j} P(w_i)}{|\{w_i \mid w_i \in S_j\}|} \qquad (2)$$

where $g(S_j)$ is the weight of sentence $S_j$.

In the next step, it picks the best scoring sentence that contains the highest probability word. This step ensures that the highest probability word, which represents the topic of the document at that point, is included in the summary. Then for each word $w_i$ in the chosen sentence, the weight is updated:

$$p_{new}(w_i) = p_{old}(w_i) \cdot p_{old}(w_i) \qquad (3)$$

This word weight update reflects the fact that a word that already appears in the summary is less likely to be needed again than a word that has not yet been included. The selection steps repeat until the desired summary length is reached. The sentence selection approach used by SumBasic is based on the greedy strategy. Yih et al. (Yih et al., 2007) used an optimization approach (as sentence selection strategy) to maximize the occurrence of the important words globally over the entire summary. (Alguliev et al., 2011) is another example of using an optimization approach.
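The selection loop described above can be sketched in a few lines. This is an illustrative reading of the procedure, not the SumBasic implementation: the function name `sumbasic`, the token-list input, the summary length measured in sentences, and the choice to average over all tokens of a sentence are assumptions, and the update step is taken to be squaring each word's probability.

```python
from collections import Counter


def sumbasic(sentences, max_sentences):
    """Greedy SumBasic-style selection sketch.

    sentences: list of token lists. Returns chosen indices in selection order.
    """
    words = [w for s in sentences for w in s]
    total = len(words)
    prob = {w: c / total for w, c in Counter(words).items()}
    chosen = []
    while len(chosen) < max_sentences:
        remaining = [i for i in range(len(sentences))
                     if i not in chosen and sentences[i]]
        if not remaining:
            break
        # Highest-probability word among the sentences not yet selected.
        best_word = max((w for i in remaining for w in sentences[i]),
                        key=lambda w: prob[w])
        candidates = [i for i in remaining if best_word in sentences[i]]
        # Best sentence by average word probability among those containing it.
        best = max(candidates,
                   key=lambda i: sum(prob[w] for w in sentences[i]) / len(sentences[i]))
        chosen.append(best)
        # Down-weight the words that are now covered by the summary.
        for w in set(sentences[best]):
            prob[w] = prob[w] ** 2
    return chosen
```

Because the probabilities of covered words shrink after every pick, later iterations are steered toward sentences that introduce words not yet in the summary.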

3.2.2. TFIDF

Word probability techniques depend on a stop word list to exclude common words from the summary, and deciding which words to put in the stop list is not straightforward, so there is a need for more advanced techniques. One of the more advanced and very typical methods to give weight to words is TFIDF (Term Frequency Inverse Document Frequency). This weighting technique assesses the importance of words and identifies very common words (that should be omitted from consideration) in the document(s) by giving low weights to words appearing in most documents. The weight of each word $w$ in document $d$ is computed as follows:

$$q(w) = f_d(w) \cdot \log \frac{|D|}{f_D(w)} \qquad (4)$$

where $f_d(w)$ is the term frequency of word $w$ in the document $d$, $f_D(w)$ is the number of documents that contain word $w$ and $|D|$ is the number of documents in the collection $D$. For more information about TFIDF and other term weighting schemes, see (Salton and Buckley, 1988). TFIDF weights are easy and fast to compute and are also good measures for determining the importance of sentences; therefore many existing summarizers (Erkan and Radev, 2004; Alguliev et al., 2011; Alguliev et al., 2013) have utilized this technique (or some form of it).
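A minimal sketch of this weighting follows, assuming the common form in which the weight is the raw term frequency in the document multiplied by the logarithm of the collection size over the document frequency. The function name `tfidf_weights` and the token-list input are illustrative; many variants (normalized term frequency, smoothed logarithms) exist.

```python
import math
from collections import Counter


def tfidf_weights(documents):
    """documents: list of token lists. Returns one {word: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(w for doc in documents for w in set(doc))
    return [
        {w: tf * math.log(n_docs / doc_freq[w]) for w, tf in Counter(doc).items()}
        for doc in documents
    ]


weights = tfidf_weights([["the", "cat", "cat"], ["the", "dog"]])
print(weights[0])  # "the" occurs in every document, so its weight is 0.0
```

A word that appears in every document receives weight zero, which is how the scheme suppresses very common words without an explicit stop list.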

Centroid-based summarization, another set of techniques which has become a common baseline, is based on TFIDF topic representation. This kind of method ranks sentences by computing their salience using a set of features. A complete overview of the centroid-based approach is available in (Radev et al., 2004) but we outline briefly the basic idea.

The first step is topic detection, in which documents that describe the same topic are clustered together. To achieve this goal, TFIDF vector representations of the documents are created and those words whose TFIDF scores are below a threshold are removed. Then, a clustering algorithm is run over the TFIDF vectors, consecutively adding documents to clusters and recomputing the centroids according to:

$$c_j = \frac{\sum_{d \in C_j} d}{|C_j|} \qquad (5)$$

where $c_j$ is the centroid of the $j$th cluster and $C_j$ is the set of documents that belong to that cluster. Centroids can be considered as pseudo-documents that consist of those words whose TFIDF scores are higher than the threshold and form the cluster.
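The centroid computation amounts to averaging the TFIDF vectors of a cluster and keeping only the entries above the threshold. The sketch below assumes each document is given as a `{word: tfidf}` dictionary; the names `centroid` and `centroid_value` are hypothetical, and scoring a sentence by summing the centroid weights of its words is one simple way to measure closeness to the centroid, used here as an assumption.

```python
def centroid(cluster, threshold=0.0):
    """cluster: list of {word: tfidf} dicts. Returns the thresholded mean vector."""
    vocab = set().union(*cluster)
    mean = {w: sum(d.get(w, 0.0) for d in cluster) / len(cluster) for w in vocab}
    # Keep only words whose average TFIDF exceeds the threshold.
    return {w: v for w, v in mean.items() if v > threshold}


def centroid_value(sentence_tokens, centroid_vec):
    """Assumed sentence score: sum of the centroid weights of the sentence's words."""
    return sum(centroid_vec.get(t, 0.0) for t in sentence_tokens)


cluster = [{"flood": 2.0, "rain": 1.0}, {"flood": 4.0, "river": 0.5}]
c = centroid(cluster, threshold=0.3)
print(c)
print(centroid_value(["flood", "rain", "river"], c))
```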

The second step is using centroids to identify sentences in each cluster that are central to the topic of the entire cluster. To accomplish this goal, two metrics are defined (Radev et al., 2000): cluster-based relative utility (CBRU) and cross-sentence informational subsumption (CSIS). CBRU decides how relevant a particular sentence is to the general topic of the entire cluster and CSIS measures redundancy among sentences. In order to approximate these two metrics, three features (i.e. central value, positional value and first-sentence overlap) are used. Next, the final score of each sentence is computed and the selection of sentences is determined. For another related work, see (Wan and Yang, 2008).

3.3. Latent Semantic Analysis

Latent semantic analysis (LSA), introduced by (Deerwester et al., 1990), is an unsupervised method for extracting a representation of text semantics based on observed words. Gong and Liu (Gong and Liu, 2001) initially proposed a method using LSA to select highly ranked sentences for single and multi-document summarization in the news domain. The LSA method first builds a term-sentence matrix $A$ ($n$ by $m$ matrix), where each row corresponds to a word from the input ($n$ words) and each column corresponds to a sentence ($m$ sentences). Each entry $a_{ij}$ of the matrix is the weight of the word $i$ in sentence $j$. The weights of the words are computed by the TFIDF technique, and if a sentence does not contain a word, the weight of that word in the sentence is zero. Then singular value decomposition (SVD) is applied to the matrix and transforms the matrix $A$ into three matrices: $A = U \Sigma V^T$.

Matrix $U$ ($n \times m$) represents a term-topic matrix having weights of words. Matrix $\Sigma$ is a diagonal matrix ($m \times m$) where each row $i$ corresponds to the weight of a topic $i$. Matrix $V^T$ is the topic-sentence matrix. The matrix $D = \Sigma V^T$ describes how much a sentence represents a topic; thus, $d_{ij}$ shows the weight of the topic $i$ in sentence $j$.

Gong and Liu’s method was to choose one sentence per topic; therefore, they retained as many topics as the desired summary length in sentences. This strategy has a drawback, because a topic may need more than one sentence to convey its information. Consequently, alternative solutions were proposed to improve the performance of LSA-based techniques for summarization. One enhancement was to leverage the weight of each topic to decide the relative size of the summary that should cover the topic, which gives the flexibility of having a variable number of sentences. Another advancement is described in (Steinberger et al., 2007). Steinberger et al. (Steinberger et al., 2007) introduced an LSA-based method which achieves significantly better performance than the original work. They realized that the sentences that discuss some of the important topics are good candidates for summaries; thus, in order to locate those sentences, they defined the weight of the sentence as follows:

Let $g$ be the “weight” function, then

$$g(s_j) = \sqrt{\sum_{i=1}^{m} d_{ij}^2} \qquad (6)$$

where $d_{ij}$ is the weight of topic $i$ in sentence $j$.
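Given the output of an SVD, this sentence weight is the length of each sentence's column in the singular-value-scaled topic-sentence matrix. The sketch below assumes the decomposition has already been computed elsewhere: `sigma` (the singular values) and `vt` (the topic-by-sentence matrix) are hypothetical inputs with made-up values, and the function name `sentence_weights` is illustrative.

```python
import math


def sentence_weights(sigma, vt):
    """sigma: list of singular values; vt: topic-by-sentence matrix (list of rows).

    Returns one weight per sentence: the Euclidean norm of its column in
    the matrix whose entry (i, j) is sigma[i] * vt[i][j].
    """
    n_sentences = len(vt[0]) if vt else 0
    return [
        math.sqrt(sum((sigma[i] * vt[i][j]) ** 2 for i in range(len(sigma))))
        for j in range(n_sentences)
    ]


# Two topics, two sentences; the values are illustrative only.
w = sentence_weights([2.0, 1.0], [[0.6, 0.8], [0.8, -0.6]])
print(w)
```

Because each topic's contribution is scaled by its singular value, a sentence that loads heavily on a strong topic outweighs one that loads on a weak topic.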

For other variations of LSA technique, see (Hachey et al., 2006; Ozsoy et al., 2010).

3.4. Bayesian Topic Models

Many of the existing multi-document summarization methods have two limitations (Wang et al., 2009): 1) They consider the sentences as independent of each other, so topics embedded in the documents are disregarded. 2) Sentence scores computed by most existing approaches typically do not have very clear probabilistic interpretations, and many of the sentence scores are calculated using heuristics.

Bayesian topic models are probabilistic models that uncover and represent the topics of documents. They are quite powerful and appealing, because they represent the information (i.e. topics) that is lost in other approaches. Their advantage in describing and representing topics in detail enables the development of summarizer systems which can determine the similarities and differences between documents to be used in summarization (Mani and Bloedorn, 1999).

Apart from enhancement of topic and document representation, topic models often utilize a distinct measure for scoring the sentence called Kullback-Leibler (KL) divergence. The KL divergence is a measure of difference (divergence) between two probability distributions $P$ and $Q$ (Kullback and Leibler, 1951). In summarization, where we have probabilities of words, the KL divergence of $Q$ from $P$ over the words $w$ is defined as:

$$D_{KL}(P \| Q) = \sum_{w} P(w) \log \frac{P(w)}{Q(w)} \qquad (7)$$

where $P(w)$ and $Q(w)$ are the probabilities of $w$ in $P$ and $Q$.

KL divergence is an interesting method for scoring sentences in summarization, because it captures the intuition that good summaries are similar to the input documents. It describes how the importance of words changes in the summary in comparison with the input, i.e. the KL divergence between a good summary and the input will be low.
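The following sketch compares two candidate summaries against an input by KL divergence, taking $P$ to be the word distribution of the input and $Q$ that of the summary. The helper names `word_distribution` and `kl_divergence` and the additive smoothing constant are assumptions introduced here; smoothing is needed so that words missing from a summary do not produce a division by zero.

```python
import math
from collections import Counter


def word_distribution(tokens, vocab, smoothing=1e-3):
    """Smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}


def kl_divergence(p, q):
    """KL divergence of q from p; both are dicts over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


document = "storm flood storm rain flood storm".split()
good = "storm flood rain".split()
bad = "election election rain".split()
vocab = set(document) | set(good) | set(bad)

p = word_distribution(document, vocab)
print(kl_divergence(p, word_distribution(good, vocab)))  # small
print(kl_divergence(p, word_distribution(bad, vocab)))   # much larger
```

The summary that reuses the input's frequent words has the lower divergence, matching the intuition stated above.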

Probabilistic topic models have gained dramatic attention in recent years in various domains (Na et al., 2014; Chua and Asur, 2013; Ren et al., 2013; Hannon et al., 2011; Allahyari and Kochut, 2015, 2016c, 2016b, 2016a). The latent Dirichlet allocation (LDA) model is the state-of-the-art unsupervised technique for extracting thematic information (topics) from a collection of documents. A complete review of LDA can be found in (Blei et al., 2003; Steyvers and Griffiths, 2007), but the main idea is that documents are represented as a random mixture of latent topics, where each topic is a probability distribution over words.

LDA has been extensively used for multi-document summarization recently. For example, Daumé and Marcu (Daumé III and Marcu, 2006) proposed BayeSum, a Bayesian summarization model for query-focused summarization. Wang et al. (Wang et al., 2009) introduced a Bayesian sentence-based topic model for summarization which used both term-document and term-sentence associations. Their system achieved significant performance and outperformed many other summarization methods. Celikyilmaz and Hakkani-Tur (Celikyilmaz and Hakkani-Tur, 2010) describe multi-document summarization as a prediction problem based on a two-phase hybrid model. In the first phase, they propose a hierarchical topic model to discover the topic structures of all sentences, and then compute the similarities of candidate sentences with human-provided summaries using a novel tree-based sentence scoring function. In the second phase, they make use of these scores to train a regression model according to the lexical and structural characteristics of the sentences, and employ the model to score sentences of new (unseen) documents to form a summary.

4. Knowledge Bases and Automatic Summarization

The goal of automatic text summarization is to create summaries that are similar to human-created summaries. However, in many cases, the soundness and readability of created summaries are not satisfactory, because the summaries do not cover all the semantically relevant aspects of data in an effective way. This is because many of the existing text summarization techniques do not consider the semantics of words. A step towards building more accurate summarization systems is to combine summarization techniques with knowledge bases (semantic-based or ontology-based summarizers).

The advent of human-generated knowledge bases and various ontologies in many different domains (e.g. Wikipedia, YAGO, DBpedia, etc.) has opened further possibilities in text summarization and has received increasing attention recently. For example, Hennig et al. (Hennig et al., 2008) present an approach to sentence extraction that maps sentences to concepts of an ontology. By considering the ontology features, they can improve the semantic representation of sentences, which is beneficial in the selection of sentences for summaries. They experimentally showed that ontology-based extraction of sentences outperforms baseline summarizers. Chen and Verma (Chen and Verma, 2006) introduce a user query-based text summarizer that uses the UMLS medical ontology to make a summary for medical text. Baralis et al. (Baralis et al., 2013) propose a YAGO-based summarizer that leverages the YAGO ontology (Suchanek et al., 2007) to identify key concepts in the documents. The concepts are evaluated and then used to select the most representative document sentences. Sankarasubramaniam et al. (Sankarasubramaniam et al., 2014) introduce an approach that employs Wikipedia in conjunction with a graph-based ranking technique. First, they create a bipartite sentence-concept graph, and then use an iterative ranking algorithm for selecting summary sentences.

5. The Impact of Context in Summarization

Summarization systems often have additional evidence they can utilize in order to specify the most important topics of document(s). For example, when summarizing blogs, there are discussions or comments coming after the blog post that are good sources of information to determine which parts of the blog are critical and interesting. In scientific paper summarization, there is a considerable amount of information such as cited papers and conference information which can be leveraged to identify important sentences in the original paper. In the following, we describe some of these contexts in more detail.

5.1. Web Summarization

Web pages contain many elements which cannot be summarized, such as pictures. The textual information they have is often scarce, which limits the applicability of text summarization techniques. Nonetheless, we can consider the context of a web page, i.e. pieces of information extracted from the content of all the pages linking to it, as additional material to improve summarization. The earliest research in this regard is (Amitay and Paris, 2000), where they query web search engines and fetch the pages having links to the specified web page. Then they analyze the candidate pages and heuristically select the best sentences containing links to the web page. Delort et al. (Delort et al., 2003) extended and improved this approach by using an algorithm that tries to select a sentence about the same topic that covers as many aspects of the web page as possible. For blog summarization, (Hu et al., 2007) proposed a method that first derives representative words from comments and then selects important sentences from the blog post containing representative words. For more related works, see (Sharifi et al., 2013, 2010; Hu et al., 2008).

5.2. Scientific Articles Summarization

A useful approach when summarizing a scientific paper (i.e. citation-based summarization) is to find other papers that cite the target paper and extract the sentences in which the references take place, in order to identify the important aspects of the target paper. Mei and Zhai (Mei and Zhai, 2008) propose a language model that gives a probability to each word in the citation context sentences. They then score the importance of sentences in the original paper using the KL divergence method (i.e. finding the similarity between a sentence and the language model). For more information, see (Abu-Jbara and Radev, 2011; Qazvinian and Radev, 2008).

5.3. Email Summarization

Email has some distinct characteristics that combine aspects of both spoken conversation and written text. For example, summarization techniques must consider the interactive nature of the dialog, as in spoken conversations. Nenkova and Bagga (Nenkova and Bagga, 2004) presented early research in this regard, proposing a method to generate a summary for the first two levels of the thread discussion. A thread consists of one or more conversations between two or more participants over time. They select a message from the root message and from each response to the root, considering the overlap with the root context. Rambow et al. (Rambow et al., 2004) used a machine learning technique and included features related to the thread as well as features of the email structure, such as position of the sentence in the thread, number of recipients, etc. Newman and Blitzer (Newman and Blitzer, 2003) describe a system to summarize a full mailbox rather than a single thread by clustering messages into topical groups and then extracting summaries for each cluster.

6. Indicator Representation Approaches

Indicator representation approaches aim to model the representation of the text based on a set of features and use them to directly rank the sentences rather than representing the topics of the input text. Graph-based methods and machine learning techniques are often employed to determine the important sentences to be included in the summary.

6.1. Graph Methods for Summarization

Graph methods, which are influenced by the PageRank algorithm (Mihalcea and Tarau, 2004), represent the documents as a connected graph. Sentences form the vertices of the graph and edges between the sentences indicate how similar the two sentences are. A common technique employed to connect two vertices is to measure the similarity of two sentences and, if it is greater than a threshold, connect them. The most often used method for similarity measure is cosine similarity with TFIDF weights for words.

This graph representation results in two outcomes. First, the partitions (sub-graphs) included in the graph correspond to discrete topics covered in the documents. The second outcome is the identification of the important sentences in the document. Sentences that are connected to many other sentences in the partition are possibly the center of the graph and more likely to be included in the summary.
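A small sketch makes the graph construction concrete. It connects two sentences when their cosine similarity exceeds a threshold and then counts each sentence's connections as a simple importance signal. Several simplifications are assumed: raw term counts stand in for TFIDF weights, degree counting stands in for an iterative PageRank-style ranking, and the names `cosine` and `degree_centrality` and the threshold value are illustrative.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def degree_centrality(sentences, threshold=0.1):
    """sentences: list of token lists. Returns the number of edges per sentence."""
    vectors = [Counter(s) for s in sentences]  # raw counts, a simplification of TFIDF
    degree = [0] * len(sentences)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) > threshold:
                degree[i] += 1
                degree[j] += 1
    return degree


sentences = [
    ["storm", "hits", "coast"],
    ["storm", "floods", "coast"],
    ["floods", "town", "river"],
    ["markets", "rally"],
]
print(degree_centrality(sentences))
```

The second sentence is linked to both of its neighbours and so has the highest degree, while the unrelated last sentence forms its own partition with no edges.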

Graph-based methods can be used for single as well as multi-document summarization (Erkan and Radev, 2004). Since they do not need language-specific linguistic processing other than sentence and word boundary detection, they can also be applied to various languages (Mihalcea and Tarau, 2005). Nonetheless, using the TFIDF weighting scheme for similarity measure has limitations, because it only preserves frequency of words and does not take the syntactic and semantic information into account. Thus, similarity measures based on syntactic and semantic information enhance the performance of the summarization system (Chali and Joty, 2008). For more graph-based approaches, see (Nenkova and McKeown, 2012).

6.2. Machine Learning for Summarization

Machine learning approaches model summarization as a classification problem. (Kupiec et al., 1995) is an early research attempt at applying machine learning techniques for summarization. Kupiec et al. develop a classification function, a naive Bayes classifier, to classify the sentences as summary sentences and non-summary sentences based on the features they have, given a training set of documents and their extractive summaries. The classification probabilities are learned statistically from the training data using Bayes’ rule:

$$P(s \in S \mid F_1, F_2, \ldots, F_k) = \frac{P(F_1, F_2, \ldots, F_k \mid s \in S) \, P(s \in S)}{P(F_1, F_2, \ldots, F_k)} \qquad (8)$$

where $s$ is a sentence from the document collection, $F_1, F_2, \ldots, F_k$ are features used in classification and $S$ is the summary to be generated. Assuming conditional independence between the features:

$$P(s \in S \mid F_1, F_2, \ldots, F_k) = \frac{\prod_{i=1}^{k} P(F_i \mid s \in S) \, P(s \in S)}{\prod_{i=1}^{k} P(F_i)} \qquad (9)$$

The probability that a sentence belongs to the summary is the score of the sentence. The selected classifier plays the role of a sentence scoring function. Some of the frequent features used in summarization include the position of sentences in the document, sentence length, presence of uppercase words, similarity of the sentence to the document title, etc. Machine learning approaches have been widely used in summarization by (Zhou and Hovy, 2003; Wong et al., 2008; Ouyang et al., 2011), to name a few.
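Under the conditional independence assumption, the sentence score is the prior probability of being a summary sentence multiplied by one likelihood ratio per feature. The sketch below shows only this scoring step; the function name `naive_bayes_score`, the feature names, and all probability values are hypothetical stand-ins for quantities that would be estimated from a training set of documents and their extractive summaries.

```python
def naive_bayes_score(features, prior, likelihood, evidence):
    """Score one sentence under the naive Bayes independence assumption.

    features:   {feature name: observed value} for the sentence
    prior:      P(sentence is in the summary)
    likelihood: likelihood[name][value] = P(F = value | sentence in summary)
    evidence:   evidence[name][value]   = P(F = value)
    """
    score = prior
    for name, value in features.items():
        score *= likelihood[name][value] / evidence[name][value]
    return score


# Illustrative probabilities only; real values come from training data.
prior = 0.2
likelihood = {
    "first_in_paragraph": {True: 0.6, False: 0.4},
    "has_cue_phrase": {True: 0.5, False: 0.5},
}
evidence = {
    "first_in_paragraph": {True: 0.3, False: 0.7},
    "has_cue_phrase": {True: 0.25, False: 0.75},
}
strong = naive_bayes_score({"first_in_paragraph": True, "has_cue_phrase": True},
                           prior, likelihood, evidence)
weak = naive_bayes_score({"first_in_paragraph": False, "has_cue_phrase": False},
                         prior, likelihood, evidence)
print(strong, weak)
```

Sentences are then ranked by this score, and the top-ranked ones form the summary.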

Naive Bayes, decision trees, support vector machines, Hidden Markov models and Conditional Random Fields are among the most common machine learning techniques used for summarization. One fundamental difference between classifiers is whether the sentences to be included in the summary are decided independently of one another. It turns out that methods explicitly modeling the dependency between sentences, such as Hidden Markov models (Conroy and O’leary, 2001) and Conditional Random Fields (Shen et al., 2007), often outperform other techniques.

One of the primary issues in utilizing supervised learning methods for summarization is that they need a set of training documents (labeled data) to train the classifier, which may not be always easily available. Researchers have proposed some alternatives to cope with this issue:

  • Annotated corpora creation: Creating annotated corpora for summarization greatly benefits researchers, because more public benchmarks become available, which makes it easier to compare different summarization approaches. It also lowers the risk of overfitting to limited data. Ulrich et al. (Ulrich et al., 2008) introduce a publicly available annotated email corpus and its creation process. However, creating an annotated corpus is very time consuming and, more critically, there is no standard agreement on choosing the sentences, and different people may select different sentences to construct the summary.

  • Semi-supervised approaches: Using a semi-supervised technique to train a classifier. In semi-supervised learning we utilize the unlabeled data in training. There is usually a small amount of labeled data along with a large amount of unlabeled data. For a complete overview of semi-supervised learning, see (Chapelle et al., 2006). Wong et al. (Wong et al., 2008) proposed a semi-supervised method for extractive summarization. They co-trained two classifiers iteratively to exploit unlabeled data. In each iteration, the unlabeled training examples (sentences) with top scores are included in the labeled training set, and the two classifiers are trained on the new training data.

Machine learning methods have been shown to be very effective and successful in single and multi-document summarization, specifically in class-specific summarization where classifiers are trained to locate particular types of information, such as scientific paper summarization (Teufel and Moens, 2002; Qazvinian et al., 2014; Qazvinian and Radev, 2008) and biographical summaries (Soares et al., 2011; Zhou et al., 2005; Schiffman et al., 2001).

7. Evaluation

Evaluation of a summary is a difficult task because there is no ideal summary for a document or a collection of documents, and the definition of a good summary is to a large extent an open question (Saggion and Poibeau, 2013). It has been found that human summarizers have low agreement when evaluating and producing summaries. Additionally, the prevalent use of various metrics and the lack of a standard evaluation metric have made summary evaluation difficult and challenging.

7.1. Evaluation of Automatically Produced Summaries

There have been several evaluation campaigns since the late 1990s in the US (Saggion and Poibeau, 2013). They include SUMMAC (1996-1998) (Mani et al., 2002), DUC (the Document Understanding Conference, 2000-2007) (Over et al., 2007), and more recently TAC (the Text Analysis Conference, 2008-present; see http://www.nist.gov/tac/about/index.html). These conferences have played a primary role in the design of evaluation standards, and they evaluate summaries using both human and automatic scoring.

In order to evaluate summaries automatically, three major difficulties must be overcome: i) deciding and specifying the most important parts of the original text to preserve; ii) automatically identifying these pieces of important information in the candidate summary, since the same information can be expressed in disparate ways; iii) evaluating the readability of the summary in terms of grammaticality and coherence.

7.2. Human Evaluation

The simplest way to evaluate a summary is to have a human assess its quality. For example, in DUC, the judges would evaluate the coverage of the summary, i.e. how much of the original input the candidate summary covered. In more recent paradigms, in particular TAC, query-based summaries have been created, and judges evaluate to what extent a summary answers the given query. The factors that human experts must consider when scoring each candidate summary are grammaticality, non-redundancy, integration of the most important pieces of information, structure, and coherence. For more information, see (Saggion and Poibeau, 2013).

7.3. Automatic Evaluation Methods

A number of metrics for automatically evaluating summaries have been proposed since the early 2000s. ROUGE is the most widely used of them.

7.3.1. ROUGE

Lin (Lin, 2004) introduced a set of metrics called Recall-Oriented Understudy for Gisting Evaluation (ROUGE) to automatically determine the quality of a summary by comparing it to human (reference) summaries. There are several variations of ROUGE (see (Lin, 2004)), and here we just mention the most broadly used ones:

  • ROUGE-$n$: This metric is a recall-based measure based on the comparison of $n$-grams. A series of $n$-grams (mostly two and three, and rarely four) is elicited from the reference summaries and the candidate summary (the automatically generated summary). Let $p$ be "the number of common $n$-grams between candidate and reference summary", and $q$ be "the number of $n$-grams extracted from the reference summary only". The score is computed as:

    $$\text{ROUGE-}n = \frac{p}{q} \qquad (10)$$

  • ROUGE-L: This measure employs the concept of the longest common subsequence (LCS) between the two sequences of text. The intuition is that the longer the LCS between two summary sentences, the more similar they are. Although this metric is more flexible than the previous one, it has a drawback that all $n$-grams must be consecutive. For more information about this metric and its refined version, see (Lin, 2004).

  • ROUGE-SU: This metric, called skip bi-gram and uni-gram ROUGE, considers bi-grams as well as uni-grams. It allows words to be inserted between the first and the last words of a bi-gram, so the bi-grams do not need to be consecutive sequences of words.
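The three ROUGE variants listed above can be sketched in a few lines of Python. This is a simplified illustration, not the official ROUGE package: it assumes a single reference summary, whitespace-tokenized input, recall-only scores, and no limit on the skip distance in ROUGE-SU. The function names and the example sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: Sequence[str], reference: Sequence[str], n: int = 2) -> float:
    """Recall-based ROUGE-n: p / q, where p is the number of n-grams common
    to candidate and reference and q is the number of reference n-grams."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    q = sum(ref.values())
    if q == 0:
        return 0.0
    p = sum((cand & ref).values())  # multiset intersection clips repeated n-grams
    return p / q


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """LCS length divided by the reference length (recall side of ROUGE-L only)."""
    return lcs_length(candidate, reference) / len(reference) if reference else 0.0


def skip_units(tokens: Sequence[str]) -> Counter:
    """Skip bi-grams (ordered word pairs with any gap) plus uni-grams."""
    return Counter(combinations(tokens, 2)) + Counter((t,) for t in tokens)


def rouge_su_recall(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Recall over skip bi-grams and uni-grams, with unlimited skip distance."""
    cand, ref = skip_units(candidate), skip_units(reference)
    total = sum(ref.values())
    return sum((cand & ref).values()) / total if total else 0.0


# Hypothetical example sentences.
reference = "the cat sat on the mat".split()
candidate = "the cat lay on the mat".split()

print("ROUGE-2 :", rouge_n(candidate, reference, n=2))
print("ROUGE-L :", rouge_l_recall(candidate, reference))
print("ROUGE-SU:", rouge_su_recall(candidate, reference))
```

In this example the candidate differs from the reference by one word, which breaks two of the five reference bi-grams but leaves a common subsequence of five words, so the LCS-based and skip-based scores penalize the substitution less than ROUGE-2 does.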

8. Conclusions

The increasing growth of the Internet has made a huge amount of information available. It is difficult for humans to summarize large amounts of text. Thus, there is an immense need for automatic summarization tools in this age of information overload.

In this paper, we emphasized various extractive approaches for single- and multi-document summarization. We described some of the most extensively used methods, such as topic representation approaches, frequency-driven methods, and graph-based and machine learning techniques. Although it is not feasible to explain all of the diverse algorithms and approaches comprehensively in this paper, we think it provides good insight into recent trends and progress in automatic summarization methods and describes the state of the art in this research area.

Acknowledgements.
This project was funded in part by Federal funds from the US National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services under contract #HHSN272201200031C, which supports the Malaria Host-Pathogen Interaction Center (MaHPIC).

9. Conflict of Interest

The author(s) declare(s) that there is no conflict of interest regarding the publication of this article.

References

  • Abu-Jbara and Radev (2011) Amjad Abu-Jbara and Dragomir Radev. 2011. Coherent citation-based summarization of scientific papers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 500–509.
  • Alguliev et al. (2011) Rasim M Alguliev, Ramiz M Aliguliyev, Makrufa S Hajirahimova, and Chingiz A Mehdiyev. 2011. MCMR: Maximum coverage and minimum redundant text summarization model. Expert Systems with Applications 38, 12 (2011), 14514–14522.
  • Alguliev et al. (2013) Rasim M Alguliev, Ramiz M Aliguliyev, and Nijat R Isazade. 2013. Multiple documents summarization based on evolutionary optimization algorithm. Expert Systems with Applications 40, 5 (2013), 1675–1689.
  • Allahyari and Kochut (2015) Mehdi Allahyari and Krys Kochut. 2015. Automatic topic labeling using ontology-based topic models. In Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on. IEEE, 259–264.
  • Allahyari and Kochut (2016a) Mehdi Allahyari and Krys Kochut. 2016a. Discovering Coherent Topics with Entity Topic Models. In Web Intelligence (WI), 2016 IEEE/WIC/ACM International Conference on. IEEE, 26–33.
  • Allahyari and Kochut (2016b) Mehdi Allahyari and Krys Kochut. 2016b. Semantic Context-Aware Recommendation via Topic Models Leveraging Linked Open Data. In International Conference on Web Information Systems Engineering. Springer, 263–277.
  • Allahyari and Kochut (2016c) Mehdi Allahyari and Krys Kochut. 2016c. Semantic Tagging Using Topic Models Exploiting Wikipedia Category Network. In Semantic Computing (ICSC), 2016 IEEE Tenth International Conference on. IEEE, 63–70.
  • Allahyari et al. (2017) M. Allahyari, S. Pouriyeh, M. Assefi, S. Safaei, E. D. Trippe, J. B. Gutierrez, and K. Kochut. 2017. A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques. ArXiv e-prints (2017). arXiv:1707.02919
  • Amitay and Paris (2000) Einat Amitay and Cécile Paris. 2000. Automatically summarising web sites: is there a way around it?. In Proceedings of the ninth international conference on Information and knowledge management. ACM, 173–179.
  • Baralis et al. (2013) Elena Baralis, Luca Cagliero, Saima Jabeen, Alessandro Fiori, and Sajid Shah. 2013. Multi-document summarization based on the Yago ontology. Expert Systems with Applications 40, 17 (2013), 6976–6984.
  • Berg-Kirkpatrick et al. (2011) Taylor Berg-Kirkpatrick, Dan Gillick, and Dan Klein. 2011. Jointly learning to extract and compress. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 481–490.
  • Blei et al. (2003) David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. the Journal of machine Learning research 3 (2003), 993–1022.
  • Celikyilmaz and Hakkani-Tur (2010) Asli Celikyilmaz and Dilek Hakkani-Tur. 2010. A hybrid hierarchical model for multi-document summarization. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 815–824.
  • Chali and Joty (2008) Yllias Chali and Shafiq R Joty. 2008. Improving the performance of the random walk model for answering complex questions. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers. Association for Computational Linguistics, 9–12.
  • Chapelle et al. (2006) Olivier Chapelle, Bernhard Schölkopf, Alexander Zien, and others. 2006. Semi-supervised learning. Vol. 2. MIT press Cambridge.
  • Chen and Verma (2006) Ping Chen and Rakesh Verma. 2006. A query-based medical information summarization system using ontology knowledge. In Computer-Based Medical Systems, 2006. CBMS 2006. 19th IEEE International Symposium on. IEEE, 37–42.
  • Chua and Asur (2013) Freddy Chong Tat Chua and Sitaram Asur. 2013. Automatic Summarization of Events from Social Media.. In ICWSM.
  • Conroy and O’Leary (2001) John M Conroy and Dianne P O’Leary. 2001. Text summarization via hidden markov models. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 406–407.
  • Daumé III and Marcu (2006) Hal Daumé III and Daniel Marcu. 2006. Bayesian query-focused summarization. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 305–312.
  • Deerwester et al. (1990) Scott C. Deerwester, Susan T Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. 1990. Indexing by latent semantic analysis. JASIS 41, 6 (1990), 391–407.
  • Delort et al. (2003) J-Y Delort, Bernadette Bouchon-Meunier, and Maria Rifqi. 2003. Enhanced web document summarization using hyperlinks. In Proceedings of the fourteenth ACM conference on Hypertext and hypermedia. ACM, 208–215.
  • Dunning (1993) Ted Dunning. 1993. Accurate methods for the statistics of surprise and coincidence. Computational linguistics 19, 1 (1993), 61–74.
  • Edmundson (1969) Harold P Edmundson. 1969. New methods in automatic extracting. Journal of the ACM (JACM) 16, 2 (1969), 264–285.
  • Erkan and Radev (2004) Günes Erkan and Dragomir R Radev. 2004. LexRank: Graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res.(JAIR) 22, 1 (2004), 457–479.
  • Gong and Liu (2001) Yihong Gong and Xin Liu. 2001. Generic text summarization using relevance measure and latent semantic analysis. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 19–25.
  • Gupta and Lehal (2010) Vishal Gupta and Gurpreet Singh Lehal. 2010. A survey of text summarization extractive techniques. Journal of Emerging Technologies in Web Intelligence 2, 3 (2010), 258–268.
  • Hachey et al. (2006) Ben Hachey, Gabriel Murray, and David Reitter. 2006. Dimensionality reduction aids term co-occurrence based multi-document summarization. In Proceedings of the workshop on task-focused summarization and question answering. Association for Computational Linguistics, 1–7.
  • Hannon et al. (2011) John Hannon, Kevin McCarthy, James Lynch, and Barry Smyth. 2011. Personalized and automatic social summarization of events in video. In Proceedings of the 16th international conference on Intelligent user interfaces. ACM, 335–338.
  • Harabagiu and Lacatusu (2005) Sanda Harabagiu and Finley Lacatusu. 2005. Topic themes for multi-document summarization. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 202–209.
  • Hennig et al. (2008) Leonhard Hennig, Winfried Umbrath, and Robert Wetzker. 2008. An ontology-based approach to text summarization. In Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT’08. IEEE/WIC/ACM International Conference on, Vol. 3. IEEE, 291–294.
  • Hu et al. (2007) Meishan Hu, Aixin Sun, and Ee-Peng Lim. 2007. Comments-oriented blog summarization by sentence extraction. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management. ACM, 901–904.
  • Hu et al. (2008) Meishan Hu, Aixin Sun, and Ee-Peng Lim. 2008. Comments-oriented document summarization: understanding documents with readers’ feedback. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 291–298.
  • Knight and Marcu (2000) Kevin Knight and Daniel Marcu. 2000. Statistics-based summarization-step one: Sentence compression. In AAAI/IAAI. 703–710.
  • Kullback and Leibler (1951) Solomon Kullback and Richard A Leibler. 1951. On information and sufficiency. The Annals of Mathematical Statistics (1951), 79–86.
  • Kupiec et al. (1995) Julian Kupiec, Jan Pedersen, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 68–73.
  • Lin (2004) Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop. 74–81.
  • Lloret and Palomar (2012) Elena Lloret and Manuel Palomar. 2012. Text summarisation in progress: a literature review. Artificial Intelligence Review 37, 1 (2012), 1–41.
  • Luhn (1958) Hans Peter Luhn. 1958. The automatic creation of literature abstracts. IBM Journal of research and development 2, 2 (1958), 159–165.
  • Mani and Bloedorn (1999) Inderjeet Mani and Eric Bloedorn. 1999. Summarizing similarities and differences among related documents. Information Retrieval 1, 1-2 (1999), 35–67.
  • Mani et al. (2002) Inderjeet Mani, Gary Klein, David House, Lynette Hirschman, Therese Firmin, and Beth Sundheim. 2002. SUMMAC: a text summarization evaluation. Natural Language Engineering 8, 01 (2002), 43–68.
  • Mei and Zhai (2008) Qiaozhu Mei and ChengXiang Zhai. 2008. Generating Impact-Based Summaries for Scientific Literature.. In ACL, Vol. 8. Citeseer, 816–824.
  • Mihalcea and Tarau (2004) Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into texts. Association for Computational Linguistics.
  • Mihalcea and Tarau (2005) Rada Mihalcea and Paul Tarau. 2005. A language independent algorithm for single and multiple document summarization. (2005).
  • Na et al. (2014) Liu Na, Li Ming-xia, Lu Ying, Tang Xiao-jun, Wang Hai-wen, and Xiao Peng. 2014. Mixture of topic model for multi-document summarization. In Control and Decision Conference (2014 CCDC), The 26th Chinese. IEEE, 5168–5172.
  • Nenkova and Bagga (2004) Ani Nenkova and Amit Bagga. 2004. Facilitating email thread access by extractive summary generation. Recent advances in natural language processing III: selected papers from RANLP 2003 (2004), 287.
  • Nenkova and McKeown (2012) Ani Nenkova and Kathleen McKeown. 2012. A survey of text summarization techniques. In Mining Text Data. Springer, 43–76.
  • Newman and Blitzer (2003) Paula S Newman and John C Blitzer. 2003. Summarizing archived discussions: a beginning. In Proceedings of the 8th international conference on Intelligent user interfaces. ACM, 273–276.
  • Ouyang et al. (2011) You Ouyang, Wenjie Li, Sujian Li, and Qin Lu. 2011. Applying regression models to query-focused multi-document summarization. Information Processing & Management 47, 2 (2011), 227–237.
  • Over et al. (2007) Paul Over, Hoa Dang, and Donna Harman. 2007. DUC in Context. Inf. Process. Manage. 43, 6 (Nov. 2007), 1506–1520. https://doi.org/10.1016/j.ipm.2007.01.019
  • Ozsoy et al. (2010) Makbule Gulcin Ozsoy, Ilyas Cicekli, and Ferda Nur Alpaslan. 2010. Text summarization of turkish texts using latent semantic analysis. In Proceedings of the 23rd international conference on computational linguistics. Association for Computational Linguistics, 869–876.
  • Qazvinian and Radev (2008) Vahed Qazvinian and Dragomir R Radev. 2008. Scientific paper summarization using citation summary networks. In Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1. Association for Computational Linguistics, 689–696.
  • Qazvinian et al. (2014) Vahed Qazvinian, Dragomir R Radev, Saif M Mohammad, Bonnie Dorr, David Zajic, Michael Whidby, and Taesun Moon. 2014. Generating extractive summaries of scientific paradigms. arXiv preprint arXiv:1402.0556 (2014).
  • Radev et al. (2002) Dragomir R Radev, Eduard Hovy, and Kathleen McKeown. 2002. Introduction to the special issue on summarization. Computational linguistics 28, 4 (2002), 399–408.
  • Radev et al. (2000) Dragomir R Radev, Hongyan Jing, and Malgorzata Budzikowska. 2000. Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies. In Proceedings of the 2000 NAACL-ANLP Workshop on Automatic Summarization. Association for Computational Linguistics, 21–30.
  • Radev et al. (2004) Dragomir R Radev, Hongyan Jing, Małgorzata Styś, and Daniel Tam. 2004. Centroid-based summarization of multiple documents. Information Processing & Management 40, 6 (2004), 919–938.
  • Rambow et al. (2004) Owen Rambow, Lokesh Shrestha, John Chen, and Chirsty Lauridsen. 2004. Summarizing email threads. In Proceedings of HLT-NAACL 2004: Short Papers. Association for Computational Linguistics, 105–108.
  • Ren et al. (2013) Zhaochun Ren, Shangsong Liang, Edgar Meij, and Maarten de Rijke. 2013. Personalized time-aware tweets summarization. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval. ACM, 513–522.
  • Saggion and Poibeau (2013) Horacio Saggion and Thierry Poibeau. 2013. Automatic text summarization: Past, present and future. In Multi-source, Multilingual Information Extraction and Summarization. Springer, 3–21.
  • Salton and Buckley (1988) Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information processing & management 24, 5 (1988), 513–523.
  • Sankarasubramaniam et al. (2014) Yogesh Sankarasubramaniam, Krishnan Ramanathan, and Subhankar Ghosh. 2014. Text summarization using Wikipedia. Information Processing & Management 50, 3 (2014), 443–461.
  • Savova et al. (2010) Guergana K Savova, James J Masanz, Philip V Ogren, Jiaping Zheng, Sunghwan Sohn, Karin C Kipper-Schuler, and Christopher G Chute. 2010. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association 17, 5 (2010), 507–513.
  • Schiffman et al. (2001) Barry Schiffman, Inderjeet Mani, and Kristian J Concepcion. 2001. Producing biographical summaries: Combining linguistic knowledge with corpus statistics. In Proceedings of the 39th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 458–465.
  • Sharifi et al. (2010) Beaux Sharifi, Mark-Anthony Hutton, and Jugal Kalita. 2010. Summarizing microblogs automatically. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 685–688.
  • Sharifi et al. (2013) Beaux P Sharifi, David I Inouye, and Jugal K Kalita. 2013. Summarization of Twitter Microblogs. Comput. J. (2013), bxt109.
  • Shen et al. (2007) Dou Shen, Jian-Tao Sun, Hua Li, Qiang Yang, and Zheng Chen. 2007. Document Summarization Using Conditional Random Fields.. In IJCAI, Vol. 7. 2862–2867.
  • Soares et al. (2011) Sérgio Soares, Bruno Martins, and Pavel Calado. 2011. Extracting biographical sentences from textual documents. In Proceedings of the 15th Portuguese Conference on Artificial Intelligence (EPIA 2011), Lisbon, Portugal. 718–30.
  • Spärck Jones (2007) Karen Spärck Jones. 2007. Automatic summarising: The state of the art. Information Processing & Management 43, 6 (2007), 1449–1481.
  • Steinberger et al. (2007) Josef Steinberger, Massimo Poesio, Mijail A Kabadjov, and Karel Ježek. 2007. Two uses of anaphora resolution in summarization. Information Processing & Management 43, 6 (2007), 1663–1680.
  • Steyvers and Griffiths (2007) Mark Steyvers and Tom Griffiths. 2007. Probabilistic topic models. Handbook of latent semantic analysis 427, 7 (2007), 424–440.
  • Suchanek et al. (2007) Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In Proceedings of the 16th international conference on World Wide Web. ACM, 697–706.
  • Teufel and Moens (2002) Simone Teufel and Marc Moens. 2002. Summarizing scientific articles: experiments with relevance and rhetorical status. Computational linguistics 28, 4 (2002), 409–445.
  • Trippe et al. (2017) E. D. Trippe, J. B. Aguilar, Y. H. Yan, M. V. Nural, J. A. Brady, M. Assefi, S. Safaei, M. Allahyari, S. Pouriyeh, M. R. Galinski, J. C. Kissinger, and J. B. Gutierrez. 2017. A Vision for Health Informatics: Introducing the SKED Framework. An Extensible Architecture for Scientific Knowledge Extraction from Data. ArXiv e-prints (2017). arXiv:1706.07992
  • Turpin et al. (2007) Andrew Turpin, Yohannes Tsegay, David Hawking, and Hugh E Williams. 2007. Fast generation of result snippets in web search. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 127–134.
  • Ulrich et al. (2008) Jan Ulrich, Gabriel Murray, and Giuseppe Carenini. 2008. A publicly available annotated corpus for supervised email summarization. In Proc. of aaai email-2008 workshop, chicago, usa.
  • Vanderwende et al. (2007) Lucy Vanderwende, Hisami Suzuki, Chris Brockett, and Ani Nenkova. 2007. Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion. Information Processing & Management 43, 6 (2007), 1606–1618.
  • Wan and Yang (2008) Xiaojun Wan and Jianwu Yang. 2008. Multi-document summarization using cluster-based link analysis. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 299–306.
  • Wang et al. (2009) Dingding Wang, Shenghuo Zhu, Tao Li, and Yihong Gong. 2009. Multi-document summarization using sentence-based topic models. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers. Association for Computational Linguistics, 297–300.
  • Wong et al. (2008) Kam-Fai Wong, Mingli Wu, and Wenjie Li. 2008. Extractive summarization using supervised and semi-supervised learning. In Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1. Association for Computational Linguistics, 985–992.
  • Yih et al. (2007) Wen-tau Yih, Joshua Goodman, Lucy Vanderwende, and Hisami Suzuki. 2007. Multi-Document Summarization by Maximizing Informative Content-Words.. In IJCAI, Vol. 2007. 20th.
  • Zhou and Hovy (2003) Liang Zhou and Eduard Hovy. 2003. A web-trained extraction summarization system. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1. Association for Computational Linguistics, 205–211.
  • Zhou et al. (2005) Liang Zhou, Miruna Ticrea, and Eduard Hovy. 2005. Multi-document biography summarization. arXiv preprint cs/0501078 (2005).