
LIA-RAG: a system based on graphs and divergence of probabilities applied to Speech-To-Text Summarization

This paper introduces a new algorithm for automatic speech-to-text summarization based on statistical divergences of probabilities and graphs. The input is a noisy transcript of speech conversations; the output is a compact text summary. Our results on the French corpus of the CCCS MultiLing 2015 pilot task are very encouraging.





1 Introduction

Nowadays, a large amount of information is generated daily. Storing it is not enough: each datum must be processed and the information it contains analyzed. Manual analysis is impossible, because it would require a huge number of people working in the available time. A summary is a short text containing the main ideas of the original text [Torres-Moreno2014]; it reduces the reading time needed to analyze these data.

Audio is widely used in daily life, on the radio and on the Internet, in news, interviews and conversations. Call centres produce a large number of conversations every day. These centres face various issues and tasks, and it is essential to keep track of the topics discussed and the outcomes obtained by customers in these calls. One way to analyze and accelerate the processing of these data is speech summarization, which differs from traditional text summarization because transcripts raise additional problems such as speech recognition errors, sentences of widely varying sizes, and colloquialisms.

“Multiling is a community-driven initiative for benchmarking multilingual summarization systems, nurturing further research, and pushing the state-of-the-art in the area”. The MultiLing 2015 initiative features the following tasks: Multilingual Multi-document Summarization, Multilingual Single-document Summarization, Online Forum Summarization and Call Centre Conversation Summarization (CCCS). The CCCS pilot task consists in “creating systems that can analyze call centres conversations and generate written summaries reflecting why the customer is calling, how the agent answers that query, what are the steps to solve the problem and what is the resolution status of the problem” [Favre et al.2015].

We developed the LIA-RAG summarization system based on the RAG system [Pontes et al.2015], coupled with post-processing rules to generate the final summary. LIA-RAG uses a graph model to analyze a set of documents (here, the conversation transcriptions) for the MultiLing’15 CCCS pilot task. It creates a summary by computing the relevance of the words and the similarity among the sentences, then applies a simple post-processing step to improve the quality of the final summary.

The rest of the paper is organized as follows: Section 2 describes related work on automatic summarization of texts and conversations. Sections 3 and 4 present the graph model and the system used in this work. Section 5 describes the results obtained on the MultiLing/DECODA French corpus, and Section 6 concludes.

2 Related Work

Automatic Text Summarization (ATS) aims to create a summary containing the main ideas of a textual document [Mani and Mayburi1999, Mani2001, Torres-Moreno2014]. A summary can be produced by extraction or abstraction, from a single document or from multiple documents. The extraction process identifies the most informative sentences of a document and creates a summary by assembling these sentences [Luhn1958, Torres-Moreno2014]. Extraction may be guided by a query; in this case, the algorithm selects the information most relevant to a particular topic. Abstraction algorithms create new sentences (or reformulate existing ones) from the original texts [Seno2010, Seno and Nunes2008], whereas extraction methods reuse the key sentences of the texts [Barzilay and McKeown2005, Torres-Moreno2014].

Work on abstraction usually uses syntactic and semantic knowledge of a language to create the summary; this procedure seeks the best construction of each sentence [Barzilay et al.1999]. This type of summarization uses fusion to consolidate information. [Seno2010] proposed a method to fuse similar sentences in Brazilian Portuguese based on a symbolic, domain-independent approach. The method allows fusion by union and by intersection of a document cluster: fusion by union preserves the overall message of the cluster, while fusion by intersection retains the redundant information considered most important in the cluster. [Seno and Nunes2008] described how to identify common information between sentences in Brazilian Portuguese using lexical knowledge and syntactic and semantic paraphrasing rules.

[Jorge et al.2010] developed a summarization system based on the CST model (Cross-document Structure Theory). The proposed system analyses redundancy and contradiction among different information sources in Brazilian Portuguese.

[Barzilay et al.1999] developed a method to generate automatic summaries by identifying and synthesizing similar elements in a cluster of documents; it creates the summary based on the similarity between sentences and topics. [Barzilay and McKeown2005] described an approach to fuse sentences through a text-to-text technique in order to synthesize repeated information from multiple documents. This method uses a syntactic alignment of sentences to identify common information; after the identification step, the sentences are processed and a new text with the same content is generated.

A way to calculate the similarity between sentences is to use the co-occurrence of words. [He et al.2008] proposed a fusion method using similarity metrics, skip-bigram co-occurrence and information density to evaluate sentences and select the most relevant ones. [Hennig and Albayrak2010] developed a multi-document summarization model that analyzes sentence-term and sentence-bigram co-occurrence using the Jensen-Shannon (JS) divergence.

Another method to obtain relevant sentences uses compression, as reported in [Pitler2010]. Pitler uses approaches based on syntactic trees, sentences and discourse. [Filippova2010] describes a multi-sentence compression method using a word-based graph.

Summarization by extraction does not reach the quality of summaries produced by abstraction, because it relies on surface methods based on statistical calculations to assess sentence relevance. However, extraction is generic and does not require deep analysis of the language [Barzilay and McKeown2005, Pontes et al.2014].

[Pontes et al.2014] use Graph theory together with the JS divergence to create multi-document summaries by extraction. Their system describes a text as a graph where sentences are represented by vertices and edges connect two similar sentences. Their approach computes a stable set of the graph, aiming to create a summary whose sentences cover the general information of the cluster without redundancy. [Linhares et al.2013] also model the text as a graph and use a greedy heuristic to obtain the relevant sentences of the text.

The speech summarization task is more complex and involves further problems. Utterance boundaries are harder to identify because utterances may be fragmented, contain disfluencies, and speech recognition introduces errors. Meetings involve multi-party conversation with overlapping speakers. The language used is informal, and utterances tend to be partial, fragmentary and ungrammatical, with many ellipses and pronouns. On the other hand, the speech signal may provide additional information, such as prosody, that emphasizes a piece of text [Murray et al.2005].

[Mckeown et al.2005] described ways to adapt text summarization to speech summarization, surveying work on the summarization of broadcast news and meetings. [Murray et al.2005] analyzed extractive summarization of multi-party meetings, using Maximal Marginal Relevance and Latent Semantic Analysis to create summaries based on prosodic and lexical features.

3 Modeling the problem

This paper aims to design a system that summarizes several documents by extracting their most important sentences. Statistical techniques were used to build a language-independent system. The proposed methods are based on a specific preprocessing of words, a weighting function for sentences, and a bag-of-words model to represent the text content.

This model uses a frequency matrix constructed from each document: its rows correspond to the sentences of the document and its columns to the distinct words of the document (after preprocessing). The cell [i, j] of the matrix holds the frequency of word j in sentence i of the document. This stage was built using the libraries and algorithms of the Cortex summarization system [Torres-Moreno et al.2002, Torres-Moreno et al.2001].
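As an illustration, this bag-of-words stage can be sketched as follows (a minimal Python sketch, not the system's Perl/Cortex implementation; function and variable names are illustrative):

```python
from collections import Counter

def build_matrix(sentences):
    """Build a sentence-term frequency matrix from tokenized sentences.

    Rows correspond to the sentences of the document, columns to its
    distinct words; cell [i][j] holds the frequency of word j in sentence i.
    """
    vocab = sorted({w for s in sentences for w in s})
    matrix = []
    for sent in sentences:
        counts = Counter(sent)
        matrix.append([counts[w] for w in vocab])
    return matrix, vocab

# Two toy "sentences", already segmented, filtered and stemmed
sents = [["train", "retard", "train"], ["retard", "rembours"]]
matrix, vocab = build_matrix(sents)
# vocab  -> ['rembours', 'retard', 'train']
# matrix -> [[0, 1, 2], [1, 1, 0]]
```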


3.1 Jensen-Shannon divergence

We use the Jensen-Shannon (JS) divergence to measure the similarity between sentences. Let w range over the set of words occurring in P and Q, where P and Q are the probability distributions of two objects: two individual sentences, or a sentence and a set of sentences. The divergence is then calculated between these two distributions. The JS divergence is symmetric and provides a stable way to measure the difference between two distributions:

D_JS(P, Q) = (1/2) Σ_w [ P(w) log2( 2 P(w) / (P(w) + Q(w)) ) + Q(w) log2( 2 Q(w) / (P(w) + Q(w)) ) ]

Computed with base-2 logarithms, the JS divergence takes values in [0, 1]: it is close to zero when the distributions are similar, and grows as they differ.

When a word occurs in one sentence but is missing from the other, a smoothing (a different weighting) is applied to avoid null values and obtain a smoother distribution [Hiemstra2009]. If a word w is not present in the sentence Q, its smoothed probability is computed as

p_Q(w) = (C_Q(w) + δ) / (N_Q + δ × B)

where C_Q(w) is the frequency of w in Q, B is the number of distinct words in the vocabulary, δ is the variable that controls the relevance given to the missing word in the sentence, and N_Q is the number of words in Q [Louis and Nenkova2013].
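The smoothed distributions and the JS divergence can be sketched in Python as follows (an illustrative sketch of the formulas in this section, not the system's actual code; the default value of δ is an assumption):

```python
import math

def smooth_dist(sentence, vocab, delta=0.0005):
    """Smoothed word distribution of a tokenized sentence over vocab:
    a word absent from the sentence still gets probability
    delta / (N + delta * B), so the distribution contains no zeros."""
    n_words, b = len(sentence), len(vocab)
    denom = n_words + delta * b
    return {w: (sentence.count(w) + delta) / denom for w in vocab}

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs; value lies in [0, 1]."""
    def kl(a, m):
        # Kullback-Leibler divergence; all probabilities are > 0 here
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a)
    m = {w: (p[w] + q[w]) / 2 for w in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

vocab = ["train", "retard", "rembours"]
p = smooth_dist(["train", "retard"], vocab)
q = smooth_dist(["train", "rembours"], vocab)
d = js_divergence(p, q)  # small positive value: the sentences share a word
```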


3.2 Term Frequency-Inverse Sentence Frequency (TF-ISF)

One way to estimate the initial relevance of a word, and hence of a sentence, to the text is the TF-ISF metric. It is based on the term frequency in the text and is calculated as

TF-ISF_w = tf_w × log(S / s_w)

where tf_w is the frequency of term w, S is the total number of sentences (the "documents" of the classical TF-IDF formulation), and s_w is the number of sentences that contain the term w.
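A direct reading of this metric, treating sentences as the retrieval unit, can be sketched as follows (an illustrative sketch; the exact normalization used in Cortex/RAG is not detailed in the paper):

```python
import math

def tf_isf(term, sentence, sentences):
    """TF-ISF of a term within one sentence: its frequency in the sentence
    times the log of (total sentences / sentences containing the term)."""
    tf = sentence.count(term)
    containing = sum(1 for s in sentences if term in s)
    return tf * math.log(len(sentences) / containing)

sents = [["train", "retard", "train"], ["retard", "rembours"]]
tf_isf("train", sents[0], sents)   # 2 * log(2/1)
tf_isf("retard", sents[0], sents)  # 0.0: "retard" occurs in every sentence
```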

4 The LIA-RAG system

In general, a text consists of several sentences covering different topics. The text can be divided into several groups, each describing one step or idea of the text. If a group is large, it is relevant to the text; it is therefore possible to choose the sentences of the largest groups and obtain the most relevant content.

The main ideas of a text are generally discussed and revisited several times. Vertices with higher degree have more similar sentences and are therefore important to the text. However, a sentence does not need many similar sentences to be relevant.

Résumeur Audio-texte à base de Graphes (RAG; in English, graph-based audio-text summarizer) is a summarization system by sentence extraction: it selects the main sentences of a text and applies post-processing to remove some errors and make the text more concise and compact.

4.1 The RAG algorithm

RAG uses Graph theory and divergence metrics to compute similarities and group the sentences. Initially, the system performs a filtering step to remove bracketed content. Then it applies segmentation, filtering and stemming to remove stopwords and reduce words to their roots. RAG carries out this preprocessing and the matrix transformation following [Torres-Moreno et al.2001]. It then computes the relevance of each sentence with the TF-ISF metric (Section 3.2) and removes the least relevant sentences.

The system creates a graph in which each vertex represents one of the previously selected sentences. Using the JS divergence (Section 3.1), it calculates the similarity between sentences: if the divergence between two sentences is less than 0.16 (a threshold obtained by empirical testing), the system creates an edge between them. Vertices with higher degree thus carry the most relevant content of the text. However, some sentences may have a small degree and still contain important information.
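The graph construction can be sketched as follows (a sketch, not RAG itself; the divergence passed in here is a simple Jaccard-distance stand-in rather than the JS divergence, while the 0.16 threshold is the paper's):

```python
def build_similarity_graph(sentences, divergence, threshold=0.16):
    """Create a vertex per sentence and an edge between two vertices
    whenever their divergence is below the threshold; return the edges
    and the degree of each vertex."""
    n = len(sentences)
    degree = [0] * n
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if divergence(sentences[i], sentences[j]) < threshold:
                edges.append((i, j))
                degree[i] += 1
                degree[j] += 1
    return edges, degree

def jaccard_distance(a, b):
    """Stand-in divergence for the sketch (the system uses JS divergence)."""
    sa, sb = set(a), set(b)
    return 1 - len(sa & sb) / len(sa | sb)

sents = [["train", "retard"], ["train", "retard"], ["billet", "perdu"]]
edges, degree = build_similarity_graph(sents, jaccard_distance)
# edges -> [(0, 1)]   degree -> [1, 1, 0]
```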

RAG combines the TF-ISF values and the degrees of the sentences to analyze their relevance: the score of a sentence s is a function of TF-ISF(s) and of deg(s), the degree of the vertex representing s in the graph. The system then creates a summary with the highest-scoring sentences, excluding similar (or redundant) sentences based on Dice’s coefficient [Bai et al.2012].
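The redundancy filter based on Dice's coefficient can be sketched as a greedy selection (an illustrative sketch; the 0.5 redundancy threshold is an assumed value, not taken from the paper):

```python
def dice_coefficient(a, b):
    """Dice's coefficient of two token lists: 2|A ∩ B| / (|A| + |B|)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))

def select_summary(scored_sentences, max_sentences, redundancy=0.5):
    """Pick sentences by decreasing score, skipping any candidate whose
    Dice similarity with an already chosen sentence exceeds the threshold."""
    summary = []
    for score, sent in sorted(scored_sentences, key=lambda x: -x[0]):
        if all(dice_coefficient(sent, s) <= redundancy for s in summary):
            summary.append(sent)
        if len(summary) == max_sentences:
            break
    return summary

scored = [(0.9, ["train", "retard", "rembours"]),
          (0.8, ["train", "retard"]),          # redundant with the first
          (0.5, ["billet", "perdu"])]
chosen = select_summary(scored, 2)
# -> [['train', 'retard', 'rembours'], ['billet', 'perdu']]
```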

Figure 1 depicts the RAG system.

Figure 1: Architecture of the RAG system.

4.2 LIA-RAG: RAG with a specific speech post-processing

The speech recognition process produces text containing several grammatical problems (slang, colloquialisms, speech expressions and recognition errors). An extractive summarization algorithm selects the relevant sentences, but these sentences may keep such problems, so the resulting summary requires a specific treatment.

The main analyzed aspects in this process are:

  • Colloquialisms,

  • Speech expressions and

  • Dates.

The LIA-RAG system receives this summary as input. In the input, some speech expressions are used to connect ideas or concepts in oral conversation. LIA-RAG removes these expressions because they are often incorrectly transcribed (a source of noise). The system also eliminates several colloquialisms and duplicated words, and replaces some mistaken words by their correct form. Figure 2 shows the architecture of the LIA-RAG system.

Figure 2: Architecture of the LIA-RAG system.
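A minimal sketch of such post-processing rules (the filler list and the cleanup rules are hypothetical examples; the paper does not list the actual LIA-RAG rule set for French):

```python
import re

# Hypothetical French filler expressions; the real rule set is larger.
FILLERS = [r"\beuh\b", r"\bben\b", r"\bhein\b", r"\bvoilà\b"]

def postprocess(summary):
    text = summary
    # drop speech expressions used to connect ideas in oral conversation
    for pattern in FILLERS:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # collapse immediately duplicated words ("le le train" -> "le train")
    text = re.sub(r"\b(\w+)( \1\b)+", r"\1", text, flags=re.IGNORECASE)
    # normalize the whitespace left over after the removals
    return re.sub(r"\s+", " ", text).strip()

postprocess("euh le le train est en retard hein")
# -> 'le train est en retard'
```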

5 Results

The tests were carried out on a computer with an Intel i5 2.6 GHz processor and 4 GB of RAM, running a 64-bit GNU/Linux Debian operating system. The RAG algorithms were implemented in Perl.

We used the French DECODA corpus [Bechet et al.2012]. The systems have to generate a textual summary conveying the main idea of each conversation in the corpus. “The conversations topics range from itinerary and schedule requests, to lost and found, to complaints (the calls were recorded during strikes)” [Favre et al.2015]. Each summary is limited to 7% of the number of words of the corresponding conversation transcription. We compared the LIA-RAG and RAG systems with two baseline systems (random sentences and leading sentences).

To evaluate the quality of the summaries, MultiLing CCCS used Recall-Oriented Understudy for Gisting Evaluation (ROUGE), which determines the quality of an automatic summary from the intersection of the n-grams of a candidate summary with the n-grams of a set of reference summaries (ROUGE 1.5.5 was run with the options -a -l 10000 -n 4 -x -2 4 -u -c 95 -r 1000 -f A -p 0.5 -t 0). More specifically, we used the ROUGE-N and ROUGE-SU measures; ROUGE-N is an n-gram recall measure [Lin2004]. The values of these metrics belong to [0, 1], 1 being the best result.

Table 1 shows the results obtained by the systems on the training corpus. This corpus contains 50 conversation transcriptions with 23,363 words and 115 reference summaries. Both versions of RAG provided the best results. The RAG system identified the main sentences discussed in the conversations, but errors and speech expressions decreased informativeness. The post-processing of LIA-RAG improved the results: it reduces errors and generates a more informative and concise summary.

Systems      ROUGE scores
LIA-RAG:1    0.1893  0.0628  0.0683
RAG          0.1833  0.0614  0.0654
Base-first   0.1578  0.0556  0.0583
Base-rand    0.1170  0.0310  0.0371
Table 1: Evaluation on the training corpus.

The French test corpus has 100 conversation transcriptions with 42,130 words and 212 reference summaries. The official ROUGE-2 performance of the systems participating in the CCCS pilot task is shown in Table 2 [Favre et al.2015]. The LIA-RAG system obtained the best results.

Systems ROUGE-2
LIA-RAG:1 0.037
NTNU:1 0.035
NTNU:3 0.034
NTNU:2 0.027
Table 2: Evaluation of test corpus.

6 Conclusion and perspectives

Using divergences of probabilities within a graph model to extract key sentences proved effective for French speech-to-text summarization. The LIA-RAG system uses very few language resources (stopwords and stemming) and achieved good results. Moreover, the system is easily adaptable to other languages, requiring only some modifications in the preprocessing stage.

An interesting perspective of this work is the use of speech tag markers to improve the computation of sentence scores. In addition, the post-processing must be improved to further increase the quality of the final summary. Finally, verifying the grammaticality and readability of the extracted key sentences could help to produce more realistic abstracts.


This project was partially funded by a scholarship from FUNCAP-CE (Brazil).


  • [Bai et al.2012] Ming-Hong Bai, Yu-Ming Hsieh, Keh-Jiann Chen, and Jason S. Chang. 2012. Domcat: A bilingual concordancer for domain-specific computer assisted translation. In Proceedings of the ACL 2012 System Demonstrations, pages 55–60, Jeju Island, Korea, July. Association for Computational Linguistics.
  • [Barzilay and McKeown2005] Regina Barzilay and Kathleen R. McKeown. 2005. Sentence fusion for multidocument news summarization. Comput. Linguist., 31(3):297–328, September.
  • [Barzilay et al.1999] Regina Barzilay, Kathleen R. McKeown, and Michael Elhadad. 1999. Information fusion in the context of multi-document summarization. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, ACL’99, pages 550–557, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • [Bechet et al.2012] Frederic Bechet, Benjamin Maza, Nicolas Bigouroux, Thierry Bazillon, Marc El-Beze, Renato De Mori, and Eric Arbillot. 2012. Decoda: a call-centre human-human spoken conversation corpus. In Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, may. European Language Resources Association (ELRA).
  • [Favre et al.2015] Benoit Favre, Evgeny Stepanov, Jérémy Trione, Frédéric Béchet, and Giuseppe Riccardi. 2015. Call centre conversation summarization: A pilot task at multiling 2015. In SIGdial Meeting on Discourse and Dialogue (SIGDIAL 2015). Prague, Czech Republic.
  • [Filippova2010] Katja Filippova. 2010. Multi-sentence compression: Finding shortest paths in word graphs. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING’10, pages 322–330, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • [He et al.2008] Tingting He, Fang Li, Wei Shao, Jinguang Chen, and Liang Ma. 2008. A new feature-fusion sentence selecting strategy for query-focused multi-document summarization. In Cheolyoung Ock, JeongYong Byun, YuDe Bi, and Hongfei Lin, editors, ALPIT, pages 81–86. IEEE Computer Society.
  • [Hennig and Albayrak2010] L. Hennig and S. Albayrak. 2010. Personalized multi-document summarization using n-gram topic model fusion. In Proceedings of LREC’10, 1st Workshop on Semantic Personalized Information Management (SPIM 2010), pages 28–34, Valletta, Malta. European Language Resources Association.
  • [Hiemstra2009] D. Hiemstra. 2009. Probability smoothing. In Encyclopedia of Database Systems, pages 2169–2170. Springer.
  • [Jorge et al.2010] Castro Jorge, Maria Lucía del Rosario, and Thiago Alexandre Salgueiro Pardo. 2010. Experiments with cst-based multidocument summarization. In Proceedings of the 2010 Workshop on Graph-based Methods for Natural Language Processing, TextGraphs-5, pages 74–82, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • [Lin2004] Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Proc. ACL workshop on Text Summarization Branches Out, page 10.
  • [Linhares et al.2013] Andréa Carneiro Linhares, Juan-Manuel Torres-Moreno, and Javier Ramirez. 2013. Résumé automatique 4-lingue avec un algorithme glouton. In ROADEF’13.
  • [Louis and Nenkova2013] Annie Louis and Ani Nenkova. 2013. Automatically assessing machine summary content without a gold standard. Computational Linguistics, 39(2):267–300.
  • [Luhn1958] H. P. Luhn. 1958. The automatic creation of literature abstracts. IBM J. Res. Dev., 2(2):159–165, April.
  • [Mani and Mayburi1999] I. Mani and M. Mayburi. 1999. Advances in Automatic Text Summarization. MIT Press, Cambridge.
  • [Mani2001] Inderjeet Mani. 2001. Automatic Summarization. John Benjamins Publishing Co.
  • [Mckeown et al.2005] Kathleen Mckeown, Julia Hirschberg, Michel Galley, and Sameer Maskey. 2005. From text to speech summarization. In ICASSP 2005, Philadelphia, PA.
  • [Murray et al.2005] Gabriel Murray, Steve Renals, and Jean Carletta. 2005. Extractive summarization of meeting recordings. In Proceedings of the 9th European Conference on Speech Communication and Technology, pages 593–596.
  • [Pitler2010] E. Pitler. 2010. Methods for sentence compression. Technical report, University of Pennsylvania, Technical Report MS-CIS-10-20.
  • [Pontes et al.2014] Elvys Linhares Pontes, Andréa Carneiro Linhares, and Juan-Manuel Torres-Moreno. 2014. Sasi: sumarizador automático de documentos baseado no problema do subconjunto independente de vértices. In Proceedings of the XLVI Simpósio Brasileiro de Pesquisa Operacional.
  • [Pontes et al.2015] Elvys Linhares Pontes, Juan-Manuel Torres-Moreno, and Andréa Carneiro Linhares. 2015. Rag : un système de résume automatique à base de graphes.
  • [Seno and Nunes2008] Eloize Rossi Marques Seno and Maria das Graças Volpe Nunes. 2008. Some experiments on clustering similar sentences of texts in portuguese. In António Teixeira, Vera Lúcia Strube de Lima, Luís Caldas de Oliveira, and Paulo Quaresma, editors, Computational Processing of the Portuguese Language, volume 5190 of Lecture Notes in Computer Science, pages 133–142. Springer Berlin Heidelberg.
  • [Seno2010] Eloize Rossi Marques Seno. 2010. Um método para a fusão automática de sentenças similares em português. Ph.D. thesis, Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos.
  • [Torres-Moreno et al.2001] Juan-Manuel Torres-Moreno, Patricia Velázquez-Morales, and Jean-Guy Meunier. 2001. Cortex : un algorithme pour la condensation automatique des textes. In ARCo’01, volume 2, pages 365–366. Lyon, France.
  • [Torres-Moreno et al.2002] Juan-Manuel Torres-Moreno, Patricia Velázquez-Morales, and Jean-Guy Meunier. 2002. Condensés de textes par des méthodes numériques. In JADT, volume 2, pages 723–734.
  • [Torres-Moreno2014] Juan-Manuel Torres-Moreno. 2014. Automatic Text Summarization. John Wiley & Sons.