Utilização de Grafos e Matriz de Similaridade na Sumarização Automática de Documentos Baseada em Extração de Frases
The internet increased the amount of information available. However, the reading and understanding of this information are costly tasks. In this scenario, the Natural Language Processing (NLP) applications enable very important solutions, highlighting the Automatic Text Summarization (ATS), which produce a summary from one or more source texts. Automatically summarizing one or more texts, however, is a complex task because of the difficulties inherent to the analysis and generation of this summary. This master's thesis describes the main techniques and methodologies (NLP and heuristics) to generate summaries. We have also addressed and proposed some heuristics based on graphs and similarity matrix to measure the relevance of judgments and to generate summaries by extracting sentences. We used the multiple languages (English, French and Spanish), CSTNews (Brazilian Portuguese), RPM (French) and DECODA (French) corpus to evaluate the developped systems. The results obtained were quite interesting.
READ FULL TEXT