On the Composition of Scientific Abstracts
Scientific abstracts contain the information that, in the view of the author(s), best describes a document's content. They represent a compressed view of the informational content of a document and allow readers to evaluate the relevance of the document to a particular information need. However, little is known about their composition. This paper contributes to the understanding of the structure of abstracts by measuring the similarity between scientific abstracts and the text content of research articles. More specifically, using sentence-based similarity metrics, we quantify the phenomenon of text re-use in abstracts and examine where the body sentences that are similar to abstract sentences fall within the IMRaD structure (Introduction, Methods, Results and Discussion), using a corpus of over 85,000 research articles published in the seven PLOS journals. We provide evidence that 84% of abstracts have at least one sentence in common with the body of the article. Our results also show that the sections of the paper from which abstract sentences are taken are invariant across the PLOS journals, with sentences mainly coming from the beginning of the introduction and the end of the conclusion.
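To make the idea of sentence-level text re-use concrete, the following is a minimal sketch in Python. It is not the method used in the paper: the abstract does not name the similarity metrics, so the token-set Jaccard score, the naive sentence splitter, the 0.8 threshold, and all function names here are assumptions chosen for illustration only.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokens(sentence):
    """Lowercased set of alphanumeric tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Token-set Jaccard similarity, a stand-in for the paper's unspecified metrics."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def reused_sentences(abstract, body, threshold=0.8):
    """For each abstract sentence, find the most similar body sentence.

    Returns (abstract_sentence, body_index, relative_position, score) tuples
    for matches at or above the (hypothetical) threshold. The relative
    position runs from 0.0 (start of the body) to 1.0 (end of the body).
    """
    body_sents = split_sentences(body)
    matches = []
    for sent in split_sentences(abstract):
        best_pos, best_score = None, 0.0
        for i, candidate in enumerate(body_sents):
            score = jaccard(sent, candidate)
            if score > best_score:
                best_pos, best_score = i, score
        if best_pos is not None and best_score >= threshold:
            rel = best_pos / max(len(body_sents) - 1, 1)
            matches.append((sent, best_pos, rel, best_score))
    return matches
```

Applied to a corpus, the share of abstracts with at least one returned match would give a re-use rate, and binning the relative positions by section would show where re-used sentences originate. Both steps depend on section boundaries and a threshold that this sketch does not define.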