Data-driven Summarization of Scientific Articles

04/24/2018
by Nikola I. Nikolov, et al.

Data-driven approaches to sequence-to-sequence modelling have been successfully applied to short text summarization of news articles. Such models are typically trained on input-summary pairs consisting of only one or a few sentences, partly because multi-sentence training data are scarce. Here, we propose to use scientific articles as a new milestone for text summarization: large-scale training data come almost for free, with two types of high-quality summaries at different levels, namely the title and the abstract. We generate two novel multi-sentence summarization datasets from scientific articles and test the suitability of a wide range of existing extractive and abstractive neural network-based summarization approaches. Our analysis demonstrates that scientific papers are suitable for data-driven text summarization. Our results could serve as valuable benchmarks for scaling sequence-to-sequence models to very long sequences.
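
As a rough illustration of how the title and the abstract can act as two levels of summary, the sketch below builds (input, summary) pairs from parsed articles. It is not the authors' pipeline: the Article fields, the build_pairs helper, the task names, and the choice of inputs (abstract as input for the title, body as input for the abstract) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Article:
    # Hypothetical container; field names are assumptions for this sketch.
    title: str
    abstract: str
    body: str


def build_pairs(articles: List[Article]) -> Dict[str, List[Tuple[str, str]]]:
    """Build (input, summary) pairs for two illustrative tasks.

    Assumed pairing (not stated in the abstract above):
      - "title_gen":    abstract -> title
      - "abstract_gen": body     -> abstract
    """
    title_pairs: List[Tuple[str, str]] = []
    abstract_pairs: List[Tuple[str, str]] = []
    for art in articles:
        title = art.title.strip()
        abstract = art.abstract.strip()
        body = art.body.strip()
        # Skip articles with any empty field so both tasks stay aligned.
        if not (title and abstract and body):
            continue
        title_pairs.append((abstract, title))
        abstract_pairs.append((body, abstract))
    return {"title_gen": title_pairs, "abstract_gen": abstract_pairs}


# Toy data, invented purely for illustration.
example = [
    Article("A Study of X", "We study X.", "Section 1. X is studied in depth."),
    Article("No Body Here", "An abstract without a body.", "   "),
]
pairs = build_pairs(example)
```

Here the second toy article is dropped because its body is empty, so each task receives one pair. A real pipeline would also need parsing, tokenization, and length filtering, none of which is shown.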


