CUED_speech at TREC 2020 Podcast Summarisation Track

12/04/2020
by   Potsawee Manakul, et al.

In this paper, we describe our approach to the Podcast Summarisation challenge at TREC 2020. Given a podcast episode and its transcription, the goal is to generate a summary that captures the most important information in the content. Our approach consists of two steps: (1) filtering out redundant or less informative sentences in the transcription using the attention of a hierarchical model; (2) applying a state-of-the-art text summarisation system (BART), fine-tuned on the Podcast data using a sequence-level reward function. Furthermore, we ensemble three and nine models for our submission runs. As our baseline, we also fine-tune the BART model on the Podcast data. The human evaluation by NIST shows that our best submission achieves 1.777 on the EGFB scale, while the creator-provided descriptions score 1.291. Our system won the Spotify Podcast Summarisation Challenge in the TREC 2020 Podcast Track in both human and automatic evaluation.
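
The filtering step (1) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each sentence already has a scalar importance score (here a plain list of floats standing in for the sentence-level attention of a hierarchical model), and it uses a whitespace word count as a rough proxy for the summariser's input length limit. The function name `filter_sentences` and the parameter `max_words` are hypothetical.

```python
from typing import List, Sequence


def filter_sentences(
    sentences: Sequence[str],
    scores: Sequence[float],
    max_words: int,
) -> List[str]:
    """Keep high-scoring sentences within a word budget, in original order.

    `scores` is an assumed stand-in for per-sentence attention weights;
    how those weights are actually computed is not shown here.
    """
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")

    # Rank sentence indices from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

    kept: List[int] = []
    used = 0
    for i in ranked:
        n_words = len(sentences[i].split())
        # Greedy selection: skip a sentence that would exceed the budget
        # and keep trying lower-ranked (possibly shorter) ones.
        if used + n_words <= max_words:
            kept.append(i)
            used += n_words

    # Restore transcript order so the summariser sees coherent text.
    return [sentences[i] for i in sorted(kept)]


if __name__ == "__main__":
    transcript = [
        "Welcome back to the show.",
        "Today we talk about training for a first marathon.",
        "Remember to like and subscribe.",
        "Our guest explains how to build weekly mileage safely.",
    ]
    attention = [0.05, 0.45, 0.02, 0.48]  # made-up illustrative scores
    print(filter_sentences(transcript, attention, max_words=20))
```

The filtered sentences would then be joined and passed to the summarisation model in step (2); restoring the original order is a design choice made here so that the shortened transcript still reads chronologically.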
