Keyword Extraction from Short Texts with a Text-To-Text Transfer Transformer

09/28/2022
by Piotr Pęzik, et al.

The paper explores the relevance of the Text-To-Text Transfer Transformer language model (T5) for Polish (plT5) to the task of intrinsic and extrinsic keyword extraction from short text passages. The evaluation is carried out on the new Polish Open Science Metadata Corpus (POSMAC), a collection of 216,214 abstracts of scientific publications compiled in the CURLICAT project and released with this paper. We compare the results obtained by four methods (plT5kw, extremeText, TermoPL, and KeyBERT) and conclude that the plT5kw model yields particularly promising results for both frequent and sparsely represented keywords. Furthermore, a plT5kw keyword generation model trained on POSMAC also seems to produce highly useful results in cross-domain text labelling scenarios. We discuss the performance of the model on news stories and phone-based dialog transcripts, which represent text genres and domains extrinsic to the dataset of scientific abstracts. Finally, we attempt to characterize the challenges of evaluating a text-to-text model on both intrinsic and extrinsic keyword extraction.
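The abstract does not say which metrics were used to compare the four methods. As an illustration only, here is a minimal sketch of one common way to score keyword extraction: micro-averaged precision, recall, and F1 over exact matches between predicted and gold keyword sets. The function names `normalize` and `micro_prf` and the normalization rule (lowercasing, whitespace collapsing) are assumptions made for this sketch and are not taken from the paper.

```python
from typing import Iterable, Tuple


def normalize(keyword: str) -> str:
    """Lowercase and collapse whitespace so trivial spelling variants match."""
    return " ".join(keyword.lower().split())


def micro_prf(
    gold: Iterable[Iterable[str]],
    predicted: Iterable[Iterable[str]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact keyword matches.

    `gold` and `predicted` are parallel sequences: one keyword list per document.
    This is an illustrative metric, not necessarily the one used in the paper.
    """
    true_pos = n_pred = n_gold = 0
    for gold_kws, pred_kws in zip(gold, predicted):
        gold_set = {normalize(k) for k in gold_kws}
        pred_set = {normalize(k) for k in pred_kws}
        true_pos += len(gold_set & pred_set)
        n_pred += len(pred_set)
        n_gold += len(gold_set)

    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data invented for this sketch; not drawn from POSMAC.
    gold = [["Keyword extraction", "T5"], ["corpus"]]
    predicted = [["keyword  extraction", "BERT"], ["corpus", "Polish"]]
    p, r, f = micro_prf(gold, predicted)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Exact matching of this kind can undercount a generative model that produces a valid keyword in a different surface form from the gold label. That is one plausible reason why evaluating a text-to-text model on this task is harder than evaluating a fixed-label classifier.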


