Extending Neural Keyword Extraction with TF-IDF tagset matching

01/31/2021
by   Boshko Koloski, et al.

Keyword extraction is the task of identifying the words (or multi-word expressions) that best describe a given document; news portals use such keywords to link articles on similar topics. In this work, we develop and evaluate our methods on four novel data sets covering less-represented, morphologically rich languages in the European news media industry (Croatian, Estonian, Latvian and Russian). First, we evaluate two supervised neural transformer-based methods (TNT-KID and BERT+BiLSTM CRF) and compare them to a baseline unsupervised TF-IDF-based approach. Next, we show that by combining the keywords retrieved by both neural transformer-based methods and extending the final set of keywords with an unsupervised TF-IDF-based technique, we can drastically improve the recall of the system, making it suitable for use as a recommendation system in a media house environment.
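The combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how candidates are scored or merged, so the function names (`tfidf_scores`, `extend_keywords`), the smoothed IDF formula, the cut-off `k`, and the toy English tokens are all assumptions made for illustration. The neural predictions are passed in as plain lists standing in for the outputs of TNT-KID and BERT+BiLSTM CRF.

```python
import math
from collections import Counter


def tfidf_scores(doc_tokens, corpus_tokens):
    """Score each term of one document by TF-IDF against a tokenized corpus.

    Assumption: a smoothed IDF, log((1 + N) / (1 + df)), is used here;
    the abstract does not state which TF-IDF variant was applied.
    """
    n_docs = len(corpus_tokens)
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))  # count each term once per document
    tf = Counter(doc_tokens)
    return {
        term: (count / len(doc_tokens)) * math.log((1 + n_docs) / (1 + df[term]))
        for term, count in tf.items()
    }


def extend_keywords(neural_a, neural_b, doc_tokens, corpus_tokens, tagset, k=10):
    """Merge two neural keyword lists, then fill up to k with TF-IDF tagset matches.

    neural_a / neural_b are hypothetical stand-ins for the two taggers' outputs.
    """
    # Order-preserving union of both neural predictions.
    merged = list(dict.fromkeys(neural_a + neural_b))
    scores = tfidf_scores(doc_tokens, corpus_tokens)
    # Only terms that appear in the known tagset and are not yet selected.
    candidates = sorted(
        (t for t in scores if t in tagset and t not in merged),
        key=lambda t: (-scores[t], t),
    )
    return (merged + candidates)[:k]


# Toy data, invented for this example.
corpus = [
    ["government", "budget", "tax"],
    ["football", "match", "goal"],
    ["government", "election"],
]
doc = ["government", "tax", "tax", "tax", "reform", "today"]
tagset = {"tax", "reform", "election"}

result = extend_keywords(["government"], ["government", "budget"], doc, corpus, tagset, k=4)
print(result)  # ['government', 'budget', 'tax', 'reform']
```

Restricting the TF-IDF candidates to a known tagset keeps the extension step from suggesting frequent but untagged words (here, "today"), while still adding keywords that neither neural model returned, which is where a recall gain would come from.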


