
Extending Neural Keyword Extraction with TF-IDF tagset matching

by Boshko Koloski, et al.

Keyword extraction is the task of identifying words (or multi-word expressions) that best describe a given document; in news portals, keywords serve to link articles on similar topics. In this work we develop and evaluate our methods on four novel data sets covering less-represented, morphologically rich languages in the European news media industry (Croatian, Estonian, Latvian and Russian). First, we evaluate two supervised neural transformer-based methods (TNT-KID and BERT+BiLSTM CRF) and compare them to a baseline unsupervised TF-IDF-based approach. Next, we show that by combining the keywords retrieved by both neural transformer-based methods and extending the final set of keywords with an unsupervised TF-IDF-based technique, we can drastically improve the recall of the system, making it suitable for use as a recommendation system in a media house environment.
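The combination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, the cut-off `k`, and the toy data are all assumptions made for illustration, and the neural predictions are simply passed in as lists. The idea shown is to take the union of the keywords from two extractors and, if fewer than `k` keywords result, fill the remaining slots with terms from a known tagset that occur in the document, ranked by TF-IDF.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems would lemmatize
    # morphologically rich languages before matching).
    return text.lower().split()


def tfidf_scores(doc_tokens, corpus_tokens):
    """Score each term of one document by TF-IDF against a small corpus."""
    n_docs = len(corpus_tokens)
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))  # document frequency: count each term once per doc
    tf = Counter(doc_tokens)
    return {
        term: (count / len(doc_tokens)) * math.log((1 + n_docs) / (1 + df[term]))
        for term, count in tf.items()
    }


def extend_keywords(neural_a, neural_b, doc, corpus, tagset, k=5):
    """Union two keyword lists, then pad up to k with TF-IDF-ranked tagset terms."""
    # Order-preserving union of both neural outputs.
    combined = list(dict.fromkeys(neural_a + neural_b))
    scores = tfidf_scores(tokenize(doc), [tokenize(d) for d in corpus])
    # Candidates: document terms that are in the tagset and not yet selected.
    candidates = sorted(
        (t for t in scores if t in tagset and t not in combined),
        key=lambda t: (-scores[t], t),
    )
    for term in candidates:
        if len(combined) >= k:
            break
        combined.append(term)
    return combined


# Hypothetical toy data, not taken from the paper's data sets.
corpus = [
    "the parliament passed the budget",
    "the team won the match",
    "the budget vote split the parliament",
]
tagset = {"parliament", "budget", "match", "vote"}
result = extend_keywords(
    ["budget"], ["budget", "election"], corpus[0], corpus, tagset, k=5
)
print(result)
```

Padding only up to a fixed `k` reflects the recall-oriented goal stated in the abstract: extra candidates are cheap for an editor to reject, whereas missing keywords cannot be recovered.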




Keywords lie far from the mean of all words in local vector space

Keyword extraction is an important document process that aims at finding...

No Keyword is an Island: In search of covert associations

This paper describes how corpus-assisted discourse analysis based on key...

Comparing the hierarchy of keywords in on-line news portals

The tagging of on-line content with informative keywords is a widespread...

TNT-KID: Transformer-based Neural Tagger for Keyword Identification

With growing amounts of available textual data, development of algorithm...

Context-Based Quotation Recommendation

While composing a new document, anything from a news article to an email...

Exploiting Transformer-based Multitask Learning for the Detection of Media Bias in News Articles

Media has a substantial impact on the public perception of events. A one...