Novelty and Coverage in context-based information filtering

11/24/2018
by   Alexandra Dumitrescu, et al.

We present a collection of algorithms that filter a stream of documents so that the filtered documents cover a person's interests as well as possible. At any given time, the documents offered should be not only relevant but also diversified: they should avoid near-duplicates and should cover, as far as possible, all of the person's interests. We use a modification of the WEBSOM algorithm, with limited architectural adaptation, to create a user model (which we call the "user context" or simply the "context") based on a network of units laid out in the word space and trained on a collection of documents representative of the context. We introduce the concepts of novelty and coverage. Novelty is related to, but not identical to, the information retrieval concept of the same name: a document is novel if it belongs to a semantic area of interest to the person for which no documents have been seen in the recent past. A group of documents has coverage to the extent that it is a good representation of all the interests of the person. To increase coverage, we introduce an "interest" (or "urgency") factor for each unit of the user model, modulated by the scores of the incoming documents: the interest of a unit decreases drastically when a document arrives that belongs to its semantic area, and slowly recovers its initial value as long as no documents from that semantic area are displayed. Our tests show that these algorithms can effectively increase the coverage of the documents shown to the user without overly affecting precision.
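
The interest (urgency) mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the update rules, so the multiplicative drop, the linear recovery toward the initial value, the product of relevance and interest used as a score, and all names and constants (`InterestTracker`, `drop`, `recovery`) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class InterestTracker:
    """Toy per-unit interest factors (hypothetical; not from the paper)."""
    n_units: int
    initial: float = 1.0    # assumed initial interest of every unit
    drop: float = 0.1       # assumed factor applied when a unit's area is shown
    recovery: float = 0.2   # assumed fraction of the gap recovered per step
    interest: list = field(init=False)

    def __post_init__(self):
        self.interest = [self.initial] * self.n_units

    def score(self, unit: int, relevance: float) -> float:
        # Assumed combination rule: relevance modulated by current interest.
        return relevance * self.interest[unit]

    def step(self, shown_unit=None) -> None:
        """Advance one time step; shown_unit is the unit whose semantic
        area received a displayed document, or None if nothing was shown."""
        for u in range(self.n_units):
            if u == shown_unit:
                # Drastic decrease for the area that was just covered.
                self.interest[u] *= self.drop
            else:
                # Slow recovery toward the initial value.
                self.interest[u] += self.recovery * (self.initial - self.interest[u])


tracker = InterestTracker(n_units=3)
tracker.step(shown_unit=0)   # a document from unit 0's area is displayed
tracker.step()               # one step with nothing displayed
# A highly relevant document from the recently covered area now scores
# lower than a moderately relevant one from an area not yet covered.
print(tracker.score(0, 0.9) < tracker.score(1, 0.5))
```

Under these assumed rules, a unit whose area was just shown is temporarily penalized, which pushes the filter toward areas of the context that have not been represented recently; the drop and recovery constants trade coverage against precision.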


research
10/23/2017

Content Based Document Recommender using Deep Learning

With the recent advancements in information technology there has been a ...
research
01/06/2020

Semantic Sensitive TF-IDF to Determine Word Relevance in Documents

Keyword extraction has received an increasing attention as an important ...
research
11/26/2021

Predicting Document Coverage for Relation Extraction

This paper presents a new task of predicting the coverage of a text docu...
research
07/18/2017

Energy-Efficient Strip Monitoring by Identical Devices Directed to One Side Along the Strip and Having a Coverage Area in the Form of a Sector

In this paper, we propose several regular covers with identical sectors ...
research
03/20/2017

Automatic Text Summarization Approaches to Speed up Topic Model Learning Process

The number of documents available into Internet moves each day up. For t...
research
05/10/2021

Word-level Human Interpretable Scoring Mechanism for Novel Text Detection Using Tsetlin Machines

Recent research in novelty detection focuses mainly on document-level cl...
