Two Steps Feature Selection and Neural Network Classification for the TREC-8 Routing

07/11/2000
by Mathieu Stricker, et al.

For the TREC-8 routing task, one specific filter is built for each topic. Each filter is a classifier trained to recognize the documents that are relevant to its topic: when presented with a document, the classifier estimates the probability that the document is relevant to the topic for which it has been trained. Since the procedure for building a filter is topic-independent, the system is fully automatic.

Using a sample of documents that have previously been judged relevant or not relevant to a particular topic, the system performs a term selection and then trains a neural network. Each document is represented by a vector of the frequencies of a list of selected terms. This list depends on the topic to be filtered and is constructed in two steps: the first step identifies the characteristic words used in the relevant documents of the corpus; the second selects, from that list, the most discriminant ones. The length of the vector is optimized automatically for each topic. At the end of the term selection, a list of typically 25 words is defined for the topic, so that each document to be processed is represented by a vector of term frequencies.

This vector is then input to a classifier trained on the same sample. After training, the classifier estimates, for each document of a test set, its probability of being relevant; for submission to TREC, the top 1000 documents are ranked by decreasing probability of relevance.
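The two-step term selection and the document representation described above can be sketched in Python. The abstract does not state the criteria used in either step, so both are assumptions here: step 1 keeps the terms that occur in the largest number of relevant documents, and step 2 ranks those candidates by the difference between the fraction of relevant and the fraction of non-relevant training documents that contain them. The names `tokenize`, `select_terms`, and `vectorize` and the parameters `n_candidates` and `k` are hypothetical; `k=25` only echoes the typical vector length mentioned above, and the per-topic optimization of that length is not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def doc_freq(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def select_terms(relevant, non_relevant, n_candidates=200, k=25):
    """Two-step term selection over tokenized documents (lists of tokens).

    Both scoring criteria below are illustrative assumptions, not the
    criteria of the original system.
    """
    rel_df = doc_freq(relevant)
    non_df = doc_freq(non_relevant)

    # Step 1 (assumed): characteristic words of the relevant documents,
    # taken to be the terms occurring in the most relevant documents.
    by_rel_df = sorted(rel_df.items(), key=lambda item: (-item[1], item[0]))
    candidates = [term for term, _ in by_rel_df[:n_candidates]]

    # Step 2 (assumed): keep the most discriminant candidates, scored as the
    # gap between the fraction of relevant and of non-relevant documents
    # that contain the term (ties broken alphabetically).
    def score(term):
        p_rel = rel_df[term] / len(relevant)
        p_non = non_df[term] / max(len(non_relevant), 1)
        return p_rel - p_non

    return sorted(candidates, key=lambda t: (-score(t), t))[:k]


def vectorize(tokens, terms):
    """Relative frequency of each selected term in one tokenized document."""
    counts = Counter(tokens)
    length = max(len(tokens), 1)
    return [counts[t] / length for t in terms]
```

Dividing counts by document length (relative rather than raw frequencies) is another assumption; it keeps the classifier's inputs on a comparable scale for short and long documents.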

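The classification and ranking stage can be sketched as a small feed-forward network trained by stochastic gradient descent. The abstract does not specify the architecture, so the single hidden layer of sigmoid units, the cross-entropy loss, and the hyperparameters (`n_hidden`, `epochs`, `lr`) are assumptions, and `TinyMLP` and `rank` are hypothetical names. The sigmoid output unit yields a value in (0, 1) that is read as the estimated probability of relevance, and `rank` keeps the `top_n` highest-scoring documents, mirroring the 1000-document cutoff used for submission.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMLP:
    """One hidden layer of sigmoid units and one sigmoid output unit.

    Architecture and hyperparameters are illustrative assumptions.
    """

    def __init__(self, n_inputs, n_hidden=5, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def predict_proba(self, x):
        """Estimated probability that document vector x is relevant."""
        return self._forward(x)[1]

    def fit(self, X, y, epochs=2000, lr=0.5, seed=1):
        """Stochastic gradient descent on the cross-entropy loss."""
        rng = random.Random(seed)
        order = list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                x, t = X[i], y[i]
                h, p = self._forward(x)
                # Sigmoid output + cross-entropy: dLoss/d(output logit) = p - t.
                d_out = p - t
                for j, hj in enumerate(h):
                    # Backpropagate through hidden unit j using w2 before its update.
                    d_hid = d_out * self.w2[j] * hj * (1.0 - hj)
                    self.w2[j] -= lr * d_out * hj
                    for k, xk in enumerate(x):
                        self.w1[j][k] -= lr * d_hid * xk
                    self.b1[j] -= lr * d_hid
                self.b2 -= lr * d_out
        return self


def rank(model, doc_ids, vectors, top_n=1000):
    """Return (doc_id, probability) pairs, highest probability first."""
    scored = [(doc_id, model.predict_proba(v)) for doc_id, v in zip(doc_ids, vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

In use, one model per topic would be fitted on the training vectors with `TinyMLP(n_inputs=len(terms)).fit(X, y)`, where `y` holds 1 for relevant and 0 for non-relevant documents, and `rank(model, doc_ids, test_vectors)` would then produce the ranked list for that topic.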