C-DLSI: An Extended LSI Tailored for Federated Text Retrieval

10/05/2018
by Qijun Zhu, et al.

As the web expands in data volume and geographical distribution, centralized search methods become inefficient, which has led to growing interest in cooperative information retrieval such as federated text retrieval (FTR). Unlike existing centralized information retrieval (IR) methods, in which search is performed on a logically centralized document collection, an FTR system is composed of a number of peers, each of which is a complete search engine in itself. Processing a query in FTR takes two steps: first, identifying the promising peers that host relevant documents, and second, retrieving the most relevant documents from the selected peers. Most existing methods simply apply traditional IR techniques that treat each text collection as a single large document and use term matching to rank the collections. In this paper, we formalize the problem, identify the properties of FTR, and analyze the feasibility of extending latent semantic indexing (LSI) with clustering to adapt it to FTR. Based on this analysis, we propose a novel approach called Cluster-based Distributed Latent Semantic Indexing (C-DLSI). C-DLSI distinguishes the topics of a peer through clustering, captures the local LSI spaces within the clusters, and considers the relations among these LSI spaces, thus providing a more precise characterization of the peer. Accordingly, novel descriptors of the peers and a compatible local text retrieval method are proposed. The experimental results show that C-DLSI outperforms existing methods.
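The two-step query processing described above can be illustrated with a minimal sketch of the baseline the abstract mentions, in which each peer's collection is treated as a single large document and peers are ranked by term matching. This is not C-DLSI, whose descriptors are not specified in the abstract. The function names, the cosine scoring over raw term counts, and the toy peers are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; any tokenizer would do).
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def peer_descriptor(docs):
    # Baseline descriptor: the whole collection merged into one large document.
    big_doc = Counter()
    for doc in docs:
        big_doc.update(tokenize(doc))
    return big_doc


def select_peers(query, peers, k=1):
    # Step 1: rank peers by term matching against their descriptors.
    q = Counter(tokenize(query))
    scored = [(cosine(q, peer_descriptor(docs)), name)
              for name, docs in peers.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:k] if score > 0]


def local_retrieve(query, docs, n=2):
    # Step 2: rank documents inside one selected peer.
    q = Counter(tokenize(query))
    return sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))),
                  reverse=True)[:n]


def federated_search(query, peers, k=1, n=2):
    # Run both steps and return (peer, document) pairs.
    return [(name, doc)
            for name in select_peers(query, peers, k)
            for doc in local_retrieve(query, peers[name], n)]


# Hypothetical toy peers, invented for this example.
peers = {
    "peer_a": ["latent semantic indexing for text retrieval",
               "singular value decomposition of term matrices"],
    "peer_b": ["football match results",
               "tennis tournament schedule"],
}
```

Calling `federated_search("semantic text retrieval", peers)` first selects `peer_a` and then ranks its documents locally. Because the descriptor merges every topic of a peer into one vector, a peer covering several unrelated topics is characterized only coarsely, which is the limitation the abstract says C-DLSI addresses with per-cluster LSI spaces.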


