A novel model for query expansion using pseudo-relevant web knowledge

08/27/2019 · Hiteshwar Kumar Azad, et al. · NIT Patna

In the field of information retrieval, query expansion (QE) has long been used as a technique to deal with the fundamental issue of word mismatch between a user's query and the target information. In the context of the relationship between the query and expanded terms, existing weighting techniques often fail to appropriately capture the term-to-term relationship and the relationship of a term to the query as a whole, resulting in low retrieval effectiveness. Our proposed QE approach addresses this by proposing three weighting models based on (1) tf-itf, (2) k-nearest neighbor (kNN) based cosine similarity, and (3) correlation score. Further, to extract the initial set of expansion terms, we use pseudo-relevant web knowledge consisting of the top N web pages returned by three popular search engines, namely Google, Bing, and DuckDuckGo, in response to the original query. Among the three weighting models, tf-itf scores each of the individual terms obtained from the web content, kNN-based cosine similarity scores the expansion terms to capture the term-to-term relationship, and the correlation score weighs the selected expansion terms with respect to the whole query. The proposed model, called web knowledge based query expansion (WKQE), achieves an improvement of 25.89% in MAP over the unexpanded queries on the FIRE dataset. A comparative analysis of the WKQE techniques with other related approaches clearly shows significant improvement in retrieval performance. We have also analyzed the effect of varying the number of pseudo-relevant documents and expansion terms on the retrieval effectiveness of the proposed model.


1 Introduction

Present information retrieval (IR) systems, especially search engines, need to deal with the challenging issue of satisfying a user’s needs expressed by short queries. As per recent reports sta ; key , the most frequent queries consist of only one, two, or three words azad2019query – the same as twenty years ago, as reported by Lau and Horvitz lau1999patterns . While users continue to fire short queries, the number of web pages has grown exponentially merigo2018fifty . This has increased the ambiguity in finding relevant information due to the multiple meanings/senses of query terms, where indexers and users often do not use the same word. This is called the vocabulary mismatch problem furnas1987vocabulary . An effective strategy to resolve this issue is to use query expansion (QE) techniques. Query expansion reformulates the seed query by adding additional relevant terms with similar meaning. The selection of these expansion terms plays a crucial role in QE because only a small subset of the candidate expansion terms is actually relevant to the query liu2017multi . Current commercial search engines do a remarkably good job of interpreting these short queries; however, their results can be further improved by using additional external knowledge, obtained by combining their search results, to expand the initial queries.

The data source used for mining the expansion terms plays an important role in QE. A variety of data sources have been explored for mining the expansion terms. These terms may be extracted from an entire document corpus or from a few top-ranked retrieved documents in response to the seed query. A comprehensive survey on data sources used for QE has been provided by Carpineto and Romano carpineto2012survey and Azad and Deepak azad2019query . Broadly, such sources can be classified into four classes: (i) documents used in the retrieval process (e.g., corpus), (ii) hand-built knowledge resources (e.g., WordNet (https://wordnet.princeton.edu/), ConceptNet (http://conceptnet5.media.mit.edu/), thesaurus, ontology), (iii) external text collections and resources (e.g., Web, Wikipedia, DBpedia), and (iv) hybrid data sources (e.g., combinations of two or more data sources). Among these data sources, external text collections and resources are a popular choice – even more so in the recent past – for expanding the user’s seed query lucchese2018efficient ; dalton2014entity ; bendersky2012effective ; yin2009query . This is because they cover a very diverse range of topics and are curated and regularly updated by a large number of contributors, e.g., Wikipedia (https://www.wikipedia.org/) and DBpedia (https://wiki.dbpedia.org/). In external text collections and resources, the web lucchese2018efficient ; bendersky2012effective , Wikipedia dalton2014entity ; almasri2013wikipedia , DBpedia anand2015empirical , query logs yin2009query ; cui2002probabilistic , and anchor texts dang2010query ; kraft2004mining are the most common and effective data sources for QE. While external text collections like the web and Wikipedia cover a diverse range of topics and are regularly updated, a key challenge is to mine these huge data sources for candidate expansion terms.

Once a set of candidate expansion terms is determined, weighting models are used to assess the significance of these terms in relation to the original query. Finally, a few of the candidate expansion terms are selected for inclusion in the expanded query. This selection of the final set of expansion terms is an important factor in QE because only a small set of expansion terms is actually relevant to the seed query. The selected expansion terms should be well related to the individual terms of the seed query (term-term relationship) as well as to the seed query as a whole.

This article focuses on query expansion using web knowledge collections from three different search engines, namely Google (https://www.google.co.in/), Bing (https://www.bing.com/), and DuckDuckGo (https://duckduckgo.com/). The Web is the most up-to-date and diversified source of information. This naturally motivates its use as a source for mining expansion terms. Further, popular commercial search engines like Google, Bing, and DuckDuckGo have reasonably mastered the complex techniques needed to mine the Web. They can extract information that is not only relevant to the original query but also provides a rich set of terms semantically similar to the original query. Hence, this combination of commercial search engines and the Web appeals as the perfect tool to mine expansion terms. In our proposed model, we use pseudo-relevance feedback (PRF) to accumulate the web content of the top URLs returned by these search engines in response to the seed query. This accumulated web content is then used as an external data source to mine candidate expansion terms.

Pseudo-relevance feedback is an effective strategy in QE to improve the overall retrieval performance of an IR system lv2011boosting ; xu2009query . It assumes that the top-ranked documents returned in response to the seed query are relevant for mining expansion terms, and uses these ‘feedback’ documents as a source for selecting potentially related terms. In our work, we use the text content of the web pages at the top-ranked URLs returned by search engines in response to the seed query. However, the expansion terms provided by PRF-based methods may not have a one-to-one relationship with the original query terms, resulting in false expansion terms and causing topic drift. To address this term-relationship problem, we propose a novel web knowledge based query expansion (WKQE) approach. The proposed WKQE approach uses three modified versions of weighting techniques: (1) tf-itf, (2) k-nearest neighbor (kNN) based cosine similarity, and (3) correlation score. Among these three weighting scores, tf-itf and kNN based cosine similarity score the expansion terms to determine the term-term relationship, and the correlation score weighs the selected candidate expansion terms with respect to the whole query.

Experimental results show that the proposed WKQE approach produces consistently better results for a variety of queries (Query IDs 126 to 175 in the FIRE dataset) when compared with the baseline and other related state-of-the-art approaches. We have also analyzed the effect on the retrieval effectiveness of the proposed technique of varying the number of pseudo-relevant documents and expansion terms. The experiments were carried out on the FIRE dataset using popular weighting models and evaluation metrics.

1.1 Research contributions

The research contributions of this paper are as follows:

  • Data sources: This paper presents a novel web knowledge-based query expansion (WKQE) approach that uses the top pseudo-relevant web pages returned by the commercial search engines Google, Bing, and DuckDuckGo as data sources. While the Web is the most diversified and up-to-date source of information for mining candidate expansion terms, commercial search engines are perfect interfaces to the Web. To the best of our knowledge, based on our literature survey, this is the first use of PRF-based web search results as an external data source for QE.

  • Weighting method: To appropriately capture the term relationships while weighing the candidate expansion terms, three modified versions of weighting techniques have been proposed: (1) tf-itf, (2) k-nearest neighbor (kNN) based cosine similarity, and (3) correlation score.

    A two-level strategy is employed to select the final set of expansion terms: first, an intermediate set of terms is selected on the basis of the term-to-term relationship using the tf-itf and kNN based cosine similarity approaches, and then the final set of terms is selected on the basis of the term-to-whole-query relationship using the correlation score.

  • Experimental results: Experimental evaluation on the Forum for Information Retrieval Evaluation (FIRE) dataset produced an improvement of 25.89% on the MAP score and 30.83% on the GMAP score over the unexpanded queries. Retrieval effectiveness of the proposed technique was also evaluated by varying the number of pseudo-relevant documents and expansion terms.

1.2 Organization

The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 describes the proposed approach. Section 4 discusses the experimental setup, describing the dataset, model parameters, and evaluation metrics in subsections. Section 5 discusses the experimental results. Finally, we conclude in Section 6.

2 Related Work

Query expansion has a long history in the information retrieval literature. It was first introduced by Maron et al. maron1960relevance in the 1960s for literature indexing and searching in a mechanized library system. In 1971, Rocchio rocchio1971relevance brought QE to the spotlight through the relevance feedback method and its characterization in a vector space model. While this was the first use of the relevance feedback method, Rocchio’s method is still used for QE in its original and modified forms. The availability of several standard text collections (e.g., the Text Retrieval Conference (TREC, http://trec.nist.gov/) and the Forum for Information Retrieval Evaluation (FIRE, http://fire.irsi.res.in/)) and IR platforms (e.g., Terrier (http://terrier.org/) and Apache Lucene (http://lucene.apache.org/)) has been very instrumental in evaluating progress in this area in a systematic way. Carpineto and Romano carpineto2012survey and Azad and Deepak azad2019query present state-of-the-art comprehensive surveys on QE. This article focuses on web based QE techniques.

In web based QE techniques, web pages lucchese2018efficient ; bendersky2012effective , Wikipedia articles azad2019new ; dalton2014entity ; almasri2013wikipedia , query logs yin2009query ; cui2002probabilistic , and anchor texts dang2010query ; kraft2004mining are the most common and effective data sources for QE. A common way to utilize web pages related to the user query as a data source for query expansion is to use the text snippets of those web pages that are returned by web search engine(s) in response to the user query. Such a snippet is typically a brief window of text extracted by a search engine around the query term in a web page related to the user query. Snippets are short summaries of the corresponding web pages and are expected to be highly related to the user query, and hence a rich data source for query expansion. Sahami et al. sahami2006web used snippets returned by the Google search engine (http://www.google.com/apis) to extract semantically similar terms for QE. For each query, their approach collects snippets from the search engine and represents each snippet as a tf-idf weighted term vector. However, a widely accepted flaw of using snippets is that, due to the massive scale of the web and the large number of documents in the result set, only those snippets that are extracted from the top-ranked results can be processed efficiently for improving retrieval performance. To address this flaw, Bollegala et al. bollegala2007measuring considered the text snippets and page counts returned by the Google search engine. Based on this, they defined various similarity scores based on page counts for finding relevant expansion terms. Another work based on snippets is by Riezler et al. riezler2008translating , where the authors proposed a query-to-snippet translation model for improving QE. Their model uses user queries and snippets of clicked results to train a machine translation model. This establishes the relationship between query and document space to resolve the lexical gap. Yin et al. yin2009query used search engine query logs, snippets, and search result documents for QE. Their method expresses the search engine query log as a bipartite query-URL graph, where query nodes are connected to URL nodes by click edges; it reported an improvement in retrieval effectiveness of more than 10%.

With the fast growing size of the Web and the increasing use of search engines, the abundance of query logs and their ease of use have made them an important source for QE. Query logs usually contain user queries and the corresponding URLs of web pages visited by the user in response to the query results. Here, different users may submit various queries to express the same information need. Therefore, the query can be expanded by using the wisdom of the crowd. Cui et al. cui2002probabilistic used query logs to extract probabilistic correlations between query terms and document terms. These correlations are further used for expanding the user’s initial query. They extended their work in cui2003query to improve upon their results when compared with QE based on pseudo-relevance feedback. One of the advantages of using query logs is that they implicitly incorporate relevance feedback. On the other hand, it has been shown by White et al. white2005study that implicit measurements are relatively good; however, their performance may not be the same for all types of users and search operations.

Based on web search query logs, two types of QE approaches are usually used. The first type extracts features from related queries stored in the logs, with or without making use of their respective retrieval results huang2003relevant ; yin2009query . Among techniques based on this first approach, some use the combined retrieval results huang2009analyzing , while some do not (e.g., huang2003relevant ; yin2009query ). In the second type of approach, features are extracted from the relational behavior of queries and retrieval results. For example, Baeza et al. baeza2011extracting represent queries in a graph based vector space model (query-click bipartite graph) and analyze the graph constructed using the query logs. Under the second approach, the expansion terms are extracted in several ways: through user clicks xue2004optimizing ; yin2009query ; hua2013clickage , directly from the clicked results cui2003query ; riezler2007statistical ; cao2008context , from the top results of past query terms entered by the user fitzpatrick1997automatic ; wang2007learn , and from queries associated with the same documents billerbeck2003query ; wang2008mining . The second type of approach is more widely used and has been shown to provide better results.

In the context of web-based knowledge, anchor texts can play a role similar to the user’s search queries because an anchor text pointing to a page can serve as a brief summary of its content. Anchor texts were first used by McBryan mcbryan1994genvl for associating hyperlinks with linked pages as well as with the pages in which the anchor texts are found. Kraft and Zien kraft2004mining also used anchor texts for QE; their experimental results suggest that anchor texts can be used to improve traditional QE based on query logs. Similarly, Dang and Croft dang2010query suggested that anchor text could be an effective alternative to query logs. They demonstrated the effectiveness of QE techniques using log-based stemming through experiments on a standard TREC collection.

Another popular approach in web-based knowledge is the use of Wikipedia articles, titles, and hyperlinks (in-links and out-links) arguello2008document ; almasri2013wikipedia ; azad2019new for QE. Wikipedia is the largest encyclopedia freely available on the Web, where articles are regularly updated and new ones are added every day. These features make it an ideal knowledge source for QE. Recently, quite a few research works have used it for QE (e.g., li2007improving ; arguello2008document ; xu2009query ; aggarwal2012query ; almasri2013wikipedia ). Li et al. li2007improving performed an investigation using Wikipedia in which they retrieved all the articles corresponding to the original query and used them as a source of expansion terms for pseudo-relevance feedback. They observed that for those queries where general pseudo-relevance feedback failed to improve the query, Wikipedia-based pseudo-relevance feedback improved them significantly. Xu et al. xu2009query utilized Wikipedia to categorize the original query into three types: (1) ambiguous queries (queries with terms having more than one potential meaning), (2) entity queries (queries having a specific sense that cover a narrow topic), and (3) broader queries (queries that are neither ambiguous nor specific). They consolidated the expansion terms into the original query and evaluated these techniques using language modeling IR. Al-Shboul and Myaeng al2014wikipedia attempted to enrich the initial queries using semantic annotations in Wikipedia articles combined with phrase disambiguation. Their experiments show better results than a relevance-based language model.

3 Proposed Approach

During query expansion, the first important decision is the choice of source for mining candidate expansion terms. The top-ranked documents retrieved in response to the initial query appear to be a good source for mining candidate expansion terms. In the context of pseudo-relevance feedback, these documents form the set of pseudo-relevant documents. Our proposed approach, called web knowledge based query expansion (WKQE) and shown in Fig. 1, is a pseudo-relevance feedback based technique, where the pseudo-relevant documents consist of the web pages of the top URLs returned by three popular search engines, namely Google, Bing, and DuckDuckGo, in response to the initial query. The motivation for doing so has already been discussed in Sec. 1. The relevant terms found in the collection of these pseudo-relevant documents are used for QE. Sometimes a particular search engine may not provide the results that the user exactly intended. For example, consider the top ten search results on the query term apple as returned by the three search engines. While Google and Bing provide results interpreting apple only as a company, DuckDuckGo offers results interpreting the query term both as a company and as the fruit. So, to diversify the senses of the expansion terms, we select three popular search engines instead of just one.

Both term-to-term and term-to-whole-query relationships are computed for finding the most relevant terms in the set of candidate expansion terms. To estimate the term-to-term relationship, we weigh the expansion terms with the proposed tf-itf and kNN based cosine similarity scores. To estimate the term-to-whole-query relationship, we weigh the expansion terms with the correlation score. As shown in Fig. 1, the proposed approach consists of five main steps: (i) retrieval of top URLs, (ii) text collection and tokenization, (iii) weighting with tf-itf, (iv) weighting with kNN-based approach, and (v) reweighting with correlation score. These steps are described next.

Figure 1: Steps involved in the proposed approach

3.1 Retrieval of top URLs

In order to expand the initial query, we first fired the initial query on three popular search engines, namely Google, Bing, and DuckDuckGo. After that, we extracted the web pages corresponding to the top N URLs returned by each of the search engines separately. These web pages act as the set of pseudo-relevant documents for our approach. When considering the top URLs, we excluded the URLs associated with advertising, video, and e-commerce sites. For experimental evaluation, we considered different values of N: 5, 10, 15, 20, 25, 30, and 50. The proposed model showed the best performance for N = 20. See Sec. 5.1 for more details.
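As a minimal illustration of this step, the Python sketch below collects and filters per-engine URL lists. Here `search_engine_results` is a hypothetical stand-in for the engines' result interfaces (not part of our system), and the excluded-domain patterns are only indicative.

```python
# Sketch of Step (i): collect the top-N result URLs per engine and filter them.
# search_engine_results() is a hypothetical stub standing in for querying
# Google, Bing, or DuckDuckGo; a real implementation would call their APIs.
from urllib.parse import urlparse

EXCLUDED = ("youtube.com", "amazon.", "ebay.", "doubleclick.net")  # indicative only

def search_engine_results(engine: str, query: str, n: int) -> list:
    """Hypothetical: return the top-n result URLs of `engine` for `query`."""
    raise NotImplementedError

def top_urls(query: str, n: int = 20) -> dict:
    results = {}
    for engine in ("google", "bing", "duckduckgo"):
        urls = []
        for url in search_engine_results(engine, query, 3 * n):
            host = urlparse(url).netloc.lower()
            if any(pattern in host for pattern in EXCLUDED):
                continue  # skip advertising, video, and e-commerce sites
            urls.append(url)
            if len(urls) == n:
                break
        results[engine] = urls
    return results
```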

3.2 Text collection and tokenization

The entire content of the pseudo-relevant web pages, corresponding to the top URLs, is not informative. A web page usually has different types of content that are either not related to the topic of the web page or are not useful for the purpose of query expansion. Such items can be:

  • Decoration: Pictures, animations, logos, etc. used for attracting attention or for advertising purposes.

  • Navigation: Intra and inter hyperlinks to guide the user in different parts of a web page.

  • Interaction: Forms to collect user information or provide search services.

  • Other special words or paragraphs such as copyrights and contact information.

In the context of query matching and term weighting, all of the items mentioned above are considered noise and can adversely affect retrieval performance. For example, if some words from an advertisement embedded in a top-ranked pseudo-relevant web page are chosen as expansion terms, irrelevant documents containing these advertising terms can be ranked highly. Therefore, it is necessary to filter out the noisy information in web pages and retrieve only the content that is semantically related to the initial query. The Document Object Model (DOM, https://www.w3.org/TR/WD-DOM/introduction.html) represents the structure of an HTML page as a tag tree. DOM-based segmentation approaches have been widely used for many years. In our proposed WKQE approach, we use these page segmentation methods, where the tags that can represent useful blocks in a page include <p> (for paragraph), <h1>–<h6> (for headings), <table> (for table), <ul>/<ol> (for list), etc. To extract the relevant text content from the web pages (HTML and XML documents), we used the Beautiful Soup (https://www.crummy.com/software/BeautifulSoup/) Python library and web services.
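The following sketch shows how such DOM-based filtering might look with Beautiful Soup; the tag list mirrors the content blocks named above, and the helper name is ours.

```python
# Sketch of Step (ii), part 1: keep only text from content-bearing tags.
from bs4 import BeautifulSoup

CONTENT_TAGS = ["p", "h1", "h2", "h3", "h4", "h5", "h6", "table", "ul", "ol"]

def extract_content(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # Drop decoration, navigation, and interaction blocks outright.
    for tag in soup(["script", "style", "nav", "form", "footer", "img"]):
        tag.decompose()
    # Collect text from the DOM segments that carry the page's topic.
    blocks = [el.get_text(" ", strip=True) for el in soup.find_all(CONTENT_TAGS)]
    return "\n".join(b for b in blocks if b)
```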

After collecting the text content of the web pages returned by the three search engines, we created a combined corpus of all the web pages. Then, we tokenized the corpus to identify individual words. For tokenization and stop-word removal, we used the Natural Language Toolkit (NLTK, https://www.nltk.org/). The part-of-speech (POS) tags assigned by the NLTK tagger were used to identify phrases and individual words. After extracting the individual terms, we weighted them with tf-itf. The weighting of the expansion terms is described next.
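A minimal sketch of this tokenization step with NLTK, assuming the standard NLTK data packages have been downloaded:

```python
# Sketch of Step (ii), part 2: tokenize, drop stopwords, and POS-tag.
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time downloads: nltk.download("punkt"), nltk.download("stopwords"),
# nltk.download("averaged_perceptron_tagger")
STOP = set(stopwords.words("english"))

def tokenize(corpus_text: str) -> list:
    tokens = [t.lower() for t in word_tokenize(corpus_text) if t.isalpha()]
    tokens = [t for t in tokens if t not in STOP]
    # POS tags help separate candidate phrases from individual words.
    return nltk.pos_tag(tokens)
```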

3.3 Weighting with tf-itf

We weighted the individual terms with tf-itf, which is a modified version of tf-idf. This weight is a statistical measure used to evaluate how important a word is in a corpus. The term frequency (tf) measures how frequently a term occurs in the corpus. While computing tf, all terms are considered equally important. However, certain terms in a corpus, such as “is”, “of”, and “that”, may appear many times but have little importance. Thus we need to weight down such frequent terms while scaling up the importance of the rare ones. This is achieved by computing the inverse term frequency (itf). We weight the individual terms with the tf-itf scoring function as follows:

(1)   $\text{tf-itf}(t) = tf(t, C) \times itf(t, C)$, with $itf(t, C) = \log \frac{N_C}{n_t}$

where:
$tf(t, C)$ denotes the term frequency of term $t$ in the entire corpus $C$,
$itf(t, C)$ denotes the inverse term frequency of $t$ in the entire corpus $C$,
$N_C$ is the number of terms in the entire corpus $C$, and
$n_t$ is the number of times term $t$ appears in the entire corpus $C$.

After assigning this score to each of the terms in the corpus, we ranked the terms by their scores. Then, we selected the top-ranked individual terms as intermediate candidate expansion terms. These intermediate candidate expansion terms were then re-weighted with the kNN-based cosine similarity score, which is described next.
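A small sketch of this tf-itf ranking, under the itf definition assumed in Eq. (1), i.e., itf(t, C) = log(N_C / n_t):

```python
# Sketch of Step (iii): tf-itf scoring and ranking of corpus terms (Eq. (1)).
import math
from collections import Counter

def tf_itf_scores(tokens: list) -> list:
    counts = Counter(tokens)          # n_t: occurrences of each term t in C
    n_corpus = len(tokens)            # N_C: total number of terms in C
    scores = {t: n * math.log(n_corpus / n) for t, n in counts.items()}
    # Rank terms by score; the top-ranked ones become intermediate candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```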

3.4 Weighting with kNN-based approach

The kNN-based approach weights the intermediate candidate expansion terms with cosine similarity and selects the top k nearest neighbor candidate expansion terms. It establishes a term-term correlation among the candidate expansion terms so that the most relevant expansion terms can be chosen. The proposed kNN-based approach is an extension of the technique given by Roy et al. roy2016using . Here, instead of computing the nearest neighbors for each query term, we computed the nearest neighbors of each intermediate candidate expansion term extracted in response to the original query. The highly similar neighbors have comparatively lower drift in terms of similarity than the terms occurring later in the list. Since the most similar terms are the strongest contenders for becoming the expansion terms, it can be assumed that these terms are also similar to each other, in addition to being similar to the query term. Based on this, we use an iterative process, described in Algorithm 1, for sorting the expansion terms.

Input:   T — set of intermediate candidate expansion terms sorted using Eq. (1).
         k — number of nearest neighbor candidate expansion terms to be returned.
         d — number of terms to be dropped during each iteration.
Output:  NN — set of iteratively added nearest neighbor candidate expansion terms.
1:  Initialisation: NN ← ∅; i ← 1                      # n is the number of iterations.
2:  Select t* ∈ T having maximum score
3:  while i ≤ n do
4:      NN ← NN ∪ {t*}                                 # Add t* to NN.
5:      T ← T \ {t*}                                   # Remove the term t* from T.
6:      Sort T w.r.t. t* using the cosine similarity score of Eqn. (2)
7:      Select t′ ∈ T having maximum score
8:      Select L as the set of d least scoring terms in T
9:      T ← T \ L                                      # Remove the set of d least similar neighbors from T.
10:     t* ← t′                                        # Nearest neighbor for the next iteration.
11:     i ← i + 1
12: end while
13: Select H as the set of k − |NN| highest scoring terms in T
14: NN ← NN ∪ H
15: return NN                                          # Final set of k nearest neighbor candidate expansion terms.
Algorithm 1 k-Nearest Neighbors

Algorithm 1 takes as input the sorted (using Eq. (1)) set of candidate expansion terms obtained from the previous step, denoted as T. First, the expansion term having the maximum similarity score in T is identified as t* (line 2 in Algorithm 1). Term t* acts as the nearest neighbor for the first iteration of the iterative process (lines 6–9). At the start of each iteration, the nearest neighbor t* is added to the set of nearest neighbors NN, which is initialized as empty (line 1), and removed from the set of intermediate candidate expansion terms T (lines 4 and 5). The terms in T are then sorted based on their proximity to t*, computed on the basis of cosine similarity. The cosine similarity between two terms t_1 and t_2 is given as:

(2)   $Sim_{cos}(t_1, t_2) = \frac{\sum_{j} w_{t_1,j} \, w_{t_2,j}}{\sqrt{\sum_{j} w_{t_1,j}^2} \, \sqrt{\sum_{j} w_{t_2,j}^2}}$

where:
$Sim_{cos}(t_1, t_2)$ denotes the correlation score of term $t_1$ with $t_2$ in the entire corpus, and
$w_{t_1,j}$ is the weight of term $t_1$ in the document $d_j$ (as returned by one of the three search engines). $w_{t_1,j}$ is computed as follows ($w_{t_2,j}$ is similarly defined):

(3)   $w_{t_1,j} = tf(t_1, d_j) \times itf(t_1, d_j)$

where:
$tf(t_1, d_j)$ denotes the term frequency of $t_1$ in the document $d_j$,
$itf(t_1, d_j)$ is the inverse term frequency of $t_1$ in the document $d_j$,
$|d_j|$ is the number of distinct terms in the document $d_j$, and
$N$ is the number of terms in the entire collection.

Then, the d least similar neighbors are removed from the intermediate set of candidate expansion terms T. This completes the first iteration. This iterative process is executed for n iterations. In each iteration, the nearest neighbors list is rearranged on the basis of the nearest neighbor obtained from the previous iteration, and a set of d least similar neighbors is eliminated. Essentially, by following the above procedure, we are compelling the nearest neighbors to be similar to each other in addition to being similar to the original query.

A high value of n may lead to query drift, while a low value performs essentially the same as the initial set of intermediate candidate expansion terms. In our proposed method, we chose a fixed number of iterations n. Finally, at the end of the n iterations, the top k nearest neighbor candidate expansion terms are returned as the final set of nearest neighbor candidate expansion terms (lines 13 and 14). These intermediate candidate expansion terms are then reweighted using the correlation score (described next) and the top-ranked terms are chosen as the final set of expansion terms.
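The sketch below follows our reconstruction of Algorithm 1; the names k, d, and n_iter correspond to the assumed input symbols, and `weights[t]` stands for the per-document weight vector of term t from Eqs. (2)–(3).

```python
# Sketch of Step (iv): iterative kNN selection (reconstruction of Algorithm 1).
import math

def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors (Eq. (2))."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_select(ranked_terms, weights, k=15, d=2, n_iter=5):
    """ranked_terms: intermediate candidates sorted by tf-itf (best first).
    weights[t]: per-document weight vector of term t (Eqs. (2)-(3)).
    Assumes enough candidates remain (len(T) > d + 1) in every iteration."""
    T = list(ranked_terms)
    NN = []
    t_star = T[0]                       # term with the maximum tf-itf score
    for _ in range(n_iter):
        NN.append(t_star)               # add the nearest neighbor to NN
        T.remove(t_star)                # and remove it from T
        # Re-sort T by cosine similarity to the current nearest neighbor.
        T.sort(key=lambda t: cosine(weights[t], weights[t_star]), reverse=True)
        del T[-d:]                      # drop the d least similar terms
        t_star = T[0]                   # nearest neighbor for next iteration
    NN.extend(T[:max(0, k - len(NN))])  # top up to k with the best remaining
    return NN[:k]
```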

3.5 Reweighting with correlation score

So far, a set of candidate expansion terms has been obtained, where each expansion term is strongly connected to the other individual candidate expansion terms. These terms have been weighted using tf-itf and the kNN-based approach. The steps so far resolve the issue of semantic similarity at the level of the term-to-term relationship. However, this may not accurately reflect the relationship of an expansion term to the query as a whole. For example, while the word “program” may be highly associated with the word “computer”, using this association for selecting candidate expansion terms may work well for some queries such as “Java program” and “application program” but not for others such as “space program”, “TV program”, and “government program”. This problem has been analyzed in Bai et al. bai2007using . To address this language ambiguity problem, we use a weighting scheme called the correlation score. A similar approach has been suggested in Xu and Croft xu1996query , Cui et al. cui2003query , Sun et al. sun2006mining , and Azad and Deepak azad2019new . This approach extends the term-to-term association methods described previously in Sections 3.3 and 3.4. We compute the correlation score of a given candidate expansion term with each query term and then combine these scores to find the correlation with the initial query Q.

The correlation score is defined as follows. Let $Q$ be the original query having individual terms $q_i$ and let $t$ be a candidate expansion term. Then, the correlation score of $t$ with $Q$, denoted $C(t, Q)$, is computed as:

(4)   $C(t, Q) = \sum_{q_i \in Q} c(t, q_i) = \sum_{q_i \in Q} \sum_{j} w_{t,j} \, w_{q_i,j}$

where:
$c(t, q_i)$ is the correlation (similarity) score between the candidate expansion term $t$ and the query term $q_i$, and
$w_{q_i,j}$ is the weight of term $q_i$ in the document $d_j$.
The weight of the candidate expansion term $t$ in the document $d_j$, denoted $w_{t,j}$ ($w_{q_i,j}$ is similarly defined), is computed as:

(5)   $w_{t,j} = tf(t, d_j) \times itf(t, d_j)$

where:
$tf(t, d_j)$ denotes the term frequency of $t$ in the document $d_j$,
$itf(t, d_j)$ is the inverse term frequency of $t$ in the document $d_j$,
$|d_j|$ is the number of distinct terms in the document $d_j$, and
$N$ is the number of terms in the entire collection.

After assigning the correlation score to the candidate expansion terms, we collect the top-ranked terms as the final set of expansion terms (the top 15 in our experiments; see Sec. 4.2).
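A sketch of this final reweighting step, assuming (as in our reconstruction of Eq. (4)) that the term-to-query correlation sums the per-document weight products over all query terms:

```python
# Sketch of Step (v): correlation score of Eq. (4) and final term selection.
def correlation_score(term_weights, query_terms, candidate):
    """term_weights[t][j]: weight of term t in document d_j (Eq. (5))."""
    w_t = term_weights[candidate]
    score = 0.0
    for q in query_terms:
        w_q = term_weights.get(q, [])
        score += sum(a * b for a, b in zip(w_t, w_q))  # c(t, q_i)
    return score

def final_expansion_terms(term_weights, query_terms, candidates, top_m=15):
    ranked = sorted(candidates,
                    key=lambda t: correlation_score(term_weights, query_terms, t),
                    reverse=True)
    return ranked[:top_m]
```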

4 Experimental Setup

This section discusses the evaluation of the proposed WKQE approach. Section 4.1 describes the dataset used, followed by a discussion of the model parameters in Sec. 4.2. Section 4.3 describes the evaluation metrics used.

4.1 Dataset

To evaluate the proposed technique, experiments were carried out on a large number of queries (or topics) from the well-known benchmark FIRE (http://fire.irsi.res.in/) dataset. As real-life queries are short, we used only the title field of each query. Table 1 shows the details of the FIRE test collections used in our investigation. These datasets consist of a very large set of documents on which IR is done, a set of queries (called topics), and the right answers (called relevance judgments) stating the relevance of documents to the corresponding topic(s). Specifically, the FIRE ad hoc dataset consists of a large collection of newswire articles from two sources, The Telegraph (https://www.telegraphindia.com/) and BDnews24 (https://bdnews24.com/), provided by the Indian Statistical Institute, Kolkata, India.

Corpus  Disk / Source        Size     # of docs  Query IDs
FIRE    FIRE 2011 (English)  1.76 GB  392,577    126–175
Table 1: Details of experimental corpora

4.2 Model parameters

In order to investigate the optimal values of the parameters, we explored different numbers of top-ranked feedback documents, i.e., N ∈ {5, 10, 15, 20, 25, 30, 50}, from the three search engines (Google, Bing, and DuckDuckGo). We found that our proposed model performed best with the top 20 feedback documents; hence, we chose the top 20 feedback documents to expand the initial query in our experiments. We also explored different numbers of expansion terms, from 5 to 50, to evaluate the model performance. Our proposed model performed best with the top 15 candidate expansion terms; hence, we chose the top 15 candidate expansion terms to reformulate the original query in our experiments.

We used the TERRIER (http://terrier.org/) retrieval system for all our experimental evaluations. We used the title field of the topics in the test collections. For indexing the documents, stopwords were first removed and then Porter’s stemmer was used for stemming. All experimental evaluations are based on the unigram word assumption, i.e., all documents and queries in the corpus are indexed using single terms; we did not use any phrase or positional information. To compare the effectiveness of our expansion technique, we used the following weighting models: the BM25 model of Okapi robertson1996okapi , IFB2, a probabilistic divergence from randomness (DFR) model amati2002probabilistic , Laplace’s law of succession I(n)L2 good1965estimation , the Divergence from Independence model DPH amati2008fub , the log-logistic DFR model LGD clinchant2010information , and the standard tf-idf model. The parameters for these models were set to the default values in TERRIER.

4.3 Evaluation Metrics

We evaluated the results on standard evaluation metrics: MAP (Mean Average Precision), GM_MAP (Geometric Mean Average Precision), F-Measure, P@5, P@10, P@20, and P@30 (P@n denotes precision at the top n ranks), bpref (binary preference), and the overall recall (number of relevant documents retrieved). The evaluation metric MAP reflects the overall performance of the system, P@5 measures the precision over the top 5 documents retrieved, and bpref measures a preference relation on how many documents judged as relevant are ranked before the documents judged as irrelevant. Additionally, we report the percentage improvement in MAP over the baseline (unexpanded query) for each expansion method and other related methods. We have also investigated the retrieval effectiveness of the proposed technique with varying numbers of expansion terms.
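For reference, the sketch below computes P@k and average precision for a single ranked list the way these metrics are conventionally defined; it is an illustration, not the evaluation code used in our experiments (which relies on TERRIER's standard tooling).

```python
# Illustrative metric helpers: P@k and average precision for one query.
def precision_at_k(ranked_docs, relevant, k):
    hits = sum(1 for doc in ranked_docs[:k] if doc in relevant)
    return hits / k

def average_precision(ranked_docs, relevant):
    hits, ap = 0, 0.0
    for i, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            ap += hits / i            # precision at each relevant rank
    return ap / len(relevant) if relevant else 0.0
# MAP is the arithmetic mean of average_precision over all queries;
# GM_MAP uses the geometric mean instead.
```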

5 Evaluation Results

The objective of our experiments is to explore the effectiveness of the proposed Web Knowledge based QE (WKQE) approach and to compare it with the baseline as well as existing state-of-the-art methods on popular weighting models and evaluation metrics. The proposed WKQE approach can be categorized into seven different techniques, namely GQE (Google-based query expansion), BQE (Bing-based query expansion), DQE (DuckDuckGo-based query expansion), GBQE (Google-Bing-based query expansion), GDQE (Google-DuckDuckGo-based query expansion), BDQE (Bing-DuckDuckGo-based query expansion), and GBDQE (Google-Bing-DuckDuckGo-based query expansion). We compared each technique with the baseline (unexpanded query) as well as with the existing state-of-the-art methods. We found that the proposed GBDQE approach gives the best results among these techniques.

Tables 2–8 show the comparative retrieval performance of the proposed WKQE approach using popular weighting models, with respect to the evaluation metrics MAP, GM_MAP, P@10, P@20, P@30, and the number of relevant documents retrieved. The tables show that the proposed WKQE approach is compatible with the existing popular weighting models and significantly improves the retrieval performance over the unexpanded query. They also show the relative percentage improvements (within parentheses) on various standard evaluation metrics measured against the unexpanded query. In all cases, the MAP improvement is more than 4.84%, and the maximal improvement achieved by our proposed QE technique is up to 25.89%. Based on the results presented in Tables 2–8, we can say that, on all evaluation parameters, the proposed QE technique performs well with all weighting models with respect to the baseline approach of the unexpanded query.

Table 2 presents the comparative analysis of QE using Google alone (GQE) on different popular weighting models with the top 15 expansion terms. In the best case, GQE improved the MAP by up to 22.35% and GM_MAP by up to 29.08%. The improvement in precision reaches its best value of 25.14% at P@10. Overall, GQE produced the best results among the individual query expansion techniques (i.e., GQE, BQE, and DQE), even better than some of the combined QE techniques (e.g., BDQE and GDQE).

Model Performance Without Query Expansion
Method MAP GM_MAP P@10 P@20 P@30 #rel_ret
IFB2 0.2765 0.1907 0.3660 0.3560 0.3420 2330
LGD 0.2909 0.1974 0.4100 0.3710 0.3420 2309
I(n)L2 0.2979 0.2023 0.4280 0.3900 0.3553 2322
DPH 0.3133 0.2219 0.4540 0.4040 0.3653 2338
TF_IDF 0.3183 0.2261 0.4560 0.4010 0.3707 2340
BM25 0.3163 0.2234 0.4600 0.3970 0.3660 2343
Model Performance With QE using Google alone
IFB2 0.3383 (22.35%) 0.2427 (27.27%) 0.4580 (25.14%) 0.4240 (19.10%) 0.4013 (17.34%) 2558 (9.79%)
LGD 0.3530 (21.34%) 0.2548 (29.08%) 0.4680 (14.15%) 0.4330 (16.71%) 0.4080 (19.30%) 2550 (10.44%)
I(n)L2 0.3492 (17.22%) 0.2518 (24.47%) 0.4780 (11.68%) 0.4270 (9.49%) 0.4000 (12.58%) 2566 (10.51%)
DPH 0.3622 (15.6%) 0.2650 (19.42%) 0.4820 (6.17%) 0.4420 (9.4%) 0.4200 (14.97%) 2572 (10.0%)
TF_IDF 0.3573 (12.25%) 0.2620 (15.88%) 0.4580 (0.44%) 0.4360 (8.73%) 0.4133 (11.49%) 2556 (9.23%)
BM25 0.3549 (12.2%) 0.2601 (16.43%) 0.4620 (0.43%) 0.4350 (9.57%) 0.4060 (10.93%) 2555 (9.05%)
Table 2: Comparison of QE using Google alone (GQE) on popular models with top 15 expansion terms on the FIRE Dataset. The best result for each method has been highlighted in bold.

Table 3 shows the corresponding comparative analysis of QE using Bing alone (BQE). Here, MAP is improved by up to 21.16% and GM_MAP by up to 27.63%. For the BM25 weighting model in particular, the precision over the top 10 retrieved documents (P@10) is reduced by 1.32%. The main reason behind this P@10 reduction is the low availability of the expansion terms in the top 10 retrieved documents. Overall, BQE showed better retrieval performance in comparison to DQE, BDQE, and GDQE.

Model Performance Without Query Expansion
Method MAP GM_MAP P@10 P@20 P@30 #rel_ret
IFB2 0.2765 0.1907 0.3660 0.3560 0.3420 2330
LGD 0.2909 0.1974 0.4100 0.3710 0.3420 2309
I(n)L2 0.2979 0.2023 0.4280 0.3900 0.3553 2322
DPH 0.3133 0.2219 0.4540 0.4040 0.3653 2338
TF_IDF 0.3183 0.2261 0.4560 0.4010 0.3707 2340
BM25 0.3163 0.2234 0.4600 0.3970 0.3660 2343
Model Performance With QE using Bing alone
IFB2 0.3350 (21.16%) 0.2434 (27.63%) 0.4360 (19.12%) 0.4140 (16.29%) 0.4013 (17.34%) 2466 (5.84%)
LGD 0.3428 (17.84%) 0.2376 (20.36%) 0.4520 (10.24%) 0.4360 (17.52%) 0.4027 (17.75%) 2436 (5.50%)
I(n)L2 0.3355 (12.62%) 0.2342 (15.77%) 0.4480 (4.67%) 0.4170 (6.92%) 0.3993 (12.38%) 2431 (4.69%)
DPH 0.3497 (11.62%) 0.2477 (11.63%) 0.4640 (2.20%) 0.4440 (9.90%) 0.4133 (13.14%) 2451 (4.83%)
TF_IDF 0.3482 (9.39%) 0.2472 (9.33%) 0.4660 (2.19%) 0.4410 (9.97%) 0.4173 (12.57%) 2450 (4.70%)
BM25 0.3434 (8.57%) 0.2421 (8.37%) 0.4540 (1.32%) 0.4340 (9.32%) 0.4073 (11.28%) 2442 (4.22%)
Table 3: Comparison of QE using Bing alone (BQE) on popular models with top 15 expansion terms on the FIRE Dataset. The best result for each method has been highlighted in bold.

Table 4 shows the corresponding results for DuckDuckGo-based QE (DQE). DQE improved the MAP by up to 13.03% and GM_MAP by up to 13.06%. DQE showed its best retrieval effectiveness with the top 10 expansion terms instead of the top 15 used in the other proposed expansion techniques. Overall, it showed less improvement than the other proposed expansion techniques.

Model Performance Without Query Expansion
Method MAP GM_MAP P@10 P@20 P@30 #rel_ret
IFB2 0.2765 0.1907 0.3660 0.3560 0.3420 2330
LGD 0.2909 0.1974 0.4100 0.3710 0.3420 2309
I(n)L2 0.2979 0.2023 0.4280 0.3900 0.3553 2322
DPH 0.3133 0.2219 0.4540 0.4040 0.3653 2338
TF_IDF 0.3183 0.2261 0.4560 0.4010 0.3707 2340
BM25 0.3163 0.2234 0.4600 0.3970 0.3660 2343
Model Performance With QE using DuckDuckGo alone
IFB2 0.3066 (10.89%) 0.2156 (13.06%) 0.4340 (18.58%) 0.3980 (11.80%) 0.3747 (9.56%) 2491 (6.90%)
LGD 0.3288 (13.03%) 0.2229 (12.92%) 0.4560 (11.22%) 0.4110 (10.78%) 0.3907 (14.24%) 2470 (6.97%)
I(n)L2 0.3297 (10.67%) 0.2293 (13.35%) 0.4480 (4.67%) 0.4240 (8.72%) 0.3873 (9.00%) 2495 (7.45%)
DPH 0.3380 (7.88%) 0.2359 (6.31%) 0.4580 (0.88%) 0.4290 (6.19%) 0.4007 (9.69%) 2498 (6.84%)
TF_IDF 0.3344 (5.05%) 0.2352 (4.02%) 0.4480 (1.79%) 0.4260 (6.23%) 0.4000 (7.90%) 2489 (6.37%)
BM25 0.3316 (4.84%) 0.2343 (4.88%) 0.4420 (4.07%) 0.4250 (7.05%) 0.3947 (7.84%) 2499 (6.66%)
Table 4: Comparison of QE using DuckDuckGo alone (DQE) on popular models with top 10 expansion terms on the FIRE Dataset. The best result for each method has been highlighted in bold.

Table 5 shows the comparative results for the combined approach involving Bing and DuckDuckGo, called Bing-DuckDuckGo-based QE (BDQE). BDQE improved the MAP by up to 17.53% and GM_MAP by up to 22.54%. The BDQE technique shows better retrieval effectiveness than the DQE technique. However, it failed to improve the retrieval effectiveness when compared to the other combined QE techniques.

Model Performance Without Query Expansion
Method MAP GM_MAP P@10 P@20 P@30 #rel_ret
IFB2 0.2765 0.1907 0.3660 0.3560 0.3420 2330
LGD 0.2909 0.1974 0.4100 0.3710 0.3420 2309
I(n)L2 0.2979 0.2023 0.4280 0.3900 0.3553 2322
DPH 0.3133 0.2219 0.4540 0.4040 0.3653 2338
TF_IDF 0.3183 0.2261 0.4560 0.4010 0.3707 2340
BM25 0.3163 0.2234 0.4600 0.3970 0.3660 2343
Model Performance With QE using Bing-DuckDuckGo-based QE
IFB2 0.3221 (16.49%) 0.2278 (19.45%) 0.4540 (24.04%) 0.4200 (17.98%) 0.3913 (14.42%) 2478 (6.35%)
LGD 0.3419 (17.53%) 0.2419 (22.54%) 0.4680 (14.15%) 0.4290 (25.81%) 0.4100 (19.88%) 2465 (6.76%)
I(n)L2 0.3428 (15.07%) 0.2448 (21.01%) 0.4540 (6.07%) 0.4280 (9.74%) 0.4040 (13.71%) 2475 (6.59%)
DPH 0.3481 (11.11%) 0.2505 (12.89%) 0.4620 (1.76%) 0.4380 (8.42%) 0.4207 (15.17%) 2495 (6.72%)
TF_IDF 0.3410 (7.13%) 0.2456 (8.62%) 0.4660 (2.19%) 0.4360 (8.73%) 0.4080 (10.06%) 2476 (5.81%)
BM25 0.3382 (6.92%) 0.2431 (8.82%) 0.4680 (1.74%) 0.4320 (8.82%) 0.4020 (9.84%) 2470 (5.42%)
Table 5: Comparison of QE using Bing-DuckDuckGo-based QE (BDQE) on popular models with top 15 expansion terms on the FIRE Dataset. The best result for each method has been highlighted in bold.

Table 6 shows the comparative analysis for the combined approach involving Google and DuckDuckGo, called Google-DuckDuckGo-based QE (GDQE), on different weighting models with the top 15 expansion terms. The GDQE approach improved the MAP by up to 21.12% and GM_MAP by up to 26.11%. This approach showed better retrieval effectiveness than DQE and BDQE.

Model Performance Without Query Expansion
Method MAP GM_MAP P@10 P@20 P@30 #rel_ret
IFB2 0.2765 0.1907 0.3660 0.3560 0.3420 2330
LGD 0.2909 0.1974 0.4100 0.3710 0.3420 2309
I(n)L2 0.2979 0.2023 0.4280 0.3900 0.3553 2322
DPH 0.3133 0.2219 0.4540 0.4040 0.3653 2338
TF_IDF 0.3183 0.2261 0.4560 0.4010 0.3707 2340
BM25 0.3163 0.2234 0.4600 0.3970 0.3660 2343
Model Performance With QE using Google-DuckDuckGo-based QE
IFB2 0.3349 (21.12%) 0.2405 (26.11%) 0.4580 (25.14%) 0.4310 (21.07%) 0.4067 (18.92%) 2478 (6.48%)
LGD 0.3482 (19.70%) 0.2472 (25.23%) 0.4580 (11.71%) 0.4350 (17.25%) 0.4053 (18.51%) 2480 (7.40%)
I(n)L2 0.3474 (16.61%) 0.2460 (21.60%) 0.4600 (7.48%) 0.4260 (9.23%) 0.4053 (14.07%) 2494 (7.41%)
DPH 0.3557 (13.53%) 0.2581 (16.31%) 0.4780 (5.28%) 0.4400 (8.91%) 0.4193 (14.78%) 2503 (7.06%)
TF_IDF 0.3495 (9.80%) 0.2524 (11.63%) 0.4720 (3.51%) 0.4220 (5.24%) 0.4147 (11.87%) 2497 (6.71%)
BM25 0.3467 (9.61%) 0.2499 (11.86%) 0.4680 (1.74%) 0.4270 (7.56%) 0.4087 (11.67%) 2495 (6.49%)
Table 6: Comparison of QE using Google-DuckDuckGo-based QE (GDQE) on popular models with top 15 expansion terms on the FIRE Dataset. The best result for each method has been highlighted in bold.

Table 7 shows the corresponding results for the combined approach involving Google and Bing, called Google-Bing-based QE (GBQE). It improved the MAP by up to 22.64% and GM_MAP by up to 29.16%. The GBQE technique showed better retrieval effectiveness than all the other proposed techniques except GBDQE.

Model Performance Without Query Expansion
Method MAP GM_MAP P@10 P@20 P@30 #rel_ret
IFB2 0.2765 0.1907 0.3660 0.3560 0.3420 2330
LGD 0.2909 0.1974 0.4100 0.3710 0.3420 2309
I(n)L2 0.2979 0.2023 0.4280 0.3900 0.3553 2322
DPH 0.3133 0.2219 0.4540 0.4040 0.3653 2338
TF_IDF 0.3183 0.2261 0.4560 0.4010 0.3707 2340
BM25 0.3163 0.2234 0.4600 0.3970 0.3660 2343
Model Performance With QE using Google-Bing-based QE
IFB2 0.3391 (22.64%) 0.2463 (29.16%) 0.4600 (25.68%) 0.4260 (19.66%) 0.4065 (18.86%) 2561 (9.91%)
LGD 0.3539 (21.66%) 0.2535 (28.41%) 0.4700 (14.63%) 0.4360 (17.52%) 0.4092 (19.65%) 2559 (10.83%)
I(n)L2 0.3501 (17.52%) 0.2522 (24.67%) 0.4790 (11.92%) 0.4291 (10.02%) 0.4021 (13.17%) 2568 (10.59%)
DPH 0.3631 (15.90%) 0.2672 (20.41%) 0.4832 (6.43%) 0.4429 (9.63%) 0.4227 (15.71%) 2575 (9.90%)
TF_IDF 0.3577 (12.38%) 0.2635 (16.54%) 0.4595 (0.77%) 0.4372 (9.03%) 0.4142 (11.73%) 2562 (9.49%)
BM25 0.3554 (12.36%) 0.2621 (17.32%) 0.4640 (0.87%) 0.4359 (9.80%) 0.4071 (11.23%) 2559 (9.22%)
Table 7: Comparison of QE using Google-Bing-based QE (GBQE) on popular models with top 15 expansion terms on the FIRE Dataset. The best result for each method has been highlighted in bold.

Table 8 shows the comparative analysis of the query expansion approach combining all three search engines, called Google-Bing-DuckDuckGo-based QE (GBDQE). It improved the MAP by up to 25.89% and GM_MAP by up to 30.83%. The proposed GBDQE approach showed the best retrieval performance among all the proposed expansion techniques.

Model Performance Without Query Expansion
Method MAP GM_MAP P@10 P@20 P@30 #rel_ret
IFB2 0.2765 0.1907 0.3660 0.3560 0.3420 2330
LGD 0.2909 0.1974 0.4100 0.3710 0.3420 2309
I(n)L2 0.2979 0.2023 0.4280 0.3900 0.3553 2322
DPH 0.3133 0.2219 0.4540 0.4040 0.3653 2338
TF_IDF 0.3183 0.2261 0.4560 0.4010 0.3707 2340
BM25 0.3163 0.2234 0.4600 0.3970 0.3660 2343
Model Performance With QE using Google-Bing-DuckDuckGo-based QE
IFB2 0.3481 (25.89%) 0.2495 (30.83%) 0.4610 (25.96%) 0.4289 (20.48%) 0.4092 (19.65%) 2591 (11.20%)
LGD 0.3552 (22.10%) 0.2539 (28.62%) 0.4713 (14.95%) 0.4392 (18.38%) 0.4099 (19.85%) 2563 (11.11%)
I(n)L2 0.3519 (18.13%) 0.2531 (25.11%) 0.4810 (12.38%) 0.4309 (10.49%) 0.4052 (14.04%) 2572 (10.76%)
DPH 0.3640 (16.18%) 0.2681 (20.82%) 0.4841 (6.63%) 0.4438 (9.85%) 0.4238 (16.01%) 2581 (10.39%)
TF_IDF 0.3583 (12.57%) 0.2649 (17.16%) 0.4611 (1.11%) 0.4381 (9.25%) 0.4157 (12.14%) 2566 (9.66%)
BM25 0.3569 (12.84%) 0.2638 (18.08%) 0.4651 (1.10%) 0.4367 (10.00%) 0.4082 (11.53%) 2568 (9.60%)
Table 8: Comparison of QE using Google-Bing-DuckDuckGo-based QE (GBDQE) on popular models with top 15 expansion terms on the FIRE Dataset. The best result for each method has been highlighted in bold.

Figure 2 compares all the proposed WKQE techniques with the baseline (unexpanded query) in terms of MAP, bpref, and F-Measure; the IFB2 weighting model was used as the baseline. It can be observed that all WKQE techniques achieved a significant improvement over the baseline, with GBDQE in particular outperforming the other WKQE techniques. GBDQE improved the MAP by 25.89%, bpref by 24.09%, and F-Measure by 47.25% over the baseline on the FIRE dataset. As it performed best, we chose the GBDQE technique for the comparative analysis with other state-of-the-art approaches.

Figure 2: Comparative analysis of WKQE techniques with baseline.

Figure 3 shows the comparative analysis of the precision-recall curves of the WKQE techniques against the baseline (i.e., unexpanded query); the IFB2 weighting model was used as the baseline. This graph plots the interpolated precision of an IR system at the 11 standard recall cut-off values, i.e., {0, 0.1, 0.2, 0.3, …, 1.0}. Such averaged plots of retrieval results are widely used to evaluate IR systems that return ranked documents. Comparisons are best made in three recall ranges: 0 to 0.2, 0.2 to 0.8, and 0.8 to 1, which characterize high precision, middle recall, and high recall performance, respectively. We can see that the proposed WKQE techniques show significant improvement over the baseline.

Figure 3: Comparative analysis of precision-recall curve of the proposed WKQE techniques with baseline.
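For readers reproducing such plots, the following sketch computes the 11-point interpolated precision in the conventional way, i.e., the maximum precision at any recall level at or above each cut-off:

```python
# Sketch: 11-point interpolated precision from (recall, precision) points.
def interpolated_precision_11pt(precision_recall_pairs):
    """precision_recall_pairs: list of (recall, precision) points for a run."""
    cutoffs = [i / 10 for i in range(11)]   # 0.0, 0.1, ..., 1.0
    return [max((p for r, p in precision_recall_pairs if r >= c), default=0.0)
            for c in cutoffs]
```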

Graphs in Fig. 4 show the improvement in retrieval results of the GBDQE technique when compared with the initial unexpanded queries. As indicated in the legend, (Ex) denotes performance with query expansion, while no parenthesis denotes the unexpanded query. The P-R curves show the effectiveness of the proposed GBDQE technique with all the popular weighting models. Among them, the IFB2 weighting model provides the best retrieval performance with the proposed GBDQE technique.

Figure 4: Comparative analysis of the precision-recall curves of the GBDQE technique using popular weighting models (panels (a)–(f), one per model) on the FIRE dataset. In the legend, (Ex) denotes performance with query expansion, while no parenthesis denotes the unexpanded query.

Graphs in Fig. 5 compare the GBDQE technique with the unexpanded queries in terms of MAP, bpref, and P@5 using various weighting models on the FIRE dataset. Here, MAP shows the overall performance of the GBDQE technique, P@5 measures the precision over the top 5 documents retrieved, and bpref measures a preference relation on how many documents judged as relevant are ranked before the documents judged as irrelevant.

Figure 5: Comparative analysis of the GBDQE technique in terms of MAP, bpref, and P@5 (panels (a)–(c)) with various weighting models on the FIRE dataset.

After evaluating the performance of the proposed QE technique on several popular evaluation metrics, it can be concluded that the proposed WKQE techniques perform well with popular weighting models on several evaluation parameters. Therefore, the proposed WKQE techniques are effective in improving information retrieval results.

We have also compared our approach with the related state-of-the-art works of Parapar et al. parapar2014score and Singh and Saran singh2017term . Parapar et al. parapar2014score presented an approach to minimize the number of non-relevant documents in the pseudo-relevant set, since such unwanted documents adversely affect the selection of expansion terms. To automatically determine the number of documents to be selected for the pseudo-relevant set for each query, they studied the score distributions in the initial retrieval (i.e., the documents retrieved in response to the initial query). The goal of their study was to come up with a threshold score to differentiate between relevant and non-relevant documents. Singh and Saran’s singh2017term method combines co-occurrence, context-window, and semantic-similarity based approaches to select the best terms for query expansion. It uses a WordNet-based semantic similarity approach for ranking the expanded terms. The evaluation of their approach shows a significant improvement over the baseline. Finally, their paper suggests the use of context window based query expansion (CWBQE), co-occurrence and semantic based query expansion (CSBQE), and context window and semantic based query expansion (CWSBQE) to improve the retrieval effectiveness of an information retrieval system.

Table 9 presents a comparison of the proposed WKQE techniques with the models of Parapar et al. parapar2014score and Singh and Saran singh2017term in terms of mean average precision with the top 15 expansion terms on the FIRE dataset. We can observe that the retrieval effectiveness of the proposed GBDQE technique is better than that of Parapar et al.’s and Singh and Saran’s models, although both of those models perform better than QE using DuckDuckGo alone (DQE).

Data Set Methods MAP
FIRE Baseline (IFB2) 0.2765
Parapar et al. model parapar2014score 0.3178 (14.74%)
Singh and Saran model singh2017term 0.3286 (18.84%)
GDQE (Proposed) 0.3349 (21.12%)
BQE (Proposed) 0.3350 (21.16%)
GQE (Proposed) 0.3383 (22.35%)
GBQE (Proposed) 0.3391 (22.64%)
GBDQE (Proposed) 0.3481 (25.89%)
Table 9: Comparative analysis of the proposed WKQE techniques with Parapar et al. and Singh & Saran’s models in terms of MAP values.

Figure 6 compares the GBDQE technique in terms of MAP, GM_MAP, F-Measure, and P@10 with the baseline (IFB2), Parapar et al.’s model, and Singh and Saran’s model. It can be clearly seen that the proposed GBDQE technique achieves a significant improvement over Parapar et al.’s and Singh and Saran’s techniques.

Figure 6: Comparative analysis of GBDQE technique with baseline and other related approaches.

5.1 Performance variation with the number of pseudo-relevant documents

Owing to the high density of relevant documents among the top-ranked documents, it may be intuitive to deduce that a small number of top-ranked documents should suffice for query expansion as far as retrieval performance is concerned. However, as evident from the experimental results shown in Table 10, this is not always true.

Table 10 depicts the MAP value of the proposed model for each of the considered weighting methods. It can be clearly observed that, for all the considered methods, the retrieval performance of the proposed model increases as the number of pseudo-relevant documents is increased up to 20 (25 for the I(n)L2 method). This can be attributed to the fact that the lower-ranked pseudo-relevant documents also contain relevant expansion terms. For instance, selecting a very small number of pseudo-relevant documents for queries involving synonymy or polysemy may produce very poor retrieval performance, because the documents relevant to these queries may not exist in the considered subset of the pseudo-relevant documents. Moreover, with very few relevant documents, the IR system may not have enough information to extract all possible relevant expansion terms.

Further increases in the number of pseudo-relevant documents degrade the performance of the proposed model due to the addition of irrelevant terms from the lower-ranked documents. This observation was also made in roy2019estimating . In summary, choosing 20 pseudo-relevant documents strikes the best balance between admitting relevant and irrelevant expansion terms.

Model Performance vs. Pseudo-relevant Documents
Method 5 10 15 20 25 30 50
IFB2 0.3047 0.3168 0.3374 0.3481 0.3364 0.3281 0.3102
LGD 0.3066 0.3170 0.3396 0.3552 0.3472 0.3214 0.3113
I(n)L2 0.3111 0.3274 0.3401 0.3519 0.3521 0.3304 0.3203
DPH 0.3198 0.3311 0.3523 0.3640 0.3501 0.3224 0.3174
Tf-idf 0.3202 0.3374 0.3501 0.3583 0.3498 0.3289 0.3212
BM25 0.3217 0.3301 0.3472 0.3569 0.3507 0.3296 0.3109
Table 10: Effect of the number of pseudo-relevant documents on the performance (MAP) of the proposed GBDQE technique on the FIRE Dataset. The best result for each method has been highlighted in bold.

5.2 Performance Variation with Number of Expansion Terms

There are different points of view on the number of expansion terms to be chosen; the number of expansion terms can vary from one-third of the candidate terms to all of them azad2019query . Although it might not be realistic to use all of the expansion terms, a small set of expansion terms is usually better than a large one due to noise reduction salton1990improving . A limited number of expansion terms may also be important for reducing the response time, especially for a large corpus. However, several studies observed that the number of expansion terms is of low relevance and varies from query to query billerbeck2003query ; billerbeck2004questioning ; cao2008context .

In our experimental results, we report the model performance with the top 15 expansion terms because this setting gives better retrieval performance than the alternatives. We also varied the number of expansion terms from 5 to 50 in our proposed model; Table 11 shows the corresponding results. Across the weighting methods, the variation in performance with the number of expansion terms is limited. It can be clearly observed that the retrieval performance of the proposed model initially increases with the number of expansion terms, that the improvement continues up to 15 terms (20 for LGD), and that any subsequent increase adversely affects retrieval performance. Based on this, selecting the top 15 expansion terms gives the best results for our proposed approach; a minimal sketch of this selection step follows Table 11.

Model performance (MAP) vs. number of expansion terms
Method 5 10 15 20 25 30 50
IFB2 0.3067 0.3330 0.3481 0.3361 0.3211 0.3166 0.2983
LGD 0.3365 0.3423 0.3552 0.3561 0.3321 0.3283 0.3141
I(n)L2 0.3190 0.3375 0.3519 0.3448 0.3302 0.3211 0.3079
DPH 0.3462 0.3576 0.3640 0.3559 0.3495 0.3410 0.3321
Tf-idf 0.3466 0.3511 0.3583 0.3498 0.3376 0.3351 0.3227
BM25 0.3497 0.3461 0.3569 0.3463 0.3351 0.3211 0.3191
Table 11: Effect of the number of expansion terms on the performance (MAP) of the proposed GBDQE technique on the FIRE dataset. The best result for each method is highlighted in bold.
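
The selection step itself reduces to truncating the ranked candidate list, as in the following minimal sketch (not the authors' code); scored_terms is assumed to map each candidate term to its final WKQE weight, i.e., tf-itf combined with the kNN-based cosine similarity and the correlation score.

```python
def select_expansion_terms(scored_terms, k=15):
    """Keep the k highest-weighted candidates; k = 15 performed best here."""
    ranked = sorted(scored_terms.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:k]]

# Toy example with hypothetical weights: only the strongest candidates survive.
candidates = {"budget": 0.91, "tax": 0.84, "farmers": 0.80, "cricket": 0.12}
print(select_expansion_terms(candidates, k=3))  # ['budget', 'tax', 'farmers']
```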

Table 12 shows some examples of initial queries and the corresponding expansion terms obtained with the top three proposed approaches. It is interesting to note the differences between the sets of expansion terms returned by the different approaches; the sketch following the table illustrates one reason for these differences.

Query ID Original query Expansion terms obtained with GQE Expansion terms obtained with GBQE Expansion terms obtained with GBDQE
135 India’s agriculture-friendly central budget India, rs, government, farmers, scheme, union, minister, sector, finance, agricultural, tax, development, etc. Budget, india, government, rs, tax, agriculture, finance, farmers, union, minister, lakh, rural, sector, etc. Government, tax, finance, agriculture, union, minister, farmers, indian, health, rural, fiscal, etc.
140 Search for life and water in space Space, nasa, earth, scientists, science, ice, surface, moon, image, martian, search, planet, etc. Life, mars, earth, space, surface, science, planet, search, nasa, liquid, scientists, solar, etc. Life, water, mars, earth, space, science, planets, nasa, surface, planet, search, moon, scientists, solar, etc.
155 Attack on the Taj in Mumbai India, attacks, hotel, november, taj, indian, terrorists, police, people, pakistan, mahal, terror, etc. Taj, hotel, india, 2008, attacks, november, terrorists, indian, people, mahal, police, terror, palace, etc. Hotel, india, 2008, november, attack, indian, terrorist, people, mahal, police, palace, terror, etc.
161 George Bush ’s anti-terrorism operations War, bush, president, united, people, iraq, states, terrorism, world, terror, military, american, security, freedom, etc. War, bush, president, united, military, states, iraq, terrorism, terror, george, american, terrorist, world, security, administration, etc. War, president, united, states, military, iraq, american, terrorism, world, security, people, terror, policy, etc.
Table 12: Expansion terms obtained with the top three proposed approaches for selected queries on the FIRE dataset
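
One reason for these differences can be sketched as follows (an illustration, not the authors' code): each variant pools pseudo-relevant pages from a different subset of engines, so the candidate term statistics differ. Here fetch_top_pages is a hypothetical helper returning tokenized page texts, and pooled term frequency stands in for the full tf-itf weighting.

```python
from collections import Counter

def candidate_terms(query, engines, fetch_top_pages, n_pages=20, k=15):
    """Pool the top pages from the given engines and rank candidate terms."""
    pooled = Counter()
    for engine in engines:
        for page_tokens in fetch_top_pages(engine, query, n_pages):
            pooled.update(page_tokens)
    return [term for term, _ in pooled.most_common(k)]

# GQE pools Google only; GBQE adds Bing; GBDQE adds DuckDuckGo as well:
#   candidate_terms(q, ["google"], fetch)                        -> GQE
#   candidate_terms(q, ["google", "bing"], fetch)                -> GBQE
#   candidate_terms(q, ["google", "bing", "duckduckgo"], fetch)  -> GBDQE
```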

6 Conclusion

This paper has introduced a novel web knowledge based query expansion (WKQE) approach that draws expansion terms from the top pseudo-relevant documents collected from different search engines. Although there is no perfect solution to the vocabulary mismatch problem, the proposed WKQE approach overcomes the primary limitations of existing techniques in capturing the term-term and term-to-query relationships. To explore the relationship between the query and the expansion terms, the WKQE approach employs a three-level weighting strategy to select relevant expansion terms: first, the tf-itf weighting scheme scores the individual terms of the web content; next, kNN-based cosine similarity identifies the k-nearest-neighbor expansion terms of the initial query; and lastly, the correlation score relates the selected expansion terms to the whole query. The combination of web content extracted from three different search engines works well for selecting expansion terms, and the proposed WKQE techniques performed well with these terms across several weighting models. The approach also yielded better results than the baseline and other related state-of-the-art methods. This article has also investigated the retrieval performance of the proposed technique while varying the number of pseudo-relevant documents collected from the different search engines and the number of expansion terms. The results, based on multiple evaluation metrics and popular weighting models on the FIRE dataset, demonstrate the effectiveness of the proposed QE technique in the field of information retrieval.
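
To make the three-level strategy concrete, the following is a minimal, illustrative sketch in Python. The formulas are simplified stand-ins rather than the paper's exact definitions, and the sparse term vectors are assumed to be co-occurrence profiles built from the pseudo-relevant web content.

```python
import math
from collections import Counter

def tf_itf(term, corpus_terms):
    """Level 1: term frequency weighted by inverse term frequency
    over the pooled web content (simplified stand-in formula)."""
    counts = Counter(corpus_terms)
    tf = counts[term]
    itf = math.log(len(corpus_terms) / (1 + tf))
    return tf * itf

def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_neighbours(query_term_vec, candidate_vecs, k=30):
    """Level 2: keep the k candidates nearest to a query term."""
    ranked = sorted(candidate_vecs.items(),
                    key=lambda kv: cosine(query_term_vec, kv[1]), reverse=True)
    return [t for t, _ in ranked[:k]]

def correlation_score(candidate_vec, query_vecs):
    """Level 3: relate a candidate to the whole query, not just one term."""
    return sum(cosine(candidate_vec, qv) for qv in query_vecs)
```

In this sketch, a candidate term would be retained only if it both appears among the kNN neighbours of some query term and attains a high correlation score with the query as a whole.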

Acknowledgements

Akshay Deepak has been awarded the Young Faculty Research Fellowship (YFRF) of the Visvesvaraya PhD Programme of the Ministry of Electronics & Information Technology (MeitY), Government of India. In this regard, he would like to acknowledge that this publication is an outcome of the R&D work undertaken in a project under the Visvesvaraya PhD Scheme of the Ministry of Electronics & Information Technology, Government of India, being implemented by Digital India Corporation (formerly Media Lab Asia).

References

  • (1) Statista, Average number of search terms for online search queries in the United States as of August 2017, https://www.statista.com/statistics/269740/number-of-search-terms-in-internet-research-in-the-us/.
  • (2) Keyword Discovery, Query size by country, https://www.keyworddiscovery.com/keyword-stats.html.
  • (3) H. K. Azad, A. Deepak, Query expansion techniques for information retrieval: a survey, Information Processing & Management 56 (5) (2019) 1698–1735.
  • (4) T. Lau, E. Horvitz, Patterns of search: analyzing and modeling web query refinement, in: UM99 User Modeling, Springer, 1999, pp. 119–128.
  • (5) J. M. Merigó, W. Pedrycz, R. Weber, C. de la Sotta, Fifty years of information sciences: A bibliometric overview, Information Sciences 432 (2018) 245–268.
  • (6) G. W. Furnas, T. K. Landauer, L. M. Gomez, S. T. Dumais, The vocabulary problem in human-system communication, Communications of the ACM 30 (11) (1987) 964–971.
  • (7) J. Liu, C. Liu, Y. Huang, Multi-granularity sequence labeling model for acronym expansion identification, Information Sciences 378 (2017) 462–474.
  • (8) C. Carpineto, G. Romano, A survey of automatic query expansion in information retrieval, ACM Computing Surveys (CSUR) 44 (1) (2012) 1.
  • (9) C. Lucchese, F. M. Nardini, R. Perego, R. Trani, R. Venturini, Efficient and effective query expansion for web search, in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, ACM, 2018, pp. 1551–1554.
  • (10) J. Dalton, L. Dietz, J. Allan, Entity query feature expansion using knowledge base links, in: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, ACM, 2014, pp. 365–374.
  • (11) M. Bendersky, D. Metzler, W. B. Croft, Effective query formulation with multiple information sources, in: Proceedings of the fifth ACM international conference on Web search and data mining, ACM, 2012, pp. 443–452.
  • (12) Z. Yin, M. Shokouhi, N. Craswell, Query expansion using external evidence, in: European Conference on Information Retrieval, Springer, 2009, pp. 362–374.
  • (13) M. ALMasri, C. Berrut, J.-P. Chevallet, Wikipedia-based semantic query enrichment, in: Proceedings of the sixth international workshop on Exploiting semantic annotations in information retrieval, ACM, 2013, pp. 5–8.
  • (14) R. Anand, A. Kotov, An empirical comparison of statistical term association graphs with dbpedia and conceptnet for query expansion, in: Proceedings of the 7th Forum for Information Retrieval Evaluation, ACM, 2015, pp. 27–30.
  • (15) H. Cui, J.-R. Wen, J.-Y. Nie, W.-Y. Ma, Probabilistic query expansion using query logs, in: Proceedings of the 11th international conference on World Wide Web, ACM, 2002, pp. 325–332.
  • (16) V. Dang, B. W. Croft, Query reformulation using anchor text, in: Proceedings of the third ACM international conference on Web search and data mining, ACM, 2010, pp. 41–50.
  • (17) R. Kraft, J. Zien, Mining anchor text for query refinement, in: Proceedings of the 13th international conference on World Wide Web, ACM, 2004, pp. 666–674.
  • (18) Y. Lv, C. Zhai, W. Chen, A boosting approach to improving pseudo-relevance feedback, in: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, ACM, 2011, pp. 165–174.
  • (19) Y. Xu, G. J. Jones, B. Wang, Query dependent pseudo-relevance feedback based on wikipedia, in: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, ACM, 2009, pp. 59–66.
  • (20) M. E. Maron, J. L. Kuhns, On relevance, probabilistic indexing and information retrieval, Journal of the ACM (JACM) 7 (3) (1960) 216–244.
  • (21) J. J. Rocchio, Relevance feedback in information retrieval.
  • (22) H. K. Azad, A. Deepak, A new approach for query expansion using wikipedia and wordnet, Information Sciences 492 (2019) 147–163.
  • (23) M. Sahami, T. D. Heilman, A web-based kernel function for measuring the similarity of short text snippets, in: Proceedings of the 15th international conference on World Wide Web, ACM, 2006, pp. 377–386.
  • (24) D. Bollegala, Y. Matsuo, M. Ishizuka, Measuring semantic similarity between words using web search engines, WWW 7 (2007) 757–766.
  • (25) S. Riezler, Y. Liu, A. Vasserman, Translating queries into snippets for improved query expansion, in: Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1, Association for Computational Linguistics, 2008, pp. 737–744.
  • (26) H. Cui, J.-R. Wen, J.-Y. Nie, W.-Y. Ma, Query expansion by mining user logs, IEEE Transactions on knowledge and data engineering 15 (4) (2003) 829–839.
  • (27) R. W. White, I. Ruthven, J. M. Jose, A study of factors affecting the utility of implicit relevance feedback, in: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, 2005, pp. 35–42.
  • (28) C.-K. Huang, L.-F. Chien, Y.-J. Oyang, Relevant term suggestion in interactive web search based on contextual information in query session logs, Journal of the Association for Information Science and Technology 54 (7) (2003) 638–649.
  • (29) J. Huang, E. N. Efthimiadis, Analyzing and evaluating query reformulation strategies in web search logs, in: Proceedings of the 18th ACM conference on Information and knowledge management, ACM, 2009, pp. 77–86.
  • (30) R. Baeza-Yates, A. Tiberi, Extracting semantic relations from query logs, US Patent 7,895,235 (Feb. 22 2011).
  • (31) G.-R. Xue, H.-J. Zeng, Z. Chen, Y. Yu, W.-Y. Ma, W. Xi, W. Fan, Optimizing web search using web click-through data, in: Proceedings of the thirteenth ACM international conference on Information and knowledge management, ACM, 2004, pp. 118–126.
  • (32) X.-S. Hua, L. Yang, J. Wang, J. Wang, M. Ye, K. Wang, Y. Rui, J. Li, Clickage: towards bridging semantic and intent gaps via mining click logs of search engines, in: Proceedings of the 21st ACM international conference on Multimedia, ACM, 2013, pp. 243–252.
  • (33) S. Riezler, A. Vasserman, I. Tsochantaridis, V. Mittal, Y. Liu, Statistical machine translation for query expansion in answer retrieval, in: Annual Meeting-Association For Computational Linguistics, Vol. 45, 2007, p. 464.
  • (34) H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, H. Li, Context-aware query suggestion by mining click-through and session data, in: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2008, pp. 875–883.
  • (35) L. Fitzpatrick, M. Dent, Automatic feedback using past queries: social searching?, in: ACM SIGIR Forum, Vol. 31, ACM, 1997, pp. 306–313.
  • (36) X. Wang, C. Zhai, Learn from web search logs to organize search results, in: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, 2007, pp. 87–94.
  • (37) B. Billerbeck, F. Scholer, H. E. Williams, J. Zobel, Query expansion using associated queries, in: Proceedings of the twelfth international conference on Information and knowledge management, ACM, 2003, pp. 2–9.
  • (38) X. Wang, C. Zhai, Mining term association patterns from search logs for effective query reformulation, in: Proceedings of the 17th ACM conference on Information and knowledge management, ACM, 2008, pp. 479–488.
  • (39) O. A. McBryan, GENVL and WWWW: Tools for taming the web, in: Proceedings of the first international World Wide Web conference, Vol. 341, Geneva, 1994.
  • (40) J. Arguello, J. L. Elsas, J. Callan, J. G. Carbonell, Document representation and query expansion models for blog recommendation., ICWSM 2008 (0) (2008) 1.
  • (41) Y. Li, W. P. R. Luk, K. S. E. Ho, F. L. K. Chung, Improving weak ad-hoc queries using Wikipedia as external corpus, in: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, 2007, pp. 797–798.
  • (42) N. Aggarwal, P. Buitelaar, Query expansion using wikipedia and dbpedia., in: CLEF (Online Working Notes/Labs/Workshop), 2012.
  • (43) B. Al-Shboul, S.-H. Myaeng, Wikipedia-based query phrase expansion in patent class search, Information retrieval 17 (5-6) (2014) 430–451.
  • (44) D. Roy, D. Paul, M. Mitra, U. Garain, Using word embeddings for automatic query expansion, arXiv preprint arXiv:1606.07608.
  • (45) J. Bai, J.-Y. Nie, G. Cao, H. Bouchard, Using query contexts in information retrieval, in: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, 2007, pp. 15–22.
  • (46) J. Xu, W. B. Croft, Query expansion using local and global document analysis, in: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, 1996, pp. 4–11.
  • (47) R. Sun, C.-H. Ong, T.-S. Chua, Mining dependency relations for query expansion in passage retrieval, in: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, 2006, pp. 382–389.
  • (48) S. E. Robertson, S. Walker, M. Beaulieu, M. Gatford, A. Payne, Okapi at TREC-4, NIST Special Publication SP (1996) 73–96.
  • (49) G. Amati, C. J. Van Rijsbergen, Probabilistic models of information retrieval based on measuring the divergence from randomness, ACM Transactions on Information Systems (TOIS) 20 (4) (2002) 357–389.
  • (50) I. Good, The estimation of probabilities: An essay on modern Bayesian methods, pp. xi–xii (1965).
  • (51) G. Amati, G. Amodeo, M. Bianchi, C. Gaibisso, G. Gambosi, FUB, IASI-CNR and University of Tor Vergata at TREC 2008 blog track, Tech. rep., Fondazione Ugo Bordoni, Rome, Italy (2008).
  • (52) S. Clinchant, E. Gaussier, Information-based models for ad hoc IR, in: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, ACM, 2010, pp. 234–241.
  • (53) J. Parapar, M. A. Presedo-Quindimil, Á. Barreiro, Score distributions for pseudo relevance feedback, Information Sciences 273 (2014) 171–181.
  • (54) J. Singh, A. Sharan, M. Saini, Term co-occurrence and context window-based combined approach for query expansion with the semantic notion of terms, International Journal of Web Science 3 (1) (2017) 32–57.
  • (55) D. Roy, D. Ganguly, M. Mitra, G. J. Jones, Estimating Gaussian mixture models in the local neighbourhood of embedded word vectors for query performance prediction, Information Processing & Management 56 (3) (2019) 1026–1045.
  • (56) G. Salton, C. Buckley, Improving retrieval performance by relevance feedback, Journal of the American society for information science 41 (4) (1990) 288–297.
  • (57) B. Billerbeck, J. Zobel, Questioning query expansion: An examination of behaviour and parameters, in: Proceedings of the 15th Australasian database conference-Volume 27, Australian Computer Society, Inc., 2004, pp. 69–76.