Viewed broadly, information retrieval is about matching information objects against information needs. In the classical ad hoc document retrieval task, information objects are documents and information needs are expressed as keyword queries. This task has been a main focal point since the inception of the field. The past decade, however, has seen a move beyond documents as units of retrieval to other types of objects. Examples of object retrieval tasks studied at the Text REtrieval Conference (TREC) include ranking people (experts) [1, 4], blogs [10, 11], and verticals [5, 6]. Common to these tasks is that objects do not have direct representations that could be matched against the search query. Instead, they are associated with documents, which are used as a proxy to connect objects and queries. See Figure 1 for an illustration. The main question, then, is how to combine evidence from documents that are associated with a given object.
Most approaches that have been proposed for object retrieval fall into two main groups of retrieval strategies: (1) object-centric methods build a term-based representation of objects by aggregating term counts across the set of documents associated with each object; (2) document-centric methods first retrieve documents relevant to the query and then consider the objects associated with these documents. Viewed abstractly, the object retrieval task is about fusing or blending information about a given object. This fusion may happen early in the retrieval process, on the term level (i.e., object-centric methods), or later, on the document level (i.e., document-centric methods). Regardless of which strategy is used, two main shared components can be distilled: the underlying term-based retrieval model (e.g., language models, BM25, or DFR) and the document-object association method. Various instantiations (i.e., choices of retrieval strategy, retrieval model, and document-object associations) have been studied, but always in the context of a particular object retrieval task; see, e.g., [2, 7, 13, 9].
We show in this paper, as our main contribution, that further generalizations are possible. We present two design patterns for object retrieval, that is, general repeatable solutions that can easily emulate most previously proposed approaches. We call these design patterns to emphasize that they can be used in many different situations. The second contribution of this work is an experimental evaluation performed for three different object retrieval tasks: expert finding, blog distillation, and vertical ranking. Using standard TREC collections, we demonstrate that the early and late fusion patterns are indeed widely applicable and deliver competitive performance without resorting to any task-specific tailoring. The implementation of our models is available at http://bit.ly/ecir2017-fusion.
2 Fusion-Based Object Retrieval Methods
Object retrieval is the task of returning a ranked list of objects in response to a keyword query. We assume a scenario where objects do not have direct term-based representations, but each object is associated with one or more documents. These documents are used as a bridge between queries and objects. We present two design patterns, i.e., general retrieval strategies, in the following two subsections. Both strategies consider the relationship between a document and an object; we detail this element in Sect. 2.3.
2.1 Early Fusion
According to the early fusion (or object-centric) strategy, a term-based representation is created for each object; that is, the fusion happens on the term level. One can think of this approach as creating a pseudo document for each object; once these object description documents are created, they can be ranked using standard document retrieval models. We define the (pseudo) frequency of a term $t$ for an object $o$ as follows:
$$\tilde{f}(t, o) = \sum_{d} f(t, d) \, w(d, o) , \qquad (1)$$

where $f(t, d)$ is the frequency of the term $t$ in document $d$ and $w(d, o)$ denotes the document-object association weight. The relevance score of an object for a given query $q$ is then calculated by summing the individual query term scores:

$$\mathit{score}(q, o) = \sum_{t \in q} \mathit{score}(t, o; \theta) ,$$
where $\theta$ holds all parameters of the underlying retrieval model (e.g., $k_1$ and $b$ for BM25). For computing $\mathit{score}(t, o; \theta)$, any existing retrieval model can be used. Specifically, using language models with Jelinek-Mercer smoothing it is:

$$\mathit{score}_{LM}(t, o) = \log \Big( (1 - \lambda) \frac{\tilde{f}(t, o)}{|o|} + \lambda P(t|C) \Big) ,$$
where $|o|$ is the length of the object ($|o| = \sum_{t} \tilde{f}(t, o)$), $P(t|C)$ is the background language model, and $\lambda$ is the smoothing parameter. Using BM25, the term score is computed as:

$$\mathit{score}_{BM25}(t, o) = \mathit{IDF}(t) \cdot \frac{\tilde{f}(t, o) \, (k_1 + 1)}{\tilde{f}(t, o) + K} ,$$
where $K$ is computed as $k_1 (1 - b + b \, |o| / \mathit{avg}(|o|))$ and $\mathit{avg}(|o|)$ is the average object length.
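To make the early fusion pattern concrete, the following sketch (with a hypothetical toy corpus; the association weights and the smoothing parameter `LAMBDA` are illustrative choices, not values from the paper) aggregates per-document term counts into pseudo object representations via Eq. (1) and scores objects with the Jelinek-Mercer smoothed language model above:

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: document -> term frequencies f(t, d).
docs = {
    "d1": Counter({"search": 3, "fusion": 1}),
    "d2": Counter({"fusion": 2, "ranking": 2}),
    "d3": Counter({"ranking": 1, "search": 1}),
}
# Document-object association weights w(d, o); here uniform over each
# object's documents (o1 has two documents, o2 has one).
assoc = {
    ("d1", "o1"): 0.5, ("d2", "o1"): 0.5,
    ("d3", "o2"): 1.0,
}

def pseudo_tf(obj):
    """Eq. (1): pseudo frequency f~(t, o) = sum_d f(t, d) * w(d, o)."""
    tf = defaultdict(float)
    for (d, o), w in assoc.items():
        if o == obj:
            for t, f in docs[d].items():
                tf[t] += f * w
    return tf

# Background (collection) language model P(t | C).
coll = Counter()
for counts in docs.values():
    coll.update(counts)
coll_len = sum(coll.values())

LAMBDA = 0.1  # Jelinek-Mercer smoothing parameter (illustrative value)

def score_lm(obj, query):
    """Sum of log-smoothed term probabilities over the query terms."""
    tf = pseudo_tf(obj)
    olen = sum(tf.values())  # object length |o|
    score = 0.0
    for t in query:
        p = (1 - LAMBDA) * tf.get(t, 0.0) / olen + LAMBDA * coll[t] / coll_len
        score += math.log(p)
    return score

ranked = sorted(["o1", "o2"], key=lambda o: score_lm(o, ["fusion"]), reverse=True)
print(ranked)  # ['o1', 'o2']
```

Swapping in the BM25 term score is a matter of replacing `score_lm` while keeping `pseudo_tf` unchanged, which is exactly the modularity the design pattern is meant to provide.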
Table 1 lists existing approaches for different search tasks which can be classified as early fusion. Due to space constraints, we only highlight one specific method for each of the object ranking tasks we consider.
Table 1. Examples of early fusion approaches. Notice that the aggregation happens on the term level. (Computing the log probabilities turns the product into a summation over query terms.)

| Task | Example approach |
|---|---|
| Expert finding | Profile-based model |
| Blog distillation | Blogger model |
| Vertical ranking | CVV |
2.2 Late Fusion
Instead of creating a direct term-based representation for objects, the late fusion (or document-centric) strategy models and queries individual documents, then aggregates their relevance estimates. Formally:

$$\mathit{score}(q, o) = \sum_{d} \mathit{score}(q, d) \, w(d, o) , \qquad (2)$$

where $\mathit{score}(q, d)$ expresses the document's relevance to the query and can be computed using any existing document retrieval method, such as language models or BM25. As before, $w(d, o)$ is the weight of document $d$ for the given object $o$. The efficiency of this approach can be further improved by restricting the summation to the top-$K$ relevant documents. Table 2 shows three existing models for different search tasks, which can be catalogued as late fusion strategies.
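A minimal late fusion sketch, cf. Eq. (2), under hypothetical inputs (the per-document retrieval scores and the `top_k` cutoff are illustrative; any document ranker could produce the scores): documents are scored first, then their scores are aggregated per object.

```python
from collections import defaultdict

# Hypothetical retrieval scores score(q, d) for one fixed query,
# produced by any document ranker (e.g., BM25 or a language model).
doc_scores = {"d1": 2.4, "d2": 1.7, "d3": 0.9, "d4": 0.2}
# Document-object association weights w(d, o); here binary.
assoc = {
    ("d1", "o1"): 1.0, ("d2", "o1"): 1.0,
    ("d3", "o2"): 1.0, ("d4", "o2"): 1.0,
}

def late_fusion(doc_scores, assoc, top_k=3):
    """Eq. (2), restricted to the top-K retrieved documents:
    score(q, o) = sum_d score(q, d) * w(d, o)."""
    top = dict(sorted(doc_scores.items(),
                      key=lambda kv: kv[1], reverse=True)[:top_k])
    obj_scores = defaultdict(float)
    for (d, o), w in assoc.items():
        if d in top:  # documents below the cutoff contribute nothing
            obj_scores[o] += top[d] * w
    return dict(obj_scores)

print(late_fusion(doc_scores, assoc))
```

Note that with `top_k=3`, document `d4` is cut off and object `o2` is scored from `d3` alone, which shows how the cutoff trades effectiveness for efficiency.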
2.3 Document-Object Associations
The early and late fusion strategies share the component $w(d, o)$, cf. Eqs. (1) and (2). This document-object association score determines the weight with which a particular document contributes to the relevance score of a given object. In this paper, we consider two simple ways of setting this weight. We introduce the shorthand notation $d \rightarrow o$ to indicate that document $d$ is associated with object $o$ (i.e., there is an edge between $d$ and $o$ in Figure 1). According to the binary method, $w(d, o)$ can take only two values: it is $1$ if $d \rightarrow o$ and $0$ otherwise. Alternatively, the uniform method assigns the value $1/n(o)$ if $d \rightarrow o$, where $n(o)$ is the total number of documents associated with $o$, and $0$ otherwise.
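The two association methods can be sketched as follows (the document-object edges are hypothetical, standing in for a Figure 1-style association graph):

```python
# Hypothetical document-object edges (d -> o): o1 has two associated
# documents, o2 has one.
edges = {("d1", "o1"), ("d2", "o1"), ("d3", "o2")}

def w_binary(d, o):
    """Binary method: w(d, o) = 1 if d -> o, else 0."""
    return 1.0 if (d, o) in edges else 0.0

def w_uniform(d, o):
    """Uniform method: w(d, o) = 1/n(o) if d -> o, else 0,
    where n(o) is the number of documents associated with o."""
    if (d, o) not in edges:
        return 0.0
    n_o = sum(1 for (_, oo) in edges if oo == o)
    return 1.0 / n_o

print(w_binary("d1", "o1"), w_uniform("d1", "o1"))  # 1.0 0.5
```

Under the uniform method, objects with many associated documents receive a smaller per-document contribution, which normalizes for object "size"; the binary method leaves this normalization out.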
3 Experimental Setup
We consider three object retrieval tasks, with corresponding TREC collections. Expert finding uses the test suites of the TREC 2007 and 2008 Enterprise track [1, 4]. Objects are experts, and each of them is typically associated with multiple documents. Blog distillation is based on the TREC 2007 and 2008 Blog track [10, 11]. Objects are blogs and documents are posts; each document (post) belongs to exactly one object (blog). Vertical ranking corresponds to the resource selection task of the TREC 2013 and 2014 Federated Search track [5, 6]. Objects are verticals (i.e., web sites) and documents are web pages. Table 3 summarizes the data sets used for each task.
For each task, we consider two retrieval models: language models (using Jelinek-Mercer smoothing) and BM25, with default parameter settings in both cases. We further compare two methods of document-object association: binary and uniform.
Table 3. Data sets used for each task.

| Task | Collection (#documents) | #queries |
|---|---|---|
| Expert finding | CSIRO (370K) | 50 (2007), 77 (2008) |
| Blog distillation | Blogs06 (3.2M) | 50 (2007), 50 (2008) |
| Vertical ranking | FedWeb13 (1.9M), FedWeb14 (3.6M) | 50 (2013), 50 (2014) |
4 Experimental Results
The results for the expert finding, blog distillation, and vertical ranking tasks are presented in Tables 4, 5, and 6, respectively. Our main observations are the following. First, there is no preferred fusion strategy; early and late fusion each emerge as the overall best in three cases. While early fusion is clearly preferred for vertical ranking and late fusion is clearly favorable for blog distillation, a mixed picture unfolds for expert finding: early fusion performs better on one query set (2007) while late fusion wins on the other (2008). The differences between the corresponding early and late fusion configurations can be substantial. Second, the main difference between binary and uniform associations is that the latter takes into account the number of documents associated with the object, while the former does not. For expert finding and vertical ranking, the binary method is clearly superior. For blog distillation, on the other hand, it is nearly always the uniform method that performs better. The difference between vertical ranking and blog distillation is especially interesting, given that these two tasks have essentially identical structure, i.e., each document is associated with exactly one object (see Figure 1). Third, concerning the choice of retrieval model (LM vs. BM25), we again find that it depends on the task and fusion strategy. BM25 is superior to LM on blog distillation. For expert finding and vertical ranking, LM performs better in case of early fusion, while BM25 is preferable for late fusion.
We also include the TREC best and median results for reference comparison. In most cases, our fusion-based methods perform better than the TREC median, and on one occasion (vertical ranking, 2013) we outperform the best TREC run. Let us emphasize that we did not resort to any task-specific treatment. In the light of this, our results can be considered more than satisfactory and signify the generality of our fusion strategies.
5 Conclusions

In this paper we have presented two design patterns, early and late fusion, for the commonly occurring problem of object retrieval. We have demonstrated the generality and reusability of these solutions on three different tasks: expert finding, blog distillation, and vertical ranking. Specifically, we have considered various instantiations of these patterns using (i) language models and BM25 as the underlying retrieval model and (ii) binary and uniform document-object associations. We have found that these strategies are indeed robust and deliver competitive performance using default parameter settings and without resorting to any task-specific treatment. We have also observed that there is no single best configuration; it depends on the task and sometimes even on the particular test query set used for the task. One interesting question for future work, therefore, is how to automatically determine the configuration that should be used for a given task.
[1] P. Bailey, N. Craswell, A. P. de Vries, and I. Soboroff. Overview of the TREC 2007 enterprise track. In Proc. of TREC ’07, 2008.
[2] K. Balog, L. Azzopardi, and M. de Rijke. Formal models for expert finding in enterprise corpora. In Proc. of SIGIR, pages 43–50, 2006.
[3] K. Balog, M. de Rijke, and W. Weerkamp. Bloggers as experts: Feed distillation using expert retrieval models. In Proc. of SIGIR, pages 753–754, 2008.
[4] K. Balog, I. Soboroff, P. Thomas, N. Craswell, A. P. de Vries, and P. Bailey. Overview of the TREC 2008 enterprise track. In Proc. of TREC ’08, 2009.
[5] T. Demeester, D. Trieschnigg, D. Nguyen, and D. Hiemstra. Overview of the TREC 2013 federated web search track. In Proc. of TREC ’13, 2014.
[6] T. Demeester, D. Trieschnigg, D. Nguyen, D. Hiemstra, and K. Zhou. Overview of the TREC 2014 federated web search track. In Proc. of TREC ’14, 2015.
[7] J. L. Elsas, J. Arguello, J. Callan, and J. G. Carbonell. Retrieval and feedback models for blog feed search. In Proc. of SIGIR, pages 347–354, 2008.
[8] H. Fang and C. Zhai. Probabilistic models for expert finding. In Proc. of ECIR, pages 418–430, 2007.
[9] C. Macdonald and I. Ounis. Voting techniques for expert search. Knowl. Inf. Syst., 16:259–280, 2008.
[10] C. Macdonald, I. Ounis, and I. Soboroff. Overview of the TREC 2007 blog track. In Proc. of TREC ’07, 2008.
[11] I. Ounis, C. Macdonald, and I. Soboroff. Overview of the TREC 2008 blog track. In Proc. of TREC ’08, 2009.
[12] M. Shokouhi and L. Si. Federated Search. Found. Trends Inf. Retr., 5:1–102, 2011.
[13] W. Weerkamp, K. Balog, and M. de Rijke. Blog feed search with a post index. Inf. Retr., 14:515–545, 2011.