Towards an Understanding of Entity-Oriented Search Intents

02/22/2018
by   Darío Garigliotti, et al.
University of Stavanger

Entity-oriented search deals with a wide variety of information needs, from displaying direct answers to interacting with services. In this work, we aim to understand what the prominent entity-oriented search intents are and how they can be fulfilled. We develop a scheme of entity intent categories, and use them to annotate a sample of queries. Specifically, we annotate unique query refiners on the level of entity types. We observe that, on average, over half of those refiners seek to interact with a service, while over a quarter of the refiners search for information that may be looked up in a knowledge base.


1 Introduction

A large portion of information needs in web search look for specific entities [11]. Entities are natural units for organizing information, and can provide not only more focused responses, but often immediate answers [9]. Another type of entity-bearing query is more transaction-oriented. Whether trying to book a flight or looking for tickets for a concert, just to mention two popular examples, users often fulfill their information needs by interacting with a third-party service or application. There has been an increasing focus on supporting task-based search [7], and on modeling actionable knowledge; see, e.g., the dedicated vocabulary for actions in the schema.org ontology, and the NTCIR AKG task (http://ntcirakg.github.io/tasks.html). These developments display the interest and efforts towards transforming search engines into action-guided task completion assistants [1]. In this work, we are interested in studying one particular type of information need, namely, entity-oriented searches. Specifically, we want to answer a question arising from this web landscape: what do entity-oriented queries ask for? Furthermore, which of those searches can be fulfilled by looking up direct answers from a knowledge base, and which would require interacting with external services?

Most entity-oriented queries consist of an entity name, complemented with context terms, i.e., refiners, to express the underlying intent of the user [11]. Examples of these queries are “the rock movies” and “london book a hotel.” Our main objective is to understand entity-related search intents by studying those refiners. Specifically, we represent refiners on the level of entity types. Just like entity types boost the disambiguation of known entities and the grouping of emerging ones [10], these type-level characterizations of entity refiners would favor knowledge abstraction and generalization. As an example, by representing with [city] any entity of the type city, we want to categorize a refiner, e.g., “rentals”, in the type-level query “[city] rentals”. Then, we categorize these type-level refiners using an intent classification scheme. Our classification scheme comprises four main categories: property, website, service, and other.

We perform this study without having direct access to past usage data or query logs. To overcome the absence of such data, we exploit query suggestions from a major search engine API. This strategy has been employed successfully in previous work for various applications [5, 3]. After acquiring query suggestions for entities of a given type, they are aggregated to extract type-level refiners. Then, for a representative sample of 50 Freebase types, we collect human annotations for those refiners with respect to the classification scheme we developed.

Our main findings show that, on average, more than half of all unique type-level refiners correspond to interacting with external services, while over a quarter of them look for information that may be looked up in a knowledge base. Another contribution of this work is a large collection of type-level refiners, annotated with intent categories. The resources developed within this paper are made available at http://bit.ly/ecir2018-intents.

2 Related Work

Broder’s categorization of information needs is broadly accepted and is the most commonly used one for web search [4], with further refinements, e.g., in [13, 6]. We strive for a similar high-level categorization of intents, but specifically for entity-oriented search queries. Previous work has identified high-level patterns from web search queries. For example, according to Lin et al. [8], a query can be classified as an entity, an entity plus a refiner (e.g., “emma stone 2017”), a category, a category plus a refiner (e.g., “doctors in barcelona”), a website, or other sort of query. Such classification relies merely on lexico-syntactic forms and lacks a more semantically-grounded distinction.

Search intents have been studied in previous work. Reinanda et al. [12] explore entity aspects in user interaction log data. Beyond finding aspects by comparing clustering methods over refiners, they address the tasks of ranking the intents for a given entity independently from a query and recommending aspects. Unlike them, we (i) operate with individual query refiners (i.e., without clustering them together), (ii) model entity intents at the level of types, (iii) always consider entities in queries, and (iv) perform our study in the absence of search logs.

3 Approach

This section describes the process we followed for understanding entity-oriented search intents. An entity-oriented or entity-bearing query is a query that consists of an entity name possibly complemented with a refiner, usually as a suffix. Here, by entity we mean an individual with its own independent existence, uniquely identified in a knowledge base [2]. More than just a syntactic complement, a refiner is a complementary surface form expressing an underlying user intent in relation to the entity. As an example, consider the entity keens steakhouse (a restaurant) in the search query “keens steakhouse menu.” The refiner “menu” expresses the intent of reading the restaurant’s menu. To understand what these entity-bearing queries ask for, we characterize the refiners on the level of entity types, where an entity type is a semantic class that groups together entities with common characteristics. For example, one of the types of Albert Einstein in Freebase is award_winner.

Our approach, to be detailed in the next subsections, can be summarized as follows. We collect refiners for a set of prominent entities, and aggregate them across entity types to obtain type-level refiners. Next, we develop a classification scheme of intent categories, with a focus on how to fulfill the intent expressed by a type-level refiner. Finally, we annotate a representative sample of entity types with intent categories, and obtain a corpus of prominent type-level refiners assigned to those categories.

3.1 Collecting Refiners

We use the type system of Freebase. It is a two-layer categorization system, where types on the leaf level are grouped under high-level domains. Specifically, we use the latest public Freebase dump (2015-03-31), discarding domains meant for administering the Freebase service itself (e.g., base, common).

We focus on prominent entities, since in this way we benefit from observing a larger and more representative selection of information needs. As the criterion for an entity’s prominence, we rely on Wikistats page views (https://dumps.wikimedia.org/other/pagecounts-ez/). This dataset registers the number of times an entity’s English Wikipedia article has been requested. We empirically set a prominence threshold of 3,000 page views per article over a span of one year (from June 2015 to May 2016). Given a Freebase type, we select it if it covers at least 100 entities with a prominence above the threshold. Applying these criteria, the selected set contains 634 types.
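
The type selection rule can be illustrated with a minimal Python sketch. The function name `select_types` and the two input dictionaries are hypothetical, introduced only for illustration; the two constants are the thresholds stated above, and "above the threshold" is read here as a strict inequality, which is an assumption.

```python
from collections import defaultdict

PAGE_VIEW_THRESHOLD = 3_000   # page views per article over one year (from the text)
MIN_PROMINENT_ENTITIES = 100  # minimum number of prominent entities per type (from the text)


def select_types(entity_types, page_views,
                 view_threshold=PAGE_VIEW_THRESHOLD,
                 min_entities=MIN_PROMINENT_ENTITIES):
    """Return {type: [prominent entities]} for types that pass both criteria.

    entity_types: dict mapping an entity id to an iterable of type names
    page_views:   dict mapping an entity id to its yearly page view count
    Both inputs are hypothetical stand-ins for the Freebase dump and Wikistats.
    """
    prominent_by_type = defaultdict(list)
    for entity, types in entity_types.items():
        # Assumption: "above the threshold" means strictly greater than.
        if page_views.get(entity, 0) > view_threshold:
            for type_name in types:
                prominent_by_type[type_name].append(entity)
    # Keep only types that cover enough prominent entities.
    return {t: ents for t, ents in prominent_by_type.items()
            if len(ents) >= min_entities}
```

Keeping the list of prominent entities per type, rather than only a count, makes it straightforward to later pick the top entities of each selected type.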

In a second step, we collect query suggestions from the Google Suggestions API for at most the top 1,000 entities per type according to the above prominence criteria. Then, we replace the name of the entity by its type in each query suggestion. This can be viewed as getting queries where a refiner complements the type. For example, the type-level query “[travel destination] map” is obtained from all queries for popular travel destinations, e.g., “sydney map” and “paris map.” Finally, we retain only those refiners that occur in at least 5 suggestions for the given type. This leads to a total of 2,688 distinct type-level refiners for 631 types.
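
The aggregation into type-level refiners might look roughly like the following Python sketch. It does not call any suggestion API; it assumes the suggestions have already been collected into a dictionary. The function name `type_level_refiners` and the prefix-matching rule are assumptions made for illustration, since the text only states that the entity name is replaced by its type.

```python
from collections import Counter

MIN_SUGGESTIONS = 5  # a refiner must occur in at least 5 suggestions (from the text)


def type_level_refiners(suggestions_by_entity, min_count=MIN_SUGGESTIONS):
    """Aggregate the suggestions for entities of one type into type-level refiners.

    suggestions_by_entity: dict mapping an entity name to its query suggestions.
    Assumption: the refiner is whatever follows the entity name as a prefix;
    suggestions that do not start with the entity name are skipped.
    """
    counts = Counter()
    for name, suggestions in suggestions_by_entity.items():
        prefix = name.lower() + " "
        for suggestion in suggestions:
            query = suggestion.lower().strip()
            if query.startswith(prefix):
                refiner = query[len(prefix):].strip()
                if refiner:
                    counts[refiner] += 1
    # Retain only refiners that are frequent enough for this type.
    return {refiner: n for refiner, n in counts.items() if n >= min_count}
```

For instance, with suggestions for "sydney" and "paris" that both include a "map" query, the refiner "map" would be counted twice for the type and kept if the minimum count were set to 2.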

3.2 Classification Scheme

To address our main goal of understanding entity-related search intents, we need a suitable scheme to classify the entity intents. After a close inspection of the type-level refiners, we define the following scheme of intent categories. These categories are focused on how (and from which type of source) the information need can be fulfilled.

  • Property: The refiner looks for a specific entity property or attribute that can be looked up in a knowledge base. For example, “children” in the query “angelina jolie children” or “opening times” in “at&t stadium opening times.”

  • Website: The refiner is about reaching a specific website or application. For example, “twitter” in the query “karpathy twitter.” This category is a rough equivalent of navigational queries in [4].

  • Service: The refiner expresses the need to interact with a service, possibly by redirecting to an external site or app. For example, “menu” in the query “keens steakhouse menu” would indicate the need to access an external site to read the restaurant’s menu. As another example, “new album” in “eric clapton new album” looks for a service to read about, listen to, or buy the new album.

  • Other: None of the previous ones is applicable. For example, “batman” in the query “christian bale batman” serves to disambiguate the person’s role of interest.

3.3 Annotation

We need to sample a set of representative types, since it is unfeasible to annotate all types in the knowledge base. From the set of 631 types, we perform stratified sampling as follows. We sort the types by the total aggregated frequencies of refiners. We delimit 5 roughly equally-sized intervals by the splitting values of 1,500, 3,000, 6,000, and 8,500 refiners per type; we randomly pick 10 types from each interval. We annotate data for this final set of 50 representative Freebase types.
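
The stratified sampling step can be sketched in Python as follows. The splitting values and the number of types per interval come from the text; the function name `stratified_sample`, the fixed random seed, and the handling of a frequency that equals a splitting value (assigned to the upper interval) are assumptions.

```python
import bisect
import random

SPLITS = [1_500, 3_000, 6_000, 8_500]  # splitting values (from the text)
TYPES_PER_INTERVAL = 10                # types picked per interval (from the text)


def stratified_sample(refiner_freq_by_type, splits=SPLITS,
                      per_interval=TYPES_PER_INTERVAL, seed=0):
    """Pick up to `per_interval` types at random from each frequency interval.

    refiner_freq_by_type: dict mapping a type to its total aggregated
    refiner frequency.
    Assumption: a frequency equal to a splitting value falls into the
    upper interval; the text does not say how such ties are handled.
    """
    buckets = [[] for _ in range(len(splits) + 1)]
    # Sort by type name so that the result depends only on the seed.
    for type_name, freq in sorted(refiner_freq_by_type.items()):
        buckets[bisect.bisect_right(splits, freq)].append(type_name)
    rng = random.Random(seed)
    return [rng.sample(bucket, min(per_interval, len(bucket)))
            for bucket in buckets]
```

The function returns one list of sampled types per interval, ordered from the lowest to the highest frequency interval.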

We used crowdsourcing to annotate type-level refiners with intent categories. Specifically, using the Crowdflower platform, for each annotation instance we presented workers with the query, indicating its entity type and refiner, and asked them to select one of the four intent categories. A total of 5,301 unique instances (type-level refiners) were annotated, each by at least 3 judges (5 at most, if necessary to reach a majority agreement, using dynamic judgments). We paid ¢5 per batch, comprising 11 annotation instances. We ensured quality by requiring a minimum accuracy of 80%, a minimum time of 20 seconds per batch, and a minimum confidence threshold of 0.7. For each type, we only retain an annotated refiner if at least three annotators agreed on the majority category. This leads to a total of 2,313 unique refiners.
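
The retention rule (keep a refiner only if at least three annotators agree on the majority category) can be expressed as a short Python sketch. The function name `aggregate_judgments` and the input layout are hypothetical; the actual crowdsourcing platform performs its own aggregation, which is not reproduced here.

```python
from collections import Counter

CATEGORIES = ("property", "website", "service", "other")
MIN_AGREEMENT = 3  # at least three annotators must agree (from the text)


def aggregate_judgments(judgments, min_agreement=MIN_AGREEMENT):
    """Map each (type, refiner) instance to its majority category, or drop it.

    judgments: dict mapping a (type, refiner) pair to the list of category
    labels given by the individual annotators (hypothetical layout).
    """
    retained = {}
    for instance, labels in judgments.items():
        top_two = Counter(labels).most_common(2)
        if not top_two:
            continue
        top_label, top_count = top_two[0]
        # A tie cannot happen with 3 to 5 judges and min_agreement=3,
        # but it is checked in case other parameters are used.
        tied = len(top_two) > 1 and top_two[1][1] == top_count
        if top_count >= min_agreement and not tied:
            retained[instance] = top_label
    return retained
```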

4 Results and Analysis

Figure 1 presents the number of refiners classified into each category, for the 50 sampled types, grouped in one plot for each of the 5 intervals of the stratified sampling. Since the final set of types was sampled from types with prominent entities, this ordering, given by the number of refiners, in a way also reflects the prominence of types.

We obtain a distribution of entity intent categories per type after normalizing the frequency of each category by the total number of refiners for that type. From the average proportions in these distributions, we can answer our initial questions. On average, 54.06% of unique entity-oriented queries are to be fulfilled by interacting with some external service or app, while 28.6% look for direct answers from a knowledge base. Further, 5.34% of the type-level refiners represent an attempt to reach a website, while 12.08% of them do not fit into any of the previous three categories.
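
The normalization and averaging described above amount to a macro-average over types, which can be sketched in Python as follows. The function names and the toy counts in the usage note are illustrative assumptions and are not taken from the annotated corpus.

```python
CATEGORIES = ("property", "website", "service", "other")


def category_distribution(counts):
    """Normalize the category counts of one type by its total number of refiners."""
    total = sum(counts.values())
    return {c: counts.get(c, 0) / total for c in CATEGORIES}


def average_distribution(counts_by_type):
    """Average the per-type distributions, giving each type equal weight."""
    dists = [category_distribution(c) for c in counts_by_type.values()]
    return {c: sum(d[c] for d in dists) / len(dists) for c in CATEGORIES}
```

With two toy types, one having 2 service and 2 property refiners and the other having 1 service, 1 website, and 2 other refiners, the averaged service proportion is (0.5 + 0.25) / 2 = 0.375. Averaging per type rather than pooling all refiners prevents types with many refiners from dominating the result.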

The types with the largest proportion of service intents are netflix_genre (with refiners, e.g., “videos,” “live”), election (“map,” “polls”), football_match (“video,” “highlights”), and music_album. The property intent category covers refiners that are of a more static nature, e.g., chemical compound (with refiners like “structural formula,” “molecular weight”), political_party (“slogan,” “president”), star (“type of star,” “temperature”), or tower (“hours,” “height”); only the first one is a very prominent type. Most of the entity types exhibit a non-empty proportion of website intents. This category exceeds its average proportion for types such as organization, business_operation, hotel, and blogger. The most frequent website refiners in the whole corpus are “wikipedia,” “twitter,” “facebook,” and “youtube.” For a few types like muscle, election, belief, or medical_speciality, all in the least populated groups, no website refiner is present. A marginal proportion of refiners are classified as having the other intent. A few exceptional cases with large proportions of other intents are, e.g., business_operation and house (where the refiner is usually a location), or basketball_player (for which many refiners refer to an NBA franchise, e.g., “lakers”). Table 1 provides additional examples for a selection of types.

Figure 1: Distributions of intent categories for the sampled types. Note that the y-axis scales differ.

| Entity type | Property | Website | Service | Other |
|---|---|---|---|---|
| comic_book_publisher | logo, address | wiki, website, twitter | submissions, publishing, comics | movies |
| tower | height, address, opening hours | wiki | tickets, restaurant | collapse |
| war | deaths, results, cause | youtube, wikipedia, reddit, quizlet | video, uniforms, pictures, documentary | ap euro, in hindi |
| academic_institution | logo, email, notable alumni | wiki, login, twitter, portal | scholarships, ranking, map, library, jobs | baseball |
| automotive_company | stock, logo, ceo, address | wikipedia, website, linkedin, facebook | parts, careers, investor relations | india, inc |
| programming_language | syntax, ide | wikipedia, wiki, github | jobs, examples, interview questions | 3, 2017 |
| restaurant | phone number, owner, location | yelp, twitter, tripadvisor, groupon | app, wine list, vouchers, recipes, menu prices | sf, nj, nyc |
| music_album | value, cast, release date | youtube, wikipedia, amazon, imdb | zip download, video, ukulele chords, tracklist | 2015, lp |
| person | son, salary, real name | youtube, instagram, snapchat | tour, quotes, photos, new album | sr, now, ww2 |
| travel_destination | zip code, train station | craigslist | weather radar, vacation, tours, things to do | today, nj |

Table 1: Examples of refiners for each intent category, for each (stratified) type group.

5 Conclusions and Future Work

The study performed in this work has led to a better understanding of what entity-oriented queries ask for. We have developed a classification scheme to categorize entity-oriented search intents and annotated a representative sample of type-level refiners using this scheme. We have found that, on average, more than half of those are to be fulfilled by interaction with services; another large proportion of information needs look for direct answers from a knowledge base. Several lines of future work arise from our study. One of them is to develop a method for automatic intent categorization. Another direction is the clustering of refiners which express the same underlying intent. Finally, we seek to extend our approach to be able to capture tail entities and intents.

References

  [1] K. Balog. Task-completion engines: A vision with a plan. In Proc. of the 1st International Workshop on Supporting Complex Search Tasks, 2015.
  [2] K. Balog. Encyclopedia of Database Systems, chapter Entity Retrieval, pages 1–6. Springer New York, New York, NY, 2017.
  [3] J. R. Benetka, K. Balog, and K. Nørvåg. Anticipating information needs based on check-in activity. In Proc. of WSDM, pages 41–50, 2017.
  [4] A. Broder. A taxonomy of web search. SIGIR Forum, 36(2):3–10, 2002.
  [5] A. Fourney, R. Mann, and M. Terry. Characterizing the usability of interactive applications through query log analysis. In Proc. of CHI, pages 1817–1826, 2011.
  [6] B. J. Jansen, D. L. Booth, and A. Spink. Determining the informational, navigational, and transactional intent of web queries. Inf. Process. Manage., 44(3):1251–1266, 2008.
  [7] D. Kelly, J. Arguello, and R. Capra. NSF workshop on task-based information search systems. SIGIR Forum, 47(2):116–127, 2013.
  [8] T. Lin, P. Pantel, M. Gamon, A. Kannan, and A. Fuxman. Active objects: Actions for entity-centric search. In Proc. of WWW, pages 589–598, 2012.
  [9] P. Mika. Entity Search on the Web. In Proc. of WWW, pages 1231–1232, 2013.
  [10] N. Nakashole, T. Tylenda, and G. Weikum. Fine-grained semantic typing of emerging entities. In Proc. of ACL, pages 1488–1497, 2013.
  [11] J. Pound, P. Mika, and H. Zaragoza. Ad-hoc object retrieval in the web of data. In Proc. of WWW, pages 771–780, 2010.
  [12] R. Reinanda, E. Meij, and M. de Rijke. Mining, Ranking and Recommending Entity Aspects. In Proc. of SIGIR, pages 263–272, 2015.
  [13] D. E. Rose and D. Levinson. Understanding user goals in web search. In Proc. of WWW, pages 13–19, 2004.