Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions

04/04/2020
by Asia J. Biega, et al.

Translating verbose information needs into crisp search queries is a phenomenon that is ubiquitous but hardly understood. Insights into this process could be valuable in several applications, including synthesizing large privacy-friendly query logs from public Web sources which are readily available to the academic research community. In this work, we take a step towards understanding query formulation by tapping into the rich potential of community question answering (CQA) forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the Stack Exchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We provide a careful analysis of this data, accounting for possible sources of bias during conversion, along with insights into user-specific linguistic patterns and search behaviors. We release a dataset of 7,000 question-query pairs from this study to facilitate further research on query understanding.


