The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives

04/02/2023
by Jan Heinrich Reimer, et al.

The Archive Query Log (AQL) is a previously unused, comprehensive query log collected at the Internet Archive over the last 25 years. Its first version includes 356 million queries, 166 million search result pages, and 1.7 billion search results across 550 search providers. Although many query logs have been studied in the literature, the search providers that own them generally do not publish their logs to protect user privacy and vital business data. Of the few query logs publicly available, none combines size, scope, and diversity. The AQL is the first to do so, enabling research on new retrieval models and (diachronic) search engine analyses. Provided in a privacy-preserving manner, it promotes open research as well as more transparency and accountability in the search industry.


