Robots Still Outnumber Humans in Web Archives, But Less Than Before

08/27/2022
by Himarsha R. Jayanetti, et al.

We used access logs from the Internet Archive's (IA) Wayback Machine from 2012 and 2019, and from Arquivo.pt (the Portuguese Web Archive) from 2019, to identify robots and humans and analyze their respective access patterns. We identified user sessions in the access logs and classified each session as human or robot based on its browsing behavior. To better understand how users navigate web archives, we then analyzed these sessions to discover user access patterns. We compare the detected robots and humans, their access patterns, and their temporal preferences across the two archives and between the two years of IA access logs (2012 vs. 2019). The total number of robots detected in IA 2012 is greater than in IA 2019 (21 requests and 18 sessions) [...] in Arquivo.pt (2019). We found that robots are almost entirely limited to the "Dip" and "Skim" access patterns in IA 2012 but exhibit all the patterns and their combinations in IA 2019. Both humans and robots show a preference for web pages archived in the recent past.
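The session-identification and classification steps described above can be illustrated with a minimal sketch. The abstract does not give the authors' actual rules, so everything here is an assumption made for illustration: the 10-minute inactivity timeout, the client key, the `sessionize` and `looks_like_robot` helper names, and the toy robot heuristic (a request for robots.txt, or a sustained high request rate).

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity gap that ends a session; the abstract does not state one.
SESSION_TIMEOUT = timedelta(minutes=10)


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (client_key, timestamp, url) tuples into per-client sessions.

    A new session starts whenever the gap since the same client's
    previous request exceeds `timeout`.
    Returns a list of (client_key, [(timestamp, url), ...]) pairs.
    """
    by_client = defaultdict(list)
    for client, ts, url in requests:
        by_client[client].append((ts, url))

    sessions = []
    for client, hits in by_client.items():
        hits.sort(key=lambda h: h[0])  # order each client's requests by time
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > timeout:
                sessions.append((client, current))
                current = []
            current.append(hit)
        sessions.append((client, current))
    return sessions


def looks_like_robot(session_hits, min_hits=10, max_rate_per_min=30.0):
    """Toy heuristic (hypothetical, not the paper's classifier).

    Flags a session if it requested robots.txt, or if it contains at
    least `min_hits` requests issued faster than `max_rate_per_min`.
    """
    if any(url.endswith("/robots.txt") for _, url in session_hits):
        return True
    if len(session_hits) < min_hits:
        return False
    seconds = (session_hits[-1][0] - session_hits[0][0]).total_seconds()
    if seconds == 0:
        return True  # many requests with identical timestamps
    return len(session_hits) / (seconds / 60.0) > max_rate_per_min
```

A real classifier would combine more signals than this, such as user-agent strings and browsing-behavior features; the sketch only shows the overall shape of a sessionize-then-classify pipeline.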

