You, the Web and Your Device: Longitudinal Characterization of Browsing Habits

06/19/2018
by Luca Vassio, et al.

Understanding how people interact with the web is key for a variety of applications, from the design of effective web pages to the definition of successful online marketing campaigns. Browsing behavior has traditionally been represented and studied by means of clickstreams, i.e., graphs whose vertices are web pages and whose edges are the paths followed by users. Obtaining large and representative data to extract clickstreams is, however, challenging. The evolution of the web raises the question of whether browsing behavior is changing and, as a consequence, whether the properties of clickstreams are changing. This paper presents a longitudinal study of clickstreams from 2013 to 2016. We evaluate an anonymized dataset of HTTP traces captured in a large ISP, where thousands of households are connected. We first propose a methodology to identify the URLs actually requested by users among the massive set of requests automatically fired by browsers when rendering web pages. Then, we characterize web usage patterns and clickstreams, taking into account both their temporal evolution and the impact of the device used to explore the web. Our analyses precisely quantify various aspects of clickstreams and uncover interesting patterns, such as the typically short paths followed by people while navigating the web, the fast-increasing trend in browsing from mobile devices, and the different roles of search engines and social networks in promoting content. Finally, we contribute a dataset of anonymized clickstreams to the community to foster new studies (available to the public at http://bigdata.polito.it/clickstream).
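
To make the graph definition of a clickstream concrete, the following is a minimal sketch, not the authors' methodology. It assumes each user-requested page comes with the page it was reached from (for instance, via the HTTP Referer header, which is an assumption here), and all URLs and the helper names `build_clickstream` and `path_lengths` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (referer, url) pairs for one user; not taken from the paper's dataset.
# A referer of None marks a page opened directly (an entry point).
requests = [
    (None, "search.example/q"),
    ("search.example/q", "news.example/a"),
    ("news.example/a", "news.example/b"),
    (None, "social.example/feed"),
    ("social.example/feed", "video.example/v"),
]

def build_clickstream(pairs):
    """Build a directed graph: vertices are pages, edges go from referer to page."""
    graph = defaultdict(list)
    for referer, url in pairs:
        graph[url]  # register the vertex even if it has no outgoing edge
        if referer is not None:
            graph[referer].append(url)
    return dict(graph)

def path_lengths(graph, pairs):
    """For each entry page, return the longest path (in clicks) that starts there."""
    def depth(page, seen):
        # Skip pages already on the current path so cycles cannot loop forever.
        successors = [n for n in graph.get(page, []) if n not in seen]
        if not successors:
            return 0
        return 1 + max(depth(n, seen | {n}) for n in successors)

    roots = [url for referer, url in pairs if referer is None]
    return {root: depth(root, {root}) for root in roots}

graph = build_clickstream(requests)
lengths = path_lengths(graph, requests)
print(lengths)
```

In this toy input, the path starting at the search page is two clicks long and the one starting at the social feed is one click long, which illustrates the kind of path-length statistic the abstract refers to.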


