Wikipedia Reader Navigation: When Synthetic Data Is Enough

01/03/2022
by Akhil Arora, et al.

Every day, millions of people read Wikipedia. When navigating the vast space of available topics using hyperlinks, readers describe trajectories on the article network. Understanding these navigation patterns is crucial to better serve readers' needs and address structural biases and knowledge gaps. However, systematic studies of navigation on Wikipedia are hindered by a lack of publicly available data due to the commitment to protect readers' privacy by not storing or sharing potentially sensitive data. In this paper, we ask: How well can Wikipedia readers' navigation be approximated by using publicly available resources, most notably the Wikipedia clickstream data? We systematically quantify the differences between real navigation sequences and synthetic sequences generated from the clickstream data, in 6 analyses across 8 Wikipedia language versions. Overall, we find that the differences between real and synthetic sequences are statistically significant, but with small effect sizes, often well below 10%. This constitutes quantitative evidence for the utility of the Wikipedia clickstream data as a public resource: clickstream data can closely capture reader navigation on Wikipedia and provides a sufficient approximation for most practical downstream applications relying on reader data. More broadly, this study provides an example of how clickstream-like data can generally enable research on user navigation on online platforms while protecting users' privacy.
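To make the idea of synthetic sequences concrete, the following is a minimal sketch of one way such sequences could be sampled from aggregated click counts: a first-order random walk in which the next article is drawn in proportion to the observed source-to-target counts. The abstract does not specify the generation procedure, so this is an illustrative assumption, not the authors' method. The function names, the `END` marker for a session stopping, and the toy rows are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical marker for "the reader stopped navigating here".
END = "<END>"


def build_transitions(clickstream_rows):
    """Aggregate (source, target, count) rows into per-source target counts."""
    transitions = defaultdict(dict)
    for source, target, count in clickstream_rows:
        transitions[source][target] = transitions[source].get(target, 0) + count
    return transitions


def sample_sequence(transitions, start, max_length=10, rng=None):
    """Sample one synthetic sequence as a count-weighted first-order random walk."""
    rng = rng or random.Random()
    sequence = [start]
    current = start
    # Stop at max_length, at an article with no outgoing counts, or at END.
    while len(sequence) < max_length and current in transitions:
        targets = list(transitions[current])
        weights = [transitions[current][t] for t in targets]
        current = rng.choices(targets, weights=weights, k=1)[0]
        if current == END:
            break
        sequence.append(current)
    return sequence


# Toy counts (made up for illustration, not real clickstream data).
toy_rows = [
    ("Zurich", "Switzerland", 80),
    ("Zurich", "ETH_Zurich", 20),
    ("Switzerland", "Alps", 50),
    ("Switzerland", END, 50),
    ("ETH_Zurich", END, 10),
]
toy_transitions = build_transitions(toy_rows)
print(sample_sequence(toy_transitions, "Zurich", rng=random.Random(42)))
```

Because each step depends only on the current article, a walk like this cannot reproduce longer-range dependencies in real sessions. Differences of that kind are what a comparison between real and synthetic sequences is designed to measure.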

research
12/22/2021

A Large-Scale Characterization of How Readers Browse Wikipedia

Despite the importance and pervasiveness of Wikipedia as one of the larg...
research
10/09/2017

Inspiration, Captivation, and Misdirection: Emergent Properties in Networks of Online Navigation

The World Wide Web (WWW) has fundamentally changed the ways billions of ...
research
02/17/2017

Why We Read Wikipedia

Wikipedia is one of the most popular sites on the Web, with millions of ...
research
12/02/2018

Why the World Reads Wikipedia: Beyond English Speakers

As one of the Web's primary multilingual knowledge sources, Wikipedia is...
research
03/17/2023

Individual differences in knowledge network navigation

As online information accumulates at an unprecedented rate, it is becomi...
research
07/16/2020

Wikipedia's Network Bias on Controversial Topics

The most important feature of Wikipedia is the presence of hyperlinks in...
research
03/14/2022

Going Down the Rabbit Hole: Characterizing the Long Tail of Wikipedia Reading Sessions

"Wiki rabbit holes" are informally defined as navigation paths followed ...
