Insights from the Wikipedia Contest (IEEE Contest for Data Mining 2011)

by Kalpit V Desai, et al.

The Wikimedia Foundation has recently observed that newly joining editors on Wikipedia are increasingly failing to integrate into the Wikipedia editors' community; that is, the community is becoming increasingly hard to penetrate. To sustain healthy growth of the community, the Wikimedia Foundation aims to quantitatively understand the factors that determine editing behavior and to explain why most new editors become inactive soon after joining. As a step towards this broader goal, the Wikimedia Foundation sponsored the ICDM (IEEE International Conference on Data Mining) contest for the year 2011. The objective for participants was to develop models that predict the number of edits an editor will make in the following five months, based on that editor's editing history. Here we describe the approach we followed for developing predictive models towards this goal, the results we obtained, and the modeling insights we gained from this exercise. In addition, towards the broader goal of the Wikimedia Foundation, we summarize the factors that emerged during our model building exercise as powerful predictors of future editing activity.


Publishing Wikipedia usage data with strong privacy guarantees

For almost 20 years, the Wikimedia Foundation has been publishing statis...

Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start

Every day, thousands of users sign up as new Wikipedia contributors. Onc...

IEEE BigData 2021 Cup: Soft Sensing at Scale

IEEE BigData 2021 Cup: Soft Sensing at Scale is a data mining competitio...

Manifesto for Putting 'Chartjunk' in the Trash 2021!

In this provocation we ask the visualization research community to join ...

Bringing Salary Transparency to the World: Computing Robust Compensation Insights via LinkedIn Salary

The recently launched LinkedIn Salary product has been designed with the...

Analysing Timelines of National Histories across Wikipedia Editions: A Comparative Computational Approach

Portrayals of history are never complete, and each description inherentl...

Trajectories of Blocked Community Members: Redemption, Recidivism and Departure

Community norm violations can impair constructive communication and coll...
