Predicting Links on Wikipedia with Anchor Text Information

05/25/2021
by Robin Brochier, et al.

Wikipedia, the largest open-collaborative online encyclopedia, is a corpus of documents bound together by internal hyperlinks. These links form the building blocks of a large network whose structure contains important information on the concepts covered in this encyclopedia. The presence of a link between two articles, materialised by an anchor text in the source page pointing to the target page, can increase readers' understanding of a topic. However, the process of linking follows specific editorial rules to avoid both under-linking and over-linking. In this paper, we study the transductive and the inductive tasks of link prediction on several subsets of the English Wikipedia and identify some key challenges behind automatic linking based on anchor text information. We propose an appropriate evaluation sampling methodology and compare several algorithms. Moreover, we propose baseline models that provide a good estimation of the overall difficulty of the tasks.
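The abstract does not describe the baseline models in detail. As a purely illustrative sketch of what "automatic linking based on anchor text information" can mean at its simplest (an assumption, not the authors' method), the heuristic below remembers which anchor strings pointed to which articles in training, predicts a link from a new page whenever a known, unambiguous anchor appears in its text, and scores the predictions with precision and recall. The names `build_anchor_index`, `predict_links` and `precision_recall`, the triple format, and the rule of skipping ambiguous anchors and self-links are all hypothetical choices made for illustration.

```python
import re
from typing import Dict, List, Set, Tuple


def build_anchor_index(links: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Map each lowercased anchor text to the set of target titles it pointed to.

    `links` holds hypothetical (source_title, anchor_text, target_title) triples.
    """
    index: Dict[str, Set[str]] = {}
    for _source, anchor, target in links:
        index.setdefault(anchor.lower(), set()).add(target)
    return index


def predict_links(source: str, text: str,
                  index: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Predict (source, target) pairs when a known anchor occurs in `text`.

    Ambiguous anchors (more than one known target) and self-links are skipped,
    a crude, assumed guard against over-linking.
    """
    predictions: Set[Tuple[str, str]] = set()
    lowered = text.lower()
    for anchor, targets in index.items():
        if len(targets) != 1:
            continue  # ambiguous anchor: skip rather than guess a target
        target = next(iter(targets))
        if target == source:
            continue  # a page should not link to itself
        # Whole-word match so "art" does not fire inside "party"; this assumes
        # anchors start and end with word characters.
        if re.search(r"\b" + re.escape(anchor) + r"\b", lowered):
            predictions.add((source, target))
    return predictions


def precision_recall(predicted: Set[Tuple[str, str]],
                     gold: Set[Tuple[str, str]]) -> Tuple[float, float]:
    """Precision and recall of predicted links against reference links."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Usage with toy, made-up data.
train = [
    ("Paris", "France", "France"),
    ("Lyon", "France", "France"),
    ("Berlin", "Germany", "Germany"),
    ("Mercury (planet)", "Mercury", "Mercury (planet)"),
    ("Freddie Mercury", "Mercury", "Freddie Mercury"),
]
index = build_anchor_index(train)
pred = predict_links(
    "Marseille", "Marseille is a port city in France, on the Mediterranean.", index
)
```

Here `pred` is `{("Marseille", "France")}`: "Mercury" is skipped because it is ambiguous, which trades recall for precision. Even this toy case hints at the challenges the abstract mentions, since a surface-string heuristic cannot tell which of several targets an anchor should point to, nor whether linking it would be desirable at all.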

research
05/29/2022

Anchor Prediction: A Topic Modeling Approach

Networks of documents connected by hyperlinks, such as Wikipedia, are ub...
research
05/23/2023

Anchor Prediction: Automatic Refinement of Internet Links

Internet links enable users to deepen their understanding of a topic by ...
research
05/01/2018

Exploring the Accuracy of MIRT Scale Linking Procedures for Mixed-format Tests

This study investigates the accuracy of Stocking-Lord scale linking proc...
research
12/15/2021

Event Linking: Grounding Event Mentions to Wikipedia

Comprehending an article requires understanding its constituent events. ...
research
09/27/2021

Multi-Task and Multi-Corpora Training Strategies to Enhance Argumentative Sentence Linking Performance

Argumentative structure prediction aims to establish links between textu...
research
05/31/2021

A Multilingual Entity Linking System for Wikipedia with a Machine-in-the-Loop Approach

Hyperlinks constitute the backbone of the Web; they enable user navigati...
research
08/05/2020

Computational linguistic assessment of textbook and online learning media by means of threshold concepts in business education

Threshold concepts are key terms in domain-based knowledge acquisition. ...