ORCA: a Benchmark for Data Web Crawlers

12/17/2019
by Michael Röder, et al.

The number of RDF knowledge graphs available on the Web grows constantly. Gathering these graphs at large scale for downstream applications hence requires the use of crawlers. Although Data Web crawlers exist, and general Web crawlers could be adapted to focus on the Data Web, there is currently no benchmark to fairly evaluate their performance. Our work closes this gap by presenting the Orca benchmark. Orca generates a synthetic Data Web, which is decoupled from the original Web and enables a fair and repeatable comparison of Data Web crawlers. Our evaluations show that Orca can be used to reveal the different advantages and disadvantages of existing crawlers. The benchmark is open-source and available at https://github.com/dice-group/orca.
