Dataset Generation Patterns for Evaluating Knowledge Graph Construction

by Markus Schröder, et al.

Confidentiality hinders the publication of authentic, labeled datasets of personal and enterprise data, although such datasets would be useful for evaluating knowledge graph construction approaches in industrial scenarios. We therefore plan to generate such data synthetically so that it appears as authentic as possible. We assume that knowledge workers have certain habits when they produce or manage data. If so, generation patterns can be discovered and then used by data generators to imitate real datasets. In this paper, we first derive 11 distinct patterns found in real spreadsheets from industry and then demonstrate a suitable generator, called Data Sprout, that is able to reproduce them. We describe how the generator produces spreadsheets in general and what altering effects the implemented patterns have.
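The abstract describes a generator that first produces spreadsheets and then applies patterns with altering effects. The sketch below illustrates that idea in Python, assuming each pattern can be modeled as a function that returns an altered copy of a generated table. It is not Data Sprout's implementation. The names `generate`, `generate_base_table`, `abbreviate_departments` and `append_hours_unit`, the example columns, and both patterns are hypothetical illustrations, not the paper's 11 patterns. Keeping the unaltered table next to the altered one is one assumed way to obtain the labels the abstract mentions.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical data model: a table is a header row followed by data rows of strings.
Row = List[str]
Table = List[Row]
# Hypothetical: a "pattern" is a function that returns an altered copy of a table.
Pattern = Callable[[Table, random.Random], Table]


def generate_base_table(n_rows: int, rng: random.Random) -> Table:
    """Create a clean synthetic table (invented example values)."""
    names = ["Anna", "Ben", "Clara", "David"]
    departments = ["Sales", "Engineering", "Marketing"]
    header = ["Name", "Department", "Hours"]
    rows = [
        [rng.choice(names), rng.choice(departments), str(rng.randint(1, 40))]
        for _ in range(n_rows)
    ]
    return [header] + rows


def abbreviate_departments(table: Table, rng: random.Random) -> Table:
    """Hypothetical pattern: a worker sometimes abbreviates department names."""
    altered = [table[0][:]]
    for name, dept, hours in table[1:]:
        if rng.random() < 0.5:
            dept = dept[:3] + "."
        altered.append([name, dept, hours])
    return altered


def append_hours_unit(table: Table, rng: random.Random) -> Table:
    """Hypothetical pattern: a worker sometimes types a unit after a number."""
    altered = [table[0][:]]
    for name, dept, hours in table[1:]:
        if rng.random() < 0.3:
            hours = hours + " h"
        altered.append([name, dept, hours])
    return altered


def generate(n_rows: int, patterns: List[Pattern], seed: int = 0) -> Tuple[Table, Table]:
    """Return (clean, altered); the clean table serves as ground-truth labels."""
    rng = random.Random(seed)  # seeded so that a dataset can be regenerated exactly
    clean = generate_base_table(n_rows, rng)
    altered = [row[:] for row in clean]
    for pattern in patterns:  # apply patterns in order; each sees the previous result
        altered = pattern(altered, rng)
    return clean, altered


if __name__ == "__main__":
    clean, altered = generate(5, [abbreviate_departments, append_hours_unit], seed=42)
    for original, noisy in zip(clean, altered):
        print(original, "->", noisy)
```

Because every pattern only reads its input and returns a new table, patterns can be combined in any order, and the clean table stays untouched for evaluation.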





Construction and Application of Teaching System Based on Crowdsourcing Knowledge Graph

Through the combination of crowdsourcing knowledge graph and teaching sy...

Are Missing Links Predictable? An Inferential Benchmark for Knowledge Graph Completion

We present InferWiki, a Knowledge Graph Completion (KGC) dataset that im...

Saga: A Platform for Continuous Construction and Serving of Knowledge At Scale

We introduce Saga, a next-generation knowledge construction and serving ...

Personal Health Knowledge Graph for Clinically Relevant Diet Recommendations

We propose a knowledge model for capturing dietary preferences and perso...

Knowledge Graph semantic enhancement of input data for improving AI

Intelligent systems designed using machine learning algorithms require a...

IPRE: a Dataset for Inter-Personal Relationship Extraction

Inter-personal relationship is the basis of human society. In order to a...