WikiDoMiner: Wikipedia Domain-specific Miner

06/21/2022
by Saad Ezzini, et al.

We introduce WikiDoMiner, a tool for automatically generating domain-specific corpora by crawling Wikipedia. WikiDoMiner helps requirements engineers create an external knowledge resource that is specific to the underlying domain of a given requirements specification (RS). Being able to build such a resource is important since domain-specific datasets are scarce. WikiDoMiner generates a corpus by first extracting a set of domain-specific keywords from a given RS, and then querying Wikipedia for these keywords. The output of WikiDoMiner is a set of Wikipedia articles relevant to the domain of the input RS. Mining Wikipedia for domain-specific knowledge can be beneficial for multiple requirements engineering tasks, e.g., ambiguity handling, requirements classification, and question answering. WikiDoMiner is publicly available on Zenodo under an open-source license (DOI: 10.5281/zenodo.6671357).
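The two-step pipeline described in the abstract (extract keywords from an RS, then query Wikipedia for them) can be sketched roughly as follows. This is a minimal illustration, not WikiDoMiner's actual implementation: the abstract does not say how keywords are extracted, so plain frequency ranking over a small stopword list is swapped in as a stand-in. The function names, the stopword list, and the `top_k` and `limit` parameters are all hypothetical. The query uses the public MediaWiki search endpoint (`action=query&list=search`).

```python
import json
import re
import urllib.parse
import urllib.request
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real pipeline would use a fuller one).
STOPWORDS = {
    "the", "a", "an", "of", "to", "and", "or", "in", "on", "for", "with",
    "shall", "must", "be", "is", "are", "that", "this", "by", "at", "as",
    "its", "it", "each", "all",
}

API_URL = "https://en.wikipedia.org/w/api.php"


def extract_keywords(rs_text, top_k=5):
    """Rank lowercase word tokens by frequency (stand-in for real keyword extraction)."""
    tokens = re.findall(r"[a-z]+", rs_text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_k)]


def build_search_url(keyword, limit=3):
    """Build a MediaWiki full-text search URL for one keyword."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": keyword,
        "srlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def search_titles(keyword, limit=3):
    """Return titles of Wikipedia articles matching the keyword (needs network access)."""
    request = urllib.request.Request(
        build_search_url(keyword, limit),
        headers={"User-Agent": "rs-corpus-sketch/0.1"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        data = json.load(response)
    return [hit["title"] for hit in data["query"]["search"]]


def mine_corpus(rs_text, top_k=5, limit=3):
    """Collect de-duplicated article titles for the top keywords of an RS."""
    titles = []
    for keyword in extract_keywords(rs_text, top_k):
        for title in search_titles(keyword, limit):
            if title not in titles:
                titles.append(title)
    return titles
```

Calling `mine_corpus(rs_text)` on the text of a requirements specification would return a list of candidate article titles; fetching the article bodies themselves is left out of this sketch.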


research
02/03/2019

Improving Question Answering with External Knowledge

Prior background knowledge is essential for human reading and understand...
research
09/17/2020

What if we had no Wikipedia? Domain-independent Term Extraction from a Large News Corpus

One of the most impressive human endeavors of the past two decades is th...
research
01/23/2013

An Application of Uncertain Reasoning to Requirements Engineering

This paper examines the use of Bayesian Networks to tackle one of the to...
research
06/21/2022

TAPHSIR: Towards AnaPHoric Ambiguity Detection and ReSolution In Requirements

We introduce TAPHSIR, a tool for anaphoric ambiguity detection and anaph...
research
04/28/2023

Made of Steel? Learning Plausible Materials for Components in the Vehicle Repair Domain

We propose a novel approach to learn domain-specific plausible materials...
research
10/03/2022

Russian Web Tables: A Public Corpus of Web Tables for Russian Language Based on Wikipedia

Corpora that contain tabular data such as WebTables are a vital resource...
research
10/01/2019

Essentia: Mining Domain-specific Paraphrases with Word-Alignment Graphs

Paraphrases are important linguistic resources for a wide variety of NLP...
