Large-Scale Vandalism Detection with Linear Classifiers - The Conkerberry Vandalism Detector at WSDM Cup 2017

12/19/2017
by Alexey Grigorev, et al.

Nowadays many artificial intelligence systems rely on knowledge bases to enrich the information they process. Such knowledge bases are usually difficult to build, so they are crowdsourced: anyone on the internet can suggest edits and add new information. Unfortunately, they are sometimes targeted by vandals who insert inaccurate or offensive information. This is especially harmful for the systems that use these knowledge bases, which need reliable information to make correct inferences. One such knowledge base is Wikidata, and to fight vandalism the organizers of WSDM Cup 2017 challenged participants to build a model for detecting untrustworthy edits. In this paper we present the second-place solution to the cup: we show that it is possible to achieve competitive performance with simple linear classification. Our approach achieves an area under the ROC curve (AU ROC) of 0.938 on the test data. Additionally, compared to other approaches, ours is significantly faster. The solution is made available on GitHub.

research
07/19/2012

SiGMa: Simple Greedy Matching for Aligning Large Knowledge Bases

The Internet has enabled the creation of a growing number of large-scale...
research
12/24/2015

RDF2Rules: Learning Rules from RDF Knowledge Bases by Mining Frequent Predicate Cycles

Recently, several large-scale RDF knowledge bases have been built and ap...
research
01/04/2017

On the Usability of Probably Approximately Correct Implication Bases

We revisit the notion of probably approximately correct implication base...
research
03/13/2000

XNMR: A tool for knowledge bases exploration

XNMR is a system designed to explore the results of combining the well-f...
research
09/28/2020

Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems

Task-oriented dialogue systems are either modularized with separate dial...
research
06/09/2009

Toward a Category Theory Design of Ontological Knowledge Bases

I discuss (ontologies_and_ontological_knowledge_bases / formal_methods_a...
research
07/08/2021

A Note on General Statistics of Publicly Accessible Knowledge Bases

Knowledge bases are prevalent in various domains and have been widely us...
