The Mafiascum Dataset: A Large Text Corpus for Deception Detection

11/19/2018
by Bob de Ruiter, et al.

Detecting deception in natural language has a wide variety of applications, but because deception is hidden by nature, there are no public, large-scale sources of labeled deceptive text. This work introduces the Mafiascum dataset [1], a collection of over 700 games of Mafia, in which players are randomly assigned either deceptive or non-deceptive roles and then interact via forum postings. Almost 10,000 documents were compiled from the dataset, each of which contains all messages written by a single player in a single game. This corpus was used to construct two feature sets: hand-picked linguistic features based on prior deception research, and average word vectors enriched with subword information. An SVM classifier fit on a combination of these feature sets achieved an area under the precision-recall curve of 0.35 (chance = 0.26) and an ROC AUC of 0.64 (chance = 0.50).

[1] https://bitbucket.org/bopjesvla/thesis/src
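The two chance levels quoted in the abstract differ because the metrics respond differently to class imbalance. The sketch below is not the authors' evaluation code; it is a minimal pure-Python illustration that scores documents at random and computes both metrics. It assumes that about 26% of documents come from deceptive players (inferred from the quoted chance level of 0.26) and uses average precision as one common estimator of the area under the precision-recall curve. The function names `roc_auc`, `average_precision`, and `chance_levels` are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive document appears."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("at least one positive label is required")
    return total / hits


def chance_levels(n_docs=10000, pos_rate=0.26, seed=0):
    """Score documents uniformly at random and return (ROC AUC, average precision).

    n_docs and pos_rate are illustrative assumptions, not values from the dataset files.
    """
    rng = random.Random(seed)
    n_pos = round(n_docs * pos_rate)
    labels = [1] * n_pos + [0] * (n_docs - n_pos)
    scores = [rng.random() for _ in labels]
    return roc_auc(labels, scores), average_precision(labels, scores)


if __name__ == "__main__":
    auc, ap = chance_levels()
    print(f"random scorer: ROC AUC = {auc:.3f}, average precision = {ap:.3f}")
```

With random scores, ROC AUC lands near 0.50 regardless of class balance, while average precision lands near the share of positive documents. Under these assumptions, the reported 0.35 and 0.64 are each a moderate improvement over their own baselines rather than directly comparable numbers.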

research
07/05/2022

Putting the Con in Context: Identifying Deceptive Actors in the Game of Mafia

While neural networks demonstrate a remarkable ability to model linguist...
research
01/14/2019

Albanian Language Identification in Text Documents

In this work we investigate the accuracy of standard and state-of-the-ar...
research
04/27/2022

CREER: A Large-Scale Corpus for Relation Extraction and Entity Recognition

We describe the design and use of the CREER dataset, a large corpus anno...
research
11/22/2018

Creating a contemporary corpus of similes in Serbian by using natural language processing

Simile is a figure of speech that compares two things through the use of...
research
08/21/2019

WikiCREM: A Large Unsupervised Corpus for Coreference Resolution

Pronoun resolution is a major area of natural language understanding. Ho...
research
04/19/2017

Predicting Role Relevance with Minimal Domain Expertise in a Financial Domain

Word embeddings have made enormous inroads in recent years in a wide var...
research
04/18/2021

Documenting the English Colossal Clean Crawled Corpus

As language models are trained on ever more text, researchers are turnin...
