Dutch Humor Detection by Generating Negative Examples

10/26/2020
by Thomas Winters et al.

Detecting whether a text is humorous is computationally hard, as it usually requires linguistic and common-sense insights. In machine learning, humor detection is usually modeled as a binary classification task, in which a model is trained to predict whether a given text is a joke or another type of text. Rather than using completely different non-humorous texts as negative examples, we propose using text generation algorithms to imitate the original joke dataset, which increases the difficulty for the learning algorithm. We constructed several joke and non-joke datasets to test the humor detection abilities of different language technologies. In particular, we compare the humor detection capabilities of classic neural network approaches with those of the state-of-the-art Dutch language model RobBERT. In doing so, we create and compare the first Dutch humor detection systems. We found that while the other language models perform well when the non-jokes come from completely different domains, RobBERT was the only one able to distinguish jokes from generated negative examples. This performance illustrates the usefulness of text generation for creating negative datasets for humor recognition, and also shows that transformer models are a large step forward in humor detection.
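To make the idea of generated negative examples concrete, the sketch below builds a small labeled dataset in which each non-joke imitates a joke. It is only an illustration under stated assumptions: the abstract does not specify the generation algorithm, so the word-swapping strategy, the function names `make_negative` and `build_dataset`, the `swap_prob` parameter, and the English toy sentences are all hypothetical placeholders rather than the authors' method or data.

```python
import random


def make_negative(joke, pool, rng, swap_prob=0.5):
    """Imitate `joke` by replacing some of its words with words from other jokes.

    Assumed strategy for illustration only: the result keeps the length and
    vocabulary of the joke corpus but usually loses the coherence of the joke.
    """
    donor_words = [w for other in pool if other != joke for w in other.split()]
    if not donor_words:
        return joke  # nothing to swap in
    return " ".join(
        rng.choice(donor_words) if rng.random() < swap_prob else word
        for word in joke.split()
    )


def build_dataset(jokes, seed=0):
    """Return a shuffled list of (text, label) pairs: 1 = joke, 0 = generated non-joke."""
    rng = random.Random(seed)
    positives = [(joke, 1) for joke in jokes]
    negatives = [(make_negative(joke, jokes, rng), 0) for joke in jokes]
    data = positives + negatives
    rng.shuffle(data)
    return data


# Toy placeholder sentences, not taken from the paper's dataset.
jokes = [
    "Why did the chicken cross the road? To get to the other side.",
    "I told my computer a joke but it did not get the punchline.",
    "What do you call a fish without eyes? A fsh.",
]
dataset = build_dataset(jokes)
```

A binary classifier trained on such pairs cannot rely on topic or vocabulary alone, since the negatives share both with the jokes. That is the added difficulty the abstract describes.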


