Detection of Criminal Texts for the Polish State Border Guard

08/24/2021
by Artur Nowakowski, et al.

This paper describes research on detecting Polish criminal texts that appear on the Internet. We carried out experiments to find the best available setup for efficiently classifying unbalanced and noisy data. The best performance was achieved by fine-tuning a pre-trained Polish transformer language model. For the detection task, we collected a large corpus of annotated Internet snippets as training data. We share this dataset and establish a new benchmark task for the detection of criminal texts on the Gonito platform.
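The abstract does not say how the class imbalance was handled or which metric was used. As a minimal sketch under that caveat, the Python below shows why plain accuracy is misleading on unbalanced data and computes inverse-frequency class weights (the same heuristic as scikit-learn's `class_weight="balanced"`). Such weights could be passed to a weighted loss during fine-tuning. All function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def binary_f1(y_true, y_pred, positive=1):
    """F1 score for the positive class (here: the hypothetical 'criminal' label)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions, so precision/recall are 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 = criminal, 0 = benign (9:1 imbalance)
y_true = [0] * 9 + [1]
always_benign = [0] * 10  # a trivial classifier that never flags anything

accuracy = sum(t == p for t, p in zip(y_true, always_benign)) / len(y_true)
weights = inverse_frequency_weights(y_true)

print(accuracy)                          # 0.9, despite missing every criminal text
print(binary_f1(y_true, always_benign))  # 0.0
print(weights[1])                        # 5.0, the rare class is up-weighted
```

The trivial "always benign" classifier scores 90% accuracy yet has zero F1 on the class of interest, which is why minority-class metrics and loss weighting are common choices for unbalanced detection tasks.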

research
07/01/2023

THUIR2 at NTCIR-16 Session Search (SS) Task

Our team(THUIR2) participated in both FOSS and POSS subtasks of the NTCI...
research
12/07/2020

Improvements and Extensions on Metaphor Detection

Metaphors are ubiquitous in human language. The metaphor detection task ...
research
07/26/2021

Exploiting Language Model for Efficient Linguistic Steganalysis

Recent advances in linguistic steganalysis have successively applied CNN...
research
10/15/2020

CXP949 at WNUT-2020 Task 2: Extracting Informative COVID-19 Tweets – RoBERTa Ensembles and The Continued Relevance of Handcrafted Features

This paper presents our submission to Task 2 of the Workshop on Noisy Us...
research
08/24/2023

American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers

Existing full text datasets of U.S. public domain newspapers do not reco...
research
08/18/2021

Image Collation: Matching illustrations in manuscripts

Illustrations are an essential transmission instrument. For an historian...
research
08/04/2016

Word Segmentation on Micro-blog Texts with External Lexicon and Heterogeneous Data

This paper describes our system designed for the NLPCC 2016 shared task ...