The Dark Side of the Language: Pre-trained Transformers in the DarkNet

01/14/2022
by Leonardo Ranaldi, et al.

Pre-trained Transformers are challenging human performance on many natural language processing tasks. The gigantic datasets used for pre-training seem to be the key to their success on existing tasks. In this paper, we explore how a range of pre-trained natural language understanding models perform on truly novel and unexplored data, provided by classification tasks over a DarkNet corpus. Surprisingly, the results show that syntactic and lexical neural networks largely outperform pre-trained Transformers. This seems to suggest that pre-trained Transformers have serious difficulties adapting to radically novel texts.
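To make the experimental setting concrete, the following is a minimal sketch of fine-tuning a pre-trained Transformer for text classification with the Hugging Face transformers library, the kind of baseline the abstract refers to. The model name ("bert-base-uncased"), the toy texts, labels, and hyperparameters are illustrative assumptions, not the authors' actual pipeline or the DarkNet corpus itself.

    # Sketch: fine-tune a pre-trained Transformer for text classification.
    # Corpus, labels, and hyperparameters below are placeholders.
    import torch
    from torch.utils.data import DataLoader, Dataset
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    class TextClassificationDataset(Dataset):
        """Wraps (text, label) pairs and tokenizes each text on access."""
        def __init__(self, texts, labels, tokenizer, max_length=256):
            self.texts, self.labels = texts, labels
            self.tokenizer, self.max_length = tokenizer, max_length

        def __len__(self):
            return len(self.texts)

        def __getitem__(self, idx):
            enc = self.tokenizer(
                self.texts[idx],
                truncation=True,
                padding="max_length",
                max_length=self.max_length,
                return_tensors="pt",
            )
            item = {k: v.squeeze(0) for k, v in enc.items()}
            item["labels"] = torch.tensor(self.labels[idx])
            return item

    # Hypothetical data: in the paper this would be DarkNet pages with
    # category labels; here we use two toy examples.
    texts = ["example page text one", "example page text two"]
    labels = [0, 1]

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    loader = DataLoader(TextClassificationDataset(texts, labels, tokenizer), batch_size=2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    model.train()
    for batch in loader:
        optimizer.zero_grad()
        outputs = model(**batch)   # the model returns a loss when "labels" is passed
        outputs.loss.backward()
        optimizer.step()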


Related research

03/07/2021 · Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees
09/10/2021 · On the validity of pre-trained transformers for natural language processing in the software engineering domain
05/07/2021 · Are Pre-trained Convolutions Better than Pre-trained Transformers?
09/05/2019 · Effective Use of Transformer Networks for Entity Tracking
05/21/2022 · Calibration of Natural Language Understanding Models with Venn–ABERS Predictors
06/06/2021 · Exploring the Limits of Out-of-Distribution Detection
05/16/2023 · Mimetic Initialization of Self-Attention Layers
