Exploiting Language Model for Efficient Linguistic Steganalysis

07/26/2021
by Biao Yi, et al.

Recent work in linguistic steganalysis has successively applied CNNs, RNNs, GNNs, and other efficient deep models to detect secret information in generated texts. These methods tend to pursue stronger feature extractors in order to improve detection performance. However, our experiments show that automatically generated stego texts and carrier texts differ significantly in the conditional probability distribution of individual words. This difference is naturally captured by the language model used to generate the stego texts. Further experiments show that this ability can be transferred to a text classifier through pre-training and fine-tuning, which improves detection performance. Motivated by this insight, we propose two methods for efficient linguistic steganalysis: one pre-trains an RNN-based language model, and the other pre-trains a sequence autoencoder. The results indicate that both methods yield performance gains, to different degrees, over a randomly initialized RNN, and that convergence is significantly accelerated. Moreover, our methods achieve state-of-the-art detection results.
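To make "the conditional probability distribution of individual words" concrete, the sketch below scores a sentence by the average negative log-probability a language model assigns to each word given its predecessor. It is only an illustration under stated assumptions: a toy add-one-smoothed bigram model stands in for the RNN language model described in the abstract, and every name (train_bigram, cond_prob, mean_nll) and the tiny corpus are hypothetical. The two scored sentences are not stego or carrier samples; they only show that the score depends on how well the word sequence matches the model.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigram contexts and pairs over tokenized sentences (toy stand-in for an RNN LM)."""
    contexts, pairs, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>"] + tokens          # "<s>" marks the sentence start
        vocab.update(padded)
        for prev, word in zip(padded, padded[1:]):
            contexts[prev] += 1
            pairs[(prev, word)] += 1
    return contexts, pairs, vocab

def cond_prob(model, prev, word):
    """Add-one smoothed estimate of P(word | prev); one extra slot is reserved for unseen words."""
    contexts, pairs, vocab = model
    return (pairs[(prev, word)] + 1) / (contexts[prev] + len(vocab) + 1)

def mean_nll(model, tokens):
    """Average negative log conditional probability per word of one sentence."""
    padded = ["<s>"] + tokens
    logps = [math.log(cond_prob(model, p, w)) for p, w in zip(padded, padded[1:])]
    return -sum(logps) / len(logps)

# Hypothetical toy corpus, used only to fit the illustrative model.
corpus = ["the cat sat on the mat".split(), "the dog sat on the rug".split()]
model = train_bigram(corpus)

typical = "the cat sat on the rug".split()   # word order resembles the corpus
unusual = "rug the on cat mat dog".split()   # same vocabulary, unfamiliar order

print("typical:", mean_nll(model, typical))
print("unusual:", mean_nll(model, unusual))
```

In the setting the abstract describes, such per-word statistics are not used directly as a hand-built feature; instead, the pre-trained model's weights initialize a classifier that is then fine-tuned on labeled stego and carrier texts.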

