Tracing Text Provenance via Context-Aware Lexical Substitution

12/15/2021
by Xi Yang, et al.

Text content created by humans or language models is often stolen or misused by adversaries. Tracing text provenance can help establish ownership of text content or identify the malicious users who distribute misleading content such as machine-generated fake news. Existing attempts to achieve this are mainly based on watermarking techniques. Traditional text watermarking methods embed watermarks by slightly altering the text format, such as line spacing and font, but these marks are fragile to cross-media transmission such as OCR. Natural language watermarking methods instead represent watermarks by replacing words in the original sentence with synonyms from handcrafted lexical resources (e.g., WordNet), but they do not consider the substitution's impact on the meaning of the sentence as a whole. More recently, a transformer-based network was proposed to embed watermarks by modifying unobtrusive words (e.g., function words), which can still impair the sentence's logical and semantic coherence; moreover, a network trained on one type of text content fails on other types. To address these limitations, we propose a natural language watermarking scheme based on context-aware lexical substitution (LS). Specifically, we employ BERT to suggest LS candidates by inferring the semantic relatedness between each candidate and the original sentence. On top of this, a selection strategy based on synchronicity and substitutability is designed to test whether a word is exactly suitable for carrying the watermark signal. Extensive experiments demonstrate that, under both objective and subjective metrics, our watermarking scheme preserves the semantic integrity of the original sentences well and transfers better across text types than existing methods. In addition, the proposed LS approach outperforms the state-of-the-art approach on the Stanford Word Substitution Benchmark.
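The core idea of lexical-substitution watermarking — choosing among semantically interchangeable words so that the choice itself encodes a bit — can be sketched in plain Python. This is a minimal illustration, not the paper's method: the context-aware BERT candidate generation is mocked with a fixed synonym list, `word_bit`, `embed_bit`, and `extract_bit` are hypothetical names, and keying a bit to each word via a hash is just one simple way to make embedding and extraction agree. The paper's synchronicity check (verifying that the same candidate set is regenerated from the watermarked sentence, so the extractor stays aligned with the embedder) is noted only in a comment.

```python
import hashlib

def word_bit(word):
    """Map a word to a pseudo-random bit via its hash (illustrative keying)."""
    return hashlib.sha256(word.encode("utf-8")).digest()[0] & 1

def embed_bit(sentence_words, pos, candidates, bit):
    """Replace the word at `pos` with the first candidate whose keyed bit
    equals the watermark bit. In the paper's scheme, a candidate would also
    have to pass synchronicity and substitutability tests before being used;
    here we only model the bit-carrying substitution itself."""
    for cand in candidates:
        if word_bit(cand) == bit:
            out = list(sentence_words)
            out[pos] = cand
            return out
    return None  # no candidate carries this bit: position is unusable

def extract_bit(sentence_words, pos):
    """The extractor only needs the (synchronized) word at the marked position."""
    return word_bit(sentence_words[pos])

# Toy candidate set standing in for BERT's context-aware suggestions.
words = "the movie was great overall".split()
cands = ["great", "excellent", "wonderful", "superb"]

marked = embed_bit(words, 3, cands, 1)
if marked is not None:
    # Whatever substitution was chosen, the extractor recovers the same bit.
    assert extract_bit(marked, 3) == 1
```

In the real scheme, the candidate list varies with context, which is exactly why the synchronicity test matters: if substituting a word changed the candidate set BERT would propose at extraction time, the embedder and extractor would disagree on which words carry bits.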

