Detect Camouflaged Spam Content via StoneSkipping: Graph and Text Joint Embedding for Chinese Character Variation Representation

08/30/2019
by Zhuoren Jiang, et al.

Chinese text spam detection is very challenging because of both the glyph and phonetic variations of Chinese characters. This paper proposes a novel framework that jointly models variational, semantic, and contextualized representations of Chinese characters for the spam detection task. In particular, a Variation Family-enhanced Graph Embedding (VFGE) algorithm is designed on top of a Chinese character variation graph. VFGE learns both the graph embeddings of individual Chinese characters (local) and the latent variation families (global). Furthermore, an enhanced bidirectional language model, with a combination gate function and an aggregation learning function, is proposed to integrate the graph and text information while capturing sequential information. Extensive experiments on both SMS and review datasets show that the proposed method outperforms a series of state-of-the-art models for Chinese spam detection.
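To make the idea of a combination gate concrete, here is a minimal sketch in plain Python. It assumes a common sigmoid-gate form, where a gate computed from the concatenated graph and text embeddings decides, per dimension, how much of each to keep. The abstract does not give the exact gate used in the paper, so the function name `gated_combine`, the weight layout, and the formula are illustrative assumptions rather than the authors' formulation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_combine(graph_vec, text_vec, weights, bias):
    """Blend a graph embedding and a text embedding with a learned gate.

    Assumed (hypothetical) form, not taken from the paper:
        gate_i = sigmoid(weights[i] . [graph_vec; text_vec] + bias[i])
        out_i  = gate_i * graph_vec[i] + (1 - gate_i) * text_vec[i]

    graph_vec, text_vec: lists of floats with the same length d
    weights: d rows, each of length 2 * d
    bias: list of d floats
    """
    if len(graph_vec) != len(text_vec):
        raise ValueError("graph and text embeddings must share a dimension")
    concat = list(graph_vec) + list(text_vec)
    combined = []
    for i, (g_i, t_i) in enumerate(zip(graph_vec, text_vec)):
        # Pre-activation for dimension i, computed from both views.
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        gate = sigmoid(z)
        # A gate near 1 favours the graph view; near 0 favours the text view.
        combined.append(gate * g_i + (1.0 - gate) * t_i)
    return combined


# Toy usage: with zero weights and bias the gate is 0.5 everywhere,
# so the result is the element-wise mean of the two embeddings.
zero_weights = [[0.0] * 4 for _ in range(2)]
mixed = gated_combine([1.0, 3.0], [3.0, 1.0], zero_weights, [0.0, 0.0])
```

In a trained model the weights and bias would be learned jointly with the language model; the zero values above only illustrate the neutral case where both views contribute equally.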
