Cascaded Semantic and Positional Self-Attention Network for Document Classification

09/15/2020
by Juyong Jiang, et al.

Transformers have shown great success in learning representations for language modelling. However, an open challenge remains: how to systematically aggregate semantic information (word embeddings) with positional (or temporal) information (word order). In this work, we propose a new architecture that aggregates the two sources of information using a cascaded semantic and positional self-attention network (CSPAN) in the context of document classification. CSPAN uses a semantic self-attention layer cascaded with a Bi-LSTM to process semantic and positional information sequentially, and then adaptively combines them through a residual connection. Compared with commonly used positional encoding schemes, CSPAN exploits the interaction between semantics and word positions in a more interpretable and adaptive manner, and classification performance is notably improved while preserving a compact model size and a high convergence rate. We evaluate the CSPAN model on several benchmark datasets for document classification with careful ablation studies, and demonstrate encouraging results compared with the state of the art.
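To make the described cascade concrete, below is a minimal PyTorch sketch of the design as stated in the abstract: a semantic self-attention layer over word embeddings, cascaded with a Bi-LSTM that contributes word-order (positional) information, combined through a residual connection and pooled for classification. The module sizes, the projection layer, and the mean pooling are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CSPANSketch(nn.Module):
    """Sketch of a cascaded semantic/positional network (assumed details):
    semantic self-attention -> Bi-LSTM -> residual combination -> classifier."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=64, num_heads=4, num_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Semantic self-attention over word embeddings (no positional encoding here).
        self.self_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Bi-LSTM processes the attended sequence, injecting word-order information.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Project the Bi-LSTM output back to embed_dim so a residual connection is possible.
        self.proj = nn.Linear(2 * hidden_dim, embed_dim)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)              # (batch, seq_len, embed_dim)
        attn_out, _ = self.self_attn(x, x, x)  # semantic self-attention
        lstm_out, _ = self.bilstm(attn_out)    # positional (order) modelling
        combined = attn_out + self.proj(lstm_out)  # residual combination of the two branches
        doc_repr = combined.mean(dim=1)        # simple mean pooling over tokens (assumed)
        return self.classifier(doc_repr)

# Usage: classify a toy batch of two documents, each ten tokens long.
model = CSPANSketch(vocab_size=10000)
tokens = torch.randint(0, 10000, (2, 10))
logits = model(tokens)  # shape: (2, 5)
```

The residual addition lets the model fall back on the purely semantic representation when word order carries little signal, which is one plausible reading of the "adaptive" combination described above.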


