Log In Sign Up

Transformer-F: A Transformer network with effective methods for learning universal sentence representation

by   Yu Shi, et al.

The Transformer model is widely used in natural language processing for sentence representation. However, the previous Transformer-based models focus on function words that have limited meaning in most cases and could merely extract high-level semantic abstraction features. In this paper, two approaches are introduced to improve the performance of Transformers. We calculated the attention score by multiplying the part-of-speech weight vector with the correlation coefficient, which helps extract the words with more practical meaning. The weight vector is obtained by the input text sequence based on the importance of the part-of-speech. Furthermore, we fuse the features of each layer to make the sentence representation results more comprehensive and accurate. In experiments, we demonstrate the effectiveness of our model Transformer-F on three standard text classification datasets. Experimental results show that our proposed model significantly boosts the performance of text classification as compared to the baseline model. Specifically, we obtain a 5.28


GRET: Global Representation Enhanced Transformer

Transformer, based on the encoder-decoder framework, has achieved state-...

Laughing Heads: Can Transformers Detect What Makes a Sentence Funny?

The automatic detection of humor poses a grand challenge for natural lan...

An Intelligent CNN-VAE Text Representation Technology Based on Text Semantics for Comprehensive Big Data

In the era of big data, a large number of text data generated by the Int...

Explicitly Modeling Adaptive Depths for Transformer

The vanilla Transformer conducts a fixed number of computations over all...

Semantic Hilbert Space for Text Representation Learning

Capturing the meaning of sentences has long been a challenging task. Cur...

Performance Comparison of Simple Transformer and Res-CNN-BiLSTM for Cyberbullying Classification

The task of text classification using Bidirectional based LSTM architect...