SASICM A Multi-Task Benchmark For Subtext Recognition

by Hua Yan, et al.

Subtext is a kind of deep semantics that can be acquired after one or more rounds of expression transformation. As a popular way of expressing one's intentions, it is well worth studying. In this paper, we try to make computers recognize whether a text carries a subtext by means of machine learning. We build a Chinese dataset whose source data comes from popular social media platforms (e.g., Weibo, Netease Music, Zhihu, and Bilibili). In addition, we build a baseline model called SASICM for subtext recognition. The F1 score of SASICMg, whose pretrained model is GloVe, is as high as 64.37, which is [...] higher than that of the BERT-based model, 12.7 higher than that of [...] methods on average (including support vector machine, logistic regression classifier, maximum entropy classifier, naive Bayes classifier, and decision tree), and 2.39 higher than that of [...] BTM. The F1 score of SASICMBERT, whose pretrained model is BERT, is 65.12, which is 0.75 higher than that of SASICMg. [...] SASICMBERT are 71.16 [...] other methods which are mentioned before.
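
The comparisons above are reported as F1 scores. As a reminder of what that metric measures for a binary task such as "subtext present / absent", here is a minimal sketch in plain Python. The function names and the toy labels are hypothetical illustrations, not the paper's code or data, and the sketch assumes the positive class is labeled 1.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels (hypothetical): 1 = sentence carries a subtext, 0 = it does not.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 0, 1, 0, 1, 0, 1, 0]

f1 = f1_score(gold, pred)
acc = accuracy(gold, pred)
print(f"F1 = {f1:.2f}, accuracy = {acc:.2f}")
```

F1 and accuracy can diverge when the two classes are imbalanced, which is one reason to look at both when comparing models.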


Yelp Review Rating Prediction: Machine Learning and Deep Learning Models

We predict restaurant ratings from Yelp reviews based on Yelp Open Datas...

Gender Prediction Based on Vietnamese Names with Machine Learning Techniques

As biological gender is one of the aspects of presenting individual huma...

Danish Stance Classification and Rumour Resolution

The Internet is rife with flourishing rumours that spread through microb...

Sentiment Analysis of Code-Mixed Social Media Text (Hinglish)

This paper discusses the results obtained for different techniques appli...

An Empirical Study on Sentiment Classification of Chinese Review using Word Embedding

In this article, how word embeddings can be used as features in Chinese ...

Hypers at ComMA@ICON: Modelling Aggressiveness, Gender Bias and Communal Bias Identification

Due to the exponentially increasing reach of social media, it is essenti...

Uzbek Cyrillic-Latin-Cyrillic Machine Transliteration

In this paper, we introduce a data-driven approach to transliterating Uz...