With the rapid development of the internet, the number of users on social networks has increased significantly, and the data generated from these networks has grown exponentially. Comments and posts from users are difficult to control. Therefore, a tool that categorizes posts and comments is essential. This is the main goal of the first task of the VLSP Shared Task 2019 - Hate Speech Detection on Social Networks, which aims to classify Vietnamese social media texts according to predefined labels.
Recently, hate speech detection has been studied by researchers in natural language processing, for example in the Shared Task SemEval-2019 Task 5: Multilingual detection of hate speech against immigrants and women in Twitter, which addresses the problem of predicting hate speech on social media in English and Spanish. In addition, Zhang and Luo evaluated deep neural models on the largest collection of Twitter-based hate speech datasets.
In this task, we focus on a solution for predicting hate speech in Vietnamese, which is a low-resource language for natural language processing. In particular, we implemented deep learning models to classify comments and posts on social networks. The problem is stated as follows:
Input: A Vietnamese post or comment on a social network.
Output: One of three labels (HATE, OFFENSIVE, or CLEAN) which is predicted by our system.
Table I shows several examples for this task.
| No. | Comment | Label |
| 1 | Thương tụi mày quá không biết tụi mày có thương tao ko :( ("I love you guys so much, I don't know whether you guys love me back :(") | CLEAN (0) |
| 2 | Thi đấu thể thao chuyên nghiệp ở trong nước bạc bẽo vl ("Competing in professional sports in this country is so damn thankless") | OFFENSIVE (1) |
| 3 | Không ai rãnh mà nói chuyện với mày đâu thằng ngũ ("Nobody has time to talk to you, you idiot") | HATE (2) |
In this paper, our contributions are presented as follows.
Firstly, we implemented three different neural network models, namely Text-CNN, Bi-GRU-CNN, and Bi-GRU-LSTM-CNN, to solve the VLSP Shared Task: Hate Speech Detection on Vietnamese social media text.
Secondly, our best model achieved an F1-score of 70.576% on the public test set, ranking 5th in the Hate Speech Detection task on social networks.
The paper is organized as follows: Section 2 discusses related work; Section 3 describes the dataset; Sections 4 and 5 present our pre-processing and the proposed method; Section 6 reports our experiments; and Section 7 concludes the paper and outlines future work.
II Related Work
Deep neural network models have been widely used to improve performance on a variety of natural language processing (NLP) tasks. The effectiveness of combining pre-processing with a CNN-GRU network has been demonstrated, where the network consists of a word embedding layer, a 1D CNN, 1D max pooling, a GRU, global max pooling, and a softmax layer. Zhang et al. empirically illustrated that CNNs work well for text classification. RNNs and Bi-LSTMs have also achieved strong performance in text classification. Besides, traditional machine learning methods have been used to recognize hate speech in tweets. We also consider other combined models for classification, such as Bi-RNN, Bidirectional GRU, Bidirectional LSTM, Bi-LSTM-CNN, and Bi-LSTM-CRF. Facebook Artificial Intelligence Research (FAIR) developed pre-trained word embeddings that work well for text classification involving out-of-vocabulary words.
III Dataset
We use the dataset provided by the VLSP Shared Task 2019 organizers, containing posts and comments from the social network Facebook, each annotated with one of three classes (HATE, OFFENSIVE, and CLEAN).
HATE (Hate Speech): a comment or post is identified as hate speech if it (1) targets individuals or groups on the basis of their characteristics; (2) demonstrates a clear intention to incite harm, or to promote hatred; (3) may or may not use offensive or profane words. For example: “Assimilate? No they all need to go back to their own countries. #BanMuslims Sorry if someone disagrees too bad.”. See the definition of Zhang et al. . In contrast, “All you perverts (other than me) who posted today, needs to leave the O Board. Dfasdfdasfadfs” is an example of abusive language, which often bears the purpose of insulting individuals or groups, and can include hate speech, derogatory and offensive language.
OFFENSIVE (Offensive but not hate speech): a post or comment may contain offensive words but it does not target individuals or groups on the basis of their characteristics. For instance, “WTF, tomorrow is Monday already.”
CLEAN (Neither offensive nor hate speech): normal comments or posts on social networks, it does not contain offensive or hate speech. For example, “She learned how to paint very hard when she was young”.
IV Text Pre-processing
We use several simple techniques in text pre-processing in all models for this task as follows.
Converting all words to lower case.
Removing extra white spaces, punctuation marks.
Replacing all numbers with the token "number".
Word tokenization using the pyvi library.
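The steps above can be sketched as follows. The regular expressions are our own illustrative choices, and the pyvi tokenization call is shown commented out, since its output depends on the installed tokenizer model:

```python
import re

def normalize(text: str) -> str:
    """Apply the pre-processing steps listed above (illustrative sketch)."""
    text = text.lower()                                      # lower-casing
    text = re.sub(r"[^\w\s]", " ", text)                     # drop punctuation
    text = re.sub(r"\d+", "number", text)                    # numbers -> "number"
    text = re.sub(r"\s+", " ", text).strip()                 # collapse extra spaces
    return text

# Word tokenization would then use pyvi, e.g.:
# from pyvi import ViTokenizer
# tokens = ViTokenizer.tokenize(normalize(comment)).split()
```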
V Bi-GRU-LSTM-CNN Model For Vietnamese Hate Speech Detection
In this section, we propose a deep neural model for predicting hate speech in social media text. Figure 1
shows the architecture of our network. The basic architecture is a Convolutional Neural Network (CNN) with 1D convolutions. In addition, we study two other deep neural models: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). The details of these neural models are presented in the following sub-sections. The model has several common parts:
Word embedding layer: The input is a matrix of 220x300 dimensions. In particular, each sentence is padded or truncated to 220 words, and each word is represented by a 300-dimensional word embedding. Pre-trained word vectors have been a standard form of word representation for deep neural models since Word2Vec. In our experiments, we choose FastText as our pre-trained embedding model.
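As an illustration, the fixed-size 220x300 input matrix can be built as below. The `encode` helper and the zero-vector treatment of out-of-vocabulary words are our own assumptions; in practice the lookup table would hold vectors loaded from the pre-trained FastText model:

```python
import numpy as np

MAX_LEN, EMB_DIM = 220, 300  # dimensions used in the paper

def encode(tokens, emb_lookup):
    """Map a tokenized comment to a MAX_LEN x EMB_DIM matrix.

    emb_lookup: dict mapping a token to its 300-d vector (e.g. from FastText).
    Unknown tokens and padding positions are left as zero vectors (sketch).
    """
    mat = np.zeros((MAX_LEN, EMB_DIM), dtype=np.float32)
    for i, tok in enumerate(tokens[:MAX_LEN]):   # truncate to 220 tokens
        if tok in emb_lookup:
            mat[i] = emb_lookup[tok]
    return mat
```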
CNN-1D layer: We apply 1D spatial dropout with a dropout rate of 0.2 before the convolution. This helps prevent the model from over-fitting and yields better generalization.
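Unlike ordinary dropout, 1D spatial dropout drops entire feature channels rather than individual values, which suits embedded sequences where adjacent time steps are strongly correlated. A minimal training-time sketch (the function name and inverted-dropout rescaling are our own choices):

```python
import numpy as np

def spatial_dropout_1d(x, rate, rng):
    """Zero out whole feature channels of a (timesteps, channels) matrix,
    as 1D spatial dropout does at training time (inference skips this)."""
    keep = rng.random(x.shape[1]) >= rate     # one mask entry per channel
    return x * keep / (1.0 - rate)            # inverted-dropout rescaling
```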
Bidirectional LSTM: The model uses two parallel blocks of Bidirectional Long Short-Term Memory (Bi-LSTM), where "bidirectional" means that the input sequence is fed to the LSTM in two different directions. LSTM is a variant of the recurrent neural network with an input gate, an output gate, a forget gate, and a memory cell. In our experiments, we used two parallel Bi-LSTM blocks with 112 units each, using sigmoid for the recurrent activations and tanh for the hidden units.
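The "two directions" idea can be made concrete with a toy recurrence: the same step function is scanned left-to-right and right-to-left, and the two hidden sequences are joined per time step. The helper below is a simplification of ours, with a running-sum step standing in for a real LSTM cell:

```python
def bidirectional(step, xs, h0):
    """Run a recurrent step over xs in both directions and concatenate
    the forward and backward hidden states at each time step."""
    def scan(seq):
        h, states = h0, []
        for x in seq:
            h = step(x, h)
            states.append(h)
        return states
    forward = scan(xs)
    backward = scan(xs[::-1])[::-1]   # reverse back to input order
    return [f + b for f, b in zip(forward, backward)]  # list concat per step

# With a running-sum "cell", position i sees its prefix sum and suffix sum:
out = bidirectional(lambda x, h: [h[0] + x], [1, 2, 3], [0])
```

This mirrors what Keras-style `Bidirectional` wrappers do: each output position carries context from both the left and the right of the sequence.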
Bidirectional GRU: Unlike LSTMs, gated recurrent units (GRUs), first introduced by Cho et al. in 2014, have no output gate. GRUs have an update gate and a reset gate: the reset gate is responsible for combining the new input with the previous memory, and the update gate determines how much of the previous memory should be kept.
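For reference, a single GRU time step with the update (z) and reset (r) gates described above can be sketched in numpy as follows; the parameter layout is our own convention, and note the absence of an output gate:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, W, U, b):
    """One GRU step for input x and previous hidden state h_prev.

    W, U, b are dicts holding the "z" (update), "r" (reset) and
    "h" (candidate state) parameters.
    """
    z = sigmoid(x @ W["z"] + h_prev @ U["z"] + b["z"])        # update gate
    r = sigmoid(x @ W["r"] + h_prev @ U["r"] + b["r"])        # reset gate
    h_cand = np.tanh(x @ W["h"] + (r * h_prev) @ U["h"] + b["h"])
    return (1.0 - z) * h_prev + z * h_cand  # blend old memory and candidate
```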
VI Experimental Results
In this section, we describe our experiments and results for the task. Evaluation for this task is based on the F1-score. We show the results of our experiments on the Vietnamese hate speech detection task in Table III. In particular, the Bi-GRU-LSTM-CNN model achieved the best performance among the three models we experimented with.
Table IV shows the top-ranked performances on the public test set. Our system ranked 5th with an F1-score of 70.576%, and the results were not significantly different from those of the other teams on the public test set. However, we ranked only 11th on the private test set.
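The task's metric is the F1-score; assuming macro averaging over the three labels (the official averaging scheme is not restated here), it can be computed as:

```python
def macro_f1(gold, pred, labels=(0, 1, 2)):
    """Macro-averaged F1 over the CLEAN/OFFENSIVE/HATE label ids."""
    scores = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)
```

The same quantity is available as `sklearn.metrics.f1_score(gold, pred, average="macro")`.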
VII Conclusion and Future Work
In this paper, we have described our approach to the hate speech detection task proposed at the VLSP Shared Task 2019. We developed a system using a supervised approach to classify texts into three different labels, and evaluated its performance on the task dataset. Our official result is an F1-score of 70.576%, ranking 5th on the public test set scoreboard.
In future work, we plan to address this problem in different ways to improve performance. We will investigate both traditional machine learning and other types of deep neural network models. In addition, we will analyze the experimental results of this task to select an efficient approach, such as a hybrid approach combining supervised methods with heuristic rules, to improve hate speech detection on social media text.
Acknowledgment
We would like to thank the VLSP Shared Task 2019 organizers for their hard work and for providing the Vietnamese Hate Speech Detection dataset for our experiments.
-  Basile, V., Bosco, C., Fersini, E., Nozza, D., Patti, V., Rangel, F., Rosso, P., Sanguinetti, M.: Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter. In: Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019). Association for Computational Linguistics.
-  Zhang, Z., Luo, L.: Hate speech detection: A solved problem? the challenging case of long tail on twitter. CoRR abs/1803.03662 (2018), http://arxiv.org/abs/1803.03662
-  Z. Zhang, D. Robinson, and J. Tepper, “Detecting hate speech on twitter using a convolution-gru based deep neural network,” in European Semantic Web Conference. Springer, 2018, pp. 745–760.
-  Alon Jacovi, Oren Sar Shalom, and Yoav Goldberg. 2018. Understanding Convolutional Neural Networks for Text Classification. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 56–65. Association for Computational Linguistics.
-  Liu, P., Qiu, X., & Huang, X. (2016). Recurrent neural network for text classification with multi-task learning. arXiv preprint arXiv:1605.05101.
-  P. Zhou, Z. Qi, S. Zheng, J. Xu, H. Bao, and B. Xu, “Text classification improved by integrating bidirectional lstm with two-dimensional max pooling,” arXiv preprint arXiv:1611.06639, 2016.
-  P. Badjatiya, S. Gupta, M. Gupta, and V. Varma, “Deep learning for hate speech detection in tweets,” in Proceedings of the 26th International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee, 2017, pp. 759–760.
-  Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global vectors for word representation. In Proceedings of EMNLP-2014, pages 1532–1543, Doha, Qatar, October.
-  M. Schuster and K. K. Paliwal. Bidirectional recurrent neural networks. Trans. Sig. Proc., 45(11):2673–2681, November 1997.
-  Rui Lu and Zhiyao Duan. Bidirectional GRU for sound event detection. 2017.
-  Peng Zhou, Zhenyu Qi, Suncong Zheng, Jiaming Xu, Hongyun Bao, and Bo Xu. Text classification improved by integrating bidirectional lstm with two-dimensional max pooling. 2016.
-  Jason P. C. Chiu and Eric Nichols. Named entity recognition with bidirectional LSTM-CNNs. arXiv preprint arXiv:1511.08308, 2015.
-  Zhiheng Huang, Wei Xu, and Kai Yu. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991, 2015.
-  P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” arXiv preprint arXiv:1607.04606, 2016.
-  Pyvi library, link: https://pypi.org/project/pyvi.
-  Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013.
-  Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, and Tomas Mikolov. 2018. Learning word vectors for 157 languages. LREC.
-  K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using rnn encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.