SmartBullets: A Cloud-Assisted Bullet Screen Filter based on Deep Learning

05/15/2019 ∙ by Haoran Niu, et al. ∙ The University of Tennessee, Knoxville

Bullet-screen is a technique that enables website users to send real-time comments, called `bullets', across the screen. Compared with traditional video reviews, bullet-screen adds new ways of expressing feelings while watching a video and enables more interactions between viewers. However, since all the comments from the viewers are shown on the screen publicly and simultaneously, low-quality bullets reduce the watching enjoyment of the users. Although bullet-screen video websites provide filter functions based on regular expressions, bad bullets can still easily pass the filter after a small modification. In this paper, we present SmartBullets, a user-centered bullet-screen filter based on deep learning techniques. A convolutional neural network is trained as the classifier to determine whether a bullet needs to be removed according to its quality. Moreover, to increase the scalability of the filter, we employ a cloud-assisted framework by developing a backend cloud server and a front-end browser extension. An evaluation with 40 volunteers shows that SmartBullets can effectively remove low-quality bullets and improve the overall watching experience of viewers.




I Introduction

Bullet-screen, also known as danmaku or DanMu, allows the viewer to send real-time comments, called bullets, that publicly fly across the screen when watching a video. As shown in Fig. 1, the bullets will overlay the video screen directly and will be publicly viewable to all users who watch the video.

In recent years, danmaku-enabled videos have rapidly become popular, especially in East Asian countries such as China and Japan. Bilibili, one of the best-known Chinese bullet-screen video websites, ranked 48th among all websites worldwide in April 2019, according to Alexa [1], a well-known web traffic analysis company. Unlike traditional video comments and reviews, which are static and only allow users to remark on the whole video, bullet-screen systems enable a user to create and view comments on a specific scene in a video. This gives users a more direct way to interact with each other and creates a real-time emotional sharing experience. In the rest of the paper, we use bullet-screen and danmaku interchangeably for simplicity.

Fig. 1: Danmaku Example from (Comment from top to bottom: User A: This is the most popular one right now; User B: Looks at the official site; User C: I feel there are others; User D: Smirk)

Generally, a video’s bullets are publicly viewable to all the video watchers in a danmaku system. Explanatory and humorous comments strike a chord with the viewers and further enhance the interactions between them. On the other hand, there is a risk that low-quality bullets, such as rude and aggressive comments, cause discomfort to users and reduce the overall watching enjoyment. To address this, well-known bullet-screen video websites like Bilibili and Tencent Video provide basic bullet filter functions that remove bullets according to the user’s settings. Undesired bullets are removed according to their position on the screen, font, size, and a keyword blacklist. However, due to the diversity and flexibility of natural language, regular-expression-based bullet filters may have a high false negative rate and can be easily bypassed by making small modifications to the original comment. Therefore, there is a need for a more reliable bullet filter that can intelligently sieve bullet comments according to their content, in order to reduce the website maintenance cost and provide a friendlier comment sharing environment to the community.

Fig. 2: SmartBullets Framework

Natural language processing (NLP) is a sub-field of artificial intelligence that allows computers to automatically analyze, understand, and represent human language [2]. In recent years, deep learning technologies, such as convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory (LSTM), have produced state-of-the-art results in the natural language processing field [3]. Deep learning-assisted NLP models have been widely employed in many practical applications, such as POS tagging [4], sentiment classification [5], and machine translation [6]. For example, [7] uses NLP to perform sentiment analysis of online product review data on Amazon, [8] trains a bidirectional RNN to synthesize control programs from natural language input, and [9] proposes CharSCNN, a deep convolutional neural network for sentiment analysis of short texts such as Twitter messages. There are also studies that use deep learning to extract information about videos from bullet-screen comments [10] [11] [12] [13] [14]. However, to the best of our knowledge, there is no previous study that uses deep learning based natural language processing technologies to study the quality of danmaku.

In this paper, we propose SmartBullets, a cloud-assisted bullet filter framework based on deep learning technologies. SmartBullets consists of a bullet filter that runs on the cloud server and a script program that is embedded into the user’s web browser. We use a trained CNN model to categorize bullet comments into two classes according to their quality. Users can simply enable the filter function through the extension’s interface so that low-quality bullets are filtered out. To demonstrate the effectiveness of our framework, we design and implement SmartDanmu, a public Google Chrome extension that helps the user remove undesired bullets on the Bilibili website. We claim that our framework can also be applied to other danmaku-enabled video websites with the necessary modifications according to the websites’ APIs. Our main contributions can be summarized as follows:

  • We review and analyze the research related to bullet-screen comment processing and applications, and identify a need for more intelligent bullet filtering functions.

  • Based on state-of-the-art natural language processing techniques, we design and implement a cloud-assisted bullet-screen filtering framework, including a CNN based bullet quality classifier that runs on the cloud server and SmartDanmu, a public and convenient front-end Google Chrome browser extension that enhances scalability.

  • We evaluate our prototype with 40 volunteers. The summary of the survey shows that our bullet filter can effectively remove low-quality bullets and enhance the users’ overall enjoyment while watching danmaku videos. We also open-source our code to encourage research in the danmaku community.

The rest of the paper is organized as follows. Section II introduces the related research work on danmaku. Section III presents the complete design of SmartBullets, including the overall framework, the CNN model for bullet-screen comment classification, and the design of the SmartDanmu Chrome browser extension. The detailed description of the framework implementation is presented in Section IV. After that, Section V presents the evaluation results. Future work is discussed in Section VI. Finally, Section VII summarizes the paper.

II Related Work

Inspired by the rapidly growing social media impact of bullet-screen videos, more and more studies related to danmaku have been proposed. [15] analyzed the distribution of bullet comments over natural time and discovered the burst patterns of danmaku systems. [10] designed a new application that extracts time-sync tags for video shots by automatically exploiting the bullet comments of the video. After that, Lv et al. proposed T-DSSM, a temporal deep structured semantic model that represents bullet-screen comments as semantic vectors [11]. T-DSSM is further used to label highlight shots in videos. Chen et al. took advantage of the real-time property of bullets and proposed a personalized keyframe recommendation system [12]. In 2016, He et al. used danmaku to predict the popularity of videos [13]. In addition, Chen et al. employed a deep learning model trained on a bullet-screen comment dataset to predict the attractiveness of fine-grained videos [14]. Other recent research related to danmaku can be found in [16] [17] [18] [19].

III Cloud Assisted Bullet Screen Filter Design

III-A Framework Overview

Our framework mainly comprises two parts, namely a deep learning based bullet classification model that runs on the cloud server and a front-end script program that is embedded into the user’s browser. As illustrated in Fig. 2, the overall workflow of SmartBullets can be presented in four steps:

  • Step 1: When a user visits the webpage of a danmaku-enabled video, the browser initiates an HTTP request to the video website server.

  • Step 2: When the website server receives the request for a specific video, it returns to the client all the files that the browser needs, including general webpage files, such as HTML, JavaScript, and CSS, and a danmaku file that contains all the bullet information of the video, usually in JSON or XML format.

  • Step 3: If the user has enabled our script extension in the browser, the browser forwards the processed danmaku file to the cloud server after receiving the response from the video website.

  • Step 4: After the cloud server receives the danmaku file from the user, it feeds the danmaku data to the bullet filter as input after the necessary pre-processing. The bullet filter determines whether each bullet is of low quality. The cloud server then returns to the client a list of the indexes of the low-quality bullets that should be removed from the danmaku file according to the prediction result of the filter.

After receiving the indexes from the server, the script program removes the indexed bullets and forwards the remaining ones to the danmaku display function. Finally, the user is able to watch the video with only the remaining high-quality bullets.
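The index-based exchange in Step 4 and the final removal step can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the `content` field, and the keyword-based `toy_filter` that stands in for the trained classifier are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def low_quality_indexes(comments: List[str],
                        is_low_quality: Callable[[str], bool]) -> List[int]:
    """Server side (Step 4): positions of the bullets the filter rejects."""
    return [i for i, text in enumerate(comments) if is_low_quality(text)]


def keep_high_quality(bullets: List[Dict], removed: List[int]) -> List[Dict]:
    """Client side: drop the bullets whose indexes the server returned."""
    removed_set = set(removed)
    return [b for i, b in enumerate(bullets) if i not in removed_set]


def toy_filter(text: str) -> bool:
    """Hypothetical stand-in for the CNN classifier (keyword match only)."""
    return "rude" in text.lower()


bullets = [{"content": "Great scene!"},
           {"content": "You are RUDE"},
           {"content": "lol"}]
indexes = low_quality_indexes([b["content"] for b in bullets], toy_filter)
cleaned = keep_high_quality(bullets, indexes)
print(indexes, cleaned)
```

Returning only indexes keeps the response small, since the client already holds the full danmaku file.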

III-B Deep Learning Based Bullet Filter

III-B1 Raw Dataset

Thanks to the blossoming of NLP research, there are many public benchmark datasets for sentiment analysis, such as the movie review dataset by Stanford University [20], the tweets dataset on Kaggle [21], and datasets for Chinese natural language processing [22]. However, to the best of our knowledge, there is no public, well-organized dataset related to danmaku quality. The authors of [11] and [10] collected their own datasets for analysis. However, since both [11] and [10] use danmaku data for video tagging and recommendation, their datasets are not appropriate for bullet filtering.

Fortunately, popular danmaku video websites, such as Bilibili and Tencent Video, provide web APIs for developers and users to gather danmaku data. We employ a web crawler to gather danmaku data from Tencent Video. We choose Tencent Video because its API provides the upcount number of each bullet, which is an important reference for determining bullet quality in our framework. The web crawler ran for around 30 minutes and gathered 100 thousand bullets from around 120 videos, the vast majority of which are in Chinese. The original danmaku data is in JSON format, and the related attributes of each bullet are shown in Table I.

Attribute | Meaning
CommentID | the unique identity of a bullet
Content | the text content of a bullet
Upcount | the up count number that a bullet earns
IsFriend | the number of user’s friends’ upcounts
IsOp | the number of user’s opponents’ upcounts

TABLE I: Related Attributes of Bullet Data

We empirically calculate the overall score of each bullet using equation (1). The score is used as a reference to label the records in the next steps.


We clean the raw dataset by removing all type errors and unrelated attributes. Finally, we format the raw dataset so that each record is a (content, score) pair.

III-B2 Pre-processing

The raw dataset contains too much noise, which would degrade the deep learning model. Therefore, basic pre-processing is needed before feeding the data to the model. In summary, our pre-processing of the raw dataset includes three steps, as shown below.

  • Tokenization: Since a Chinese sentence is written without spaces between words, it needs to be split into word segments before the word embedding process. In this step, bullets from the raw dataset are split into word lists of various lengths.

  • Stopwords: Stopwords are the most common words in a language and in general contribute little to the meaning of a sentence, such as ‘what’. In this step, the stopwords in each bullet string are removed according to a stopwords dictionary.

  • Aggregation: One special property of bullet-screen comments is that the comment data generally contains a number of repeated bullets. In our dataset, two bullets with the same content may have different upcount numbers due to their posting times, which disturbs the overall distribution of the dataset. In this step, we aggregate all repeated bullets by summing up the corresponding scores.

After these three steps, each record of the raw dataset becomes a unique bullet word list with an overall score, and around 30 thousand records are left in the dataset. We manually select 11541 low-quality records that we believe are offensive and rude, and we label these records as negative. We randomly pick 11541 records from the rest and label them as positive. This completes the dataset for model training.
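The three pre-processing steps can be sketched as a single function. This is a minimal illustration under stated assumptions: the tokenizer is passed in as a parameter (a Chinese segmenter such as Jieba's `lcut` could be supplied, while the example uses a whitespace split on English text), the stopword set is a toy one, and duplicates are detected after stopword removal, which is one possible reading of the aggregation step.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple


def preprocess(records: Iterable[Tuple[str, float]],
               tokenize: Callable[[str], List[str]],
               stopwords: Set[str]) -> Dict[Tuple[str, ...], float]:
    """Tokenize, drop stopwords, and sum the scores of repeated bullets."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for content, score in records:
        # Steps 1 and 2: split into words, then remove stopwords and blanks.
        words = tuple(w for w in tokenize(content)
                      if w.strip() and w not in stopwords)
        if words:  # bullets that consist only of stopwords are dropped
            # Step 3: identical word lists are merged by summing scores.
            totals[words] += score
    return dict(totals)


# Toy data: (content, score) pairs; the scores here are made up.
raw = [("what a great scene", 3.0), ("a great scene", 2.0), ("what", 1.0)]
stop = {"what", "a"}
result = preprocess(raw, str.split, stop)
print(result)
```

Using the word tuple as the dictionary key makes each remaining record unique, matching the description of one word list per record with an overall score.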

III-B3 Model Structure

Fig. 3: CNN Model Architecture for Sentence Classification from [5].

In 2014, Yoon Kim proposed a CNN model trained on top of pre-trained word vectors for sentence-level classification tasks, which achieved remarkable experimental results [5]. As shown in Fig. 3, the first part of the model is a hidden word embedding layer that represents sentences, followed by a convolutional layer with multiple filter widths and feature maps. After that, a max pooling layer is employed. Finally, a fully connected layer with dropout is used, followed by the softmax output.

We employ the model architecture described in [5] in our framework, with the necessary modifications for Chinese word processing, to train the bullet classifier.
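To make the layer order concrete, the forward pass of such a model can be sketched in plain Python. This is an untrained toy with random weights and hypothetical sizes, not the trained classifier: biases are omitted, dropout is left out because it only applies during training, and the class name `TinyTextCNN` is an assumption made for the example.

```python
import math
import random
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class TinyTextCNN:
    """Embedding -> convolutions of several widths -> max pooling -> softmax."""

    def __init__(self, vocab, embed_dim=4, widths=(2, 3),
                 filters_per_width=2, classes=2, seed=0):
        rng = random.Random(seed)

        def rand(n):
            return [rng.uniform(-0.5, 0.5) for _ in range(n)]

        self.widths = tuple(widths)
        self.embed = {w: rand(embed_dim) for w in vocab}
        self.unknown = [0.0] * embed_dim  # also used as padding
        # One flattened kernel of size width * embed_dim per filter.
        self.kernels = {h: [rand(h * embed_dim)
                            for _ in range(filters_per_width)]
                        for h in self.widths}
        self.dense = [rand(len(self.widths) * filters_per_width)
                      for _ in range(classes)]

    def forward(self, words: Sequence[str]) -> List[float]:
        vectors = [self.embed.get(w, self.unknown) for w in words]
        while len(vectors) < max(self.widths):  # pad short sentences
            vectors.append(self.unknown)
        features = []
        for h in self.widths:
            # Each window concatenates h consecutive word vectors.
            windows = [sum(vectors[i:i + h], [])
                       for i in range(len(vectors) - h + 1)]
            for kernel in self.kernels[h]:
                # Convolution with ReLU, then max-over-time pooling.
                features.append(max(max(0.0, dot(kernel, win))
                                    for win in windows))
        return softmax([dot(row, features) for row in self.dense])


model = TinyTextCNN(vocab=["great", "scene", "rude"])
probs = model.forward(["great", "scene"])
print(probs)  # two class probabilities from untrained weights
```

Max-over-time pooling is what lets bullets of different lengths map to a feature vector of fixed size before the fully connected layer.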

III-C Front-end Chrome Extension

III-C1 Bullet Screen Implementation Technique

Generally, a danmaku video website, such as Bilibili or Tencent Video, has an independent server that maintains the database of danmaku files and responds to file requests from users. All the bullet comments of each video are stored in a specific format, such as JSON or XML, and include attributes for danmaku content and display. In our project, when a user visits a danmaku video on Bilibili, the user’s browser acquires the webpage of the video, which contains a specific query index of the danmaku file, called ‘cid’. The browser queries the danmaku server with the ‘cid’ and then obtains the danmaku file of the video. The danmaku file is fed into a JavaScript function that displays the danmaku comments as an overlay on the video.

III-C2 Design of SmartDanmu Extension

In order to remove low-quality bullet comments, the SmartDanmu extension is expected to acquire the original danmaku file of the video, upload the related danmaku information to the cloud server, receive the feedback from the server, clean the danmaku file according to the feedback, and input the cleaned danmaku file to the danmaku display function.

Since we employ a centralized cloud server to process the danmaku filter requests from different clients, it is necessary to reduce the overall computation and communication overhead of the server. In our design, the SmartDanmu extension summarizes the danmaku file and only sends a list of ordered danmaku comments to the server. After classifying the bullets, the server responds to the extension with a list of binary elements (0, 1) that follows the order of the comment list, where 1 represents positive, and 0 means the corresponding comment is negative and should not be displayed. Finally, the extension removes the negative bullets and sends the cleaned file to the display function.
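The summarize-and-mask exchange can be sketched as follows, assuming that 1 marks a bullet to keep and 0 marks one to remove. The record fields (`content`, `time`, `color`) and the hard-coded server reply are hypothetical, chosen only to show that the display attributes never leave the client.

```python
import json
from typing import Dict, List


def summarize(danmaku: List[Dict]) -> str:
    """Client: keep only the ordered comment texts and encode them as JSON."""
    return json.dumps([b["content"] for b in danmaku], ensure_ascii=False)


def apply_mask(danmaku: List[Dict], mask: List[int]) -> List[Dict]:
    """Client: keep the bullets whose mask entry is 1, in the original order."""
    if len(mask) != len(danmaku):
        raise ValueError("mask length must match the number of bullets")
    return [b for b, keep in zip(danmaku, mask) if keep == 1]


danmaku = [
    {"content": "nice shot", "time": 1.5, "color": 16777215},
    {"content": "idiot", "time": 2.0, "color": 16777215},
    {"content": "hahaha", "time": 2.2, "color": 255},
]
payload = summarize(danmaku)   # what the extension would upload
mask = [1, 0, 1]               # hypothetical reply from the server
cleaned = apply_mask(danmaku, mask)
print(len(payload), len(json.dumps(danmaku)), cleaned)
```

Because the mask follows the order of the uploaded list, neither side needs to exchange comment identifiers.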

IV Implementation

We implement a prototype of our framework in a laboratory environment, including the CNN-based bullet classifier, a simple cloud web server that handles the requests from users, and the front-end Google Chrome extension that communicates with the cloud server and processes the danmaku file. Our source code is available on GitHub [23].

IV-A Backend

The bullet classifier and the web server are implemented in Python 2.7. Before training the CNN model, we use Jieba [24], a popular Chinese word tokenization library, to segment the Chinese sentences into words. Jieba is also able to segment English sentences and Chinese sentences that contain English words. [25] is an open source project on GitHub that provides a TensorFlow-based implementation of the CNN model in [5]. We apply the code in [25] to train the bullet classifier, with the necessary modifications in pre-processing and vocabulary usage. Our model is trained on a server with a 2.40 GHz Intel Xeon CPU and an external NVIDIA GeForce GTX 1080 Ti GPU. We randomly pick 20% of the records from the dataset as the testing set and use the rest as the training set. We use an Adam optimizer with the learning rate set to 0.001. After manual hyperparameter tuning, the training process takes around 30 minutes and achieves over 93% accuracy and almost 94% recall in around 3000 training steps, as shown in Fig. 4.
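The 80/20 split and the two reported metrics can be sketched in plain Python. This is only an illustration of the definitions: the helper names and the fixed seed are assumptions, and since the text does not state which class is treated as the target for recall, the `positive` argument is left as a parameter.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(records: Sequence, test_ratio: float = 0.2,
                     seed: int = 0) -> Tuple[List, List]:
    """Shuffle the records and hold out `test_ratio` of them for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy_and_recall(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float]:
    """Accuracy over all records, and recall for the `positive` class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return correct / len(y_true), recall


train, test = train_test_split(range(100))
metrics = accuracy_and_recall([1, 1, 0, 0], [1, 0, 0, 0])
print(len(train), len(test), metrics)
```

With a balanced dataset such as the one described above, accuracy and recall are both informative, whereas on skewed data recall for the class of interest would matter more.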

Fig. 4: Accuracy and Loss of Model Training

We implement a small Python-based web server that listens on a specific TCP port for bullet filter requests and uses HTTP as the application layer communication protocol between the server and the client. In our experiment, the bullet classifier and the web server run on a private server in our laboratory with a static IP address. Our prototype can handle up to 200 filter requests simultaneously.
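A server of this kind can be sketched with the Python standard library alone. The sketch below is written for Python 3 rather than the Python 2.7 of the prototype, and everything in it is an assumption made for illustration: the JSON request and response shapes, the handler name, and the keyword rule in `classify` that stands in for the CNN (0 = remove, 1 = keep).

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def classify(comments):
    """Hypothetical stand-in for the CNN: 0 = remove, 1 = keep."""
    return [0 if "idiot" in c.lower() else 1 for c in comments]


class FilterHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            comments = json.loads(self.rfile.read(length).decode("utf-8"))
            body = json.dumps(classify(comments)).encode("utf-8")
            status = 200
        except (ValueError, AttributeError, TypeError):
            body = b'{"error": "expected a JSON list of strings"}'
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def request_labels(host, port, comments):
    """Client helper: POST the comment list and return the label list."""
    conn = HTTPConnection(host, port, timeout=5)
    try:
        conn.request("POST", "/", body=json.dumps(comments).encode("utf-8"),
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read().decode("utf-8"))
    finally:
        conn.close()


# Demo: port 0 lets the operating system pick a free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), FilterHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
labels = request_labels("127.0.0.1", server.server_address[1],
                        ["nice shot", "you idiot"])
server.shutdown()
server.server_close()
print(labels)
```

`ThreadingHTTPServer` handles each request in its own thread, which is one simple way to serve several filter requests at the same time.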

IV-B Front-end

We implement the front-end Google Chrome extension SmartDanmu, which helps the user remove low-quality bullets on the Bilibili video website. We note that the extension can be extended to other danmaku-enabled websites with the necessary modifications.

In general, a Chrome extension contains four files, namely a JSON file used for configuration, an HTML file for display, a JavaScript file for functions, and an icon. We first modify the JSON file to add the permission to visit the cloud server. To enable the filter function, we use Ajax Hook [26] to intercept the original Ajax requests of the video webpage and use JavaScript XMLHttpRequest to send a JSON file that contains the bullet comments to the web server over HTTP. Our implementation borrows the necessary danmaku-related functions from Pakku [27], an open-source project that merges repeated bullets on Bilibili. The user interface of the SmartDanmu extension is illustrated in Fig. 5.

Fig. 5: SmartDanmu Chrome Extension User Interface

V Evaluation

We evaluate the prototype of our framework with 40 volunteers, most of whom are college students who are familiar with danmaku controls and visit the Bilibili video website frequently. Each volunteer is asked to watch several top trending videos on Bilibili using the Google Chrome browser with and without the SmartDanmu extension. After that, each volunteer fills out a survey of five statements on the overall user experience, as shown below.

  • Legibility: The instruction of SmartDanmu is clear and easy to read.

  • Fluency: The video webpage is loaded fluently after enabling SmartDanmu.

  • Effectiveness: Most of the low-quality (rude or aggressive) bullets are successfully removed after enabling SmartDanmu.

  • User Interface: The user interface design of SmartDanmu is clear and friendly.

  • Experience: SmartDanmu improves the overall watching experience of danmaku-enabled videos on Bilibili.

For each statement, each volunteer selects one of three choices, ‘Agree’, ‘Partially Agree’, or ‘Disagree’, to express his/her opinion on that statement. The survey result is shown in Table II.

Statement | Agree | Partially Agree | Disagree
Legibility | 37 | 0 | 3
Fluency | 28 | 8 | 4
Effectiveness | 30 | 6 | 4
UI | 35 | 4 | 1
Experience | 31 | 7 | 2

TABLE II: Survey Results on User Experience

From Table II, we can see that 28 out of 40 volunteers report that the webpage loading is still fluent when the SmartDanmu extension is enabled, which means the latency caused by communicating with the cloud server is tolerable and even negligible for these viewers. The remaining 12 volunteers report that they can notice some delay in the video page loading. We note that a server with more computational and communication resources would reduce this delay. Meanwhile, around 75% of the participants think SmartDanmu can effectively remove low-quality bullets and improve the overall watching experience compared with the original danmaku videos.
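The shares quoted above follow directly from the counts in Table II, as the short calculation below shows. The dictionary simply restates the table, and the helper name `agree_share` is introduced only for this example.

```python
# Counts from Table II: (Agree, Partially Agree, Disagree) per statement.
survey = {
    "Legibility": (37, 0, 3),
    "Fluency": (28, 8, 4),
    "Effectiveness": (30, 6, 4),
    "UI": (35, 4, 1),
    "Experience": (31, 7, 2),
}


def agree_share(counts):
    """Percentage of volunteers who chose 'Agree' for one statement."""
    agree, partial, disagree = counts
    return 100.0 * agree / (agree + partial + disagree)


shares = {name: agree_share(counts) for name, counts in survey.items()}
not_fully_fluent = survey["Fluency"][1] + survey["Fluency"][2]
for name, share in shares.items():
    print(f"{name}: {share:.1f}% agree")
print("Volunteers who did not fully agree on fluency:", not_fully_fluent)
```

Effectiveness and Experience come out at 75.0% and 77.5% full agreement, which matches the "around 75%" figure in the text.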

VI Future Work

Our framework is designed to improve the watching experience of users on bullet-screen-enabled videos by filtering out low-quality bullets. However, some users may prefer watching high-quality bullets only. Selecting high-quality bullets is more difficult, since they may follow a complex and dynamic distribution over different videos and time, which requires a much larger dataset and periodic retraining of the model. Therefore, we suggest high-quality bullet filtering as future work.

VII Conclusion

In this paper, we propose a cloud-assisted, user-centered bullet screen filter framework to remove low-quality bullets on bullet-screen-enabled video websites. We train a convolutional neural network as the bullet filter that runs on the cloud. We also design a front-end browser extension that communicates with the cloud server and processes the danmaku files. We design and implement a prototype in our laboratory and evaluate our framework with 40 volunteers. The survey result shows that our framework is able to effectively remove low-quality bullet screen comments and improve the overall watching experience of users.


  • [1] “Alexa, An company.”
  • [2] J. Hirschberg and C. D. Manning, “Advances in natural language processing,” Science, vol. 349, no. 6245, pp. 261–266, 2015.
  • [3] T. Young, D. Hazarika, S. Poria, and E. Cambria, “Recent trends in deep learning based natural language processing [review article],” IEEE Computational Intelligence Magazine, vol. 13, pp. 55–75, Aug 2018.
  • [4] Z. Huang, W. Xu, and K. Yu, “Bidirectional lstm-crf models for sequence tagging,” arXiv preprint arXiv:1508.01991, 2015.
  • [5] Y. Kim, “Convolutional neural networks for sentence classification,” arXiv preprint arXiv:1408.5882, 2014.
  • [6] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin, “Convolutional sequence to sequence learning,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1243–1252, JMLR.org, 2017.
  • [7] X. Fang and J. Zhan, “Sentiment analysis using product review data,” Journal of Big Data, vol. 2, no. 1, p. 5, 2015.
  • [8] C. Liu, X. Chen, R. Shin, M. Chen, and D. Song, “Latent attention for if-then program synthesis,” in Advances in Neural Information Processing Systems, pp. 4574–4582, 2016.
  • [9] C. Dos Santos and M. Gatti, “Deep convolutional neural networks for sentiment analysis of short texts,” in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 69–78, 2014.
  • [10] B. Wu, E. Zhong, B. Tan, A. Horner, and Q. Yang, “Crowdsourced time-sync video tagging using temporal and personalized topic modeling,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 721–730, ACM, 2014.
  • [11] G. Lv, T. Xu, E. Chen, Q. Liu, and Y. Zheng, “Reading the videos: Temporal labeling for crowdsourced time-sync videos based on semantic embedding,” in Thirtieth AAAI Conference on Artificial Intelligence, 2016.
  • [12] X. Chen, Y. Zhang, Q. Ai, H. Xu, J. Yan, and Z. Qin, “Personalized key frame recommendation,” in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 315–324, ACM, 2017.
  • [13] M. He, Y. Ge, L. Wu, E. Chen, and C. Tan, “Predicting the popularity of danmu-enabled videos: A multi-factor view,” in International Conference on Database Systems for Advanced Applications, pp. 351–366, Springer, 2016.
  • [14] X. Chen, J. Chen, L. Ma, J. Yao, W. Liu, J. Luo, and T. Zhang, “Fine-grained video attractiveness prediction using multimodal deep learning on a large real-world dataset,” in Companion Proceedings of the The Web Conference 2018, WWW ’18, (Republic and Canton of Geneva, Switzerland), pp. 671–678, International World Wide Web Conferences Steering Committee, 2018.
  • [15] M. He, Y. Ge, E. Chen, Q. Liu, and X. Wang, “Exploring the emerging type of comment for online videos: Danmu,” ACM Transactions on the Web (TWEB), vol. 12, no. 1, p. 1, 2018.
  • [16] J. Li, Z. Liao, C. Zhang, and J. Wang, “Event detection on online videos using crowdsourced time-sync comment,” in 2016 7th International Conference on Cloud Computing and Big Data (CCBD), pp. 52–57, Nov 2016.
  • [17] H. Sakaji, M. Kohana, A. Kobayashi, and H. Sakai, “Estimation of tags via comments on nico nico douga,” in 2016 19th International Conference on Network-Based Information Systems (NBiS), pp. 550–553, IEEE, 2016.
  • [18] A. Ikeda, A. Kobayashi, H. Sakaji, and S. Masuyama, “Classification of comments on nico nico douga for annotation based on referred contents,” in 2015 18th International Conference on Network-Based Information Systems, pp. 673–678, IEEE, 2015.
  • [19] Q. Ping and C. Chen, “Video highlights detection and summarization with lag-calibration based on concept-emotion mapping of crowd-sourced time-sync comments,” arXiv preprint arXiv:1708.02210, 2017.
  • [20] “Large movie review dataset.” Accessed: 2019-03-30.
  • [21] “Sentiment140 dataset with 1.6 million tweets.” Accessed: 2019-03-30.
  • [22] “Awesome-chinese-nlp.” Accessed: 2019-03-30.
  • [23] Y. Z. Haoran Niu, Jiangnan Li, “SmartBullets for BiliBili.”, 2019.
  • [24] S. Junyi, “Jieba Chinese Word Segmentation.”, 2018.
  • [25] D. Britz, “Convolutional Neural Network for Text Classification in Tensorflow.”, 2018.
  • [26] wendux, “Ajax-hook.”, 2016.
  • [27] xmcp, “Danmaku Filter for BiliBili.”, 2019.