1. Introduction
Recently, watching online news and entertainment videos has become a mainstream leisure activity. The boom in online video-sharing websites raises significant challenges for the effective management and retrieval of videos. To address these challenges, many automatic video tagging techniques based on text retrieval have been proposed (Siersdorfer et al., 2009; Raamkumar et al., 2017; Ramaboa and Fish, 2018). However, these techniques can only provide video-level tags (Wu et al., 2014). Even if the generated tags perfectly summarize the video content, users have no idea how the tags relate to the video playback time. If videos are associated with time-sync tags, users can preview the content with both thumbnails and text along the timeline, and this textual information can further enhance users’ search experience. Although many video content analysis algorithms can generate video tags with timestamps (Hussein and Piccardi, 2017; Chen et al., 2017a), their time complexities are too high for large-scale video retrieval. Fortunately, a new type of review data, i.e., time-sync comments (TSCs), has appeared on video websites such as Youku (www.youku.com), AcFun (www.acfun.tv) and BiliBili (www.bilibili.com) in China, and NicoNico (www.nicovideo.jp) in Japan.
In this paper, we focus on efficiently extracting time-sync video tags from TSCs, which can enhance users’ search experience. When watching a video, many people are willing to share their feelings and exchange ideas with others. TSCs are a new form of real-time, interactive, crowdsourced comments (Wang et al., 2017, 2016a; Gu et al., 2017; Hyung et al., 2017). TSCs are displayed as streams of moving subtitles overlaid on the video screen, and convey information about the content of the current video frame, the feelings of users, or replies to other TSCs. On TSC-enabled online video platforms, users can synchronize their comments to a video’s playback time. That is, once a user posts a TSC, it is synchronized to the associated video time and immediately displayed on the video. All viewers of the video (including the writer) can see the TSC when they watch around the associated video time. Moreover, each TSC has a timestamp that records the video time at which it was posted. Therefore, compared with traditional video reviews, TSCs make it much easier to obtain local tags with timestamps rather than video-level tags. Moreover, TSCs are more personalized than traditional reviews, so the tags generated from TSCs can better reflect the users’ perspective. Users can thereby get high-quality retrieval results when they search for videos with these tags (Wu et al., 2014).
Recently, some methods have been proposed to generate temporal tags or labels based on TSCs. Wu et al. (Wu et al., 2014) combine statistics and a topic model to build Temporal and Personalized Topic Modeling (TPTM) for generating temporal tags. However, their approach is based on the Latent Dirichlet Allocation (LDA) model (Blei et al., 2003), which performs poorly on short and noisy text such as TSCs (Yan et al., 2013). Lv et al. (Lv et al., 2016) propose a Temporal Deep Structured Semantic Model (TDSSM) to generate video labels in a supervised way. However, their approach does not consider the semantic association between TSCs, so some noise that is independent of the video content cannot be handled. In summary, TSCs have features that distinguish them from common comments (Yang et al., 2017; Liao et al., 2018) and make the above methods less effective on TSCs: (1) Semantic relevance. TSCs contain abundant semantic information that describes both local and global video content, depending on the selected time interval of the timestamps. (2) Real-time. TSCs are synchronous with the real-time content of the video. Users may raise different topics about the same video content. (3) Herding effects. Herding effects are common in TSCs (He et al., 2016; Yu et al., 2015). That is, later TSCs may depend on earlier ones and have a semantic association with them. (4) Noise. TSCs include comments unrelated to the video content as well as internet slang, which complicates tag extraction. Because of these features, the major challenges in extracting video tags from TSCs are how to deal with the herding effects, how to distinguish the importance of each TSC, and consequently how to identify high-impact TSCs and noise.
To make full use of the features of TSCs and tackle the above challenges, we propose a graph-based algorithm named Semantic Weight-Inverse Document Frequency (SW-IDF) to generate time-sync video tags automatically. More precisely, we reduce the impact of noise by clustering semantically similar and time-related TSCs, and identify high-impact TSCs by their semantic relationships. Intuitively, TSCs that include video tags usually belong to hot topics and influence the trend of their follow-up TSCs. In contrast, noise usually neither has similar semantic relationships with other TSCs over a period nor influences other TSCs (Yang et al., 2017). Moreover, we find that the density of TSCs (number of TSCs per unit time) affects how users communicate. When the density is low (the TSCs in a period are sparse), the user can more clearly distinguish the content of each nearby TSC, and is therefore more likely to reply to a specific TSC when posting a new one. Conversely, when the density is high (the TSCs in a period are dense), the user cannot clearly distinguish the content of each TSC, but can only roughly distinguish the topic of these TSCs. Therefore, the user is more likely to reply to the entire topic instead of a specific TSC. Specifically, in the SW-IDF algorithm, we first treat the TSCs as vertices and generate the semantic association graph (SAG) based on the semantic similarities and timestamps of TSCs. Then, we cluster the TSCs into different topics. For videos with low-density TSCs, we propose a dialogue-based clustering algorithm, which is inspired by community detection theory (Huang et al., 2017; Fortunato, 2010; Lancichinetti and Fortunato, 2009). For videos with high-density TSCs, we propose a topic center-based clustering algorithm, which is a novel hierarchical agglomerative clustering method (Pandove et al., 2018; Murtagh and Legendre, 2014; Murtagh and Contreras, 2012).
These two clustering algorithms can identify the topic of each TSC and distinguish the popularity of each topic in either case. In the clustered subgraph, the in-degree of each TSC reflects the TSCs that affect it, while the out-degree reflects the TSCs it affects. Therefore, we design a graph iteration algorithm that assigns a weight to each TSC according to its degrees, so that we can differentiate meaningful TSCs from noise. Moreover, similar to the TF-IDF algorithm, we obtain the weight of each word by combining Semantic Weight (SW) and Inverse Document Frequency (IDF), and the video tags are extracted automatically.
In particular, this paper is an extended version of (Yang et al., 2017). In this extended version, we first propose a novel topic center-based clustering algorithm, which is more suitable for high-density TSCs. Then, we provide a greedy optimization for the topic center-based algorithm and prove that this optimization does not delete any valid case. Finally, we add more experiments to verify the effectiveness of the algorithms. The main contributions of our paper are as follows:

We propose a novel graph-based Semantic Weight-Inverse Document Frequency (SW-IDF) algorithm, which can extract video tags in an unsupervised way by mining TSCs.

We design two graph clustering algorithms based on the density of the TSCs, i.e., a dialogue-based clustering algorithm and a topic center-based clustering algorithm, to cluster TSCs in the semantic association graph (SAG). These algorithms take the features of TSCs into account and effectively reduce the impact of noise.

We evaluate our proposed algorithms on real-world datasets from mainstream video-sharing websites and compare the results with classical keyword extraction methods. The results show that SW-IDF outperforms the baselines in both precision and recall of video tag extraction.
2. Related Work
In this section, we introduce the related work from four aspects.
2.1. Analysis of time-sync video comments
Time-sync comments (TSCs) provide a new source of information about videos and have received growing research interest. Wu et al. (Wu et al., 2014) first introduce TSCs and propose Temporal and Personalized Topic Modeling (TPTM) to generate temporal tags. However, their approach is based on the Latent Dirichlet Allocation (LDA) model (Blei et al., 2003), which performs poorly on short text such as TSCs (Yan et al., 2013). To describe the video more specifically, Xu and Zhang (Xu and Zhang, 2017) extract representative TSCs based on a temporal summarization model. Their method needs pre-extracted keywords from the TSCs, so our algorithm can improve its effectiveness. There are also other applications based on TSCs. Lv et al. (Lv et al., 2016) propose a Temporal Deep Structured Semantic Model (TDSSM) to represent comments as semantic vectors and recognize video highlights from these vectors in a supervised way. They are the first to analyze TSCs using a neural network. Then, Chen et al. (Chen et al., 2017b) propose neural network based collaborative filtering to recommend personalized keyframes from TSCs. However, the models of both (Lv et al., 2016) and (Chen et al., 2017b) rely on a large number of human-labeled video segments or predefined emotional tags for training, which limits their applicability to more general scenarios. In this paper, we design a novel graph-based algorithm according to the features of TSCs to extract keywords efficiently, accurately, and automatically in an unsupervised way.
2.2. Tag/keyword extraction
Keyword extraction is a classical problem in the field of information retrieval. At present, there are three main categories of unsupervised keyword extraction methods. The first is based on word frequency statistics, where TF-IDF is the most commonly used and well-known method. However, methods of this kind consider only the frequency of words and ignore the semantics, so they may generate keywords that are unrelated to the video content. The second kind depends on the co-occurrence of words, such as TextRank (Mihalcea and Tarau, 2004), a graph-based ranking model. Like the first kind, these methods do not consider semantics either, so they cannot handle noise well. The last kind is based on topic models, which bring document-topic and topic-word distributions together by simulating the document generation process. Blei et al. (Blei et al., 2003) propose the Latent Dirichlet Allocation (LDA) model, the most representative one. To better handle short text, Yan et al. (Yan et al., 2013) propose the Biterm Topic Model (BTM), which directly models the generation of word co-occurrence patterns (i.e., biterms) in the whole corpus. Yin and Wang (Yin and Wang, 2014, 2016) propose a Gibbs sampling algorithm for the Dirichlet multinomial mixture model for short text clustering and keyword extraction. Although topic model-based approaches consider semantics, their basic hypothesis is that words are generated independently and identically distributed. However, some TSCs are generated by herding effects, which violates this assumption. Compared with the methods above, our algorithms are designed to identify noise by analyzing the semantic relationships between TSCs.
2.3. Semantic similarity
Semantic similarity calculation is an essential issue in natural language processing and is widely used in text classification (Wang et al., 2016b), fuzzy retrieval (Alhabashneh et al., 2017), and so on. Generally, there are two main kinds of approaches to measuring the similarity of documents. One is based on the similarity of the words in sentences. Representatives of this approach are proposed by (Kenter and De Rijke, 2015) on unsupervised learning and (Socher et al., 2011; Kusner et al., 2015) on supervised learning. Considering that time-sync comments contain a mass of newly coined internet slang, it is difficult to obtain accurate results in this way. The other is based on sentence vectors. Topic models such as LDA and embedding models such as word2vec (Mikolov et al., 2013; Levy et al., 2015) are representatives of this kind of method. Since the embedding model offers a much denser feature representation, embedding-based similarity computation suits TSCs better than topic model-based methods. Kenter and De Rijke (Kenter and De Rijke, 2015) propose a supervised learning method based on external sources of semantic knowledge with word embedding, which considers the weight of the semantic feature. In this paper, we only consider the topics discussed by TSCs, and the word order does not change these topics. Therefore, the word order is not important, and the sentence2vec (Iacobacci et al., 2015; Levy and Goldberg, 2014b) and deep learning (He et al., 2015; Mueller and Thyagarajan, 2016) based methods are not used in this paper.
2.4. Graph clustering algorithm
Graph clustering algorithms have attracted much research interest in the past. Two main theories, i.e., community detection theory and hierarchical agglomerative clustering, inspired our work. Community detection theory was first proposed by (Newman and Girvan, 2004) to make natural divisions of network nodes into densely connected subgroups, which brought great inspiration to the graph clustering field. Recently, Ramezani et al. (Ramezani et al., 2018) exploit diffusion information and utilize conditional random fields to discover community structures. Li et al. (Li et al., 2018a) propose a novel local expansion via minimum one norm approach for finding overlapping communities, and provide a theoretical analysis of the local spectral properties. Chakraborty et al. (Chakraborty et al., 2016) find that the belongingness of nodes in a community is not uniform and design a new vertex-based metric to quantify the degree of belongingness within a community. To reduce the time complexity, Bae et al. (Bae et al., 2017) propose an algorithm to optimize the map equation, which makes each iteration take less time and the algorithm converge faster. These graph clustering algorithms based on community detection theory provide good inspiration for designing our dialogue-based clustering algorithm. Besides, hierarchical agglomerative clustering is also a method of graph clustering (Pandove et al., 2018; Murtagh and Legendre, 2014; Murtagh and Contreras, 2012). Recently, Pang et al. (Pang et al., 2015) propose a topic-restricted similarity diffusion process to efficiently identify real topics from a large number of candidates. Although their method clusters well, it has a high time complexity and is not suitable for large-scale data. Compared with the aforementioned hierarchical agglomerative clustering algorithms, our novel topic center-based clustering algorithm has a lower time complexity while maintaining accuracy.
3. Algorithms
In this section, we first introduce the construction of the Semantic Association Graph (SAG) for TSCs from their semantic similarity in Section 3.1. Then, we propose two graph clustering algorithms, i.e., a dialogue-based algorithm and a topic center-based algorithm, to cluster the TSCs into subgraphs of different topics in Section 3.2. Moreover, we propose an out-in degree iterative algorithm to obtain the weight of each TSC and extract keywords as video tags automatically by combining Semantic Weight (SW) and Inverse Document Frequency (IDF) in Section 3.3. Finally, we give the complexity analysis in Section 3.4.
The Notation list is shown in Table 1.
3.1. Preliminaries and Graph Construction
Directed graph  
Set of nodes  
Set of Edges  
Number of nodes in  
Number of edges in  
node in  
edge in  
Timestamp of node  
Topic set of node  
Number of nodes in set  
The first node of edge  
The second node of edge  
The weight of edge  
Attenuation coefficient  
Threshold of dialogue-based intra-cluster density  
Threshold of topic-based intra-cluster density  
The embedding vector of TSC  
Topic center vector of the set  
Start time of the set  
Center time of the set  
Universal set of topic sets  
A set that matches  
Max Affinity value of set  
A priority queue with set pairs  
A queue with sets to be updated  
Popularity of comment  
Total number of topics in SAG  
Influence matrix  
Influence value of comment i after k iterations  
Weight of comment 
In this section, we construct the semantic association graph and define the attributes in the graph.
Since TSCs appear in chronological order, they can only affect the upcoming TSCs rather than prior TSCs. We use a directed graph to describe the relationships between TSCs and construct the semantic association graph (SAG).
In SAG, the vertices (nodes) are TSCs and the edges reflect their semantic association in a topic. Let denote the directed graph, represented by , where and are the sets of nodes and edges. Specifically, , , where is the number of nodes in , and is the number of edges in . For each TSC , it has a timestamp , denoting the post time in the video, where . Since the TSCs are the short texts (Wu et al., 2014), in our algorithm, we assume that each TSC has one exact topic. For vertex , is used to describe the set that contain the vertices which have the same topic as and is used to express the number of vertices in set . We use the domain to describe the attributes of edges. For edge , and are two vertices that are linked by edge where . The weight of edge is described as . Besides, also describes the edge with vertices and where . Next we will provide the definition of edge weights.
As we mentioned in Section 2, an embedding based method word2vec (more details see Section 4.1) is selected to calculate the semantic similarity between each pair of TSCs. Since we only care about the topic of the TSC, the word order is not important. In this paper, we calculate the mean vector of each word in a TSC as the sentence vector. We set the dimension of each vector as . Therefore, the semantic similarity between TSC and is calculated by the cosine angle between vectors:
(1) 
Besides, the greater the timestamp interval between two TSCs, the less likely they are in the same topic. So we use the exponential function to express the decay of TSC associations:
(2) 
where the attenuation coefficient is a hyperparameter that controls the decay speed.
Combining the semantic similarity and the time decay, the weight of edge that link vertices and is defined by
(3) 
Empirically, two TSCs connected by a negative-weight edge are only weakly related semantically (the angle between their vectors in the semantic embedding space is greater than 90°), and negative edge weights are inconvenient to handle in graph algorithms. Therefore, when the weight of an edge is negative, we set it to zero and delete the edge.
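The construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `mean_vector`, `cosine`, `build_sag` and the parameter `gamma` (standing in for the attenuation coefficient) are hypothetical, the similarity and the decay are assumed to be combined by multiplication, and the TSCs are assumed to be already sorted by timestamp.

```python
import math

def mean_vector(tokens, word_vectors):
    """Sentence vector of one TSC: mean of its known word vectors (word order ignored)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None  # no known word, so no usable sentence vector
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_sag(vectors, timestamps, gamma=0.1):
    """Return directed edges (i, j, weight) from an earlier TSC i to a later TSC j.

    Assumption: weight = cosine similarity * exp(-gamma * time gap).
    Edges whose weight is not positive are dropped.
    """
    edges = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if vectors[i] is None or vectors[j] is None:
                continue
            decay = math.exp(-gamma * (timestamps[j] - timestamps[i]))
            weight = cosine(vectors[i], vectors[j]) * decay
            if weight > 0:
                edges.append((i, j, weight))
    return edges
```

Because edges only run from earlier to later TSCs, the resulting graph is acyclic, which matches the observation that a TSC can only affect upcoming TSCs.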
For a more intuitive description, an example of SAG construction is shown in Fig. 1 (a), which is based on a UEFA Champions League video. We select 10 TSCs as nodes and construct the SAG. User A posted TSC ①, “Great player Messi!”, when he saw the goal. Then user B responded with “Messi deserves MVP!” as TSC ③. User C posted “What is the BGM ?” as TSC ② to ask about the background music, which deviates from the video content. So TSC ② has little semantic association with the other TSCs, while TSC ① and TSC ③ are linked by a semantic edge.
3.2. Topic Partitioning
In this section, we partition the TSCs into topics according to the semantic relationships in the SAG. In our algorithm, TSCs that have similar semantics and similar timestamps should belong to the same topic. However, the density of TSCs (number of TSCs per unit time) affects how users communicate. Therefore, we propose a dialogue-based clustering algorithm in Section 3.2.1 for videos with sparse TSCs and a topic center-based clustering algorithm in Section 3.2.2 for videos with dense TSCs.
3.2.1. Dialogue-based Algorithm
First, we provide a dialogue-based algorithm. When the density of TSCs is low, the user can more clearly distinguish the content of each nearby TSC, and is therefore more likely to reply to a specific TSC when posting a new one. Therefore, we cluster the TSCs according to the semantic relationship between each pair. The main idea is that the mean weight of the edges within a topic is large while the mean weight of the edges that link different topics is small, which is consistent with community detection theory (Lancichinetti et al., 2008).
Specifically, in the beginning, each TSC belongs to a unique topic. We use a unique set that only contains itself to achieve the objective. That is, for , . Then edges in set are sorted by descending order of weight. The new edge set is obtained, where . We process each edge from to . For edge , and represent the set and . The set and should be merged if and only if TSCs in two sets discuss the similar topics. Therefore, we merge and if
(4) 
and
(5) 
Here the merge criterion uses the threshold of intra-cluster density. That is, we merge S1 and S2 only if the average edge weight of their union is greater than the threshold. In this paper, the disjoint-set (union-find) algorithm (Tarjan, 1975) is used to merge the sets efficiently. When all the edges have been processed, TSCs with high semantic similarity are merged into a topic, and the intra-cluster density of each subgraph is higher than the threshold.
An example of dialogue-based topic partitioning is shown in Fig. 1 (b1). The SAG constructed in Fig. 1 (a) is finally partitioned into two topics, marked in red and blue, and several noise TSCs, marked in purple in Fig. 1 (c). The TSCs “Great player Messi!” and “Messi deserves MVP!” belong to the red topic, while the TSC “What is the BGM ?” is identified as noise.
The full algorithm is shown in Algorithm 1.
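As an illustration of the merging procedure, the following Python sketch processes edges in descending order of weight and merges two sets with a union-find structure. It is not Algorithm 1 itself: the function name `dialogue_clustering` and the parameter `rho` (the density threshold) are hypothetical, and the intra-cluster density is assumed here to be the mean weight of the edges processed so far that fall inside the candidate union.

```python
def dialogue_clustering(n, edges, rho):
    """Cluster n TSCs given edges (i, j, weight); return one root label per TSC.

    Assumption: two sets are merged when the mean weight of the already
    processed edges inside their union, including the current edge, exceeds rho.
    """
    parent = list(range(n))
    weight_sum = [0.0] * n  # per root: total weight of processed edges inside the set
    edge_count = [0] * n    # per root: number of processed edges inside the set

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, w in sorted(edges, key=lambda e: e[2], reverse=True):
        a, b = find(i), find(j)
        if a == b:
            # Both endpoints already share a topic: only update its statistics.
            weight_sum[a] += w
            edge_count[a] += 1
            continue
        total = weight_sum[a] + weight_sum[b] + w
        count = edge_count[a] + edge_count[b] + 1
        if total / count > rho:
            parent[b] = a
            weight_sum[a] = total
            edge_count[a] = count
    return [find(x) for x in range(n)]
```

With edges `(0, 1, 0.9)`, `(1, 2, 0.8)`, `(2, 3, 0.1)` and `rho = 0.7`, the first three TSCs end up in one topic, while TSC 3 stays alone because adding its weak edge would pull the mean weight down to 0.6.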
3.2.2. Topic Center-based Algorithm
In the dialogue-based algorithm, we assume that TSCs take the form of dialogues. However, when the density of TSCs is high, the user cannot clearly distinguish the content of each TSC, but can only roughly distinguish the topic of these TSCs. Therefore, the user is more likely to reply to the entire topic instead of a specific TSC. Such situations disturb the results of the dialogue-based model. Therefore, we provide a topic center-based algorithm, which is inspired by hierarchical agglomerative clustering (Pang et al., 2015; Murtagh and Legendre, 2014; Murtagh et al., 2008; Murtagh and Contreras, 2012).
Before presenting this algorithm, we first define the topic center. As defined in Section 3.1, the set is used to describe the topic, and each TSC can be expressed as an embedding vector by word2vec. The topic center is the average of the vectors of all TSCs within the topic. We use to express the topic center vector, and and to express the start time and center time of topic set , respectively. Initially, each TSC belongs to a unique topic, so , , where is the sentence embedding vector of TSC . All these sets belong to , which is the universal set of topic sets.
Generally, this algorithm can be divided into two parts: (1) find the two nearest topic centers, and (2) merge the two topic centers. The first part is a Nearest Neighbor Search (NNS) problem (Bentley, 1975; Alstrup et al., 2000), for which the k-d tree (Bentley, 1990; Friedman et al., 1977; Bentley, 1975) is one of the most effective methods. However, analyses of binary search trees have found that the worst-case time for range search in a k-dimensional k-d tree containing N nodes is O(k · N^(1−1/k)) (Lee and Wong, 1977). Besides, the k-d tree has a larger constant factor.
In this paper, we propose a greedy algorithm to solve this problem efficiently. In the beginning, for each , we find that
(6) 
where
(7) 
The decay function is retained to prevent topics separated by a large time interval from being merged.
We use to express the set that matches with maximum value . And the pair is added to a queue , which is a priority queue where the pair with the maximum is the front.
Each time, we take out the front pair , merging and , and pop it, until . When merging sets, the following updates will be done: First, since and are merged, all pairs that contain or , for instance , should be deleted from . Then, these sets that matched or previously like are added into the update list . Next, the sets and are removed from , and a new set is added into and , where
(8) 
(9) 
and
(10) 
That is, the center time and the center vector of are the weighted average of and , and the start time of is the minimum of and . Finally, for each set , we find a new according to Eq.(6) in to match it.
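The update performed when two topic sets are merged can be sketched as follows. This is a hedged illustration of the weighted-average description above, not the paper's exact Eqs. (8)-(10): the `Topic` class and `merge_topics` function are hypothetical names, and the weights of the averages are assumed to be the numbers of TSCs in the two sets.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Topic:
    size: int             # number of TSCs in the topic set
    center: List[float]   # topic center vector (mean of member embeddings)
    start_time: float     # earliest timestamp in the set
    center_time: float    # mean timestamp of the set

def merge_topics(a: Topic, b: Topic) -> Topic:
    """Merge two topic sets, weighting each side by its number of TSCs (assumption)."""
    n = a.size + b.size
    center = [(a.size * x + b.size * y) / n for x, y in zip(a.center, b.center)]
    center_time = (a.size * a.center_time + b.size * b.center_time) / n
    start_time = min(a.start_time, b.start_time)
    return Topic(n, center, start_time, center_time)
```

Weighting by set size keeps the merged center equal to the plain average over all member TSCs, so the topic center never has to be recomputed from the individual comments.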
What is more, there exists a greedy optimization in the algorithm. Before giving the greedy optimization, we propose a lemma at first:
Lemma 3.1.
For the set , let . Then the pair will never be solved in if .
Proof.
Since , we have , and . There exist two cases:
Case : Then, in the priority queue , the pair will be solved before because . Therefore, the pair will be removed from when solving .
Case :
Then we have (otherwise ). So the pair will be solved before in the priority queue . When solving , will be removed, and set will find a new in . If , then is readded into (at that time, ). Otherwise, . In that case, the pair will be solved before , and will be removed when solving . Therefore, will always be removed and never be solved in any case. ∎
According to Lemma 3.1, we propose the greedy optimization: for the set , if and , then the pair is rejected and not added into .
3.3. Weight Distribution and Tag Extraction
Section 3.2 partitions the TSCs into topics, so the topic of each TSC is known. In this section, we assign a weight to each TSC according to the influence of its topic and its relationships in the semantic graph.
The weight of a TSC is affected by its topic popularity, so we define the popularity of the TSC as:
(11) 
where is the topic in SAG, and is the total number of topics in SAG. Obviously those topics with fewer TSCs are more likely to be noises and have less weight. According to Eq.(11), noises will have small values of popularity.
Within the topic, a TSC which affects more TSCs and is affected by fewer TSCs should have a higher weight. In order to quantitatively measure the weight of the TSC in a topic, we design a graph iterative algorithm below.
An influence matrix is established at first to express semantic relations within each topic. For the elements in the matrix,
(12) 
we use to denote the influence value of TSC after iterations. For each TSC , initially. Then in the turn of iteration, there are two steps as follows:
(13) 
and
(14) 
In the iteration, we increase the influence value of TSC based on the values of TSCs that affected by TSC . We know that a TSC only affects the TSCs lagging behind it, so the TSCs are processed from down to . That is, before we process TSC , all the TSCs that have been processed. In the iteration, we reduce the influence value of TSC based on the values of the TSCs that affect TSC . Contrary to the iteration, we process the TSCs from to in the iteration.
The iteration process of SAG in Fig. 1 (c) is shown in Fig. 2. Fig. 2 (a) shows the calculation of the last two nodes (marked as red) that need to be processed in the iteration (ignore the noise node ), where the orange edges express their outdegree edges. While Fig. 2 (b) shows the calculation of the last two nodes (marked as red) that need to be processed in the iteration (ignore the noise node ), where the green edges express their indegree edges.
The converged influence values of the 10 TSCs in Fig. 1 (c) is shown in Fig. 3. After 20 iterations, all TSCs converge to the interval .
To combine the popularity and the influence value, the weight of TSC is obtained by
(15) 
where is the number of turns of iterations and depends on the number of nonzero elements in the matrix . Therefore, the weight of each word is formulated as below:
(16) 
where denotes the TSC in which word appears and is the inverse document frequency as defined in the TF-IDF method. We extract the words with the highest SW-IDF values as video tags. After the above steps, words that appear in popular, high-impact TSCs are extracted as tags. The complete algorithm is shown in Algorithm 3.
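The final scoring step can be sketched as below. This is an assumption-laden illustration rather than Eq. (16) itself: the function name `sw_idf` is hypothetical, the semantic weight of a word is assumed to be the sum of the weights of the TSCs containing it, and the IDF uses the common smoothed form log(N / (1 + df)) over a hypothetical background corpus described by `doc_freq` and `num_docs`.

```python
import math
from collections import defaultdict

def sw_idf(tscs, tsc_weights, doc_freq, num_docs, top_k=5):
    """Rank words by (summed TSC weight) * IDF and return the top_k words.

    tscs        : list of token lists, one per TSC
    tsc_weights : weight of each TSC (e.g. from the graph iteration)
    doc_freq    : word -> number of background documents containing it (assumed input)
    num_docs    : number of background documents (assumed input)
    """
    semantic_weight = defaultdict(float)
    for tokens, weight in zip(tscs, tsc_weights):
        for word in set(tokens):  # count each word once per TSC
            semantic_weight[word] += weight
    scores = {
        word: sw * math.log(num_docs / (1 + doc_freq.get(word, 0)))
        for word, sw in semantic_weight.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A word that appears only in low-weight TSCs, such as a comment identified as noise, receives a small semantic weight and is therefore unlikely to be selected even if its IDF is high.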
3.4. Complexity Analysis
In this section, we analyze the time complexity and the space complexity of each algorithm.
In Algorithm 1, the time complexity of the edge sorting algorithm in line 1 is by using quicksort, and the space complexity is . The amortized time complexity of merging sets by disjointset is (Tarjan, 1979) and the space complexity is , where is the inverse Ackermann function that . So the total time complexity of Algorithm 1 is , and the space complexity is .
In Algorithm 2, the time complexity of initialization from line 1 to line 12 is , and the space complexity is . In , the number of times of mergeoperation is limited to (because there are at most sets), and the amortized removal operation is limited to 1 each mergeoperation. For each merge operation, the lookup operation and remove operation can be dealt in by naive algorithm, or by binary balance tree (Bentley, 1975). The worst complexity of total Algorithm 2 is . The total space complexity is just .
In Algorithm 3, the time complexity is and the space complexity is apparently. In our SAG, because two TSCs with a negative semantic similarity do not have an edge. Therefore, in the true TSC data, and the dialoguebase algorithm has a more efficient time complexity than the topic centerbased algorithm.
4. Experimental Study
In this section, we verify the effectiveness of our proposed method by comparing with four unsupervised methods of keyword extraction. The datasets are crawled from AcFun (www.acfun.cn) and Bilibili. We provide the necessary parameters in our algorithms in Section 4.1 and then analyze the performance of our algorithms on video tag extraction in Section 4.2.
4.1. Experimental Setup and Datasets
We crawl TSCs from two famous Chinese time-sync comment video websites, AcFun and Bilibili. The raw TSC texts are full of noise, so we manually remove non-textual TSCs (such as emojis) and establish a set of mapping rules for internet slang, which is replaced by its real meaning in the text. For instance, 233… (2 followed by several 3s) means laughter, and 666… (several 6s) means playing a game very well. After that, we segment the words and remove anomalous symbols (symbolic expressions, such as the smiley face (^_^)) in TSCs with Jieba (https://github.com/fxsjy/jieba), an open-source Chinese-language processing toolbox. To analyze the algorithms from different aspects, we collect two datasets. Specifically, the first dataset (called D1) contains 287 videos with 227,780 comments in total, collected randomly from the music, sports, and movie categories. To set the hyperparameters in this paper, we select 167 videos with 126,146 TSCs as the validation set and 120 videos with 101,634 comments as the test set. The second dataset (called D2) contains 180 videos with 569,996 comments in total, collected from Japanese anime. We use D1 to compare our algorithms with the baselines, and use D2 to analyze the effects of our two proposed algorithms at different densities.
We define the density of TSCs as the average number of TSCs per minute. In D2, we divide the density into 5 levels: 0-30, 30-60, 60-90, 90-120, and more than 120 (the intervals are left-closed and right-open). More details, including the video length, total number of TSCs, density, and number of videos, are shown in Table 2 for D1 and Table 3 for D2.
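The slang mapping rules can be illustrated with a small regular-expression sketch. The two rules below mirror the examples in the text, but the replacement strings, the minimum repeat counts, and the name `normalize_slang` are assumptions made for illustration; the actual rule set is not specified here.

```python
import re

# Hypothetical mapping rules: (pattern, replacement meaning).
# Lookarounds keep the rules from firing inside longer digit strings.
SLANG_RULES = [
    (re.compile(r"(?<!\d)23{2,}(?!\d)"), "laughter"),    # 233..., a 2 followed by several 3s
    (re.compile(r"(?<!\d)6{3,}(?!\d)"), "well played"),  # 666..., several 6s
]

def normalize_slang(text):
    """Replace known internet slang in a TSC with its plain meaning."""
    for pattern, meaning in SLANG_RULES:
        text = pattern.sub(meaning, text)
    return text
```

For example, `normalize_slang("2333333")` yields `"laughter"`, while an ordinary number such as `"12333"` is left unchanged.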
Table 2. Statistics of dataset D1.
Statistic  Validation set  Test set
Total length (minutes)  1,573.29  1,441.38
Total number of TSCs  126,146  101,634
Density (TSCs per minute)  80.18  70.51
Total number of videos  167  120
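Table 3. Statistics of D2 at each density level.
| | 0-30 | 30-60 | 60-90 | 90-120 | >120 |
| --- | --- | --- | --- | --- | --- |
| Total length (minutes) | 644.37 | 433.01 | 855.40 | 883.61 | 1,221.55 |
| Total number of TSCs | 11,489 | 19,368 | 60,152 | 99,671 | 379,316 |
| Density (TSCs per minute) | 17.83 | 43.72 | 70.32 | 112.80 | 310.52 |
| Total number of videos | 29 | 21 | 37 | 42 | 51 |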
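The density definition and the five levels can be checked with a short sketch. The function names are hypothetical; the level boundaries follow the left-closed, right-open intervals stated above.

```python
def tsc_density(num_tscs: int, length_minutes: float) -> float:
    """Average number of TSCs per minute of video."""
    return num_tscs / length_minutes


def density_level(density: float) -> str:
    """Map a density to one of the five left-closed, right-open levels."""
    bounds = [30, 60, 90, 120]
    labels = ["[0, 30)", "[30, 60)", "[60, 90)", "[90, 120)", "[120, inf)"]
    for bound, label in zip(bounds, labels):
        if density < bound:
            return label
    return labels[-1]


if __name__ == "__main__":
    # Validation set of D1: 126,146 TSCs over 1,573.29 minutes.
    d = tsc_density(126146, 1573.29)
    print(round(d, 2), density_level(d))
```

Applied to the totals in the tables above, the same division reproduces each reported density value.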
We select two undergraduate students and one Ph.D. student as volunteers. For each video, each volunteer chooses 15 words from the TSCs and votes for them as video tags. Words with two or more votes are selected as the standard tags, so the number of standard tags differs from video to video. The standard tags are ordered by their number of votes, with tags that receive more votes ranked first. When the number of votes is the same, the order is determined by the Ph.D. student. The code of our algorithm is available at https://github.com/sdq11111/SAG.
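The voting rule can be sketched as follows. The function name and the sample words are hypothetical. Ties are broken manually by the Ph.D. student in the procedure above, so the alphabetical tie-break here is only a stand-in to keep the output deterministic.

```python
from collections import Counter


def standard_tags(votes_per_volunteer, min_votes=2):
    """Keep words chosen by at least `min_votes` volunteers, most-voted first."""
    counts = Counter()
    for chosen in votes_per_volunteer:
        counts.update(set(chosen))  # each volunteer votes at most once per word
    kept = [(word, n) for word, n in counts.items() if n >= min_votes]
    # Assumption: alphabetical order replaces the manual tie-break.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in kept]


if __name__ == "__main__":
    votes = [
        ["goal", "music", "funny"],
        ["goal", "music", "sad"],
        ["goal", "epic", "sad"],
    ]
    print(standard_tags(votes))
```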
In Section 3.1, we use word2vec to obtain the embedding vectors of TSCs. We choose the skip-gram model of word2vec to pretrain the word embedding vectors, with hierarchical softmax as the training algorithm, because both are better suited to infrequent words (Mikolov et al., 2013), which are common in TSCs. We use gensim (https://radimrehurek.com/gensim/models/word2vec.html) to train the model on a corpus of TSCs crawled from Bilibili that contains 6,743,912 words. Since the training corpus is sufficiently large, we set the dimension of the word vectors to 300, following Li et al. (2018b).
To further justify the use of word2vec for calculating the similarities of TSCs, we also calculate the semantic similarities with several traditional unsupervised methods and other word embedding methods, i.e.,

LDA, Latent Dirichlet Allocation (Blei et al., 2003), a well-known topic model-based method.

PPMI, Positive Pointwise Mutual Information (Levy and Goldberg, 2014a), a distributional model based on co-occurrence probability.

HowNet, a hybrid language and knowledge resource (Dong and Dong, 2003), used here as a knowledge-based similarity method.

GloVe, Global Vectors for Word Representation (Pennington et al., 2014), a well-known word embedding method.
We test the top 10 tag extraction results on the validation set, using each of the above methods to calculate the similarity and build the graph (the hyperparameters used in this experiment are discussed later). We use the F1-score and MAP (Mean Average Precision, the mean of the average precision scores over all queries (Zhu, 2004)) to measure the performance of tag extraction. The results are shown in Table 4.
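Table 4. Top 10 tag extraction results on the validation set with different similarity methods.
| Method | F1 (dialogue) | MAP (dialogue) | F1 (topic center) | MAP (topic center) |
| --- | --- | --- | --- | --- |
| LDA | 0.3625 | 0.3372 | 0.3641 | 0.3224 |
| PPMI | 0.3919 | 0.3705 | 0.4101 | 0.3806 |
| HowNet | 0.3537 | 0.3423 | 0.3468 | 0.3194 |
| GloVe | 0.4045 | 0.4012 | 0.4202 | 0.4079 |
| Word2Vec | 0.4183 | 0.4041 | 0.4342 | 0.4160 |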
The experimental results show that, on the validation set, HowNet performs the worst on most metrics because of the limited size of its word lists. LDA also performs poorly because it is not good at handling short texts. Among PPMI, GloVe, and word2vec, word2vec performs best on every metric, which indicates that a fully trained word2vec model is more robust and more suitable for calculating the similarity of TSCs.
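The two metrics used throughout this section can be computed as in the sketch below. It is an illustration under one stated assumption: average precision is normalized by the number of standard tags, which is a common convention but is not spelled out in the text. All function names are hypothetical.

```python
def precision_recall_f1(predicted, relevant):
    """Precision, recall, and F1 of a ranked tag list against the standard tags."""
    relevant = set(relevant)
    hits = sum(1 for tag in predicted if tag in relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def average_precision(predicted, relevant):
    """Sum of precision@k at each relevant rank, divided by the number of
    standard tags (assumed normalization)."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, tag in enumerate(predicted, start=1):
        if tag in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (predicted, relevant) pairs, one per video."""
    return sum(average_precision(p, r) for p, r in runs) / len(runs)


if __name__ == "__main__":
    predicted = ["a", "b", "c", "d"]
    relevant = ["a", "c", "e"]
    print(precision_recall_f1(predicted, relevant))
    print(average_precision(predicted, relevant))
```

Because the number of standard tags differs from video to video, recall and average precision are computed per video before averaging.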
In addition, three parameters of our algorithm need to be determined: the two intra-cluster density thresholds, one for the dialogue-based algorithm and one for the topic center-based algorithm, and the attenuation coefficient. The two thresholds control the accuracy of topic clustering. The attenuation coefficient governs the decay over the time interval between time-sync comments, and thus controls the edge weights in the graph.
We first fix the attenuation coefficient and adjust the two thresholds so that the F1-score and MAP on the validation set are optimal. Then, with the optimal thresholds, we readjust the attenuation coefficient so that the F1-score and MAP on the validation set are optimal. On the Bilibili video site, each TSC stays on the screen for 10 seconds by default. Therefore, we assume that the semantic half-life of each TSC is 5 seconds, and calculate the initial attenuation coefficient according to Eq. (2).
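As a rough illustration of the half-life argument, the sketch below assumes that the time decay in Eq. (2) is exponential, i.e., that a weight is multiplied by exp(-c * t) for an interval of t seconds and coefficient c. This form is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def attenuation_from_half_life(half_life_seconds: float) -> float:
    """Coefficient c such that exp(-c * half_life) equals 0.5.
    Assumes an exponential decay, which is an illustrative assumption."""
    return math.log(2) / half_life_seconds


def time_decay(coefficient: float, interval_seconds: float) -> float:
    """Decay factor applied to a weight after `interval_seconds`."""
    return math.exp(-coefficient * interval_seconds)


if __name__ == "__main__":
    c = attenuation_from_half_life(5.0)
    print(round(c, 4))
    print(time_decay(c, 5.0))
```

Under this assumption a 5-second half-life gives a coefficient of about 0.139, and a coefficient of 0 removes the time dependence entirely.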
To determine the threshold of the dialogue-based algorithm, we fix the attenuation coefficient, vary the threshold from 0 to 0.5 in steps of 0.02, and observe the F1-score and MAP of the top 10 tagging results generated by the dialogue-based algorithm. The F1-score and MAP on the validation set are shown in Fig. 5. For both F1-score and MAP, the threshold gives better results in the range of 0.32 to 0.38 and reaches optimal performance at 0.34. Therefore, we choose 0.34 for the following experiments.
To determine the threshold of the topic center-based algorithm, we again fix the attenuation coefficient, vary the threshold from 0 to 0.5 in steps of 0.02, and observe the F1-score and MAP of the top 10 tagging results generated by the topic center-based algorithm. The F1-score and MAP on the validation set are shown in Fig. 7. For F1-score, the threshold gives better results in the range of 0.34 to 0.42 and reaches optimal performance at 0.40. For MAP, it gives better results in the range of 0.34 to 0.40 and reaches optimal performance at 0.38. Considering both F1-score and MAP, we select the value of this threshold used in the following experiments.
With the optimal thresholds obtained above, we readjust the attenuation coefficient from 0 to 0.2 in steps of 0.01 and observe the F1-score and MAP of the video tags generated by our algorithms. The results on the validation set are shown in Fig. 9. For the dialogue-based algorithm, the coefficient gives better performance in the range of 0.10 to 0.13, with the optimum at 0.12 for F1-score and 0.11 for MAP. For the topic center-based algorithm, it gives better performance in the range of 0.10 to 0.14, with the optimum at 0.13 for both F1-score and MAP. Taking both F1-score and MAP into consideration, we choose a separate value of the coefficient for each algorithm in the following experiments. In fact, when the attenuation coefficient is 0, the semantic association graph is independent of time; when it tends to infinity, all edge weights equal 0, and our model is equivalent to TF-IDF.
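The one-parameter-at-a-time search described above can be sketched as a simple grid scan. The helper names and the toy scoring function are hypothetical, and combining the two metrics by their sum is an assumption, since the text only says that both F1-score and MAP are considered.

```python
def grid(start: float, stop: float, step: float):
    """Evenly spaced candidate values from start to stop, inclusive."""
    n = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(n + 1)]


def best_value(candidates, evaluate):
    """Pick the candidate with the highest F1 + MAP.
    `evaluate(value)` must return an (f1, map) pair; summing them is an assumption."""
    return max(candidates, key=lambda v: sum(evaluate(v)))


if __name__ == "__main__":
    # Toy stand-in for running the tagging algorithm on the validation set.
    def toy_evaluate(threshold):
        score = 1.0 - abs(threshold - 0.34)
        return score, score

    print(len(grid(0.0, 0.5, 0.02)))
    print(best_value(grid(0.0, 0.5, 0.02), toy_evaluate))
```

Tuning one parameter while the others are held fixed needs far fewer runs than a joint grid, at the cost of possibly missing interactions between the parameters.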
Besides, the number of iterations also needs to be determined. We count the number of iterations each algorithm needs to converge at different densities (convergence is judged by a criterion on an average value). The results are shown in Table 5.
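Table 5. Average number of iterations until convergence at each density level.
| | 0-30 | 30-60 | 60-90 | 90-120 | >120 |
| --- | --- | --- | --- | --- | --- |
| Dialogue | 7.32 | 13.59 | 27.59 | 35.15 | 43.82 |
| Topic center | 6.89 | 14.92 | 23.15 | 31.42 | 45.62 |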
As shown in Table 5, when the density of TSCs is low, the SAG generated by the two algorithms is sparse, and therefore few iterations are needed. As the density increases, the SAG becomes denser and the number of iterations increases. For simplicity, we use a single fixed number of iterations in the experiments.
4.2. Results
In this section, we first use D2 to analyze the clustering effect of the two proposed algorithms at different densities. Then, we use the test set of D1 to verify the effectiveness of the proposed greedy optimization, and compare our algorithms with the existing methods TF-IDF, TextRank (Mihalcea and Tarau, 2004), BTM (Yan et al., 2013), GSDPMM (Yin and Wang, 2014, 2016), and TPTM (Wu et al., 2014).
First, we design an experiment to compare the clustering effect of the two algorithms. Given a set of topics, two distance scores are introduced, following Yan et al. (2013).
Average Intra-Cluster Distance (IntraDis):
(17)
Average Inter-Cluster Distance (InterDis):
(18)
We use a similarity function to measure the semantic similarity between two topics, so a higher function value means a higher similarity. Intuitively, if the Average Intra-Cluster Distance is high and the Average Inter-Cluster Distance is low, the algorithm clusters well. Therefore, we calculate
(19)
to evaluate the quality of the clustering algorithms, as in (Guo et al., 2011; Bordino et al., 2010).
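As a rough illustration of scores of this kind, the sketch below follows the general form used by Yan et al. (2013): the intra-cluster score averages pairwise similarity inside each cluster, the inter-cluster score averages pairwise similarity across clusters, and the quality score is their ratio. Treating the quality score as exactly this ratio, skipping singleton clusters, and all function names are assumptions made for illustration.

```python
from itertools import combinations


def intra_score(clusters, sim):
    """Mean over clusters of the average pairwise similarity inside the cluster.
    Assumption: a cluster with fewer than two items contributes 0."""
    total = 0.0
    for cluster in clusters:
        pairs = list(combinations(cluster, 2))
        if pairs:
            total += sum(sim(a, b) for a, b in pairs) / len(pairs)
    return total / len(clusters)


def inter_score(clusters, sim):
    """Mean over cluster pairs of the average cross-cluster similarity.
    Requires at least two non-empty clusters."""
    scores = []
    for c1, c2 in combinations(clusters, 2):
        cross = [sim(a, b) for a in c1 for b in c2]
        scores.append(sum(cross) / len(cross))
    return sum(scores) / len(scores)


def h_score(clusters, sim):
    """Ratio of intra- to inter-cluster similarity; larger means tighter clusters."""
    return intra_score(clusters, sim) / inter_score(clusters, sim)


if __name__ == "__main__":
    # Toy items on a line; similarity shrinks with distance.
    sim = lambda a, b: 1.0 / (1.0 + abs(a - b))
    print(h_score([[0, 1], [10, 11]], sim))   # well-separated clustering
    print(h_score([[0, 10], [1, 11]], sim))   # mixed-up clustering
```

With a similarity function, a good clustering yields a ratio well above 1, while a clustering that mixes distant items yields a much smaller value.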
Because of the time decay function in the semantic association graph, the H value, the IntraDis, and the number of topics (clusters) vary greatly with the video duration. Therefore, we do not directly average these values over all videos but define an H-hit score instead. That is, for each video, we compare the H scores obtained by the two clustering algorithms, and the algorithm with the larger score obtains a hit. The H-hits obtained by the dialogue-based algorithm are called D-Hit, and those obtained by the topic center-based algorithm are called T-Hit.
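The per-video hit counting can be sketched in a few lines. The function name and the sample scores are hypothetical, and giving no hit to either algorithm on a tie is an assumption, since the tie case is not discussed.

```python
def count_hits(dialogue_scores, topic_center_scores):
    """Count, video by video, which algorithm has the larger clustering score.
    Returns (d_hit, t_hit). Assumption: a tie gives no hit to either side."""
    d_hit = t_hit = 0
    for d, t in zip(dialogue_scores, topic_center_scores):
        if d > t:
            d_hit += 1
        elif t > d:
            t_hit += 1
    return d_hit, t_hit


if __name__ == "__main__":
    # Made-up scores for three videos.
    print(count_hits([1.2, 0.8, 1.0], [1.0, 0.9, 1.0]))
```

Counting wins per video avoids averaging scores whose scale depends on video duration.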
The results are shown in Fig. 10. The dialogue-based algorithm performs better when the density is lower than 60. Once the density exceeds 60, the topic center-based algorithm performs better than the dialogue-based one. Moreover, we directly compare the top 10 tag extraction results of the two clustering algorithms at different densities. The results are shown in Table 6.
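Table 6. Top 10 tag extraction results of the two clustering algorithms at each density level.
| | 0-30 | 30-60 | 60-90 | 90-120 | >120 |
| --- | --- | --- | --- | --- | --- |
| Dialogue F1-score | 0.4357 | 0.4412 | 0.4219 | 0.4108 | 0.4383 |
| Dialogue MAP | 0.3742 | 0.4027 | 0.4615 | 0.4013 | 0.4872 |
| Topic center F1-score | 0.4139 | 0.4276 | 0.4275 | 0.4216 | 0.4433 |
| Topic center MAP | 0.3615 | 0.3988 | 0.4747 | 0.4077 | 0.5093 |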
The tag extraction results are consistent with Fig. 10. From Fig. 10 and Table 6, we conclude that the dialogue-based algorithm is better for videos with a density lower than 60, while the topic center-based algorithm has the advantage for videos with a density higher than 60, which fits our assumptions in Section 3.2. Based on these conclusions, in the test set of D1 we treat videos whose TSC density is greater than 60 as high-density videos and the others as low-density videos. The test set of D1 is thus divided into two parts: videos with high-density TSCs and videos with low-density TSCs. The details are shown in Table 7.
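Table 7. Statistics of the high-density and low-density parts of the D1 test set.
| | High-density | Low-density |
| --- | --- | --- |
| Total length (minutes) | 124.58 | 1,316.80 |
| Total number of TSCs | 41,556 | 60,078 |
| Density (TSCs per minute) | 333.56 | 45.62 |
| Total number of videos | 89 | 31 |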
We use the data in Table 7 to verify the effectiveness of the greedy optimization proposed in Section 3.2.2. Specifically, we run the code of Algorithm 2 10 times, measuring the running time from line 6 to line 34, with and without the greedy optimization (in line 9). The experimental platform is a single-threaded 13-inch MacBook Pro with a 2.9 GHz Intel Core i5 and 8 GB of 2133 MHz LPDDR3 memory. We add up the time over all samples, since a single sample runs for only a short time. The average total time over the 10 runs is shown in Table 8.
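Table 8. Average total running time over 10 runs, with and without the greedy optimization.
| | High-density | Low-density |
| --- | --- | --- |
| Topic center only | 7.671 | 10.725 |
| Topic center with greedy optimization | 6.905 | 10.060 |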
The results show that the greedy optimization reduces the running time by 9.99% on the high-density data and by 6.20% on the low-density data, which verifies the effectiveness of our greedy algorithm.
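The measurement protocol and the reported savings can be reproduced with a small sketch. The helper names are hypothetical; `average_runtime` only illustrates the average-over-10-runs idea and is not the timing code used in the experiment.

```python
import time


def average_runtime(fn, runs: int = 10) -> float:
    """Average wall-clock seconds of calling fn() over `runs` runs."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / runs


def relative_reduction(baseline: float, optimized: float) -> float:
    """Percentage of running time saved relative to the baseline."""
    return (baseline - optimized) / baseline * 100.0


if __name__ == "__main__":
    # Running times reported for the topic center-based algorithm.
    print(round(relative_reduction(7.671, 6.905), 2))
    print(round(relative_reduction(10.725, 10.060), 2))
```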
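Then, we compare our algorithms with existing methods on the test set of D1. Specifically, we compare them with 5 unsupervised keyword extraction methods, i.e.,

TF-IDF, a classical keyword extraction algorithm.

TX, TextRank (Mihalcea and Tarau, 2004), a graph-based text ranking model inspired by PageRank.

BTM, the biterm topic model for short texts (Yan et al., 2013).

GSDPMM, a Dirichlet multinomial mixture model-based approach for short text clustering (Yin and Wang, 2014, 2016).

TPTM, a temporal and personalized topic model for crowdsourced time-sync video tagging (Wu et al., 2014).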
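Table 9. Top 10 tagging results on the high-density videos.
| Method | Prec | Recall | F1-score | MAP |
| --- | --- | --- | --- | --- |
| TFIDF | 0.2674 | 0.5735 | 0.3648 | 0.4224 |
| TX | 0.2427 | 0.5205 | 0.3310 | 0.3696 |
| BTM | 0.2337 | 0.5012 | 0.3188 | 0.3094 |
| GSDPMM | 0.2445 | 0.5094 | 0.3302 | 0.3374 |
| TPTM | 0.2539 | 0.5446 | 0.3463 | 0.3824 |
| SWIDF (dialogue) | 0.3079 | 0.6602 | 0.4210 | 0.4932 |
| SWIDF (topic center) | 0.3258 | 0.6988 | 0.4444 | 0.5122 |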
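Table 10. Top 10 tagging results on the low-density videos.
| Method | Prec | Recall | F1-score | MAP |
| --- | --- | --- | --- | --- |
| TFIDF | 0.3411 | 0.4028 | 0.3694 | 0.3098 |
| TX | 0.3224 | 0.3709 | 0.3450 | 0.3147 |
| BTM | 0.3210 | 0.3662 | 0.3369 | 0.2927 |
| GSDPMM | 0.3440 | 0.4038 | 0.3715 | 0.3202 |
| TPTM | 0.3677 | 0.4334 | 0.3979 | 0.3359 |
| SWIDF (dialogue) | 0.3912 | 0.4693 | 0.4267 | 0.3623 |
| SWIDF (topic center) | 0.3877 | 0.4562 | 0.4207 | 0.3522 |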
For each method, we first calculate the precision, recall, F1-score, and MAP (Mean Average Precision) of the top 10 tagging results. The results for high-density and low-density TSCs are shown in Table 9 and Table 10, respectively.
In the high-density condition, our topic center-based SWIDF algorithm achieves the best results in both F1-score and MAP. It increases the F1-score by 21.82% and the MAP by 21.26% over TF-IDF, the strongest baseline in this condition. In the low-density condition, our dialogue-based SWIDF algorithm achieves the best results in both F1-score and MAP. It increases the F1-score by 7.24% and the MAP by 7.86% over TPTM, the strongest baseline in this condition. Comparing the two algorithms, we find that the dialogue-based algorithm performs better in the low-density condition, while the topic center-based algorithm performs better in the high-density condition, which further supports our assumption in Section 3.2.
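The relative improvements quoted above are measured against the strongest baseline in each condition, which differs between the two conditions. The sketch below reproduces the F1-score figures from the reported top 10 results; the function name is hypothetical.

```python
def gain_over_best_baseline(ours: float, baselines: dict):
    """Return the best baseline's name and our relative gain over it, in percent."""
    best_name = max(baselines, key=baselines.get)
    best = baselines[best_name]
    return best_name, (ours - best) / best * 100.0


# F1-scores of the baselines, copied from the reported top 10 results.
HIGH_F1 = {"TFIDF": 0.3648, "TX": 0.3310, "BTM": 0.3188, "GSDPMM": 0.3302, "TPTM": 0.3463}
LOW_F1 = {"TFIDF": 0.3694, "TX": 0.3450, "BTM": 0.3369, "GSDPMM": 0.3715, "TPTM": 0.3979}

if __name__ == "__main__":
    print(gain_over_best_baseline(0.4444, HIGH_F1))  # topic center-based SWIDF
    print(gain_over_best_baseline(0.4267, LOW_F1))   # dialogue-based SWIDF
```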
What is more, when the density of TSCs becomes high, the noise increases. Therefore, the results of the topic model-based methods BTM, GSDPMM, and TPTM are poor, even worse than those of the classical TF-IDF method. However, TF-IDF only counts word occurrences and does not consider the semantic relationships among TSCs, so its results are not as good as those of our algorithms. In contrast, in low-density comments the graph is sparse and there is less noise. That is why our algorithms achieve greater improvement in the high-density condition than in the low-density condition.
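Table 11. Precision and recall of the top 5 and top 15 candidate tags (H: high-density, L: low-density).
| Method | H-Top 5 Prec | H-Top 5 Recall | H-Top 15 Prec | H-Top 15 Recall | L-Top 5 Prec | L-Top 5 Recall | L-Top 15 Prec | L-Top 15 Recall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TFIDF | 0.4182 | 0.4483 | 0.1871 | 0.5997 | 0.4140 | 0.2434 | 0.2993 | 0.5255 |
| TX | 0.3012 | 0.3234 | 0.1810 | 0.5831 | 0.3838 | 0.2250 | 0.2814 | 0.5071 |
| BTM | 0.2715 | 0.2924 | 0.1771 | 0.5692 | 0.3678 | 0.2158 | 0.2609 | 0.4602 |
| GSDPMM | 0.2812 | 0.3013 | 0.1832 | 0.5930 | 0.4181 | 0.2486 | 0.3067 | 0.5390 |
| TPTM | 0.3627 | 0.3945 | 0.1805 | 0.5927 | 0.4365 | 0.2662 | 0.3183 | 0.5624 |
| SWIDF(d) | 0.4935 | 0.5362 | 0.2273 | 0.7241 | 0.4654 | 0.2893 | 0.3556 | 0.6327 |
| SWIDF(c) | 0.5300 | 0.5692 | 0.2345 | 0.7571 | 0.4518 | 0.2783 | 0.3410 | 0.6269 |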
To further validate our algorithms, we show the precision and recall of the top 5 and top 15 candidate tags in Table 11. The ranking of the algorithms is similar to that for the top 10, which shows that our two algorithms extract video tags from time-sync comments better than the baselines in both density conditions and at all three cut-offs.
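Table 12. Top 5 video tags generated by each algorithm for four sample videos.
| Video number | AcFun ac2643295_1 | AcFun ac2656362_6 | AcFun ac2474006_1 | AcFun ac2669229_1 |
| --- | --- | --- | --- | --- |
| Timeline | 0:00:00-0:01:10 | 0:07:28-0:09:49 | 0:00:00-1:04:07 | 0:00:00-0:15:41 |
| Amount | 785 | 764 | 2933 | 2460 |
| Density | 672.84 | 325.08 | 45.78 | 156.84 |
Rows of top 5 tags per video: TFIDF, TextRank, BTM, GSDPMM, TPTM, SWIDF(d), SWIDF(c).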
Finally, we show the top 5 video tags generated by the above algorithms in Table 12. Words in bold italics indicate good tags (tags that all three volunteers voted for), while underlined words indicate bad tags (tags that fewer than two volunteers voted for). The results show that SWIDF (topic center) and SWIDF (dialogue) produce more good tags and fewer bad tags than the other algorithms, which intuitively demonstrates the advantage of our algorithms.
5. Conclusion
In this paper, we proposed a novel video tag extraction algorithm to acquire video tags for time-sync videos. To deal with the features of time-sync comments, SWIDF was designed to cluster comments in a semantic association graph by taking advantage of their semantic similarities and timestamps. In this way, noise can be differentiated from meaningful comments and effectively eliminated. Finally, video tags are recognized and extracted in an unsupervised way. Extensive experiments on real-world datasets showed that our algorithm extracts video tags with higher precision and recall than several baselines, which validates its potential for tag extraction and for handling the features of time-sync comments.
Acknowledgements.
This work is supported by Chinese National Research Fund (NSFC) Key Project No. 61532013 and No. 61872239; NSFC Project No. 61872195 and No. 61702330; FDCT/0007/2018/A1; DCT-MoST Joint-project No. (025/2015/AMJ); and University of Macau Grant Nos. MYRG201800237RTO, CPG201800032FST and SRG201800111FST of SAR Macau, China.
References
 Alhabashneh et al. (2017) Obada Alhabashneh, Rahat Iqbal, Faiyaz Doctor, and Anne James. 2017. Fuzzy rule based profiling approach for enterprise information seeking and retrieval. Information Sciences 394 (2017), 18–37.
 Alstrup et al. (2000) Stephen Alstrup, G Stolting Brodal, and Theis Rauhe. 2000. New data structures for orthogonal range searching. In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on. IEEE, 198–207.
 Bae et al. (2017) SeungHee Bae, Daniel Halperin, Jevin D West, Martin Rosvall, and Bill Howe. 2017. Scalable and efficient flowbased community detection for largescale graph analysis. ACM Transactions on Knowledge Discovery from Data (TKDD) 11, 3 (2017), 32.
 Bentley (1975) Jon Louis Bentley. 1975. Multidimensional binary search trees used for associative searching. Commun. ACM 18, 9 (1975), 509–517.
 Bentley (1990) Jon Louis Bentley. 1990. Kd trees for semidynamic point sets. In Proceedings of the sixth annual symposium on Computational geometry. ACM, 187–197.

 Blei et al. (2003) David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research 3 (2003), 993–1022.
 Bordino et al. (2010) Ilaria Bordino, Carlos Castillo, Debora Donato, and Aristides Gionis. 2010. Query similarity by projecting the queryflow graph. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 515–522.
 Chakraborty et al. (2016) Tanmoy Chakraborty, Sriram Srinivasan, Niloy Ganguly, Animesh Mukherjee, and Sanjukta Bhowmick. 2016. Permanence and community structure in complex networks. ACM Transactions on Knowledge Discovery from Data (TKDD) 11, 2 (2016), 14.
 Chen et al. (2017a) Shizhe Chen, Jia Chen, Qin Jin, and Alexander Hauptmann. 2017a. Video captioning with guidance of multimodal latent topics. In Proceedings of the 2017 ACM on Multimedia Conference. ACM, 1838–1846.
 Chen et al. (2017b) Xu Chen, Yongfeng Zhang, Qingyao Ai, Hongteng Xu, Junchi Yan, and Zheng Qin. 2017b. Personalized key frame recommendation. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 315–324.

 Dong and Dong (2003) Zhendong Dong and Qiang Dong. 2003. HowNet - a hybrid language and knowledge resource. In Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on. IEEE, 820–824.
 Fortunato (2010) Santo Fortunato. 2010. Community detection in graphs. Physics reports 486, 3 (2010), 75–174.
 Friedman et al. (1977) Jerome H Friedman, Jon Louis Bentley, and Raphael Ari Finkel. 1977. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software (TOMS) 3, 3 (1977), 209–226.
 Gu et al. (2017) Liqiu Gu, Kun Wang, Xiulong Liu, Song Guo, and Bo Liu. 2017. A reliable task assignment strategy for spatial crowdsourcing in big data environment. In 2017 IEEE International Conference on Communications (ICC). IEEE, 1–6.
 Guo et al. (2011) Jiafeng Guo, Xueqi Cheng, Gu Xu, and Xiaofei Zhu. 2011. Intentaware query similarity. In Proceedings of the 20th ACM international conference on Information and knowledge management. ACM, 259–268.

 He et al. (2015) Hua He, Kevin Gimpel, and Jimmy J Lin. 2015. MultiPerspective Sentence Similarity Modeling with Convolutional Neural Networks. In EMNLP. 1576–1586.
 He et al. (2016) Ming He, Yong Ge, Le Wu, Enhong Chen, and Chang Tan. 2016. Predicting the Popularity of DanMuenabled Videos: A Multifactor View. In Proceedings of International Conference on Database Systems for Advanced Applications. Springer, 351–366.
 Huang et al. (2017) Faliang Huang, Xuelong Li, Shichao Zhang, Jilian Zhang, Jinhui Chen, and Zhinian Zhai. 2017. Overlapping community detection for multimedia social networks. IEEE Transactions on Multimedia 19, 8 (2017), 1881–1893.
 Hussein and Piccardi (2017) Fairouz Hussein and Massimo Piccardi. 2017. VJAUNE: A Framework for Joint Action Recognition and Video Summarization. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 13, 2 (2017), 20.
 Hyung et al. (2017) Ziwon Hyung, JoonSang Park, and Kyogu Lee. 2017. Utilizing contextrelevant keywords extracted from a large collection of usergenerated documents for music discovery. Information Processing & Management 53, 5 (2017), 1185–1200.
 Iacobacci et al. (2015) Ignacio Iacobacci, Mohammad Taher Pilehvar, and Roberto Navigli. 2015. Sensembed: Learning sense embeddings for word and relational similarity. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Vol. 1. 95–105.
 Kenter and De Rijke (2015) Tom Kenter and Maarten De Rijke. 2015. Short text similarity with word embeddings. In Proceedings of the 24th ACM international on conference on information and knowledge management. ACM, 1411–1420.
 Kusner et al. (2015) Matt J Kusner, Yu Sun, Nicholas I Kolkin, and Kilian Q Weinberger. 2015. From word embeddings to document distances. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015). 957–966.
 Lancichinetti and Fortunato (2009) Andrea Lancichinetti and Santo Fortunato. 2009. Community detection algorithms: a comparative analysis. Physical review E 80, 5 (2009), 056117.
 Lancichinetti et al. (2008) Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi. 2008. Benchmark graphs for testing community detection algorithms. Physical review E 78, 4 (2008), 046110.
 Lee and Wong (1977) DerTsai Lee and CK Wong. 1977. Worstcase analysis for region and partial region searches in multidimensional binary search trees and balanced quad trees. Acta Informatica 9, 1 (1977), 23–29.
 Levy and Goldberg (2014a) Omer Levy and Yoav Goldberg. 2014a. Linguistic regularities in sparse and explicit word representations. In Proceedings of the eighteenth conference on computational natural language learning. 171–180.
 Levy and Goldberg (2014b) Omer Levy and Yoav Goldberg. 2014b. Neural word embedding as implicit matrix factorization. In Advances in neural information processing systems. 2177–2185.
 Levy et al. (2015) Omer Levy, Yoav Goldberg, and Ido Dagan. 2015. Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics 3 (2015), 211–225.
 Li et al. (2018b) Shen Li, Zhe Zhao, Renfen Hu, Wensi Li, Tao Liu, and Xiaoyong Du. 2018b. Analogical Reasoning on Chinese Morphological and Semantic Relations. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). ACL, 138–143.

 Li et al. (2018a) Yixuan Li, Kun He, Kyle Kloster, David Bindel, and John Hopcroft. 2018a. Local Spectral Clustering for Overlapping Community Detection. ACM Transactions on Knowledge Discovery from Data (TKDD) 12, 2 (2018), 17.
 Liao et al. (2018) Zhenyu Liao, Yikun Xian, Xiao Yang, Qinpei Zhao, Chenxi Zhang, and Jiangfeng Li. 2018. TSCSet: A Crowdsourced TimeSync Comment Dataset for Exploration of User Experience Improvement. In 23rd International Conference on Intelligent User Interfaces. ACM, 641–652.

 Lv et al. (2016) Guangyi Lv, Tong Xu, Enhong Chen, Qi Liu, and Yi Zheng. 2016. Reading the Videos: Temporal Labeling for Crowdsourced TimeSync Videos Based on Semantic Embedding. In Proceedings of the 30th AAAI Conference on Artificial Intelligence.
 Mihalcea and Tarau (2004) Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into texts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 8–15.
 Mikolov et al. (2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111–3119.
 Mueller and Thyagarajan (2016) Jonas Mueller and Aditya Thyagarajan. 2016. Siamese Recurrent Architectures for Learning Sentence Similarity. In AAAI. 2786–2792.

 Murtagh and Contreras (2012) Fionn Murtagh and Pedro Contreras. 2012. Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2, 1 (2012), 86–97.
 Murtagh et al. (2008) Fionn Murtagh, Geoff Downs, and Pedro Contreras. 2008. Hierarchical clustering of massive, high dimensional data sets by exploiting ultrametric embedding. SIAM Journal on Scientific Computing 30, 2 (2008), 707–730.
 Murtagh and Legendre (2014) Fionn Murtagh and Pierre Legendre. 2014. Ward’s hierarchical agglomerative clustering method: which algorithms implement ward’s criterion? Journal of Classification 31, 3 (2014), 274–295.
 Newman and Girvan (2004) Mark EJ Newman and Michelle Girvan. 2004. Finding and evaluating community structure in networks. Physical review E 69, 2 (2004), 026113.
 Pandove et al. (2018) Divya Pandove, Shivan Goel, and Rinkl Rani. 2018. Systematic review of clustering highdimensional and large datasets. ACM Transactions on Knowledge Discovery from Data (TKDD) 12, 2 (2018), 16.
 Pang et al. (2015) Junbiao Pang, Fei Jia, Chunjie Zhang, Weigang Zhang, Qingming Huang, and Baocai Yin. 2015. Unsupervised web topic detection using a ranked clusteringlike pattern across similarity cascades. IEEE Transactions on Multimedia 17, 6 (2015), 843–853.
 Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods on Natural Language Processing, Vol. 14. 1532–43.
 Raamkumar et al. (2017) Aravind Sesagiri Raamkumar, Schubert Foo, and Natalie Pang. 2017. Using authorspecified keywords in building an initial reading list of research papers in scientific paper retrieval and recommender systems. Information Processing & Management 53, 3 (2017), 577–594.
 Ramaboa and Fish (2018) Kutlwano KKM Ramaboa and Peter Fish. 2018. Keyword length and matching options as indicators of search intent in sponsored search. Information Processing & Management 54, 2 (2018), 175–183.
 Ramezani et al. (2018) Maryam Ramezani, Ali Khodadadi, and Hamid R Rabiee. 2018. Community Detection Using Diffusion Information. ACM Transactions on Knowledge Discovery from Data (TKDD) 12, 2 (2018), 20.
 Siersdorfer et al. (2009) Stefan Siersdorfer, Jose San Pedro, and Mark Sanderson. 2009. Automatic video tagging using content redundancy. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 395–402.
 Socher et al. (2011) Richard Socher, Cliff C Lin, Chris Manning, and Andrew Y Ng. 2011. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML11). 129–136.
 Tarjan (1975) Robert Endre Tarjan. 1975. Efficiency of a good but not linear set union algorithm. Journal of the ACM (JACM) 22, 2 (1975), 215–225.
 Tarjan (1979) Robert Endre Tarjan. 1979. A class of algorithms which require nonlinear time to maintain disjoint sets. Journal of computer and system sciences 18, 2 (1979), 110–127.
 Wang et al. (2016b) Chenguang Wang, Yangqiu Song, Dan Roth, Ming Zhang, and Jiawei Han. 2016b. World knowledge as indirect supervision for document clustering. ACM Transactions on Knowledge Discovery from Data (TKDD) 11, 2 (2016), 13.
 Wang et al. (2017) Kun Wang, Liqiu Gu, Song Guo, Hongbin Chen, Victor CM Leung, and Yanfei Sun. 2017. Crowdsourcingbased contentcentric network: a social perspective. IEEE Network 31, 5 (2017), 28–34.
 Wang et al. (2016a) Kun Wang, Xin Qi, Lei Shu, Derjiunn Deng, and Joel JPC Rodrigues. 2016a. Toward trustworthy crowdsourcing in the social internet of things. IEEE Wireless Communications 23, 5 (2016), 30–36.
 Wu et al. (2012) Benbin Wu, Jing Yang, and Liang He. 2012. Chinese hownetbased multifactor word similarity algorithm integrated of result modification. In International Conference on Neural Information Processing. Springer, 256–266.
 Wu et al. (2014) Bin Wu, Erheng Zhong, Ben Tan, Andrew Horner, and Qiang Yang. 2014. Crowdsourced timesync video tagging using temporal and personalized topic modeling. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 721–730.
 Xu and Zhang (2017) Linli Xu and Chao Zhang. 2017. Bridging Video Content and Comments: Synchronized Video Description with Temporal Summarization of Crowdsourced TimeSync Comments. In AAAI. 1611–1617.
 Yan et al. (2013) Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. 2013. A biterm topic model for short texts. In Proceedings of the 22nd international conference on World Wide Web. ACM, 1445–1456.
 Yang et al. (2017) Wenmian Yang, Na Ruan, Wenyuan Gao, Kun Wang, Wensheng Ran, and Weijia Jia. 2017. Crowdsourced timesync video tagging using semantic association graph. In Multimedia and Expo (ICME), 2017 IEEE International Conference on. IEEE, 547–552.
 Yin and Wang (2014) Jianhua Yin and Jianyong Wang. 2014. A dirichlet multinomial mixture modelbased approach for short text clustering. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 233–242.

 Yin and Wang (2016) Jianhua Yin and Jianyong Wang. 2016. A modelbased approach for text clustering with outlier detection. In Proceedings of Data Engineering (ICDE), 2016 IEEE 32nd International Conference on. IEEE, 625–636.
 Yu et al. (2015) Zhiwen Yu, Zhu Wang, Huilei He, Jilei Tian, Xinjiang Lu, and Bin Guo. 2015. Discovering information propagation patterns in microblogging services. ACM Transactions on Knowledge Discovery from Data (TKDD) 10, 1 (2015), 7.
 Zhu (2004) Mu Zhu. 2004. Recall, precision and average precision. Department of Statistics and Actuarial Science, University of Waterloo, Waterloo 2 (2004), 30.