Hierarchical Neural Network for Extracting Knowledgeable Snippets and Documents

08/22/2018 · by Ganbin Zhou, et al.

In this study, we focus on extracting knowledgeable snippets and annotating knowledgeable documents from Web corpora consisting of documents from social media and We-media. Informally, knowledgeable snippets refer to text describing concepts, properties of entities, or relations among entities, while knowledgeable documents are those with enough knowledgeable snippets. These knowledgeable snippets and documents can be helpful in multiple applications, such as knowledge base construction and knowledge-oriented services. Previous studies extracted knowledgeable snippets using pattern-based methods; here, we propose a semantic-based method for this task. Specifically, a CNN-based model is developed to extract knowledgeable snippets and annotate knowledgeable documents simultaneously. Additionally, a "low-level sharing, high-level splitting" CNN structure is designed to handle documents from different content domains. Compared with building multiple domain-specific CNNs, this joint model not only substantially reduces training time, but also noticeably improves prediction accuracy. The superiority of the proposed method is demonstrated on a real dataset from the Wechat public platform.




1 Introduction

Nowadays, millions of articles are published on social media every day. Some of them attract millions of clicks, forwardings, and favorites because of their high-quality content, which usually provides various knowledge and experiences from different domains. For example, Figure 1 exhibits two screenshots of such articles from the Wechat public platform. The first one, shown in Figure 1(a), is a popular article introducing the turning skills of driving, while the other article, in Figure 1(b), summarizes tips for purchasing real estate. These two documents, due to their usefulness, each received a very large number of views within several days.

Being able to recognize "knowledgeable" documents as well as their "knowledgeable" snippets in a large web corpus helps in a wide range of potential applications. We describe two of them. First, knowledgeable articles and snippets can be used as a data source for knowledge base construction. Currently, the majority of popular knowledge bases, like YAGO [16] and DBpedia [1], extract knowledge from Wikipedia, WordNet, GeoNames, and so on. Compared with the data scale of social media, the knowledge in these structured or semi-structured resources summarized by human beings might be limited and inflexible. Another recent knowledge base, Probase, with millions of concepts, was automatically harvested from the so-far largest corpus, consisting of knowledgeable sentences extracted from billions of web pages [21]. However, these sentences are extracted only by the Hearst patterns [10]. To extract more knowledgeable snippets and construct a more comprehensive knowledge base, semantic-based methods are needed to complement the previous pattern-based ones.

The second potential application is knowledge-oriented services, e.g., knowledge retrieval and question answering. In particular, the extracted knowledgeable documents and snippets can be used directly as answers to questions raised by users when they need help with a certain issue. For example, if a user wants to know about purchasing real estate, the article shown in Figure 1(b) can be retrieved by these knowledge-oriented systems.

Motivated by these potential applications, we investigate the problem of annotating knowledgeable documents and extracting their knowledgeable snippets from a large-scale web corpus. Informally, a knowledgeable document is a document containing multiple knowledgeable snippets, which describe concepts, properties of entities, or the relations among entities. In this study, we address this task by analyzing the semantics of document text based on the convolutional neural network (CNN) model. CNN has been applied to understanding images and natural language text in recent years [23, 24], and has achieved great success in multiple applications, such as image recognition and captioning [14, 20, 22], text sentiment analysis [11, 6], non-photorealistic rendering [9], etc.

Specifically, we propose SSNN, a joint CNN-based model, to understand the abstract concepts of documents in different domains collaboratively and judge whether a document is knowledgeable or not. In more detail, the network structure of SSNN is "low-level Sharing, high-level Splitting": the low-level layers are shared across domains, while the high-level layers beyond the CNN are trained separately to capture the differences among domains. It is an end-to-end solution for document annotation without time-consuming feature engineering. In addition, we carefully develop manual features for this task and train an SVM classifier. We conduct extensive experiments on real documents from three content domains on the Wechat public platform to demonstrate the superiority of the proposed SSNN.

(a) The document introduces the turning skills of driving.

(b) The document introduces 25 tips for purchasing real estate.
Figure 1: Examples of knowledgeable documents. The blue and red sentences are knowledgeable and unknowledgeable snippets respectively.

The contributions of this study are summarized as follows:

  • We formulate the problem of knowledgeable document and snippet extraction.

  • We propose a "low-level Sharing, high-level Splitting" Neural Network (SSNN for short) to identify knowledgeable documents and annotate knowledgeable snippets simultaneously.

  • We carefully design manual features for the knowledgeable document and snippet extraction task and train an SVM classifier.

  • We verify the performance of the proposed models on a real dataset from the Wechat public platform. The results show that the proposed SSNN is a promising solution toward knowledgeable document and snippet extraction.

The remainder of this paper is organized as follows. Section 2 surveys the related work. Section 3 presents the problem formulation. Section 4 details the proposed SSNN model. Section 5 details the SVM based method with the manual features. Section 6 demonstrates the experimental results. Section 7 concludes the paper.

2 Related Work

Genre Identification. Our problem is related to the genre classification problem, in which documents are divided into several types, e.g., narrative, exposition, and so on. Though exposition documents generally provide background information, concepts, or properties of things, the proposed knowledgeable documents also discuss relations, causes, and influences between entities, which makes them a broader notion than exposition documents.

Previous work on genre identification mainly contributes in two aspects: (1) finding more effective features to represent the document; (2) proposing more advanced machine learning algorithms for classification.

Kessler et al. [12] first used the term "genre" to represent any widely recognized class of texts defined by some common communicative purposes or other functional traits. Finn et al. [8] proposed that genre classification is orthogonal to topic classification; that is to say, documents with the same topic can have different genres, and documents in the same genre can have different topics. Feldman et al. [7] put forward a method for part-of-speech feature extraction and obtained a significant improvement in genre identification. Denil et al. [6] proposed an advanced machine learning algorithm to represent text in a lower dimension. Such a representation can characterize the inter-class separability and intra-class compactness through specially designed intrinsic and penalty graphs.

NLP via Neural Networks. Knowledgeable document extraction is related to NLP classification tasks. For such tasks, Tai et al. [17] proposed an LSTM-based model that predicts the semantic relatedness of two sentences and performs sentiment classification. Kim et al. [13] proposed a CNN model based on hyperparameter tuning and static vectors for sentence classification. However, these models focus on sentence-level classification, while our work focuses on classification at the document level. Moreover, knowledgeable snippet extraction is related to the task of identifying task-specific sentences. In this field, Denil et al. [6] proposed a hierarchical convolutional model that can be used for document classification and for extracting topic-relevant sentences. Tu et al. [19] proposed a method that automatically identifies high-impact sub-structures. However, both of these methods are designed for corpora consisting of documents on a single topic, while in our work documents come from a large-scale social media corpus covering many topics.

3 Problem Formulation

In this section, we first introduce the informal definitions of knowledgeable snippet and knowledgeable document, and then formulate the learning problem.

Definition 1 (Knowledgeable Snippet). A snippet of a document is called knowledgeable if it conforms to one of the following descriptions.

  1. The definition of a concept or an entity.

    For example, “A company is an association or collection of individuals, whether natural persons, legal persons, or a mixture of both.” and “Google Inc. is an American multinational technology company specializing in Internet-related services and products.” are both knowledgeable snippets since they define the concept of “company” and the entity of “Google Inc.”. Note that a concept also refers to an abstract entity; hence, for convenience and clarity, we use entity as a unified notion hereafter.

  2. The property of an entity.

    For example, “Company members share a common purpose and unite in order to focus their various talent.” is knowledgeable. Here “members” is a property of the entity “company”.

  3. The relation between two entities.

    For example, “Because companies are legal persons, they also may associate and register themselves as a corporate group.” is knowledgeable because the sentence discusses the relation between “company” and “corporate group”. They are both entities.

  4. The cause of a relation between entities.

    The previous example is also proper for this description. Specifically, “companies are legal persons” is the cause of the relation between entity “company” and “corporate group”.

  5. The influence of the relation between entities.

    For example, “The members guarantee the payment of certain amounts if the company goes into insolvent liquidation” is a knowledgeable sentence, and “the members guarantee the payment of certain amounts” is the influence of “company goes into insolvent liquidation”.

Generally speaking, knowledgeability is highly related to the definitions of entities, the properties of entities, and the relations between entities. The examples used in Definition 1 are selected from Wikipedia.

Definition 2 (Knowledgeable Document). A document is called knowledgeable if more than half of snippets in it are knowledgeable.

These informal definitions of knowledgeable snippets and documents give us an intuitive understanding of these concepts. They are also used as a guideline for the volunteers who label the training data set for this task.
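As a small sketch, Definition 2 amounts to a simple majority test over the snippet labels of a document (labels here are hypothetical 0/1 flags produced by the annotation step):

```python
def is_knowledgeable_document(snippet_labels):
    """Definition 2: a document is knowledgeable if more than half of its
    snippets are knowledgeable. snippet_labels: iterable of 0/1 flags."""
    labels = list(snippet_labels)
    return sum(labels) > len(labels) / 2

print(is_knowledgeable_document([1, 1, 0, 1]))  # True
print(is_knowledgeable_document([1, 0, 0, 1]))  # False (exactly half is not "more than half")
```

Note that a document with exactly half knowledgeable snippets is not knowledgeable under this definition.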

Based on the definitions above, we formulate our learning problem of extracting knowledgeable documents and snippets as follows:

Learning Problem: Given K domains of documents D_1, D_2, ..., D_K, the labeled document set from the k-th domain is denoted as D_k^L = {(d_i^k, y_i^k)}, where d_i^k denotes a document in the k-th domain with a supervising label, and y_i^k is a binary label (1 or 0) indicating knowledgeable or unknowledgeable. The unlabeled document set from the k-th domain is denoted as D_k^U = {d_j^k}, where d_j^k denotes a document in the k-th domain without a supervising label. The task is to predict the label of an unlabeled document and meanwhile annotate its knowledgeable snippets.

4 The SSNN Model

In this section, we introduce the proposed SSNN model, which identifies knowledgeable documents in a corpus and annotates their knowledgeable snippets.

Generally speaking, SSNN is designed to improve generalization ability and reduce training time. In SSNN, the low-level layers are shared while the high-level structures are split across multiple domains. This idea is motivated by the structure of the human brain; take the human vision system as an example. When an image is projected onto the human eye, the primary visual cortex first transforms the elementary information processed by neural cells into abstract features. At this stage, information is shared. Then, the information begins to split and is fed to different types of advanced cortex, letting people recognize the picture from different angles [25].

Similar to human vision, in the proposed SSNN, CNN layers are adopted to handle low-level features, namely words and sentences. The CNN layers are shared among domains, corresponding to the neural system connecting the eyes and the primary visual cortex. The output of the low-level layers is fed to split, domain-specific high-level layers (e.g., softmax layers), corresponding to the advanced cortex in the vision example.

4.1 The Structures of SSNN

To predict whether a given document is knowledgeable or not, SSNN will: (1) transform the document into a document embedding; (2) feed the document embedding to a softmax layer; (3) use the softmax layer to predict whether a document is knowledgeable or not.

The structure of SSNN is shown in Figure 2. The components used for embedding a document into a vector are divided into two levels: the word-to-sentence level and the sentence-to-document level.

At the word-to-sentence level, we transform words into sentence embeddings. We first generate word embeddings as input by applying word2vec [15], with the embedding dimension fixed in advance. The vocabulary we use consists of Chinese words, and out-of-vocabulary words are replaced with a special token "UNK". Then, for a sentence of the given document, the embeddings of its words compose a word matrix X = [x_1, ..., x_n], where x_i is the embedding of the i-th word. X is then transformed into a sentence embedding by the word-to-sentence CNN. Similarly, at the sentence-to-document level, we transform all sentence embeddings of one document into a document embedding: the sentence embeddings compose a sentence matrix S = [s_1, ..., s_m], where s_j is the embedding of the j-th sentence in a document d. S is transformed into an intermediate embedding by the sentence-to-document CNN. The intermediate embedding is flattened and then fed to a fully connected layer to generate the document embedding e_d. After that, e_d is fed to a domain-specific softmax layer, which generates the probability that d is knowledgeable. If the probability is above 0.5, d is predicted to be a knowledgeable document; otherwise, it is unknowledgeable.

Figure 2: The structure of the proposed model.
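The two-level embedding procedure can be sketched in NumPy as follows. The dimensions and random filter banks here are tiny, hypothetical stand-ins; the real model uses trained filters and the sizes given in Section 6.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_avg_embed(X, W):
    """Convolve an (n, d) embedding matrix X with a filter bank W of shape
    (m, w, d) and average-pool each of the m feature maps over all window
    positions, yielding a fixed-size m-dimensional embedding (n must be >= w)."""
    n, d = X.shape
    m, w, _ = W.shape
    windows = np.stack([X[i:i + w].ravel() for i in range(n - w + 1)])  # (n-w+1, w*d)
    feature_map = windows @ W.reshape(m, -1).T                          # (n-w+1, m)
    return np.tanh(feature_map).mean(axis=0)                            # (m,)

# Hypothetical tiny dimensions for illustration.
d_word, m1, w1 = 8, 6, 2    # word-to-sentence CNN
m2, w2 = 4, 2               # sentence-to-document CNN
W1 = rng.normal(0, 0.1, (m1, w1, d_word))
W2 = rng.normal(0, 0.1, (m2, w2, m1))

def embed_document(sentences):
    """sentences: list of (n_words, d_word) word-embedding matrices."""
    S = np.stack([conv_avg_embed(X, W1) for X in sentences])  # sentence matrix
    return conv_avg_embed(S, W2)                              # document-level embedding

doc = [rng.normal(size=(n, d_word)) for n in (5, 7, 4)]
e = embed_document(doc)
print(e.shape)  # (4,)
```

Because of the row-wise average pooling, the output dimension is fixed regardless of how many words or sentences the document contains.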

4.2 Training Details

Each convolutional layer of the word-to-sentence CNN and the sentence-to-document CNN contains a filter bank W with m convolutional kernels of width w over d-dimensional input embeddings, where d is the dimension of the input embeddings, w is the filter width, and m is the number of kernels in the filter. Since the first axis of a convolutional kernel matches the dimension of the input embedding matrix, each kernel transforms the embedding matrix into a vector by the convolution operation. We then combine these vectors column by column into a feature map and feed the feature map to a pooling layer.

Note that the lengths of sentences and documents differ. Hence, the number of columns of the feature maps generated from different sentences and documents differs, and the dimension of the intermediate embedding would vary across documents. This is problematic for the fully connected layer, which only accepts fixed-size embeddings. To keep the dimensions of the sentence and document embeddings unchanged across different sentences and documents, we average each row of the feature map via an average pooling layer, which computes the average value along each row of the feature map.

When training, SSNN adapts the model parameters to minimize a loss function defined as the cross-entropy of the predicted and true labels. In detail, for domain k, let θ_k denote the parameter set including the word-to-sentence layer, the sentence-to-document layer, and the softmax layer for domain k. Let p_i^k denote the predicted probability that document d_i^k is knowledgeable. The loss function for domain k is defined as follows:

L_k(θ_k) = − Σ_i [ y_i^k log p_i^k + (1 − y_i^k) log(1 − p_i^k) ]

Furthermore, the objective function to be minimized is as follows:

J = Σ_{k=1}^{K} L_k(θ_k)

To minimize the objective function, we adopt the mini-batch gradient descent technique [5], which feeds the model a fixed number of training samples in each mini-batch update. This technique makes it possible for memory-limited GPUs to deal with big data sets. Here, the data in a mini-batch are chosen from a single domain. Given a mini-batch B_k of documents from domain k, the parameters are updated as follows:

θ_k ← θ_k − η · ∂L_k(B_k)/∂θ_k

where η is the learning rate (set to 0.1 in our experiments). Additionally, since the softmax layers are independent across domains, only the parameters in θ_k are updated when using a mini-batch from domain k.
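To make the training scheme concrete, the following is a minimal NumPy sketch of the "shared low-level, split high-level" mini-batch updates. A single shared tanh layer stands in for the CNN layers, and a per-domain logistic head stands in for each domain-specific softmax; all sizes are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
K, d_doc, eta = 3, 8, 0.1   # domains, feature size, learning rate

# Shared parameters stand in for the CNN layers; each domain gets its own
# binary head, mirroring "low-level sharing, high-level splitting".
W_shared = rng.normal(0, 0.1, (d_doc, d_doc))
heads = [rng.normal(0, 0.1, d_doc) for _ in range(K)]

def forward(X, k):
    H = np.tanh(X @ W_shared)                   # shared low-level transform
    p = 1.0 / (1.0 + np.exp(-(H @ heads[k])))   # domain-k head
    return H, p

def train_step(X, y, k):
    """One mini-batch update. The batch comes from a single domain k, so
    only W_shared and heads[k] are touched; other heads stay untouched."""
    global W_shared
    H, p = forward(X, k)
    err = p - y                                  # d(cross-entropy)/d(logit)
    g_head = H.T @ err / len(y)
    g_shared = X.T @ (np.outer(err, heads[k]) * (1 - H**2)) / len(y)
    heads[k] -= eta * g_head
    W_shared -= eta * g_shared
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

X = rng.normal(size=(10, d_doc))
y = (X[:, 0] > 0).astype(float)
losses = [train_step(X, y, k=0) for _ in range(200)]
print(losses[0] > losses[-1])  # True: the cross-entropy decreases on this toy batch
```

The key point mirrored from the paper is that each update touches the shared parameters plus exactly one domain head.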

4.3 Extracting Knowledgeable Sentences

So far we have discussed how to identify knowledgeable documents in the corpus. Now we move on to extracting the knowledgeable snippets based on the proposed SSNN.

Specifically, we accomplish this task by adapting the method of [24]. The basic idea is that a sentence may tend to be knowledgeable if its document becomes less knowledgeable after the sentence is removed.

In particular, for a document d, we first predict whether d is knowledgeable or not. We then construct a pseudo-label, which is the inverse of the prediction. For example, if document d is predicted to be knowledgeable, its pseudo-label will be unknowledgeable.

Let S denote the sentence matrix of d, let ŷ denote the label predicted by the softmax layer (so the pseudo-label is ỹ = 1 − ŷ), and let I denote an identity matrix whose width is equal to the number of sentences of d. It can be inferred that S = S I. Note that evaluating the influence on the "knowledgeable" level of d when the i-th sentence of d is removed is equal to evaluating the influence on the loss L(S I, ỹ) when the i-th diagonal entry of I is changed from 1 to 0. We define this influence as the derivative of the loss function with respect to that entry:

∂L(S I, ỹ) / ∂I_ii = s_i · δ_i     (4)

where s_i is the i-th column of the sentence matrix S, δ_i is the i-th column of Δ, and Δ is the backpropagation message to S I [6].

The derivative in Eq. (4) is defined as the knowledgeable score of each sentence. If the document is predicted knowledgeable, a higher score indicates a more knowledgeable sentence; but if the document is predicted unknowledgeable, a higher score indicates a less knowledgeable sentence [2]. Finally, we choose the sentences with the top knowledgeable scores as the knowledgeable snippets of a given document.
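The scoring idea can be illustrated with a small sketch. Instead of the gradient in Eq. (4), this toy version uses the equivalent leave-one-out view stated above (a document becomes less knowledgeable when a knowledgeable sentence is removed); `predict_prob` is a hypothetical stand-in for the trained SSNN.

```python
import numpy as np

def knowledgeable_scores(sentence_embs, predict_prob):
    """Leave-one-out approximation of the saliency score in Eq. (4):
    the score of sentence i is how much the predicted probability of the
    document being knowledgeable drops when sentence i is removed.
    sentence_embs: (n_sent, d) matrix; predict_prob: callable on such a matrix."""
    p_full = predict_prob(sentence_embs)
    scores = []
    for i in range(len(sentence_embs)):
        reduced = np.delete(sentence_embs, i, axis=0)
        scores.append(p_full - predict_prob(reduced))
    return np.array(scores)

# Toy model: the probability grows with the mean of the first embedding
# dimension, so the sentence with the largest first component should score highest.
predict = lambda S: 1 / (1 + np.exp(-S[:, 0].mean()))
S = np.array([[2.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])
scores = knowledgeable_scores(S, predict)
print(scores.argmax())  # 0: removing the first sentence hurts the prediction most
```

The top-scoring sentences would then be returned as the knowledgeable snippets.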

4.4 Analysis on Memory Consumption of SSNN

In this subsection, we analyze the memory consumption of SSNN and demonstrate that SSNN saves memory compared with traditional CNN models.

Let N_c denote the number of parameters in the word-to-sentence and sentence-to-document levels, and let N_s denote the number of parameters in a softmax layer. The parameter number of SSNN is then N_c + K·N_s, where K indicates the number of domains. For a fair comparison, we assume that the structure of a domain-specific CNN is the same as a one-domain SSNN; hence, a domain-specific CNN has N_c + N_s parameters, and training separate CNNs for K domains requires K(N_c + N_s) parameters in total. Let r denote the saving ratio of memory consumption of SSNN with respect to the K domain-specific CNNs:

r = 1 − (N_c + K·N_s) / (K(N_c + N_s))

In real applications, N_c counts the parameters of several complex neural network layers while N_s only counts those of a binary classifier, so N_c ≫ N_s. Thus the saving ratio can be approximated as r ≈ (K − 1)/K. This ratio is impressive: for example, when K = 3, SSNN saves about two thirds of the parameters with respect to 3 domain-specific CNNs. Thus, we argue that the proposed SSNN has benefits in both time and memory consumption because of its sharing-and-splitting structure.

5 Feature Engineering Solution

In this section, we introduce another approach to knowledgeable document recognition based on feature engineering. Specifically, we train an SVM classifier for every domain with manually extracted features and use these classifiers to predict whether unlabeled documents are knowledgeable or not. Next, we mainly focus on describing the features adopted in this method. We divide the features into multiple categories, i.e., POS features, word features, and sentence features. Part-of-speech (POS) features are based on word POS tagging, word features are related to the meanings of words, and sentence features consider comprehensive characteristics at the sentence level.

5.1 POS features

According to Definition 1, knowledge is highly related to entities, which are usually nouns, including general nouns (for abstract concepts) or proper nouns (for concrete entities). Besides, when explaining the definitions, properties, or relations of entities, illustrative text with strong exposition is usually included. Hence nouns, verbs, and verbal nouns might play the leading role in knowledgeable sentences.

In contrast, narrative articles, which are considered unknowledgeable in this study, usually contain news events. In these documents, characters' names, time words, and location words, accompanied by a great quantity of verbs, might appear frequently because they describe who did what, when, and where.

In addition, some unknowledgeable documents contain lyrical and critical text. Most of them express personal views or comments on other things, which are highly subjective. Hence pronouns, interjections, adverbs, and adjectives often appear together in these articles.

Figure 3 gives some examples of the patterns discussed above. To discriminate knowledgeable from unknowledgeable documents, some additional POS features are also needed. The detailed information of the adopted POS features is shown in Table 1.

Figure 3: Examples for different POS.
Type POS Knowledgeable
exposition general noun, proper noun, verb and verb noun Yes
scientific proper noun, quantifier and numeral Yes
narrative characters’ name, time words, location words and verb No
critical pronoun, interjection, adverb and adjective No
lyrical idiom, exclamation, abbreviation and onomatopoetic No
advertisement website url, telephone and email No
Table 1: POS pattern in different type of documents.

By analyzing a large number of examples, we observe that the POS patterns appear with great regularity, so we take the POS histogram statistic features (PHSF) [7] as the POS features. That is, we first tag each word in the document with a POS tagger using T POS tags. We then use a sliding window of length l and slide it from the start to the end of the document, obtaining W windows in total. For every tag t, we count the occurrences of t in each window and get a sequence S_t = (c_{t,1}, ..., c_{t,W}), where c_{t,j} denotes the number of occurrences of tag t in window j. Next, for every sequence S_t, t = 1, ..., T, the mean μ_t and the variance σ_t² of S_t are calculated. We finally use every μ_t and σ_t² to construct the POS features; that is, the set of POS features is {μ_t, σ_t² | t = 1, ..., T}.
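As an illustration, here is a minimal sketch of the PHSF computation. The tag names, tagset, and window length below are hypothetical; the paper leaves the actual tagger and window length as choices.

```python
import numpy as np

def phsf_features(pos_tags, tagset, window=4):
    """POS Histogram Statistic Features: slide a fixed-length window over the
    document's POS-tag sequence, count each tag per window, and use the mean
    and variance of those counts as features."""
    n = len(pos_tags)
    n_windows = max(n - window + 1, 1)
    counts = np.zeros((len(tagset), n_windows))
    for j in range(n_windows):
        chunk = pos_tags[j:j + window]
        for i, tag in enumerate(tagset):
            counts[i, j] = chunk.count(tag)
    # One mean and one variance per tag, concatenated into a single vector.
    return np.concatenate([counts.mean(axis=1), counts.var(axis=1)])

tags = ["n", "v", "n", "adj", "n", "v", "p", "n", "v", "n"]
feats = phsf_features(tags, tagset=["n", "v", "adj", "p"], window=4)
print(feats.shape)  # (8,): mean and variance for each of the 4 tags
```

With the paper's full tagset, this yields the "mean and variance of each POS" entries (features 1-98) of Table 3.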

5.2 Word features

We also identify an interesting phenomenon in the titles of typical knowledgeable documents: such titles have a high probability of using conclusive words, which indicate that the document will introduce concepts or properties of entities. For example, "complete works of" and "encyclopedia" indicate that the document will introduce a concept systematically, while "decrypt" and "guide" indicate that the document will introduce the relations of some entities from different aspects. Based on this observation, we compiled 25 keywords related to knowledge; if a document's title contains any of these keywords, the word feature is set to 1, otherwise 0. Table 2 shows the words in detail.

forum real stuff school secret skill
collection question decrypt unscramble guide
misunderstand pattern research special mechanism
classroom difference knowledge comment method
measure answer theory study system
Table 2: Conclusive words appear in titles.
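A minimal sketch of this binary word feature, shown with a subset of the Table 2 keywords (matching is done here on the English translations; the deployed system matches the original Chinese words):

```python
# Subset of the 25 conclusive keywords from Table 2 (English translations).
CONCLUSIVE_KEYWORDS = {
    "guide", "secret", "skill", "decrypt",
    "knowledge", "method", "theory", "system",
}

def title_keyword_feature(title: str) -> int:
    """Binary word feature: 1 if the title contains any conclusive keyword,
    0 otherwise."""
    lowered = title.lower()
    return int(any(kw in lowered for kw in CONCLUSIVE_KEYWORDS))

print(title_keyword_feature("A Complete Guide to Buying Real Estate"))  # 1
print(title_keyword_feature("My Weekend Trip to the Beach"))            # 0
```

This value becomes feature No. 99 in the Table 3 layout.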

5.3 Sentence features

The sentence features come from shallow text statistics, which are commonly used in traditional document classification. Because these features might also be predictive of knowledgeable and unknowledgeable documents, we include them in our feature set. These features include the number of words, the length of the document, the number of sentences, and the average length of sentences [8, 18]. We also extend them with features such as the number of paragraphs, the average number of sentences per paragraph, the number of distinct words in the title, and so on. The results demonstrate that these features contribute more or less to the accuracy of identifying knowledgeable documents. All the features of the three categories are shown in Table 3.

Features No. Description
POS 1-98 mean and variance of each POS
Word 99 number of conclusive words
100 number of first personal pronoun
101 number of second personal pronoun
102 number of third personal pronoun
Sentence 103 length of title
104 length of content
105 number of words in title
106 number of words in content
107 number of distinct words in title
108 number of distinct words in content
109 number of punctuation in title
110 number of punctuation in content
111 number of paragraphs in content
112 number of sentences in content
113 average number of sentences of paragraphs in content
114 average number of words of sentences in content
Table 3: Feature descriptions.

With the proposed features shown in Table 3, we can train a binary support vector machine classifier [7] to solve the knowledgeable document identification problem.
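A minimal sketch of this pipeline, assuming scikit-learn's LinearSVC as the linear-kernel SVM and synthetic 114-dimensional feature vectors in the Table 3 layout (the data and labels here are artificial and only for illustration):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
n_docs, n_features = 200, 114                 # 114 features as in Table 3

X = rng.normal(size=(n_docs, n_features))
# Make column 98 (feature No. 99, the conclusive-word flag) a clean binary
# signal, and let the synthetic label depend on it.
X[:, 98] = np.where(rng.random(n_docs) < 0.5, 1.0, -1.0)
y = (X[:, 98] > 0).astype(int)

# Train on 75% of the samples and test on the rest, as in Section 6.
clf = LinearSVC(C=1.0).fit(X[:150], y[:150])
acc = clf.score(X[150:], y[150:])
print(acc)
```

On this separable toy data the classifier recovers the label near-perfectly; real features are, of course, far noisier.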

6 Experiments

In this section, we verify the performance of the proposed models on a real-world dataset from the Tencent Wechat public platform. First, we evaluate the effectiveness of the proposed models. Second, we demonstrate the advantages of SSNN in saving both time and memory during training.

6.1 Data Preparation

We construct experimental datasets from articles in the Automobile, Finance, and Real Estate domains on the Wechat public platform. For every domain, one thousand documents are manually labeled as knowledgeable or not according to the definitions in Section 3, giving three experimental datasets. Meanwhile, we build a mixed experimental dataset, denoted AFR, which consists of the data from all the selected domains. Hence, we have four experimental datasets in total. The details of the datasets are shown in Table 4. Every experimental dataset is divided into a training set and a test set: the training set contains 75% of the samples, and the remainder constitutes the test set.

Name KDR Content
Automobile 19.0% News, knowledge of driving and cars.
Finance 12.9% Financial news, stock analysis and financial knowledge.
Real Estate 19.1% News and advertising of real estate, knowledge of decoration.
AFR 17.0% A mixture of Automobile, Finance and Real Estate.

“KDR”: Knowledgeable Documents Rate, the rate of knowledgeable documents in each domain.

Table 4: Datasets descriptions.

6.2 Experimental Settings

6.2.1 Compared Methods

Altogether, twelve methods are compared in the experiments; they can be categorized as follows. The notations of these methods are shown in Table 5.

  • Convolutional neural networks methods.

    Convolutional neural networks trained on each individual experimental dataset are compared here. Both average and 1-max pooling are adopted for these CNNs; hence, there are eight CNN-based methods in the comparison.

  • SVM-based feature engineering methods. Two SVM approaches are compared here. The main difference between them is the feature set used: the first utilizes the feature set introduced in Section 5, while the other simply uses TF-IDF vectors based on the naive bag-of-words model.

  • The proposed SSNN methods. Since SSNN can predict knowledgeability for different domains simultaneously, we train one SSNN on the training set of AFR and perform predictions on every experimental dataset. As with the CNN-based methods, the two pooling techniques are also applied here; hence, we get two versions of SSNN.

Method Training Set Domain Pooling Feature
CNN-avg(Auto) Automobile average -
CNN-avg(Finance) Finance average -
CNN-avg(RE) Real Estate average -
CNN-avg(AFR) AFR average -
CNN-max(Auto) Automobile 1-max -
CNN-max(Finance) Finance 1-max -
CNN-max(RE) Real Estate 1-max -
CNN-max(AFR) AFR 1-max -
SSNN-avg AFR average -
SSNN-max AFR 1-max -
SVM-fea AFR - Feature set of Sec. 5
SVM-tfidf AFR - TF-IDF bag-of-words vector
Table 5: Compared methods.

6.2.2 Parameter Settings

For all compared neural networks, we build them with the help of Theano [3, 4]. Specifically, the mini-batch size is set to 10, the number of epochs to 10, and the learning rate to 0.1.

The number of convolution kernels of the word-to-sentence CNN is set to 50, and its kernel size is (200, 5); under this setting, sentences with fewer than 5 words are skipped at this level. The number of convolution kernels of the sentence-to-document CNN is set to 10, and its kernel size is set to (10, 3). The number of nodes of the fully connected layer is 10.

For the SVMs, we empirically set the sliding-window length. Both SVM variants use linear kernels, and we set the penalty coefficient empirically.

6.3 Experimental Results

Table 6 and Table 7 report the performance of the compared methods in terms of ROC and prediction accuracy. Every compared model is tested on the four experimental datasets, namely Automobile, Finance, Real Estate, and AFR.

Data set Automobile Finance RE AFR
Table 6: The ROC of different methods.
Data set Auto Finance RE AFR
Table 7: The accuracy of different methods.

From these tables, we first observe that the proposed SSNN surpasses the other baselines in most cases. The only exception is that CNN(AFR) achieves the best performance on the Real Estate experimental set in terms of classification accuracy. On the other hand, average pooling performs better than 1-max pooling in SSNN on both measures; we conjecture that the reason is that average pooling preserves the semantics of the context to a certain extent, which is beneficial to knowledgeable document identification.

Then, we look closer at the results of the CNN-based methods. They perform satisfactorily in their own training domains. However, when they are applied to other domains, such as applying CNN(Finance) to Real Estate, the performance drops significantly. This suggests that, in our experiments, a neural network trained on data from only one domain lacks generalization ability for other domains. In addition, there is little difference between the "avg" models and the corresponding "max" models of CNN in prediction accuracy. However, in terms of ROC, except for CNN-avg(Finance) (0.6536) versus CNN-max(Finance) (0.6931) on Real Estate, the "avg" models achieve the higher ROC in every pair of models.

Next, we review the results of the SVM-based models. Since we carefully design multiple features for the knowledgeable document extraction problem, the feature-engineering SVM outperforms the TF-IDF SVM in most cases. However, compared with the neural network models, the SVM-based approaches are still weaker, owing to limitations in feature selection and model generalization.

Finally, to further demonstrate the performance of SSNN-avg, the ROC curves on the AFR experimental set are shown in Figure 4. From this figure, we observe that the curve of SSNN leads the other models at most positions, indicating that SSNN performs better than the other models on this experimental set.

Figure 4: ROC curves over the experimental set AFR. (a) Full ROC curves; (b) zoom-in of the top-left corner.
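The ROC comparisons above summarize each curve by its area (AUC). As a self-contained illustration (this is not the paper's evaluation code, and the labels and scores below are invented), AUC can be computed directly from its rank interpretation: the probability that a randomly chosen positive document is scored above a randomly chosen negative one:

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the probability that a random positive document
    is scored higher than a random negative one (ties count half)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical model scores for 6 documents (label 1 = knowledgeable).
y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(roc_auc(y, s))  # 8/9, approximately 0.889
```

An AUC of 0.5 corresponds to random ranking; values such as the 0.6536 and 0.6931 reported above indicate the models rank knowledgeable documents above unknowledgeable ones roughly two times out of three.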

6.3.1 Knowledgeable Snippets Extraction

As discussed in Section 4.3, Eq. 4 can be used to extract knowledgeable sentences. Hence, we apply this formula to extract knowledgeable snippets from the experimental datasets. Table 8 shows some typical knowledgeable and unknowledgeable snippets we discovered; each has the highest saliency score in the document where it appears. As the table shows, the main characteristic of the knowledgeable sentences is that they define concepts or discuss properties of entities and relations, whereas the unknowledgeable sentences mainly contain advertisements or news, whose value is not durable. This result is consistent with the definitions in Section 3, indicating that our model is also able to annotate knowledgeable snippets.

Domain: Automobile
  Knowledgeable: When driving in a rainstorm, switching between the low and high beam lamps helps the driver discover obstacles ahead.
  Unknowledgeable: There will be unbelievable discounts on Chevrolet cars this summer. (Biaoyu, Shenzhen)

Domain: Finance
  Knowledgeable: Controlling the data means integrating the industrial chains.
  Unknowledgeable: Last week, 79 banks publicly launched 858 financial products.

Domain: Real Estate
  Knowledgeable: Generally speaking, ceramic valve cores are durable and resistant to high and low temperatures, abrasion, and corrosion.
  Unknowledgeable: 4000 hardbound rooms have been completed in 5 communities. (Jiazhaoye, Shenzhen)

Table 8: Extraction results of knowledgeable and unknowledgeable sentences.
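The extraction procedure behind Table 8 can be sketched as: score every sentence, rank by saliency, and keep those above a threshold. The sketch below is a minimal stand-in, not the paper's implementation; `score_fn` represents the saliency computation of Eq. 4, and `toy_score` is a hypothetical scorer invented purely for the demo:

```python
def extract_knowledgeable(sentences, score_fn, threshold=0.0):
    """Rank sentences by saliency score (score_fn stands in for Eq. 4)
    and keep those whose score exceeds the threshold."""
    scored = [(score_fn(s), s) for s in sentences]
    scored.sort(reverse=True)
    return [s for sc, s in scored if sc > threshold]

# Hypothetical scorer: NOT Eq. 4, just a stand-in that favors
# definition-like sentences for this demo.
def toy_score(sentence):
    cues = ("is a", "refers to", "means")
    return 0.3 if any(c in sentence for c in cues) else -0.1

docs = ["Road rage is a type of anger caused by driving pressure.",
        "Huge discounts on cars this summer!"]
print(extract_knowledgeable(docs, toy_score))  # keeps only the definition
```

In the paper's setting, the real scorer is the trained SSNN's saliency output, so no hand-written cues are needed; only the rank-and-threshold step is shared with this sketch.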

We also excerpt an example of a knowledgeable document (translated from Chinese), shown in Fig. 5. Sentences with the highest knowledgeable scores are colored blue, and sentences with the lowest scores are colored red; a higher score means a more knowledgeable sentence. The numbers in the squares denote the scores of the corresponding sentences.

This document demonstrates the concept of “road rage”. The proposed model highlights five sentences in the excerpt:

Figure 5: Example of knowledgeable snippet extraction: “Come and see what road rage is.”
  • “On the afternoon of May 30, 2015, a video shot by a tachograph spread widely on the Internet. It showed a female driver being beaten by a man, who said she had scared his child when she changed lanes.” (score=-0.12)

    This sentence is unknowledgeable because it merely describes the content of an online video.

  • “Road rage, also known as paroxysmal rage disorder, is a type of anger caused by driving pressure and frustration in traffic jams. It also refers to the aggressive or angry behaviors of automobile drivers.” (score=0.1)

    This sentence is knowledgeable since it defines “road rage”.

  • “The driver’s emotions are usually uncontrolled, and a slight traffic jam or scratch easily leads to violence during driving.” (score=0.24)

    This sentence describes two symptoms of drivers who have road rage: uncontrolled emotions, and violence easily triggered by a slight traffic jam or scratch.

  • “The driver behaves like a different person when driving than when not driving.” (score=0.33)

    This sentence introduces another symptom of road rage, and is therefore knowledgeable.

  • “The driver cannot stop honking even if the car in front of him is just a bit slower than the others.” (score=0.22)

    Again, this sentence discusses another symptom of road rage, and is knowledgeable as well.

Judging from the knowledgeable sentences extracted in the experiments, we believe they conform satisfactorily to the definitions in Section 3.

7 Conclusions

In this paper, we introduced the concepts of knowledgeable snippets and documents, and formulated the problem of annotating them. To solve this problem, we proposed a novel SSNN model together with a feature engineering method to identify knowledgeable documents in Web data. In addition, SSNN can be further used to annotate knowledgeable snippets within documents. Experiments on real data from the Wechat public platform demonstrate the effectiveness of the proposed models.

8 Acknowledgements

The research work was supported by the National Key Research and Development Program of China under Grant No. 2017YFB1002104, the National Natural Science Foundation of China under Grant Nos. 91546122, 61573335, 61602438, and 61473274, and the Guangdong Provincial Science and Technology Plan Projects under Grant No. 2015B010109005.

