1 Introduction
Automatically extracting the top topics of a given area is fundamental to the historical analysis of that area. Solving this problem not only gives us an accurate overview of an area, but can also make our society more efficient, for example by suggesting how to allocate resources (e.g., research funding) to more representative and important topics, and by providing guidance to newcomers of the area. However, almost any area contains a very large number of topics, and it is nontrivial for any researcher to extract the top topics of a given area in a short period of time, especially for a newcomer to the area. It is therefore important to find a way to solve this problem automatically.
While much research has been conducted on topic extraction, its main focus has been on document topic extraction rather than area topic extraction. For example, [Blei, Ng, and Jordan2003, Griffiths and Steyvers2004] use the latent Dirichlet allocation (LDA) model to model topics in documents and abstracts, where topics are represented as multinomial distributions over words. Topics can also be represented as keyphrases (or topical phrases), and from this perspective the keyphrase extraction task can also be viewed as a topic extraction task. Different models, such as frequency-based [Salton and Buckley1997], graph-based [Mihalcea and Tarau2004], and clustering-based [Grineva, Grinev, and Lizorkin2009] approaches, have been explored to address the keyphrase extraction problem, but they still focus on a single document instead of an area.
The problem of area topic extraction is novel and nontrivial, and it poses a set of unique challenges: (1) It is not clear how to formulate the problem, or what kind of datasets to use and how to use them. (2) How to capture the representativeness of topics for a given area is another challenging issue. (3) The number of candidate topics in a given area may be very large: there are 14,449,404 page titles (including categories) in Wikipedia, and even after preprocessing we still have 9,355,550 topics, so developing an algorithm efficient enough for practical use is important as well. (4) Since there are no standard benchmarks that perfectly match this problem, how to quantitatively evaluate the results is also a challenging issue.
To address these challenges, in this paper, we give a formal definition of the problem and develop an optimization model to efficiently solve it. Our contributions can be summarized as follows:

To the best of our knowledge, this is the first attempt to formulate and address the area topic extraction problem. We formulate it as extracting the top topics that best represent a given area with the help of a knowledge base, and we theoretically prove that the problem is NP-hard.

We propose an optimization model, FastKATE, which combines explicit and latent representations of each topic. We leverage a large-scale knowledge base (Wikipedia) to generate topic embeddings with neural networks, and use these representations to help capture the representativeness of topics for given areas. We develop a fast heuristic algorithm that efficiently solves the problem with a provable error bound.

We evaluate the proposed model on three real-world datasets. Experimental results demonstrate our model’s effectiveness, robustness, real-time performance, and superiority over several alternative methods.
2 Problem Formulation
We first provide necessary definitions and then formally define the problem.
Definition 1.
Knowledge Base and Topic. A knowledge base is represented as a triple $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{D})$, where $\mathcal{T}$ represents a set of knowledge concepts, which we also view as topics in this paper; $\mathcal{R}$ represents a set of relations between topics; and $\mathcal{D}$ represents a set of co-occurrences of topics, i.e., each $d \in \mathcal{D}$ is a sequence of topics $(t_1, t_2, \ldots)$, where each $t_i \in \mathcal{T}$.
This definition is a variation of that in [McGuinness, Van Harmelen, and others2004, Tang et al.2015]. In our work, $\mathcal{D}$ represents a corpus consisting of massive numbers of documents, where each document is a sequence of topics. Relations may have various types; we focus on subtopic and supertopic relations in our work.
Each topic in a knowledge base already has a corresponding topical phrase, such as “Artificial Intelligence”. To help capture the relations/similarities between these topical phrases, we also represent each topic $t$ as a vector $\mathbf{v}_t \in \mathbb{R}^{m}$ in a latent feature space, where $m$ is the dimension of the feature space; this will be detailed in Section 3.1. Thus each topic in our work has both an explicit representation (i.e., a topical phrase) and a latent representation (i.e., a vector).
Definition 2.
Area. In this paper, an area is essentially also a topic in $\mathcal{T}$; thus it has the same form and attributes as other topics in $\mathcal{T}$. An area may itself also be a topic of some other area. For example, Machine Learning is an area, and it can also be viewed as a topic of the Artificial Intelligence area.
We leverage a knowledge base to help extract topics from a given area in our work. We formally define the problem as follows.
Problem 1.
Extracting the top-$k$ topics in a given area.
The input of this problem includes an external knowledge base $\mathcal{K}$, a given area $a$, and the number $k$ of topics to be extracted.
The output of this problem is a set of top-$k$ topics that best represent the given area.
Our goal is to learn a function $f$ from the given input so as to extract the top-$k$ topics that best represent the given area. More specifically, $f$ is defined as:
$$f : (\mathcal{K}, a, k) \mapsto S, \quad S \subseteq \mathcal{T}, \; |S| = k. \qquad (1)$$
This problem is equivalent to selecting $k$ topics from the topic set $\mathcal{T}$ that best represent the given area. We use $Q(S \mid a)$ to denote the degree to which a set of topics $S$ can represent all topics in $\mathcal{T}$ for the given area $a$. Without loss of generality, we assume $Q(\emptyset \mid a) = 0$. Since adding new topics to $S$ should not reduce the representativeness of previously extracted topics, $Q$ should be monotonically nondecreasing. We also make the reasonable assumption that topics added in early steps do not help (and may actually diminish) the contributions of topics added in later steps. That is, a topic added in a later step contributes equally or possibly less to the goal function than the same topic added in an earlier step. This is intuitive and reasonable because if a topic is added late, some previously added topics may already represent the area well, so the new topic’s contribution may be reduced. We will show that this property implies the goal function’s submodularity [Svitkina and Fleischer2011] in the following section. Our problem can then be reformulated as follows:
$$S = \operatorname*{arg\,max}_{S' \subseteq \mathcal{T},\, |S'| = k} Q(S' \mid a),$$
where $Q$ is a nonnegative and monotonically nondecreasing function.
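The greedy selection implied by this formulation can be sketched as follows. This is a minimal illustration with a toy coverage objective and hypothetical names (`gain`, `greedy_top_k`), not the paper's implementation:

```python
def greedy_top_k(candidates, k, gain):
    """Greedily pick k topics, maximizing a monotone nondecreasing
    submodular objective Q; gain(t, S) returns the marginal increase
    Q(S + {t}) - Q(S). All names here are illustrative."""
    selected = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda t: gain(t, selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy objective: set coverage, which is monotone and submodular.
covers = {"a": {1, 2}, "b": {2, 3}, "c": {4}}

def coverage_gain(t, selected):
    covered = set().union(*(covers[s] for s in selected)) if selected else set()
    return len(covers[t] - covered)

print(greedy_top_k(covers, 2, coverage_gain))  # → ['a', 'b']
```

For monotone nondecreasing submodular objectives, this greedy scheme is exactly the setting in which the $(1-1/e)$ approximation guarantee of Nemhauser, Wolsey, and Fisher applies.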
3 The Proposed Model
We propose FastKATE (Fast top-K Area Topics Extraction) to address the problem. In general, FastKATE not only represents topics in explicit forms (phrases) as in knowledge bases, but also represents topics as vectors in a latent feature space, and uses a neural-network-based method to learn topic embeddings from an external large-scale knowledge base. FastKATE further incorporates domain knowledge from the knowledge base to assign “general weights” to different topics to help solve the problem. We develop a heuristic algorithm to efficiently solve the defined problem and prove that our algorithm achieves at least $(1 - 1/e)$ of the optimal solution. We further develop a fast implementation of our algorithm which can return results in real time.
3.1 Topics Representation
We first generate $\mathcal{T}$ and use it as the set of candidate topics, and then train an embedding for each topic $t \in \mathcal{T}$. We use Wikipedia as our knowledge base to help generate candidate topics and train topic embeddings. We extract 14,449,404 titles of all articles and categories from Wikipedia, convert them to lowercase, and remove duplicates and titles consisting only of punctuation. This leaves 9,355,550 titles as candidate topics. We then use an unsupervised neural-network-based method to learn the embeddings of these topics: we preprocess the Wikipedia corpus to keep only candidate topics, and use the preprocessed corpus as our training data. We adopt a method similar to that of Word2Vec [Mikolov et al.2013]: we treat each topic as a single token and use a Skip-Gram model to generate each topic’s embedding. In the Skip-Gram model, the training objective is to find topic embeddings that are useful for predicting surrounding topics. More formally, given a sequence of training topics $t_1, t_2, \ldots, t_N$, the objective of the Skip-Gram model is to maximize the average log probability
$$\frac{1}{N} \sum_{i=1}^{N} \sum_{-c \le j \le c,\, j \ne 0} \log p(t_{i+j} \mid t_i),$$
where $c$ is the size of the training context (also denoted as the window size), and $p(t_{i+j} \mid t_i)$ is defined using the softmax function:
$$p(t_O \mid t_I) = \frac{\exp\left({\mathbf{v}'_{t_O}}^{\top} \mathbf{v}_{t_I}\right)}{\sum_{t=1}^{W} \exp\left({\mathbf{v}'_{t}}^{\top} \mathbf{v}_{t_I}\right)},$$
where $\mathbf{v}_{t_I}$ and $\mathbf{v}'_{t_O}$ are the embeddings of the “input topic” $t_I$ and “output topic” $t_O$ respectively, and $W$ is the total number of candidate topics. Because $W$ is very large, this calculation is computationally very expensive. We therefore adopt a common approximation in our model, negative sampling (NEG) [Mikolov et al.2013], which greatly speeds up training. With NEG, $\log p(t_O \mid t_I)$ is replaced by:
$$\log \sigma\left({\mathbf{v}'_{t_O}}^{\top} \mathbf{v}_{t_I}\right) + \sum_{i=1}^{m} \mathbb{E}_{t_i \sim P_n(t)} \left[ \log \sigma\left(-{\mathbf{v}'_{t_i}}^{\top} \mathbf{v}_{t_I}\right) \right],$$
where $\sigma(x) = 1/(1 + e^{-x})$, $P_n(t)$ is the noise distribution over topics, and $m$ is the number of negative samples per topic. The task is thus to distinguish the target topic $t_O$ from draws from the noise distribution $P_n(t)$. We also apply subsampling [Mikolov et al.2013] of frequent topics in our model to counter the imbalance between rare and frequent topics: each topic $t_i$ in the training set is discarded with probability
$$P(t_i) = 1 - \sqrt{\frac{s}{f(t_i)}},$$
where $f(t_i)$ is the frequency of topic $t_i$ and $s$ is a chosen threshold, typically around $10^{-5}$.
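The subsampling step can be illustrated with a small helper; the function name and the default threshold value of 1e-5 (the value suggested by Mikolov et al.) are assumptions here:

```python
import math

def discard_probability(freq, threshold=1e-5):
    """Probability of discarding a topic whose relative corpus frequency
    is `freq`, per the subsampling heuristic P(t) = 1 - sqrt(s / f(t))."""
    if freq <= threshold:
        return 0.0  # rare topics are always kept
    return 1.0 - math.sqrt(threshold / freq)

print(discard_probability(1e-2))  # frequent topic: discarded ~97% of the time
print(discard_probability(1e-6))  # rare topic: never discarded
```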
3.2 Top Area Topics Extraction
As stated in Section 2, our problem is formulated as an optimization problem:
$$S = \operatorname*{arg\,max}_{S' \subseteq \mathcal{T},\, |S'| = k} Q(S' \mid a), \qquad (2)$$
where $Q(S \mid a)$ is a function that denotes the degree to which a set of topics $S$ can represent all topics in $\mathcal{T}$ for the given area $a$.
NP-hardness.
We first prove the problem is NP-hard by reducing the Dominating Set Problem [Karp1972, Gary and Johnson1979] to it, as follows.
Proof.
For $t_i, t_j \in \mathcal{T}$, we first define the relativeness between $t_i$ and $t_j$ as $rel(t_i, t_j) \in \{0, 1\}$; if $rel(t_i, t_j) = 1$, we assign an undirected edge between $t_i$ and $t_j$, and otherwise there is no edge between $t_i$ and $t_j$. Thus we obtain an undirected graph $G = (\mathcal{T}, E)$ over all concepts in $\mathcal{T}$, where $E$ is the set of all edges.
Then we define $R(t \mid S) = 1$ if $t \in S$ or there exists $t' \in S$ such that $rel(t, t') = 1$, and $R(t \mid S) = 0$ otherwise. We then define $Q(S \mid a)$ as:
$$Q(S \mid a) = \sum_{t \in \mathcal{T}} R(t \mid S).$$
We then show that if we can find the maximum value $Q^* = \max_{|S| = k} Q(S \mid a)$, we can also decide, for the given number $k$, whether there exists a dominating set $D$ of $G$ with $|D| = k$. The reduction proceeds as follows: we compare $Q^*$ with $|\mathcal{T}|$, the number of concepts in $\mathcal{T}$; by our definitions of $R$ and $Q$, it must hold that $Q^* \le |\mathcal{T}|$. If $Q^* = |\mathcal{T}|$, then there exists $S$ with $|S| = k$ such that every $t \in \mathcal{T}$ is either in $S$ or adjacent to some topic in $S$, which means there exists a dominating set $D = S$ with $|D| = k$; if $Q^* < |\mathcal{T}|$, then for every $S$ with $|S| = k$ some $t \in \mathcal{T}$ is neither in $S$ nor adjacent to any topic in $S$, which means there does not exist a dominating set of size $k$. ∎
Heuristic Algorithm.
Since the problem is NP-hard, we propose an approximate heuristic algorithm in our model to solve it, as outlined in Algorithm 1 and detailed as follows. The main idea is to select topics one by one: in the $i$-th step, we select the topic $t_i$ such that
$$t_i = \operatorname*{arg\,max}_{t \in \mathcal{T} \setminus S_{i-1}} \left[ Q(S_{i-1} \cup \{t\} \mid a) - Q(S_{i-1} \mid a) \right],$$
where $S_{i-1}$ is the set of topics selected before the $i$-th step. To calculate $Q$, we introduce the general weight $w(t, a)$ to measure the importance of topic $t$ in the given area $a$. We call $w(t, a)$ a general weight because it is set using domain knowledge, is probably not very precise, and measures the importance of $t$ in area $a$ only to a general extent. We demonstrate the calculation of $w(t, a)$ below. We then define $Q(S \mid a)$ as:
$$Q(S \mid a) = \sum_{t \in \mathcal{T}} w(t, a) \, R(t \mid S),$$
and define $R(t \mid S)$ as:
$$R(t \mid S) = \max_{t' \in S} rel(t, t'),$$
where $rel(t, t')$ represents the relativeness between $t$ and $t'$.
After obtaining the topic embeddings in Section 3.1, we can calculate $rel(t_i, t_j)$ as the similarity of the corresponding vectors:
$$rel(t_i, t_j) = \frac{\mathbf{v}_{t_i}^{\top} \mathbf{v}_{t_j}}{\|\mathbf{v}_{t_i}\| \, \|\mathbf{v}_{t_j}\|},$$
where $\mathbf{v}_{t_i}$ and $\mathbf{v}_{t_j}$ are the embeddings of $t_i$ and $t_j$ respectively.
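A minimal sketch of computing relativeness from the learned embeddings, assuming cosine similarity as the similarity measure (a standard choice; the embeddings here are plain lists for illustration):

```python
import math

def relativeness(v1, v2):
    """Cosine similarity between two topic embedding vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

print(relativeness([1.0, 2.0], [2.0, 4.0]))  # parallel vectors → ~1.0
print(relativeness([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors → 0.0
```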
General Weight Calculation.
To calculate the general weight $w(t, a)$ of topic $t$ in the given area $a$, we incorporate domain knowledge from an external large-scale knowledge base into our model. This shares a similar idea with distant supervision [Mintz et al.2009]. We again use Wikipedia as our knowledge base, and use the category information of the given area as the domain knowledge to help calculate $w(t, a)$. The idea behind the general weight is that topics at shallower depths among the subcategories of $a$ are probably more important in area $a$. More specifically, we calculate $w(t, a)$ in the following steps:

Find the category that represents $a$ in Wikipedia; we also denote this category by $a$.

For the given area $a$, recursively extract all its subcategories from Wikipedia, with $a$ as the root category and $c_{i,j}$ denoting the $j$-th subcategory at depth $i$.

Calculate the general weight of topic $t$ as $w(t, a) = g(d_t)$, where $d_t$ is the depth of topic $t$ among $a$’s subcategories if $t$ appears there; otherwise $w(t, a) = 0$ (or equivalently, set $d_t = \infty$ if we want to include all topics in $\mathcal{T}$). Here $g$ is a monotonically decreasing function of $d_t$, and can be selected empirically.
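The depth-based weighting can be sketched as below; the choice g(d) = 1/2**d is only one example of a monotonically decreasing function (the paper does not fix a particular g), and the function name is illustrative:

```python
def general_weight(depth):
    """General weight w(t, a) computed from the depth of topic t in
    area a's subcategory tree; g(d) = 1 / 2**d is one empirical choice."""
    if depth is None:  # topic not found among a's subcategories
        return 0.0
    return 1.0 / (2 ** depth)

print(general_weight(0))  # the area itself
print(general_weight(3))  # a deeper subtopic receives a smaller weight
```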
3.3 Algorithmic Analysis
We argue that Algorithm 1 achieves at least a $(1 - 1/e)$-approximation of the optimal solution to the original NP-hard problem. We first prove that the goal function of the original optimization problem is nonnegative, monotonically nondecreasing, and submodular, and then use these properties to prove the error bound. By definition the goal function is nonnegative and monotonically nondecreasing; thus we only need to show its submodularity, as follows.
Proof.
As stated before, the problem is formulated as follows:
$$S = \operatorname*{arg\,max}_{S' \subseteq \mathcal{T},\, |S'| = k} Q(S' \mid a),$$
where $Q(S \mid a)$ is the goal function, representing the degree to which a topic set $S$ can represent $\mathcal{T}$ in the given area $a$. For a given topic $t$ and topic set $S$, we first denote $\Delta(t \mid S) = Q(S \cup \{t\} \mid a) - Q(S \mid a)$, i.e., the increment to the goal function from adding $t$ to $S$. Then we add a topic $t' \notin S$ (with $t' \ne t$) to $S$ and denote $\Delta(t \mid S \cup \{t'\})$ analogously. By the property of $Q$ assumed in Section 2, we have $\Delta(t \mid S \cup \{t'\}) \le \Delta(t \mid S)$, which means the goal function is submodular.
∎
Since the goal function of our problem is monotonically nondecreasing, nonnegative, and submodular, the solution generated by Algorithm 1 is at least $(1 - 1/e)$ of the optimal solution [Nemhauser, Wolsey, and Fisher1978, Kempe, Kleinberg, and Tardos2003].
3.4 Fast Implementation
The time complexity of Algorithm 1 is $O(k |\mathcal{T}|^2)$, where $k$ is the number of topics to be extracted and $|\mathcal{T}|$ is the number of elements in $\mathcal{T}$. In practice, $k$ is small, but $|\mathcal{T}|$ may be on the order of (tens of) millions (we extract 9,355,550 candidate topics from Wikipedia, as mentioned above). Thus Algorithm 1 is still infeasible and may take an unbearably long time to return results (which was indeed the case in our experiments). However, we observe the following two facts:

Most of the candidate topics in the whole set $\mathcal{T}$ are not relevant to a given area.

When the general weight of a topic is small enough, the topic’s contribution to the whole sum ($Q$ in Algorithm 1) may be small enough too.
Based on these two observations, we adopt the following two strategies, which can greatly speed up our algorithm:

We keep only topics within a certain depth of the given area’s category as high-quality candidate topics, instead of using the original set $\mathcal{T}$.

Since the general weight of a topic is monotonically decreasing in the topic’s depth $d_t$, we can choose a depth (with a well-defined $g$) such that the contributions of all topics below this depth are small enough to be discarded without calculation.
This leads to a much faster algorithm, summarized in Algorithm 2, with time complexity $O(k |\mathcal{T}_1| |\mathcal{T}_2|)$, where $\mathcal{T}_1$ and $\mathcal{T}_2$ represent the high-quality candidate topic set and the contributive topic set respectively; in practice, $|\mathcal{T}_1|, |\mathcal{T}_2| \ll |\mathcal{T}|$.
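The two pruning strategies combined with the greedy loop can be sketched as follows. All names (`depth`, `marginal_gain`, the depth thresholds `d1`/`d2` and their default values) are illustrative, and the toy gain function is a stand-in for the weighted-relativeness objective:

```python
def fast_top_k(topics, depth, k, marginal_gain, d1=2, d2=1):
    """Greedy selection over pruned topic sets.

    depth[t] is the depth of topic t in the area's subcategory tree
    (None if absent). Strategy 1 keeps only candidates within depth d1;
    strategy 2 restricts the objective's summation to topics within d2.
    """
    hq = [t for t in topics if depth.get(t) is not None and depth[t] <= d1]
    contributive = [t for t in hq if depth[t] <= d2]
    selected = []
    for _ in range(min(k, len(hq))):
        rest = [t for t in hq if t not in selected]
        best = max(rest, key=lambda t: marginal_gain(t, selected, contributive))
        selected.append(best)
    return selected

# Toy gain: a fixed per-topic score, shrinking as the selection grows.
scores = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 9.0}
gain = lambda t, selected, contributive: scores[t] / (1 + len(selected))
depths = {"a": 0, "b": 1, "c": 2, "d": 3}
print(fast_top_k(list(depths), depths, 2, gain))  # 'd' is pruned → ['a', 'b']
```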
4 Experimental Results
We train our model on one of the largest public knowledge bases (Wikipedia). As there are no standard datasets with ground truth, and it is difficult to create such ground truth, for evaluation purposes we collect three real-world datasets and choose five representative areas in computer science: Artificial Intelligence (AI), Computer Vision (CV), Machine Learning (ML), Natural Language Processing (NLP), and Software Engineering (SE), and compare the performance of our model with several alternative methods. Our model is not restricted to these areas, however, and can in principle be applied to any other area. The datasets and code are publicly available, and a demo is ready.^1

^1 https://github.com/thuzhf/FastKATE — Inputs: (1) the area name in the form of a topical phrase (words connected by underscores, such as “artificial_intelligence”); (2) the number of topics to be extracted. Outputs: (1) the extracted topics, ranked by and accompanied with their scores (as defined in Section 3.2); (2) the running time.

4.1 Datasets
We download Wikipedia data from wikidump^2 as our knowledge base $\mathcal{K}$, use its (preprocessed) titles of all articles and categories as $\mathcal{T}$, use the text of all articles as $\mathcal{D}$, and use its category structures as $\mathcal{R}$. After preprocessing the titles as stated in Section 3.1, we obtain 9,355,550 candidate topics in $\mathcal{T}$. As stated in Section 3.1, we use the full text of Wikipedia to train topic embeddings, viewing each topic as a single token in the Word2Vec model, and we use Gensim^3 to help implement our model. The parameters we set are the vector size, window size, min count of each topic, threshold ($s$) for downsampling, min sentence length, and number of workers; for all other parameters, we use the default settings in Gensim. The three collected real-world evaluation datasets are detailed as follows.

^2 https://dumps.wikimedia.org/enwiki/latest/
^3 https://radimrehurek.com/gensim/models/word2vec.html
ACM CCS classification tree. The ACM CCS classification tree^4 is a polyhierarchical ontology containing 2,126 nodes in total. In this tree (actually a directed acyclic graph), each node can be viewed as a topic, and each non-leaf node has several child nodes as its subtopics. Although different nodes may have subtrees of different sizes and granularities, the tree still provides guidance as to what the top topics in a given area may be.

^4 http://www.acm.org/about/class/class/2012
Microsoft Fields of Study (FoS). The Microsoft Fields of Study (FoS), from the Microsoft Academic Graph (MAG)^5, form a directed acyclic graph in which each node also represents a topic; it contains 49,038 nodes in total. Each node is annotated with a “level” representing its depth/granularity in the graph; the network has 4 different levels in total. Each node has super-nodes at different levels as its supertopics, and each supertopic is accompanied by a confidence value; the confidences of all super-nodes at the same level of one topic sum to 1. This dataset also provides guidance as to what the top topics in a given area may be.

^5 https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/
Domain Experts Annotated Dataset. As there are no standard datasets/benchmarks that perfectly match our problem, we also have domain experts directly annotate the top topics in the five given areas, without providing any dataset for reference.
For the first two datasets, we have domain experts select the top topics from each given area’s subtopics to better match our problem. Because there may be too many nodes among a certain area’s subtopics, we instruct domain experts to first select a larger set of topics than needed and then perform a secondary screening. Since annotation is needed for all three datasets, we set up the following criteria to reduce subjectivity in the annotation process and help domain experts reach agreement:

Selected topics should be more significant than other topics in the given area.

Selected topics should cover the whole given area as far as possible. This implies that they should not be too similar to each other; for example, Artificial Neural Networks and Neural Networks should be viewed as the same topic in the AI area.
After we obtain each domain expert’s results, we count the occurrences of each selected topic, rank the topics by their counts, and choose the top topics as the ground truth for the given area. The number of topics is set empirically in our experiments.
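The vote-counting aggregation can be sketched as follows (tie-breaking by name is an illustrative choice for determinism, not specified in the paper):

```python
from collections import Counter

def aggregate_annotations(expert_lists, k):
    """Count how many experts selected each topic, rank topics by count
    (ties broken alphabetically), and keep the top k as ground truth."""
    counts = Counter(t for topics in expert_lists for t in topics)
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return ranked[:k]

experts = [["nlp", "cv", "ml"], ["ml", "cv"], ["ml", "planning"]]
print(aggregate_annotations(experts, 2))  # → ['ml', 'cv']
```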
4.2 Evaluation Metrics
To quantitatively evaluate the proposed model, we consider the following two metrics.
Precision@k.
Since the number of extracted results is set to be the same for domain experts and machines, we use Precision@$k$ to measure the performance of different methods. Since the order of the extracted results is also important, we introduce another metric as follows.
Mean Average Precision (MAP).
For a single result (such as a ranked list in our experiments), AP is defined as follows:
$$AP = \frac{1}{N} \sum_{i=1}^{n} p(i),$$
where $N$ is the number of all correct items (i.e., the length of the human-annotated ranked list); $n$ is the length of the machine-extracted ranked list (the same as $N$ in our experiments); and $p(i)$ equals 0 when the $i$-th item is incorrect, and otherwise equals the precision of the first $i$ items in the ranked list. MAP is then calculated by averaging the APs over all results.
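Both metrics are straightforward to implement; the sketch below follows the definitions above (variable names are illustrative):

```python
def precision_at_k(predicted, truth, k):
    """Fraction of the first k predictions appearing in the ground truth."""
    return len(set(predicted[:k]) & set(truth)) / k

def average_precision(predicted, truth):
    """AP = (1/N) * sum_i p(i), where p(i) = 0 for an incorrect i-th item
    and p(i) = precision of the first i items otherwise."""
    truth_set = set(truth)
    hits, total = 0, 0.0
    for i, item in enumerate(predicted, start=1):
        if item in truth_set:
            hits += 1
            total += hits / i
    return total / len(truth)

truth = ["a", "b", "c"]
predicted = ["a", "x", "b"]
print(precision_at_k(predicted, truth, 3))  # → 2/3
print(average_precision(predicted, truth))  # (1/1 + 2/3) / 3 ≈ 0.556
```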
4.3 Comparison Methods
For each given area $a$, we first extract all its subcategories within a fixed depth and use them as candidate topics. We then extract all articles of these candidate topics from $\mathcal{D}$ (the Wikipedia corpus introduced in Section 2) for the LDA and TextRank methods.

Topic TF-IDF (TFIDF): We calculate each candidate topic’s TF-IDF [Jones1973] value over the whole Wikipedia corpus (viewing each article as a document), and rank all candidate topics by these values.

LDA: We train an LDA [Blei, Ng, and Jordan2003] model on all documents in $\mathcal{D}$. For each candidate topic $t$ (note that $t$ is a topical phrase, not one of the topics extracted by LDA, which are multinomial distributions over words), we calculate its weight as follows:
$$w(t) = \sum_{d \in \mathcal{D}} \sum_{z=1}^{Z} P(z \mid d) \, P(t \mid z),$$
where $Z$ is the number of topics extracted by the LDA model, $P(z \mid d)$ is the probability of the $z$-th LDA topic in the $d$-th article, and $P(t \mid z)$ is the probability of $t$ in the $z$-th LDA topic. When training, we remove documents with too few words. We utilize Gensim^6 to help implement this model, and we use all of its default parameters except the number of topics $Z$.

^6 https://radimrehurek.com/gensim/models/ldamodel.html

TextRank: We run the TextRank [Mihalcea and Tarau2004] algorithm on each article in $\mathcal{D}$, and for each candidate topic $t$ (in the form of a topical phrase) we calculate its weight as follows:
$$w(t) = \sum_{d \in \mathcal{D}} w_{TR}(t, d),$$
where $w_{TR}(t, d)$ is the weight generated by TextRank for $t$ in the $d$-th article of $\mathcal{D}$.

FastKATE: This is our model as outlined in Algorithm 2. Due to the unbearable running time of Algorithm 1, we consider it impractical and thus do not compare its results with the others. We select the candidate depth empirically, and compare two different settings of the contributive depth in terms of performance and time cost, denoted as FastKATE1 and FastKATE2 respectively (FastKATE2 uses the smaller depth).
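The LDA-based candidate weighting described above can be sketched as follows; the data structures (`doc_topic` as per-article topic distributions, `topic_word` as per-topic phrase probabilities) are illustrative stand-ins for what a trained Gensim model would provide:

```python
def lda_candidate_weight(candidate, doc_topic, topic_word):
    """w(t) = sum over articles d and LDA topics z of P(z|d) * P(t|z)."""
    return sum(
        p_zd * topic_word[z].get(candidate, 0.0)
        for doc in doc_topic
        for z, p_zd in enumerate(doc)
    )

doc_topic = [[0.7, 0.3], [0.2, 0.8]]  # P(z|d) for 2 articles, 2 LDA topics
topic_word = [{"neural_network": 0.10}, {"neural_network": 0.05}]
print(lda_candidate_weight("neural_network", doc_topic, topic_word))  # ≈ 0.145
```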
4.4 Results and Analysis
Accuracy Performance.
Table 1 lists the performance of different methods on extracting the top topics in a given area. In terms of Precision@$k$, our model FastKATE2 performs consistently the best on all three datasets and in all five areas. In terms of MAP, FastKATE2 performs the best in most cases. This suggests our model can not only extract more correct top topics but also rank them in a more accurate order. We can also see that FastKATE1 (which differs from FastKATE2 only in parameter settings) performs the second best in most cases, which suggests that even with different parameter settings our model remains very effective compared to the other methods; that is, our model is also robust.
We note that the average performance of all methods on the first two datasets (ACM CCS and Microsoft FoS) is worse than on the third dataset, which was annotated by domain experts specifically for this problem. This is understandable: there are no existing datasets/benchmarks that perfectly match this problem, and although the first two datasets are highly related, they are not specialized for this purpose. This is also why we annotate our own dataset with the help of domain experts, and we believe the third dataset better reflects the performance of different methods on this problem.
It is beyond our expectation that FastKATE2 performs better than FastKATE1 in most cases, because FastKATE2 uses a smaller contributive topic set than FastKATE1 and thus seemingly accesses less information. One possible reason is that the contributive topic set becomes noisier at greater depths; when we restrict the depth to only one, we have cleaner data and thus may get better results. Besides, as stated in Section 3.4, a smaller depth makes our algorithm run faster. We record the average running time of our two models over 100 runs on all five areas in Table 3: FastKATE2 is faster than FastKATE1 and returns results in real time.
4.5 Case Study
Table 2 lists the topics extracted in the AI area by TFIDF, LDA, TextRank and FastKATE2 respectively. Most of the topics extracted by our model are of high quality and are more convincing compared to those of the other methods.
5 Related Work
Our work is mainly related to three lines of work: topic modeling, automatic keyphrase extraction, and word/phrase embedding. (1) Topic Modeling. Topic modeling has been widely used to extract topics from large-scale scientific literature [Blei, Ng, and Jordan2003, Griffiths and Steyvers2004, Steyvers and Griffiths2007]. Topics in these models are usually in the form of multinomial distributions over words, which makes it hard for researchers to identify which specific topics these distributions stand for [Mei, Shen, and Zhai2007]. To address this challenge, much work has been conducted on automatic or semi-automatic ways to label these topic models [Mei, Shen, and Zhai2007, Ramage et al.2009, Lau et al.2011], which alleviates the problem to some extent. (2) Automatic Keyphrase Extraction. There are mainly two approaches to extracting keyphrases: supervised and unsupervised. In supervised methods, the keyphrase extraction problem is usually recast as a classification problem [Witten et al.1999, Turney2000] or as a ranking problem [Jiang, Hu, and Li2009]. Existing unsupervised approaches to keyphrase extraction can be categorized into four groups [Hasan and Ng2014]: graph-based ranking [Mihalcea and Tarau2004], topic-based clustering [Liu et al.2009], simultaneous learning [Wan, Yang, and Xiao2007], and language modeling [Tomokiyo and Hurst2003]. (3) Word/Phrase Embedding. Feature learning has been extensively studied by the machine learning community under various headings. In the natural language processing (NLP) area, feature learning of words/phrases is usually referred to as word/phrase embedding, i.e., embedding words/phrases into a latent feature space [Roweis and Saul2000, Mikolov et al.2013]. This helps in calculating relations/similarities between words/phrases. In our work, we embed topics into a latent feature space, which is similar to this line of work.
6 Conclusion
In this paper, we formally formulate the problem of top-$k$ area topic extraction. We propose FastKATE, in which topics have both explicit and latent representations. We leverage a large-scale knowledge base (Wikipedia) to learn topic embeddings and use these representations to help capture the representativeness of topics for given areas. We develop a heuristic algorithm, together with a fast implementation, to efficiently solve the problem, and prove the solution is at least $(1 - 1/e)$ of the optimal. Experiments on three real-world datasets and in five different areas validate our model’s effectiveness, robustness, real-time performance, and superiority over other methods. In the future, we plan to integrate more knowledge bases and also try to apply our model to a broader range of problems.
References
 [Blei, Ng, and Jordan2003] Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent dirichlet allocation. Journal of machine Learning research 3(Jan):993–1022.
 [Gary and Johnson1979] Gary, M. R., and Johnson, D. S. 1979. Computers and intractability: A guide to the theory of NP-completeness.
 [Griffiths and Steyvers2004] Griffiths, T. L., and Steyvers, M. 2004. Finding scientific topics. Proceedings of the National academy of Sciences 101(suppl 1):5228–5235.
 [Grineva, Grinev, and Lizorkin2009] Grineva, M.; Grinev, M.; and Lizorkin, D. 2009. Extracting key terms from noisy and multi-theme documents. In Proceedings of the 18th international conference on World Wide Web, 661–670. ACM.
 [Hasan and Ng2014] Hasan, K. S., and Ng, V. 2014. Automatic keyphrase extraction: A survey of the state of the art. In ACL (1), 1262–1273.
 [Jiang, Hu, and Li2009] Jiang, X.; Hu, Y.; and Li, H. 2009. A ranking approach to keyphrase extraction. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, 756–757. ACM.
 [Jones1973] Jones, K. S. 1973. Index term weighting. Information storage and retrieval 9(11):619–633.
 [Karp1972] Karp, R. M. 1972. Reducibility among combinatorial problems. In Complexity of computer computations. Springer. 85–103.
 [Kempe, Kleinberg, and Tardos2003] Kempe, D.; Kleinberg, J.; and Tardos, É. 2003. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, 137–146. ACM.
 [Lau et al.2011] Lau, J. H.; Grieser, K.; Newman, D.; and Baldwin, T. 2011. Automatic labelling of topic models. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, 1536–1545. Association for Computational Linguistics.
 [Liu et al.2009] Liu, Z.; Li, P.; Zheng, Y.; and Sun, M. 2009. Clustering to find exemplar terms for keyphrase extraction. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1, 257–266. Association for Computational Linguistics.
 [McGuinness, Van Harmelen, and others2004] McGuinness, D. L.; Van Harmelen, F.; et al. 2004. OWL Web Ontology Language overview. W3C recommendation 10(10):2004.
 [Mei, Shen, and Zhai2007] Mei, Q.; Shen, X.; and Zhai, C. 2007. Automatic labeling of multinomial topic models. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, 490–499. ACM.
 [Mihalcea and Tarau2004] Mihalcea, R., and Tarau, P. 2004. TextRank: Bringing order into text. In EMNLP, volume 4, 404–411.
 [Mikolov et al.2013] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, 3111–3119.
 [Mintz et al.2009] Mintz, M.; Bills, S.; Snow, R.; and Jurafsky, D. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2, 1003–1011. Association for Computational Linguistics.
 [Nemhauser, Wolsey, and Fisher1978] Nemhauser, G. L.; Wolsey, L. A.; and Fisher, M. L. 1978. An analysis of approximations for maximizing submodular set functions—i. Mathematical Programming 14(1):265–294.
 [Ramage et al.2009] Ramage, D.; Hall, D.; Nallapati, R.; and Manning, C. D. 2009. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1, 248–256. Association for Computational Linguistics.
 [Roweis and Saul2000] Roweis, S. T., and Saul, L. K. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326.
 [Salton and Buckley1997] Salton, G., and Buckley, C. 1997. Term-weighting approaches in automatic text retrieval. Morgan Kaufmann Publishers Inc.
 [Steyvers and Griffiths2007] Steyvers, M., and Griffiths, T. 2007. Probabilistic topic models. Handbook of latent semantic analysis 427(7):424–440.
 [Svitkina and Fleischer2011] Svitkina, Z., and Fleischer, L. 2011. Submodular approximation: Samplingbased algorithms and lower bounds. SIAM Journal on Computing 40(6):1715–1737.
 [Tang et al.2015] Tang, J.; Zhang, C.; Cai, K.; Zhang, L.; and Su, Z. 2015. Sampling representative users from large social networks. In AAAI, 304–310. Citeseer.
 [Tomokiyo and Hurst2003] Tomokiyo, T., and Hurst, M. 2003. A language model approach to keyphrase extraction. In Proceedings of the ACL 2003 workshop on Multiword expressions: analysis, acquisition and treatment - Volume 18, 33–40. Association for Computational Linguistics.
 [Turney2000] Turney, P. D. 2000. Learning algorithms for keyphrase extraction. Information retrieval 2(4):303–336.

 [Wan, Yang, and Xiao2007] Wan, X.; Yang, J.; and Xiao, J. 2007. Towards an iterative reinforcement approach for simultaneous document summarization and keyword extraction. In ACL, volume 7, 552–559.
 [Witten et al.1999] Witten, I. H.; Paynter, G. W.; Frank, E.; Gutwin, C.; and Nevill-Manning, C. G. 1999. KEA: Practical automatic keyphrase extraction. In Proceedings of the fourth ACM conference on Digital libraries, 254–255. ACM.