Opinion summarization aims to generate a concise and digestible summary of user opinions drawn from internet sources such as blogs, social media, and e-commerce websites. It is especially helpful when the large and growing number of such opinions becomes overwhelming for users to read and process [16, 8]. In this work, we focus on extractive opinion summarization of online product reviews. The task takes a collection of reviews of a target product (e.g., a television) as input and selects a subset of review excerpts as a summary. The last two boxes of Figure 1 show an example of user reviews of a television and a corresponding extractive summary.
This example illustrates that opinion summarization differs from the more general task of multi-document summarization in two major ways. First, while general summarization aims to retain the most important content, opinion summarization needs to cover a range of popular opinions and reflect their diversity. Second, an opinion summary is more centered on the various aspects (i.e., components, attributes, or properties) of the target product and their corresponding sentiment polarities. For example, the highlighted sentences in Review 3 of Figure 1 express the reviewer’s negative opinions about the aspects Sound and Image. To reflect these differences, Hu and Liu (2004) introduced a three-step pipeline that creates an opinion summary by 1) mining product-related aspects and identifying sentences related to those aspects; 2) analyzing the sentiment of the identified sentences; and 3) summarizing the results. Each of these three tasks has often been addressed using supervised methods. Despite their fairly high performance, these methods require human-annotated data for each task. Even worse, they do not adapt well across domains or product categories (e.g., televisions and backpacks have different aspects). In this paper, we address these problems without using human annotation.
Previous works addressed these problems using purely unsupervised methods, but found it challenging to detect the aspect-related segments of reviews (e.g., those highlighted in Figure 1) with both high precision and recall. A better solution is to utilize knowledge from existing external information about the target product, i.e., information beyond the customers’ reviews. For example, on Amazon’s product webpage, we can obtain not only customer reviews but also product-related information, such as the overall description, the feature descriptions (the top of Figure 1 gives an example), and attribute tables. Such external information sources are widespread on e-commerce websites and easily accessible. More importantly, they are closely related to the aspects of products and are therefore valuable resources for the aspect identification task. Automatically learning aspects from such external sources reduces the risk that human-assigned aspects are biased, unrepresentative, or of the wrong granularity. It also makes the model easy to adapt to different product categories. Here we use the feature descriptions of products as the information source, and leave other sources for future work.
In this work, we propose a generative approach that relies on an aspect-aware memory (AspMem) to better leverage this knowledge during aspect identification and opinion summarization. AspMem, which is inspired by Memory Networks, is an array of memory cells that store aspect-related knowledge obtained from external information. These memory cells cooperate with the model throughout learning and judge the relevance of review sentences to the product aspects. The relevance is then combined with the sentiment strength to determine the salience of an opinion. Finally, we extract a subset of salient opinions to create the final summary. By formalizing the subset selection process as an Integer Linear Programming (ILP) problem, the resulting summary maximizes the collective salience scores of the selected sentences while minimizing information redundancy.
We demonstrate the benefits of our model on two tasks: aspect identification and opinion summarization, by comparing with previous state-of-the-art methods. On the first task, we show that even without any parameters to tune, our model still outperforms previously reported results, and can be further enhanced by introducing extra trainable parameters. For the summarization task, our method exceeds baselines on a variety of evaluation measures.
Our main contributions are three-fold:
- We address the task of opinion summarization without using any task-specific human supervision, by incorporating domain knowledge from external information.
- We propose a generative approach to better leverage such knowledge.
- We experimentally demonstrate the effectiveness of the proposed method on both aspect identification and summarization tasks.
2 Related Work
This work spans two lines of research: aspect identification of review text, and review summarization, which are discussed next.
2.1 Aspect identification
Customers give their aspect-related opinions either by explicitly mentioning the aspects (e.g., high price) or by using implicit expressions (e.g., expensive), which makes aspect identification a challenging task. Supervised methods use sequence labeling models or text classifiers to identify the aspects. Rule-based methods rely on frequent noun phrases and syntactic patterns [14, 28]. Most unsupervised methods are based on LDA and its variants, and interpret the latent topics in reviews as aspects [24, 31]. However, LDA does not perform well in finding coherent topics from short reviews. Also, while topics and aspects may overlap, there is no guarantee that the two are the same.
To address the first problem, He et al. (2017) propose ABAE, an unsupervised neural architecture that enhances topic coherence by leveraging pre-trained word embeddings. They learn the embedding of each aspect from the word embedding space through a reconstruction loss. For the second problem, Angelidis and Lapata (2018) propose MATE, which determines the aspect embeddings in ABAE using the embeddings of a few aspect-related seed-words. These seed-words are extracted from a small dataset (about 1K sentences) with human-annotated aspect labels. We borrow their idea of using aspect embeddings and seed-words. The difference is that we collect the seed-words from external information automatically. Also, while both of their models are discriminative, we propose a generative model to better leverage the seed-words.
2.2 Opinion summarization
Most methods in multi-document summarization are extractive in nature, i.e., they rank and select a subset of salient segments (e.g., words, phrases, or sentences) from reviews to form a concise summary. The ranking of each unit relies on a score that evaluates its salience, and the selection is conducted greedily or globally [23, 26, 3]. For example, Yu et al. (2016) score phrases based on their popularity and specificity. Ganesan et al. (2012) rank phrases based on their representativeness and readability and then create the summary via depth-first search. Angelidis and Lapata (2018) combine aspect and sentiment to identify salient opinions, an idea we also adopt. The difference is that we use a more precise and flexible method to calculate the aspect relevance of reviews. In addition, rather than selecting review segments greedily, which can yield sub-optimal solutions, we use ILP to find the optimal subset.
To the best of our knowledge, the only work that uses external information to enhance summarization is by Narayan et al. (2017), who use titles and image captions to assist supervised news summarization. Another direction focuses on abstractive methods that generate new sentences from the source text [11, 5, 2].
3 Problem Formulation
Extractive opinion summarization aims to select a subset of important opinions from the entire opinion set. For product reviews, the opinion set is a collection of review segments of a certain product. Formally, the corpus groups products by category (e.g., televisions or bags). For a given target product, the corpus contains a set of reviews of this product, and each review consists of a sequence of segments. We also collect the feature description of the product as external information, which consists of a list of feature items. The summarization model aims to select a subset of important opinions that summarizes the reviews of the target product.
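The formulation above can be mirrored by a small data model. The following is a minimal sketch, not the authors' code: the class names `Review` and `Product`, the helper `opinion_set`, and the toy television data are all hypothetical and serve only to illustrate the structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Review:
    # One customer review, already split into opinion segments.
    segments: List[str]


@dataclass
class Product:
    category: str
    reviews: List[Review] = field(default_factory=list)
    # External information: feature items from the product page.
    feature_items: List[str] = field(default_factory=list)


def opinion_set(product: Product) -> List[str]:
    """Flatten all review segments of one product into its opinion set."""
    return [seg for review in product.reviews for seg in review.segments]


# Toy example (made-up data).
tv = Product(
    category="Televisions",
    reviews=[
        Review(["The picture is beautiful.", "The sound is weak."]),
        Review(["Setup was easy."]),
    ],
    feature_items=["Full HD 1080p resolution"],
)
candidates = opinion_set(tv)
```

A summarizer then returns a subset of `candidates`, subject to a length budget.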
As previously mentioned, one challenge during summarization is to identify aspect-related opinions. In Sec. 4, we show how the proposed AspMem can tackle this problem, and how to incorporate domain knowledge to enhance model performance. The ranking and selection of the review segments are described in Sec. 5.
4 Aspect Identification
4.1 AspMem: Aspect-aware memory
This section describes the proposed AspMem model, which identifies aspect-related review segments. AspMem contains an array of memory cells that store aspect-related information. Each cell relates to one specific aspect and has a low-dimensional embedding in a semantic space. Each word in a review segment also has an embedding in the same semantic space.
Similar to topic models, we assume that a review segment is generated from these aspect (topic) memories. However, LDA-based topic models parameterize the generation probability at the word level, which is too flexible for modeling short review segments. We instead treat the whole review segment as generated from a single aspect, but allow every word to contribute differently to the segment representation.
Given a review segment, the probability that it is generated by a particular aspect depends on the similarity between the embedding of the segment and the embedding of that aspect. The segment embedding is defined as the weighted average of the embeddings of the words in the segment. The weight of each word is an attention weight that is proportional to the word’s generation probability. That is, we focus more on the words that are more likely to be generated by the aspect memories. To compute these weights, we define the probability of a word being generated from an aspect in the same way as for segments, using the word embedding in place of the segment embedding.
Without any prior domain knowledge of the aspects, the latent aspect embeddings and the prior probabilities of aspects are parameters of the model. They can be estimated by minimizing the negative log-likelihood of the corpus (i.e., all the review segments belonging to the same product category) plus a regularization term (Eq. 6).
The estimation of the likelihood part is similar to Eq. 4. The second term is a regularization term defined on the row-normalized aspect embedding matrix and the identity matrix. It encourages the learned aspects to be diverse, i.e., the aspect embeddings are encouraged to be orthogonal to each other. The strength of the regularization is controlled by a hyper-parameter.
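As an illustration of the regularization term, the sketch below measures how far the Gram matrix of the row-normalized aspect embeddings is from the identity. The use of the Frobenius norm and the function names are assumptions made for this example; the text only states that the term involves the row-normalized aspect matrix and the identity matrix. Rows are assumed to be non-zero.

```python
import math


def row_normalize(matrix):
    """Scale every row (aspect embedding) to unit length."""
    normalized = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        normalized.append([x / norm for x in row])
    return normalized


def orthogonality_penalty(aspect_matrix):
    """Frobenius norm of (A_n A_n^T - I), an assumed form of the regularizer."""
    a = row_normalize(aspect_matrix)
    total = 0.0
    for i, row_i in enumerate(a):
        for j, row_j in enumerate(a):
            gram = sum(x * y for x, y in zip(row_i, row_j))
            target = 1.0 if i == j else 0.0
            total += (gram - target) ** 2
    return math.sqrt(total)
```

The penalty is zero when the aspect embeddings are mutually orthogonal and grows as they become more similar, which is why minimizing it pushes the learned aspects apart.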
Once we obtain all the parameters, we can calculate the posterior probability that a review segment belongs to each aspect (Eq. 7), and then select the aspect with the highest posterior probability as the identified aspect.
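The inference procedure of this section can be sketched as follows. This is only an illustration under explicit assumptions: generation scores are taken to be proportional to the exponential of the cosine similarity, and all function names are hypothetical. The exact functional form of the original equations may differ.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def segment_embedding(word_vecs, aspects, priors):
    """Attention-weighted average of word embeddings.

    A word's weight is proportional to its (unnormalized) probability of being
    generated by the aspect memories, assumed here to be
    sum_k prior_k * exp(cos(word, aspect_k)).
    """
    raw = [sum(p * math.exp(cosine(w, a)) for a, p in zip(aspects, priors))
           for w in word_vecs]
    z = sum(raw)
    weights = [r / z for r in raw]
    dim = len(word_vecs[0])
    return [sum(c * w[d] for c, w in zip(weights, word_vecs)) for d in range(dim)]


def aspect_posterior(word_vecs, aspects, priors):
    """Posterior over aspects for one segment, via Bayes' rule."""
    seg = segment_embedding(word_vecs, aspects, priors)
    raw = [p * math.exp(cosine(seg, a)) for a, p in zip(aspects, priors)]
    z = sum(raw)
    return [r / z for r in raw]


def identify_aspect(word_vecs, aspects, priors):
    """Index of the aspect with the highest posterior probability."""
    posterior = aspect_posterior(word_vecs, aspects, priors)
    return max(range(len(posterior)), key=lambda k: posterior[k])
```

With fixed aspect embeddings and a uniform prior, this procedure has no trainable parameters, so it can be applied directly for prediction.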
4.2 Incorporating Domain knowledge
The aspect embeddings estimated merely from the data have several shortcomings. First, the model may learn some topics that are irrelevant to the aspects of products, such as sentiments and user profiles. Second, it is difficult to control the granularity of the learned aspects, which may lead to too coarse- or fine-grained aspects.
To address these problems, a simple yet effective method is to use domain knowledge about products. Specifically, rather than estimating the aspect embeddings according to Eq. 6, one could collect several aspect-related seed-words (e.g., picture, color, resolution, and bright for the Display aspect) and average their embeddings to produce the aspect embedding. Previous works have shown the benefit of such knowledge [9, 1], but they encode this knowledge manually or derive it from human-annotated data, which makes these methods harder to adapt across product categories.
As we mentioned in Sec. 1, feature descriptions of products can be a valuable external resource for mining seed-words. Here we describe our unsupervised method of collecting seed-words from them. To increase the size of this resource, we assume that all products in the same category share aspects, and collect seed-words at the category level. For each product category, we collect the feature items of all products in that category as one document, and then apply TF-IDF to extract seed-words from it (we also tried other algorithms, but the differences were not significant). For TF-IDF to work, the seed-words need to have a high term frequency and the general words need to have a high document frequency. We therefore aggregate all the items of the target category into one single document, and regard the remaining items belonging to other categories as individual documents to build the corpus.
For example, assume we have six product categories, each category contains ten products, and each product has ten feature descriptions. We therefore have 600 feature descriptions in total. To extract the seed-words of one category (e.g., TVs), we concatenate the 100 TV-related descriptions into one single document, while regarding the other 500 descriptions as individual documents. We then calculate the TF-IDF of each word based on these 501 documents. Finally, we select the words with the highest TF-IDF values as the seed-words of the product category.
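The document construction described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the TF-IDF variant (relative term frequency times log(N/df)), the whitespace tokenization, and the function name `extract_seed_words` are assumptions, and the feature descriptions in the usage example are made up.

```python
import math
from collections import Counter


def extract_seed_words(target_items, other_items, top_n):
    """Rank the words of the target category by TF-IDF.

    target_items: feature descriptions of the target category,
                  merged into ONE document.
    other_items:  feature descriptions of all other categories,
                  each treated as its own document.
    """
    target_doc = [tok for item in target_items for tok in item.lower().split()]
    documents = [set(target_doc)] + [set(item.lower().split()) for item in other_items]
    n_docs = len(documents)

    scores = {}
    for word, count in Counter(target_doc).items():
        df = sum(1 for doc in documents if word in doc)  # document frequency
        scores[word] = (count / len(target_doc)) * math.log(n_docs / df)

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


tv_items = ["crisp picture with vivid color", "picture resolution with hdr"]
other = ["padded strap with zipper", "waterproof sole with laces",
         "wireless keyboard with backlight"]
seeds = extract_seed_words(tv_items, other, top_n=3)
```

Merging the target category into one document gives category-specific words such as "picture" a high term frequency, while a general word such as "with" appears in every document and receives an inverse document frequency of zero.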
5 Summary Generation
In the summary generation stage, we first evaluate the salience of each opinion segment, and then select a subset of opinions to form the final summary.
5.1 Salience of the opinion
Following Angelidis and Lapata (2018), we evaluate the salience of a review segment from two perspectives: its relevance to aspects and its sentiment strength.
Relevance depicts how relevant a segment is to the various aspects of the product. Since one segment may relate to more than one aspect (e.g., The color is excellent but the sound is terrible.), we calculate relevance at the word level rather than the segment level. Recall that the relevance of a word to an aspect memory is proportional to the cosine similarity between their embeddings. We assign each word its most related aspect memory (by a max operation), and calculate the relevance of the entire segment as the averaged relevance over all words (by a mean operation).
We use the seed-words extracted in Sec. 4.2 as the aspect-related memory, so that each memory cell is described by the weight and the word embedding of one seed-word. The similarity term and the seed-word weight can be regarded as the unnormalized conditional and prior probabilities in Eq. 4, respectively.
An activation function, implemented with a step function, filters out the general words whose cosine similarity with every aspect is below a threshold. Compared with the relevance measure adopted by Angelidis and Lapata (2018), which uses the probability difference between the most probable aspect and the general one, our score takes a soft assignment between words and aspects, and thus allows a segment to relate to more than one aspect. Also, by regarding each seed-word as a fine-grained aspect, it does not require the seed-words to be clustered into aspects.
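A word-level relevance score of this kind might look as follows. This is a sketch under assumptions: the gated similarity is multiplied by the seed-word weight, the gate passes similarities at or above the threshold unchanged, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def relevance(word_vecs, seed_vecs, seed_weights, threshold):
    """Mean over words of the best (max) gated, weighted seed-word similarity."""
    per_word = []
    for w in word_vecs:
        best = 0.0
        for s, u in zip(seed_vecs, seed_weights):
            sim = cosine(w, s)
            gated = sim if sim >= threshold else 0.0  # step-function filter
            best = max(best, u * gated)
        per_word.append(best)
    return sum(per_word) / len(per_word)
```

Because the maximum is taken per word, a segment that mentions both color and sound collects relevance from two different seed-words, whereas general words that fall below the threshold contribute nothing.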
Sentiment reflects customers’ preferences regarding products and their aspects, which is helpful in decision making. Since sentiment analysis is not the major contribution of this work, we directly apply CoreNLP and a sentiment lexicon (https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon) to get the sentiment distribution of the reviews. The sentiment distribution is then mapped onto a bounded range to produce the sentiment score. Sentences with stronger sentiment polarities have higher values.
Finally, we evaluate the salience of an opinion segment by multiplying the two scores.
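The product of the two scores can be illustrated with a short sketch. The mapping from a sentiment distribution to a strength value is an assumption made for this example: a five-class distribution (very negative to very positive) is collapsed to the absolute value of its expected polarity, which lies in [0, 1]. The constant `POLARITY` and the function names are hypothetical.

```python
# Assumed polarity value of each of the five sentiment classes.
POLARITY = (-1.0, -0.5, 0.0, 0.5, 1.0)


def sentiment_strength(distribution):
    """Absolute expected polarity: 0 for neutral, 1 for extreme sentiment."""
    expected = sum(p * v for p, v in zip(distribution, POLARITY))
    return abs(expected)


def salience(relevance_score, distribution):
    """Salience of a segment as relevance times sentiment strength."""
    return relevance_score * sentiment_strength(distribution)
```

Under this mapping, a highly aspect-relevant but neutral segment receives zero salience, which matches the intent that a salient opinion must be both on-aspect and opinionated.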
5.2 Opinion selection
An ideal summary would contain as many high-salience opinions as possible. However, care should be taken to avoid redundant information. Also, there has to be a limit on the length of the summary (i.e., a maximum number of words). These goals can be formalized as an ILP problem. We introduce a binary indicator variable for each segment to indicate whether it is included in the final summary, and then search for the assignment of these variables that optimizes an objective trading off the total salience of the selected segments against the pairwise similarity between them.
For each pair of segments, an auxiliary binary variable equals 1 iff both segments are selected, which is guaranteed by Eq. 12-13. Eq. 14 restricts the length of the summary, using the length of each segment. We solve the ILP with Gurobi (http://www.gurobi.com/).
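To make the selection objective concrete, the sketch below finds the optimal subset by exhaustive search instead of an ILP solver. This substitution is deliberate: it avoids a solver dependency and is only feasible for a handful of segments. The objective form (total salience minus pairwise similarity of selected segments, under a word budget) is an assumption based on the description above, and the function name is hypothetical.

```python
from itertools import combinations


def select_opinions(salience, lengths, similarity, max_words):
    """Return (best_subset, best_value) by enumerating all feasible subsets."""
    n = len(salience)
    best_value, best_subset = 0.0, ()
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if sum(lengths[i] for i in subset) > max_words:
                continue  # violates the length constraint
            gain = sum(salience[i] for i in subset)
            # A pair contributes redundancy only when BOTH segments are selected,
            # which is the role of the auxiliary pair variables in the ILP.
            redundancy = sum(similarity[i][j] for i, j in combinations(subset, 2))
            value = gain - redundancy
            if value > best_value:
                best_value, best_subset = value, subset
    return list(best_subset), best_value


sal = [0.9, 0.8, 0.5]
lens = [5, 5, 5]
sim = [[0.0, 0.7, 0.0],
       [0.7, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
chosen, value = select_opinions(sal, lens, sim, max_words=10)
```

In this toy instance the two most salient segments are near-duplicates, so the optimum pairs the first segment with the third one rather than with the second.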
6 Experiments
6.1 Dataset
We use OpoSum, a review summarization dataset provided by Angelidis and Lapata (2018), to test the effectiveness of the proposed method. This dataset contains about 350K reviews from the Amazon review dataset, covering six product categories: Laptop bags, Bluetooth headsets, Boots, Keyboards, Televisions, and Vacuums. Each review sentence is split into segments using a rhetorical structure theory (RST) parser to obtain finer-grained opinion units. The annotated corpus includes ten products from each category and ten reviews for each product. Each review segment is annotated with an aspect label, and summaries are produced for each product. We describe the details below:
Aspect information. Each product category has nine pre-defined aspect labels. Each segment is labeled with one or more aspects, including a General aspect if it does not discuss any specific one. The annotated dataset is split into two equal parts for validation and test. Based on the validation data, the dataset authors extract 30 seed-words for each aspect and produce the corresponding aspect embedding as a weighted average of the seed-word embeddings.
Final summary. For each product, the annotators create a summary by selecting a subset of salient opinions from the review segments under a fixed word limit. Each product has three reference summaries created by different annotators, which are used only for evaluation.
Their dataset does not contain any external information. We therefore randomly collect the feature descriptions of about 100 products for each category. Table 1 gives statistics about this data, which is available at https://github.com/zhaochaocs/AspMem.
6.2 Experiments on aspect identification
We first investigate the model’s ability to identify aspects, which aims to label each review segment with one of the nine aspects (eight specific aspects and one General aspect) as labeled in the dataset. The method is described in Sec. 4. However, instead of using the seed-words obtained from external information (Sec. 4.2), we still use those provided with the dataset to enable fair comparison with prior works. Our external seed-words will be used in the summarization experiments (Sec. 6.3).
For the eight specific aspects, we assign their corresponding memory cells the average embedding of the 30 seed-words provided by OpoSum. For the General aspect, although OpoSum also provides 30 corresponding seed-words, we handle it differently for the following reasons. First, while the knowledge of specific aspects can be encoded as a few seed-words, it is hard to represent the General aspect in the same way. A better method is to allow the model to find its intrinsic patterns by making the corresponding General embedding a trainable parameter. Also, since the number of General segments is approximately ten times that of a specific aspect on average, it is reasonable to assign more memory cells to the General aspect. Therefore, besides the model with the fixed General embedding provided by MATE, we have another, enhanced model with five extra memory cells to encode the General aspect. These extra memory cells are initialized randomly and trained to minimize the negative log-likelihood in Eq. 6.
We use word embeddings that are pre-trained on the training set via word2vec. These embeddings are fixed during training. For simplicity, the prior distribution of aspects is set to uniform. We train the model with a batch size of 300, and optimize the objective using Adam with a fixed learning rate and early stopping on the development set. Notice that the model without the extra aspect memories does not have any trainable parameters and can therefore be applied directly for prediction using Eq. 7.
We compare the proposed method with ABAE and MATE, two state-of-the-art neural methods mentioned in Sec. 2, as well as a distillation approach that uses the pre-trained BERT as the student model. To ensure a fair comparison, all models utilize the same seed-words. The performance is evaluated with a multi-label score.
|Model||Laptop bags||Bluetooth headsets||Boots||Keyboards||Televisions||Vacuums||Average|
|AspMem w/ extra memory||60.0||62.0||55.8||61.8||60.0||61.8||60.2|
Table 2 shows the average scores of the four models on the six categories. MATE performs better than ABAE by introducing the human-provided seed-words, which demonstrates the effectiveness of domain knowledge. However, MATE applies the same neural architecture as ABAE, which may not be the best fit to fully leverage the introduced knowledge. Our generative model instead cooperates directly with the aspect memory, not only during the prediction stage but also during segment encoding. Without any trainable parameters, our method outperforms ABAE and MATE on all the categories and achieves a 5.1% increase on average. This indicates that AspMem obtains a better aspect-aware segment representation for aspect identification. The extra latent aspect embeddings of the General aspect (AspMem w/ extra memory) help the model better fit the intrinsic structure of the data, which further improves the performance by 6.0%. Compared with BERT, our model still performs better on three categories and achieves the same average score. Note that while BERT is a pre-trained model with 110M parameters, our model has only 1K parameters.
To further demonstrate the contribution of the extra memories, Figure 2 provides the confusion matrices of the results with and without them. The comparison shows that extra memories improve the true-positive rate of the General aspect from 0.44 to 0.60, while only slightly hurting those of other aspects. Table 3 shows the automatically learned General aspects by listing their nearest words in the embedding space. Compared with the single General aspect provided by MATE, our model successfully identifies the more varied General aspects from the reviews, such as the Noun, Verb, Adjective, Number, and Problem.
|General aspect||Nearest words|
|noun||tv television set hdtv item tvs product|
|adj||good great better awesome superb|
|verb||figure afford get see find hear watch|
|number||dd dddd d ddd|
|problem||issue problem occur encounter flaw|
|MATE||buy purchase money sale deal week|
6.3 Experiments on Summarization
In this experiment, we investigate the utility of AspMem for summarization, using the seed-words from external sources and the selection procedure described in Sec. 5. We refer to our method as AspMemSum.
With the method described in Sec. 4.2, we select the top-ranked seed-words according to their TF-IDF values, and use their word embeddings as the aspect memories. The similarity threshold is fixed in advance. The length of the summary is limited so that it is comparable with the ground-truth summaries. Similar to previous works, we add a redundancy filter to remove repeated opinions, which is implemented by thresholding the pairwise similarity term used during selection. Other settings are the same as those in the previous experiment. We employ ROUGE to evaluate the results. It measures the percentage of overlapping unigrams (ROUGE-1) and bigrams (ROUGE-2) between the generated and the reference summaries. We compare our method with the results reported by Angelidis and Lapata (2018).
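The n-gram overlap that ROUGE measures can be illustrated with a simplified recall computation. This sketch is not the official ROUGE package: it uses lowercased whitespace tokens, a single reference, and no stemming, and whether recall or F-score is reported in the experiments is not stated here. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """Fraction of reference n-grams that also occur in the candidate."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clipped counts: an n-gram is matched at most as often as the candidate has it.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


r1 = rouge_n_recall("The picture is beautiful", "the picture is crisp", 1)
r2 = rouge_n_recall("The picture is beautiful", "the picture is crisp", 2)
```

Here three of the four reference unigrams and two of the three reference bigrams are matched.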
|Method||ROUGE-1||ROUGE-2|
|MATE + MILNET||44.1||21.8|
|Inter-annotator Agreement||54.7||36.6|
|MATE||Picture is crisp and clear with lots of options to change for personal preferences. Plenty of ports and settings to satisfy most everyone. The sound is good and strong. But the numbers of options available in the on-line area of the Tv are numerous and extremely useful! I am very disappointed with this TV for two reasons : picture brightness and channel menu. The software and apps built into this TV are difficult to use and setup Unit developed a high pitch whine|
|AspMem||Unit developed a high pitch whine. The picture is beautiful. This TV looks very good. The sound is clear as well. there is a dedicated button on the remote. I am very disappointed with this TV for two reasons : picture brightness and channel menu. which is TOO SLOW to stream HD video… and it will not work with an HDMI connection because of a conflict with Comcast’s DHCP.|
|Human||Picture is crisp and clear with lots of options to change for personal preferences. Plenty of ports and settings to satisfy most everyone. The sound is good and strong. But the numbers of options available in the on-line area of the Tv are numerous and extremely useful! I am very disappointed with this TV for two reasons : picture brightness and channel menu. The software and apps built into this TV are difficult to use and setup Unit developed a high pitch whine|
Table 4 reports the ROUGE-1 and ROUGE-2 scores of each system and the inter-annotator agreement among three annotators. (MILNET is a sentiment analyzer, but its pre-trained model is not public. We therefore replaced it with CoreNLP when re-running MATE and observed no significant difference.) Our method (AspMemSum) significantly outperforms the baselines on both ROUGE scores (approximate randomization [27, 4]). When the redundancy filtering is removed (w/o filtering), it achieves the highest performance. This observation differs from that of Angelidis and Lapata (2018), who found that redundancy filtering improved the ROUGE scores of the results produced by MATE. Upon inspecting the generated summaries, we found that in the absence of redundancy filtering, AspMem’s summaries often included the overlapping part of the three references (i.e., the segments with similar opinions but from different references) more than once. This improves the ROUGE scores: the more matched n-grams are found, the better the results. However, we prefer to avoid redundancy in order to improve readability.
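For readers unfamiliar with approximate randomization, the following sketch shows a paired version of the test on per-product scores. The choice of the mean score difference as the test statistic, the number of trials, and the function name are assumptions for illustration; the exact protocol used in the experiments is not specified here.

```python
import random


def approximate_randomization(scores_a, scores_b, trials=10000, seed=0):
    """Estimate a two-sided p-value for the difference in mean scores.

    In each trial, the two systems' scores are swapped independently per item
    with probability 0.5; the p-value is the share of trials whose absolute
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n
    extreme = 0
    for _ in range(trials):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:
                a, b = b, a
            diff += a - b
        # Small tolerance guards against floating-point rounding.
        if abs(diff) / n >= observed - 1e-12:
            extreme += 1
    return (extreme + 1) / (trials + 1)
```

A small p-value indicates that the observed gap between two systems is unlikely to arise from randomly reassigning their outputs.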
Effectiveness of opinion selection
For opinion selection, we conduct an ablation study to investigate the contribution of the two salience scores: the relevance score and the sentiment score. As shown in Table 4, removing the relevance score drops R1 and R2 by 5.1 and 5.2, respectively. Similarly, without sentiment, R1 and R2 drop by 6.1 and 7.5. This demonstrates that both scores are necessary to capture the salience of an opinion segment.
Finally, we back off our opinion selection procedure to the greedy method to have a fairer comparison with the baseline. As shown in Table 4 (w/o ILP), under the same greedy strategy, our method still outperforms the baselines, but using ILP can further improve the results.
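A greedy selection strategy of the kind used in this comparison can be sketched as follows. The details are assumptions: segments are visited in order of decreasing salience, a segment is skipped if it exceeds the remaining word budget or is too similar to an already selected one, and the parameter `max_similarity` is a hypothetical name for the redundancy threshold.

```python
def greedy_select(salience, lengths, similarity, max_words, max_similarity):
    """Pick segments by decreasing salience under length and redundancy checks."""
    chosen, used = [], 0
    for i in sorted(range(len(salience)), key=lambda i: -salience[i]):
        if used + lengths[i] > max_words:
            continue  # does not fit into the remaining budget
        if any(similarity[i][j] > max_similarity for j in chosen):
            continue  # redundant with an already selected segment
        chosen.append(i)
        used += lengths[i]
    return chosen


no_overlap = [[0.0] * 3 for _ in range(3)]
picked = greedy_select([0.9, 0.6, 0.6], [10, 5, 5], no_overlap,
                       max_words=10, max_similarity=0.5)
```

In this toy instance the greedy strategy takes the single most salient segment (total salience 0.9) and exhausts the budget, although the other two segments together would reach 1.2. This is the kind of sub-optimal outcome that a global search avoids.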
Effectiveness of seed-words
For summarization, we extract the seed-words from external information, whereas those used in MATE are extracted from customer reviews with the help of aspect labels. Figure 3 shows the distribution of the two seed-sets in the word embedding space. We analyzed the difference between the two seed-sets and found that a portion of the words in one seed-set do not appear in the other. Even the shared seed-words have different weights. Another observation is that the seed-words from feature descriptions tend to be nouns, while those from review texts contain more adjectives. This is also reflected in Figure 3, where the words from the two seed-sets are separated into two parts. It reflects the fact that the content of feature descriptions is more objective than that of customer reviews, making it a better source for analyzing aspect relevance than the reviews themselves.
We then replace our seed-words with those used in MATE to delineate the contributions of the model from that of the seed-set. When using the same seed-words, our model achieves 45.6 and 24.5 for ROUGE-1 and ROUGE-2, which are still better than the results of MATE. This indicates that the model itself also contributes to the performance gain.
Finally, we analyze the effect of two seed-related hyperparameters on the ROUGE metrics: the size of the seed-set and the similarity threshold of seed-words (see Eq. 8). We vary the size of the seed-set from 10 to 200, and the threshold from 0.1 to 0.5. The results are shown in Figure 4. When there are only a few seed-words, the model performance increases rapidly as the seed-set grows. For larger seed-sets, the number of noisy words increases, which slightly hurts the performance. Meanwhile, we find that our model is also robust to the choice of the threshold, especially for small values.
Table 5 shows summaries of the same product generated by MATE, our method (AspMemSum), and one of the human annotators. Similar to humans, MATE and AspMemSum are also able to select aspect-related opinions. The difference is that AspMemSum learns these aspects without any human effort.
7 Conclusion
In this work, we propose a generative approach to create summaries from online product reviews without task-specific human annotation. At the model level, we introduce the aspect-aware memory to fully leverage the domain knowledge. It also reduces the number of parameters and the computation cost of the model. At the data level, we collect the domain knowledge from external information rather than through human effort, which makes the proposed method easier to adapt to other product categories. By comparing with state-of-the-art models on both aspect identification and opinion summarization, we experimentally demonstrate the effectiveness of our approach. Future work can design better measures for opinion selection and incorporate abstractive methods to enhance the readability of the generated summaries.
References
- [1] (2018) Summarizing opinions: aspect extraction meets sentiment prediction and they are both weakly supervised. In Proceedings of the 2018 Conference on EMNLP, pp. 3675–3686.
- [2] (2019) Unsupervised multi-document opinion summarization as copycat-review generation. arXiv preprint arXiv:1911.02247.
- [3] (2015) Ranking with recursive neural networks and its application to multi-document summarization. In 29th AAAI Conference.
- [4] (1992) The statistical significance of the MUC-4 results. In Proceedings of the 4th MUC, pp. 30–50.
- [5] (2019) MeanSum: a neural model for unsupervised multi-document abstractive summarization. In ICML, pp. 1223–1232.
- [6] (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of NAACL-HLT, pp. 4171–4186.
- [7] (2014) A hybrid approach to multi-document summarization of opinions in reviews. In Proceedings of the 8th INLG Conference, pp. 54–63.
- [8] (2015) Towards opinion summarization from online forums. In Proceedings of RANLP, pp. 138–146.
- [9] (2017) Lexicons on demand: neural word embeddings for large-scale text analysis. In IJCAI, pp. 4836–4840.
- [10] (2012) Text-level discourse parsing with rich linguistic features. In Proceedings of the 50th ACL, pp. 60–68.
- [11] (2010) Opinosis: a graph based approach to abstractive summarization of highly redundant opinions. In Proceedings of Coling 2010, pp. 340–348.
- [12] (2017) An unsupervised neural attention model for aspect extraction. In Proceedings of the 55th ACL, pp. 388–397.
- [13] (2016) Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th international conference on WWW, pp. 507–517.
- [14] (2004) Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on KDD, pp. 168–177.
- [15] (2019) Training neural networks for aspect extraction using descriptive keywords only. In The 2nd Learning from Limited Labeled Data (LLD) Workshop.
- [16] (2011) Comprehensive review of opinion summarization. Technical report, UIUC.
- [17] (2014) Adam: a method for stochastic optimization. arXiv:1412.6980.
- [18] (2002) From single to multi-document summarization. In Proceedings of the 40th ACL.
- [19] (2004) ROUGE: a package for automatic evaluation of summaries. Text Summarization Branches Out.
- [20] (2015) Sentiment analysis: mining opinions, sentiments, and emotions. Cambridge University Press.
- [21] (2015) Fine-grained opinion mining with recurrent neural networks and word embeddings. In Proceedings of the 2015 Conference on EMNLP, pp. 1433–1443.
- [22] (2008) Visualizing data using t-SNE. JMLR 9 (Nov), pp. 2579–2605.
- [23] (2007) A study of global inference algorithms in multi-document summarization. In European Conference on Information Retrieval, pp. 557–564.
- [24] (2007) Topic sentiment mixture: modeling facets and opinions in weblogs. In Proceedings of the 16th international conference on WWW, pp. 171–180.
- [25] (2013) Distributed representations of words and phrases and their compositionality. In NIPS, pp. 3111–3119.
- [26] (2010) Opinion summarization with integer linear programming formulation for sentence extraction and ordering. In Proceedings of the 23rd ICCL: Posters, pp. 910–918.
- [27] (1989) Computer-intensive methods for testing hypotheses. Wiley New York.
- [28] (2009) An unsupervised approach to product attribute extraction. In European Conference on Information Retrieval, pp. 796–800.
- [29] (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on EMNLP, pp. 1631–1642.
- [30] (2007) Manifold-ranking based topic-focused multi-document summarization. In IJCAI, Vol. 7, pp. 2903–2908.
- [31] (2016) Mining aspect-specific opinion using a holistic lifelong topic model. In Proceedings of the 25th international conference on WWW, pp. 167–176.
- [32] (2014) Memory networks. arXiv:1410.3916.
- [33] (2013) A biterm topic model for short texts. In Proceedings of the 22nd international conference on WWW, pp. 1445–1456.