E-commerce websites like Amazon.com incorporate Product Community Question Answering (PCQA) to provide additional information about their products. Questions are usually posted by customers before purchase, and answers are provided by existing product owners or sellers. Compatibility is one popular topic in PCQA. As shown in Figure 1, a customer may write a question like “Will it work with Surface Pro 3?”, and an existing customer may reply with “Yes.” From the 4 QA pairs discussing a Microsoft mouse, we know that the mouse is compatible with “Microsoft Surface Pro 3” and “Windows 10” but incompatible with “iPad”. However, we cannot tell whether “Samsung Galaxy Tab 2 10.0” is compatible with this mouse. Similar to our previous work on product reviews [Xu et al.2016], we call the mouse the target entity and those 4 products its complementary entities. Each complementary entity forms a complementary relation with the target entity, and each yes/no answer further assigns a compatibility label to its complementary entity.
Knowing which entities are compatible and which are not is important because customers need to buy compatible products and avoid incompatible ones. It is also important for manufacturers to be aware of the compatibility issues of their products. Further, recommender systems need to be aware of such issues to avoid recommending incompatible products to their customers.
Problem Statement: We deal with the problem of identifying compatible and incompatible products from QA pairs in PCQA. More specifically, given a yes/no QA pair, we want to recognize complementary entities from questions and assign compatibility labels (compatible, incompatible or unknown) to them according to the polarity (yes, no or neutral) of the answers.
We observe that compatibility issues are mostly discussed via yes/no questions rather than open questions, because customers tend to ask specific questions in PCQA. We leave mining compatible/incompatible products from open questions to future work. Given the structure of a QA pair, our method naturally has a two-stage framework: Complementary Entity Recognition (CER) [Xu et al.2016] and yes/no answer classification. For the first stage, we employ an approach similar to that in [Xu et al.2016]; the second stage reduces to a yes/no answer classification problem [McAuley and Yang2016]. We observe that the second stage offers a further research opportunity, since the polarities of many yes/no answers are implicit. For example, in “Will it work with Surface Pro 3? It works.”, the answer has no explicit “Yes” but is still a yes answer. Therefore, exploiting implicit yes/no answers can help identify even more compatible/incompatible entities.
To the best of our knowledge, there is no large annotated dataset of implicit yes/no answers for PCQA. To avoid time-intensive annotation, we leverage a distant PU-learning (learning from positive and unlabeled examples) method [Liu et al.2003, Elkan and Noto2008] that uses no human-annotated answers. This is possible due to a simple observation: the leading “Yes” or “No” word in an explicit yes/no answer can serve as a distant label, and these distantly labeled answers can be used to identify implicit yes/no answers. For example, “Yes, it works” and “It works” have the same polarity, but the first answer is explicit and the second is implicit. The leading word “Yes” labels “Yes, it works” as a yes answer, and the implicit answer “It works” may then also be labeled as a yes answer because of its similarity to the explicit one.
Besides yes and no answers, we assume that there are also neutral answers. For example, the last answer in Figure 1 is a neutral answer, and there is no obvious distant label for this type of answer. PU learning comes to the rescue, since it requires only positive examples, and we already have many unlabeled answers. The idea for obtaining positive examples is simple: we use explicit answers (both yes and no answers) as positive examples, and PU learning generalizes from them to implicit answers. Since all the explicit answers are distantly labeled, no human annotation is required. To further separate yes answers from no answers, we apply a binary classifier trained on explicit yes/no answers to all answers that PU learning labels as positive.
The major contributions of this paper can be summarized as follows: we propose the problem of mining compatible/incompatible products from PCQA, and we propose a two-stage framework that solves this problem without using any human-annotated data. The rest of this paper is organized as follows: we describe related work in Section 2; we describe the proposed two-stage framework in Sections 3 and 4; we present experiments in Section 5 and then conclude the paper.
2 Related Works
The problem of Complementary Entity Recognition (CER) was first proposed by Xu et al. [Xu et al.2016]. However, our previous work focuses on product reviews and considers CER as a special kind of aspect extraction problem [Liu2015], where determining the polarity of compatibility is reduced to a traditional sentiment classification problem. This paper focuses on yes/no QAs in PCQA, where determining the polarity of compatibility becomes a yes/no answer classification problem.
CER is closely related to entity recognition (e.g., the Named Entity Recognition (NER) problem [Nadeau and Sekine2007, Zhou and Su2002]). The major differences are that many complementary entities are not named entities and that CER relies heavily on the context of an entity (e.g., “iPhone” in “I like my iPhone” is not a complementary entity). Complementary entities are also studied as a social network problem in recommender systems [Zheng et al.2009, McAuley et al.2015]. We discussed the benefits of CER over the social network formulation in [Xu et al.2016], so we omit that discussion here but keep a performance comparison in Section 5.
Community Question and Answering (CQA) has been well studied in the literature [Liu et al.2008, Nam et al.2009, Li and King2010, Anderson et al.2012]. More specifically, product Community Question and Answering (PCQA) is studied in [McAuley and Yang2016, Liu et al.2016]. Both works study the relevance between reviews and questions. [McAuley and Yang2016] takes questions from PCQA as queries and retrieves relevant reviews that can answer them. [Liu et al.2016] considers questions in PCQA as summaries of reviews to help customers identify relevant reviews.
Extracting compatible/incompatible products from PCQA is important. Based on our experience annotating PCQA, we notice that PCQA usually addresses compatibility issues that are not well covered by product descriptions. This is because the number of complementary products for a target product can be unlimited, so it is impractical for a description to cover all of them. We also use the test dataset of [Xu et al.2016] for comparison (Section 5). We notice that PCQA addresses compatibility issues from a different perspective than product reviews: PCQA tends to be specific about compatibility issues, whereas reviewers are free to talk about their experiences (e.g., opinions on features/aspects). For example, customers tend to ask specific questions like “Will it work with Surface Pro 3?” rather than “Will it work with my tablet?”, since the latter is too vague to be answered; reviews, in contrast, are typical datasets for opinion mining and aspect extraction [Liu2015]. Also, general complementary products are common in reviews (e.g., “It works with my tablet.”), since reviewers do not need to specify which tablet they have.
Determining the polarity of a yes/no answer is closely related to answer summarization subtask B in SemEval-2015 Task 3 [Màrquez et al.2015]. The proposed problem differs from subtask B in that our problem indirectly uses the polarity of an answer to classify a complementary entity, rather than directly summarizing the usefulness of an answer to a question. McAuley and Yang [McAuley and Yang2016] classify the polarity of a PCQA answer by simply training an SVM on unigrams of labeled answers. From their predictions, we observe that they seem to label only explicit yes/no answers (e.g., answers beginning with “Yes” or “No”) and mark many implicit answers (e.g., “I think it works.”, which implies a yes answer) as uncertain. Identifying more implicit yes/no answers is crucial to the proposed problem, since a complementary entity does not provide much information without its compatibility label (compatible, incompatible or unknown).
3 Two-stage Framework and CER
In this section, we first introduce the two-stage framework of the proposed method. Then we briefly introduce the method for CER in [Xu et al.2016].
3.1 Two-stage Framework
Since complementary entities are mentioned in yes/no questions and the polarities of their compatibility are expressed in the answers, the proposed method naturally has a two-stage framework:
Complementary Entity Recognition: we extract complementary entities from questions using dependency paths nearly identical to those in [Xu et al.2016]. This stage uses a large number of unlabeled reviews from the same category as the target entity to expand knowledge about domain-specific verbs.
Identifying Polarities of Yes/No Answers: we then determine the polarity (yes, no or neutral) of the answer to each question containing a complementary entity and assign the corresponding compatibility label (compatible, incompatible or unknown) to that entity. We realize this 3-class classification by combining PU learning with a binary SVM classifier (Section 4).
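The two stages can be sketched as a small pipeline. This is a minimal illustration, not the authors' code: `mine_compatibility`, `toy_extractor` and `toy_classifier` are hypothetical names, and the two toy stand-ins (a regular expression and a first-word check) replace the dependency-path CER and the PU-learning classifier described in the text.

```python
import re
from typing import Callable, Dict, List, Tuple

# Answer polarity (stage 2) -> compatibility label, as defined in the text.
POLARITY_TO_LABEL = {"yes": "compatible", "no": "incompatible", "neutral": "unknown"}


def mine_compatibility(
    qa_pairs: List[Tuple[str, str]],
    extract_entities: Callable[[str], List[str]],
    classify_answer: Callable[[str], str],
) -> Dict[str, str]:
    """Stage 1 extracts entities from the question; stage 2 labels them from the answer."""
    labels: Dict[str, str] = {}
    for question, answer in qa_pairs:
        entities = extract_entities(question)
        if not entities:
            continue  # answers are classified only when CER finds an entity
        label = POLARITY_TO_LABEL[classify_answer(answer)]
        for entity in entities:
            labels[entity] = label
    return labels


# Hypothetical stand-ins, for illustration only.
def toy_extractor(question: str) -> List[str]:
    match = re.search(r"work with (.+?)\?", question)
    return [match.group(1)] if match else []


def toy_classifier(answer: str) -> str:
    words = answer.split()
    first = words[0].strip(",.!").lower() if words else ""
    return first if first in ("yes", "no") else "neutral"


pairs = [
    ("Will it work with Surface Pro 3?", "Yes."),
    ("Will it work with iPad?", "No, it does not."),
    ("Will it work with Samsung Galaxy Tab 2 10.0?", "Not sure."),
]
print(mine_compatibility(pairs, toy_extractor, toy_classifier))
# {'Surface Pro 3': 'compatible', 'iPad': 'incompatible', 'Samsung Galaxy Tab 2 10.0': 'unknown'}
```

Either stand-in can be swapped for a real component without touching the pipeline, which mirrors the paper's separation of CER from answer classification.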
3.2 Complementary Entity Recognition
We briefly introduce the method used in [Xu et al.2016] and how its dependency paths can be applied to questions in PCQA (details of the dependency paths can be found in the original paper). The basic idea is to use dependency paths to identify the context of complementary relations around complementary entities. Dependency paths match dependency relations produced by dependency parsing (we use Stanford CoreNLP, http://stanfordnlp.github.io/CoreNLP/, as our dependency parser), which parses a sentence into a set of dependency relations. In our previous work, we noticed that the verbs used to indicate a complementary relation can be unlimited and product-specific. So we used another novel set of dependency paths with high precision but low recall to expand knowledge about complementary entities from a large number of unlabeled reviews. We use similar ideas in this paper, since verbs in PCQA questions are also unlimited and product-specific. However, we do not incorporate candidate complementary entities into the dependency paths when performing extractions, because complementary entities in PCQA are specific and diverse, and general entities are rarely mentioned.
We still keep candidate complementary entities when expanding knowledge about domain-specific verbs; the knowledge expansion process is the same as in our previous work. Starting from the seed verbs “work” and “fit”, we first expand candidate complementary entities on the large set of unlabeled reviews. Then we use those candidate complementary entities to expand domain-specific verbs, e.g., “insert” for micro SD card and “hold” for tablet stand. We use reviews rather than PCQA questions to expand domain knowledge because the same general complementary entities (e.g., “tablet”) appear across many different reviews, whereas a specific entity such as “Samsung Galaxy S6” may occur only rarely in PCQA.
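To make the idea of matching dependency paths concrete, the sketch below applies one hypothetical path, `<domain verb> --nmod:with--> <entity>`, to a hand-written parse of “Will it work with Surface Pro 3?”. The relation names follow Universal Dependencies conventions, but the parse is written by hand rather than produced by CoreNLP, and the real path set of [Xu et al.2016] is larger and is not reproduced here; `match_complementary` is a hypothetical helper.

```python
from typing import List, Set, Tuple

# (relation, governor index, dependent index); indices point into the token list.
Dep = Tuple[str, int, int]

# Modifiers of the entity head that are kept to rebuild multi-word product names.
NAME_MODIFIERS = {"compound", "nummod", "amod"}


def match_complementary(tokens: List[str], deps: List[Dep], verbs: Set[str]) -> List[str]:
    """Match the hypothetical path <verb> --nmod:with--> <head> and return entity strings."""
    entities = []
    for rel, gov, dep in deps:
        if rel == "nmod:with" and tokens[gov].lower() in verbs:
            span = {dep} | {d for r, g, d in deps if g == dep and r in NAME_MODIFIERS}
            entities.append(" ".join(tokens[i] for i in sorted(span)))
    return entities


# Hand-written parse, for illustration only.
tokens = ["Will", "it", "work", "with", "Surface", "Pro", "3", "?"]
deps = [
    ("aux", 2, 0), ("nsubj", 2, 1), ("case", 5, 3), ("compound", 5, 4),
    ("nmod:with", 2, 5), ("nummod", 5, 6), ("punct", 2, 7),
]
print(match_complementary(tokens, deps, {"work", "fit"}))  # ['Surface Pro 3']
```

Because the verb set is a parameter, the same path fires for new domain-specific verbs once knowledge expansion has added them.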
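The two-step knowledge expansion can be sketched as a single bootstrapping round over (verb, entity) pairs. This is a simplification under stated assumptions: it presumes the pairs have already been extracted from unlabeled reviews by the high-precision dependency paths, and the frequency threshold `min_count` and the function name `expand_verbs` are hypothetical, not taken from [Xu et al.2016].

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def expand_verbs(
    pairs: Iterable[Tuple[str, str]], seed_verbs: Set[str], min_count: int = 2
) -> Tuple[Set[str], Set[str]]:
    """One bootstrapping round: seed verbs -> candidate entities -> new domain verbs."""
    pairs = list(pairs)
    # Step 1: candidate complementary entities are those governed by a seed verb.
    entities = {entity for verb, entity in pairs if verb in seed_verbs}
    # Step 2: non-seed verbs seen with candidate entities at least min_count times.
    counts = Counter(verb for verb, entity in pairs
                     if entity in entities and verb not in seed_verbs)
    new_verbs = {verb for verb, count in counts.items() if count >= min_count}
    return entities, seed_verbs | new_verbs


# Toy pairs standing in for extractions from tablet-stand reviews.
pairs = [("work", "tablet"), ("fit", "phone"), ("hold", "tablet"),
         ("hold", "phone"), ("buy", "tablet"), ("hold", "kindle")]
entities, verbs = expand_verbs(pairs, {"work", "fit"})
print(sorted(entities), sorted(verbs))  # ['phone', 'tablet'] ['fit', 'hold', 'work']
```

Here “hold” is learned because it co-occurs with two seed-derived entities, while “buy” falls below the assumed threshold.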
4 Identifying the Polarities of Yes/No Answers
After CER, we need to identify whether a complementary product is compatible with the target product. We assume that a yes/no answer clearly identifies the polarity of the compatibility between a complementary entity and the target entity. We classify the polarity of an answer only when a complementary entity has been successfully extracted from its question.
We assume that a large set of annotated yes and no answers is not available for training. We observe that an explicit “Yes” or “No” at the beginning of an answer indicates a yes or no answer, respectively, so it can be used for prediction directly. However, not every answer in PCQA begins with an explicit “Yes” or “No”, yet its polarity can still be expressed implicitly. For example, “Yes, it works.” and “It works.” have the same yes polarity, but the latter answer lacks the explicit word “Yes”. On the test data in Section 5, we observe that relying only on explicit mentions of “Yes” or “No” yields about 60% accuracy on yes/no answer classification. Without identifying the implicitly expressed polarities, the polarities of compatibility for many complementary entities remain uncertain.
4.2 Distant PU-Learning Classifier and Binary Classifier
We split the classification task between two classifiers. First, we use PU learning to train a classifier that separates yes/no answers from neutral answers. Second, we train a yes/no binary classifier on the explicit yes/no examples.
From the previous examples “Yes, it works.” and “It works.”, we observe that the leading word “Yes” or “No” is optional for a yes or no answer, respectively. So “Yes” can serve as a distant label for the training example “It works”. We select all answers beginning with “Yes” or “No” as training examples, take their first words as distant labels, and transform the remaining words of each answer into features. However, there is no obvious distant label for neutral answers (e.g., “I am not sure.”), so it is impossible to train a 3-class classifier directly.
PU learning is a machine learning method that uses only positive and unlabeled examples (no negative examples are labeled). To get positive examples, we combine all examples distantly labeled by “Yes” or “No” (the first word of an answer). Unlabeled examples can be easily collected from PCQA answers whose first word is not “Yes” or “No”. Note that unlabeled examples contain both implicit yes/no answers and neutral answers. We use the implementation of PU learning described in [Liu et al.2003].
Lastly, we train a yes/no binary classifier using the same positive examples (explicit “Yes” or “No” answers) as in PU learning, but this time we keep the distant labels “Yes” and “No” separate. By combining the PU-learning classifier and the binary classifier, we effectively build a 3-class classifier for implicit yes/no answer classification.
During testing, we ensemble the first-word “Yes”/“No” prediction, the PU-learning classifier and the binary classifier. We first detect whether the answer is an explicit yes/no answer by checking its first word. If the first word is “Yes” (or “No”), we output the label yes (or no); otherwise, we use the PU-learning classifier to predict whether the answer is an implicit yes/no answer or neutral. If it outputs negative, we consider the answer neutral; otherwise, we consider it an implicit yes/no answer and use the binary classifier to predict yes or no. In Section 5, we demonstrate this method using SVM as the base classifier for both the PU-learning classifier and the binary classifier; in principle, other base classifiers can also be adopted.
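The distant labeling step can be sketched as follows: the first token of an answer becomes its label when it is “yes” or “no”, and the remaining tokens become the training text. The lowercase regex tokenizer and the name `distant_label` are assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple


def distant_label(answer: str) -> Tuple[Optional[str], List[str]]:
    """Return (distant label or None, feature tokens) for one answer."""
    tokens = re.findall(r"[a-z0-9']+", answer.lower())
    if tokens and tokens[0] in ("yes", "no"):
        # Explicit answer: the first word is the label and is dropped from the features.
        return tokens[0], tokens[1:]
    # No distant label: an implicit yes/no or neutral answer, i.e. an unlabeled example.
    return None, tokens


for text in ["Yes, it works.", "No. It won't fit.", "It works.", "I am not sure."]:
    print(distant_label(text))
```

Dropping the leading word matters: if it stayed in the features, a classifier could separate the training examples by that word alone and would learn little that transfers to implicit answers such as “It works.”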
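A common two-step PU scheme first picks reliable negatives from the unlabeled set and then trains a classifier on positives versus those negatives. The pure-Python sketch below follows that outline under simplifying assumptions: a centroid (Rocchio-style) comparison selects reliable negatives, and a nearest-centroid rule replaces the SVM-based second step, so it is not the [Liu et al.2003] implementation used in the paper. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vec = Dict[str, float]


def unit(tokens: List[str]) -> Vec:
    """L2-normalized term-frequency vector."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}


def centroid(vecs: List[Vec]) -> Vec:
    out: Vec = {}
    for v in vecs:
        for t, x in v.items():
            out[t] = out.get(t, 0.0) + x / len(vecs)
    return out


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * b.get(t, 0.0) for t, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_pu(positive: List[List[str]],
             unlabeled: List[List[str]]) -> Callable[[List[str]], bool]:
    pos = [unit(d) for d in positive]
    unl = [unit(d) for d in unlabeled]
    c_pos, c_unl = centroid(pos), centroid(unl)
    # Step 1: unlabeled answers closer to the unlabeled centroid are reliable negatives.
    reliable_neg = [u for u in unl if cosine(u, c_unl) > cosine(u, c_pos)]
    c_neg = centroid(reliable_neg) if reliable_neg else {}

    # Step 2 (simplified): nearest-centroid rule, positives vs. reliable negatives.
    def is_yes_no(tokens: List[str]) -> bool:
        v = unit(tokens)
        return cosine(v, c_pos) > cosine(v, c_neg)

    return is_yes_no


# Positives: explicit answers with the leading "yes"/"no" already removed.
positive = [["it", "works"], ["it", "works", "great"],
            ["it", "does", "not", "fit"], ["works", "with", "my", "tablet"]]
unlabeled = [["it", "works", "fine"], ["not", "sure"], ["i", "do", "not", "know"],
             ["sorry", "not", "sure"], ["i", "have", "no", "idea"]]
is_yes_no = train_pu(positive, unlabeled)
print(is_yes_no(["it", "works", "fine"]), is_yes_no(["not", "sure"]))  # True False
```

The implicit answer “it works fine” stays out of the reliable negatives because it resembles the positives, which is the behavior the distant PU setup relies on.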
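The test-time cascade reads naturally as a short function. In this sketch the PU classifier and the binary classifier are passed in as callables; `classify_answer`, `toy_pu` and `toy_binary` are hypothetical, and answers are assumed to be already lowercased and tokenized.

```python
from typing import Callable, List


def classify_answer(
    tokens: List[str],
    is_yes_no: Callable[[List[str]], bool],  # PU-learning classifier
    yes_or_no: Callable[[List[str]], str],   # binary classifier trained on explicit answers
) -> str:
    """Cascade: explicit first word, then PU classifier, then binary classifier."""
    if tokens and tokens[0] in ("yes", "no"):
        return tokens[0]           # explicit answer
    if not is_yes_no(tokens):
        return "neutral"           # PU classifier outputs negative
    return yes_or_no(tokens)       # implicit yes/no answer


# Toy stand-ins for the two trained classifiers, for illustration only.
def toy_pu(tokens: List[str]) -> bool:
    return "sure" not in tokens


def toy_binary(tokens: List[str]) -> str:
    return "no" if "not" in tokens else "yes"


print(classify_answer(["yes", "it", "works"], toy_pu, toy_binary))        # yes
print(classify_answer(["not", "sure"], toy_pu, toy_binary))               # neutral
print(classify_answer(["it", "does", "not", "fit"], toy_pu, toy_binary))  # no
```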
5 Experimental Results
In this section, we first describe the dataset used for testing; then we introduce the evaluation methods and the baselines; lastly, we analyze the results.
We crawl questions with at least one answer from the product Community Question and Answering of Amazon.com and choose 4 products for testing: “stylus”, “micro SD card”, “mouse” and “tablet stand”. We label the complementary entities mentioned in each question and label each answer as yes, no or neutral. The whole test dataset is labeled by 3 annotators independently. The initial agreement is 93%; disagreements are then discussed until all annotators reach agreement. To obtain knowledge about domain-specific verbs, we use 6000 reviews for each product, as in [Xu et al.2016]. We also select about 220 reviews for each product and label them in a similar way to show the difference between the product QA community and reviews. The agreement for reviews is 82%. The statistics of the datasets can be found in Table 1 (the dataset will be available on the first author’s website: https://www.cs.uic.edu/~hxu/).
| Micro SD Card | 277 | 352 | 223 | 0.63 | 200 | 162 | 16 | 45 |
| Micro SD Card | 216 | 802 | 193 | 0.24 | 134 | 173 | 15 | 5 |
We observe that PCQA has a higher density of complementary entity mentions (complementary products per sentence). Further, complementary entities in PCQA tend to be unique, since repeatedly asking the same question does not make sense. So identifying complementary entities from PCQA is much more effective than from customer reviews. Based on our annotation experience, the complementary entities mentioned in PCQA and in customer reviews are different. In PCQA, potential buyers frequently mention specific complementary entities as named entities (e.g., “Microsoft Surface Pro 3”) to make their questions more precise; in customer reviews, complementary entities can be general products like “tablet” or “phone”, which are much less informative than specific products.
We also read the product descriptions of these 4 products and count the number of compatible products mentioned, including general products like “Android tablets”. There are 13, 9, 5 and 55 compatible products for the stylus, micro SD card, mouse and tablet stand, respectively, and no incompatible products are mentioned in the descriptions. So we can conclude that PCQA provides more information about compatibility issues than product descriptions do.
5.2 Compared Methods and Evaluation Methods
We first evaluate CER and yes/no answer classification separately, and then combine the two stages to evaluate the overall accuracy. For CER, we count true positives (TP), false positives (FP) and false negatives (FN) to compute precision P = TP/(TP+FP), recall R = TP/(TP+FN) and F1-score F1 = 2PR/(P+R). We consider each question as an instance: the dependency paths are applied to each sentence in the question, and the extractions are combined to form one prediction. A prediction counts as one true positive when the extracted complementary products match the labeled complementary entities of the question; extracting more or fewer complementary entities than labeled counts as one false positive; failing to extract anything from a question counts as one false negative.
Noun Phrase Chunker: Since most product names mentioned in PCQA questions are noun phrases, we use the same noun phrase chunking pattern as the proposed method to extract noun phrases directly from questions and take them as complementary products.
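The per-question counting rule and the resulting scores can be sketched as below. One reading is assumed where the text is silent: questions with neither gold nor predicted entities are skipped, and a non-empty prediction on a question without gold entities counts as a false positive. `cer_scores` is a hypothetical name.

```python
from typing import List, Set, Tuple


def cer_scores(predicted: List[Set[str]], gold: List[Set[str]]) -> Tuple[float, float, float]:
    """Question-level precision, recall and F1 following the counting rule in the text."""
    tp = fp = fn = 0
    for pred, true in zip(predicted, gold):
        if pred and pred == true:
            tp += 1  # extracted entities match the labeled ones exactly
        elif pred:
            fp += 1  # more or fewer entities than labeled
        elif true:
            fn += 1  # nothing extracted although entities were labeled
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


predicted = [{"surface pro 3"}, {"ipad", "case"}, set(), {"windows 10"}]
gold = [{"surface pro 3"}, {"ipad"}, {"galaxy tab"}, {"windows 10"}]
print([round(x, 3) for x in cer_scores(predicted, gold)])  # [0.667, 0.667, 0.667]
```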
UIUC NER: We use UIUC Named Entity Tagger [Ratinov and Roth2009] to perform Named Entity Recognition (NER) on questions in PCQA. UIUC NER has 18 labels in total and we consider words or phrases labeled as “PRODUCT” or “ORG” as predictions of complementary products.
Sceptre: We also retrieve the top 25 complements for the same 4 products from Sceptre [McAuley et al.2015] and adapt their results for comparison. A direct comparison is impossible, since they address a link prediction problem and treat “Items also bought” as complementary products for training/testing. We label the top 25 predictions, compute their precision, and assume annotators have the same background knowledge for both their dataset and ours. We observe that the predicted complementary products include irrelevant products like “network cables” and “mother board”, and that all 4 products have similar complementary products. The predictions we judge as complementary are mostly “Windows” for “Mouse”.
CER6K: This method is the method proposed in [Xu et al.2016]. Specifically, it uses 6000 reviews to expand domain-specific verbs.
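| Product | NP Chunker P | NP Chunker R | NP Chunker F1 | UIUC NER P | UIUC NER R | UIUC NER F1 | Sceptre P | CER6K P | CER6K R | CER6K F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Micro SD Card | 0.734 | 0.632 | 0.68 | 0.843 | 0.336 | 0.481 | 0.16 | 0.973 | 0.798 | 0.877 |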
| Product | Yes/No | Sentiment Parser | One-Class SVM | 3-Class SVM | PU SVM | Overall Results |
|---|---|---|---|---|---|---|
| Micro SD Card | 0.673 | 0.646 | 0.7 | 0.682 | 0.776 | 0.755 |
Next, we evaluate yes/no answer classification separately. We assume that complementary entity extraction is 100% accurate, so extraction errors do not affect answer classification. We only classify answers to questions that have labeled complementary entities.
Yes/No: This simple baseline predicts the polarities of yes/no answers based on the first “Yes” or “No” word in an answer; if the first word is not “Yes” or “No”, it predicts the answer as neutral.
Sentiment Parser: We use the RNN-based sentiment parser [Socher et al.2013] to get the sentiment polarity of the first sentence of each answer. We observe that opinions expressed in answers can indicate the polarities of answers; for example, “It works well.” indicates a positive answer. We use the sentiment parsing results to obtain more implicit yes/no answers and combine them with the explicit answers output by the Yes/No baseline.
One-Class SVM(Bigram): Similar to PU learning, one-class SVM is a classifier that does not use negative training examples. Unlike PU learning, however, one-class SVM does not use unlabeled data during training, so neutral answers are seen only at test time. We train the one-class SVM on 20000 explicit yes/no answers, using scikit-learn (http://scikit-learn.org/) as the implementation. Then, similar to the proposed method, we train a yes/no binary SVM classifier and pipeline the Yes/No baseline, the one-class SVM and the binary yes/no classifier together.
3-Class SVM(Bigram): We train a 3-class SVM classifier using the answer predictions from [McAuley and Yang2016]. Their method originally uses 1000 labeled examples as training data for answer prediction. Since that labeled training data is not available, we use their predictions as training data. We select 4000 examples for each class as training examples and ensemble the results with the Yes/No baseline.
PU SVM(Bigram): This is the method described in Section 4. We use 3000 yes answers and 3000 no answers as positive examples, together with 6000 unlabeled answers. We use bigrams as features and the PU learning method of [Liu et al.2003] as the implementation of the PU learner.
Finally, we combine the results of CER6K and PU SVM to get the Overall Results.
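The PU SVM(Bigram) baseline represents each answer by its bigrams. A minimal sketch of such a feature extractor, assuming tokenized input and hypothetical sentence-boundary markers `<s>` and `</s>` so that one-word answers still yield features, is:

```python
from collections import Counter
from typing import Dict, List


def bigram_features(tokens: List[str]) -> Dict[str, int]:
    """Count adjacent token pairs, padded with assumed boundary markers."""
    padded = ["<s>"] + tokens + ["</s>"]
    return dict(Counter(f"{a} {b}" for a, b in zip(padded, padded[1:])))


print(bigram_features(["it", "works"]))
# {'<s> it': 1, 'it works': 1, 'works </s>': 1}
```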
5.3 Result Analysis
CER: From Table 2, we can see that CER6K performs the best. NP chunker performs better than UIUC NER because NER relies heavily on capitalization features, whereas PCQA often contains product names typed in lower case (e.g., “samsung” instead of “Samsung”). The precision of Sceptre is low because the “Items also bought” training data tend to be noisy for accessories. We further observe that the recall on “tablet stand” is relatively low. Examining the data, we find that many errors are due to parsing errors (e.g., the POS tagger treats “stand” as a verb, which makes the dependency parse incorrect).
Yes/No Answer Classification: Table 3 compares the results of yes/no answer classification. The first 5 methods are evaluated only on answers to questions that have human-annotated complementary entities; the last column shows the combined results of CER6K and PU SVM(Bigram). All numbers are classification accuracies over yes, no and neutral. Given the many explicit yes/no answers, the Yes/No baseline performs relatively well. The sentiment parser performs worse than the Yes/No baseline; examining its results, we find that it tends to produce more errors on negative opinions. 3-Class SVM(Bigram) does not improve much over the Yes/No baseline, because the predictions of [McAuley and Yang2016] are mostly explicit yes or no answers; we suspect that their method labels most implicit yes or no answers as neutral. PU learning performs better than one-class SVM because PU learning also leverages unlabeled data, even though its training set is smaller. In the overall results, we achieve an accuracy of around 70%.
In this paper, we propose the problem of mining compatible and incompatible products from product Community Question and Answering (PCQA). We propose a two-stage framework to solve this problem. We first extract complementary entities from each question using a dependency rule-based method; then we determine the labels of compatibility for complementary entities from the polarities of yes/no answers. We leverage a distant PU learning method to identify extra implicit polarities of yes/no answers without using any human-labeled training data. Experiments show that the proposed method can exploit more implicit answers.
This work is supported in part by NSF through grants IIS-1526499 and CNS-1626432. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X GPU used for this research.
- [Anderson et al.2012] Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, and Jure Leskovec. 2012. Discovering value from community activity on focused question answering sites: a case study of stack overflow. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 850–858. ACM.
- [Elkan and Noto2008] Charles Elkan and Keith Noto. 2008. Learning classifiers from only positive and unlabeled data. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 213–220. ACM.
- [Li and King2010] Baichuan Li and Irwin King. 2010. Routing questions to appropriate answerers in community question answering services. In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 1585–1588. ACM.
- [Liu et al.2003] Bing Liu, Yang Dai, Xiaoli Li, Wee Sun Lee, and Philip S Yu. 2003. Building text classifiers using positive and unlabeled examples. In Data Mining, 2003. ICDM 2003. Third IEEE International Conference on, pages 179–186. IEEE.
- [Liu et al.2008] Yandong Liu, Jiang Bian, and Eugene Agichtein. 2008. Predicting information seeker satisfaction in community question answering. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 483–490. ACM.
- [Liu et al.2016] Mengwen Liu, Yi Fang, Dae Hoon Park, Xiaohua Hu, and Zhengtao Yu. 2016. Retrieving non-redundant questions to summarize a product review. pages 385–394.
- [Liu2015] Bing Liu. 2015. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University Press.
- [Màrquez et al.2015] Lluís Màrquez, James Glass, Walid Magdy, Alessandro Moschitti, Preslav Nakov, and Bilal Randeree. 2015. Semeval-2015 task 3: Answer selection in community question answering. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015).
- [McAuley and Yang2016] J. McAuley and A. Yang. 2016. Addressing complex and subjective product-related queries with customer reviews. In World Wide Web.
- [McAuley et al.2015] J. J. McAuley, R. Pandey, and J. Leskovec. 2015. Inferring networks of substitutable and complementary products. In KDD.
- [Nadeau and Sekine2007] David Nadeau and Satoshi Sekine. 2007. A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1):3–26.
- [Nam et al.2009] Kevin Kyung Nam, Mark S Ackerman, and Lada A Adamic. 2009. Questions in, knowledge in?: a study of naver’s question answering community. In Proceedings of the SIGCHI conference on human factors in computing systems, pages 779–788. ACM.
- [Ratinov and Roth2009] Lev Ratinov and Dan Roth. 2009. Design challenges and misconceptions in named entity recognition. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pages 147–155. Association for Computational Linguistics.
- [Socher et al.2013] Richard Socher, Alex Perelygin, Jean Y Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the conference on empirical methods in natural language processing (EMNLP), pages 1631–1642.
- [Xu et al.2016] Hu Xu, Sihong Xie, Lei Shu, and Philip S. Yu. 2016. CER: Complementary entity recognition via knowledge expansion on large unlabeled product reviews. In Proceedings of IEEE International Conference on Big Data.
- [Zheng et al.2009] Jiaqian Zheng, Xiaoyuan Wu, Junyu Niu, and Alvaro Bolivar. 2009. Substitutes or complements: another step forward in recommendations. In Proceedings of the 10th ACM conference on Electronic commerce, pages 139–146. ACM.
- [Zhou and Su2002] GuoDong Zhou and Jian Su. 2002. Named entity recognition using an hmm-based chunk tagger. In proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 473–480. Association for Computational Linguistics.