|Microsoft Surface Pro 4 (128 GB, 4 GB RAM, Intel Core i5)|
|Q:|Can the M processor handle photoshop?|
|A:|It does run Photoshop very well on our internal test unit.|
|Q:|Does the surface pro 4 support the Google Play app store?|
|A:|No, it does not support Google Play.|
|Q:|can this run fallout 4|
|Q:|Does this connect to a 5G home wireless network?|
|Q:|Can you use this for sketching?|
Learning about the compatibility of a product that is functionally complementary to a to-be-purchased product is an important task in e-commerce. Before customers purchase a product (e.g., a mouse), it is natural for them to ask whether the to-be-purchased product can work properly with an intended complementary product (e.g., a laptop). Such a query is driven by customers’ needs for product functionality, where compatibility can be viewed as a special group of functions. In fact, a function need is the very first step of the purchase decision process. Whether a product can satisfy some function need (a.k.a. the satisfiability of a function need) even underlies the definition of a product: in marketing, a product is defined as “anything that can be offered to a market for attention, acquisition, use or consumption that might satisfy a want or need” [Kotler and Armstrong2010]. For sellers and manufacturers, the satisfiability of function needs is equally important, as being fully aware of existing and missing functions is crucial to increasing sales and improving products. Therefore, exchanging information about functions matters to customers, sellers, and manufacturers alike.
Given its importance, however, such function (we omit “need” for simplicity) information is not fully available in product descriptions. Just imagine the cost of compatibility test over the huge number of products, or sellers’ intention of hiding missing functions. Fortunately, customers may occasionally exchange such knowledge online with other customers or sellers. This allows us to adopt an NLP-based approach to automatically sense and harvest this knowledge on a large scale.
In this paper, we address two closely related novel NLP tasks: product compatibility analysis and product function satisfiability analysis. They are defined as follows.
Product Compatibility Analysis: Given a corpus of texts, identify all tuples (e1, e2, c), where e1 and e2 are a pair of complementary products (entities) and c indicates whether the two entities are compatible (1), incompatible (2), or uncertain (3).
Note that two complementary entities can be incompatible. For example, a mouse is a product functionally complementary to “Microsoft Surface Pro 4”, as the Surface Pro 4 doesn’t come with a mouse. However, due to different interfaces, not all mouse models work properly with the Surface Pro 4. By slightly extending e2 to a function expression f (e.g., identifying “work with Microsoft Surface Pro 4” instead of “Microsoft Surface Pro 4”), we obtain a more general task.
Product Function Satisfiability Analysis: Given the same corpus, identify all tuples (e1, f, s), where f is a function expression and s indicates whether e1 can satisfy the function (1), cannot satisfy it (2), or is uncertain (3).
Note that functions derived from complementary entities are just one type of function, called extrinsic functions. A function expression f may also describe a function derived from the product itself, called an intrinsic function. For example, “draw a picture” is an intrinsic function of “Microsoft Surface Pro 4”. A function expression may consist of a function word (e.g., “work with” or “draw”) and a function target (e.g., “Microsoft Surface Pro 4” or “picture”). A function target is thus a generalization of a complementary entity.
Two challenges arise immediately after formalizing these two tasks. First, the quality of the extracted tuples depends on the data source (namely, the corpus): corpora that are dense and accurate regarding compatibility and functionality information are preferred. Second, although general information extraction models are available, novel models that jointly identify the entities (or function expressions) and the compatibility (or satisfiability) in an end-to-end manner are preferred.
We address the first challenge by annotating a high-quality corpus. In particular, Amazon.com allows potential consumers to communicate with existing product owners or sellers via Product Community Question Answering (PCQA). As an example, Table 1 shows 5 QA pairs addressing the functionality of Microsoft Surface Pro 4. The first 4 questions address extrinsic functions (on complementary entities) and the last one an intrinsic function. Specifically, “photoshop” is a compatible entity, “Google Play app store” and “fallout 4” are incompatible entities, “5G home wireless network” is a complementary entity with uncertain compatibility, and “use for sketching” is an uncertain function. Observe that the to-be-purchased product e1 can be identified from the title of the product page. We therefore focus on extracting e2 (or f) from the question and detecting c (or s) from the answers. We leave infrequently-asked open questions (e.g., “what products can this tablet work with?”) to future work and focus only on yes/no questions. The details of the annotated corpus can be found in Section Experimental Result.
Given the corpus, we address the second challenge by formulating the two tasks as sequence labeling problems that fuse information from both the questions and the answers. We propose a model called Dual Attention Network (DAN) to solve these sequence labeling problems. DAN addresses two technical challenges. First, the questions and answers are usually brief (the longest question has only 82 words), with rather limited context. Second, the polarity information in many answers is implicit (without an explicit “Yes” or “No” at the very beginning, e.g., the 1st and 3rd answers in Table 1). DAN resolves these challenges by taking the question and the answer together as a QA story (or document) and performing two reading comprehensions [Richardson, Burges, and Renshaw2013, Rajpurkar et al.2016] on this story to obtain side information for sequence labeling. For example, it may not be obvious that “photoshop” is a complementary entity from reading the question alone. However, the word “run” in the answer is a strong indicator that “photoshop” is a complementary entity, and the word “well” indicates a positive polarity. We conduct quantitative and qualitative experiments to show the performance of DAN. The proposed dual attention architecture is not limited to the proposed tasks and may potentially be applied to other QA tasks.
Complementary products have been studied in the context of recommender systems [McAuley, Pandey, and Leskovec2015], where topic models are used to predict substitutes and complements (without compatibility) of a product. However, that work takes the outputs of Amazon’s recommender system as the ground truth for complements, which can be inaccurate. Instead, we take an information extraction approach similar to the Complementary Entity Recognition (CER) task in [Xu et al.2016b, Xu, Shu, and Yu2017], but in a supervised setting on an annotated QA corpus [McAuley and Yang2016, Xu et al.2016a]. We further generalize to a more fundamental task: function satisfiability analysis. Although [Xu et al.2017] presents a preliminary study of product functions, to the best of our knowledge this is the first study of a fully end-to-end model with satisfiability analysis.
Deep neural networks [LeCun, Bengio, and Hinton2015, Goodfellow, Bengio, and Courville2016] have drawn attention in the past few years due to their impressive performance on NLP tasks. Long Short-Term Memory (LSTM) [Hochreiter and Schmidhuber1997] achieves state-of-the-art results on many NLP tasks [Greff et al.2015, Lample et al.2016, Tan, Xiang, and Zhou2015, Nassif, Mohtarami, and Glass2016, Wang and Nyberg2015]. The attention mechanism [Larochelle and Hinton2010, Denil et al.2012] is effective in NLP tasks such as machine translation [Bahdanau, Cho, and Bengio2014], sentence summarization [Rush, Chopra, and Weston2015], aspect-level sentiment classification [Tang, Qin, and Liu2016], question answering [Li et al.2016], and reading comprehension [Kumar et al.2015, Xiong, Zhong, and Socher2016]. There are also studies of neural sequence labeling [Lample et al.2016, Ma and Hovy2016]. However, traditional sequence labeling takes a single sequence as the to-be-labeled input, whereas the proposed tasks naturally have two inputs: the question with the to-be-labeled tokens and the answer with the polarity. Inspired by reading comprehension, we take one step further and fuse the question and the answer together as a story on which reading comprehension is performed. Instead of learning question-aware representations of a story as in reading comprehension, we learn question-aware (and answer-aware) representations of the QA story as side information to enrich the representation of the question (and the answer, respectively).
Model Overview and Preliminary
Formally, we define the two-input sequence labeling problem as follows. Given a QA pair (q, a) (we use bold symbols to indicate sequences), we label each word in the question q with a label sequence y = (y_1, ..., y_n), where y_i ∈ Y and n is the length of the question. Here Y is the label space, and the two proposed tasks differ only in Y. For product compatibility analysis, the label space is Y = {O, C, I, U}, indicating Other non-entity words and Compatible, Incompatible, and Uncertain entity words. For product function satisfiability analysis, the label space contains seven labels: Other non-function words; Satisfiable, Unsatisfiable, and Uncertain function target words; and Satisfiable, Unsatisfiable, and Uncertain function words.
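To make the labeling concrete, here is a minimal sketch (our own illustrative example, not taken from the annotated corpus; the single-letter label names are assumptions) of how the Google Play question from Table 1 would be tagged under the compatibility label space:

```python
# Hypothetical token-level labeling for product compatibility analysis.
# Label space Y = {O, C, I, U}: Other, Compatible, Incompatible, Uncertain.
question = ["does", "the", "surface", "pro", "4", "support",
            "the", "google", "play", "app", "store", "?"]
# Per the answer in Table 1, "google play app store" is an incompatible
# entity, so its tokens receive the I label; everything else is O.
labels = ["O", "O", "O", "O", "O", "O",
          "O", "I", "I", "I", "I", "O"]
assert len(question) == len(labels)  # one label per question word
```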
The proposed network is illustrated in Figure 1. The question q and the answer a are first concatenated to form a QA story s. Then the QA pair and the story are passed into a shared embedding layer (not shown in the figure), followed by three respective BLSTM (Bidirectional Long Short-Term Memory [Hochreiter and Schmidhuber1997, Schuster and Paliwal1997]) Context Layers to obtain contextual representations, so that the vector representation at each position encodes information from nearby words. The contextual representation of the QA story is attended (read) by the contextual representations of the question and the answer, respectively, via two separate attention modules. The attention process can be viewed as both the question and the answer reading the QA story to form their corresponding side information. The side information is then concatenated with the original contextual representation of the question and the answer, respectively; we call the results the QA-augmented question and answer. Next, we pass the QA-augmented question and answer representations into the Question Context 2 layer and the Answer Context 2 layer, respectively. These second context layers learn representations that encode both the original context and the QA story. Note that we learn only a single vector from the Answer Context 2 layer as the polarity vector, since we need one vector to represent the polarity of the whole answer sequence. Lastly, the polarity vector is duplicated n times, each copy is concatenated to the representation of a question word, and the label sequence y is output via a dense+softmax layer shared across all question positions. Thus, both the question and the answer help to decide the output labels in an end-to-end manner.
Note that more complicated deep architectures for sequence labeling could be leveraged (e.g., modeling label dependencies with a CRF or learning better features from character-level embeddings, as in LSTM-CRF [Lample et al.2016] or LSTM-CNNs-CRF [Ma and Hovy2016]); here we mainly focus on how to leverage a QA story to provide side information for sequence labeling. Next, we briefly introduce preliminary layers that are common to most NLP models.
Input Layers Let the sequences q = (q_1, ..., q_n) and a = (a_1, ..., a_m) denote the question and the answer, respectively, where m denotes the length of the answer. A question (or an answer) may contain multiple sentences, which we simply concatenate into a single sequence. We set n = 82, the maximum number of words in any question; since an answer can run to more than 2000 words, we simply make the answer the same length as the question (m = n) by removing words beyond the first 82. Intuitively, the beginning of an answer is the most informative.
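The truncation and padding step can be sketched as follows (a minimal sketch; the function names and the `<pad>` token are our own choices, and `max_len=82` matches the longest question noted above):

```python
def clip_answer(tokens, max_len=82):
    """Keep only the first max_len words of an answer; the beginning
    of an answer tends to be the most informative part."""
    return tokens[:max_len]

def pad(tokens, max_len=82, pad_token="<pad>"):
    """Right-pad a token sequence to a fixed length so that the
    question and the (clipped) answer share the same length n."""
    return tokens + [pad_token] * (max_len - len(tokens))
```

For example, `pad(clip_answer(answer_tokens))` always yields a sequence of exactly 82 tokens.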
We transform q (a and s, resp.) into an embedded representation x^q (x^a and x^s, resp.) via a word embedding matrix W_e ∈ R^{|V|×d}, where |V| is the vocabulary size and d is the dimension (we set it to 300) of the word vectors. We pre-train the word embedding matrix with the fastText model [Bojanowski et al.2016] and fine-tune the embeddings when optimizing the proposed model. The fastText model allows us to obtain embeddings for out-of-vocabulary words (which are common in product QAs) from character n-gram embeddings. The pre-training is discussed in Section Experimental Result.
BLSTM Context Layers The embedded word sequences (x^q, x^a, and x^s) are fed into the Question Context 1 layer, the Answer Context 1 layer, and the QA Story Context layer, respectively. BLSTM [Hochreiter and Schmidhuber1997, Schuster and Paliwal1997] is an important variant of the RNN due to its ability to model long-term dependencies and contexts in both forward and backward directions of a sequence. The key component of an LSTM unit is the memory cell, which avoids overwriting the hidden state at every time step; an LSTM unit decides how to update the memory cell via input, forget, and output gates. We set the output dimensions of all BLSTM layers to 128. We omit the details of the update mechanism; interested readers may refer to [Hochreiter and Schmidhuber1997]. Note that other RNN variants such as GRU [Chung et al.2014] could also be used; here we mainly focus on how the attention mechanism improves performance. After passing the question, answer, and QA-story embeddings through these BLSTM layers, we have hidden representations h^q, h^a, and h^s for the question, answer, and QA story, respectively.
Dual Attention Network
Next, we leverage the attention mechanism to allow both the question and the answer to enrich their representations. The attention mechanism [Larochelle and Hinton2010, Denil et al.2012] has become popular in recent years due to its capability of modeling variable-length rather than fixed-length memories. We utilize attention to synthesize side information from the QA story, and introduce two attention modules: question attention and answer attention. By reading the QA story, both obtain side information, exploiting the fact that the question and the answer in a QA pair are connected. Intuitively, the words in the answer depend on the question, and the question can help infer compatibility information from the answer. In our experience annotating the dataset, some entities are hard to label by reading only the question or only the answer; reading the question and answer as a whole more often reveals what the QA pair discusses. For example, a question like “straight talk?” for a cell phone can be hard to label. Given an answer “yes, it works with straight talk well.”, we can reasonably guess that “straight talk” is a carrier. Similarly, the “straight talk” in the question helps us identify the polarity word “well” in the answer, which is very important for identifying implicit polarities. We mimic this procedure of human reading comprehension with the following attention mechanism.
Let the output of the QA Story Context layer be h^s = (h^s_1, ..., h^s_{2n}). For the question (answer) attention, we obtain the attention weight a_{i,j} for the i-th question (answer) word when reading the j-th word in the QA story via a dot product, further normalized by a softmax function:

a_{i,j} = exp(h^q_i · h^s_j) / Σ_k exp(h^q_i · h^s_k).

Then the contextual (side) information c^q_i for the i-th word in the question (or answer) is the weighted sum over all words in the QA story:

c^q_i = Σ_j a_{i,j} h^s_j.

We concatenate c^q_i with h^q_i as the representation of the i-th word in the question (answer): u^q_i = [h^q_i ; c^q_i]. Similarly, the answer attention module yields a sequence of answer word representations, with the i-th word in the answer denoted u^a_i = [h^a_i ; c^a_i].
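The attention step above can be sketched in plain Python on toy-sized vectors (a minimal sketch; the function names and list-based vectors are our own choices, standing in for the batched tensor operations of the actual model):

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query_states, story_states):
    """For each query position i: score every QA-story position with a
    dot product, softmax-normalize the scores into weights a_{i,j},
    form the side information c_i as the weighted sum of story states,
    and return the concatenation [h_i ; c_i]."""
    dim = len(story_states[0])
    augmented = []
    for h in query_states:
        weights = softmax([sum(x * y for x, y in zip(h, s))
                           for s in story_states])
        c = [sum(w * s[k] for w, s in zip(weights, story_states))
             for k in range(dim)]
        augmented.append(h + c)  # concatenation doubles the dimension
    return augmented
```

With identical story states the weights are uniform and c_i simply reproduces the shared state, which is a handy sanity check.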
Next, u^q and u^a are fed into the Question Context 2 layer and the Answer Context 2 layer, respectively, which resemble stacked BLSTMs [El Hihi and Bengio1995] and yield better representations of the sequences. We utilize two BLSTM structures: many-to-many on the question and many-to-one on the answer, since for the answer we care about its polarity more than word-by-word representations. The answer representation is the concatenation of the last output of the forward LSTM and the first output of the backward LSTM. Finally, we have v^q_i for each position in the question sequence and v^a as the polarity representation of the whole answer.
Now we form the joint model in an end-to-end manner by merging the question branch and the answer branch into the prediction ŷ. In this way, the label for each question word can drive both branches to learn better representations. To match the output length of the answer branch to that of the question branch, we make n copies of v^a and concatenate each copy with the output of the question branch at each word position. Then we reduce the dimension of each concatenated output to |Y| via a fully-connected layer with weights W_f and bias b_f shared among all positions of the question:

z_i = W_f [v^q_i ; v^a] + b_f,

where z_i is the representation of the i-th position in the question. We output the probability distribution over the label space Y for the i-th question word via a softmax function:

ŷ_i = p(y_i | q, a; θ) = softmax(z_i),

where θ represents all trainable parameters.
Finally, we optimize the cross-entropy loss function:

L(θ) = − Σ_t Σ_{i=1}^{n} Σ_{l ∈ Y} y^{(t)}_{i,l} log ŷ^{(t)}_{i,l},

where t ranges over all training examples and y^{(t)}_{i,l} is the ground truth for the i-th question word and label l in the t-th training example; each y^{(t)}_i is thus a one-hot vector. We leverage the Adam optimizer [Kingma and Ba2014] to optimize this loss, setting the learning rate to 0.001 and keeping the other hyperparameters as in the original paper. We set the dropout rate to 0.1 and the batch size to 128.
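For a single training example, the loss above reduces to summing −log of the probability the model assigns to the gold label at each question position; a minimal sketch (the function name is our own):

```python
import math

def example_loss(predicted, gold_onehot):
    """Cross-entropy for one question.
    predicted:    per-position probability distributions over the label space;
    gold_onehot:  per-position one-hot ground-truth vectors of the same shape."""
    loss = 0.0
    for p, y in zip(predicted, gold_onehot):
        # only the gold label's probability contributes, since y is one-hot
        loss -= sum(y_l * math.log(p_l) for y_l, p_l in zip(y, p) if y_l > 0)
    return loss
```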
During testing, the prediction for each position in the question is computed as:

ŷ_i = argmax_{l ∈ Y} ŷ_{i,l}.

Lastly, for function satisfiability analysis, we extract function words and function targets with polarities over the seven-label space; for compatibility analysis, we extract complementary entities with polarities over the label space {O, C, I, U}.
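Decoding a predicted label sequence into entities with polarities can be sketched as follows (a minimal sketch using the compatibility label space; the function name is ours, and majority voting over a span's word labels is our own simplification for resolving mixed labels within one span):

```python
def extract_entities(tokens, labels):
    """Group maximal runs of non-O labels into entity spans and assign
    each span the majority label among its words as the polarity.
    Returns a list of (entity_text, polarity) pairs."""
    spans, cur = [], []
    for tok, lab in zip(tokens, labels):
        if lab != "O":
            cur.append((tok, lab))
        elif cur:
            words, labs = zip(*cur)
            spans.append((" ".join(words), max(set(labs), key=labs.count)))
            cur = []
    if cur:  # flush a span that runs to the end of the question
        words, labs = zip(*cur)
        spans.append((" ".join(words), max(set(labs), key=labs.count)))
    return spans
```

For example, labeling "can this run fallout 4" with O O O I I yields the single incompatible entity "fallout 4".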
In this section, we discuss the details of the annotated corpus, and experimentally demonstrate the superior performance of DAN.
Corpus Annotation and Analysis
Table 2 (excerpt):
|Product|QA|% with Fun.|Intr. Fun.|Extr. Fun.|
|Micro SD Card|281|81.85|1|229|
We crawled about 1 million QA pairs from the web pages of products in the electronics department of Amazon. These 1 million QAs combined with all electronics customer reviews [McAuley, Pandey, and Leskovec2015] are used to train word embeddings. We use customer reviews because the texts in QAs are too short to train good quality embeddings. The combined corpus is 4 GB.
We select 42 products with 7969 QA pairs in total as the to-be-annotated corpus. The corpus is labeled by 3 annotators independently. The general annotation guidelines are as follows:
only yes/no QAs should be labeled;
a function expression should be labeled as either intrinsic function or extrinsic function;
each function expression is labeled with separate function words and function targets;
function words are verbs and prepositions around the function targets;
function targets are token spans containing nouns, adjectives, or model numbers;
abstract entities such as “picture”, “video”, etc. are considered as function targets for intrinsic functions;
specific entities are considered as complementary entities (function targets for extrinsic functions). They are not limited to products from Amazon, but also include general entities like “phone”, or service providers like “AT&T”;
implicit yes/no answers should also be labeled to increase the recall rate;
implicit answers without direct experience on the target product are labeled as uncertain answers (e.g., “I am not sure but it works for my android phone.”).
All annotators initially agreed on 83% of all QA pairs. Disagreements were then resolved to reach final consensus annotations.
Table 3 (excerpt):
|Micro SD Card|168|16|46|9|
Due to limited space, the statistics of 18 selected products with annotations are shown in Tables 2 and 3. We can see that the majority of functions are extrinsic functions, indicating the importance of product compatibility analysis. This matches common sense, as complementary entities can be unlimited, whereas intrinsic functions are usually limited. We observe that accessories (the last 5 products) have a higher percentage of functionality-related questions than main products (the first 13 products). This is expected, since accessories are poorly described in product descriptions and usually have many complementary entities. From Table 3, we can see that the polarity distribution is uneven: most products have more satisfiable functions than unsatisfiable or uncertain ones. This is because customers are more likely to ask a question to confirm functionality before purchasing, so many unsatisfiable functions are identified in advance without a question being asked. The only exception is the relatively new VR headset, which is problematic due to its short time on the market.
We further investigate product descriptions of these 18 products and count the number of compatible products mentioned, as shown in the last column of Table 3. Interestingly, no incompatible entities can be found, justifying the need for compatibility analysis on incompatible products from user-generated data.
The corpus is preprocessed using Stanford CoreNLP (http://stanfordnlp.github.io/CoreNLP/) for sentence segmentation, tokenization, POS tagging, lemmatization, and dependency parsing. The last 3 steps provide features for the Conditional Random Fields (CRF) [Lafferty, McCallum, and Pereira2001] baseline. We shuffle all QA pairs and select 70% of them for training, 10% for validation, and 20% for testing.
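The shuffle-and-split step can be sketched as follows (the function name and fixed seed are our own choices, added for reproducibility):

```python
import random

def split_corpus(qa_pairs, seed=42):
    """Shuffle all QA pairs, then split them 70% / 10% / 20% into
    train / validation / test sets."""
    pairs = list(qa_pairs)
    random.Random(seed).shuffle(pairs)  # deterministic shuffle
    n = len(pairs)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_val],
            pairs[n_train + n_val:])
```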
We compare DAN with the following baselines.
CRF: This baseline shows that a traditional sequence labeling model performs poorly. Note that CRF [Lafferty, McCallum, and Pereira2001] can only be evaluated on extraction without polarity detection, since it cannot incorporate the answer into the model. We train CRF models using Mallet (http://mallet.cs.umass.edu/). We use the following features: the words within a 5-word window, the POS tags within a 5-word window, the number of characters, binary indicators (camel case, digits, dashes, slashes, and periods), and the dependency relations of the current word obtained via dependency parsing.
QA S-BLSTM: This baseline does not have any attention module. We use it to show that attention mechanism improves the results.
QA CoAttention: This baseline is inspired by [Xiong, Zhong, and Socher2016], where the question and the answer directly attend to each other, without forming the QA story. We use this baseline to demonstrate that attending to a QA story is better.
DAN (-) Answer Attention: This baseline does not have the answer attention module in DAN. We use this baseline to show that the answer attention also helps to improve the performance on polarity detection.
Product Compatibility Analysis
Table 4 (excerpt):
|DAN (-) Ans. Attention|63.9|80.2|81.5|
We first evaluate the performance of product compatibility analysis. Note that the label space of this task is {O, C, I, U}, as described in Section Model Overview and Preliminary. We consider an extracted entity that has at least 50% of its words overlapping with the ground-truth entity a positive extraction. The polarity of a positive extraction is the majority polarity voted from all words in that extraction, so a true positive example must have at least 50% overlapping words and a matching polarity. Any positive extraction with no corresponding ground-truth entity is a false positive example, and any example with a mismatched polarity or a negative extraction is a false negative example. We average the F1 scores computed from the above definitions as the PCA column of Table 4. Further, by considering only a positive extraction as a true positive and a negative extraction as a false negative, we compute the CER F1 for complementary entity recognition. Given a positive extraction, we also compute the classification accuracy over the 3 polarity types to show the effectiveness of polarity detection.
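The ≥50% word-overlap matching criterion can be sketched as follows (a minimal sketch; we represent spans by word indices, and measuring the overlap against the ground-truth span's length is our own reading of the criterion):

```python
def is_positive_extraction(pred, gold):
    """pred and gold are (start, end) word-index spans, end exclusive.
    The prediction counts as a positive extraction when the overlapping
    words cover at least half of the ground-truth span (our assumed
    reading of the >= 50% overlap criterion)."""
    overlap = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    return overlap >= 0.5 * (gold[1] - gold[0])
```

For instance, a prediction covering 2 of a 3-word ground-truth entity counts as a positive extraction, while a disjoint prediction does not.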
Result Analysis: DAN outperforms all baselines on PCA F1, CER F1, and polarity accuracy. The attention mechanism substantially boosts CER performance. With attention on the answer, polarity detection is more accurate, as seen by comparing DAN with DAN (-) Answer Attention. The QA CoAttention baseline shows that attending to the QA story is better than attending to the question or the answer alone. Lastly, CRF performs poorly because it cannot learn rich word representations.
Product Function Satisfiability Analysis
Table 5 (excerpt):
|DAN (-) Ans. Attention|62.5|78.0|83.0|
We then evaluate the performance of product function satisfiability analysis, which requires all models to use the seven-label space described in Section Model Overview and Preliminary. Similar to the previous task, we consider an extracted function target with at least 50% word overlap with the ground-truth function target a positive function-target extraction. If at least one function word is correctly predicted, or the ground truth has no function word, we consider the case a positive function-word extraction. A true positive extraction occurs when both a positive function-target extraction and a positive function-word extraction happen. The remaining evaluation metrics are the same as in the previous subsection, up to the change of corresponding terms, as shown in Table 5.
Result Analysis: DAN again outperforms all baselines. DAN improves over DAN (-) Answer Attention slightly on function need recognition but substantially on polarity detection, thanks to the answer attention. The QA CoAttention baseline indicates that the longer QA story in DAN carries more information than the question or the answer alone. Further, the performance of QA S-BLSTM and QA CoAttention is close, so the short question or answer alone may not carry enough information and may even introduce noise. Lastly, CRF performs poorly because of its weak representation-learning capability.
To get a better sense of the extracted function expressions (or needs), we sample a few predictions from DAN for 5 popular products, as shown in Table 6. We observe that many extracted function needs are indeed customers’ high-priority needs. Most are extrinsic functions, and their function targets can be interpreted as complementary products. For example, it is important to know that the Tablet is not designed for high-performance games like “fallout 4”, or that Google apps are not runnable on Cellphone 1. Intrinsic functions are also identified, such as “waterproof” or “support multi pairing”; knowing whether the Apple Watch is waterproof is very important when deciding whether to buy such a product.
In this paper, we propose two closely related problems: product compatibility analysis and function satisfiability analysis, where the second is a generalization of the first. We address them by first creating an annotated corpus based on Product Community Question Answering (PCQA). We then propose a neural Dual Attention Network (DAN) to solve the two problems in an end-to-end manner. Experiments demonstrate that DAN is superior to a wide spectrum of baselines. Applications of this model can be found in e-commerce websites and recommender systems.
This work is supported in part by NSF through grants IIS-1526499, CNS-1626432, and NSFC 61672313. We would also like to thank anonymous reviewers for their valuable feedback to improve this paper.
- [Bahdanau, Cho, and Bengio2014] Bahdanau, D.; Cho, K.; and Bengio, Y. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
- [Bojanowski et al.2016] Bojanowski, P.; Grave, E.; Joulin, A.; and Mikolov, T. 2016. Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.
- [Chung et al.2014] Chung, J.; Gulcehre, C.; Cho, K.; and Bengio, Y. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
- [Denil et al.2012] Denil, M.; Bazzani, L.; Larochelle, H.; and de Freitas, N. 2012. Learning where to attend with deep architectures for image tracking. Neural computation 24(8):2151–2184.
- [El Hihi and Bengio1995] El Hihi, S., and Bengio, Y. 1995. Hierarchical recurrent neural networks for long-term dependencies. In NIPS, volume 400, 409. Citeseer.
- [Goodfellow, Bengio, and Courville2016] Goodfellow, I.; Bengio, Y.; and Courville, A. 2016. Deep learning. Book in preparation for MIT Press.
- [Greff et al.2015] Greff, K.; Srivastava, R. K.; Koutník, J.; Steunebrink, B. R.; and Schmidhuber, J. 2015. LSTM: A search space odyssey. arXiv preprint arXiv:1503.04069.
- [Hochreiter and Schmidhuber1997] Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural computation 9(8):1735–1780.
- [Kingma and Ba2014] Kingma, D., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- [Kotler and Armstrong2010] Kotler, P., and Armstrong, G. 2010. Principles of marketing. pearson education.
- [Kumar et al.2015] Kumar, A.; Irsoy, O.; Su, J.; Bradbury, J.; English, R.; Pierce, B.; Ondruska, P.; Gulrajani, I.; and Socher, R. 2015. Ask me anything: Dynamic memory networks for natural language processing. CoRR, abs/1506.07285.
- [Lafferty, McCallum, and Pereira2001] Lafferty, J.; McCallum, A.; and Pereira, F. C. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), Williams College, Williamstown, MA, USA, June 28 - July 1, 2001, 282–289.
- [Lample et al.2016] Lample, G.; Ballesteros, M.; Subramanian, S.; Kawakami, K.; and Dyer, C. 2016. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360.
- [Larochelle and Hinton2010] Larochelle, H., and Hinton, G. E. 2010. Learning to combine foveal glimpses with a third-order Boltzmann machine. In Lafferty, J. D.; Williams, C. K. I.; Shawe-Taylor, J.; Zemel, R. S.; and Culotta, A., eds., Advances in Neural Information Processing Systems 23. Curran Associates, Inc. 1243–1251.
- [LeCun, Bengio, and Hinton2015] LeCun, Y.; Bengio, Y.; and Hinton, G. 2015. Deep learning. Nature 521(7553):436–444.
- [Li et al.2016] Li, P.; Li, W.; He, Z.; Wang, X.; Cao, Y.; Zhou, J.; and Xu, W. 2016. Dataset and neural recurrent sequence labeling model for open-domain factoid question answering. arXiv preprint arXiv:1607.06275.
- [Ma and Hovy2016] Ma, X., and Hovy, E. 2016. End-to-end sequence labeling via bi-directional lstm-cnns-crf. arXiv preprint arXiv:1603.01354.
- [McAuley and Yang2016] McAuley, J., and Yang, A. 2016. Addressing complex and subjective product-related queries with customer reviews. In World Wide Web.
- [McAuley, Pandey, and Leskovec2015] McAuley, J. J.; Pandey, R.; and Leskovec, J. 2015. Inferring networks of substitutable and complementary products. In KDD.
- [Nassif, Mohtarami, and Glass2016] Nassif, H.; Mohtarami, M.; and Glass, J. 2016. Learning semantic relatedness in community question answering using neural models. ACL 2016 137.
- [Rajpurkar et al.2016] Rajpurkar, P.; Zhang, J.; Lopyrev, K.; and Liang, P. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2383–2392. Austin, Texas: Association for Computational Linguistics.
- [Richardson, Burges, and Renshaw2013] Richardson, M.; Burges, C. J.; and Renshaw, E. 2013. MCTest: A challenge dataset for the open-domain machine comprehension of text. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 193–203. Seattle, Washington, USA: Association for Computational Linguistics.
- [Rush, Chopra, and Weston2015] Rush, A. M.; Chopra, S.; and Weston, J. 2015. A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685.
- [Schuster and Paliwal1997] Schuster, M., and Paliwal, K. K. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45(11):2673–2681.
- [Tan, Xiang, and Zhou2015] Tan, M.; Xiang, B.; and Zhou, B. 2015. Lstm-based deep learning models for non-factoid answer selection. arXiv preprint arXiv:1511.04108.
- [Tang, Qin, and Liu2016] Tang, D.; Qin, B.; and Liu, T. 2016. Aspect level sentiment classification with deep memory network. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 214–224. Austin, Texas: Association for Computational Linguistics.
- [Wang and Nyberg2015] Wang, D., and Nyberg, E. 2015. A long short-term memory model for answer sentence selection in question answering. ACL, July.
- [Xiong, Zhong, and Socher2016] Xiong, C.; Zhong, V.; and Socher, R. 2016. Dynamic coattention networks for question answering. arXiv preprint arXiv:1611.01604.
- [Xu et al.2016a] Xu, H.; Shu, L.; Zhang, J.; and Yu, P. S. 2016a. Mining compatible/incompatible entities from question and answering via yes/no answer classification using distant label expansion. arXiv preprint arXiv:1612.04499.
- [Xu et al.2016b] Xu, H.; Xie, S.; Shu, L.; and Yu, P. S. 2016b. Cer: Complementary entity recognition via knowledge expansion on large unlabeled product reviews. In Proceedings of IEEE International Conference on Big Data.
- [Xu et al.2017] Xu, H.; Xie, S.; Shu, L.; and Yu, P. S. 2017. Product function need recognition via semi-supervised attention network. In Proceedings of IEEE International Conference on Big Data.
- [Xu, Shu, and Yu2017] Xu, H.; Shu, L.; and Yu, P. S. 2017. Supervised complementary entity recognition with augmented key-value pairs of knowledge. arXiv preprint arXiv:1705.10030.