Product Function Need Recognition via Semi-supervised Attention Network

12/06/2017 ∙ by Hu Xu, et al. ∙ Lehigh University ∙ University of Illinois at Chicago

Functionality is of utmost importance to customers when they purchase products. However, it is often unclear to customers whether a product can really satisfy their functional needs. Further, missing functions may be intentionally hidden by manufacturers or sellers. As a result, a customer must either spend a fair amount of time investigating before purchasing or purchase the product at his/her own risk. In this paper, we first identify a novel QA corpus that is dense in product functionality information [The annotated corpus can be found at < hxu/>.]. We then design a neural network called Semi-supervised Attention Network (SAN) to discover product functions from questions. This model leverages unlabeled data as contextual information to perform semi-supervised sequence labeling. We conduct experiments to show that the extracted functions have both high coverage and accuracy, compared with a wide spectrum of baselines.




I Introduction

Functionality is a fundamental concern for customers when they decide to buy a new product. From customers’ perspective, before they purchase a product, it is natural for them to ask what the to-be-purchased one can do and cannot do. From sellers’ perspective, selling fully-functioned products can increase sales, and yet selling products with missing functions can lead to catastrophic customer dissatisfaction. From manufacturers’ perspective, missing functions reported by customers can help improve their products. In marketing, the term product is defined as “anything that can be offered to a market for attention, acquisition, use or consumption that might satisfy a want or need” [1]. It is crucial to ensure that the functions of a product can satisfy customers’ needs. Therefore, conveying the information about functions successfully to customers is important for both manufacturers and sellers.

Apple 13” MacBook Pro (2.5GHz Intel Core i5, 4GB RAM, 500GB HDD)
Q: Can I use this for video editing
A: No, it does not support Google Play.
Q: Can I make video calls to other non Apple computers ? ?
A: yes you can if they have Skype , Tango , or oovoo
Q: Will it be useful for music production ?
A: I have not used it for music production ; however , I believe that it would be and have several friends who use it specifically for that purpose .
Q: Can I use Microsoft Office on this MacBook Pro ?
A: You can but maybe you wo n’t want to . The current Apple MacBook Pro is shipping with the Mavericks operating system , which includes Pages , Numbers , and Keynote at no cost .
TABLE I: A few QA pairs for a laptop: function expressions are underlined with function words (e.g., verbs, adjectives or prepositions) bolded.

On e-commerce platforms, one obstacle to conveying such information is that products cannot be physically presented to customers before purchase. To overcome this limitation, many alternative approaches are deployed, e.g., descriptions, pictures, and videos. However, detailed functionality information may not be readily available for the following reasons. 1) The cost of testing functions multiplied by a large number of products can be extremely high. For example, it is impractical to test whether each of a large number of PCs can run specific high-performance PC games. 2) Some missing functions are deliberately hidden from descriptions by sellers to avoid hurting sales.

Fortunately, functionality information can be exchanged between customers and sellers via online platforms, such as forums and community QA. This allows us to adopt an NLP-based approach to automatically sense and harvest product functions on a large scale. We formulate a novel text mining task called Function Need Recognition (or FNR for short). A function need is defined as a sequence of words that indicates a function expression (e.g., “make video calls”). In this paper, we focus only on product function needs and leave satisfiability issues (e.g., whether a product can “make video calls”) to future work; a comprehensive study of product function satisfiability can be found at AAAI-2018 [2].

This task is non-trivial, and the following challenges have to be addressed. First, to ensure extraction quality, corpora that are dense and accurate in product functionality information are preferred. To the best of our knowledge, no existing study provides a corpus that meets these requirements. Second, the set of possible function needs is open-ended, so it is important to ensure that previously unseen function needs can still be detected.

We address the challenges by first identifying and annotating a high-quality corpus. In particular, Amazon allows potential consumers to communicate with existing product owners or sellers regarding product functions via Product Community Question Answering (PCQA for short). Four QA pairs about a laptop sold on Amazon are shown in Table I. Observe that the name of the target (to-be-purchased) product can be identified using the metadata of the target product, but the 4 function needs (“use for video editing”, “make video calls”, “useful for music production”, and “use Microsoft Office”) must be identified from the questions.

Given the corpus, we then formulate the problem as a sequence labeling task on questions. We propose a deep sequence labeling model called Semi-supervised Attention Network (SAN) to solve this problem. The key property of SAN is that it uses an attention mechanism to summarize unlabeled data as side information for short labeled questions. For example, assume that only the 1st question in Table I is in the labeled data and the other 3 questions are in the unlabeled data. Then words like “use” or “video” in the other 3 questions can serve as side information to help identify that “use for video editing” is a function. Another advantage of using unlabeled data is that the embeddings of words that do not appear in the labeled data can still be tuned during training. To the best of our knowledge, this is the first attempt to use an attention mechanism in a semi-supervised setting.

II Model and Preliminary

II-A Model Overview

We briefly introduce the proposed Semi-supervised Attention Network (SAN) in this section. The idea of the network is to couple an RNN-based sequence labeling network with attention on unlabeled data. The proposed network is illustrated in Fig. 1. The left side can be viewed as a supervised sequence labeling model. It reads in a (labeled) question $x = (x_1, \dots, x_n)$ and outputs a label sequence $y = (y_1, \dots, y_n)$, where $y_k$ is the label of the word $x_k$. The right side is the semi-supervised part. A few unlabeled questions $u_1, \dots, u_N$ are fed into a bank of BLSTMs (Bidirectional Long Short-Term Memory [3, 4], one for each unlabeled question) with attention (called bank attention). The attended results serve as side information for the (labeled) question. The key point here is that, given a labeled question, we need to learn the weights that determine how to attend to (or read) the unlabeled questions. Note that the supervised and semi-supervised parts share the same embedding layer. This also gives the opportunity to tune the embeddings of words that do not appear in the labeled questions. Such tuning is impossible in supervised settings. All unlabeled questions share the same weights for their BLSTM layers (not shown in the figure). After each word in the labeled question obtains its side information, we feed the augmented labeled question into another BLSTM layer. Then we generate the label sequence $y$ via a softmax layer. Overall, the labeled question can leverage unlabeled questions to decide the output labels in an end-to-end manner.











Fig. 1: Semi-supervised Attention Network (SAN): the bottom 4 words are an input (labeled) question. Their output labels indicate that “Works with iphone” is a function expression. On the right is bank attention on unlabeled questions (sample questions are omitted).

II-B Preliminary

Embedding Layer We pair each labeled question with a few unlabeled questions (for both the training data and the test data). The unlabeled questions are similar questions from the same category as the labeled question, returned by a search engine. Let the sequences $x = (x_1, \dots, x_n)$ and $u_i = (u_{i,1}, \dots, u_{i,m})$ denote the labeled question and the $i$-th unlabeled question, respectively. Here $n$ and $m$ denote their respective lengths. When a question contains multiple sentences, we concatenate them into a single sequence, separating the sentences by a special token EOS. We set the maximum question length to 40 words, which covers 99.5% of the labeled questions; longer questions are truncated and shorter ones are padded with zeros. We can view $x$ ($u_i$, resp.) as a matrix of one-hot column vectors, which is later transformed into an embedded representation $e$ ($e^{u}_{i}$, resp.). We pre-train the word embeddings via the skip-gram model [5] and then fine-tune them when optimizing the proposed model.

BLSTM Layer The embedded question sequences ($e$ and $e^{u}_{i}$) are fed into the labeled BLSTM and the unlabeled BLSTMs, respectively. We use $h$ and $h^{u}_{i}$ to denote the outputs of these BLSTM layers for the labeled question and the unlabeled questions, respectively. Important notations, which are used in the next section, are shown in Table II.
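The length normalization described in the Embedding Layer paragraph can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the helper names and the `<pad>` symbol are hypothetical (the text only says that short questions are padded with zeros), and the maximum length of 40 is the value reported in the experiments.

```python
EOS = "EOS"      # sentence separator token named in the text
PAD = "<pad>"    # hypothetical padding symbol standing in for zero padding
MAX_LEN = 40     # maximum question length reported in the experiments


def flatten_question(sentences):
    """Concatenate tokenized sentences into one sequence, separated by EOS."""
    tokens = []
    for idx, sentence in enumerate(sentences):
        if idx > 0:
            tokens.append(EOS)
        tokens.extend(sentence)
    return tokens


def pad_or_truncate(tokens, max_len=MAX_LEN):
    """Return exactly max_len tokens: truncate long input, pad short input."""
    clipped = tokens[:max_len]
    return clipped + [PAD] * (max_len - len(clipped))


question = [["Works", "with", "iphone", "?"], ["What", "about", "ipad", "?"]]
fixed = pad_or_truncate(flatten_question(question))
print(len(fixed), fixed[:10])
```

The same routine would be applied to the labeled question and to every unlabeled question in its bank, so that all sequences entering the BLSTM layers have the same length.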

III Semi-supervised Attention Network

Notation    Explanation
$h_k$    Hidden representation of the $k$-th word of the labeled question ($1 \le k \le n$)
$h^{u}_{i}$    Hidden representations of the $i$-th unlabeled question
$k$    Index of the $k$-th word in the labeled question
$j$    Index of the $j$-th word in an unlabeled question
$\tilde{h}_k$    Transformed representation for the labeled question
$\tilde{h}^{u}_{i,j}$, $\tilde{s}_{k,i}$    Transformed representations for the unlabeled question
$\alpha_{k,i,j}$    Level 1 attention weight of the $k$-th word in $x$ on the $j$-th word in $u_i$
$\beta_{k,i}$    Level 2 attention weight of the $k$-th word in $x$ on $u_i$
$s_{k,i}$    Level 1 attended representation: the $k$-th word in $x$ attends on unlabeled question $u_i$
$s_k$    Level 2 attended representation: the $k$-th word in $x$ attends on all $u_i$
TABLE II: Notations

III-A Bank Attention

The key point of SAN is to leverage the attention mechanism for semi-supervised learning. We use attention to synthesize side information from unlabeled data for each word in a labeled question. The idea is that words in unlabeled data may carry useful information for sequence labeling when they talk about similar products. We introduce a hierarchical attention mechanism. As in the traditional attention mechanism, we let each word in a labeled question attend to the words in an unlabeled question. This is level 1 attention. On the higher level, we pair a labeled question with multiple related unlabeled questions. Note that different questions may not contribute side information equally to the labeled question, so we allow each word in the labeled question to attend to the results of level 1 attention over the multiple questions. This is level 2 attention. We use the term bank attention to refer to one word in a labeled question hierarchically attending to unlabeled questions. The details are shown in Fig. 2.

We now derive the side information for the $k$-th word in the labeled question. We first transform the word representations of the labeled question and of the unlabeled questions via respective fully connected layers, followed by an element-wise activation function $\phi$:

$$\tilde{h}_k = \phi(W_l h_k + b_l), \qquad \tilde{h}^{u}_{i,j} = \phi(W_u h^{u}_{i,j} + b_u),$$

where $W_l$, $W_u$, $b_l$ and $b_u$ are trainable weights. The $k$-th word in the labeled question first obtains an attention score for the $j$-th word in the $i$-th unlabeled question via a dot product. The scores are then normalized by a softmax function:

$$\alpha_{k,i,j} = \frac{\exp(\tilde{h}_k^{\top} \tilde{h}^{u}_{i,j})}{\sum_{j'=1}^{m} \exp(\tilde{h}_k^{\top} \tilde{h}^{u}_{i,j'})}.$$

These are the level 1 attention weights. Let $s_{k,i}$ denote the side information of the $k$-th word in the labeled question from the $i$-th unlabeled question (the representation after level 1 attention). It is the weighted sum over all words in the $i$-th unlabeled question:

$$s_{k,i} = \sum_{j=1}^{m} \alpha_{k,i,j} h^{u}_{i,j}.$$

Next, we apply level 2 attention over the different unlabeled questions. Again, we first transform the side information of the $k$-th word for each unlabeled question:

$$\tilde{s}_{k,i} = \phi(W_s s_{k,i} + b_s).$$

The level 2 attention weights are again obtained via dot products normalized by a softmax function:

$$\beta_{k,i} = \frac{\exp(\tilde{h}_k^{\top} \tilde{s}_{k,i})}{\sum_{i'} \exp(\tilde{h}_k^{\top} \tilde{s}_{k,i'})}.$$

Finally, the side information vector for the $k$-th word in the labeled question (the representation after level 2 attention) is:

$$s_k = \sum_{i} \beta_{k,i} s_{k,i}.$$

Lastly, we concatenate $h_k$ with $s_k$ as the representation of the $k$-th word in the question: $g_k = [h_k; s_k]$.
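To make the two attention levels concrete, the following is a small pure-Python sketch of bank attention for a single labeled word. It is an illustration under stated assumptions rather than the authors' implementation: the activation is taken to be tanh, the level 1 weighted sum is taken over the untransformed hidden vectors, all parameter names (`W_l`, `W_u`, `W_s` and the matching biases) are hypothetical, and the random vectors stand in for BLSTM outputs.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_tanh(weights, bias, vec):
    """Fully connected layer followed by tanh (the activation is an assumption)."""
    return [math.tanh(dot(row, vec) + b) for row, b in zip(weights, bias)]


def weighted_sum(weights, vectors):
    return [sum(w * v[d] for w, v in zip(weights, vectors))
            for d in range(len(vectors[0]))]


def bank_attention(h_k, bank, p):
    """Return the augmented word vector [h_k ; s_k] and the level 2 weights.

    bank is a list of unlabeled questions, each a list of hidden vectors.
    """
    query = dense_tanh(p["W_l"], p["b_l"], h_k)
    level1 = []
    for question in bank:
        keys = [dense_tanh(p["W_u"], p["b_u"], h) for h in question]
        alpha = softmax([dot(query, key) for key in keys])  # level 1 weights
        level1.append(weighted_sum(alpha, question))        # one vector per question
    keys2 = [dense_tanh(p["W_s"], p["b_s"], s) for s in level1]
    beta = softmax([dot(query, key) for key in keys2])      # level 2 weights
    side = weighted_sum(beta, level1)
    return h_k + side, beta  # list concatenation plays the role of [h_k ; s_k]


rng = random.Random(0)
DIM = 4


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


params = {}
for name in ("l", "u", "s"):
    params["W_" + name] = [rand_vec(DIM) for _ in range(DIM)]
    params["b_" + name] = rand_vec(DIM)

# A bank of 5 unlabeled questions with 6 words each (toy sizes).
bank = [[rand_vec(DIM) for _ in range(6)] for _ in range(5)]
augmented, beta = bank_attention(rand_vec(DIM), bank, params)
print(len(augmented), [round(b, 3) for b in beta])
```

The level 2 weights `beta` sum to one across the bank, which is what lets the model down-weight retrieved questions that turn out to be uninformative for a given word.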

Fig. 2: Bank Attention: the $k$-th word representation $h_k$ obtains its side information from multiple unlabeled questions, such as the 1st unlabeled question $u_1$. The red arrows indicate level 1 attention among different words in one unlabeled question (we omit the arrows for the other 4 questions). The blue arrows indicate level 2 attention among multiple representations of unlabeled questions.

III-B Sequence Labeling

After obtaining the representation of the labeled question with side information, we feed the augmented sequence $g = (g_1, \dots, g_n)$ into another BLSTM layer. We thus have two BLSTM layers for the labeled question, which is similar to the stacked BLSTM [6] (S-BLSTM). We use S-BLSTM to obtain a better sequence representation. This yields $z = (z_1, \dots, z_n)$ for the labeled question sequence. We reduce the dimension of $z_k$ to the size of the label set via a fully connected layer:

$$o_k = W_o z_k + b_o.$$

We output the probability distribution over labels for the $k$-th question word via a softmax function:

$$P(y_k = c \mid x, u_1, \dots, u_N; \Theta) = \frac{\exp(o_{k,c})}{\sum_{c'} \exp(o_{k,c'})},$$

where $\Theta$ represents all trainable parameters, including the parameters in the LSTM cells and the word embeddings. Finally, we optimize the cross-entropy loss function over the training dataset:

$$\mathcal{L}(\Theta) = -\sum_{d \in D} \sum_{k=1}^{n} \sum_{c} \hat{y}^{(d)}_{k,c} \log P(y^{(d)}_k = c \mid x^{(d)}, u^{(d)}_1, \dots, u^{(d)}_N; \Theta),$$

where $D$ represents all the training examples and $\hat{y}^{(d)}_{k,c}$ is the ground truth for the $k$-th question word and label $c$ in the $d$-th training example. We use the Adam optimizer [7] to optimize the whole network. We set the learning rate to 0.001 and keep the other parameters the same as in the original paper. We set the dropout rate to 0.2 and the batch size to 256.
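As a small worked illustration of the output layer and loss described above, the sketch below applies a softmax to per-word scores and sums the negative log-probabilities of the gold labels. The label set size of 3, the integer label ids, and the score values are hypothetical and chosen only for the example; with a one-hot ground truth, the inner sum over labels reduces to the log-probability of the gold label.

```python
import math


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores_per_word, gold_labels):
    """Sum of negative log-probabilities of the gold label at every word."""
    loss = 0.0
    for scores, gold in zip(scores_per_word, gold_labels):
        probs = softmax(scores)
        loss -= math.log(probs[gold])
    return loss


# Three words, each with scores over a hypothetical label set of size 3.
scores = [[2.0, 0.1, -1.0], [0.0, 0.0, 0.0], [-1.0, 0.5, 3.0]]
gold = [0, 1, 2]
print(round(cross_entropy(scores, gold), 4))
```

A word whose scores are all equal contributes exactly $\log 3$ to the loss here, which is the value an uninformed classifier over three labels would incur.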

IV Experimental Result

Product           QAs     % of QAs with Functions
DSLR              327     20.18
E-Reader          271     31.37
Speaker           153     30.72
Tablet            329     42.86
Cellphone 1       170     57.65
Cellphone 2       330     41.82
Laptop 1          297     18.86
Laptop 2          425     54.59
Netbook           199     44.72
TV                306     46.41
TV Console        183     54.10
Gaming Console    212     70.28
Apple Watch       331     28.10
VR Headset        444     76.13
Stylus            266     71.05
Micro SD Card     283     81.27
Mouse             259     66.02
Tablet Stand      214     88.79
Total             4999    51.07
TABLE III: Statistics of 18 labeled products. QAs: number of QA pairs; % of QAs with Functions: percentage of QA pairs containing function needs.

IV-A Corpus Annotation, Analysis, and Preprocessing

We crawled about 1 million QA pairs from the product pages of the electronics department of Amazon as the training corpus for the skip-gram model [5] to obtain the word embedding matrix.

We further annotated a subset of 4999 QA pairs from 18 products for model training and testing. The basic statistics of the corpus are shown in Table III. The corpus was labeled by 3 annotators independently. The general annotation guidelines are as follows:

  1. only yes/no QAs should be labeled;

  2. a function expression is labeled as a function target with an optional function verb;

  3. a function target can be specific entities (e.g., “iPhone”), general entities like “video” or service providers like “AT&T”;

  4. a function target should be labeled as token spans containing nouns, adjectives, or model numbers (e.g., “Samsung micro SD EVO”);

  5. expressions about specific aspects or accessories are not considered as function expressions. This is because aspects or accessories are not closely related to the functionality of the product as a whole;

  6. nouns that are subjective are not regarded as function targets (e.g., the word “need” in “Can it fit my need ?”);

  7. the optional function word can be a verb (e.g., “produce” in “produce music”) or its noun form (e.g., “production” in “music production”); we also include the adjunct word (e.g., “with” in “work with iPhone”) for extrinsic function expressions;

  8. some function expressions do not have a function word, e.g., “Does Skype ok on this?”.

All annotators initially agreed on their annotations (same function targets and function words) on 81% of all QA pairs. Disagreements are then resolved to reach final consensus annotations.

We observe that accessories (the last 5 products) have a higher percentage of questions related to function needs than main products (the first 13 products). This is expected, since one accessory may work with multiple devices and thus have more functions.

The annotated corpus is preprocessed using Stanford CoreNLP in the following steps: sentence segmentation, tokenization, POS tagging, lemmatization, and dependency parsing. The last 3 steps provide features for the Conditional Random Fields (CRF) [8] baseline.

We also select the 5 most similar unlabeled questions under the same category as the labeled question, returned by a search engine, as the question bank.

To save preprocessing time, we only perform sentence segmentation and tokenization on these unlabeled questions. Lastly, multiple sentences in both labeled and unlabeled questions are concatenated together. We set the maximum length of a question to 40, which covers 99.5% of the labeled questions in full length.
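The construction of the question bank can be illustrated with a simple stand-in for the search engine. The sketch below ranks candidate questions from the same category by bag-of-words cosine similarity and keeps the top 5; this similarity measure, the function names, and the toy candidate questions are assumptions made for illustration and are not the retrieval method used in the experiments.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    shared = set(a) & set(b)
    num = sum(a[t] * b[t] for t in shared)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0


def build_bank(labeled_question, candidates, size=5):
    """Return the `size` candidates most similar to the labeled question."""
    query = Counter(labeled_question.lower().split())
    scored = [(cosine(query, Counter(c.lower().split())), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [c for _, c in scored[:size]]


candidates = [
    "Can it make video calls with Skype ?",
    "Does it support video editing ?",
    "Is the battery removable ?",
    "Can I make phone calls on it with a SIM ?",
    "Will it run Windows 10 ?",
    "Can I use it for video chat ?",
    "What color is the case ?",
]
bank = build_bank("Can I make video calls ?", candidates)
for question in bank:
    print(question)
```

A production setup would more likely query an inverted index restricted to the product category, but the interface is the same: one labeled question in, five unlabeled questions out.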

After preprocessing, one example contains a labeled question, 5 unlabeled questions, and one labeled answer. We shuffle all examples and select 70% for training, 10% for validation and 20% for testing. The validation set is used to avoid overfitting on the training data.
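The 70/10/20 split can be sketched as follows. The seed, the function name, and the rounding of the split boundaries are assumptions; the exact split sizes used in the experiments are not stated in the text.

```python
import random


def split_examples(examples, seed=0):
    """Shuffle examples and split them 70% / 10% / 20%."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.7 * len(shuffled))
    n_valid = int(0.1 * len(shuffled))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# Integers stand in for the 4999 annotated examples.
train, valid, test = split_examples(range(4999))
print(len(train), len(valid), len(test))
```

Because each example bundles a labeled question with its 5 unlabeled questions, splitting at the example level keeps every question bank together with the labeled question it was retrieved for.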

IV-B Baselines

Method            Precision    Recall    F1-score
CRF               0.798        0.611     0.692
S-BLSTM           0.844        0.673     0.749
SAN (-) BLSTM2    0.830        0.700     0.759
SAN               0.839        0.721     0.776
TABLE IV: Different methods for Function Need Recognition (FNR) in precision, recall and F1-score.

We compare the following baselines with SAN:

  1. CRF: We use Mallet as the CRF implementation. We train a CRF model using exactly the same training data as the proposed method. We use the following manually created features:

    1. the words within a 5-word window;

    2. the POS tags within a 5-word window;

    3. the number of characters;

    4. binary indicators (camel case, digits, dashes, slashes and periods);

    5. dependency relations for the current word obtained via dependency parsing.

    We use CRF as a baseline to show the performance of a non-deep learning method.

  2. S-BLSTM: This baseline is a traditional S-BLSTM with 2 layers (by removing the bank attention from SAN). It is a supervised baseline. We use this baseline to show that using purely supervised data is not good enough. Unlabeled data can help to improve the performance.

  3. SAN (-) BLSTM2: This baseline does not have the second layer of BLSTM for the labeled question. We use this baseline to show that S-BLSTM works better for our problem. We use 5 unlabeled questions in both this baseline and SAN.
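The hand-crafted features listed for the CRF baseline can be sketched as a per-token feature dictionary. This is an illustrative sketch only: it does not reproduce Mallet's input format, the 5-word window is read as two tokens on each side of the current word, the camel-case test is one possible definition, and the dependency-relation feature is omitted because it requires a parser. All names in the code are hypothetical.

```python
def is_camel_case(word):
    """Assumed definition: an uppercase letter after position 0 plus a lowercase letter."""
    return any(c.isupper() for c in word[1:]) and any(c.islower() for c in word)


def token_features(tokens, pos_tags, i, half_window=2):
    """Feature dictionary for token i using words and POS tags in positions i-2 .. i+2."""
    word = tokens[i]
    feats = {
        "num_chars": len(word),
        "camel_case": is_camel_case(word),
        "has_digit": any(c.isdigit() for c in word),
        "has_dash": "-" in word,
        "has_slash": "/" in word,
        "has_period": "." in word,
    }
    for offset in range(-half_window, half_window + 1):
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"word[{offset}]"] = tokens[j].lower() if inside else "<none>"
        feats[f"pos[{offset}]"] = pos_tags[j] if inside else "<none>"
    return feats


tokens = ["Works", "with", "iPhone", "6s", "?"]
pos_tags = ["VBZ", "IN", "NNP", "NN", "."]
feats = token_features(tokens, pos_tags, 2)
for name in sorted(feats):
    print(name, feats[name])
```

Features of this kind are tied to surface forms seen during training, which is consistent with the observation that the CRF baseline struggles with words absent from the labeled data.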

Result Analysis From Table IV, we can see that the proposed SAN framework performs the best on F1-score. Although CRF is a non-deep learning model, its precision is reasonably good since we use dependency relations as features. However, the recall of CRF is the lowest (0.611), since it can only train weights on words that appear in the training data. All deep learning models have better recall than CRF. S-BLSTM has the best precision, as it is trained using only the training data. However, its recall is relatively low: it still suffers from the problem that the training data cannot tune the embeddings of words that do not appear in the training data. The gap between SAN (-) BLSTM2 and SAN shows that the additional BLSTM layer is effective in learning better representations. Lastly, SAN improves the recall substantially (0.721 versus 0.673 for S-BLSTM) by further adjusting the weights for different unlabeled questions. It loses only 0.5 percentage points of precision compared with S-BLSTM.
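As a quick consistency check on Table IV, the F1-score can be recomputed from the reported precision and recall, assuming F1 is the harmonic mean of the two values as reported. The snippet below only re-derives the published numbers and does not involve any new experiment.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Table IV.
reported = {
    "CRF": (0.798, 0.611, 0.692),
    "S-BLSTM": (0.844, 0.673, 0.749),
    "SAN (-) BLSTM2": (0.830, 0.700, 0.759),
    "SAN": (0.839, 0.721, 0.776),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.3f}, reported = {f1:.3f}")
```

For all four methods the recomputed value matches the reported F1-score to three decimal places, which also confirms that the table columns are ordered as precision, recall, F1-score.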

V Related Work

Both the data mining and natural language processing communities study sentiment analysis on products [9, 10, 11, 12, 13]. However, Product Community Question Answering (PCQA) has only drawn attention in recent years [14, 15]. PCQA is studied as a relevance ranking problem in [14, 15]: given a question, relevant reviews are retrieved to augment existing answers. Instead, we observe that PCQA also contains valuable fine-grained information for extraction, and product function needs are an important type of such information. Functions may include both intrinsic functions and extrinsic functions [2]. Extrinsic functions are closely related to complementary products (taking whether one product can work with another as a function) [16, 17, 18]. But we observe that, from the perspective of functionality, how two products work together is also important. For example, “install Windows 10” and “run Windows 10” are two different functions.

Although CNNs [19, 20] and Long Short-Term Memory (LSTM) [3] are both used in NLP tasks, LSTM is more commonly used in sequence labeling [21, 22]. The attention mechanism became popular in image recognition [23, 24] and was later adopted in natural language processing [25, 26]. However, to the best of our knowledge, the attention mechanism has only been used in supervised settings. We adapt attention to a semi-supervised setting [27]. Traditional semi-supervised learning uses unlabeled data directly as training examples [28]. Instead, we use unlabeled data as side information for labeled examples.

VI Conclusion

In this paper, we propose the task of Function Need Recognition (FNR), which is to identify function needs queried by customers. We design a Semi-supervised Attention Network (SAN) that solves this problem by using unlabeled data as attended side information. Experiments demonstrate that SAN outperforms a number of baselines in F1-score.


This work is supported in part by NSF through grants IIS-1526499, and CNS-1626432, and NSFC 61672313.