Query Tracking for E-commerce Conversational Search: A Machine Comprehension Perspective

10/08/2018 ∙ by Yunlun Yang, et al. ∙ Taobao

With the development of dialog techniques, conversational search has attracted more and more attention, as it enables users to interact with the search engine in a natural and efficient manner. However, compared with natural language understanding in traditional task-oriented dialog, which focuses on slot filling and tracking, query understanding in E-commerce conversational search is quite different and more challenging due to more diverse user expressions and complex intentions. In this work, we define the real-world problem of query tracking in E-commerce conversational search, in which the goal is to update the internal query after each round of interaction. We also propose a self attention based neural network to handle the task from a machine comprehension perspective. Furthermore, we build a novel E-commerce query tracking dataset from an operational E-commerce search engine, and experimental results on this dataset suggest that our proposed model outperforms several baseline methods by a substantial gain in Exact Match accuracy and F1 score, showing the potential of machine comprehension like models for this task.




1. Introduction

Searching is often the first step for a user who wants to find specific products on an E-commerce website. The traditional search interface restricts the user to searching statelessly, and the user has to manually revise the query based on the results of the last search. However, we find that consecutive queries issued in a short period are typically related. A basic example is asking for “dress” followed by “red dress”. Therefore, making good use of the context queries from previous searches may bring more efficient interaction.

With the renaissance of neural networks and the development of conversation techniques, interaction between humans and machines in a conversational manner has become popular, e.g. smart assistants, chatbots for chit-chat and task-oriented dialog (Li et al., 2017; Li et al., 2016; Zhu et al., 2017; Wang et al., 2016). By adopting conversation ideas in search systems, so-called “conversational search” (Radlinski and Craswell, 2017) takes the user’s latest queries into consideration and enables a user to express his new search intention in a natural, chat-like way, which makes the entire search process more fluent.

Conversational search in E-commerce and task-oriented dialog share the same goal of understanding users’ demands and helping complete the corresponding tasks (Yan et al., 2017). For example, both a dialog system for flight reservation and E-commerce conversational search chat with the user to acquire his preferences, such as flight destination and item properties respectively, and finally provide the user with the expected flight tickets or products. The performance of such systems relies significantly on the input understanding module.

In task-oriented dialog systems, natural language understanding consists mainly of two components: slot filling and dialog state tracking. Slot filling aims at extracting slot values and their corresponding slots from input utterances. The dialog state tracker monitors the user’s current intention by maintaining a compact representation, usually called the “dialog state”, and takes the results of slot filling into account to update the dialog state. The dialog state is then fed into the dialog manager module to decide the subsequent interaction.

Nevertheless, query understanding for conversational search in E-commerce differs greatly from natural language understanding in task-oriented dialog systems. First, in a dialog system, the complexity of user utterances is often proportional to the diversity of what the system can offer to users. Since E-commerce search systems aim to help consumers find their desired items among a large variety of products, consumers’ descriptions of products are particularly diverse. If slot filling is applied, a large number of semantic slots must be defined to cover all product properties, e.g. brand, color, size, style, skirt length and so on, which is expensive manual work. Second, unlike in task-oriented dialog, a user may try different colors, brands and other keywords to approach a satisfactory item when interacting with a search system, which places high demands on the state tracking module. For instance, if a user first searches for “Adidas shoes” and then types in “Nike”, his current intention should be “Nike shoes”, since both “Adidas” and “Nike” are tagged as “brand” by slot filling; but if the first input is “fairy dress” and the next is “cute”, the user probably wants to search for “fairy and cute dress”, although both “fairy” and “cute” are words that describe the style of clothes. Hence, simple slot based state tracking may not be able to handle complex situations in E-commerce conversational search. Last but not least, in conversational search it is important to summarize the current user intention in a query and feed it to the search engine (Ren et al., 2018; Kumar and Joshi, 2017). We call the query that represents the current user intention the “internal query”. Compared with task-oriented dialog, the internal query plays the role of the “user state” and is updated after each round of interaction. It is noteworthy that queries in E-commerce search commonly contain only several unordered keywords that describe key properties of the products the user needs, and so does the internal query. An example of user input queries and the corresponding internal queries is shown in Table 1.

| Turn | User Input Query | Internal Query |
|---|---|---|
| 1 | sport shoes | sport shoes |
| 2 | Adidas | Adidas sport shoes |
| 3 | Nike black | black Nike sport shoes |
| 4 | ventilated | ventilated black Nike sport shoes |

Table 1. An example of user input queries and corresponding internal queries.

Therefore, to address the challenges mentioned above and improve the tracking of the internal query in E-commerce conversational search, we formulate the problem as a context-aware query tracking task. The goal is to consider a user’s query history in a conversational way and output a query indicating the user’s current intention. To handle this task, we adopt a machine comprehension perspective and propose a neural network based query tracking model. In our model, we update the internal query without the help of slots, and turn the problem into word-level binary classification.

The main contributions of our work are: 1) We define a real-world task of context-aware query tracking for the E-commerce conversational search system (Section 2), and we propose a novel neural network model based on a machine comprehension perspective (Section 3). 2) We develop a Chinese E-commerce conversational search dataset for query tracking from online search engine query logs. 3) Multiple experiments suggest that our proposed model outperforms several baseline methods, showing the potential of machine comprehension like models for this task.

2. Problem

In E-commerce conversational search, at each turn of interaction, a user types in a query, and the goal of query tracking is to consider the user’s query history in a conversational way and provide a query indicating the user’s current intention. The output query is also called the “internal query”, and is further fed to the search engine to get search results. Assuming that the update of the internal query satisfies the Markov property, i.e. the user changes his intention based only on the last search, we can get the new internal query by considering only the last internal query and the current input query.

Formally, given the last internal query $Q^{l} = \{w^{l}_{1}, \dots, w^{l}_{n}\}$ and the current input query $Q^{c} = \{w^{c}_{1}, \dots, w^{c}_{m}\}$, the goal of our problem is to generate the new internal query $Q^{n}$ such that

$$Q^{n} = \arg\max_{Q} P(Q \mid Q^{l}, Q^{c}),$$

which is a sequence generation task. It is noteworthy that the words in $Q^{n}$ must appear in $Q^{l} \cup Q^{c}$. Moreover, the words in a query are assumed to be unordered, since the word order of queries in E-commerce search hardly affects the results of the search engine. Based on these observations, we can solve the task by deciding whether each word in $Q^{l}$ should be reserved, and taking the union of the remaining words of $Q^{l}$ and the words of $Q^{c}$ as $Q^{n}$. Formally, the goal turns to finding a sequence of labels $y = (y_{1}, \dots, y_{n})$, one for each word in the last internal query, such that

$$y = \arg\max_{y'} P(y' \mid Q^{l}, Q^{c}),$$

where $y_{i} \in \{0, 1\}$ indicates whether the word $w^{l}_{i}$ should be reserved or discarded in the new internal query.
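The reduction from sequence generation to word-level labeling can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `update_internal_query` and the convention that label 1 means "reserve" and 0 means "discard" are assumptions made here. The example reproduces turn 3 of Table 1.

```python
def update_internal_query(last_query, current_query, keep_labels):
    """Apply one keep/discard label per word of the last internal query,
    then merge the surviving words with the current input query."""
    if len(keep_labels) != len(last_query):
        raise ValueError("need exactly one label per word of the last internal query")
    kept = [w for w, y in zip(last_query, keep_labels) if y == 1]
    # Queries are treated as unordered bags of words, so the output order is
    # cosmetic; duplicates are dropped while keeping the first occurrence.
    merged = []
    for w in current_query + kept:
        if w not in merged:
            merged.append(w)
    return merged


# Turn 3 of Table 1: last internal query "Adidas sport shoes", input "Nike black".
# The (assumed) labels discard "Adidas" and reserve "sport" and "shoes".
new_query = update_internal_query(
    ["Adidas", "sport", "shoes"], ["Nike", "black"], [0, 1, 1]
)
print(new_query)
```

Because every word of the current input query is always carried over, the only thing a model has to learn is the label vector over the last internal query.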

3. Approach

In this section, we first discuss the connections between query tracking and machine reading as well as their main differences. And then motivated by ideas from machine reading, we propose a neural network based query tracking model.

3.1. Machine Comprehension vs. Query Tracking

Machine reading has recently become a research hotspot. In this task, given a document, the goal is to answer a question related to the document (Wang and Jiang, 2016). Therefore, each data sample in machine reading is a triplet, (document, question, answer). The framework of mainstream machine comprehension models typically consists of three phases: Encoding, Matching and Predicting. High-level representations of all words in documents and questions are produced in the Encoding phase, and the question features are fused into the document features by attention-based word pair matching in the Matching phase. Finally, the model directly predicts the answer, which is typically a word or span of the document.

The goal of query tracking is similar to that of machine reading: given the current input query, find the snippets of the last internal query that should be reserved. Let $Q^{l}$ be the last internal query, $Q^{c}$ the current input query, and $A$ the reserved words from $Q^{l}$; then a data sample in query tracking is also a triplet, $(Q^{l}, Q^{c}, A)$, in which $A$ can be further translated into per-word binary labels $y$.

The differences are obvious as well. First, answers in machine comprehension must be a word or a sequence of consecutive tokens from the document, whereas the reserved words are not guaranteed to be contiguous. As a result, directly predicting the word or answer span is no longer suitable for query tracking. Second, as stated above, words in queries are assumed to be unordered, which means a query is actually a bag of words, and representing queries with sequential models like LSTM-RNN may cause data inefficiency in training. Third, query tracking in a search session is a recurrent process and earlier tracking errors may propagate through the subsequent interactions; hence query tracking needs to be more reliable than machine comprehension.

Overall, we follow the basic framework of machine comprehension and build our model for query tracking task.

3.2. Proposed Model

Motivated by the connections and differences between machine reading and query tracking, we propose a neural network based model to handle this task, which is composed of the same three parts: Encoding, Matching, Predicting, as shown in Figure 1.

Figure 1. The framework of our proposed model.

3.2.1. Encoding

Consider the last internal query $Q^{l}$ with $n$ words and the current input query $Q^{c}$ with $m$ words. First the words are converted to their respective word embeddings, $E^{l}$ and $E^{c}$, where $E^{l} \in \mathbb{R}^{n \times d}$, $E^{c} \in \mathbb{R}^{m \times d}$, and $d$ is the embedding dimension. In machine reading, these word embeddings are typically fed to a bi-directional recurrent neural network to get higher-level representations. However, considering the assumption that a query is a bag of unordered words, applying an RNN model, which takes sequential information into account, may hurt training efficiency. In query tracking, we propose to leverage a self attention based encoder as a more suitable substitute. For a query whose embedding matrix is $E$, its encoding process is formulated as:

$$H = \mathrm{softmax}\left(\frac{E E^{\top}}{\sqrt{d}}\right) E,$$

where the softmax is row-wise and we use scaled dot-product as the attention scoring function. More specifically, we exploit the multi-head mechanism (Vaswani et al., 2017) as well:

$$H = [\mathrm{head}_{1}; \dots; \mathrm{head}_{h}], \qquad \mathrm{head}_{i} = \mathrm{softmax}\left(\frac{(E W^{Q}_{i})(E W^{K}_{i})^{\top}}{\sqrt{d_{k}}}\right) E W^{V}_{i},$$

where $h$ is the number of heads, $W^{Q}_{i}, W^{K}_{i}, W^{V}_{i} \in \mathbb{R}^{d \times d_{k}}$, and $d_{k}$ is the word dimension in each head.

We also employ a feature enhancement technique (Mou et al., 2016) widely used in sentence pair modeling. Supposing that $H$ is the output of self attention and has the same shape as $E$, we concatenate the features and feed the result to a fully-connected layer:

$$\tilde{E} = \sigma\left([E; H; E - H; E \circ H]\, W + b\right),$$

where $\circ$ denotes element-wise product, $\sigma$ denotes the activation function, $W \in \mathbb{R}^{4d \times d}$ and $b \in \mathbb{R}^{d}$. The element-wise difference and product can be regarded as a strong prior and intuitively help capture the relation between words. Compared with simple addition (the residual connection in the Transformer) and concatenation, the feature enhancement brings better performance in experiments. After Encoding, the representations of $Q^{l}$ and $Q^{c}$ are denoted as $U^{l}$ and $U^{c}$ respectively.
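The Encoding phase described above (scaled dot-product self attention followed by feature enhancement over the concatenation of both inputs, their element-wise difference and their element-wise product) can be sketched in plain Python. This is a simplified illustration under stated assumptions: a single attention head without learned projections, ReLU as the activation function, and randomly initialized weights. The names `self_attention` and `enhance` are hypothetical.

```python
import math
import random


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # a is (n x k), b is (k x m); zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(E):
    """Single-head scaled dot-product self attention: softmax(E E^T / sqrt(d)) E."""
    d = len(E[0])
    scores = matmul(E, [list(col) for col in zip(*E)])
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, E)


def enhance(E, H, W, b):
    """Feature enhancement: ReLU([E; H; E - H; E * H] W + b), one row per word."""
    feats = [
        e + h + [x - y for x, y in zip(e, h)] + [x * y for x, y in zip(e, h)]
        for e, h in zip(E, H)
    ]
    out = matmul(feats, W)
    return [[max(0.0, v + bias) for v, bias in zip(row, b)] for row in out]


random.seed(0)
n, d = 3, 4  # three words, four-dimensional toy embeddings
E = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
W = [[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(4 * d)]
b = [0.0] * d
U = enhance(E, self_attention(E), W, b)  # encoded query, shape (n x d)
```

Since no positional information enters the computation, reordering the words of the query only reorders the rows of `U`, which matches the bag-of-words assumption that motivates replacing the RNN encoder.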

3.2.2. Matching

Similar to fusing passage and question representations in machine reading, we aim to incorporate the information of the current input query into the last internal query in the Matching phase. In query tracking, to determine whether words in the internal query should be discarded, it is important to check for contradictions between individual word pairs across the two queries (“contradiction” here usually means belonging to the same property, like “Adidas” and “Nike”, or “red” and “black”). To realize the idea, we utilize word-by-word attention (Bahdanau et al., 2014) to make information flow between queries. Formally,

$$C = \mathrm{softmax}\left(\frac{U^{l} (U^{c})^{\top}}{\sqrt{d}}\right) U^{c},$$

where $U^{l} \in \mathbb{R}^{n \times d}$ and $U^{c} \in \mathbb{R}^{m \times d}$ are the encoded representations of the last internal query $Q^{l}$ and the current input query $Q^{c}$, and $C \in \mathbb{R}^{n \times d}$. The feature enhancement is applied as well:

$$M = \sigma\left([U^{l}; C; U^{l} - C; U^{l} \circ C]\, W_{m} + b_{m}\right).$$

Since self attention is a special case of inter-sentence attention, their implementations are similar. In the Matching phase, the words in $Q^{l}$ are expected to find and absorb their counterpart information in $Q^{c}$ by word-by-word attention.
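A toy version of the word-by-word attention in the Matching phase is sketched below. It assumes scaled dot-product scoring and uses hand-made two-dimensional vectors (first coordinate roughly "brand", second roughly "color") purely for illustration; neither the vectors nor the function name `cross_attention` come from the paper.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(U_l, U_c):
    """For each word of the last internal query, return a weighted sum over the
    words of the current input query, together with the attention weights."""
    d = len(U_l[0])
    attended, weights = [], []
    for u in U_l:
        scores = [sum(x * y for x, y in zip(u, v)) / math.sqrt(d) for v in U_c]
        w = softmax(scores)
        weights.append(w)
        attended.append([sum(wj * v[k] for wj, v in zip(w, U_c)) for k in range(d)])
    return attended, weights


# Hypothetical encoded words: last internal query "Adidas shoes", input "Nike black".
U_l = [[4.0, 0.0],   # Adidas (brand-like)
       [0.0, 0.0]]   # shoes  (neither brand nor color)
U_c = [[4.0, 0.0],   # Nike   (brand-like)
       [0.0, 4.0]]   # black  (color-like)
attended, weights = cross_attention(U_l, U_c)
```

In this toy setup "Adidas" puts nearly all of its attention on "Nike", its counterpart for the same property, while "shoes" has no counterpart and attends uniformly. The subsequent feature enhancement then compares each word with what it attended to.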

3.2.3. Predicting

In machine comprehension, answers are generally extracted by selecting the word or span with the highest confidence in the document. Because the reserved words may consist of non-contiguous snippets, we employ a straightforward strategy and directly predict the labels of words, i.e. binary classification for each word. Before predicting, we apply self attention based encoding again to the output of the Matching phase, $M$, to get $P$. Experimental results show that the second encoding operation helps words in the same phrase (e.g. the brand “vero moda”) behave consistently in classification. Based on $P$, the binary classification is formulated as:

$$p_{i} = \mathrm{softmax}(P_{i} W_{p} + b_{p}), \qquad y_{i} = \arg\max_{k \in \{0, 1\}} p_{i,k},$$

where $P_{i}$ is the $i$-th row of $P$, $W_{p} \in \mathbb{R}^{d \times 2}$, and $y$ is the predicted label. We train the network by minimizing the sum of the cross entropy between the ground truth labels and the predicted distributions.
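The per-word binary classification and the summed cross-entropy objective can be illustrated as follows. This is a sketch with assumed details: a two-way softmax over a linear projection of each word representation, label 1 meaning "reserve", and toy hand-picked weights. The helper names are hypothetical.

```python
import math


def word_probs(P, W, b):
    """Two-way softmax per word: row i is [p(discard), p(reserve)] for word i."""
    probs = []
    for p in P:
        logits = [sum(x * W[k][c] for k, x in enumerate(p)) + b[c] for c in range(2)]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def cross_entropy_sum(probs, labels):
    """Sum over words of -log(probability assigned to the ground-truth label)."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels))


def predict(probs):
    return [0 if p[0] >= p[1] else 1 for p in probs]


# Toy word representations (2 words, d = 2) and a toy (d x 2) projection.
P = [[2.0, 0.0], [0.0, 2.0]]
W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]

probs = word_probs(P, W, b)
labels = predict(probs)                      # first word discarded, second reserved
loss_good = cross_entropy_sum(probs, [0, 1])  # ground truth agrees with prediction
loss_bad = cross_entropy_sum(probs, [1, 0])   # ground truth disagrees
```

Each word gets its own independent decision, which is what allows the reserved words to be non-contiguous.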

4. Experiments

4.1. Dataset

To collect reliable query tracking data, we extract user input query from the query log of an online Chinese E-commerce search system. Since the online search system is not conversational and users interact with it in a single-turn way, we need to mine the change of user intentions from user input session.

First, query pairs typed in consecutively by the same user within a fixed number of minutes are extracted, and pairs whose frequency falls below a threshold are filtered out. We regard the two queries in a pair as the internal queries of two consecutive interaction rounds in conversational search, i.e. the last internal query $Q^{l}$ and the new internal query $Q^{n}$ in our definition. Chinese word segmentation is applied to these queries. Then, we take the words of the second query that do not appear in the first query (the difference set) as the current input query $Q^{c}$. For example, supposing the query pair (translated into English) is (“Adidas shoes”, “Nike shoes”), the generated data triplet is (“Adidas shoes”, “Nike”, “Nike shoes”). Finally, the triplets in which $Q^{c}$ is empty or meaningless are removed, and we split the rest into training, validation and test sets.
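The triplet construction can be sketched as below. This is an illustrative approximation: whitespace splitting stands in for Chinese word segmentation, the frequency and time-window filters are omitted, and the function name `make_sample` is hypothetical. The sketch also derives the per-word labels over the first query, with 1 meaning the word is reserved.

```python
def make_sample(first_query, second_query):
    """Turn a consecutive query pair into (last internal query, current input
    query, new internal query, labels over the last internal query)."""
    last_words = first_query.split()
    new_words = second_query.split()
    # Difference set: words of the second query that the first one lacks.
    current_input = [w for w in new_words if w not in last_words]
    if not current_input:
        return None  # empty input query: the pair is dropped
    keep_labels = [1 if w in new_words else 0 for w in last_words]
    return last_words, current_input, new_words, keep_labels


sample = make_sample("Adidas shoes", "Nike shoes")
print(sample)
```

For the pair above, the input query is “Nike” and the labels mark “Adidas” as discarded and “shoes” as reserved.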

4.2. Implementation Details

Word embeddings used in our model are initialized by pretraining on a large shopping related corpus and fine-tuned during training. The number of heads and the per-head dimension of multi-head attention are fixed hyperparameters. All queries are padded to a fixed maximum sequence length. We use Adam as the optimization algorithm, with an initial learning rate that is decayed exponentially. For regularization, we apply dropout to the word embeddings and to the output of the attention layer, and conduct early stopping on the validation set.

4.3. Main Results

We evaluate our proposed model and several baseline methods, including:

  • Slot baseline We use a well-maintained product attribute knowledge base as the slot definition and develop a dynamic programming algorithm to match the slots in queries. In tracking, a new value replaces the old one for the same slot. All slot values are concatenated to form the new internal query.

  • LSTM baseline 1 Model both queries with LSTMs and feed the final state of one query's LSTM as the initial state of the other's.

  • LSTM baseline 2 Use a bidirectional LSTM as the encoder; the remaining parts are the same as in our proposed model.
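The replacement rule of the slot baseline, and why it struggles with the “fairy dress” followed by “cute” example from the Introduction, can be sketched as follows. Everything here is a simplification made for illustration: a small hard-coded lexicon stands in for the product attribute knowledge base, a dictionary lookup stands in for the dynamic programming matcher, and words without a known slot are kept under their own key. `SLOT_LEXICON` and `slot_track` are hypothetical names.

```python
# Hypothetical slot lexicon standing in for a product attribute knowledge base.
SLOT_LEXICON = {
    "Adidas": "brand", "Nike": "brand",
    "black": "color", "red": "color",
    "fairy": "style", "cute": "style",
}


def slot_track(state, input_words):
    """state maps slot -> value. A new value overwrites the old value of the
    same slot; words with no known slot are stored under their own key."""
    new_state = dict(state)
    for w in input_words:
        new_state[SLOT_LEXICON.get(w, w)] = w
    return new_state


def to_query(state):
    return sorted(state.values())


# Brand replacement works as intended: "Adidas shoes" + "Nike" -> "Nike shoes".
shoes = slot_track(slot_track({}, ["Adidas", "shoes"]), ["Nike"])

# Style words should accumulate, but the same rule drops "fairy".
dress = slot_track(slot_track({}, ["fairy", "dress"]), ["cute"])
```

Under this rule the second session ends up as “cute dress” rather than “fairy and cute dress”, which is the kind of case a purely slot based tracker cannot distinguish without extra rules.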

We report the query-level Exact Match (EM) accuracy and the word-level F1 score as evaluation metrics.
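The two metrics can be computed as in the sketch below. The exact definitions are not spelled out in the text, so this assumes that queries are compared as unordered word sets and that F1 is computed per query from word-level precision and recall; the function names are hypothetical.

```python
def exact_match(pred_words, gold_words):
    """Query-level EM: the predicted and reference word sets are identical."""
    return set(pred_words) == set(gold_words)


def word_f1(pred_words, gold_words):
    """Word-level F1 between predicted and reference internal queries."""
    pred_set, gold_set = set(pred_words), set(gold_words)
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# A prediction that wrongly reserves "Adidas": precision 2/3, recall 1.
pred = ["Adidas", "Nike", "shoes"]
gold = ["Nike", "shoes"]
em = exact_match(pred, gold)
f1 = word_f1(pred, gold)
```

A single wrongly reserved word makes the whole query count as an EM failure while F1 only drops partially, which is why the two scores in the result tables differ by several points.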

| Methods | EM | F1 |
|---|---|---|
| Slot baseline | 66.7 | 72.1 |
| LSTM baseline 1 | 85.1 | 90.5 |
| LSTM baseline 2 | 86.8 | 91.5 |
| Our proposed model | 86.9 | 91.6 |

Table 2. Performances of baseline methods and proposed model on query tracking dataset.

Table 2 shows the results of the baseline methods and the proposed model on the query tracking dataset. The unsupervised slot based method is limited by the coverage of the product knowledge base and the effectiveness of the slot filling algorithm, and yields a much worse result than the supervised methods. This suggests that E-commerce conversational search is a complex scenario that requires abundant knowledge and elaborate tracking rules to handle in an unsupervised way. The other two strong baselines are both based on LSTM RNN encoding. LSTM baseline 1 outperforms the Slot baseline by a large margin. LSTM baseline 2 achieves a result comparable with our proposed model. However, taking model efficiency into account, our proposed model is faster than LSTM baseline 2 in both training and test speed. While yielding close results, our self attention based model is more efficient. We also modified our model by adding a bidirectional LSTM layer before the self attention encoding to increase capacity, but observed no further improvement.

4.4. Ablation Test

| Methods | EM | F1 |
|---|---|---|
| Full model | 86.9 | 91.5 |
| - pretrained embedding (randomly initialized instead) | 86.0 | 90.9 |
| - self attention based encoding | 85.8 | 90.8 |
| - multi-head mechanism | 86.4 | 91.1 |
| - feature enhancement (concatenation instead) | 86.5 | 91.2 |
| - feature enhancement (addition instead) | 86.1 | 91.0 |

Table 3. Ablation test results of proposed model on query tracking dataset.

The results of the ablation test are shown in Table 3. Self attention based encoding is crucial to the performance of our model: using only word embeddings lowers the EM accuracy by 1.1 points (from 86.9 to 85.8). With pretrained word embeddings, the EM accuracy improves from 86.0 to 86.9, suggesting that well-initialized parameters are helpful to the final results. We have also tried other feature combination strategies and find that the element-wise product and difference are the more important features: replacing the feature enhancement with plain concatenation or addition lowers EM to 86.5 and 86.1 respectively. The experimental results also support that multi-head is a useful mechanism in attention, since removing it lowers EM to 86.4.

5. Conclusion

In this paper, we define a real-world task of context-aware query tracking for the E-commerce conversational search system, and propose a novel attention based neural network model from a machine comprehension perspective. For evaluation, we develop a Chinese E-commerce conversational search dataset for query tracking from online search engine query logs. Experiments suggest that our proposed model outperforms several baseline methods, showing the potential of machine comprehension like models for this task and the efficiency of the attention based model.


References
  • Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv (2014).
  • Kumar and Joshi (2017) Vineet Kumar and Sachindra Joshi. 2017. Incomplete Follow-up Question Resolution using Retrieval based Sequence to Sequence Learning. In SIGIR.
  • Li et al. (2016) Jiwei Li, Michel Galley, Chris Brockett, Georgios Spithourakis, Jianfeng Gao, and Bill Dolan. 2016. A Persona-Based Neural Conversation Model. In ACL.
  • Li et al. (2017) Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, and Asli Celikyilmaz. 2017. End-to-End Task-Completion Neural Dialogue Systems. In IJCNLP.
  • Mou et al. (2016) Lili Mou, Rui Men, Ge Li, Yan Xu, Lu Zhang, Rui Yan, and Zhi Jin. 2016. Natural Language Inference by Tree-Based Convolution and Heuristic Matching.
  • Radlinski and Craswell (2017) Filip Radlinski and Nick Craswell. 2017. A theoretical framework for conversational search. In CHIIR.
  • Ren et al. (2018) Gary Ren, Xiaochuan Ni, Manish Malik, and Qifa Ke. 2018. Conversational Query Understanding Using Sequence to Sequence Modeling. In WWW.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS.
  • Wang et al. (2016) Beidou Wang, Martin Ester, Jiajun Bu, Yu Zhu, Ziyu Guan, and Deng Cai. 2016. Which to view: Personalized prioritization for broadcast emails. In WWW.
  • Wang and Jiang (2016) Shuohang Wang and Jing Jiang. 2016. Machine comprehension using match-lstm and answer pointer. arXiv (2016).
  • Yan et al. (2017) Zhao Yan, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou, and Zhoujun Li. 2017. Building Task-Oriented Dialogue Systems for Online Shopping.. In AAAI.
  • Zhu et al. (2017) Yu Zhu, Hao Li, Yikang Liao, Beidou Wang, Ziyu Guan, Haifeng Liu, and Deng Cai. 2017. What to do next: modeling user behaviors by time-LSTM. In AAAI.