One of the key challenges for a recommender system is to predict the probability that a target user likes a given item, taking into account the user’s history and their similarity to other users. However, making predictions in this way does not explain why the item matches the user’s preferences. Recent works have introduced the concept of explainable recommender systems, which try to generate explanations according to users’ preferences rather than only predicting a numerical rating for an item. In this work we develop an approach that uses character-level neural networks to generate readable explanations.
Current explainable recommendation approaches mine user reviews to generate explanations. In (Zhang et al., 2014) the authors propose an explicit factor model, which first extracts aspects and user opinions through phrase-level sentiment analysis of user-generated reviews, and then generates both recommendations and disrecommendations according to specific product features, personalised to the user’s interests and the hidden features learned. In (He et al., 2015), by contrast, the authors propose a tripartite graph that enriches the user-item binary relation to a user-item-aspect ternary relation. Both of these works extract aspects from reviews to generate explainable recommendations, but they do not consider user opinions and influences from social relations as a source of explanation. In (Ren et al., 2017) the authors propose the social collaborative viewpoint regression model, a latent variable model which detects viewpoints and uses social relations. The model represents viewpoints as tuples of a concept, a topic, and a sentiment label, drawn from both user reviews and trusted social relations.
Explanations generated in this manner lack natural language expression, since the sentences are generated in a modular way. However, (Tintarev and Masthoff, 2007) established that a good explanation must be clear and interesting to the target user, since this information has a significant influence on the user’s decision. Online user-generated reviews present clear and interesting information about items, since they describe users’ personal usage experience. Furthermore, this source plays an important role on the user side, since users tend to trust the opinions of other users (Pu et al., 2012; Knijnenburg et al., 2012; Cosley et al., 2003).
Recurrent neural networks (RNNs) have recently shown very good performance in natural language generation, since the generating function can be learned automatically from massive text corpora. Because RNNs suffer from the vanishing gradient problem, long short-term memory (LSTM) has been applied to text generation and leads to significant improvements on this issue. Another advantage of LSTM is its ability to keep long-range dependencies among words and characters in memory. RNNs with LSTM cells have shown promising results on text datasets as different as Shakespeare poems, scientific papers, and Linux source code (Karpathy et al., 2015).
Most natural language text generation approaches focus on the raw textual content and often neglect its contextual information. Context such as the specific location, time, and sentiment is an important factor in the creation of user-generated online reviews and should not be neglected. Recent research on recommender systems has demonstrated improvements achieved by including context (Adomavicius and Tuzhilin, 2015). This paper incorporates this information to enrich the generated sentences with particular contextual features.
In this paper, we propose a technique for the automatic generation of explanations, based on generating review text from a vector of ratings that express opinions about different factors of an item. Our method is based on a character-level LSTM trained on a sub-sample of the large real-world dataset BeerAdvocate. It is divided into three modules: a context encoder, an LSTM decoder, and the review generation module. The ratings are normalised and then concatenated to the characters that feed the LSTM cells, which can therefore generate characters that are contextualised by the normalised ratings. The review generation module performs a weighted generation that takes the ratings vector as input. The weights learn soft alignments between generated characters and sentiment, where we adaptively compute encoder-side context vectors used to predict the next characters.
Automatically generated review-oriented explanations are useful for companies and users, who can benefit from the helpfulness of the explanations when assessing an item recommendation. (Bartoli et al., 2016) show that character-level generation has advantages over other techniques, such as the unsupervised learning of grammar and punctuation, and can be more efficient than word-level generation, since it allows for the prediction and generation of new words and strings.
The contributions of this paper are:
Context-aware review generation based on rating scores.
Generation of reviews that are readable from a human perspective.
2. Problem Formulation
In this section, we provide the basic definitions and preliminaries for generating natural language explanations. Given a set of items and a target user:
An item is a product (here, a beer).
Explicit feedback is an action represented by a user-item matrix, in which each entry is the rating that a user has given to an item. Each rating is a vector over a set of five features: appearance, aroma, palate, taste, and overall.
Reviews are another form of explicit feedback, in text format, represented by a user-item matrix in which each entry is the review that a user has written for an item.
2.1. Problem Statement
Ratings are attributes that express a user’s opinion about a certain item; however, it is difficult to form a judgement of a product based only on the rating score. User-generated reviews are richer, since the user can give explanations according to different features and aspects of a specific item. There are many approaches to generating explanations for different types of recommender systems, including collaborative filtering (Herlocker et al., 2000) and case-based approaches (McSherry, 2005). Explanations have been shown to increase the effectiveness of the recommendation and the user’s satisfaction (Tintarev and Masthoff, 2012) under various evaluation methods. The current state of the art in explainable recommender systems does not offer human-oriented explanations. To address this particular issue, our model targets the problem of generating explanations in a review-oriented, natural language form.
We formulate the item explanation generation problem as follows. Given an input ratings vector, we aim to generate the item explanation that maximises the conditional probability of the explanation given the ratings. Note that the ratings vector holds the average values from the evaluation of the target item in a fixed-length numerical representation, while the review is treated as a character sequence of variable length. In our task the ratings vector has five dimensions, one for each rated feature. The model learns to compute the likelihood of generated reviews given a set of input ratings. This conditional probability is represented in Eq. 1.
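A conditional likelihood of this kind is commonly written as an autoregressive factorisation over characters. As a sketch only, with symbols that are assumptions of this example rather than the paper's notation ($r$ for the ratings vector, $e = (c_1, \dots, c_T)$ for the explanation as a sequence of $T$ characters), it could read:

```latex
p(e \mid r) \;=\; \prod_{t=1}^{T} p\left(c_t \mid c_1, \dots, c_{t-1},\, r\right)
```

Under this reading, training maximises the log of this product over the reviews in the corpus, so every character prediction is conditioned on both the preceding characters and the ratings.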
3. Related Works
Neural networks have only recently started to attract attention in the recommender systems community. In (Ko et al., 2016) the authors study recurrent neural networks in different architectures for a collaborative recommender system, with experiments showing good performance. Despite this good performance, the approach shares a problem with the other works: it is not explainable.
The work of (Almahairi et al., 2015) is among the first in which a recommender system uses review text as side information to improve its performance, with solutions rooted in recurrent neural networks. Our work differs in that we try to generate explanations in the form of a user-generated review, to improve a user’s understanding of recommended items.
What we would like to achieve, however, is an alignment between the variables or features that lead to the recommendation of one item or another and a descriptive text, where the rules of text composition are learned from existing reviews. This is similar to the alignment that others have achieved in different domains, such as text generation for images in (Karpathy and Fei-Fei, 2017).
Learning the rules for generating reviews can be accomplished by representing the input as sentences, words, or characters. In (Meng et al., 2016) and (Tang et al., 2016) the authors propose a tree-based neural network model for natural language inference based on words and their context. We study character-level explanation generation to further improve the state of the art. The work of (Karpathy et al., 2015) provides the first insights into why the LSTM variant of neural networks performs so well. A similar technique was used in (Bartoli et al., 2016), where the authors build on the previous work to generate product reviews in the restaurant domain.
Encoding rating vectors in the training phase allows the system to calculate the probability of the next character based on the given rating. In previous work, (Dong et al.) showed an efficient method for generating the next word in the sequence by adding an attention mechanism, and showed that this idea improves performance for long sequences.
Character-level generation has shown improvements over word-level generation on the text generation problem using RNNs (Bartoli et al., 2016). This is because, at the character level, the neural network can autonomously learn grammatical and punctuation rules. In (Bartoli et al., 2016) the authors mention that the character-level RNN performs slightly worse than the equivalent word-based model; however, it reduces the computational cost, which grows with the size of the input and output dictionaries, and it allows for the prediction and generation of new words and strings.
In (Lipton et al., 2015) the authors focus on character-level review generation and classification, where the ratings are used as auxiliary information. Our work differs from both aforementioned approaches to character-level text generation in utilising richer data (ratings express the quality of a product across different features and are identified as a source of the user’s preferences) and in providing a first attempt to generate explanations with character-level networks that reflect user differences and preferences.
4. Generated Explanations
4.1. Recurrent Neural Network
Recurrent neural networks (RNNs) extend feed-forward networks with time-varying activations, processing and learning from sequential data. In the training step, the network receives an input vector at the current time step and the cell state from the previous time step, which are multiplied by the input weight matrix and the state weight matrix, respectively. The RNN then passes the updated cell state to the next time step and produces a prediction value via a softmax layer, which consists of a non-linear softmax function, as shown in Eq. 2.
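The recurrent step described above can be illustrated with a small NumPy sketch. This is not the paper's implementation: the `tanh` state activation, the bias terms, and all names (`rnn_step`, `W_x`, `W_h`, `W_y`) are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def rnn_step(x_t, h_prev, W_x, W_h, W_y, b_h, b_y):
    """One recurrent step: update the state, then predict the next symbol."""
    # tanh is an assumed activation; the text does not name one.
    h_t = np.tanh(W_x @ x_t + W_h @ h_prev + b_h)
    # The softmax layer turns the state into a probability distribution.
    y_t = softmax(W_y @ h_t + b_y)
    return h_t, y_t

rng = np.random.default_rng(0)
vocab_size, hidden_size = 5, 8
W_x = rng.normal(scale=0.1, size=(hidden_size, vocab_size))   # input weights
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # state weights
W_y = rng.normal(scale=0.1, size=(vocab_size, hidden_size))   # output weights
b_h = np.zeros(hidden_size)
b_y = np.zeros(vocab_size)

x_t = np.eye(vocab_size)[2]      # one-hot input for symbol index 2
h_prev = np.zeros(hidden_size)   # initial state
h_t, y_t = rnn_step(x_t, h_prev, W_x, W_h, W_y, b_h, b_y)
```

The same weight matrices are reused at every time step; only the state `h_t` carries information forward.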
According to Eq. 2, the same input weight matrix and state weight matrix are applied at every time step, so plain RNNs are prone to the vanishing gradient problem on long sequences. To address this, (Hochreiter and Schmidhuber, 1997) introduced long short-term memory (LSTM) cells, which were later improved by (Gers et al., 2000) using forget gates to discard some information.
LSTM is an improved version of the RNN, controlled by a sequence of gates: a forget gate, an input gate, and an output gate. When the cell receives input data at the current time step and the cell state from the previous time step, these values are concatenated for the subsequent computations. The concatenated vector first feeds the forget gate, which decides which information has to be discarded; the forget gate has its own weight matrix and bias. The next step is to determine, through the input gate, which information should be stored in the cell state; the input gate likewise has its own parameters. At the update step, the cell creates a candidate state through a tanh layer, and then combines the candidate state with the previous cell state, the forget gate results, and the input gate results to update the current state. Finally, the data goes to the output gate, which uses a sigmoid layer to determine which part of the cell state is output, and multiplies the result with the current cell state to give as a result the character with the highest probability.
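For reference, the gate sequence described above corresponds to the widely used LSTM formulation below. The notation is an assumption of this sketch, not the paper's own: $x_t$ is the input, $h_{t-1}$ the previous output, $C_t$ the cell state, $\sigma$ the sigmoid function, and $\odot$ element-wise multiplication. The standard formulation concatenates the input with the previous output $h_{t-1}$.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{(candidate state)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(state update)}\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{(cell output)}
\end{aligned}
```

Because the state update is additive, gradients can flow through $C_t$ across many time steps, which is what mitigates the vanishing gradient problem.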
4.2. Generative Concatenative Network (GCN)
Generative RNN models can be applied in many fields, since most data can be represented as a sequence; they are especially suited to text generation. The state weights help generative RNNs to produce coherent text: one character is fed into the network at each time step, and each of these inputs updates the network state. This project builds on the generative concatenative network presented by (Lipton et al., 2015), which uses a character-based LSTM RNN generation model and adds auxiliary information in the form of ratings for different feature preferences.
In (Karpathy et al., 2015) the authors define a character-level language model: given a sequence of characters as input to an LSTM neural network, the model calculates the probability of the next character in the sequence with a softmax function at each time step and then generates that character as output. All characters are encoded as 1-of-C vectors and fed to the recurrent network to obtain a sequence of hidden vectors from the last layer of the network. To obtain predictions for the next character in the sequence, these hidden vectors are projected by a parameter matrix into a sequence of output vectors and passed through a softmax activation. The output vectors are interpreted as holding the probability of the next character in the sequence, and the objective is to minimise the average cross-entropy loss over all targets.
In (Lipton et al., 2015) the authors propose to generate text conditioned on an auxiliary input, which is concatenated with the character representation, as seen in Fig. 1. They train their network on this concatenated input. At training time, the auxiliary input is a feature of the training set, while during the generation step they choose some auxiliary value and concatenate it with each character sampled from the model. They replicate the auxiliary information at each input to allow the model to focus on learning the complex interactions between the auxiliary input and the language, rather than just memorising the input. However, they consider only the overall rating or temperature for a certain item, neglecting the user’s preferences across different aspects.
4.3. Context Encoder
Similar to (Karpathy et al., 2015) and (Lipton et al., 2015), our model is based on an LSTM RNN to generate reviews. Our model adds a set of auxiliary information to each character in the context encoder module.
In our model, the context encoder module encodes each input character using one-hot encoding and concatenates a set of ratings to it before feeding it into the network, as we can see in Fig. 2. In our experiments, we generate a dictionary of all the characters in the corpus to record their positions, which is used for encoding in the training step and for decoding in the generation step. For each character in the reviews, a one-hot vector is generated using its position in that dictionary. The one-hot vector is then concatenated with a set of auxiliary information that depends on the review, as shown in Eq. 4. As auxiliary information, our model uses a set of numeric values of the users’ ratings, which are rescaled to the range [0, 1].
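The encoding step described above can be sketched in a few lines of NumPy. The function names (`build_char_dictionary`, `encode_review`) and the toy reviews and ratings are hypothetical, chosen only to illustrate one-hot encoding followed by concatenation with a ratings vector; they are not taken from the paper's code.

```python
import numpy as np

def build_char_dictionary(reviews):
    """Map every character in the corpus to a fixed position."""
    chars = sorted(set("".join(reviews)))
    return {ch: idx for idx, ch in enumerate(chars)}

def encode_review(review, ratings, char_to_idx):
    """One-hot encode each character and append the ratings to every row."""
    ratings = np.asarray(ratings, dtype=np.float32)
    one_hot = np.zeros((len(review), len(char_to_idx)), dtype=np.float32)
    positions = [char_to_idx[ch] for ch in review]
    one_hot[np.arange(len(review)), positions] = 1.0
    # The same ratings vector is replicated at every time step.
    aux = np.tile(ratings, (len(review), 1))
    return np.concatenate([one_hot, aux], axis=1)

# Toy data: two short reviews and five ratings already scaled to [0, 1].
reviews = ["hoppy ale", "dark malt"]
ratings = [0.8, 0.7, 0.9, 0.8, 0.85]
char_to_idx = build_char_dictionary(reviews)
encoded = encode_review(reviews[0], ratings, char_to_idx)
```

Each row of `encoded` has one entry per dictionary character plus five rating entries, so the network sees the ratings alongside every character.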
4.4. Generative Explanation
As mentioned previously, (Lipton et al., 2015) proposed a GCN model that concatenates characters with some auxiliary information, i.e. the overall rating or temperature, and is able to generate some remarkable samples. It uses one piece of auxiliary information to inform the probability that defines the next character.
We propose an improvement to the concatenation process, where we consider a vector of auxiliary data, i.e. a set of rating scores for different features of items, instead of only one dimension of auxiliary information. During review generation our model generates distinct pieces of text tuned to the distribution of the applied ratings.
A non-linear softmax layer is used in our model to compute the probability of each character. During the generation process the model takes an initial text, which is the start symbol of each review, concatenates it with a series of rating scores, and feeds the result to the network. The model then passes its output to the softmax layer, as shown in Eq. 5, which takes the output of an LSTM cell and applies the weight and bias of the layer.
This procedure is applied recursively, and characters are generated until the pre-defined end symbol is reached.
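The recursive generation procedure can be sketched as a simple loop. Everything named here is hypothetical: `next_char_probs` stands in for the trained network (it takes the concatenated input and a state, and returns a probability vector and a new state), and `toy_model` is a deterministic stub used only so that the loop can be run. Sampling from the distribution is an assumption; taking the most probable character would be an equally simple variant.

```python
import numpy as np

def generate_review(next_char_probs, ratings, idx_to_char,
                    start_idx, end_idx, max_len=500, seed=0):
    """Generate characters until the end symbol or max_len is reached."""
    rng = np.random.default_rng(seed)
    vocab_size = len(idx_to_char)
    ratings = np.asarray(ratings, dtype=np.float32)
    current, state, generated = start_idx, None, []
    for _ in range(max_len):
        # One-hot character concatenated with the ratings vector.
        x = np.zeros(vocab_size + len(ratings), dtype=np.float32)
        x[current] = 1.0
        x[vocab_size:] = ratings
        probs, state = next_char_probs(x, state)
        current = int(rng.choice(vocab_size, p=probs))
        if current == end_idx:
            break
        generated.append(idx_to_char[current])
    return "".join(generated)

# Hypothetical vocabulary: "^" and "$" are placeholder start/end symbols.
idx_to_char = ["^", "$", "o", "k"]

def toy_model(x, state):
    """Stub standing in for the LSTM: emits 'o', 'k', then the end symbol."""
    step = 0 if state is None else state
    target = [2, 3, 1][min(step, 2)]
    probs = np.zeros(len(idx_to_char))
    probs[target] = 1.0
    return probs, step + 1

sample = generate_review(toy_model, [0.8] * 5, idx_to_char,
                         start_idx=0, end_idx=1)
```

The `max_len` guard is a practical safeguard added for this sketch, so that generation terminates even if the end symbol is never produced.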
By using LSTM cells for character-level explainable review generation, and merging them with the vector of ratings, we allow the model to learn grammar and punctuation and to be more efficient than word-level models (Bartoli et al., 2016), since our model can predict and generate new words and strings. Our model therefore generates explanations for recommender systems from a review-oriented perspective, improving the quality of the explanation text presented to the user in the form of a review.
5.1. Parameters Definition
Empirical experiments used a customised LSTM RNN library written in Python using TensorFlow. There are 2 hidden layers with 1024 LSTM cells per layer. During training, a wrapper mechanism is used to prevent over-fitting. The input data was split into 100 batches with a batch size of 128, and each batch has a sequence length of 280.
We tested our model on a sub-sample of the large real-world dataset BeerAdvocate. The original dataset consists of approximately 1.5 million reviews retrieved from 1998 to 2011. Each review includes ratings (the original ratings were normalised to values between 0 and 1) in terms of five categories: appearance, aroma, palate, taste, and overall impression. Reviews include item and user ids, followed by each of these five ratings and a plain text review. The summarised statistical information from the extracted sub-sample is shown in Table 1.
5.3. Data Preparation
The BeerAdvocate dataset contains several beer categories, and we selected a sub-sample based on just 5 categories: "american ipa", "russian imperial stout", "american porter", "american amber/red ale", and "fruit/vegetable beer". Since some reviews are too short or even empty, which would cause problems with training, we filter our sub-sample to include only reviews with at least 50 characters. So that our experiments are not affected by the different number of reviews in each beer category, we select 4k reviews from each category for our training datasets.
We first generate a dictionary of all characters, i.e. punctuation, numbers, and letters, and then transform each character into a one-hot vector using that dictionary. We train the network with a sequential approach, where each review is fed in as a sequence; to do so, it is essential to remind the network of the start and end position of each review. We do this by adding start and end symbols to each review for both the training and generation modules. In order to generate explanations for different ratings, we concatenate the input characters with the ratings of the review the character belongs to. In addition, we normalise the scale of the ratings to [0, 1].
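The filtering, marking, and normalisation steps above can be sketched as follows. The marker characters `"^"` and `"$"`, the function name `prepare_reviews`, and the assumed original rating scale of 0 to 5 are all assumptions of this example; the text does not specify the actual symbols or the original scale.

```python
# Hypothetical start/end markers; the symbols used in the paper are not given.
START, END = "^", "$"

def prepare_reviews(records, min_chars=50, rating_min=0.0, rating_max=5.0):
    """Drop short reviews, add start/end symbols, rescale ratings to [0, 1].

    `records` is a list of (review_text, ratings) pairs. The rating bounds
    are assumed values and should match the scale of the raw data.
    """
    prepared = []
    for text, ratings in records:
        text = text.strip()
        if len(text) < min_chars:
            continue  # too short or empty to be useful for training
        span = rating_max - rating_min
        scaled = [(r - rating_min) / span for r in ratings]
        prepared.append((START + text + END, scaled))
    return prepared

records = [
    ("Too short.", [4.0, 4.0, 4.0, 4.0, 4.0]),
    ("Pours a hazy amber with a thick head; citrus and pine on the nose.",
     [4.0, 4.5, 3.5, 4.0, 4.0]),
]
prepared = prepare_reviews(records)
```

In a real pipeline the marker symbols would need to be characters that never occur inside a review, so that the network can learn them unambiguously.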
5.4. Evaluation Metrics
Current methods for explaining recommendations do not present the information to the user in natural language. Our proposed method explains the recommendation to a target user in the style of a user-generated review. To measure the quality of the presented text, we used a suite of natural language readability metrics: Automated Readability Index (ARI) (Liu et al., 2015), Flesch reading ease (FRE) (Pera and Ng, 2012), Flesch-Kincaid grade level (FGL) (Pera and Ng, 2013), Gunning-Fog index (GFI) (Pera and Ng, 2012), simple measure of gobbledygook (SMOG) (Mc Laughlin, 1969), Coleman-Liau index (CLI) (Pera and Ng, 2013), LIX (Anderson, 1983), and RIX (Pera and Ng, 2013). The Flesch reading ease score is considered the oldest method for calculating readability, through the analysis of the number of words and the sentence length. An updated version of this metric is the Flesch-Kincaid grade level. The Gunning-Fog index is commonly used to confirm that a text can be read easily by the intended audience. The SMOG score is an improvement on the Gunning-Fog index, showing better accuracy overall. The Automated Readability Index relies on the number of characters per word. The LIX score gauges word length by the percentage of long words.
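Three of these metrics need only character, word, and sentence counts, and can be sketched directly from their published formulas. The tokenisation below (regular expressions for words and sentence boundaries) is a simplifying assumption of this example and will differ slightly from dedicated readability tools; the function names are hypothetical.

```python
import re

def text_stats(text):
    """Return (sentences, words, characters in words, long words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    chars = sum(len(w) for w in words)
    long_words = sum(1 for w in words if len(w) > 6)  # more than 6 letters
    return len(sentences), len(words), chars, long_words

def readability(text):
    """Compute ARI, LIX, and RIX for a piece of text."""
    n_sent, n_words, n_chars, n_long = text_stats(text)
    if n_sent == 0 or n_words == 0:
        raise ValueError("text must contain at least one sentence and word")
    return {
        # ARI: characters per word and words per sentence.
        "ARI": 4.71 * n_chars / n_words + 0.5 * n_words / n_sent - 21.43,
        # LIX: average sentence length plus percentage of long words.
        "LIX": n_words / n_sent + 100.0 * n_long / n_words,
        # RIX: long words per sentence.
        "RIX": n_long / n_sent,
    }

scores = readability(
    "Pours a deep amber. Wonderful citrus aroma and a lingering bitterness."
)
```

The syllable-based metrics (FRE, FGL, GFI, SMOG) are omitted here because they require a syllable counter, which is usually heuristic.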
The initial test of our explanation generation concerns readability. We apply the 8 readability metrics mentioned above to both the generated and the reference reviews.
We first select 10 reviews from our sample dataset as the reference reviews. Using the same users and items as in these 10 reviews, and taking into account the learning curve of the model over different epochs, we generated 10 reviews per epoch from the model. We then apply the readability metrics to the generated and reference reviews to evaluate the text. The readability results are shown in Fig. 3, where we observe that the generated reviews reach the same level of readability as the user reviews on all metrics after 20 epochs.
As Fig. 3 shows, the readability metrics illustrate the capacity of the model to generate reviews that are close to the user’s style of writing. We take the readability scores of the generated reviews from the final epochs and normalise them to the scores obtained from the reference reviews to demonstrate the relative readability in Fig. 4. This emphasises that the reviews generated by the neural network are close in style to the human-written reviews, as determined by a broad range of readability metrics that are sensitive to different qualities of the text. It is important that our explanations are legible, easy to understand, and appear to be written in a recognisable style.
We have established that our model can generate natural language text that reaches the overall readability level of the user-generated reviews. We now investigate the different kinds of explanations that can be generated when we modify the auxiliary values at the generation stage. We use the ratings for 5 aspects of the beers as auxiliary values; they represent each user’s preferences and the general rating opinion about a target beer. We choose a user-item pair and compute the average ratings for each feature for both the user and the item. The precise contribution of user and item ratings is controlled by a weighting parameter, and we present three text samples to compare through Eq. 6. As Eq. 6 shows, the weighting parameter controls the auxiliary values, and we then generate reviews based on them. When the parameter is close to 1, the generated review will be more like a review that the user would write. When it is close to 0, the generated review will be closer to the general rating of all users for that beer. To investigate the divergence of generated reviews, we set the parameter to 1, 0.5, and 0, which correspond to the opinion of the user on beers in general, the review the user might compose for that beer, and the general reviews of the beer.
According to Fig. 5, the first review (weight 1) shows the opinion of the user on beers in general, which has a positive sentiment overall. In the last review (weight 0), the common view of that beer has a negative sentiment, expressed by negative sentences such as "I wouldn’t recommend it". With a weight of 0.5, the generated review displays a relatively neutral attitude towards the beer.
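One simple way to realise such a weighting is a convex combination of the user's and the item's average ratings. This form is an assumption of the sketch below, as are the names `blend_ratings` and `alpha` and the toy rating values; it is chosen because it reproduces the behaviour described for weights of 1, 0.5, and 0.

```python
import numpy as np

def blend_ratings(user_avg, item_avg, alpha):
    """Mix user and item average ratings; alpha=1 keeps only the user's."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    user_avg = np.asarray(user_avg, dtype=float)
    item_avg = np.asarray(item_avg, dtype=float)
    return alpha * user_avg + (1.0 - alpha) * item_avg

# Toy averages over the five features, already scaled to [0, 1].
user_avg = np.array([0.9, 0.8, 0.85, 0.9, 0.9])   # a generous user
item_avg = np.array([0.5, 0.4, 0.45, 0.5, 0.5])   # a poorly rated beer
aux_by_alpha = {a: blend_ratings(user_avg, item_avg, a) for a in (1.0, 0.5, 0.0)}
```

Each blended vector would then be used as the auxiliary input at generation time, so that the three samples differ only in the ratings they are conditioned on.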
In this paper, we propose a model to automatically generate natural language explanations for recommender systems.
Our explanations provide easily intelligible and useful reasons for a user to decide whether to purchase a certain product. This has important benefits for the field of recommender systems, since these explanations can help a user to make better decisions more quickly, as users place a high degree of trust in the reviews of others.
As our experiments with natural language readability metrics show, we were able to generate readable English text with specific characteristics that match user-generated review text.
In the future we will focus on further extensions of the automatic generation of natural language explanations in two ways: (1) personalised explanations that reflect the user’s preferences, where the explanation of the product is tailored to the user’s ratings, preferred aspects, and expressed sentiments; (2) testing our model in larger review domains such as hotels and restaurants.
Acknowledgements. This work is supported by Science Foundation Ireland through the Insight Centre for Data Analytics under grant number SFI/12/RC/2289, and Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq (grant# 206065/2014-0).
- Adomavicius and Tuzhilin (2015) Gediminas Adomavicius and Alexander Tuzhilin. 2015. Context-aware recommender systems. In Recommender systems handbook. Springer, 191–226.
- Almahairi et al. (2015) Amjad Almahairi, Kyle Kastner, Kyunghyun Cho, and Aaron Courville. 2015. Learning Distributed Representations from Reviews for Collaborative Filtering. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys ’15). ACM, New York, NY, USA, 147–154.
- Anderson (1983) Jonathan Anderson. 1983. Lix and rix: Variations on a little-known readability index. Journal of Reading 26, 6 (1983), 490–496.
- Bartoli et al. (2016) A. Bartoli, A. d. Lorenzo, E. Medvet, D. Morello, and F. Tarlao. 2016. ”Best Dinner Ever!!!”: Automatic Generation of Restaurant Reviews with LSTM-RNN. In 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI). 721–724.
- Cosley et al. (2003) Dan Cosley, Shyong K Lam, Istvan Albert, Joseph A Konstan, and John Riedl. 2003. Is seeing believing?: how recommender system interfaces affect users’ opinions. In Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 585–592.
- Dong et al. Li Dong, Shaohan Huang, Furu Wei, Mirella Lapata, Ming Zhou, and Ke Xu. Learning to Generate Product Reviews from Attributes.
- Gers et al. (2000) Felix A Gers, Jürgen Schmidhuber, and Fred Cummins. 2000. Learning to forget: Continual prediction with LSTM. Neural computation 12, 10 (2000), 2451–2471.
- He et al. (2015) Xiangnan He, Tao Chen, Min-Yen Kan, and Xiao Chen. 2015. TriRank: Review-aware Explainable Recommendation by Modeling Aspects. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (CIKM ’15). 1661–1670.
- Herlocker et al. (2000) Jonathan L Herlocker, Joseph A Konstan, and John Riedl. 2000. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM conference on Computer supported cooperative work. ACM, 241–250.
- Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780.
- Karpathy and Fei-Fei (2017) Andrej Karpathy and Li Fei-Fei. 2017. Deep Visual-Semantic Alignments for Generating Image Descriptions. IEEE Trans. Pattern Anal. Mach. Intell. 39, 4 (2017), 664–676.
- Karpathy et al. (2015) Andrej Karpathy, Justin Johnson, and Fei-Fei Li. 2015. Visualizing and Understanding Recurrent Networks. CoRR abs/1506.02078 (2015). International Conference on Learning Representations.
- Knijnenburg et al. (2012) Bart P Knijnenburg, Martijn C Willemsen, Zeno Gantner, Hakan Soncu, and Chris Newell. 2012. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction 22, 4-5 (2012), 441–504.
- Ko et al. (2016) Young-Jun Ko, Lucas Maystre, and Matthias Grossglauser. 2016. Collaborative Recurrent Neural Networks for Dynamic Recommender Systems. In Proceedings of The 8th Asian Conference on Machine Learning, ACML 2016, Hamilton, New Zealand, November 16-18, 2016 (JMLR Workshop and Conference Proceedings), Robert J. Durrant and Kee-Eung Kim (Eds.), Vol. 63. JMLR.org, 366–381.
- Lipton et al. (2015) Zachary Chase Lipton, Sharad Vikram, and Julian McAuley. 2015. Capturing Meaning in Product Reviews with Character-Level Generative Text Models. CoRR abs/1511.03683 (2015).
- Liu et al. (2015) Lei Liu, Georgia Koutrika, and Shanchan Wu. 2015. Learningassistant: A novel learning resource recommendation system. In Data Engineering (ICDE), 2015 IEEE 31st International Conference on. IEEE, 1424–1427.
- Mc Laughlin (1969) G Harry Mc Laughlin. 1969. SMOG grading-a new readability formula. Journal of reading 12, 8 (1969), 639–646.
- McSherry (2005) David McSherry. 2005. Explanation in recommender systems. Artificial Intelligence Review 24, 2 (2005), 179–197.
- Meng et al. (2016) Zhao Meng, Lili Mou, Ge Li, and Zhi Jin. 2016. Context-Aware Tree-Based Convolutional Neural Networks for Natural Language Inference. Springer International Publishing, Cham, 515–526.
- Pera and Ng (2012) Maria Soledad Pera and Yiu-Kai Ng. 2012. BReK12: A Book Recommender for K-12 Users. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’12). 1037–1038.
- Pera and Ng (2013) Maria Soledad Pera and Yiu-Kai Ng. 2013. What to Read Next?: Making Personalized Book Recommendations for K-12 Users. In Proceedings of the 7th ACM Conference on Recommender Systems (RecSys ’13). 113–120.
- Pu et al. (2012) Pearl Pu, Li Chen, and Rong Hu. 2012. Evaluating recommender systems from the user’s perspective: survey of the state of the art. User Modeling and User-Adapted Interaction 22, 4 (2012), 317–355.
- Ren et al. (2017) Zhaochun Ren, Shangsong Liang, Piji Li, Shuaiqiang Wang, and Maarten de Rijke. 2017. Social Collaborative Viewpoint Regression with Explainable Recommendations. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM ’17). 485–494.
- Tang et al. (2016) Jian Tang, Yifan Yang, Samuel Carton, Ming Zhang, and Qiaozhu Mei. 2016. Context-aware Natural Language Generation with Recurrent Neural Networks. CoRR abs/1611.09900 (2016).
- Tintarev and Masthoff (2007) Nava Tintarev and Judith Masthoff. 2007. Effective explanations of recommendations: user-centered design. In Proceedings of the 2007 ACM conference on Recommender systems. ACM, 153–156.
- Tintarev and Masthoff (2012) Nava Tintarev and Judith Masthoff. 2012. Evaluating the effectiveness of explanations for recommender systems. User Modeling and User-Adapted Interaction 22, 4 (2012), 399–439.
- Zhang et al. (2014) Yongfeng Zhang, Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, and Shaoping Ma. 2014. Explicit Factor Models for Explainable Recommendation Based on Phrase-level Sentiment Analysis. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR ’14). 83–92.