Exploring the Effect of an Item's Neighborhood on its Sellability in eCommerce

08/10/2019 ∙ by Saratchandra Indrakanti, et al. ∙ eBay

Predicting the sale of an item is a critical problem in eCommerce search. Typically, each item is independently assigned a probability of sale for a given search query. But in a dynamic marketplace like eBay, even for a single product, there are many factors distinguishing one item from another that can influence the user's purchase decision. Users have to make a purchase decision by considering all of these options. The majority of existing learning to rank algorithms model the relative relevance between labeled items only in the loss function, through pairwise or list-wise losses. But they are limited to point-wise scoring functions, where items are scored independently based on the features of the item itself. In this paper, we study the influence of an item's neighborhood on its purchase decision. Here, we define the neighborhood as the items ranked above and below the current item in search results. By adding delta features comparing items within a neighborhood and learning a ranking model, we show experimentally that the new ranker with delta features outperforms our baseline ranker in terms of Mean Reciprocal Rank (MRR). The ranking model with the proposed delta features results in a 3-5% improvement in MRR over the baseline model. We also study the impact of different neighborhood sizes. Experimental results show that neighborhood size 3 performs the best based on MRR, with an improvement of 4-5% over the baseline model.


1. Introduction

Much research has gone into improving the learning to rank frameworks employed in applications such as web search, eCommerce search, question answering systems, and recommendation systems (Liu et al., 2009; Li, 2011). In eCommerce, given a query, a typical search system retrieves all items matching the query, ranks the items based on a ranking function, and returns the top-ranked documents. The ranking function usually provides the probability of click or sale (Radlinski and Joachims, 2005; Joachims et al., 2005b).

For learning the ranking function, training data can be collected in two ways. One approach is to obtain human-judged labels for items matching a query, annotating each item with a binary decision of relevant or not (Radlinski and Joachims, 2005). The second approach is to extract implicit relevance feedback from user behavior logs (Agichtein et al., 2006; Cohen et al., 1998; Joachims, 2002). In web search as well as in eCommerce search, one of the most widely used relevance feedback signals is clicks. But eCommerce search systems like eBay have the advantage of using additional relevance feedback signals such as bids, add-to-carts, purchases, and revenue (Karmaker Santu et al., 2017).

The basic assumption in implicit relevance feedback is that users scan the items in a top-down manner. Existing literature studies the impact of items that were viewed and not clicked as negative samples in relevance feedback (Joachims, 2002). Other studies have focused on how a document's relevance depends on the documents ranked above it, with a focus on search result diversity (Zhu et al., 2014; Agrawal et al., 2009).

In this paper, we study the effect of the items ranked above and below a particular item on that item's probability of sale in eCommerce. We study this impact in the eCommerce domain because users evaluate the list of items in search results by comparing different options to choose the best item, rather than satisfying a single informational need as in web search. To evaluate the impact, we compare the features of an item with those of the items ranked at different positions above and below it. These comparative features are denoted as delta features.

Our study highlights different delta features we tried on top of our current baseline model, and the improvements in offline metrics they result in. We also evaluate the effect of different neighborhood sizes used in constructing the delta features, and experimentally show that the neighborhood of an item has an impact on the item’s probability of sale through offline metrics.

The rest of the paper is organized as follows. Section 2 discusses some of the related work in the literature. In Section 3 we describe our methodology. In Section 4 we describe our datasets and experiments. We summarize our work and discuss possible future research in Section 5.

2. Related Work

Lichtenstein and Slovic (Lichtenstein and Slovic, 1971) presented early work on how people make decisions under uncertainty, where the key insight is that decisions differ when choices are presented separately versus when they are presented together. The importance of the context (neighborhood) of a given item to its clickability has been extensively researched in the past. Previous studies of users' clicks as implicit feedback in search found that the clicking decision on a web document is affected by both its rank and the other documents in the presentation (Joachims et al., 2005a; Joachims et al., 2007). Craswell et al. (Craswell et al., 2008) introduced the cascade click model, where the probability of click for a given document at a given rank is influenced by the probability of click for documents at higher ranks.
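As an illustration of the cascade idea, the following minimal Python sketch assumes that each document has an attractiveness probability, that the user scans from the top, and that the session stops at the first click. The function name and the example values are hypothetical and are not taken from the cited work.

```python
from typing import List


def cascade_click_probabilities(attractiveness: List[float]) -> List[float]:
    """Click probability per rank under a simple cascade assumption.

    P(click at rank i) = r_i * prod_{j < i} (1 - r_j), where r_i is the
    (assumed) probability that the document at rank i attracts a click
    once the user examines it.
    """
    probabilities = []
    p_reach = 1.0  # probability that the user gets as far as this rank
    for r in attractiveness:
        probabilities.append(p_reach * r)
        p_reach *= 1.0 - r  # user continues only if this document was skipped
    return probabilities


# Three equally attractive documents: lower ranks receive fewer clicks
# purely because of the documents ranked above them.
probs = cascade_click_probabilities([0.5, 0.5, 0.5])
print(probs)
```

Even with identical attractiveness, the click probability decays with rank, which is the sense in which documents at higher ranks influence those below them.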

Dupret and Piwowarski (Dupret and Piwowarski, 2008) introduced a browsing behavior model in which the probability of a subsequent click on a given document is affected by the distance between that document and the most recently clicked document. The probability gets lower if the previously clicked document is farther away, i.e. if a user has to scroll through numerous irrelevant documents. Our approach extends this research to model the sellability of items in eCommerce search.

3. Our Approach

Our hypothesis is that whenever users make a decision to buy an item on an eCommerce platform, it is not in isolation. The decision is made by comparing the item to other items in its vicinity. Most ranking models use a single item's features to determine the probability of sale. To understand how the neighboring items affect an item's sellability, we define delta features that represent how an item differs from its neighboring items.

We focus on features that could be distinguishing factors of an item and those that can identify user behavior. Since we want to model user behavior, these features are derived from elements users are likely to see on the search results page when making a purchase, e.g. shipping time, product title, and product price. We identified the set of features that users are likely to perceive while buying an item as the candidate set from which we generate delta features.

Figure 1. Illustration of previous and next delta features constructed based on a ranked list of items. Here the neighborhood size is 2.

We experiment with three different neighborhood sizes (1, 3 and 5) to study how the influence of the delta features changes as the neighborhood size changes. For each candidate feature, we generate two types of delta features, namely next and prev: next represents the delta features based on the items ranked below the current item, while prev represents the delta features based on the items ranked above the current item. Fig 1 shows an example of a neighborhood of size 2. For the current item, next features are calculated by comparing its features with those of the two items ranked directly below it. Similarly, prev features are calculated by comparing its features with those of the two items ranked directly above it. Note that neighborhood size refers to the number of items considered above and below the current item when computing the delta features. We denote the neighborhood size by m.

There are three different categories of delta features:

  1. Numerical Delta Features : Numerical delta features are defined as the difference between the previous/next item's features and the current item's features, with separate definitions for neighborhood size 1 and for neighborhood sizes greater than 1.

  2. Categorical Delta Features : For categorical features with discrete values, the delta features are defined as the number of times the current item's feature value occurs in the neighborhood. This count is computed from the array containing the feature's values, the index of the current item, the current item's feature value, and the neighborhood size.

    Note that for neighborhood size 1, we use a stronger representation of delta features. Here, the delta features are defined as the concatenation of the current item's feature value and the previous/next item's feature value. If a feature can take two discrete values, this representation ensures that the delta feature is sensitive to their order: a pair of values is not treated the same as the reversed pair.

    We compute delta feature values for neighborhood size 1 based on concatenation, but use value counts for neighborhood sizes > 1. This was a conscious choice to have a stronger representation for neighborhood size 1, as it would have been reduced to a binary feature representation had we opted for the value-count information, consequently losing substantial information. We have not used concatenation for neighborhood sizes > 1, as the concatenated feature values would be long strings with extremely sparse values.

  3. Boolean Delta Features : For certain features, we want delta features to capture whether they are equal to the previous/next item's features or not. Boolean delta features are defined for neighborhood size 1 and for neighborhood sizes greater than 1, in terms of the array containing a feature's values, the index of the current item, the current item's feature value, and the neighborhood size.
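The three categories of delta features can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the production implementation: the function names, the "|" separator used for concatenation, and the choice to return None when an item has no neighbor on one side are all hypothetical. The numerical and boolean functions cover only neighborhood size 1, since how differences or equality checks are aggregated over larger neighborhoods is not spelled out in the text above.

```python
from typing import List, Optional, Sequence


def neighbors(values: Sequence, i: int, m: int, direction: str) -> List:
    """Up to m values ranked above ('prev') or below ('next') index i."""
    if direction == "prev":
        return list(values[max(0, i - m):i])
    if direction == "next":
        return list(values[i + 1:i + 1 + m])
    raise ValueError("direction must be 'prev' or 'next'")


def numerical_delta(values: Sequence[float], i: int, direction: str) -> Optional[float]:
    """Size 1 only: neighbor's value minus the current item's value."""
    nb = neighbors(values, i, 1, direction)
    return nb[0] - values[i] if nb else None  # None at list edges (assumption)


def categorical_delta(values: Sequence[str], i: int, m: int, direction: str):
    """Size 1: ordered concatenation; size > 1: count of the current value."""
    nb = neighbors(values, i, m, direction)
    if m == 1:
        # Current value first, so "new|used" differs from "used|new".
        return values[i] + "|" + nb[0] if nb else None
    return sum(1 for v in nb if v == values[i])


def boolean_delta(values: Sequence, i: int, direction: str) -> Optional[bool]:
    """Size 1 only: does the neighbor share the current item's value?"""
    nb = neighbors(values, i, 1, direction)
    return (nb[0] == values[i]) if nb else None


# Hypothetical ranked list of five items (rank order = list order).
prices = [10.0, 12.5, 9.0, 11.0, 8.0]
conditions = ["new", "used", "new", "new", "used"]
free_shipping = [True, True, False, False, True]

print(numerical_delta(prices, 1, "prev"))         # price gap to the item above
print(categorical_delta(conditions, 2, 1, "prev"))
print(categorical_delta(conditions, 1, 1, "next"))
print(categorical_delta(conditions, 2, 2, "next"))
print(boolean_delta(free_shipping, 1, "prev"))
```

Keeping the current item's value first in the concatenation is what makes the size 1 categorical feature order-sensitive, whereas a value count over a single neighbor would collapse to a binary indicator.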

4. Experiments

We build several offline ranking models with varying neighborhood sizes and selection of delta features to evaluate the incremental improvement produced by these features in the performance of the ranking models, and subsequently observe the effect of neighborhood on the likelihood of sale of an item.

4.1. Dataset, Features and Experiment Setting

We conduct our ranking experiments on a large-scale dataset sampled from eBay search logs. The dataset consists of about 20,000 unique search queries sampled from user search sessions that resulted in an item's sale, along with the ranked list of top items impressed for each query. The labels for the items in the dataset are obtained via implicit relevance feedback. In this paper, we consider the sale of an item as the target. We constructed delta features as described in Section 3. The dataset was split into a training set and a validation set.

We trained several learning to rank models on the dataset described above. We use the state-of-the-art LambdaMART model (Burges, 2010) for our experiments. The baseline model, Model_Base, is trained on the same dataset without any delta features. Model_Base is the production ranking model for eBay. The proposed ranking models use the features from Model_Base together with delta features. We train ranking models with different neighborhood sizes and different neighborhood types, namely prev and next. We experimented with three neighborhood sizes in this paper: 1, 3 and 5. We trained three different models for each neighborhood size m:

  1. Model_Prev_Wm : Models with prev delta features, calculated based on items ranked above the current item

  2. Model_Next_Wm : Models with next delta features, calculated based on items ranked below the current item

  3. Model_Prev_Next_Wm : Models with prev and next delta features, calculated based on items ranked above and below the current item

The hyperparameters are tuned based on Model_Base, and the same parameters are used to train all the proposed ranking models with delta features.
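The resulting set of model configurations can be enumerated with a short Python sketch. The feature names below (for example "prev_price_w3") and the base feature list are hypothetical placeholders; only the model naming pattern and the neighborhood sizes come from the description above.

```python
from typing import Dict, List, Sequence

NEIGHBORHOOD_SIZES = (1, 3, 5)


def model_feature_sets(
    base_features: Sequence[str], delta_candidates: Sequence[str]
) -> Dict[str, List[str]]:
    """Map each model name to the list of feature columns it is trained on."""
    models = {"Model_Base": list(base_features)}  # baseline: no delta features
    for m in NEIGHBORHOOD_SIZES:
        # Hypothetical column naming: <type>_<candidate>_w<size>
        prev = ["prev_%s_w%d" % (c, m) for c in delta_candidates]
        nxt = ["next_%s_w%d" % (c, m) for c in delta_candidates]
        models["Model_Prev_W%d" % m] = list(base_features) + prev
        models["Model_Next_W%d" % m] = list(base_features) + nxt
        models["Model_Prev_Next_W%d" % m] = list(base_features) + prev + nxt
    return models


models = model_feature_sets(
    base_features=["base_feature_a", "base_feature_b"],
    delta_candidates=["price", "shipping_time"],
)
for name in sorted(models):
    print(name, len(models[name]))
```

Every proposed model keeps the full baseline feature set, so any change in the metric relative to Model_Base can be attributed to the added delta features rather than to a different tuning of the learner.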

4.2. Results

Mean reciprocal sale rank (MRR) was chosen as the metric to evaluate and compare the performance of the various models relative to the baseline model. In this case, the reciprocal rank for a query is determined by the position of the first result that involves an item sale.
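A minimal Python sketch of this metric follows. It assumes each query is represented by its ranked list of binary sale labels, and that a query with no sale contributes a reciprocal rank of 0; both the representation and the zero convention are assumptions made for illustration.

```python
from typing import List, Sequence


def reciprocal_rank(sale_labels: Sequence[int]) -> float:
    """1 / rank of the first sold item in a ranked list (0.0 if none sold)."""
    for rank, sold in enumerate(sale_labels, start=1):
        if sold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(ranked_lists: List[Sequence[int]]) -> float:
    """Average reciprocal sale rank over all queries."""
    if not ranked_lists:
        return 0.0
    return sum(reciprocal_rank(labels) for labels in ranked_lists) / len(ranked_lists)


# Three hypothetical queries: the sold item appears at ranks 2, 1 and 4.
queries = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0, 1],
]
mrr = mean_reciprocal_rank(queries)
print(round(mrr, 4))  # (1/2 + 1 + 1/4) / 3
```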

We trained models with both previous and next delta features constructed based on neighborhood sizes 1, 3 and 5 respectively, and compared them to the baseline model Model_Base with respect to MRR. The prev and next features which capture the neighborhood above and below an item in the ranked list of results, show significant improvements in MRR compared to the baseline model. The figures show MRR difference with respect to Model_Base and the error bars are computed using 1000 bootstrap samples of the test dataset.
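Bootstrap error bars of this kind can be sketched in Python as follows. The sketch assumes that queries are resampled with replacement, that the baseline and the candidate model are compared on the same resampled queries, and that the error bar is a 95% percentile interval. These choices, along with the function name, are assumptions and may differ from the procedure actually used.

```python
import random
from typing import Sequence, Tuple


def bootstrap_mrr_difference(
    rr_base: Sequence[float],
    rr_model: Sequence[float],
    n_samples: int = 1000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Mean and 95% percentile interval of MRR(model) - MRR(base).

    rr_base and rr_model hold per-query reciprocal ranks for the same
    queries, in the same order.
    """
    if len(rr_base) != len(rr_model) or not rr_base:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(rr_base)
    diffs = []
    for _ in range(n_samples):
        # Resample query indices with replacement (paired across models).
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(rr_model[j] - rr_base[j] for j in idx) / n)
    diffs.sort()
    mean = sum(diffs) / n_samples
    lower = diffs[int(0.025 * n_samples)]
    upper = diffs[int(0.975 * n_samples) - 1]
    return mean, lower, upper


# Hypothetical per-query reciprocal ranks for ten queries.
base = [1.0, 0.5, 0.25, 0.5, 1.0, 0.2, 0.5, 0.25, 1.0, 0.5]
model = [1.0, 1.0, 0.5, 0.5, 1.0, 0.25, 0.5, 0.5, 1.0, 1.0]
mean_diff, lower, upper = bootstrap_mrr_difference(base, model)
print(mean_diff, lower, upper)
```

If the whole interval lies above zero, the improvement over the baseline is unlikely to be an artifact of which queries happened to be sampled.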

Figure 2. MRR difference with respect to Model_Base for neighborhood sizes 1, 3 and 5 using prev features.
Figure 3. MRR difference with respect to Model_Base for neighborhood sizes 1, 3 and 5 using both prev_next features.

First, we used only prev features constructed based on neighborhood sizes 1, 3 and 5 in addition to baseline features. prev features lead to MRR improvements, as can be seen from Fig 2, with neighborhood size 3 outperforming the others. Similarly, Fig 4 shows the relative MRR improvements when only next features constructed based on neighborhood sizes 1, 3 and 5 are used in addition to baseline features. Neighborhood size 3 again leads to the most significant improvements in MRR. Further, varying the neighborhood size has a measurable effect on MRR, indicating that the choice of neighborhood size is an important decision. Lastly, combining prev and next features on top of the baseline features also resulted in significant improvements in MRR, with neighborhood size 3 performing the best, as shown in Fig 3.

The percentage gains in MRR resulting from each of the models relative to Model_Base are tabulated in Table 1. As is evident from the table, using prev_next features constructed with neighborhood size 3 results in a 5.01% improvement in MRR, showing that items ranked above and below together have an influence on an item's sellability.

Neighborhood size    prev     next     prev_next
1                    -1.32    0.07     1.81
3                     4.65    4.45     5.01
5                     3.05    3.55     4.52
Table 1. Percentage change in MRR relative to Model_Base resulting from the various models.
Figure 4. MRR difference with respect to Model_Base for neighborhood sizes 1, 3 and 5 using next features.
Figure 5. MRR difference with respect to Model_Base for neighborhood size 3 using prev, next, and prev_next features.

Since neighborhood size 3 resulted in the most observable MRR improvements, we compared prev, next, and prev_next models trained on delta features constructed with neighborhood size 3 in addition to the baseline features. From Fig 5 we can observe that while both prev and next models lead to improvements, prev_next models have the most pronounced MRR gains, indicating that the neighborhood of an item does influence its sellability in a measurable way.

5. Summary and Future Work

In this work, we have presented our approach to understanding the impact of the neighborhood of an item on its sellability by creating delta features that capture how a given item differs from those in its neighborhood in terms of attributes that can be perceived by the user on a search result page. Different combinations of delta features, including different neighborhood sizes, are created on top of our baseline ranker. We have applied these features to a large-scale commercial search engine (eBay) and experimentally verified significant improvements on offline metrics. In addition, we experimentally show that the choice of neighborhood size influences the performance of these features. As a next step, we plan to incorporate the idea of neighborhood and delta features into the scoring function of the ranking models. This would require designing efficient methods to determine the placement of a candidate item based on its potential neighbors, in contrast to an independent decision.

Acknowledgements.
We would like to thank Alex Cozzi for the insightful discussions and valuable guidance he provided during the course of this work.

References

  • Agichtein et al. (2006) Eugene Agichtein, Eric Brill, Susan Dumais, and Robert Ragno. 2006. Learning user interaction models for predicting web search result preferences. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 3–10.
  • Agrawal et al. (2009) Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. 2009. Diversifying Search Results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM ’09). ACM, New York, NY, USA, 5–14. https://doi.org/10.1145/1498759.1498766
  • Burges (2010) Christopher JC Burges. 2010. From ranknet to lambdarank to lambdamart: An overview. Learning 11, 23-581 (2010), 81.
  • Cao et al. (2007) Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. 2007. Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th international conference on Machine learning. ACM, 129–136.
  • Cohen et al. (1998) William W Cohen, Robert E Schapire, and Yoram Singer. 1998. Learning to order things. In Advances in Neural Information Processing Systems. 451–457.
  • Craswell (2009) Nick Craswell. 2009. Mean Reciprocal Rank. Springer US, Boston, MA, 1703–1703. https://doi.org/10.1007/978-0-387-39940-9_488
  • Craswell et al. (2008) Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. 2008. An Experimental Comparison of Click Position-Bias Models. In Proceedings of the 2008 International Conference on Web Search and Data Mining. 87–94.
  • Dupret and Piwowarski (2008) Georges Dupret and Benjamin Piwowarski. 2008. A User Browsing Model to Predict Search Engine Click Data from Past Observations. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. 331–338.
  • Joachims (2002) Thorsten Joachims. 2002. Optimizing Search Engines Using Clickthrough Data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’02). ACM, New York, NY, USA, 133–142. https://doi.org/10.1145/775047.775067
  • Joachims et al. (2005a) Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, and Geri Gay. 2005a. Accurately interpreting clickthrough data as implicit feedback. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. 154–161.
  • Joachims et al. (2007) Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, and Geri Gay. 2007. Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Transactions on Information Systems 25, 2 (2007).
  • Joachims et al. (2005b) Thorsten Joachims, Laura A Granka, Bing Pan, Helene Hembrooke, and Geri Gay. 2005b. Accurately interpreting clickthrough data as implicit feedback. In Sigir, Vol. 5. 154–161.
  • Karmaker Santu et al. (2017) Shubhra Kanti Karmaker Santu, Parikshit Sondhi, and ChengXiang Zhai. 2017. On Application of Learning to Rank for E-Commerce Search. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’17). ACM, New York, NY, USA, 475–484. https://doi.org/10.1145/3077136.3080838
  • Li (2011) Hang Li. 2011. A short introduction to learning to rank. IEICE TRANSACTIONS on Information and Systems 94, 10 (2011), 1854–1862.
  • Lichtenstein and Slovic (1971) Sarah Lichtenstein and Paul Slovic. 1971. Reversals of preference between bids and choices in gambling decisions. Journal of experimental psychology 89, 1 (1971), 46.
  • Liu et al. (2009) Tie-Yan Liu et al. 2009. Learning to rank for information retrieval. Foundations and Trends® in Information Retrieval 3, 3 (2009), 225–331.
  • Radlinski and Joachims (2005) Filip Radlinski and Thorsten Joachims. 2005. Query chains: learning to rank from implicit feedback. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, 239–248.
  • Xia et al. (2008) Fen Xia, Tie-Yan Liu, Jue Wang, Wensheng Zhang, and Hang Li. 2008. Listwise approach to learning to rank: theory and algorithm. In Proceedings of the 25th international conference on Machine learning. ACM, 1192–1199.
  • Zhu et al. (2014) Yadong Zhu, Yanyan Lan, Jiafeng Guo, Xueqi Cheng, and Shuzi Niu. 2014. Learning for search result diversification. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval. ACM, 293–302.