The E-commerce industry is growing at a very rapid rate. According to Statista report111https://www.statista.com/statistics/379046/worldwide-retail-e-commerce-sales/, it is expected to become $4.88 trillion markets by 2021. One of the biggest challenges to the e-commerce companies is to set an optimal price for all the products on the platform daily, which can overall maximize the revenue & the profitability. Specifically fashion e-commerce is very price sensitive and shopping behaviour heavily relies on discounting. Typically in a fashion e-commerce platform, starting from the product discovery to the final order, discount and the price point is the main conversion driving factor. As shown in Figure 1, the List Page & product display page (PDP) displays the product(s) with it's MRP and the discounted price. Product 1,4 is discounted, whereas product 2,3 is selling on MRP. This impacts the click-through rate (CTR) & conversion. Hence, finding an optimal price point for all the products is a critical business need which maximizes the overall revenue.
To maximize revenue, we need to predict the quantity sold of all products at any given price. To get the quantity sold for all the products, a Demand Prediction Model has been used. The model is trained on all the products based on their historical sales & browsing clickstream data. The output label is the quantity sold value, which is continuous. Hence it's a regression problem. Demand prediction has a direct impact on the overall revenue. Small changes in it can vary the overall revenue number drastically. For demand prediction, several regression models like Linear Regression, Random Forest Regressor, XGBoost(Chen and Guestrin, 2016)
, MultiLayer Perceptron have been applied. An ensemble(Qiu et al., 2014) of the models mentioned above is used since each base model was not able to learn all aspects about the nature of demand of the product and wasn't giving adequate results. Usually, the demand for a product is temporal, as it depends on historical sales. To leverage this fact, Time Series techniques like the autoregressive integrated moving model (ARIMA) (Li et al., 2018) (Contreras et al., 2003)
is used. An RNN architecture based LSTM(Long Short-Term Memory) model(Greff et al., 2016) is also used to capture the sequence of the sales to get the next day's demand. In the result section, a very detailed comparative study of these models is explained.
Another significant challenge is cannibalization among products, i.e., if we decrease the discount of a product, it can lead to an increase in sales for other products competing with it. For example, if the discount is decreased for Nike shoes, it can lead to a rise in sales for Adidas or Puma shoes since they are competing brands & the price affinity is also very similar. We overcame this problem by running the model at a category level and creating features at a brand level, which can take into account cannibalization. The second major challenge is to predict demand for new products, it's a cold start problem since we don't have any historical data for these products. To overcome this problem, we've used a Deep Learning-based model to learn the Product embedding. These embeddings are used as features in the Demand Model.
The demand prediction model generates demand for all the products for tomorrow at the base discount value. To get demand at different discount values economics concept of ”Price Elasticity of Demand” (PED) is used. Price elasticity represents how the demand for a product varies as the price changes. After applying this, we get multiple price-demand pairs for each product. We have to select a single price point for every product such that the chosen configuration maximizes the overall revenue. To solve it, the Linear Programming optimization technique (Solow, 2007) is used. Linear Programming Problem formulation, along with the constraints, is described in detail in section 3.4.
The solution was deployed in a live production environment, and several A/B Tests were conducted. During the A/B test, A set was shown the same default price, whereas the B set was shown the optimal price suggested by the model. As explained in the Result section we're getting a revenue improvement by approx 1% and gross margin improvement by 0.81%.
The rest of the paper is organized as follows. In Section 2, we briefly discuss the related work. We introduce the Methodology in Section 3 that comprises feature engineering, demand prediction model, price elasticity of demand, and linear programming optimization. In Section 4, Results and Analysis are discussed, and we conclude the paper in Section 5.
2. Related Work
In this section, we've briefly reviewed the current literature work on Price Optimization in E-commerce. In (Gupta and Pathak, 2014)
, authors are first creating customer segments using K-means clustering; then, a price range is assigned to each cluster using a regression model. Post that authors are trying to find out the customer's likelihood of buying a product using the previous outcome of user cluster & price range. The customer's likelihood is calculated by using the logistic regression model. This model gives price range at a user cluster level, whereas in typical fashion e-commerce exact price point of each product is desired.
Paper (Singh and Dutta, 2015) describes the price prediction methodology for AWS spot instances. To predict the price for next hour, they've trained a linear model by taking the last three months & previous 24 hours of historical price points to capture the temporal effect. The weights of the linear model is learned by applying a Gradient descent based approach. This price prediction model works well in this particular case since the number of instances in AWS is very limited. In contrast, in fashion e-commerce, there can be millions of products, so the methodology mentioned above was not working well in our case.
In paper(Schlosser and Boissier, 2018)
, authors studied how the sales probability of the products are affected by customer behaviors and different pricing strategies by simulating a Test Market Environment. They used different learning techniques to estimate the sales probability from observable market data. In the second step, they used data-driven dynamic pricing strategies that also took care of the market competition. They compute the prices based on the current state as well as taking future states into account. They claim that their approach works even if the number of competitors is significant.
Paper (Ye et al., 2018) authors have explained the optimal price recommendation model for the hosts at the Airbnb platform. Firstly they've applied a binary classification model to predict booking probability of each listing in the platform; after that, a regression model predicts the optimal price for each listing for a night. At last, they apply a personalization layer to generate the final price suggestion to the hosts for their property. This setup is very different from a conventional fashion e-commerce setup where we need to decide an exact price point for all the products, whereas in the former case, it's a range of price.
To address the above issues, we've proposed a novel machine learning mechanism to predict the exact price point for all the products on a fashion e-commerce platform at a daily level.
In this section, we have explained the methodology of deriving the optimal price point for all the products in the fashion e-commerce platform.
Figure 2 shows the architecture diagram to optimize the price for all the products.
Here R is the revenue of the whole platform, is the price assigned to product, and is the quantity sold at that price.
According to Equation 1, revenue is highly dependent on the quantity sold. Quantity sold is, in turn, dependent on the price of the product. Typically, as the price increases, the demand goes down and vice-versa. So there is a trade-off between cost and demand. To maximize revenue, we need demand for all the products at a given price. Therefore, a demand prediction model is required. Predicting demand for tomorrow at an individual product level is a very challenging task. For new products, there is a cold start problem which we address by using product embeddings. Another challenging task is how to capture the cannibalization of product.
3.1. Feature Engineering
To estimate the demand for all products in the platform extensive feature engineering is done. The features are broadly categorized into four categories :
Observed features: These features showcase the browsing behavior of people. They include quantity sold, list count, product display count, cart count, total inventory, etc. From these features, the model can learn about the visibility aspects of the products. These directly correlate with the demand for the product.
Engineered features: To capture the historical trend of the products, we've handcrafted multiple features. For example, the last seven days of sales, previous 7 days visibility, etc. They also capture the seasonality effects and its correlation to the demand of the product. The ratio of quantity sold at BAG level (brand, article, gender) to the total quantity sold of a product is calculated to take care of the competitiveness among substitutable products.
Sort rank: Search ranking of a product is very highly correlated with the quantity sold. Products listed in the top search result generally have higher amounts sold. The search ranking score has been used for all the live products to capture this effect in our model.
Product Embedding: From the clickstream data, the user-product interaction matrix is generated. The value of an element in this matrix signifies the implicit score (click, order count, etc.) of a user's interaction with the product. Skip-gram based model(Guthrie et al., 2006) is applied to the interaction matrix, which captures the hidden attributes of the products in the lower dimension latent space. Word2Vec (Rong, 2014) is used for the implementation purpose.
3.2. Demand Prediction Model
Demand for a product is a continuous variable, hence regression based models, sequence model LSTM, and time series model ARIMA were used to predict future demand for all the products. Since the data obeyed most of the assumptions of linear regression (the linear relationship between input and output variables, feature independence, each feature following normal distribution and homoscedasticity), linear regression became an optimal choice. Hyperparameters like learning rate (alpha) and max no. of iterations were tuned using GridSearchCV and by applying 5 fold cross-validation. The learning rate of 0.01 and the max iteration of 1000 was chosen as a result of it. Overfitting was taken care of by using a combination of L1 and L2 regularization (Elastic Net Regularization). Tree-based models were also used. They do not need to establish a functional relationship between input features and output labels, so they are useful when it comes to demand prediction. Also, trees are much better to interpret and can be used to explain the outcome to business users adequately. Various types of tree-based models like Random Forest and XGBoost were used. Hyperparameters like depth of the tree and the number of trees were tuned using the technique mentioned above. Overfitting was taken care of by post-pruning the trees. Lastly, a MLP regressor with a single hidden layer of 100 neurons, relu activation, and Adam optimizer was used. Finally, an ensemble of all the regressors was adopted. The median of the output of all the regressors was chosen as the final prediction. It was chosen because each of the individual regressors was not able to learn all aspects of the nature of the demand for the product and wasn't giving adequate results. Thus, the ensemble approach was able to capture all aspects of the data and delivered positive results. LSTM with a hidden layer was also used to capture the temporal effects of the sale. Batch-size for each epoch was chosen in such a way that it comprised of all the items of a particular day. Hyperparameters like loss function, number of neurons, type of optimization algorithm, and the number of epochs were tuned by observing the training and validation loss convergence. Mean-squared error loss, 50 neurons, Adam optimizer, and 1000 epochs were used to perform the task. Arima was also used to capture the trends and seasonality of the demand for the product. The three parameters of ARIMA : p (number of lags of quantity sold to be used as predictors), q (lagged forecast errors), and d (minimum amount of difference to make series stationery) were set using the ACF and PACF plots. The order of difference (d) = 2 was chosen by observing when the ACF plot approaches 0. The number of lags (p) = 1 was selected by observing the PACF plot and when it exceeds the PACF's plot significance limit. Lagged forecast errors (q) = 1 was chosen the same way as p was chosen but by looking at the ACF plot instead of PACF.
3.3. Price Elasticity of Demand
From the demand model, we get the demand for a particular product for the next day based on its base discount. To get the demand for a product at a different price point, we use the concept of price elasticity of demand (Babar et al., 2015)(Minga et al., 2003).
Figure 4 illustrates the effects of change in price on the quantity demanded of a product. When the price drops from INR 1400 to 1200, the quantity demanded of the product increases from 7 to 11. The price elasticity of demand is used to capture this phenomenon.
The formula for price elasticity of demand is :
where , are the original demand and price for the product, respectively. represents the change in demand as the price changes, and depicts the change in the product’s price.
A product is said to be highly elastic if for a small change in price there is a huge change in demand, unitary elastic if the change in price is directly proportional to change in demand and is called inelastic if there is a significant change in price but not much change in demand.
Since we are operating in a fashion e-commerce environment determination of price elasticity of demand is a challenging task. There is a wide range of products that are sold, so we experience all types of elasticity. Usually price and demand share an inverse relationship i.e., when the price decreases, the demand for that product increases and vice-versa.
But this relationship is not followed in the case of Giffen goods. Giffen goods are a type of product, for which the demand increases when there is an increase in the price. In our environment, we not only experience different types of elasticity, but we also have to deal with Giffen goods. E.g., Rolex watch is a type of Giffen good as its demand increases with an increase in price because it is used more as a status symbol.
Due to these reasons calculating the price elasticity of demand at a brand, category, or any other global level is not possible. So we have computed it for each product based on the historical price-demand pairs for that product. For new products, the elasticity of similar products determined by product embeddings were used.
Price elasticity is used to calculate the product demand at different price points. From the business side, there is a strict guardrail of max % change in the base discount, where is a user defined threshold. The value of is kept as a small positive integer since a drastic change in the discounts of the products is not desired. We have the demand value for all the products for tomorrow at the base discount by using the aforementioned demand prediction model. We also calculate two more demand values by increasing the base discount by % and decreasing by % by using the price elasticity. Thus the output after applying the elasticity concept is that we get three different price points and the corresponding demand value for all products. The price elasticity changes with time, so its value is updated daily.
3.4. Linear Programming
After applying the price elasticity of demand, we get three different price points for a product and the corresponding demand at those price points.
Now we need to choose one of these three prices such that the net revenue is maximized. Given different price points and products, there will be permutations in total. Thus, the problem boils down to an optimization problem, and we formulate the following integer problem formulation: (Gomory and Baumol, 1960).
Subject To the constraints
In the above equations denotes the price and denotes the demand for product if price-demand pair is chosen. Here, acts like a mask that is used to select a particular price for a product. is a boolean variable (0 or 1) where 1 denotes that we select the price and 0 that we do not select it. The constraint = 1 makes sure that we select only 1 price for a product and = c makes sure that sum of all the selected prices is equal to c, where c is user-defined variable. C can be varied from its minimum value that is obtained when the least price for each product is chosen to its maximum value that is obtained when the highest price for each product is chosen. Somewhere in between, we get the optimal solution.
The above formulation is computationally intractable since there can be millions of permutations of price-demand pairs. Thus we convert the above integer programming problem to a linear programming problem by treating as a continuous variable. This technique is called linear relaxation. The solution obtained from this problem acts like a linear bound for the integer formulation problem. Here, for the product the price corresponding to maximum value of is chosen as the optimal price point. The linear programming problem (Dempster and Hutton, 1999) thus obtained is :
Subject To the constraints
We implement the above linear programming problem using Scipy linear programming library routine linprog. The input is given in the vector form as follows:
Subject To A.x = b
Where is a vector that contains the revenue contributed by a product at each price point, is a matrix that contains information about the products and their prices.
As shown in Figure 5, a set of 3 columns represents a product. In this set of 3 columns, the first column represents value corresponding to (discount-%), middle column to base discount, last column to (discount+%). The last row of contains different price points according to their updated discount value & product. vector contains the upper bound of the constraints. All entries in are 1 except the last one, which is equal to c. Here 1 represents that only one price can be selected for each product, and c indicates the summation corresponding to the chosen prices.
4. Results & Analysis
In this section, we have discussed the data sources used to collect the data, followed by an analysis of the data and the insights derived from it. Different kinds of evaluation metrics used to check the accuracy are also discussed along with the comparative study of different models used.
4.1. Data Sources
Data was collected from the following sources :
Clickstream data: this contained all user activity such as clicks, carts, orders, etc.
Product Catalog: this contained details of a product like brand, color, price, and other attributes related to the product.
Price data: this contained the price and the quantity sold of a product at hour level granularity.
Sort Rank: this contained search rank and the corresponding scores for all the live products on the platform.
4.2. Analysis of Data
According to the analysis, 20% of the products generate 80% of the revenue. Hence, it is required to price these products precisely. This is the biggest challenge when it comes to demand prediction.
Figure 6 shows the distribution of quantity sold for all the products on the platform. It is evident that a minority of products contribute towards the total revenue.
Figure 7 shows the elasticity distribution of the products on the platform. The y-axis of the graph depicts the density rather than the actual count frequency of the elasticity value. As discussed in Section 3.3, the determination of elasticity is a difficult task. All kinds of elasticity are experienced, and hence elasticity is calculated at a product level. It is evident from the figure that the range of elasticity is from -5 to +5. -5 indicates that if the price is dropped, then demand increases drastically, and +5 suggests that if the price is increased, then demand also increases by a large margin. It can be observed that most of the time, elasticity value is 0; this is so because 80% of the quantity of the product sold is 0. Lastly, it can be concluded that most of the product’s elasticity lies between -1 to +1. Hence, most of the products are relatively inelastic, but there are few products whose elasticity is very high and are at extreme ends, so it has to be calculated accurately.
4.3. Evaluation Metric
To evaluate the demand prediction models, mean absolute error & root mean squared error is used as a metric. They both tell us the idea of how much the predicted output is deviating from the actual label. For this scenario, coefficient of determination i.e., or adjusted , is not a good measure as it involves the mean of the actual label. Since 80% of the time actual label is 0 mean cannot accommodate it, and hence cannot judge the demand model's performance.
In this section results of different models are discussed in table 1. Majorly three different classes of the model were tested and compared. They include regressors, LSTM, and ARIMA. mae and rmse were chosen as the evaluation metric to compare these models. The reason for this choice is discussed in section 4.3. All the models were trained on historical data of the past three months, comprising of over a million records. For the test data, the recent single day records were used. First, different kinds of regressors were used; out of all the regressors, XGBoost gave the best result. It is so because it’s tough to establish a functional relationship between input features and output demand. As XGBoost does not need to establish any functional relationship, it performed better. However, all individual regressors were not able to capture all aspects of the data, so an ensemble of all the regressors gave the ideal result. The ensemble takes advantage of all regressors and hence is chosen. Then, after regressors, LSTM & ARIMA were also tried, but it failed to give the desired result as due to multiple external factors in the business, the demand data for the products did not exhibit sequential & temporal characteristics.
|Random Forest Regressor||0.219||0.854|
4.5. Experimental Design
In this section, we describe live experiments that were performed on one of the largest fashion e-commerce platforms.
The experiments were run on around two hundred thousand styles spread across five days.
To test the hypothesis that model recommended prices are better than baseline prices, two user groups were created:
Set-A (Control group) was shown the baseline prices
Set-B (Treatment group) was shown model recommended prices.
These sets were created using a random assignment from the set of live users. 50% of the total live users were assigned to Set A and the rest to Set B.
Users in Control and Treatment groups were exposed to the same product. But the products were priced differently according to the group they belong to. Both groups were compared with respect to the overall platform revenue and gross margin. Revenue is already defined and explained in section 3, whereas gross margin can be defined as follows :
In general, the gross margin in our scenario is pure profit bottom line.
|Percentage increment in Revenue||GM % Uplift|
In table 2, the first three instances correspond to the business unit - Men's Jeans and Streetwear, while the last 2 are of Women's Western wear and Eyewear respectively.
In the case of Men's Jeans and Streetwear, there is a steady increase in both revenue and gross margin of the whole platform. From this, it can be concluded that the model recommended price gave positive results for this business unit. In the case of Women's Western wear, there was a massive lift in revenue, but the gross margin fell. The reason for this is that women's products were highly elastic. Small changes in price had a significant impact on demand. So due to the model’s recommended price, there was a vast fluctuation in demand (i.e., demand for products increased), and the revenue thus increased. However, the gross margin was slightly impacted, and it decreased.
Eyewear is a small business unit, so the impact of discount change is enormous, i.e., 7.05% increment in the revenue, whereas the GM increased by 0.15%.
Overall there was an approximately 1% increase in revenue of the platform and 0.81% uplift in gross margin due to model recommended prices. These is computed by taking the average of the first three instances since they belong to the same business unit.
Fashion e-tailers currently find it extremely hard to decide an optimal price & discounting for all the products on a daily basis, which can maximize the overall net revenue & the profitability. To solve this, we have proposed a novel method comprising three major components. First, a demand prediction model is used to get an accurate estimate of tomorrow's demand. Then, the concept of price elasticity of demand is used to get the demand for a product at multiple price points. Finally, a linear programming optimization technique is used to select one price point for each product, which maximizes the overall revenue. Online A/B test experiments show that by deploying the model revenue and gross margin increases. Right now, the prices are decided and updated daily, but in the future, we would like to extend our work so that we can also capture the intra-day signals in our model & accordingly do the intra-day dynamic pricing.
- The development of demand elasticity model for demand response in the retail market environment. In 2015 IEEE Eindhoven PowerTech, pp. 1–6. Cited by: §3.3.
- Xgboost: a scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pp. 785–794. Cited by: §1.
- ARIMA models to predict next-day electricity prices. IEEE transactions on power systems 18 (3), pp. 1014–1020. Cited by: §1.
- Pricing american stock options by linear programming. Mathematical Finance 9 (3), pp. 229–254. Cited by: §3.4.
- Integer programming and pricing. Econometrica: Journal of the Econometric Society, pp. 521–550. Cited by: §3.4.
LSTM: a search space odyssey.
IEEE transactions on neural networks and learning systems28 (10), pp. 2222–2232. Cited by: §1.
- A machine learning framework for predicting purchase by online customers based on dynamic pricing. In Complex Adaptive Systems, Cited by: §2.
- A closer look at skip-gram modelling.. In LREC, pp. 1222–1225. Cited by: 4th item.
- Forecasting of chinese e-commerce sales: an empirical comparison of arima, nonlinear autoregressive neural network, and a combined arima-narnn model. Mathematical Problems in Engineering 2018. Cited by: §1.
- Dynamic pricing: ecommerce-oriented price setting algorithm. In Proceedings of the 2003 International Conference on Machine Learning and Cybernetics (IEEE Cat. No. 03EX693), Vol. 2, pp. 893–898. Cited by: §3.3.
- Ensemble deep learning for regression and time series forecasting. In 2014 IEEE symposium on computational intelligence in ensemble learning (CIEL), pp. 1–6. Cited by: §1.
- Word2vec parameter learning explained. arXiv preprint arXiv:1411.2738. Cited by: 4th item.
- Dynamic pricing under competition on online marketplaces: a data-driven approach. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 705–714. Cited by: §2.
- Dynamic price prediction for amazon spot instances. 2015 48th Hawaii International Conference on System Sciences, pp. 1513–1520. Cited by: §2.
- Linear and nonlinear programming. Wiley Encyclopedia of Computer Science and Engineering. Cited by: §1.
- Customized regression model for airbnb dynamic pricing. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 932–940. Cited by: §2.