How Many Customers Does a Retail Chain Have?

04/23/2019 ∙ by Ondřej Sokol, et al. ∙ University of Economics, Prague (Vysoká škola ekonomická v Praze)

The knowledge of the number of customers is the pillar of retail business analytics. In our setting, we assume that a portion of customers is monitored and easily counted due to the loyalty program while the rest is not monitored. The behavior of customers in both groups may significantly differ making the estimation of the number of unmonitored customers a non-trivial task. We identify shopping patterns of several customer segments which allows us to estimate the distribution of customers without the loyalty card. For this purpose, we utilize the least squares and maximum likelihood methods. In the case of prior knowledge of the customer distribution, we utilize the maximum a posteriori method. In a simulation study, we find that the least squares estimator is the most robust method. In an empirical study of a drugstore chain, we illustrate the applicability of the proposed approach in practice. The actual number of customers estimated by the proposed method is 1.28 times higher than the number suggested by the naive estimate assuming the constant customer distribution. The proposed method can also be utilized to determine penetration of the loyalty program in the individual customer segments.


1 Introduction

Retail transaction data contain information about the composition of the shopping basket, the price of the purchased goods and possibly the ID of customers. Analysis of such data is an important aspect of retail business analytics as it provides a valuable insight into the customer behavior. For an overview of marketing and retail business analytics and its impact, see Lilien et al. (2013), Germann et al. (2014), Roberts et al. (2014), Bradlow et al. (2017), Grewal et al. (2017), Kumar et al. (2017) and France and Ghose (2019).

The most basic question in retail analytics is how many customers a retail chain actually has. In some cases, customers are monitored through the loyalty program. In other cases, customers can visit the store repeatedly without any possibility of identifying them. In our setting, we assume that only a portion of customers is monitored. Clearly, the number of customers with the loyalty card is known while the number of customers without the loyalty card is unknown. With absolute certainty, we can say that there is at least one customer without the loyalty card, as a single customer could theoretically purchase all unmonitored shopping baskets. We also know that there are at most as many customers without the loyalty card as unmonitored receipts, as each customer has to visit the store at least once. To be able to give a more specific estimate, we need to adopt further assumptions about customer behavior. One may assume that the number of customers is proportional to the number of sales, i.e. customers with the loyalty card shop as often as customers without the loyalty card. In that case, the number of customers can be estimated simply by the number of sales divided by the average frequency of customer visits. However, this assumption is too strict and unrealistic as customers without the loyalty card are likely to visit the store less frequently. The question is how much less frequently. Our paper deals with the estimation of the number of customers without the loyalty card under less restrictive and more realistic assumptions.

The knowledge of the number and structure of customers without the loyalty card finds its use in marketing departments involved in the retail decision-making process. It can be used to improve mass marketing communications, especially customer targeting and promotional sales planning. The number of customers in a certain segment can also be used as an input in the prediction of demand for products during promotional sales as each segment has different shopping behavior. An accurate prediction then helps to minimize sold-out situations. All of the above-mentioned applications play an important role in increasing sales and profits for the company.

In the literature, counting of customers is most often approached in the context of active customers that are monitored. Schmittlein et al. (1987) proposed the Pareto/NBD model to determine the probability that a customer with a given pattern of transactions is still active. This model was further studied and extended by Fader et al. (2005), Batislam et al. (2007), Jerath et al. (2011), Abe (2009), Glady et al. (2009), Ma and Büschken (2011) and Mzoughia et al. (2018). Schumacher (2006) dealt with faux-new customers who seem to be new customers due to the absence of past transaction history but are actually regular customers. In contrast, we count customers that are never monitored.

Transaction data are often used for clustering of customers, products and baskets. Customer segments were determined by Tsai and Chiu (2004), Konuş et al. (2008), Putra et al. (2012), Lingras et al. (2014), Ammar et al. (2016), Peker et al. (2017) and Sokol and Holý (2018), product categories by Zhang et al. (2007), Lingras et al. (2014), Ammar et al. (2016) and Holý et al. (2017) and basket types by Decker and Monien (2003), Reutterer et al. (2006), Sarantopoulos et al. (2016), Griva et al. (2018) and Sokol and Holý (2018). We utilize clustering of customers as well as baskets.

Our procedure consists of the following steps. First, we determine basket types based on the value, price level and diversity of the products in the basket. Second, we determine customer segments based on the bought baskets using the history of transactions linked to loyalty cards. For each customer segment, we have the average ratios of bought basket types and the average shopping frequency. Third, we estimate the distribution of customer segments using the observed distribution of basket types not linked to loyalty cards. For this purpose, we utilize the least squares (LS) estimator and the maximum likelihood (ML) estimator. When prior information is available, the maximum a posteriori (MAP) estimator can be utilized as well. Finally, we estimate the number of customers from the distribution of customer segments and average shopping frequencies.

In a simulation study, we compare the LS and ML estimators with the naive estimator that assumes the distribution of customer segments is the same for customers with and without the loyalty card. We find that if the distribution differs but the behavior within the customer segments is consistent, the proposed methods significantly outperform the naive approach. Overall, the performance of the LS estimator is quite similar to that of the ML estimator. However, the ML estimator is slightly better when the assumptions are fulfilled while the LS estimator is more robust when the assumptions are violated. For this reason, we recommend using the LS estimator in practice.

In an empirical study, we examine the behavior of customers in a drugstore chain. We find that the actual total number of customers without the loyalty card is 1.28 times higher than the number suggested by the naive estimate assuming the same distribution of customer segments for members and non-members of the loyalty program. As expected, we find that the distribution of customers without the loyalty card includes many more casual customers who visit the store rarely and purchase smaller baskets. We identify that 14.2% of customers without the loyalty card focus on high-end products in small baskets. These customers have very high value and recruiting them into the loyalty program should be the main interest for the company. On the other hand, 81.5% of regular customers are members of the loyalty program, which supports the hypothesis that the loyalty card is very popular among regular customers and that the loyalty program may be near the point of saturation.

The rest of the paper is structured as follows. In Section 2, we propose a procedure for determining the number and distribution of unmonitored customers. In Section 3, we examine the behavior of the proposed method using simulations. In Section 4, we illustrate the applicability of the proposed method in practice. We conclude the paper in Section 5.

2 Methodology

2.1 Stochastic Framework

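We introduce our probabilistic framework and notation. Let $n$ denote the number of transactions. Further, let there be $J$ basket types and $K$ customer segments. Each transaction has a single basket type and a customer segment. Let $X_i$ be a random variable denoting the basket type of transaction $i$ and $Y_i$ a random variable denoting its customer segment, $i = 1, \ldots, n$. The random number of transactions with basket type $j$ is then $U_j = \sum_{i=1}^{n} I(X_i = j)$, $j = 1, \ldots, J$, while the random number of transactions with customer segment $k$ is $V_k = \sum_{i=1}^{n} I(Y_i = k)$, $k = 1, \ldots, K$, where $I(\cdot)$ denotes the indicator function. The random number of transactions with basket type $j$ and customer segment $k$ is $W_{jk} = \sum_{i=1}^{n} I(X_i = j, Y_i = k)$, $j = 1, \ldots, J$, $k = 1, \ldots, K$. Clearly, we have

$$U_j = \sum_{k=1}^{K} W_{jk}, \qquad V_k = \sum_{j=1}^{J} W_{jk}, \qquad \sum_{j=1}^{J} U_j = \sum_{k=1}^{K} V_k = n. \tag{1}$$

Using a vector and matrix notation, we have $X = (X_1, \ldots, X_n)'$, $Y = (Y_1, \ldots, Y_n)'$, $U = (U_1, \ldots, U_J)'$, $V = (V_1, \ldots, V_K)'$ and $W = (W_{jk})_{j=1,k=1}^{J,K}$. We denote $x$ the observed basket types and $y$ the observed customer segments. We further denote $u$ the observed numbers of transactions with given basket type, $v$ the observed numbers of transactions with given customer segment and $w$ the observed numbers of transactions with given basket type and customer segment. We assume that random variables $X_1, \ldots, X_n$ are independent and identically distributed with the probability of basket type $j$ denoted as $p_j = P[X_i = j]$, $j = 1, \ldots, J$. Similarly, we assume that $Y_1, \ldots, Y_n$ are independent and identically distributed with the probability of customer segment $k$ denoted as $q_k = P[Y_i = k]$, $k = 1, \ldots, K$. Finally, the probability of basket type $j$ conditional on customer segment $k$ is denoted as $r_{jk} = P[X_i = j \mid Y_i = k]$, $j = 1, \ldots, J$, $k = 1, \ldots, K$. Using a vector and matrix notation, we have $p = (p_1, \ldots, p_J)'$, $q = (q_1, \ldots, q_K)'$ and $R = (r_{jk})_{j=1,k=1}^{J,K}$. Using the law of total probability, we have

$$p_j = \sum_{k=1}^{K} r_{jk} q_k, \qquad j = 1, \ldots, J. \tag{2}$$

In matrix notation, we simply have $p = R q$. This is the key equation used in the proposed approach.

First, let us assume that we observe both basket types $x$ and customer segments $y$. In this case, we can simply estimate the probability of basket type as $\hat{p}_j = u_j / n$, $j = 1, \ldots, J$, the probability of customer segment as $\hat{q}_k = v_k / n$, $k = 1, \ldots, K$, and the probability of basket type conditional on customer segment as $\hat{r}_{jk} = w_{jk} / v_k$, $j = 1, \ldots, J$, $k = 1, \ldots, K$.

Next, let us assume that we observe only basket types $x$. In this case, we can estimate the basket type distribution $p$ in the same way but can estimate neither the customer segment distribution $q$ nor the matrix of conditional probabilities $R$. However, if we additionally assume that the matrix of conditional probabilities $R$ is known, we are able to estimate the customer segment distribution $q$. In the rest of this section, we propose several estimators of $q$ under this setting.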
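The fully observed case can be illustrated with a minimal sketch. It assumes basket types and customer segments are coded as integers starting at zero; the function name `estimate_distributions` and the toy data are hypothetical and not taken from the paper. The sketch computes the relative frequencies described above and checks that the law of total probability holds for the estimates.

```python
from collections import Counter


def estimate_distributions(baskets, segments, n_baskets, n_segments):
    """Relative-frequency estimates from paired observations.

    Returns the basket type distribution, the customer segment distribution
    and the matrix of conditional probabilities (rows are basket types,
    columns are customer segments).
    """
    n = len(baskets)
    basket_counts = Counter(baskets)
    segment_counts = Counter(segments)
    joint_counts = Counter(zip(baskets, segments))
    p = [basket_counts[j] / n for j in range(n_baskets)]
    q = [segment_counts[k] / n for k in range(n_segments)]
    # Column k holds the basket type distribution inside segment k.
    R = [[joint_counts[(j, k)] / segment_counts[k] if segment_counts[k] else 0.0
          for k in range(n_segments)]
         for j in range(n_baskets)]
    return p, q, R


# Hypothetical toy data: 8 transactions, 3 basket types, 2 customer segments.
baskets = [0, 0, 1, 2, 2, 2, 1, 0]
segments = [0, 0, 0, 1, 1, 1, 1, 0]
p, q, R = estimate_distributions(baskets, segments, 3, 2)

# Law of total probability: the basket distribution equals R times q.
p_from_R = [sum(R[j][k] * q[k] for k in range(2)) for j in range(3)]
```

With relative-frequency estimates the identity holds exactly, so `p_from_R` reproduces `p` up to floating-point error.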

2.2 Matrix of Conditional Probabilities

Our goal is to estimate the distribution of customer segments $q$ using the observed vector of basket types $x$ and the given matrix of conditional probabilities $R$. Note that the matrix $R$ has $J$ rows (basket types) and $K$ columns (customer segments). In the estimation, we utilize the equation $\hat{p} = R q$, where $\hat{p}$ is the estimated distribution of basket types. There are three distinct cases based on the structure of the matrix $R$.

  (a) If the matrix $R$ has linearly dependent columns, the estimation faces identifiability issues. Specifically, if we further assume linearly independent rows, the vector of customer probabilities $q$ is not identifiable as the equation $\hat{p} = R q$ with variable $q$ has an infinite number of solutions. Note that a matrix with $J < K$ always has linearly dependent columns. In practice, we recommend avoiding this case by finding an appropriate number of basket types and customer segments.

  (b) If the matrix $R$ has linearly independent columns and is square, i.e. $J = K$, the estimation is quite simple. In this case, we can invert the matrix $R$ and obtain the straightforward estimator $\hat{q} = R^{-1} \hat{p}$. However, finding meaningful basket types and customer segments satisfying the independence and square restrictions can be quite difficult. Therefore, this case is theoretically ideal but unlikely to occur in practice.

  (c) If the matrix $R$ has linearly independent columns and is tall, i.e. $J > K$, the estimation is more complex. In this case, the equation $\hat{p} = R q$ can be inconsistent, i.e. without a solution for $q$. However, we can find a suitable $q$ by minimizing an error between $\hat{p}$ and $R q$. For this purpose, we utilize the least squares, maximum likelihood and maximum a posteriori methods. As this case is the most realistic, we devote the rest of this section to it.

The three cases are illustrated in Figure 1. We discuss how to obtain the matrix $R$ in practice in Section 2.7.

Figure 1: Illustration of the three possible structures of the conditional probability matrix $R$.

2.3 Least Squares Estimation

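Let us assume that we observe the vector of basket types $x$. The total number of transactions is denoted as $n$ while the vector of the numbers of transactions with given basket type is denoted as $u$. Let us further assume that we know the conditional probability matrix $R$ and that it has linearly independent columns with $J > K$, i.e. more basket types than customer segments. The least squares (LS) method minimizes the squared error given by

$$S(q) = \sum_{j=1}^{J} \Big( \hat{p}_j - \sum_{k=1}^{K} r_{jk} q_k \Big)^2 = \| \hat{p} - R q \|^2, \tag{3}$$

where $\hat{p} = u / n$ is the observed distribution of basket types and $q$ is the vector of customer segment probabilities. The estimates $\hat{q}^{LS}$ are found by solving the quadratic optimization problem

$$\min_{q} \; \| \hat{p} - R q \|^2 \quad \text{s.t.} \quad \sum_{k=1}^{K} q_k = 1, \quad q_k \geq 0, \; k = 1, \ldots, K. \tag{4}$$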
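One way to solve the constrained least squares problem numerically is projected gradient descent on the probability simplex. The sketch below is an illustration under stated assumptions, not the solver used by the authors: the function names, the step size rule, the iteration count and the toy matrix are all hypothetical choices. The conditional probability matrix is stored with basket types in rows and customer segments in columns.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the set of probability vectors."""
    ordered = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / i
        if value - candidate > 0:
            theta = candidate  # keep the threshold of the last valid index
    return [max(x - theta, 0.0) for x in v]


def least_squares_q(R, p_hat, iterations=2000):
    """Minimize the squared distance between p_hat and R q over the simplex."""
    J, K = len(R), len(R[0])
    # Step 1/L, where L = 2 * (squared Frobenius norm of R) bounds the
    # Lipschitz constant of the gradient from above.
    step = 1.0 / (2.0 * sum(r * r for row in R for r in row))
    q = [1.0 / K] * K
    for _ in range(iterations):
        residual = [sum(R[j][k] * q[k] for k in range(K)) - p_hat[j]
                    for j in range(J)]
        gradient = [2.0 * sum(R[j][k] * residual[j] for j in range(J))
                    for k in range(K)]
        q = project_to_simplex([q[k] - step * gradient[k] for k in range(K)])
    return q


# Hypothetical example: 3 basket types, 2 customer segments.
R = [[0.7, 0.1],
     [0.2, 0.3],
     [0.1, 0.6]]
q_true = [0.3, 0.7]
p_hat = [sum(R[j][k] * q_true[k] for k in range(2)) for j in range(3)]
q_ls = least_squares_q(R, p_hat)
```

Because `p_hat` is constructed without noise here, the minimizer coincides with `q_true`; with real data the residual is generally positive. A general-purpose quadratic programming solver would serve equally well.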

2.4 Maximum Likelihood Estimation

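Let us assume the same setting as in Section 2.3. The maximum likelihood (ML) method maximizes the likelihood function or, equivalently, the logarithm of the likelihood function. The logarithmic likelihood function is given by

$$\ell(q) = \ln P[X = x \mid q] = \sum_{i=1}^{n} \ln P[X_i = x_i \mid q] = \sum_{j=1}^{J} u_j \ln P[X_1 = j \mid q], \tag{5}$$

where the second equality holds as $X_1, \ldots, X_n$ are independent and the third equality holds as $X_1, \ldots, X_n$ are identically distributed. Using the law of total probability, we have

$$P[X_1 = j \mid q] = \sum_{k=1}^{K} r_{jk} q_k. \tag{6}$$

The logarithmic likelihood function is then

$$\ell(q) = \sum_{j=1}^{J} u_j \ln \sum_{k=1}^{K} r_{jk} q_k. \tag{7}$$

The estimates $\hat{q}^{ML}$ are found by solving the nonlinear optimization problem

$$\max_{q} \; \sum_{j=1}^{J} u_j \ln \sum_{k=1}^{K} r_{jk} q_k \quad \text{s.t.} \quad \sum_{k=1}^{K} q_k = 1, \quad q_k \geq 0, \; k = 1, \ldots, K. \tag{8}$$

Note that this is equivalent to minimizing the Kullback–Leibler divergence from $R q$ to $\hat{p}$,

$$D_{KL}(\hat{p} \,\|\, R q) = \sum_{j=1}^{J} \hat{p}_j \ln \frac{\hat{p}_j}{\sum_{k=1}^{K} r_{jk} q_k}, \tag{9}$$

where $\hat{p} = u / n$. The Kullback–Leibler divergence, also known as relative entropy, measures how one probability distribution differs from a second probability distribution.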
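The maximum likelihood problem has the form of estimating mixture weights with known components, so one standard way to solve it is the expectation-maximization (EM) iteration. The sketch below uses EM as an assumed solver; the paper does not say which optimizer the authors used, and the function names and toy data are hypothetical. It assumes every observed basket type has positive probability under the current estimate.

```python
import math


def log_likelihood(R, u, q):
    """Log-likelihood of segment probabilities q given basket counts u."""
    K = len(q)
    return sum(u[j] * math.log(sum(R[j][k] * q[k] for k in range(K)))
               for j in range(len(u)) if u[j] > 0)


def maximum_likelihood_q(R, u, iterations=2000):
    """EM iteration for mixture weights with known component distributions."""
    J, K = len(R), len(R[0])
    n = sum(u)
    q = [1.0 / K] * K
    for _ in range(iterations):
        fitted = [sum(R[j][k] * q[k] for k in range(K)) for j in range(J)]
        # Expected share of transactions attributed to each segment.
        q = [q[k] * sum(u[j] * R[j][k] / fitted[j]
                        for j in range(J) if u[j] > 0) / n
             for k in range(K)]
    return q


# Hypothetical example: 100 transactions over 3 basket types, 2 segments.
R = [[0.7, 0.1],
     [0.2, 0.3],
     [0.1, 0.6]]
u = [28, 27, 45]
q_ml = maximum_likelihood_q(R, u)
gain = log_likelihood(R, u, q_ml) - log_likelihood(R, u, [0.5, 0.5])
```

Each EM update keeps the estimate on the probability simplex and never decreases the log-likelihood, so no explicit handling of the constraints is needed.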

2.5 Maximum a Posteriori Estimation

If we have prior knowledge about the distribution of customers, e.g. an expert opinion or a previous study, we can adopt the Bayesian approach. For a review of the use of Bayesian methods in marketing, see Rossi and Allenby (2003).

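Let us assume the same setting as in Section 2.3. In Bayesian statistics, the maximum a posteriori (MAP) method maximizes the posterior probability density. Using Bayes' theorem, it is given by

$$f(q \mid X = x) = \frac{P[X = x \mid q] \, f(q)}{P[X = x]}, \tag{10}$$

where $f(q)$ is the prior probability density of the random parameter $q$, the vector of customer segment probabilities. Equivalently, we can maximize the function

$$m(q) = \ln P[X = x \mid q] + \ln f(q), \tag{11}$$

which is the logarithm of the posterior probability density omitting the constant divisor. Note that the first term is the logarithmic likelihood function $\ell(q)$. As the prior distribution, we assume the Dirichlet distribution with density

$$f(q) = \frac{1}{\mathrm{B}(\alpha)} \prod_{k=1}^{K} q_k^{\alpha_k - 1}, \tag{12}$$

where $\alpha = (\alpha_1, \ldots, \alpha_K)'$ is the vector of concentration parameters and $\mathrm{B}(\cdot)$ denotes the multivariate beta function. The vector $\alpha$ captures our prior knowledge. This is the most common choice for the prior distribution of the categorical variable as it is its conjugate prior distribution. Finally, we have

$$m(q) = \sum_{j=1}^{J} u_j \ln \sum_{k=1}^{K} r_{jk} q_k + \sum_{k=1}^{K} (\alpha_k - 1) \ln q_k - \ln \mathrm{B}(\alpha). \tag{13}$$

For a given $\alpha$, the estimates $\hat{q}^{MAP}$ are found by solving the nonlinear optimization problem

$$\max_{q} \; \sum_{j=1}^{J} u_j \ln \sum_{k=1}^{K} r_{jk} q_k + \sum_{k=1}^{K} (\alpha_k - 1) \ln q_k \quad \text{s.t.} \quad \sum_{k=1}^{K} q_k = 1, \quad q_k \geq 0, \; k = 1, \ldots, K. \tag{14}$$

This can be viewed as a regularized version of the maximum likelihood method with prior knowledge of the distribution of the parameters $q$.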
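The regularizing effect of the Dirichlet prior can be sketched with an EM-type iteration in which the prior adds pseudo-counts to each segment. This is an assumed solver chosen for illustration, valid when all concentration parameters are at least 1; the function name, the toy data and the prior values are hypothetical.

```python
def map_estimate_q(R, u, alpha, iterations=2000):
    """EM-type iteration for the MAP estimate under a Dirichlet prior.

    Assumes every concentration parameter in alpha is at least 1, so that
    the pseudo-counts alpha[k] - 1 are non-negative.
    """
    J, K = len(R), len(R[0])
    total = sum(u) + sum(a - 1.0 for a in alpha)
    q = [1.0 / K] * K
    for _ in range(iterations):
        fitted = [sum(R[j][k] * q[k] for k in range(K)) for j in range(J)]
        # Expected segment counts plus the prior pseudo-counts, renormalized.
        q = [(q[k] * sum(u[j] * R[j][k] / fitted[j]
                         for j in range(J) if u[j] > 0)
              + alpha[k] - 1.0) / total
             for k in range(K)]
    return q


# Hypothetical example: data favor the segment distribution (0.3, 0.7).
R = [[0.7, 0.1],
     [0.2, 0.3],
     [0.1, 0.6]]
u = [28, 27, 45]
q_flat = map_estimate_q(R, u, alpha=[1.0, 1.0])      # flat prior
q_strong = map_estimate_q(R, u, alpha=[51.0, 51.0])  # prior centered at (0.5, 0.5)
```

With all concentration parameters equal to 1 the prior is flat and the iteration reduces to the maximum likelihood update. With the stronger symmetric prior, the estimate is pulled from the data-driven value toward equal segment probabilities.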

2.6 Number of Customers

Finally, we estimate the number of customers. Suppose that we have the vector $\hat{q} = (\hat{q}_1, \ldots, \hat{q}_K)'$ containing the estimated probabilities of the customer segments of a transaction. Let us assume that we also know the vector $f = (f_1, \ldots, f_K)'$ containing the average frequencies of visits by a customer in a given customer segment. We discuss how to obtain the vector $f$ in practice in Section 2.7. If we observe $n$ transactions, the number of transactions with customer segment $k$ is estimated as $\hat{v}_k = n \hat{q}_k$ for $k = 1, \ldots, K$. The average number of customers in customer segment $k$ is estimated as $\hat{z}_k = n \hat{q}_k / f_k$ for $k = 1, \ldots, K$. The average total number of customers is estimated as

$$\hat{z} = \sum_{k=1}^{K} \frac{n \hat{q}_k}{f_k}. \tag{15}$$

The probability of a customer belonging to customer segment $k$ is then estimated as

$$\hat{s}_k = \frac{\hat{z}_k}{\hat{z}} = \frac{\hat{q}_k / f_k}{\sum_{l=1}^{K} \hat{q}_l / f_l}, \qquad k = 1, \ldots, K. \tag{16}$$

Note that $\hat{q}$ denotes the estimated distribution of the customer segments of a transaction while $\hat{s}$ denotes the estimated distribution of the customer segments of a customer.
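The conversion from a transaction-level segment distribution to customer counts is simple arithmetic, sketched below. The function name and the numbers are hypothetical and chosen only to make the difference between the two distributions visible.

```python
def estimate_customers(q_hat, frequencies, n_transactions):
    """Convert segment shares of transactions into customer counts.

    q_hat:        estimated segment probabilities of a transaction
    frequencies:  average number of visits per customer in each segment
    """
    per_segment = [n_transactions * q / f for q, f in zip(q_hat, frequencies)]
    total = sum(per_segment)
    customer_shares = [z / total for z in per_segment]
    return total, per_segment, customer_shares


# Hypothetical example: 1200 transactions, two segments.
# Segment 0 generates 60% of transactions but its customers visit 6 times each;
# segment 1 generates 40% of transactions with only 2 visits per customer.
total, per_segment, customer_shares = estimate_customers(
    [0.6, 0.4], [6.0, 2.0], 1200)
```

In this example the low-frequency segment accounts for 40% of transactions but about two thirds of customers (roughly 240 of 360), which is why a transaction-level distribution cannot be read directly as a customer-level one.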

2.7 Observed Samples

Let us recall the assumptions of our proposed approach for the estimation of the average total number of customers $\hat{z}$.

  (a) The vector of basket types $x$ is observed.

  (b) The conditional probability matrix $R$ is known and has linearly independent columns.

  (c) The frequency vector $f$ is known.

The question is how the matrix $R$ and the vector $f$ can be obtained in practice. We consider that we observe two samples of transactions. In the monitored sample, the transactions are linked to loyalty cards and we observe basket types denoted as $x^M$ and also customer segments denoted as $y^M$. The unmonitored sample contains transactions by customers without the loyalty card and we observe only basket types $x^U$. So far, we have focused on the unmonitored sample as the number of customers is known in the monitored sample. However, we can use transactions from the monitored sample to estimate the conditional probability matrix $R^M$ and the frequency vector $f^M$. This can be done straightforwardly as we have complete knowledge of customers in the monitored sample. We then assume that $R = R^M$ and $f = f^M$. In the context of the monitored and unmonitored samples, the assumptions (b) and (c) can be reformulated in the following way.

  (b) The conditional probability matrix $R$ is equal to the conditional probability matrix $R^M$ of the monitored sample and has linearly independent columns.

  (c) The frequency vector $f$ is equal to the frequency vector $f^M$ of the monitored sample.

In other words, we assume that the behavior of customers in a given customer segment is the same with and without the loyalty card while the distribution of customer segments itself can differ. This setting is quite realistic as we discuss in the empirical study in Section 4.

3 Simulation Study

3.1 Setup for Simulations

We investigate the finite-sample properties of the proposed approach using simulations. In the simulation study, we choose $J = 12$ and $K = 5$ to reflect the basket types and customer segments used in the empirical study in Section 4. In each simulation, we randomly generate the vector of customer segment probabilities of a transaction, the matrix of conditional probabilities and the vector of frequencies for the monitored and unmonitored samples. Therefore, we consider them to be random variables in the following text and denote them $q^M$, $q^U$, $R^M$, $R^U$, $f^M$ and $f^U$ respectively. Next, we describe the random generation in more detail.

The vector of customer segment probabilities of a transaction $q^M$ in the monitored sample is generated from the Dirichlet distribution with concentration parameters equal to 1 for all customer segments. The vector $q^U$ in the unmonitored sample is the weighted average of the vector $q^M$ and a random variable independently generated in the same way. Specifically,

$$q^U = (1 - \omega_q) \, q^M + \omega_q \, \tilde{q}, \tag{17}$$

where $\omega_q \in [0, 1]$ is the weight determining the change in the samples and $\tilde{q} \sim \mathrm{Dir}(1, \ldots, 1)$.

The rows of the matrix of conditional probabilities in the monitored sample are generated from the Dirichlet distribution with concentration parameters equal to 1 for all basket types and customer segments. The rows of the matrix in the unmonitored sample are the weighted average of the rows of the monitored matrix and random variables independently generated in the same way. Additionally, the first row in both matrices is the weighted average of its original value and the mean of the other rows. Specifically,

(18)

where the first weight determines the change in the samples, the second weight determines linear dependence in columns and the additional random rows follow the same Dirichlet distribution.

The vector of frequencies $f^M$ in the monitored sample is generated from the Dirichlet distribution with concentration parameters equal to 2 for all customer segments and multiplied by 36. The average frequency is therefore equal to 3 as we have 12 basket types. The vector $f^U$ in the unmonitored sample is the weighted average of the vector $f^M$ and a random variable independently generated in the same way. Specifically,

$$f^U = (1 - \omega_f) \, f^M + \omega_f \, \tilde{f}, \tag{19}$$

where $\omega_f \in [0, 1]$ is the weight determining the change in the samples and $\tilde{f}$ is generated independently in the same way as $f^M$.
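The generation of the segment probabilities and the frequency vector can be sketched as follows. This is an illustration of the weighted-average construction only: the helper names, the seed and the weight value of 0.25 are hypothetical choices, and the conditional probability matrix is left out. Dirichlet draws are obtained by normalizing independent gamma variables.

```python
import random


def dirichlet(alpha, rng):
    """Draw from the Dirichlet distribution by normalizing gamma variables."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def mix(base, fresh, weight):
    """Weighted average: weight 0 keeps base, weight 1 replaces it by fresh."""
    return [(1.0 - weight) * b + weight * x for b, x in zip(base, fresh)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
K = 5                   # number of customer segments

# Segment probabilities: monitored sample and a partially changed copy.
q_monitored = dirichlet([1.0] * K, rng)
q_unmonitored = mix(q_monitored, dirichlet([1.0] * K, rng), weight=0.25)

# Frequencies: Dirichlet with concentration 2, scaled by 36.
f_monitored = [36.0 * x for x in dirichlet([2.0] * K, rng)]
f_fresh = [36.0 * x for x in dirichlet([2.0] * K, rng)]
f_unmonitored = mix(f_monitored, f_fresh, weight=0.25)
```

A weight of 0 makes the unmonitored sample identical to the monitored one, while a weight of 1 makes the two samples independent.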

3.2 Evaluation Criterion

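Suppose that we run simulations $S$ times. Let $z^{(s)}$ denote the true number of customers and $\hat{z}^{(s)}$ its estimate in simulation $s$, $s = 1, \ldots, S$. For the evaluation of the accuracy of the estimates, we use the mean absolute percentage error (MAPE) given by

$$\mathrm{MAPE} = \frac{100}{S} \sum_{s=1}^{S} \frac{\left| z^{(s)} - \hat{z}^{(s)} \right|}{z^{(s)}}. \tag{20}$$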

The estimates with lower MAPE are preferred. McCarthy et al. (2006) and Fildes and Goodwin (2007) surveyed forecasting practices in management and found that the MAPE is the most commonly used accuracy measure.
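The criterion is straightforward to compute. The following minimal sketch reports the error in percent; the function name and the example values are hypothetical.

```python
def mape(true_values, estimates):
    """Mean absolute percentage error, in percent."""
    errors = [abs(t - e) / t for t, e in zip(true_values, estimates)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical example: two simulations, each estimate off by 10%.
example_error = mape([100.0, 200.0], [110.0, 180.0])
```

Both estimates in the example miss the true value by 10%, one from above and one from below, so the MAPE is approximately 10; the absolute value prevents the two errors from cancelling.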

3.3 Comparison of Methods

We compare the LS and ML estimators together with the naive estimator assuming the same distribution for customers in the monitored and unmonitored samples. We do not consider the Bayesian MAP estimator as we do not deal with any prior information in the simulation study. The simulation model with its four weights (for the change in the customer segment distribution, the frequency vector and the conditional probability matrix, and for the linear dependence in the columns of the matrix) allows us to comprehensively investigate the behavior of the estimation of the number of customers under various settings. In particular, we are able to simulate violations of the individual assumptions.

First, we focus on differences between the LS and ML estimators. We can see in Table 1 that estimation errors are very similar in both methods. In scenarios (i) and (ii) where all assumptions are fulfilled, the ML estimator has slightly lower error. In scenarios (v) and (viii) where the frequency vector changes and other assumptions are fulfilled, both estimators have the same errors. In scenarios (iii), (vi), (vii), (ix) and (x) where the conditional probability matrix changes, the LS estimator has lower error. The only exception is scenario (iv) in which the LS estimator is worse. In scenario (xi) where the conditional probability matrix has linearly dependent columns, the LS estimator again has lower error. Overall, the ML estimator is more precise in the theoretical case of ideal conditions. In practice, however, the LS estimator is more suitable as some degree of violation of the assumptions can always be expected. Therefore, we focus on the LS estimator in the rest of the paper.

Next, we focus on differences between the LS and naive estimators. In scenario (i) where the distribution of customers is the same for the monitored and unmonitored sample, the naive method outperforms the LS method. This case is, however, quite unrealistic. When we introduce some change in the customer distribution, the naive estimator becomes inconsistent while the LS method remains quite accurate. We can see this behavior in Figure 2. Further, we investigate the case of different distributions for the monitored and unmonitored samples. In Figure 3, we change the frequency vector. Both methods react very negatively to this change. The LS estimator, however, always has lower error. In Figures 4 and 5, we modify the conditional probability matrix. This does not influence the naive estimator as it is based only on the customer probability vector and the frequency vector. The LS estimator is a better choice even for quite a large change in the conditional probability matrix. When the matrix changes completely, however, the naive estimator has lower error. In the case of a matrix with numerically or even algebraically linearly dependent columns, the error of the LS approach increases. However, the LS estimator is still preferred over the naive estimator. Table 1 offers a view of cases in which multiple assumptions are violated and leads to the same conclusions. In scenarios (iv), (vii) and (x) where the conditional probability matrix completely changes, the naive estimator has lower error. In other cases, i.e. scenarios (ii), (iii), (v), (vi), (viii), (ix) and (xi), the LS estimator outperforms the naive approach. Overall, the LS estimator is a much better choice than the naive estimator unless its assumptions are completely violated.

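Scenario  Segments  Frequency  Matrix  Dependence   Naive     LS     ML
(i)           0.00       0.00    0.00        0.00    0.03   0.14   0.13
(ii)          1.00       0.00    0.00        0.00   34.58   0.14   0.13
(iii)         1.00       0.00    0.25        0.00   34.58  11.11  11.22
(iv)          1.00       0.00    1.00        0.00   34.58  36.20  36.12
(v)           1.00       0.25    0.00        0.00   51.21  32.41  32.41
(vi)          1.00       0.25    0.25        0.00   51.21  35.69  35.77
(vii)         1.00       0.25    1.00        0.00   51.21  53.46  53.48
(viii)        1.00       1.00    0.00        0.00   58.53  58.52  58.52
(ix)          1.00       1.00    0.25        0.00   58.53  56.96  56.99
(x)           1.00       1.00    1.00        0.00   58.53  60.66  60.72
(xi)          1.00       0.00    0.00        1.00   34.58  10.82  10.86
Table 1: The MAPE of the Naive, LS and ML estimates using 50 000 simulations under various scenarios. The columns Segments, Frequency and Matrix give the weights determining the change in the customer segment distribution, the frequency vector and the conditional probability matrix between the monitored and unmonitored samples. The column Dependence gives the weight determining linear dependence in the columns of the matrix.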
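Figure 2: The MAPE of the Naive and LS estimates using 10 000 simulations based on the model with various values of the weight determining the change in the customer segment distribution and the other weights held fixed.
Figure 3: The MAPE of the Naive and LS estimates using 10 000 simulations based on the model with various values of the weight determining the change in the frequency vector and the other weights held fixed.
Figure 4: The MAPE of the Naive and LS estimates using 10 000 simulations based on the model with various values of the weight determining the change in the conditional probability matrix and the other weights held fixed.
Figure 5: The MAPE of the Naive and LS estimates using 10 000 simulations based on the model with various values of the weight determining linear dependence in the columns of the conditional probability matrix and the other weights held fixed.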

4 Empirical Study

4.1 Motivation

The loyalty program of the studied Czech retail chain is well-known in the Czech Republic as it is frequently advertised not only in stores but also on TV, on the radio and in newspapers. The loyalty program brings many benefits to its members, such as extra discounts or gifts for buying specific products. In addition, customers can collect bonus points for spending and then exchange them for a 10% discount on the whole basket. These loyalty perks have resulted in high popularity of the loyalty program among customers, and nowadays the majority of receipts are linked to a specific customer with the loyalty card.

The knowledge of the number of customers who are not members of the loyalty program is crucial for various marketing strategies. The management of the company believes that the loyalty program may already be saturated and that the vast majority of customers who are willing to join the loyalty program are already members. In this study, we focus on quantifying this hypothesis.

Our monitored sample consists of the loyalty club members and our unmonitored sample consists of customers without the loyalty card. The goal is to reliably estimate the number of unique customers in the unmonitored sample. We do not have any prior knowledge about the distribution of customers and therefore we cannot use the MAP estimator. We use the LS method in the empirical study as, according to the simulation study, it gives slightly better results than the ML estimator when the assumptions are violated to some degree.

4.2 Transaction Data

We use the proposed method on a sample of real transaction data. Our dataset consists of individual purchase data of one of the retail chains in the drugstore market of the Czech Republic. The retail chain sells over 10 thousand products which are divided into 55 categories. This categorization is based on expert opinion about the product properties and purpose. Each product is also assigned to one of three price levels: low-end, standard and high-end. We use data from the table of all transactions during a three-month period between September and November 2018. The dataset includes every single product sold. It also includes identification of the basket and in some cases a link to a specific customer. We remove monitored customers with extremely high frequency (more than 15 visits per month) from the dataset as they are a special type of customer and we believe that no customer without the loyalty card exhibits this behavior. The cleaned dataset consists of over 7 million receipts, with 69% linked to a specific customer through their loyalty card.

4.3 Basket Types

In order to get a meaningful customer segmentation, we use a two-phase method involving segmentation of both baskets and customers. In the first phase, we define basket types and in the second phase, we define customer segments based on the shopping history of customers.

Basket types are determined using three dimensions: the value of the basket, the price level of purchased products and the diversity of the basket. We distinguish baskets of small and large value. We also single out baskets with at least one product for children such as child wipes or diapers. The price-level segmentation of baskets has two levels. If at least half of the revenue is made by high-end products in the basket, then the basket is marked as high-end, otherwise the basket is marked as standard. The last dimension is the diversity of the basket. A basket with products in only one category is marked as focused and a basket containing at least two different categories is marked as general. The combination of these three dimensions allows us to distinguish 12 different types of baskets. The description of individual basket types is presented in Table 2.

Type  Value                  Price Level  Diversity
B01   Small value            Standard     Focused
B02   Small value            Standard     General
B03   Small value            High-end     Focused
B04   Small value            High-end     General
B05   Large value            Standard     Focused
B06   Large value            Standard     General
B07   Large value            High-end     Focused
B08   Large value            High-end     General
B09   Products for children  Standard     Focused
B10   Products for children  Standard     General
B11   Products for children  High-end     Focused
B12   Products for children  High-end     General
Table 2: Description of the basket types.
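The classification rules behind Table 2 can be sketched as a small function. The item representation and the function name are hypothetical, and the threshold separating small from large baskets is an assumption made for illustration, as the text does not state the value used in the study.

```python
def basket_type(items, large_value_threshold=500.0):
    """Assign a basket to one of the 12 types B01 to B12.

    items: list of (category, price_level, revenue, is_for_children) tuples.
    large_value_threshold: hypothetical cut-off between small and large value.
    """
    revenue = sum(item[2] for item in items)
    high_end_revenue = sum(item[2] for item in items if item[1] == "high-end")

    if any(item[3] for item in items):
        value_group = 2   # at least one product for children
    elif revenue >= large_value_threshold:
        value_group = 1   # large value
    else:
        value_group = 0   # small value

    # High-end if at least half of the revenue comes from high-end products.
    price_group = 1 if 2 * high_end_revenue >= revenue else 0
    # General if the basket spans at least two product categories.
    diversity_group = 1 if len({item[0] for item in items}) >= 2 else 0

    return "B{:02d}".format(4 * value_group + 2 * price_group + diversity_group + 1)


single_soap = basket_type([("soap", "standard", 30.0, False)])
perfume_only = basket_type([("perfume", "high-end", 600.0, False)])
family_mix = basket_type([("diapers", "standard", 200.0, True),
                          ("perfume", "high-end", 300.0, False)])
```

Under these assumptions the three example baskets are classified as B01, B07 and B12 respectively. The children rule takes precedence over basket value, mirroring the Value column of Table 2.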

4.4 Customer Segments

We use the purchase history of customers to assign each member of the loyalty program to a single customer segment. This is done by applying predetermined rules to the frequency of basket types in the purchase history. We distinguish general customers, suppliers, customers with children, standard accidental customers and accidental customers focused on high-end products. A brief description of each segment is presented in Table 3.

The largest group among customers with the loyalty card are general customers. This is the archetype of general customer who exhibits standard behavior and moderate shopping frequency. Another large segment of customers are the suppliers. These are the customers who visit the store moderately and their basket value is very high. The customers who buy products for small children are one of the most valuable ones as their frequency is very high and their revenue is substantially above-average. However, margins of their most frequently purchased products are below standard. For this reason, we distinguish them as a special segment as the goal is not only to maximize their revenue but also redirect them to the products with higher margins. Two segments of accidental customers represent customers who visit the store rarely and focus mainly on small baskets. We distinguish two types of accidental customers by the ratio of high-end products in their basket. We show the distribution of basket types in each customer segment in Figure 6. Besides basket distribution, the customer segments also differ in shopping frequency as described in Table 3.

Segment  Name                 Focus                  Frequency
C01      General              Standard behavior      Moderate
C02      Supplier             Large basket           Moderate
C03      With children        Children basket        High
C04      Accidental standard  Small basket           Low
C05      Accidental high-end  Small high-end basket  Low
Table 3: Description of the customer segments.
Figure 6: Distribution of the basket types in each customer segment.

4.5 Distribution of Customers

It is reasonable to assume that the distributions of members and non-members of the loyalty program are substantially different, and our case study of the drugstore retail chain supports this. While general customers together with suppliers form 81.4% of the loyalty program members, they form only 31.5% of non-members. Similarly, customers with children make up 7.5% of customers with the loyalty card and only 2.8% of customers without the loyalty card. The exact opposite holds for the share of accidental customers. Accidental standard customers make up 9.9% of members and 51.5% of non-members while high-end accidental customers make up 1.2% of members and 14.2% of non-members. The comparison between distributions of members and non-members of the loyalty program in each customer segment is shown in Figure 7.

Figure 7: Distribution of the customer segments for members and non-members of the loyalty program.

The results support the initial hypothesis that the loyalty program is already saturated and that customers who are willing to join it are already members. The main result is the estimated distribution of the customer segments for customers without the loyalty card. Accidental customers with a low frequency of visits form 65.7% of unique customers who are not members of the loyalty program. On the other hand, general customers and suppliers are under-represented among non-members in comparison to members of the loyalty program.

The differences in customer segment distributions between members and non-members of the loyalty program are crucial for estimating the number of unique non-member customers. The number estimated by the naive approach, which assumes the same distribution for members and non-members, is 22.1% lower than the number estimated by the proposed method, which takes the differences into account. The naive approach estimates 739 883 customers without a loyalty card, while the proposed method estimates 949 024. This result is in line with the expectations of the company’s management.
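The relation between the two estimates can be checked with a few lines of arithmetic. The following is a minimal Python sketch that uses only the two counts reported above; the variable names are illustrative and are not part of the original study.

```python
# Estimated numbers of unique customers without a loyalty card,
# as reported in the text above.
naive_estimate = 739_883      # assumes one shared segment distribution
proposed_estimate = 949_024   # accounts for differing segment distributions

# How many times larger the proposed estimate is than the naive one.
ratio = proposed_estimate / naive_estimate

# Relative shortfall of the naive estimate with respect to the proposed one.
shortfall = 1 - naive_estimate / proposed_estimate

print(f"proposed / naive: {ratio:.2f}")
print(f"naive shortfall:  {shortfall:.0%}")
```

The ratio comes out at about 1.28, and the naive estimate falls short of the proposed one by roughly 22%.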

4.6 Loyalty Program Penetration

Using our estimates, we are able to study the penetration of the loyalty program in each customer segment. The loyalty program penetration is shown in Figure 8. General customers are members of the loyalty program in 81.9% of cases and suppliers in 81.0%. These relatively high participation rates support the hypothesis that the loyalty program is close to the saturation point in these segments. For customers with children, the participation rate is 81.8%, which is probably caused by the company’s recent focus on attracting customers with children into the loyalty program through additional club discounts on products for children and other promotional activities. On the other hand, accidental customers are highly unlikely to participate in the loyalty program: accidental standard customers are members in only 24.7% of cases. The most interesting result is that only 12.5% of accidental high-end customers are members. Customers who buy high-end products are very valuable for the company, as high-end products have significantly higher margins than their cheaper alternatives. Therefore, it is crucial for the company to attract them and bind them to the company’s brand. Our study signals a large potential for the company in this specific area.

Figure 8: Penetration of the loyalty program in each customer segment.
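
The penetration rates above can be read as the share of a segment's customers who hold a loyalty card, i.e. members divided by the sum of members and estimated non-members. The following minimal Python sketch illustrates the calculation under that assumed definition. The function name `penetration`, the segment labels and all counts are hypothetical and chosen only for illustration; they are not figures from the study.

```python
def penetration(members: float, non_members: float) -> float:
    """Share of a segment's customers who hold a loyalty card."""
    total = members + non_members
    if total <= 0:
        raise ValueError("segment must contain at least one customer")
    return members / total


# Hypothetical segment counts (illustrative only, not taken from the study).
segments = {
    "regular": {"members": 800, "non_members": 200},
    "occasional": {"members": 100, "non_members": 300},
}

# Penetration rate per segment.
rates = {
    name: penetration(counts["members"], counts["non_members"])
    for name, counts in segments.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

In practice, the member counts would come directly from the loyalty program records, while the non-member counts would be the per-segment estimates produced by the method.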

5 Conclusion

We propose a method for estimating the total number of unique customers using retail transaction data. The method also estimates the number of unique customers in each customer segment. We verify the proposed method using synthetic data in an extensive simulation study. The method performs well even when its assumptions are violated to some extent. Therefore, we expect the method to give reliable results in real applications.

In the empirical part, we use retail transaction data from a Czech drugstore retail chain. We quantify the initial proposition that the loyalty program is popular among regular customers and rather unpopular among accidental customers. Previously, this was assessed only by expert opinion, and the estimates were therefore subject to high uncertainty. This reliable quantification is the main contribution of the method. We also identify a segment of accidental high-end customers which contains only a few members of the loyalty program and can bring additional profit to the company when properly addressed. Overall, the resulting estimates are crucial not only for planning promotional sales and other marketing activities but may also be used in predicting demand for specific types of products.

Acknowledgements

We would like to thank Michal Černý for his useful comments. We would also like to thank the participants of the 7th International Conference on Management, Nový Smokovec, September 26–29, 2018 and the 7th International Business and Management Sciences Congress, Istanbul, March 14–15, 2019 for fruitful discussions.

Funding

This work was supported by the Czech Science Foundation under Grant 19-02773S and by the Internal Grant Agency of the University of Economics, Prague under Grant F4/58/2017.

References

  • Abe (2009) Abe, M. 2009. "Counting Your Customers" One by One: A Hierarchical Bayes Extension to the Pareto/NBD Model. Marketing Science. Volume 28. Issue 3. Pages 541–553. ISSN 0732-2399. {https://doi.org/10.1287/mksc.1090.0502}.
  • Ammar et al. (2016) Ammar, A., Elouedi, Z., Lingras, P. 2016. Meta-Clustering of Possibilistically Segmented Retail Datasets. Fuzzy Sets and Systems. Volume 286. Pages 173–196. ISSN 0165-0114. {https://doi.org/10.1016/j.fss.2015.07.019}.
  • Batislam et al. (2007) Batislam, E. P., Denizel, M., Filiztekin, A. 2007. Empirical Validation and Comparison of Models for Customer Base Analysis. International Journal of Research in Marketing. Volume 24. Issue 3. Pages 201–209. ISSN 0167-8116. {https://doi.org/10.1016/j.ijresmar.2006.12.005}.
  • Bradlow et al. (2017) Bradlow, E. T., Gangwar, M., Kopalle, P., Voleti, S. 2017. The Role of Big Data and Predictive Analytics in Retailing. Journal of Retailing. Volume 93. Issue 1. Pages 79–95. ISSN 0022-4359. {https://doi.org/10.1016/j.jretai.2016.12.004}.
  • Decker and Monien (2003) Decker, R., Monien, K. 2003. Market Basket Analysis with Neural Gas Networks and Self-Organising Maps. Journal of Targeting, Measurement and Analysis for Marketing. Volume 11. Issue 4. Pages 373–386. ISSN 0967-3237. {https://doi.org/10.1057/palgrave.jt.5740092}.
  • Fader et al. (2005) Fader, P. S., Hardie, B. G. S., Lee, K. L. 2005. "Counting Your Customers" the Easy Way: An Alternative to the Pareto/NBD Model. Marketing Science. Volume 24. Issue 2. Pages 275–284. ISSN 0732-2399. {https://doi.org/10.1287/mksc.1040.0098}.
  • Fildes and Goodwin (2007) Fildes, R., Goodwin, P. 2007. Against Your Better Judgment? How Organizations Can Improve Their Use of Management Judgment in Forecasting. Interfaces. Volume 37. Issue 6. Pages 570–576. ISSN 0092-2102. {https://doi.org/10.1287/inte.1070.0309}.
  • France and Ghose (2019) France, S. L., Ghose, S. 2019. Marketing Analytics: Methods, Practice, Implementation, and Links to Other Fields. Expert Systems with Applications. Volume 119. Pages 456–475. ISSN 0957-4174. {https://doi.org/10.1016/j.eswa.2018.11.002}.
  • Germann et al. (2014) Germann, F., Lilien, G. L., Fiedler, L., Kraus, M. 2014. Do Retailers Benefit from Deploying Customer Analytics? Journal of Retailing. Volume 90. Issue 4. Pages 587–593. ISSN 0022-4359. {https://doi.org/10.1016/j.jretai.2014.08.002}.
  • Glady et al. (2009) Glady, N., Baesens, B., Croux, C. 2009. A Modified Pareto/NBD Approach for Predicting Customer Lifetime Value. Expert Systems with Applications. Volume 36. Issue 2. Pages 2062–2071. ISSN 0957-4174. {https://doi.org/10.1016/j.eswa.2007.12.049}.
  • Grewal et al. (2017) Grewal, D., Roggeveen, A. L., Nordfält, J. 2017. The Future of Retailing. Journal of Retailing. Volume 93. Issue 1. Pages 1–6. ISSN 0022-4359. {https://doi.org/10.1016/j.jretai.2016.12.008}.
  • Griva et al. (2018) Griva, A., Bardaki, C., Pramatari, K., Papakiriakopoulos, D. 2018. Retail Business Analytics: Customer Visit Segmentation Using Market Basket Data. Expert Systems with Applications. Volume 100. Pages 1–16. ISSN 0957-4174. {https://doi.org/10.1016/j.eswa.2018.01.029}.
  • Holý et al. (2017) Holý, V., Sokol, O., Černý, M. 2017. Clustering Retail Products Based on Customer Behaviour. Applied Soft Computing. Volume 60. Pages 752–762. ISSN 1568-4946. {https://doi.org/10.1016/j.asoc.2017.02.004}.
  • Jerath et al. (2011) Jerath, K., Fader, P. S., Hardie, B. G. S. 2011. New Perspectives on Customer "Death" Using a Generalization of the Pareto/NBD Model. Marketing Science. Volume 30. Issue 5. Pages 866–880. ISSN 0732-2399. {https://doi.org/10.1287/mksc.1110.0654}.
  • Konuş et al. (2008) Konuş, U., Verhoef, P. C., Neslin, S. A. 2008. Multichannel Shopper Segments and Their Covariates. Journal of Retailing. Volume 84. Issue 4. Pages 398–413. ISSN 0022-4359. {https://doi.org/10.1016/j.jretai.2008.09.002}.
  • Kullback (1959) Kullback, S. 1959. Information Theory and Statistics. Hoboken. Wiley. ISBN 978-0-8446-5625-0. {https://books.google.com/books?id=-Yq4AAAAIAAJ}.
  • Kumar et al. (2017) Kumar, V., Anand, A., Song, H. 2017. Future of Retailer Profitability: An Organizing Framework. Journal of Retailing. Volume 93. Issue 1. Pages 96–119. ISSN 0022-4359. {https://doi.org/10.1016/j.jretai.2016.11.003}.
  • Lilien et al. (2013) Lilien, G. L., Roberts, J. H., Shankar, V. 2013. Effective Marketing Science Applications: Insights from ISMS-MSI Practice Prize Finalist Papers and Projects. Marketing Science. Volume 32. Issue 2. Pages 229–245. ISSN 0732-2399. {https://doi.org/10.1287/mksc.1120.0756}.
  • Lingras et al. (2014) Lingras, P., Elagamy, A., Ammar, A., Elouedi, Z. 2014. Iterative Meta-Clustering Through Granular Hierarchy of Supermarket Customers and Products. Information Sciences. Volume 257. Pages 14–31. ISSN 0020-0255. {https://doi.org/10.1016/j.ins.2013.09.018}.
  • Ma and Büschken (2011) Ma, S., Büschken, J. 2011. Counting Your Customers from an "Always a Share" Perspective. Marketing Letters. Volume 22. Issue 3. Pages 243–257. ISSN 0923-0645. {https://doi.org/10.1007/s11002-010-9123-0}.
  • McCarthy et al. (2006) McCarthy, T. M., Davis, D. F., Golicic, S. L., Mentzer, J. T. 2006. The Evolution of Sales Forecasting Management: A 20-Year Longitudinal Study of Forecasting Practices. Journal of Forecasting. Volume 25. Issue 5. Pages 303–324. ISSN 0277-6693. {https://doi.org/10.1002/for.989}.
  • Mzoughia et al. (2018) Mzoughia, M. B., Borle, S., Limam, M. 2018. A MCMC Approach for Modeling Customer Lifetime Behavior Using the COM-Poisson Distribution. Applied Stochastic Models in Business and Industry. Volume 34. Issue 2. Pages 113–127. ISSN 1524-1904. {https://doi.org/10.1002/asmb.2276}.
  • Peker et al. (2017) Peker, S., Kocyigit, A., Eren, P. E. 2017. LRFMP Model for Customer Segmentation in the Grocery Retail Industry: A Case Study. Marketing Intelligence & Planning. Volume 35. Issue 4. Pages 544–559. ISSN 0263-4503. {https://doi.org/10.1108/mip-11-2016-0210}.
  • Putra et al. (2012) Putra, I. K. G. D., Cahyawan, A. A. K. A., Shavitri H., D. 2012. Combination of Adaptive Resonance Theory 2 and RFM Model for Customer Segmentation in Retail Company. International Journal of Computer Applications. Volume 48. Issue 2. Pages 18–23. ISSN 0975-8887. {https://doi.org/10.5120/7320-0110}.
  • Reutterer et al. (2006) Reutterer, T., Mild, A., Natter, M., Taudes, A. 2006. A Dynamic Segmentation Approach for Targeting and Customizing Direct Marketing Campaigns. Journal of Interactive Marketing. Volume 20. Issue 3-4. Pages 43–57. ISSN 1094-9968. {https://doi.org/10.1002/dir.20066}.
  • Roberts et al. (2014) Roberts, J. H., Kayande, U., Stremersch, S. 2014. From Academic Research to Marketing Practice: Exploring the Marketing Science Value Chain. International Journal of Research in Marketing. Volume 31. Issue 2. Pages 127–140. ISSN 0167-8116. {https://doi.org/10.1016/j.ijresmar.2013.07.006}.
  • Rossi and Allenby (2003) Rossi, P. E., Allenby, G. M. 2003. Bayesian Statistics and Marketing. Marketing Science. Volume 22. Issue 3. Pages 304–328. ISSN 0732-2399. {https://doi.org/10.1287/mksc.22.3.304.17739}.
  • Sarantopoulos et al. (2016) Sarantopoulos, P., Theotokis, A., Pramatari, K., Doukidis, G. 2016. Shopping Missions: An Analytical Method for the Identification of Shopper Need States. Journal of Business Research. Volume 69. Issue 3. Pages 1043–1052. ISSN 0148-2963. {https://doi.org/10.1016/j.jbusres.2015.08.017}.
  • Schmittlein et al. (1987) Schmittlein, D. C., Morrison, D. G., Colombo, R. 1987. Counting Your Customers: Who Are They and What Will They Do Next? Management Science. Volume 33. Issue 1. Pages 1–24. ISSN 0025-1909. {https://doi.org/10.1287/mnsc.33.1.1}.
  • Schumacher (2006) Schumacher, N. 2006. The Butterfly Effect: Estimating "Faux-New" Customers. Journal of Consumer Marketing. Volume 23. Issue 1. Pages 43–46. ISSN 0736-3761. {https://doi.org/10.1108/07363760610641154}.
  • Sokol (2018) Sokol, O. 2018. Estimating the Number of Customers Using Market Basket Data. In Proceedings of the 7th International Conference on Management. Prešov. Bookman s.r.o. Pages 763–767. ISBN 978-80-8165-301-8. {http://www.managerconf.com/}.
  • Sokol and Holý (2018) Sokol, O., Holý, V. 2018. Customer Segmentation Based on a Shopping Mission in the Retail Business. Working Paper.
  • Tsai and Chiu (2004) Tsai, C. Y., Chiu, C. C. 2004. A Purchase-Based Market Segmentation Methodology. Expert Systems with Applications. Volume 27. Issue 2. Pages 265–276. ISSN 0957-4174. {https://doi.org/10.1016/j.eswa.2004.02.005}.
  • Zhang et al. (2007) Zhang, Y., Jiao, J., Ma, Y. 2007. Market Segmentation for Product Family Positioning Based on Fuzzy Clustering. Journal of Engineering Design. Volume 18. Issue 3. Pages 227–241. ISSN 0954-4828. {https://doi.org/10.1080/09544820600752781}.