1 Introduction
Retail transaction data contain information about the composition of the shopping basket, the prices of the purchased goods and possibly the IDs of customers. Analysis of such data is an important aspect of retail business analytics as it provides valuable insight into customer behavior. For an overview of marketing and retail business analytics and its impact, see Lilien et al. (2013), Germann et al. (2014), Roberts et al. (2014), Bradlow et al. (2017), Grewal et al. (2017), Kumar et al. (2017) and France and Ghose (2019).
The most basic question in retail analytics is how many customers a retail chain actually has. In some cases, customers are monitored through the loyalty program. In other cases, customers can visit the store repeatedly without any possibility of identifying them. In our setting, we assume that only a portion of customers is monitored. Clearly, the number of customers with the loyalty card is known while the number of customers without the loyalty card is unknown. With absolute certainty, we can say that there is at least one customer without the loyalty card, as a single customer could theoretically purchase all unmonitored shopping baskets. We also know that there are at most as many customers without the loyalty card as unmonitored receipts, as each customer has to visit the store at least once. To give a more specific estimate, we need to adopt further assumptions about customer behavior. One may assume that the number of customers is proportional to the number of sales, i.e. customers with the loyalty card shop as often as customers without the loyalty card. In that case, the number of customers can be estimated simply by the number of sales divided by the average frequency of customer visits. However, this assumption is too strict and unrealistic as customers without the loyalty card are likely to visit the store less frequently. The question is how much less frequently. Our paper deals with estimation of the number of customers without the loyalty card under less restrictive and more realistic assumptions.
Knowledge of the number and structure of customers without the loyalty card is useful for marketing departments involved in the retail decision-making process. It can be used to improve mass marketing communications, especially customer targeting and promotional sales planning. The number of customers in a certain segment can also be used as an input in the prediction of demand for products during promotional sales, as each segment has different shopping behavior. A proper prediction then allows the retailer to minimize sold-out situations. All of the above-mentioned applications play an important role in increasing sales and profits for the company.
In the literature, counting of customers is most often approached in the context of active customers that are monitored. Schmittlein et al. (1987) proposed the Pareto/NBD model to determine the probability that a customer with a given pattern of transactions is still active. This model was further studied and extended by Fader et al. (2005), Batislam et al. (2007), Jerath et al. (2011), Abe (2009), Glady et al. (2009), Ma and Büschken (2011) and Mzoughia et al. (2018). Schumacher (2006) dealt with faux-new customers who seem to be new customers due to the absence of past transaction history but are actually regular customers. In contrast, we count customers that are never monitored.
Transaction data are often used for clustering of customers, products and baskets. Customer segments were determined by Tsai and Chiu (2004), Konuş et al. (2008), Putra et al. (2012), Lingras et al. (2014), Ammar et al. (2016), Peker et al. (2017) and Sokol and Holý (2018), product categories by Zhang et al. (2007), Lingras et al. (2014), Ammar et al. (2016) and Holý et al. (2017) and basket types by Decker and Monien (2003), Reutterer et al. (2006), Sarantopoulos et al. (2016), Griva et al. (2018) and Sokol and Holý (2018). We utilize clustering of customers as well as baskets.
Our procedure consists of the following steps. First, we determine basket types based on the value, price level and diversity of the products in the basket. Second, we determine customer segments based on the bought baskets using the history of transactions linked to loyalty cards. For each customer segment, we have the average ratios of bought basket types and the average shopping frequency. Third, we estimate the distribution of customer segments using the observed distribution of basket types not linked to loyalty cards. For this purpose, we utilize the least squares (LS) estimator and the maximum likelihood (ML) estimator. When a priori information is available, the maximum a posteriori (MAP) estimator can be utilized as well. Finally, we estimate the number of customers from the distribution of customer segments and average shopping frequencies.
In a simulation study, we compare the LS and ML estimators with the naive estimator that assumes the distribution of customer segments is the same for customers with and without the loyalty card. We find that if the distribution differs but the behavior within the customer segments is consistent, the proposed methods significantly outperform the naive approach. Overall, the performance of the LS estimator is quite similar to that of the ML estimator. However, the ML estimator is slightly better when the assumptions are fulfilled while the LS estimator is more robust when the assumptions are violated. For this reason, we recommend using the LS estimator in practice.
In an empirical study, we study the behavior of customers in a drugstore chain. We find that the total number of customers without the loyalty card estimated by the proposed method is 1.28 times as high as the number suggested by the naive estimate assuming the same distribution of customer segments for members and nonmembers of the loyalty program. As expected, we find that the distribution of customers without the loyalty card includes many more casual customers who visit the store rarely and purchase smaller baskets. We identify that 14.2% of customers without the loyalty card focus on high-end products in small baskets. These customers have very high value and attracting them to the loyalty program should be the main interest for the company. On the other hand, 81.5% of regular customers are members of the loyalty program, which supports the hypothesis that the loyalty card is very popular among regular customers and the loyalty program may be near the point of saturation.
The rest of the paper is structured as follows. In Section 2, we propose a procedure for determining the number and distribution of unmonitored customers. In Section 3, we examine the behavior of the proposed method using simulations. In Section 4, we illustrate the applicability of the proposed method in practice. We conclude the paper in Section 5.
2 Methodology
2.1 Stochastic Framework
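We introduce our probabilistic framework and notation. Let $n$ denote the number of transactions. Further, let there be $k$ basket types and $l$ customer segments. Each transaction has a single basket type and a customer segment. Let $B_i \in \{1, \ldots, k\}$ be a random variable denoting the basket type of transaction $i$ and $C_i \in \{1, \ldots, l\}$ a random variable denoting its customer segment, $i = 1, \ldots, n$. The random number of transactions with basket type $j$ is then $X_j = \sum_{i=1}^{n} I(B_i = j)$, $j = 1, \ldots, k$, while the random number of transactions with customer segment $s$ is $Y_s = \sum_{i=1}^{n} I(C_i = s)$, $s = 1, \ldots, l$, where $I(\cdot)$ denotes the indicator function. The random number of transactions with basket type $j$ and customer segment $s$ is $Z_{js} = \sum_{i=1}^{n} I(B_i = j, C_i = s)$, $j = 1, \ldots, k$, $s = 1, \ldots, l$. Clearly, we have
$$X_j = \sum_{s=1}^{l} Z_{js}, \qquad Y_s = \sum_{j=1}^{k} Z_{js}, \qquad \sum_{j=1}^{k} X_j = \sum_{s=1}^{l} Y_s = n. \tag{1}$$
Using a vector and matrix notation, we have $B = (B_1, \ldots, B_n)'$, $C = (C_1, \ldots, C_n)'$, $X = (X_1, \ldots, X_k)'$, $Y = (Y_1, \ldots, Y_l)'$ and $Z = (Z_{js})_{j=1, s=1}^{k, l}$. We denote $b = (b_1, \ldots, b_n)'$ the observed basket types and $c = (c_1, \ldots, c_n)'$ the observed customer segments. We further denote $x = (x_1, \ldots, x_k)'$ the observed numbers of transactions with given basket type, $y = (y_1, \ldots, y_l)'$ the observed numbers of transactions with given customer segment and $z = (z_{js})_{j=1, s=1}^{k, l}$ the observed numbers of transactions with given basket type and customer segment.
We assume that random variables $B_1, \ldots, B_n$ are independent and identically distributed with the probability of basket type $j$ denoted as $p_j = \mathrm{P}[B_i = j]$, $j = 1, \ldots, k$. Similarly, we assume that $C_1, \ldots, C_n$ are independent and identically distributed with the probability of customer segment $s$ denoted as $q_s = \mathrm{P}[C_i = s]$, $s = 1, \ldots, l$. Finally, the probability of basket type $j$ conditional on customer segment $s$ is denoted as $r_{js} = \mathrm{P}[B_i = j \mid C_i = s]$, $j = 1, \ldots, k$, $s = 1, \ldots, l$. Using a vector and matrix notation, we have $p = (p_1, \ldots, p_k)'$, $q = (q_1, \ldots, q_l)'$ and $R = (r_{js})_{j=1, s=1}^{k, l}$. Using the law of total probability, we have
$$p_j = \sum_{s=1}^{l} r_{js} q_s, \qquad j = 1, \ldots, k. \tag{2}$$
In matrix notation, we simply have $p = R q$. This is the key equation used in the proposed approach.
First, let us assume that we observe both basket types $b$ and customer segments $c$. In this case, we can simply estimate the probability of basket type $j$ as $\hat{p}_j = x_j / n$, $j = 1, \ldots, k$, the probability of customer segment $s$ as $\hat{q}_s = y_s / n$, $s = 1, \ldots, l$, and the probability of basket type $j$ conditional on customer segment $s$ as $\hat{r}_{js} = z_{js} / y_s$, $j = 1, \ldots, k$, $s = 1, \ldots, l$.
Next, let us assume that we observe only basket types $b$. In this case, we can estimate $p$ in the same way but cannot estimate $q$ nor $R$. However, if we additionally assume that the matrix of conditional probabilities $R$ is known, we are able to estimate the customer segment distribution $q$. In the rest of this section, we propose several estimators of $q$ under this setting.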
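The fully observed case can be illustrated with a minimal Python sketch. It assumes basket types and customer segments are coded as integers starting from 0; the function name `estimate_probabilities` and the toy data are hypothetical and serve only to show that the relative-frequency estimates satisfy the key equation exactly.

```python
from collections import Counter

def estimate_probabilities(baskets, segments, k, l):
    """Relative-frequency estimates from paired observations.

    baskets[i] in 0..k-1 and segments[i] in 0..l-1 describe transaction i.
    Returns (p_hat, q_hat, R_hat) where R_hat[j][s] estimates
    P(basket type j | customer segment s).
    """
    n = len(baskets)
    x = Counter(baskets)                 # transactions per basket type
    y = Counter(segments)                # transactions per customer segment
    z = Counter(zip(baskets, segments))  # transactions per (type, segment) pair
    p_hat = [x[j] / n for j in range(k)]
    q_hat = [y[s] / n for s in range(l)]
    # Assumption of this sketch: a segment that never occurs gets a zero column.
    R_hat = [[z[(j, s)] / y[s] if y[s] else 0.0 for s in range(l)]
             for j in range(k)]
    return p_hat, q_hat, R_hat

# Toy data: 8 transactions, 3 basket types, 2 customer segments.
baskets = [0, 0, 1, 2, 2, 2, 1, 0]
segments = [0, 0, 0, 1, 1, 1, 1, 0]
p_hat, q_hat, R_hat = estimate_probabilities(baskets, segments, k=3, l=2)

# The law of total probability holds exactly for the plug-in estimates.
reconstructed = [sum(R_hat[j][s] * q_hat[s] for s in range(2)) for j in range(3)]
```

In the toy data, `reconstructed` coincides with `p_hat` up to floating-point rounding, which mirrors the matrix form of the key equation.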
2.2 Matrix of Conditional Probabilities
Our goal is to estimate the distribution of customer segments $q$ using the observed vector of basket types $b$ and the given matrix of conditional probabilities $R$. Note that the matrix $R$ has $k$ rows and $l$ columns. In the estimation, we utilize the equation $\hat{p} = R q$, where $\hat{p}$ is the estimated distribution of basket types. There are three distinct cases based on the structure of the matrix $R$.

(a) If the matrix $R$ has linearly dependent columns, the estimation faces identifiability issues. Specifically, if we further assume linearly independent rows, the vector of customer probabilities $q$ is not identifiable as the equation $\hat{p} = R q$ with variable $q$ has an infinite number of solutions. Note that a matrix with $k < l$ always has linearly dependent columns. In practice, we recommend avoiding this case by finding appropriate dimensions of basket types and customer segments.

(b) If the matrix $R$ has linearly independent columns and is square, i.e. $k = l$, the estimation is quite simple. In this case, we can invert the matrix and obtain the straightforward estimator $\hat{q} = R^{-1} \hat{p}$. However, finding meaningful basket types and customer segments satisfying the independence and square restrictions can be quite difficult. Therefore, this case is theoretically ideal but unlikely to occur in practice.

(c) If the matrix $R$ has linearly independent columns and is tall, i.e. $k > l$, the estimation is more complex. In this case, the equation $\hat{p} = R q$ can be inconsistent, i.e. without a solution for $q$. However, we can find a suitable $\hat{q}$ by minimizing an error between $\hat{p}$ and $R q$. For this purpose, we utilize the least squares, maximum likelihood and maximum a posteriori methods. As this case is the most realistic, we devote the rest of this section to it.

The three cases are illustrated in Figure 1. We discuss how to obtain the matrix $R$ in practice in Section 2.7.
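Which of the three cases applies can be checked numerically from the rank of the conditional probability matrix. The following Python sketch is only an illustration: the helper names `column_rank` and `identify_case` and the tolerance `tol` are assumptions, and a plain Gaussian elimination is used instead of a numerically more robust decomposition.

```python
def column_rank(matrix, tol=1e-10):
    """Rank of a matrix (list of rows) by Gaussian elimination with partial pivoting."""
    rows = [list(map(float, row)) for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    rank = 0
    for col in range(n_cols):
        # Choose the largest remaining entry of this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) < tol:
            continue  # no usable pivot: the column depends on the previous ones
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank

def identify_case(R):
    """Return 'a' (dependent columns), 'b' (square, invertible) or 'c' (tall, full column rank)."""
    k, l = len(R), len(R[0])
    if column_rank(R) < l:
        return "a"
    return "b" if k == l else "c"

case_tall = identify_case([[0.75, 0.0], [0.25, 0.25], [0.0, 0.75]])
case_dependent = identify_case([[0.5, 0.5], [0.3, 0.3], [0.2, 0.2]])
case_square = identify_case([[0.9, 0.2], [0.1, 0.8]])
```

A matrix with nearly dependent columns passes this check but can still lead to unstable estimates, so the tolerance should be read as a rough guard rather than a guarantee.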
2.3 Least Squares Estimation
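Let us assume that we observe the vector of basket types $b$. The total number of transactions is denoted as $n$ while the vector of the numbers of transactions with given basket type is denoted as $x$. Let us further assume that we know the conditional probability matrix $R$ and that it has linearly independent columns with $k \geq l$. The least squares (LS) method minimizes the squared error given by
$$S(q) = \left\| \hat{p} - R q \right\|^2 = \sum_{j=1}^{k} \left( \hat{p}_j - \sum_{s=1}^{l} r_{js} q_s \right)^2, \tag{3}$$
where $\hat{p} = x / n$. The estimates $\hat{q}^{LS}$ are found by solving the quadratic optimization problem
$$\hat{q}^{LS} \in \arg\min_{q} \; \sum_{j=1}^{k} \left( \hat{p}_j - \sum_{s=1}^{l} r_{js} q_s \right)^2 \quad \text{s.t.} \quad \sum_{s=1}^{l} q_s = 1, \quad q_s \geq 0, \; s = 1, \ldots, l. \tag{4}$$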
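The constrained least squares problem can be solved by any quadratic programming routine. As a dependency-free illustration, the sketch below uses projected gradient descent with a Euclidean projection onto the probability simplex. This solver is a swapped-in choice for the sketch, not the one prescribed by the method, and the names `project_to_simplex` and `least_squares_estimate` as well as the toy matrix are assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {q : q >= 0, sum(q) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        candidate = (cumulative - 1.0) / i
        if u_i - candidate > 0:
            theta = candidate  # the last index passing the test defines the shift
    return [max(v_i - theta, 0.0) for v_i in v]

def least_squares_estimate(p_hat, R, iterations=2000):
    """Minimize ||p_hat - R q||^2 over the simplex by projected gradient descent."""
    k, l = len(R), len(R[0])
    # Step 1/L, where L = 2 * ||R||_F^2 bounds the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * sum(r * r for row in R for r in row))
    q = [1.0 / l] * l
    for _ in range(iterations):
        residual = [sum(R[j][s] * q[s] for s in range(l)) - p_hat[j] for j in range(k)]
        gradient = [2.0 * sum(R[j][s] * residual[j] for j in range(k)) for s in range(l)]
        q = project_to_simplex([q[s] - step * gradient[s] for s in range(l)])
    return q

# Toy example: three basket types, two customer segments, data generated by q = (0.3, 0.7).
R = [[0.75, 0.0], [0.25, 0.25], [0.0, 0.75]]
p_hat = [0.225, 0.25, 0.525]
q_ls = least_squares_estimate(p_hat, R)
```

Because the toy basket distribution is generated exactly by the segment distribution (0.3, 0.7), the squared error can reach zero and `q_ls` converges to that vector.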
2.4 Maximum Likelihood Estimation
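Let us assume the same setting as in Section 2.3. The maximum likelihood (ML) method maximizes the likelihood function or, equivalently, the logarithm of the likelihood function. The logarithmic likelihood function is given by
$$\ell(q) = \ln \mathrm{P}[B = b \mid q] = \sum_{i=1}^{n} \ln \mathrm{P}[B_i = b_i \mid q] = \sum_{j=1}^{k} x_j \ln \mathrm{P}[B_1 = j \mid q], \tag{5}$$
where the second equality holds as $B_1, \ldots, B_n$ are independent and the third equality holds as $B_1, \ldots, B_n$ are identically distributed. Using the law of total probability, we have
$$\mathrm{P}[B_1 = j \mid q] = \sum_{s=1}^{l} r_{js} q_s. \tag{6}$$
The logarithmic likelihood function is then
$$\ell(q) = \sum_{j=1}^{k} x_j \ln \left( \sum_{s=1}^{l} r_{js} q_s \right). \tag{7}$$
The estimates $\hat{q}^{ML}$ are found by solving the nonlinear optimization problem
$$\hat{q}^{ML} \in \arg\max_{q} \; \sum_{j=1}^{k} x_j \ln \left( \sum_{s=1}^{l} r_{js} q_s \right) \quad \text{s.t.} \quad \sum_{s=1}^{l} q_s = 1, \quad q_s \geq 0, \; s = 1, \ldots, l. \tag{8}$$
Note that this is equivalent to minimizing the Kullback–Leibler divergence from $R q$ to $\hat{p}$
$$D_{KL} \left( \hat{p} \,\|\, R q \right) = \sum_{j=1}^{k} \hat{p}_j \ln \frac{\hat{p}_j}{\sum_{s=1}^{l} r_{js} q_s}, \tag{9}$$
where $\hat{p} = x / n$. The Kullback–Leibler divergence, also known as relative entropy, measures how one probability distribution differs from a second probability distribution.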
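One simple way to solve the ML problem numerically is the expectation-maximization (EM) iteration for mixture weights with known components, which keeps the estimate on the probability simplex automatically. The sketch below is an illustration under that swapped-in solver, not the optimization routine of the original study; the names `maximum_likelihood_estimate` and `kl_divergence` and the toy counts are assumptions.

```python
import math

def maximum_likelihood_estimate(x, R, iterations=2000):
    """EM iteration for segment probabilities q given basket counts x and matrix R.

    Assumes every basket type with x[j] > 0 has a positive probability
    in at least one customer segment.
    """
    k, l = len(R), len(R[0])
    n = sum(x)
    q = [1.0 / l] * l
    for _ in range(iterations):
        # Implied basket-type probabilities under the current q.
        mix = [sum(R[j][s] * q[s] for s in range(l)) for j in range(k)]
        # Expected share of transactions attributed to each segment.
        q = [q[s] * sum(x[j] * R[j][s] / mix[j] for j in range(k) if x[j] > 0) / n
             for s in range(l)]
    return q

def kl_divergence(p_hat, p_model):
    """Kullback-Leibler divergence with the convention 0 * ln(0 / .) = 0."""
    return sum(a * math.log(a / b) for a, b in zip(p_hat, p_model) if a > 0)

R = [[0.75, 0.0], [0.25, 0.25], [0.0, 0.75]]
x = [225, 250, 525]
q_ml = maximum_likelihood_estimate(x, R)
p_model = [sum(R[j][s] * q_ml[s] for s in range(2)) for j in range(3)]
divergence = kl_divergence([x_j / sum(x) for x_j in x], p_model)
```

In this toy example, the observed basket frequencies are reproduced exactly by the segment distribution (0.3, 0.7), so the divergence at the estimate is zero up to rounding.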
2.5 Maximum a Posteriori Estimation
If we have prior knowledge about the distribution of customers, e.g. an expert opinion or a previous study, we can adopt the Bayesian approach. For a review of the use of Bayesian methods in marketing, see Rossi and Allenby (2003).
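Let us assume the same setting as in Section 2.3. In Bayesian statistics, the maximum a posteriori (MAP) method maximizes the posterior probability density. Using Bayes' theorem, it is given by
$$\pi(q \mid b) = \frac{\mathrm{P}[B = b \mid q] \, \pi(q)}{\mathrm{P}[B = b]}, \tag{10}$$
where $\pi(q)$ is the prior probability density of random parameter $q$. Equivalently, we can maximize the function
$$g(q) = \ln \mathrm{P}[B = b \mid q] + \ln \pi(q), \tag{11}$$
which is the logarithm of the posterior probability density omitting the constant divisor. Note that the first term is the logarithmic likelihood function $\ell(q)$. As the prior distribution, we assume the Dirichlet distribution with density
$$\pi(q) = \frac{1}{\mathrm{B}(\alpha)} \prod_{s=1}^{l} q_s^{\alpha_s - 1}, \tag{12}$$
where $\alpha = (\alpha_1, \ldots, \alpha_l)'$ is the vector of concentration parameters and $\mathrm{B}(\cdot)$ denotes the multivariate beta function. The vector $\alpha$ captures our prior knowledge. This is the most common choice for the prior distribution of the categorical variable as it is its conjugate prior distribution. Finally, we have
$$g(q) = \sum_{j=1}^{k} x_j \ln \left( \sum_{s=1}^{l} r_{js} q_s \right) + \sum_{s=1}^{l} (\alpha_s - 1) \ln q_s - \ln \mathrm{B}(\alpha). \tag{13}$$
For a given $\alpha$, the estimates $\hat{q}^{MAP}$ are found by solving the nonlinear optimization problem
$$\hat{q}^{MAP} \in \arg\max_{q} \; \sum_{j=1}^{k} x_j \ln \left( \sum_{s=1}^{l} r_{js} q_s \right) + \sum_{s=1}^{l} (\alpha_s - 1) \ln q_s \quad \text{s.t.} \quad \sum_{s=1}^{l} q_s = 1, \quad q_s \geq 0, \; s = 1, \ldots, l. \tag{14}$$
This can be viewed as a regularized version of the maximum likelihood method with a priori knowledge of the distribution of parameters $q$.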
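The regularizing effect of the Dirichlet prior can be illustrated with a small Python sketch. It uses an EM-type iteration in which the prior acts like $\alpha_s - 1$ additional pseudo-transactions for each segment; this solver is a choice made for the illustration and requires all concentration parameters to be at least 1. The function name `map_estimate` and the toy values are assumptions.

```python
def map_estimate(x, R, alpha, iterations=2000):
    """EM-type iteration for the MAP estimate under a Dirichlet(alpha) prior."""
    if any(a < 1.0 for a in alpha):
        raise ValueError("this sketch requires all concentration parameters >= 1")
    k, l = len(R), len(R[0])
    total = sum(x) + sum(a - 1.0 for a in alpha)
    q = [1.0 / l] * l
    for _ in range(iterations):
        mix = [sum(R[j][s] * q[s] for s in range(l)) for j in range(k)]
        # Expected number of observed transactions attributed to each segment.
        expected = [q[s] * sum(x[j] * R[j][s] / mix[j] for j in range(k) if x[j] > 0)
                    for s in range(l)]
        # The prior contributes alpha_s - 1 pseudo-transactions to segment s.
        q = [(expected[s] + alpha[s] - 1.0) / total for s in range(l)]
    return q

R = [[0.75, 0.0], [0.25, 0.25], [0.0, 0.75]]
x = [225, 250, 525]
q_flat = map_estimate(x, R, alpha=[1.0, 1.0])        # flat prior
q_informed = map_estimate(x, R, alpha=[101.0, 1.0])  # prior favouring the first segment
```

With the flat prior, the estimate coincides with the maximum likelihood solution, while the informative prior pulls the first component upwards.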
2.6 Number of Customers
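Finally, we estimate the number of customers. Suppose that we have the vector $\hat{q} = (\hat{q}_1, \ldots, \hat{q}_l)'$ containing the estimated probabilities of the customer segments of a transaction. Let us assume that we also know the vector $f = (f_1, \ldots, f_l)'$ containing the average frequencies of visits by a customer in a given customer segment. We discuss how to obtain the vector $f$ in practice in Section 2.7. If we observe $n$ transactions, the number of transactions with customer segment $s$ is estimated as $\hat{y}_s = n \hat{q}_s$ for $s = 1, \ldots, l$. The average number of customers in customer segment $s$ is estimated as $\hat{N}_s = \hat{y}_s / f_s = n \hat{q}_s / f_s$ for $s = 1, \ldots, l$. The average total number of customers is estimated as
$$\hat{N} = \sum_{s=1}^{l} \hat{N}_s = n \sum_{s=1}^{l} \frac{\hat{q}_s}{f_s}. \tag{15}$$
The probability of a customer belonging to customer segment $s$ is then estimated as
$$\hat{w}_s = \frac{\hat{N}_s}{\hat{N}} = \frac{\hat{q}_s / f_s}{\sum_{t=1}^{l} \hat{q}_t / f_t}, \qquad s = 1, \ldots, l. \tag{16}$$
Note that $\hat{q}$ denotes the estimated distribution of the customer segments of a transaction while $\hat{w} = (\hat{w}_1, \ldots, \hat{w}_l)'$ denotes the estimated distribution of the customer segments of a customer.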
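The step from transaction shares to customer counts is plain arithmetic and can be sketched as follows. The function name `estimate_customers` and the numbers in the example are hypothetical.

```python
def estimate_customers(n, q_hat, f):
    """Convert segment shares of transactions into estimated customer counts.

    n      number of observed transactions
    q_hat  estimated probability of each customer segment for a transaction
    f      average number of visits per customer in each segment
    """
    per_segment = [n * q_s / f_s for q_s, f_s in zip(q_hat, f)]
    total = sum(per_segment)
    shares = [count / total for count in per_segment]
    return per_segment, total, shares

# 1000 transactions; segment 1 customers visit 6 times, segment 2 customers twice.
per_segment, total, shares = estimate_customers(1000, [0.3, 0.7], [6.0, 2.0])
```

In this example, 30% of transactions belong to the frequent segment but they correspond to only about 50 of roughly 400 customers, i.e. 12.5% of customers. This is why the distribution of segments over transactions and the distribution over customers must be kept apart.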
2.7 Observed Samples
Let us recall the assumptions of our proposed approach for the estimation of the average total number of customers $N$.

(a) The vector of basket types $b$ is observed.

(b) The conditional probability matrix $R$ is known and has linearly independent columns.

(c) The frequency vector $f$ is known.

The question is how $R$ and $f$ can be obtained in practice. We consider that we observe two samples of transactions. In the monitored sample, the transactions are linked to loyalty cards and we observe basket types denoted as $b^M$ and also customer segments denoted as $c^M$. The unmonitored sample contains transactions by customers without the loyalty card and we observe only basket types $b^U$. So far, we have focused on the unmonitored sample as the number of customers is known in the monitored sample. However, we can use transactions from the monitored sample to estimate the conditional probability matrix $R^M$ and the frequency vector $f^M$. This can be done straightforwardly as we have complete knowledge of customers in the monitored sample. We then assume that $R = R^M$ and $f = f^M$. In the context of the monitored and unmonitored samples, assumptions (b) and (c) can be reformulated in the following way.

(b) The conditional probability matrix $R$ is equal to the conditional probability matrix $R^M$ of the monitored sample and has linearly independent columns.

(c) The frequency vector $f$ is equal to the frequency vector $f^M$ of the monitored sample.

In other words, we assume that the behavior of customers in a given customer segment is the same with and without the loyalty card while the distribution of customer segments itself can differ. This setting is quite realistic as we discuss in the empirical study in Section 4.
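The frequency vector of the monitored sample can be computed directly from the transaction log. The sketch below assumes that each monitored transaction is available as a pair of a customer identifier and a zero-based segment code; this data layout and the function name `frequency_vector` are hypothetical.

```python
from collections import defaultdict

def frequency_vector(transactions, l):
    """Average number of visits per customer in each of the l segments.

    transactions: iterable of (customer_id, segment) pairs, one per receipt.
    Segments without any customer get NaN (an assumption of this sketch).
    """
    visits = [0] * l
    customers = defaultdict(set)
    for customer_id, segment in transactions:
        visits[segment] += 1
        customers[segment].add(customer_id)
    return [visits[s] / len(customers[s]) if customers[s] else float("nan")
            for s in range(l)]

monitored = [("a", 0), ("a", 0), ("b", 0), ("c", 1), ("c", 1), ("c", 1), ("c", 1)]
f_monitored = frequency_vector(monitored, l=2)
```

The resulting frequencies refer to the observation period of the log, so the monitored and unmonitored samples should cover the same time frame.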
3 Simulation Study
3.1 Setup for Simulations
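We investigate the finite-sample properties of the proposed approach using simulations. In the simulation study, we choose $k = 12$ and $l = 5$ to reflect the basket types and customer segments used in the empirical study in Section 4. In each simulation, we randomly generate the vector of customer segment probabilities of a transaction, the matrix of conditional probabilities and the vector of frequencies for the monitored and unmonitored samples. Therefore, we consider them to be random variables in the following text and denote them $Q^M$, $Q^U$, $R^M$, $R^U$, $F^M$ and $F^U$ respectively. Next, we describe the random generation in more detail.
The vector of customer segment probabilities of a transaction $Q^M$ in the monitored sample is generated from the Dirichlet distribution with concentration parameters equal to 1 for all customer segments. The vector $Q^U$ in the unmonitored sample is the weighted average of the vector $Q^M$ and a random variable $\tilde{Q}$ independently generated in the same way. Specifically,
$$Q^M \sim \mathrm{Dir}(1_l), \qquad \tilde{Q} \sim \mathrm{Dir}(1_l), \qquad Q^U = (1 - \omega_Q) Q^M + \omega_Q \tilde{Q}, \tag{17}$$
where $\omega_Q \in [0, 1]$ is the weight determining the change in the samples and $1_l$ denotes the vector of ones of length $l$.
The columns of the matrix of conditional probabilities $R^M$ in the monitored sample are generated from the Dirichlet distribution with concentration parameters equal to 1 for all basket types and customer segments. The columns of the matrix $R^U$ in the unmonitored sample are the weighted average of the columns of the matrix $R^M$ and random variables independently generated in the same way. Additionally, the first column in both $R^M$ and $R^U$ is the weighted average of its original value and the mean of the other columns. Specifically, for $A \in \{M, U\}$ and $s = 1, \ldots, l$,
$$\tilde{R}^M_{\cdot s} \sim \mathrm{Dir}(1_k), \qquad \tilde{R}_{\cdot s} \sim \mathrm{Dir}(1_k), \qquad \tilde{R}^U_{\cdot s} = (1 - \omega_R) \tilde{R}^M_{\cdot s} + \omega_R \tilde{R}_{\cdot s}, \qquad R^A_{\cdot 1} = (1 - \omega_D) \tilde{R}^A_{\cdot 1} + \frac{\omega_D}{l - 1} \sum_{t=2}^{l} \tilde{R}^A_{\cdot t}, \qquad R^A_{\cdot s} = \tilde{R}^A_{\cdot s} \text{ for } s \geq 2, \tag{18}$$
where $\omega_R \in [0, 1]$ is the weight determining the change in the samples, $\omega_D \in [0, 1]$ is the weight determining linear dependence in columns and $R^A_{\cdot s}$ denotes the $s$-th column of the matrix $R^A$.
The vector of frequencies $F^M$ in the monitored sample is generated from the Dirichlet distribution with concentration parameters equal to 2 for all customer segments and multiplied by 36. The average frequency is therefore equal to 3 as we have 12 basket types. The vector $F^U$ in the unmonitored sample is the weighted average of the vector $F^M$ and a random variable $\tilde{F}$ independently generated in the same way. Specifically,
$$F^M \sim 36 \cdot \mathrm{Dir}(2 \cdot 1_l), \qquad \tilde{F} \sim 36 \cdot \mathrm{Dir}(2 \cdot 1_l), \qquad F^U = (1 - \omega_F) F^M + \omega_F \tilde{F}, \tag{19}$$
where $\omega_F \in [0, 1]$ is the weight determining the change in the samples and $1_l$ denotes the vector of ones of length $l$.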
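The generation of a monitored and an unmonitored probability vector can be sketched with the Python standard library, using the fact that normalized independent gamma draws follow the Dirichlet distribution. The helper names `dirichlet` and `mix`, the seed and the chosen weight are assumptions made for the illustration.

```python
import random

def dirichlet(alphas, rng):
    """Draw from the Dirichlet distribution by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def mix(base, other, weight):
    """Weighted average: weight 0 keeps `base`, weight 1 replaces it by `other`."""
    return [(1 - weight) * b + weight * o for b, o in zip(base, other)]

rng = random.Random(0)  # fixed seed for reproducibility
l = 5                   # number of customer segments

q_monitored = dirichlet([1.0] * l, rng)
q_perturbation = dirichlet([1.0] * l, rng)
q_unmonitored = mix(q_monitored, q_perturbation, weight=0.25)
```

A weighted average of two probability vectors is again a probability vector, so no renormalization is needed. The same two helpers can be reused for the other randomly generated quantities.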
3.2 Evaluation Criterion
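Suppose that we run simulations $T$ times. Let $N^{(t)}$ denote the true number of customers and $\hat{N}^{(t)}$ its estimate in simulation $t$, $t = 1, \ldots, T$. For the evaluation of the accuracy of the estimates, we use the mean absolute percentage error (MAPE) given by
$$\mathrm{MAPE} = \frac{100}{T} \sum_{t=1}^{T} \left| \frac{\hat{N}^{(t)} - N^{(t)}}{N^{(t)}} \right|. \tag{20}$$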
The estimates with lower MAPE are preferred. McCarthy et al. (2006) and Fildes and Goodwin (2007) surveyed forecasting practices in management and found that the MAPE is the most commonly used accuracy measure.
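For concreteness, the evaluation criterion can be written as a short Python function. Expressing the result in percent is an assumption of this sketch, as are the function name `mape` and the example values.

```python
def mape(true_values, estimates):
    """Mean absolute percentage error in percent; true values must be nonzero."""
    errors = [abs((e - t) / t) for t, e in zip(true_values, estimates)]
    return 100.0 * sum(errors) / len(errors)

# Two simulation runs: estimates are off by 10% in each run.
example_mape = mape([100.0, 200.0], [110.0, 180.0])
```

Both runs miss the true number of customers by 10%, one from above and one from below, so the errors do not cancel out and the MAPE is about 10.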
3.3 Comparison of Methods
We compare the LS and ML estimators together with the naive estimator assuming the same distribution for customers in the monitored and unmonitored samples. We do not consider the Bayesian MAP estimator as we do not deal with any prior information in the simulation study. The simulation model with parameters $\omega_Q$, $\omega_F$, $\omega_R$ and $\omega_D$ allows us to comprehensively investigate the behavior of the estimation of the number of customers under various settings. In particular, we are able to simulate violation of the individual assumptions.
First, we focus on differences between the LS and ML estimators. We can see in Table 1 that estimation errors are very similar in both methods. In scenarios (i) and (ii) where all assumptions are fulfilled, the ML estimator has slightly lower error. In scenarios (v) and (viii) where the frequency vector changes and other assumptions are fulfilled, both estimators have the same errors. In scenarios (iii), (vi), (vii), (ix) and (x) where the conditional probability matrix changes, the LS estimator has lower error. The only exception is scenario (iv) in which the LS estimator is worse. In scenario (xi) where the conditional probability matrix has linearly dependent columns, the LS estimator has again lower error. Overall, the ML estimator is more precise in the theoretical case of ideal conditions. In practice, however, the LS estimator is more suitable as some degree of violation of the assumptions can always be expected. Therefore, we focus on the LS estimator in the rest of the paper.
Next, we focus on differences between the LS and naive estimators. In scenario (i) where the distribution of customers is the same for the monitored and unmonitored sample, the naive method outperforms the LS method. This case is, however, quite unrealistic. When we introduce some change in the customer distribution, the naive estimator becomes inconsistent while the LS method remains quite accurate. We can see this behavior in Figure 2. Further, we investigate the case of different distributions for the monitored and unmonitored samples. In Figure 3, we change the frequency vector. Both methods react very negatively to this change. The LS estimator, however, always has a lower error. In Figures 4 and 5, we modify the conditional probability matrix. This does not influence the naive estimator as it is based only on the customer probability vector and the frequency vector. The LS estimator is a better choice even for quite a high change in the conditional probability matrix. When the matrix changes completely, however, the naive estimator has a lower error. In the case of a matrix with numerically or even algebraically linearly dependent columns, the error of the LS approach increases. However, the LS estimator is still preferred over the naive estimator. Table 1 offers a view of the situation when multiple assumptions are violated and leads to the same conclusions. In scenarios (iv), (vii) and (x) where the conditional probability matrix completely changes, the naive estimator has a lower error. In other cases, i.e. scenarios (ii), (iii), (v), (vi), (viii), (ix) and (xi), the LS estimator outperforms the naive approach. Overall, the LS estimator is a much better choice than the naive estimator unless its assumptions are completely violated.
Scenario  $\omega_Q$  $\omega_F$  $\omega_R$  $\omega_D$  MAPE (Naive)  MAPE (LS)  MAPE (ML)

(i)  0.00  0.00  0.00  0.00  0.03  0.14  0.13 
(ii)  1.00  0.00  0.00  0.00  34.58  0.14  0.13 
(iii)  1.00  0.00  0.25  0.00  34.58  11.11  11.22 
(iv)  1.00  0.00  1.00  0.00  34.58  36.20  36.12 
(v)  1.00  0.25  0.00  0.00  51.21  32.41  32.41 
(vi)  1.00  0.25  0.25  0.00  51.21  35.69  35.77 
(vii)  1.00  0.25  1.00  0.00  51.21  53.46  53.48 
(viii)  1.00  1.00  0.00  0.00  58.53  58.52  58.52 
(ix)  1.00  1.00  0.25  0.00  58.53  56.96  56.99 
(x)  1.00  1.00  1.00  0.00  58.53  60.66  60.72 
(xi)  1.00  0.00  0.00  1.00  34.58  10.82  10.86 
4 Empirical Study
4.1 Motivation
The loyalty program of the studied Czech retail chain is well-known in the Czech Republic as it is frequently advertised not only in stores but also in commercials on TV, on the radio and in newspapers. The loyalty program brings a lot of benefits to its members, such as extra discounts or gifts for buying specific products. Moreover, customers can collect bonus points for spending and then exchange them for a 10% discount on the whole basket. These loyalty perks have resulted in high popularity of the loyalty program among the customers and nowadays the majority of receipts are linked to a specific customer with the loyalty card.
The knowledge of the number of customers who are not members of the loyalty program is crucial for various marketing strategies. The management of the company believes that the loyalty program may already be saturated and that the vast majority of customers who are willing to join the loyalty program are already members. In this study, we focus on quantification of this hypothesis.
Our monitored sample consists of the loyalty club members and our unmonitored sample consists of customers without the loyalty card. The goal is to reliably estimate the number of unique customers in the unmonitored sample. We do not have any prior knowledge about the distribution of customers and therefore we cannot use the MAP estimator. We use the LS method in the empirical study as, according to the simulation study, it gives slightly better results than the ML estimator when the assumptions are violated to some degree.
4.2 Transaction Data
We use the proposed method on a sample of real transaction data. Our dataset consists of individual purchase data of one of the retail chains in the drugstore market of the Czech Republic. The retail chain sells over 10 thousand products which are divided into 55 categories. This categorization is based on expert opinion and reflects the product properties and purpose. Each product is also assigned to one of three price levels: low-end, standard and high-end. We use data from the table of all transactions during a three-month time frame between September and November 2018. The dataset includes every single product sold. It also includes identification of the basket and in some cases a link to a specific customer. We remove monitored customers with extremely high frequency (more than 15 visits per month) from the dataset as they are a special type of customers and we believe that no customer without the loyalty card exhibits this behavior. The cleaned dataset consists of over 7 million receipts, with 69% linked to a specific customer through their loyalty card.
4.3 Basket Types
In order to get a meaningful customer segmentation, we use a two-phase method involving segmentation of both baskets and customers. In the first phase, we define basket types and in the second phase, we define customer segments based on the shopping history of customers.
Basket types are determined using three dimensions: the value of the basket, the price level of purchased products and the diversity of the basket. We distinguish baskets of small and large value. We also single out baskets with at least one product for children such as child wipes or diapers. The price-level segmentation of baskets has two levels. If at least half of the revenue of the basket is made by high-end products, the basket is marked as high-end, otherwise the basket is marked as standard. The last dimension is the diversity of the basket. A basket with products in only one category is marked as focused and a basket containing at least two different categories is marked as general. The combination of these three dimensions allows us to distinguish 12 different types of baskets. The description of individual basket types is presented in Table 2.
Type  Value  Price Level  Diversity

B01  Small value  Standard  Focused
B02  Small value  Standard  General
B03  Small value  High-end  Focused
B04  Small value  High-end  General
B05  Large value  Standard  Focused
B06  Large value  Standard  General
B07  Large value  High-end  Focused
B08  Large value  High-end  General
B09  Products for children  Standard  Focused
B10  Products for children  Standard  General
B11  Products for children  High-end  Focused
B12  Products for children  High-end  General
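The three dimensions can be turned into a simple classification rule. The following Python sketch is only an illustration: the representation of a basket as a list of (category, price level, revenue, for-children flag) tuples, the function name `basket_type` and especially the threshold `large_value_threshold` separating small and large baskets are hypothetical, as the cut-off value is not stated in the text.

```python
def basket_type(items, large_value_threshold):
    """Assign a basket to one of the types B01..B12.

    items: list of (category, price_level, revenue, for_children) tuples.
    large_value_threshold: hypothetical cut-off between small and large baskets.
    """
    total = sum(revenue for _, _, revenue, _ in items)
    high_end = sum(revenue for _, level, revenue, _ in items if level == "high-end")
    categories = {category for category, _, _, _ in items}

    if any(for_children for _, _, _, for_children in items):
        value_group = 2   # at least one product for children
    elif total >= large_value_threshold:
        value_group = 1   # large value
    else:
        value_group = 0   # small value

    is_high_end = 2 * high_end >= total   # at least half of the revenue is high-end
    is_general = len(categories) >= 2     # at least two different categories
    index = 4 * value_group + 2 * is_high_end + is_general + 1
    return f"B{index:02d}"

children_basket = [("diapers", "standard", 300.0, True),
                   ("shampoo", "high-end", 100.0, False)]
perfume_basket = [("perfume", "high-end", 900.0, False)]
type_children = basket_type(children_basket, large_value_threshold=500.0)
type_perfume = basket_type(perfume_basket, large_value_threshold=500.0)
```

The first basket contains a product for children, is dominated by standard products and spans two categories, so it falls into type B10. The second is a large, high-end, single-category basket of type B07.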
4.4 Customer Segments
We use the purchase history of customers to assign each member of the loyalty program to a single customer segment. This is done by applying predetermined rules to the frequency of basket types in the purchase history. We distinguish general customers, suppliers, customers with children, standard accidental customers and accidental customers focused on high-end products. A brief description of each segment is presented in Table 3.
The largest group among customers with the loyalty card are general customers. This is the archetype of a customer who exhibits standard behavior and moderate shopping frequency. Another large segment of customers are the suppliers. These are the customers who visit the store with moderate frequency and whose basket value is very high. The customers who buy products for small children are among the most valuable ones as their frequency is very high and their revenue is substantially above-average. However, margins of their most frequently purchased products are below standard. For this reason, we distinguish them as a special segment as the goal is not only to maximize their revenue but also to redirect them to products with higher margins. Two segments of accidental customers represent customers who visit the store rarely and focus mainly on small baskets. We distinguish two types of accidental customers by the ratio of high-end products in their basket. We show the distribution of basket types in each customer segment in Figure 6. Besides basket distribution, the customer segments also differ in shopping frequency as described in Table 3.
Segment  Name  Focus  Frequency 

C01  General  Standard behavior  Moderate 
C02  Supplier  Large basket  Moderate 
C03  With children  Children basket  High 
C04  Accidental standard  Small basket  Low 
C05  Accidental high-end  Small high-end basket  Low
4.5 Distribution of Customers
It is reasonable to assume that the distribution of members and nonmembers of the loyalty program is substantially different and our case study of the drugstore retail chain supports it. While general customers together with suppliers form 81.4% of the loyalty program members, they form only 31.5% of nonmembers. Similarly, customers with children make up 7.5% of customers with the loyalty card and only 2.8% of customers without the loyalty card. The exact opposite situation occurs with the share of accidental customers. Accidental standard customers make up 9.9% of members and 51.5% of nonmembers while accidental high-end customers make up 1.2% of members and 14.2% of nonmembers. The comparison between distributions of members and nonmembers of the loyalty program in each customer segment is shown in Figure 7.
The results support the initial hypothesis that the loyalty program is already saturated and customers who are willing to join the loyalty program are already members. The main result is the estimated distribution of the customer segments for customers without the loyalty card. Accidental customers with low frequency of visits form 65.7% of unique customers who are not members of the loyalty program. On the other hand, general customers and suppliers are underrepresented in comparison to members of the loyalty program.
The differences in customer segment distributions between members and nonmembers of the loyalty program are crucial for estimation of the number of unique nonmember customers. The number estimated by the naive approach assuming the same distribution of members and nonmembers is 22.1% lower than the number estimated by the proposed method which takes the differences into account. The naive approach estimates 739 883 customers while the proposed method estimates 949 024 customers without the loyalty card. This result is in line with expectations of the management of the company.
4.6 Loyalty Program Penetration
Using our estimates, we are able to study the penetration of the loyalty program in each customer segment. The loyalty program penetration is shown in Figure 8. General customers are members of the loyalty program in 81.9% of cases and suppliers in 81.0% of cases. These relatively high participation rates support the hypothesis that the loyalty program is close to the saturation point in these segments. For customers with children, we have an 81.8% participation rate, which is probably caused by the recent focus of the company on attracting customers with children into the loyalty program by additional club discounts on products for children and other promotional activities. On the other hand, accidental customers are highly unlikely to participate in the loyalty program as accidental standard customers are members in only 24.7% of cases. The most interesting result is that only 12.5% of accidental high-end customers are members. Customers who buy high-end products are very valuable for the company as high-end products have significantly higher margins than their cheaper alternatives. Therefore, it is crucial for the company to attract them and bind them to the company's brand. Our study signals a big potential for the company in this specific area.
5 Conclusion
We propose a method for estimation of the total number of unique customers using retail transaction data. The method also estimates the number of unique customers in each customer segment. We verify the proposed method using synthetic data in an extensive simulation study. The method performs quite well even if its assumptions are violated to some degree. Therefore, we expect the method to give reliable results in real applications.
In the empirical part, we use retail transaction data from a Czech drugstore’s retail chain. We quantify the initial proposition that the loyalty program is popular among regular customers and rather unpopular among accidental customers. This is something which has been done only by expert’s opinion before and therefore the estimates were contaminated by high uncertainty. The reliable quantification is the main contribution of the method. We also identify a segment of accidental highend customers which contains only a few members of the loyalty program and can bring additional profit to the company when properly addressed. Overall, the resulting estimates are crucial not only for planning of promotional sales and other marketing techniques but may also be used in predictions of demand for specific types of products.
Acknowledgements
We would like to thank Michal Černý for his useful comments. We would also like to thank the participants of the 7th International Conference on Management, Nový Smokovec, September 26–29, 2018 and the 7th International Business and Management Sciences Congress, Istanbul, March 14–15, 2019 for fruitful discussions.
Funding
This work was supported by the Czech Science Foundation under Grant 19-02773S and by the Internal Grant Agency of the University of Economics, Prague under Grant F4/58/2017.
References
 Abe (2009) Abe, M. 2009. "Counting Your Customers" One by One: A Hierarchical Bayes Extension to the Pareto/NBD Model. Marketing Science. Volume 28. Issue 3. Pages 541–553. ISSN 07322399. {https://doi.org/10.1287/mksc.1090.0502}.
 Ammar et al. (2016) Ammar, A., Elouedi, Z., Lingras, P. 2016. Meta-Clustering of Possibilistically Segmented Retail Datasets. Fuzzy Sets and Systems. Volume 286. Pages 173–196. ISSN 01650114. {https://doi.org/10.1016/j.fss.2015.07.019}.
 Batislam et al. (2007) Batislam, E. P., Denizel, M., Filiztekin, A. 2007. Empirical Validation and Comparison of Models for Customer Base Analysis. International Journal of Research in Marketing. Volume 24. Issue 3. Pages 201–209. ISSN 01678116. {https://doi.org/10.1016/j.ijresmar.2006.12.005}.
 Bradlow et al. (2017) Bradlow, E. T., Gangwar, M., Kopalle, P., Voleti, S. 2017. The Role of Big Data and Predictive Analytics in Retailing. Journal of Retailing. Volume 93. Issue 1. Pages 79–95. ISSN 00224359. {https://doi.org/10.1016/j.jretai.2016.12.004}.
 Decker and Monien (2003) Decker, R., Monien, K. 2003. Market Basket Analysis with Neural Gas Networks and Self-Organising Maps. Journal of Targeting, Measurement and Analysis for Marketing. Volume 11. Issue 4. Pages 373–386. ISSN 09673237. {https://doi.org/10.1057/palgrave.jt.5740092}.
 Fader et al. (2005) Fader, P. S., Hardie, B. G. S., Lee, K. L. 2005. "Counting Your Customers" the Easy Way: An Alternative to the Pareto/NBD Model. Marketing Science. Volume 24. Issue 2. Pages 275–284. ISSN 07322399. {https://doi.org/10.1287/mksc.1040.0098}.
 Fildes and Goodwin (2007) Fildes, R., Goodwin, P. 2007. Against Your Better Judgment? How Organizations Can Improve Their Use of Management Judgment in Forecasting. Interfaces. Volume 37. Issue 6. Pages 570–576. ISSN 00922102. {https://doi.org/10.1287/inte.1070.0309}.
 France and Ghose (2019) France, S. L., Ghose, S. 2019. Marketing Analytics: Methods, Practice, Implementation, and Links to Other Fields. Expert Systems with Applications. Volume 119. Pages 456–475. ISSN 09574174. {https://doi.org/10.1016/j.eswa.2018.11.002}.
 Germann et al. (2014) Germann, F., Lilien, G. L., Fiedler, L., Kraus, M. 2014. Do Retailers Benefit from Deploying Customer Analytics? Journal of Retailing. Volume 90. Issue 4. Pages 587–593. ISSN 00224359. {https://doi.org/10.1016/j.jretai.2014.08.002}.
 Glady et al. (2009) Glady, N., Baesens, B., Croux, C. 2009. A Modified Pareto/NBD Approach for Predicting Customer Lifetime Value. Expert Systems with Applications. Volume 36. Issue 2. Pages 2062–2071. ISSN 09574174. {https://doi.org/10.1016/j.eswa.2007.12.049}.
 Grewal et al. (2017) Grewal, D., Roggeveen, A. L., Nordfält, J. 2017. The Future of Retailing. Journal of Retailing. Volume 93. Issue 1. Pages 1–6. ISSN 00224359. {https://doi.org/10.1016/j.jretai.2016.12.008}.
 Griva et al. (2018) Griva, A., Bardaki, C., Pramatari, K., Papakiriakopoulos, D. 2018. Retail Business Analytics: Customer Visit Segmentation Using Market Basket Data. Expert Systems with Applications. Volume 100. Pages 1–16. ISSN 09574174. {https://doi.org/10.1016/j.eswa.2018.01.029}.
 Holý et al. (2017) Holý, V., Sokol, O., Černý, M. 2017. Clustering Retail Products Based on Customer Behaviour. Applied Soft Computing. Volume 60. Pages 752–762. ISSN 15684946. {https://doi.org/10.1016/j.asoc.2017.02.004}.
 Jerath et al. (2011) Jerath, K., Fader, P. S., Hardie, B. G. S. 2011. New Perspectives on Customer "Death" Using a Generalization of the Pareto/NBD Model. Marketing Science. Volume 30. Issue 5. Pages 866–880. ISSN 07322399. {https://doi.org/10.1287/mksc.1110.0654}.
 Konuş et al. (2008) Konuş, U., Verhoef, P. C., Neslin, S. A. 2008. Multichannel Shopper Segments and Their Covariates. Journal of Retailing. Volume 84. Issue 4. Pages 398–413. ISSN 00224359. {https://doi.org/10.1016/j.jretai.2008.09.002}.
 Kullback (1959) Kullback, S. 1959. Information Theory and Statistics. Hoboken. Wiley. ISBN 9780844656250. {https://books.google.com/books?id=Yq4AAAAIAAJ}.
 Kumar et al. (2017) Kumar, V., Anand, A., Song, H. 2017. Future of Retailer Profitability: An Organizing Framework. Journal of Retailing. Volume 93. Issue 1. Pages 96–119. ISSN 00224359. {https://doi.org/10.1016/j.jretai.2016.11.003}.
 Lilien et al. (2013) Lilien, G. L., Roberts, J. H., Shankar, V. 2013. Effective Marketing Science Applications: Insights from ISMS-MSI Practice Prize Finalist Papers and Projects. Marketing Science. Volume 32. Issue 2. Pages 229–245. ISSN 07322399. {https://doi.org/10.1287/mksc.1120.0756}.
 Lingras et al. (2014) Lingras, P., Elagamy, A., Ammar, A., Elouedi, Z. 2014. Iterative Meta-Clustering Through Granular Hierarchy of Supermarket Customers and Products. Information Sciences. Volume 257. Pages 14–31. ISSN 00200255. {https://doi.org/10.1016/j.ins.2013.09.018}.
 Ma and Büschken (2011) Ma, S., Büschken, J. 2011. Counting Your Customers from an "Always a Share" Perspective. Marketing Letters. Volume 22. Issue 3. Pages 243–257. ISSN 09230645. {https://doi.org/10.1007/s1100201091230}.
 McCarthy et al. (2006) McCarthy, T. M., Davis, D. F., Golicic, S. L., Mentzer, J. T. 2006. The Evolution of Sales Forecasting Management: A 20-Year Longitudinal Study of Forecasting Practices. Journal of Forecasting. Volume 25. Issue 5. Pages 303–324. ISSN 02776693. {https://doi.org/10.1002/for.989}.
 Mzoughia et al. (2018) Mzoughia, M. B., Borle, S., Limam, M. 2018. A MCMC Approach for Modeling Customer Lifetime Behavior Using the COM-Poisson Distribution. Applied Stochastic Models in Business and Industry. Volume 34. Issue 2. Pages 113–127. ISSN 15241904. {https://doi.org/10.1002/asmb.2276}.
 Peker et al. (2017) Peker, S., Kocyigit, A., Eren, P. E. 2017. LRFMP Model for Customer Segmentation in the Grocery Retail Industry: A Case Study. Marketing Intelligence & Planning. Volume 35. Issue 4. Pages 544–559. ISSN 02634503. {https://doi.org/10.1108/mip1120160210}.
 Putra et al. (2012) Putra, I. K. G. D., Cahyawan, A. A. K. A., Shavitri H., D. 2012. Combination of Adaptive Resonance Theory 2 and RFM Model for Customer Segmentation in Retail Company. International Journal of Computer Applications. Volume 48. Issue 2. Pages 18–23. ISSN 09758887. {https://doi.org/10.5120/73200110}.
 Reutterer et al. (2006) Reutterer, T., Mild, A., Natter, M., Taudes, A. 2006. A Dynamic Segmentation Approach for Targeting and Customizing Direct Marketing Campaigns. Journal of Interactive Marketing. Volume 20. Issue 3–4. Pages 43–57. ISSN 10949968. {https://doi.org/10.1002/dir.20066}.
 Roberts et al. (2014) Roberts, J. H., Kayande, U., Stremersch, S. 2014. From Academic Research to Marketing Practice: Exploring the Marketing Science Value Chain. International Journal of Research in Marketing. Volume 31. Issue 2. Pages 127–140. ISSN 01678116. {https://doi.org/10.1016/j.ijresmar.2013.07.006}.
 Rossi and Allenby (2003) Rossi, P. E., Allenby, G. M. 2003. Bayesian Statistics and Marketing. Marketing Science. Volume 22. Issue 3. Pages 304–328. ISSN 07322399. {https://doi.org/10.1287/mksc.22.3.304.17739}.
 Sarantopoulos et al. (2016) Sarantopoulos, P., Theotokis, A., Pramatari, K., Doukidis, G. 2016. Shopping Missions: An Analytical Method for the Identification of Shopper Need States. Journal of Business Research. Volume 69. Issue 3. Pages 1043–1052. ISSN 01482963. {https://doi.org/10.1016/j.jbusres.2015.08.017}.
 Schmittlein et al. (1987) Schmittlein, D. C., Morrison, D. G., Colombo, R. 1987. Counting Your Customers: Who Are They and What Will They Do Next? Management Science. Volume 33. Issue 1. Pages 1–24. ISSN 00251909. {https://doi.org/10.1287/mnsc.33.1.1}.
 Schumacher (2006) Schumacher, N. 2006. The Butterfly Effect: Estimating "Faux-New" Customers. Journal of Consumer Marketing. Volume 23. Issue 1. Pages 43–46. ISSN 07363761. {https://doi.org/10.1108/07363760610641154}.
 Sokol (2018) Sokol, O. 2018. Estimating the Number of Customers Using Market Basket Data. In Proceedings of the 7th International Conference on Management. Prešov. Bookman s.r.o. Pages 763–767. ISBN 9788081653018. {http://www.managerconf.com/}.
 Sokol and Holý (2018) Sokol, O., Holý, V. 2018. Customer Segmentation Based on a Shopping Mission in the Retail Business. Working Paper.
 Tsai and Chiu (2004) Tsai, C. Y., Chiu, C. C. 2004. A Purchase-Based Market Segmentation Methodology. Expert Systems with Applications. Volume 27. Issue 2. Pages 265–276. ISSN 09574174. {https://doi.org/10.1016/j.eswa.2004.02.005}.
 Zhang et al. (2007) Zhang, Y., Jiao, J., Ma, Y. 2007. Market Segmentation for Product Family Positioning Based on Fuzzy Clustering. Journal of Engineering Design. Volume 18. Issue 3. Pages 227–241. ISSN 09544828. {https://doi.org/10.1080/09544820600752781}.