For some time, economists have paid little attention to questions of inequality and wealth distribution. The Nobel laureate Robert Lucas, e.g., declared [1]: “Of the tendencies that are harmful to sound economics, the most seductive, and in my opinion, the most poisonous, is to focus on questions of distribution.” According to Robert H. Wade [2], this attitude was due to the dominating theory of trickle-down economics, which celebrates inequality as an incentive for effort and creativity.
The “publishing sensation” (Wade) of Piketty’s “Capital in the Twenty-First Century” [3] has brought inequality back into the focus of mainstream economists. Piketty did not examine why inequality matters (see [4] or [5] for this topic), but how it evolved over time. Most of his data stemmed from tax records, which go back to the beginning of the 19th century in some cases. While tax data have the advantage that their collection is obligatory, they also have shortcomings. For about thirty years, governments in most OECD countries have followed the consensus among mainstream economists that taxes on the rich and aid to the poor tend to be inversely correlated with economic growth. This has resulted in tax cuts for the rich and the elimination of some types of tax: the wealth tax, e.g., was dropped in Germany in 1997, which has the side effect that some information about wealth can no longer be drawn from tax data. Another problem with tax records is bias at the top tail because higher income and wealth are more likely to be hidden in tax havens.
Alternative data sources are surveys like the Eurosystem Household Finance and Consumption Survey (HFCS) [6]. Although the HFCS combines comprehensive information in a unified form across several European countries, it suffers from bias at the top tail, too. One reason for this bias is the random sampling process, which is very unlikely to capture the rare values from the top tail of the distribution. Another reason is that the survey response rate is known to be lower for higher income and wealth [7]. These effects make it necessary to correct for the missing rich in some way.
To this end, Vermeulen [8] proposed to replace the survey data for high wealth values with a parametric model based on the Pareto distribution, i.e., a probability distribution with density $p(w) \propto w^{-(\alpha+1)}$. For parameter estimation, he suggested combining the survey data with rich lists like the Forbes World’s billionaires list. Bach et al. [9] applied this idea to the HFCS data in combination with national rich lists for different European countries. Eckerstorfer et al. [10] relied only on the HFCS data, from which they extrapolated the top tail of the wealth distribution with well-defined criteria for the choice of some model parameters, which had been chosen in an ad-hoc manner by Vermeulen and Bach et al. All authors used these methods to estimate the wealth share of the richest one percent, which they found to increase with respect to the raw HFCS data when the correction for the missing rich is applied. The actual values for this share were estimated at 33% for Germany [9] and 38% for Austria [10].
This article provides a comprehensive summary of the statistical model and how it is used to estimate the wealth share of the top one percent. It is organized as follows: section 2 describes the underlying HFCS data and the rich list for Germany. Section 3 describes the statistical model and how it is utilized for computing the percentile wealth share. Section 4 gives a survey of methods for estimating the model parameters. It should be noted that criteria for determining some of these parameters have not yet been discussed in the literature. The results of these methods when applied to the data for Germany are presented in section 5. The R code written for the present study can be downloaded from the author’s website (http://informatik.hsnr.de/~dalitz/data/wealthshare/).
2 Data base
This study is based on two data sources: the Eurosystem Household Finance and Consumption Survey (HFCS) 2012, and the Manager Magazin rich list from 2012 with extended information collected by the Deutsches Institut für Wirtschaftsforschung (DIW).
The HFCS was performed between 2008 and 2011 by the national banks in the Eurosystem countries Belgium, Germany, Spain, France, Italy, Greece, Cyprus, Luxembourg, Malta, Netherlands, Austria, Portugal, Slovenia, Slovakia, and Finland. In each country, a questionnaire was sent to sample households on the basis of the sampling criteria described in [6]. From all the data collected, the present study only uses the net wealth (variable DN3001) for Germany (variable SA0100 = DE). To compensate for sampling errors, two counter-measures were taken by the national banks:
- As rich households are known to have a lower response rate, rich households were over-sampled on the basis of geographical area.
- To each response, a household weight (variable HW0010) was assigned that estimates the number of households that this particular household represents. The rough idea of the weighting process, but no details, is described in [6].
There is some controversy about the situations in which the weights should be used. Bach et al. [9] heavily relied on the weights because they used linear regression for estimating the Pareto parameter. Eckerstorfer et al. [10], however, ignored the weights in some calculations to “limit the influence of [..] unknown implicit assumptions.” The present study uses weights throughout.
To deal with item non-response, the HFCS data also provides imputed values for missing variable values. This only affects the variable net wealth, but not the household weights. For each missing value, five imputed values are provided, such that there are five different complete data sets. These can be used in two different ways. One is to average the variables for each household over the five sets, a method used by Bach et al. in [9]. The result is called the averaged HFCS data in this paper. The other method is to take each imputed data set separately to eventually obtain a range for the observables of interest, a method used by Eckerstorfer et al. in [10].
The Manager Magazin annually publishes a list of the 500 richest families in Germany. Their net wealth is estimated from different sources, which include information provided by the persons themselves. The editors of the list indicate that the list is incomplete because some persons have asked for removal from the list. To make the information based on families compatible with the HFCS data based on households, Bach et al. have collected information from public sources about the number of households for each family. Moreover, they have identified non-residents on the list and recommend using only the top 200 entries from this list. The present study uses these data as provided by Bach et al., but with the non-residents removed, and down to the wealth threshold of the 200th entry, which is 500 Mio €, so that the list used here comprises 206 entries.
From the large gap between the highest HFCS reported wealth (76 Mio €) and the lowest value in the Manager Magazin list, it can be concluded that oversampling and weights cannot fully compensate for top-tail bias, and that a compensation for the missing rich is necessary.
3 Statistical model
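The wealth distribution can be represented by a probability density $p(w)$, such that $p(w)\,dw$ is the fraction of households having wealth in $[w, w+dw]$. When this distribution is known, all interesting observables can be computed therefrom. The mean wealth $\bar{w}$, e.g., is given by

$$\bar{w} = \int_{-\infty}^{\infty} w\, p(w)\, dw \tag{1}$$

and the total wealth is $N\bar{w}$, where $N$ is the total number of households. The percentile $w_q$, i.e. the wealth value for which $q$ percent of the households are richer, is implicitly defined by

$$\int_{w_q}^{\infty} p(w)\, dw = \frac{q}{100} \tag{2}$$

and from this value, the wealth share $s_q$ of the richest $q$ percent is given by

$$s_q = \frac{1}{\bar{w}} \int_{w_q}^{\infty} w\, p(w)\, dw \tag{3}$$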
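For a weighted sample without any tail correction, the integrals in Eqs. (1) to (3) reduce to weighted sums. The following Python sketch illustrates this. The study itself provides R code; the function name `top_share` and the toy data are illustrative assumptions and not part of the original analysis.

```python
def top_share(wealth, weights, q):
    """Wealth share of the richest q percent of a weighted sample.

    Households are accumulated from the richest downwards until q percent
    of the total weight is reached; the last household value is counted
    with the fraction of its weight that is still needed.
    """
    total_weight = sum(weights)
    total_wealth = sum(w * n for w, n in zip(wealth, weights))
    remaining = total_weight * q / 100.0   # weight still to be collected
    top_wealth = 0.0
    for w, n in sorted(zip(wealth, weights), reverse=True):
        take = min(n, remaining)
        top_wealth += take * w
        remaining -= take
        if remaining <= 0:
            break
    return top_wealth / total_wealth


# toy data (hypothetical): five wealth values with household weights
wealth = [10.0, 20.0, 30.0, 40.0, 900.0]
weights = [30, 30, 20, 19, 1]
print(top_share(wealth, weights, 1))
```

Such a raw estimate is only as good as the sample, which is why the remainder of this section replaces the sparsely sampled top tail with a parametric model.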
3.1 Density estimation
The density $p(w)$ must be estimated from the HFCS data or the rich list, which both provide lists of wealth values $w_i$ and corresponding weights $n_i$, such that $w_i$ is the wealth of the $i$-th sample household and $n_i$ is the number of households that this value represents. The approach in [8, 9, 10] is to approximate $p(w)$ below some threshold $w_0$ with a histogram having cells centered at $w_i$ and to use a parametric model based on the Pareto distribution for $w > w_0$. This results in the following histogram density estimator for $w \le w_0$:

$$p(w) = \frac{n_i}{N\, \Delta w_i} \tag{4}$$

for each histogram cell of width $\Delta w_i$ centered around $w_i$, i.e., $w \in [w_i - \Delta w_i/2,\; w_i + \Delta w_i/2)$. Here, $N$ is the total sum over all weights, i.e., $N = \sum_i n_i$. For $w > w_0$, the density is approximated with a Pareto distribution:

$$p(w) = \frac{C}{w^{\alpha+1}} \tag{5}$$

As can be seen in Fig. 1, the histogram obtained from the HFCS data is very noisy in the transition region. A smoother estimate for $p(w)$ can be obtained with a weighted kernel density estimator

$$p(w) = \frac{1}{N h} \sum_i n_i\, K\!\left(\frac{w - w_i}{h}\right) \tag{6}$$

where $K$ is the kernel function and $h$ is a bandwidth. This approximation, however, introduces yet another parameter and makes a numeric integration necessary for computing the integrals in Eq. (3).
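A weighted kernel density estimator of this kind can be sketched in a few lines of Python. The Gaussian kernel, the function name `weighted_kde` and the argument names are assumptions made for illustration; the text does not state which kernel was used.

```python
import math


def weighted_kde(x, wealth, weights, h):
    """Weighted kernel density estimate at point x.

    Each sample value contributes a Gaussian bump of width h (an assumed
    kernel choice) that is scaled by its household weight.
    """
    total = sum(weights)
    s = 0.0
    for w, n in zip(wealth, weights):
        u = (x - w) / h
        s += n * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return s / (total * h)
```

Because every bump integrates to one and the weights are divided by their sum, the estimate integrates to one for any bandwidth; the bandwidth only controls how strongly the noise is smoothed.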
3.2 Percentile share computation
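With the approximations (4) and (5), the percentile $w_q$ defined by Eq. (2) is implicitly given by

$$\frac{1}{N} \sum_{w_q \le w_i \le w_0} n_i + \frac{C}{\alpha\, w_0^{\alpha}} = \frac{q}{100} \tag{7}$$

if the resulting $w_q$ is less than $w_0$. For $w_q \ge w_0$, it is given by

$$w_q = \left( \frac{100\, C}{\alpha\, q} \right)^{1/\alpha} \tag{8}$$

The resulting wealth share (3) is for $w_q < w_0$:

$$s_q = \frac{1}{\bar{w}} \left( \frac{1}{N} \sum_{w_q \le w_i \le w_0} n_i w_i + \frac{C}{\alpha - 1}\, w_0^{1-\alpha} \right) \tag{9}$$

and for $w_q \ge w_0$:

$$s_q = \frac{1}{\bar{w}} \cdot \frac{C}{\alpha - 1}\, w_q^{1-\alpha} \tag{10}$$

where the mean wealth is $\bar{w} = \frac{1}{N} \sum_{w_i \le w_0} n_i w_i + \frac{C}{\alpha - 1}\, w_0^{1-\alpha}$.
It should be noted that the integrals over the Pareto distribution have been evaluated analytically to obtain these formulas. Bach et al. [9] have instead “imputed” wealth values for $w > w_0$ with weights drawn from the Pareto distribution. This leads to the same results, because this imputation is basically a numeric integration.
Eckerstorfer et al. observed that the upper limit in the integrals in (3) actually is not infinity, but some finite value $w_{max}$. In this case, the terms $w_0^{1-\alpha}$ in (9) and $w_q^{1-\alpha}$ in (10) have to be replaced with $w_0^{1-\alpha} - w_{max}^{1-\alpha}$ and $w_q^{1-\alpha} - w_{max}^{1-\alpha}$, respectively. The upper limit can be estimated from a rich list as the highest value plus half the distance to the second highest value. For Germany, the value of $w_{max}$ obtained from the Manager Magazin rich list is so large that a truncation at this value only affects the fourth decimal place.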
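The analytic evaluation of the Pareto integrals can be sketched as follows. To stay independent of the notation for the normalization constant, this Python sketch is parametrized by the number of households assigned to the Pareto tail, and it uses two properties of the Pareto distribution: the number of households above a wealth value scales with that value to the power of minus alpha, and the mean wealth above a value is alpha/(alpha-1) times that value. The function name, the argument names and the toy data are assumptions for illustration.

```python
def top_share_with_pareto_tail(wealth, weights, w0, alpha, tail_households, q):
    """Return (percentile, share) for survey data below w0 and a Pareto tail above.

    tail_households is the number of households assigned to the tail
    (wealth > w0); it depends on the chosen normalization.
    Requires alpha > 1, otherwise the tail has no finite mean.
    """
    below = sorted(((w, n) for w, n in zip(wealth, weights) if w <= w0),
                   reverse=True)
    n_total = sum(n for _, n in below) + tail_households
    mean_factor = alpha / (alpha - 1.0)   # mean wealth above x is mean_factor * x
    tail_wealth = tail_households * mean_factor * w0
    total_wealth = sum(n * w for w, n in below) + tail_wealth
    target = q * n_total / 100.0          # households in the top q percent

    if target <= tail_households:
        # percentile lies in the Pareto region
        wq = w0 * (tail_households / target) ** (1.0 / alpha)
        return wq, target * mean_factor * wq / total_wealth

    # percentile lies in the survey region: add survey households from the top
    remaining = target - tail_households
    top_wealth = tail_wealth
    wq = w0
    for w, n in below:
        take = min(n, remaining)
        top_wealth += take * w
        remaining -= take
        wq = w
        if remaining <= 1e-9:
            break
    return wq, top_wealth / total_wealth


# toy data (hypothetical): 90 survey households below w0 = 100, 10 in the tail
print(top_share_with_pareto_tail([10.0, 50.0, 200.0], [50, 40, 10],
                                 w0=100.0, alpha=2.0, tail_households=10, q=5))
```

Survey values above the transition threshold are ignored on purpose, because the Pareto tail replaces them.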
3.3 Model validation
While the approximations (4) or (6) are non-parametric and make no assumptions about the shape of the wealth density $p(w)$, the approximation (5) only makes sense when the density is actually close to the density of the Pareto distribution for $w > w_0$. This assumption can be verified by testing whether the average wealth above a threshold $w$ follows van der Wijk’s law:

$$\frac{\int_w^{\infty} x\, p(x)\, dx}{\int_w^{\infty} p(x)\, dx} = \frac{\alpha}{\alpha - 1}\, w \tag{11}$$

While it is easy to see that van der Wijk’s law holds for the Pareto density by inserting (5) on the left hand side of Eq. (11), the converse also holds: the Pareto distribution is the only distribution for which (11) holds. (To see this, differentiate (11) with respect to $w$ and solve the resulting differential equation for the complementary cumulative distribution function $\int_w^{\infty} p(x)\, dx$.)
Eq. (11) thus provides a simple test whether the tail of the survey data follows a Pareto distribution: compute

$$\frac{\sum_{w_i > w} n_i w_i}{w \sum_{w_i > w} n_i}$$

and check whether it is constant. As can be seen in Fig. 2, van der Wijk’s law holds for both the HFCS data and the Manager Magazin data, but in each case only above a data set specific wealth threshold. This seems to be a contradiction, because it seems to imply that somewhere above the highest HFCS wealth (76 Mio €) the distribution deviates from the Pareto shape. It is, however, in accordance with Bach’s observation that the Manager Magazin data tends to become unreliable for more than the 200 highest entries [9], which corresponds to wealth values below 500 Mio €.
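The test based on van der Wijk’s law amounts to computing a weighted mean above a sliding threshold. The following Python sketch applies it to a synthetic sample built from evenly spaced quantiles of a Pareto distribution; the function name and the sample construction are assumptions for illustration, and alpha = 3 is chosen here only because the sample mean then converges quickly.

```python
def van_der_wijk_ratio(wealth, weights, threshold):
    """Weighted mean wealth above `threshold`, divided by `threshold`.

    For a Pareto tail this ratio is constant and equal to alpha / (alpha - 1).
    """
    pairs = [(w, n) for w, n in zip(wealth, weights) if w > threshold]
    mean_above = sum(n * w for w, n in pairs) / sum(n for _, n in pairs)
    return mean_above / threshold


# synthetic Pareto sample (hypothetical data): quantiles for alpha = 3, scale 1
alpha, m = 3.0, 20000
wealth = [(1.0 - (i + 0.5) / m) ** (-1.0 / alpha) for i in range(m)]
weights = [1] * m
for t in (1.0, 1.5, 2.0):
    print(t, van_der_wijk_ratio(wealth, weights, t))
```

For this sample all three ratios should lie close to 1.5; on real data, a plot of the ratio against the threshold shows where the constant region begins.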
4 Parameter determination
The focus in the published literature so far has been on the determination of the power $\alpha$ in the Pareto distribution, which is summarized in section 4.1. The subsequent sections discuss methods for determining the normalization constant $C$ and propose a new method for a unique choice for the transition threshold $w_0$, a problem to which little attention has been given previously.
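4.1 Power $\alpha$
A first estimator for $\alpha$ follows from van der Wijk’s law (11) when its left hand side is replaced with the weighted sample mean above a threshold:

$$\frac{1}{N_{min}} \sum_{w_i \ge w_{min}} n_i w_i = \frac{\alpha}{\alpha - 1}\, w_{min} \tag{12}$$

Solving for $\alpha$ yields the estimator

$$\hat{\alpha}_W = \frac{\sum_{w_i \ge w_{min}} n_i w_i}{\sum_{w_i \ge w_{min}} n_i w_i - N_{min}\, w_{min}} \tag{13}$$

where $w_{min}$ is the wealth threshold above which the data is used for estimating $\alpha$, and $N_{min}$ is the sum of all weights for wealth values of at least $w_{min}$. The Monte-Carlo experiments in section 5.1 indicate that this estimator has both considerable bias and variance.
An estimator obtained from the maximum-likelihood (ML) principle is therefore preferable, because ML estimators are generally known to be consistent. The weights can be incorporated into the ML estimator by treating each measured value $w_i$ as if it were measured $n_i$ times. This yields the estimator

$$\hat{\alpha}_{ML} = \frac{N_{min}}{\sum_{w_i \ge w_{min}} n_i \ln (w_i / w_{min})} \tag{14}$$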
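Another estimator for $\alpha$ can be obtained from the complementary cumulative distribution function of the Pareto distribution:

$$P(W > w) = \left( \frac{w_{min}}{w} \right)^{\alpha} \quad \text{for } w \ge w_{min} \tag{15}$$

Taking the logarithm on both sides leads to a linear relationship $y = -\alpha x$ with $x = \ln(w / w_{min})$ and $y = \ln P(W > w)$, from which $\alpha$ can be estimated with a least squares fit as the linear regression coefficient

$$\hat{\alpha}_{LS} = - \frac{\sum_i x_i y_i}{\sum_i x_i^2} \tag{16}$$

where $x_i = \ln(w_i / w_{min})$ and $y_i$ is the logarithm of the empirical complementary distribution function at $w_i$.
When the linear regression is not done with Eq. (16), but with built-in routines of a statistical software package, it is important to set the “intercept term” to zero, because otherwise the least squares fit is done with a different formula that additionally estimates a constant term. (In R, this is achieved by using a formula without intercept, such as “y ~ x - 1”, in the function lm.)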
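A least squares fit through the origin can be sketched in Python as follows. The function name is hypothetical, and the empirical complementary distribution function is taken here as the weight share of all households with wealth greater than or equal to the respective value, which is one of several possible conventions and an assumption of this sketch.

```python
import math


def alpha_ls(wealth, weights, w_min):
    """Regression estimate of alpha without intercept term.

    Fits ln(ccdf) = -alpha * ln(w / w_min) by least squares through the
    origin. Assumes distinct wealth values in the tail.
    """
    tail = sorted((w, n) for w, n in zip(wealth, weights) if w >= w_min)
    n_min = sum(n for _, n in tail)
    sxy = sxx = 0.0
    above = n_min                 # weight of all households with wealth >= current w
    for w, n in tail:
        x = math.log(w / w_min)
        y = math.log(above / n_min)
        sxy += x * y
        sxx += x * x
        above -= n
    return -sxy / sxx
```

Forcing the line through the origin encodes the fact that the complementary distribution function equals one at the threshold; an additional intercept term would fit a different model.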
Clauset et al. [12] made Monte-Carlo experiments to compare the least squares fit estimator $\hat{\alpha}_{LS}$ with the maximum likelihood estimator $\hat{\alpha}_{ML}$. They found that $\hat{\alpha}_{LS}$ showed noticeable bias, while $\hat{\alpha}_{ML}$ was practically unbiased. Vermeulen [8] made Monte-Carlo experiments under the assumption of a non-response rate that was correlated with wealth according to [7]. This led to considerable bias, which was slightly stronger for $\hat{\alpha}_{ML}$ than for $\hat{\alpha}_{LS}$.
Another problem with ansatz (15) is that it is not applicable in the presence of missing data, e.g. for the combined HFCS and Manager Magazin data, which have a considerable gap between the highest HFCS and lowest Manager Magazin wealth value. Eq. (15) is based on the complementary empirical distribution function, and thus requires knowledge of the weight of the missing values, which cannot be estimated without knowing $\alpha$ beforehand. The same problem affects ansatz (12) and the resulting estimator (13). The maximum likelihood estimator, however, does not suffer from this shortcoming, because the term $N_{min}$ in (14) is not used as an estimator for the distribution function, but simply represents the total number of measured values.
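The weighted maximum-likelihood estimator only needs two sums over the tail of the data. In the following Python sketch, the function name and argument names are illustrative assumptions.

```python
import math


def alpha_ml(wealth, weights, w_min):
    """Weighted maximum-likelihood estimate of the Pareto power alpha.

    Each wealth value w >= w_min is treated as if it had been observed
    as many times as its weight indicates.
    """
    tail = [(w, n) for w, n in zip(wealth, weights) if w >= w_min]
    n_min = sum(n for _, n in tail)
    log_sum = sum(n * math.log(w / w_min) for w, n in tail)
    return n_min / log_sum


# a weight of 3 is equivalent to listing the same value three times
print(alpha_ml([2.0, 4.0], [3, 1], 1.0))
print(alpha_ml([2.0, 2.0, 2.0, 4.0], [1, 1, 1, 1], 1.0))
```

Both calls give the same estimate, which illustrates why aggregating identical wealth values into one entry with a higher weight does not change this estimator. Values from a rich list can simply be appended to the survey values, because no distribution function has to be estimated across the gap between the two data sources.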
Whichever of the estimators for $\alpha$ is used, in any case a choice for $w_{min}$ is necessary. Bach et al. [9] based their choice on a visual inspection of Fig. 2 and set $w_{min}$ somewhere at the beginning of the region where the curve is roughly constant. This approach, however, does not yield a well-defined algorithm for the exact choice of $w_{min}$, and an optimality criterion based on the goodness-of-fit is preferable. Goodness-of-fit is typically measured with the distance between the empirical cumulative distribution function $F(w)$ and the fitted cumulative distribution function $F_{fit}(w)$. Clauset et al. [12] recommended the Kolmogorov-Smirnov criterion, which is the maximum distance between the two distribution functions (see Fig. 3):

$$KS = \max_{w \ge w_{min}} \left| F(w) - F_{fit}(w) \right| \tag{17}$$

while Eckerstorfer et al. [10] used the Cramér-von Mises criterion, which is based on the area between the distribution functions:

$$CM = \int_{w_{min}}^{\infty} \left( F(w) - F_{fit}(w) \right)^2 p_{fit}(w)\, dw \tag{18}$$

where $p_{fit}$ is the fitted density. When the integral is numerically evaluated with the trapezoidal rule, the criterion becomes

$$CM \approx \sum_i \frac{f(w_i) + f(w_{i+1})}{2}\, (w_{i+1} - w_i) \tag{19}$$

where $f$ is the integrand in (18), i.e. $f(w) = \left( F(w) - F_{fit}(w) \right)^2 p_{fit}(w)$.
As can be seen in Fig. 4, both criteria typically have the same qualitative dependency on $w_{min}$ with a minimum at almost the same position. This means that both criteria yield very similar optimal choices for $w_{min}$.
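The selection of the threshold by goodness-of-fit can be sketched as a scan over candidate thresholds. The following Python sketch uses the Kolmogorov-Smirnov criterion together with the weighted maximum-likelihood fit; the function names are hypothetical, and evaluating the distance just before and at each jump of the empirical distribution function is an implementation choice of this sketch.

```python
import math


def ks_distance(tail, alpha, w_min):
    """Maximum distance between empirical and fitted Pareto cdf.

    tail: list of (wealth, weight) pairs with wealth >= w_min, sorted ascending.
    """
    total = sum(n for _, n in tail)
    cum = 0.0
    dist = 0.0
    for w, n in tail:
        fit = 1.0 - (w_min / w) ** alpha
        dist = max(dist, abs(cum / total - fit))   # just below the jump
        cum += n
        dist = max(dist, abs(cum / total - fit))   # at the jump
    return dist


def best_w_min(wealth, weights, candidates):
    """Return (KS, w_min, alpha) for the candidate threshold with the best fit."""
    data = sorted(zip(wealth, weights))
    best = None
    for w_min in candidates:
        tail = [(w, n) for w, n in data if w >= w_min]
        n_min = sum(n for _, n in tail)
        log_sum = sum(n * math.log(w / w_min) for w, n in tail)
        if log_sum <= 0.0:
            continue                               # not enough data above w_min
        alpha = n_min / log_sum                    # weighted ML estimate
        ks = ks_distance(tail, alpha, w_min)
        if best is None or ks < best[0]:
            best = (ks, w_min, alpha)
    return best
```

In practice the candidates can be all wealth values in the data above some lower bound, so that the scan returns the threshold with the smallest distance.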
4.2 Normalization constant
The normalization constant $C$ in the Pareto distribution for $w > w_0$ must be chosen such that the density combined from HFCS-histogram and Pareto-fit is normalized to one, i.e. $\int_{-\infty}^{\infty} p(w)\, dw = 1$. Bach et al. [9] have achieved this by setting the weight of the Pareto tail equal to the HFCS weight of all households with wealth $w_i > w_0$:

$$C_B = \frac{\alpha\, w_0^{\alpha}}{N} \sum_{w_i > w_0} n_i \tag{20}$$

As the data frequency in the region around $w_0$ is typically quite low, this choice has the effect that $C_B$ varies in a discontinuous way as $w_0$ moves across a data point. Eckerstorfer et al. [10] used the same method, but with $w_{min}$ (see section 4.1) instead of $w_0$:

$$C_E = \frac{\alpha\, w_{min}^{\alpha}}{N} \sum_{w_i > w_{min}} n_i \tag{21}$$

The choice (21) for the normalization constant has the side effect that the estimated density function is no longer normalized to one, but the area under the density function depends on the choice for $w_0$. To make sure that the total number of all households remained unchanged, Eckerstorfer et al. rescaled all HFCS weights for wealth values below $w_0$ to

$$n_i' = n_i \cdot \frac{N \left( 1 - \frac{C_E}{\alpha\, w_0^{\alpha}} \right)}{\sum_{w_j \le w_0} n_j}$$
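A new way to obtain yet another estimate for the normalization constant consists in a utilization of the rich list, which gives the exact number of households above some wealth threshold $w_r$:

$$N \int_{w_r}^{\infty} \frac{C_R}{w^{\alpha+1}}\, dw = N_r \quad \Rightarrow \quad C_R = \frac{\alpha\, w_r^{\alpha}\, N_r}{N} \tag{24}$$

where $N_r$ is the number of households in the rich list with wealth greater than or equal to $w_r$. $N$ is the total number of households, which is increased by (24) to

$$N' = \sum_{w_i \le w_0} n_i + N_r \left( \frac{w_r}{w_0} \right)^{\alpha}$$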
The threshold should be chosen as low as possible, yet still in the region where van der Wijk’s law holds for the rich list data. For the Manager Magazin data, e.g., it can be concluded from Fig. 2 that Mio € is a reasonable threshold for this rich list. As wealth values in this list are only rough estimates and the gap to the next wealth value is 50 Mio €, the most conservative choice is to use with Mio € and with Mio €.
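The three normalizations differ only in the number of households that they assign to the Pareto tail above the transition threshold. The following Python sketch expresses them in this form, using the fact that the number of households above a wealth value scales with that value to the power of minus alpha in a Pareto tail. This is a re-expression under that assumption and not a transcription of the formulas of the cited works; all function and argument names are hypothetical.

```python
def tail_households_bach(wealth, weights, w0):
    """Survey weight above the transition threshold w0 is kept as is."""
    return sum(n for w, n in zip(wealth, weights) if w > w0)


def tail_households_eckerstorfer(wealth, weights, w0, w_min, alpha):
    """Survey weight above w_min, extrapolated along the Pareto tail to w0."""
    above_w_min = sum(n for w, n in zip(wealth, weights) if w > w_min)
    return above_w_min * (w_min / w0) ** alpha


def tail_households_rich_list(n_rich, w_rich, w0, alpha):
    """Rich list count above w_rich, extrapolated down the Pareto tail to w0."""
    return n_rich * (w_rich / w0) ** alpha


# toy data (hypothetical)
wealth = [1.0, 2.0, 4.0, 8.0]
weights = [10, 5, 3, 2]
print(tail_households_bach(wealth, weights, 3.0))
print(tail_households_eckerstorfer(wealth, weights, 4.0, 1.0, 2.0))
print(tail_households_rich_list(200, 500.0, 5.0, 1.0))
```

The first two variants take the tail weight from the survey, whereas the third takes it from the rich list, which is why only the third can compensate for households that are missing from the survey altogether.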
4.3 Transition threshold
Little attention has been paid in the literature to the choice of the transition threshold $w_0$. From the discussion in section 4.1, it would seem natural to set it equal to $w_{min}$, i.e., to the threshold that yields the best fit according to a goodness-of-fit criterion. Surprisingly, both Bach et al. (for Germany [9]) and Eckerstorfer et al. (for Austria [10]) set it much higher than $w_{min}$.
Eckerstorfer et al. argued: “We choose this €4 million cut-off point because the frequency of observations starts to markedly decline beyond this level of net wealth.” They also observed that changing $w_0$ to 3 Mio € or 5 Mio € only had a minor impact on the final results. This does not explain, however, why $w_0$ should be limited to this range. As can be seen from Table 1, there actually are noticeable differences for the resulting top percent share as $w_0$ varies. It is therefore desirable to have a well-defined criterion for choosing $w_0$.
A natural way to restrict the choice of $w_0$ is by imposing the demand of continuity upon the estimated density function $p(w)$. As can be seen from Fig. 1, this is not a reasonable assumption for the histogram estimator from the HFCS data, which is discontinuous itself. It is reasonable, however, for the kernel density estimator given by Eq. (6). This leads to the ansatz

$$\frac{1}{N h} \sum_i n_i\, K\!\left( \frac{w_0 - w_i}{h} \right) = \frac{C}{w_0^{\alpha+1}} \tag{28}$$

which is to be solved for $w_0$ numerically. As Eq. (28) generally has more than one solution $w_0$, the smallest zero greater than $w_{min}$ should be chosen. For the averaged HFCS data for Germany, the resulting values of $w_0$ are listed in Table 5. They were obtained with a kernel density bandwidth that was automatically selected from all data points greater than $w_{min}$ with the method by Sheather & Jones [13] (this is implemented as method “SJ” in the R function density), which yields a bandwidth of 55 551 €.
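The numerical solution of the continuity condition can be sketched as a grid scan for the first sign change followed by bisection. In the following Python sketch, both densities are passed as functions, so that any kernel density estimate and any normalization can be plugged in; the function name, the grid size and the stopping wealth `w_stop` are assumptions for illustration.

```python
def transition_threshold(density, pareto_density, w_min, w_stop, steps=10000):
    """Smallest w in (w_min, w_stop] where both densities coincide.

    Scans an equidistant grid for the first sign change of the difference
    and refines it by bisection. Returns None when no crossing is found.
    """
    def diff(w):
        return density(w) - pareto_density(w)

    step = (w_stop - w_min) / steps
    lo, f_lo = w_min, diff(w_min)
    for k in range(1, steps + 1):
        hi = w_min + k * step
        f_hi = diff(hi)
        if f_hi == 0.0:
            return hi
        if f_lo * f_hi < 0.0:
            for _ in range(100):            # bisection on [lo, hi]
                mid = 0.5 * (lo + hi)
                f_mid = diff(mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            return 0.5 * (lo + hi)
        lo, f_lo = hi, f_hi
    return None


# toy example (hypothetical densities): 1/w^2 and 2/w^3 cross at w = 2
print(transition_threshold(lambda w: 1.0 / w ** 2, lambda w: 2.0 / w ** 3, 1.0, 10.0))
```

The grid must be fine enough to separate neighboring crossings; otherwise two crossings within one grid cell cancel and the smallest solution is missed.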
Table 2: Mean, standard deviation, MSE, and goodness-of-fit criteria KS and CM of the three estimators $\hat{\alpha}_W$, $\hat{\alpha}_{ML}$ and $\hat{\alpha}_{LS}$ for data randomly drawn from a Pareto distribution with true value $\alpha = 1.5$. $\hat{\alpha}_{LS}$ is the proper linear regression estimator given by Eq. (16), and $\hat{\alpha}_{LS'}$ is a linear regression estimator with an additional intercept term.
Section 4.1 gives different possible estimators for the power $\alpha$, and the Monte-Carlo study [12] favors the maximum-likelihood estimator $\hat{\alpha}_{ML}$ given by (14) over the regression estimator $\hat{\alpha}_{LS}$ given by (16), while [8] favors the regression estimator. The estimator $\hat{\alpha}_W$ based on van der Wijk’s law given by (13) has not yet been studied in the literature.
Therefore, bias and mean squared error of these three estimators were first compared in Monte-Carlo experiments. The best performing estimator in these experiments, which was $\hat{\alpha}_{ML}$, was then used to estimate both the power $\alpha$ and the wealth share of the richest percent.
5.1 Estimator comparison by simulations
To compare the three estimators, 5000 sample values were drawn from a Pareto distribution with the function rpareto from the R package actuar [14]. The parameter $\alpha$ was set to 1.5 because this had turned out to be a cross-country constant in different studies according to Gabaix [15], and $w_{min}$ was set to 0.5 Mio €. This was repeated 1000 times to estimate bias and mean squared error (MSE) of the three estimators. Two additional variants of the experiments were implemented:
- to simulate the cutoff in the HFCS data, values greater than 75 Mio € were rejected
- to simulate the HFCS weighting process, weights were added that represent the exact area under the Pareto density in a cell around each sample value
As both variants could be applied or not, this led to four different simulations. The results are shown in Table 2: the maximum-likelihood estimator showed the smallest MSE in all cases, and the estimator based on van der Wijk’s law the largest. According to these results, the maximum-likelihood estimator is the most preferable.
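The basic variant of such an experiment, without cutoff and without weights, can be sketched in Python as follows. This is not the R code of the study: the samples are drawn by inverse transform sampling instead of rpareto, only the maximum-likelihood estimator and the estimator based on van der Wijk’s law are compared, and the number of runs and the seed are arbitrary assumptions.

```python
import math
import random


def simulate(alpha=1.5, w_min=0.5e6, n=5000, runs=200, seed=1):
    """Return {estimator name: (mean, MSE)} over repeated Pareto samples."""
    rng = random.Random(seed)
    estimates = {"ML": [], "W": []}
    for _ in range(runs):
        # inverse transform sampling: w = w_min * U**(-1/alpha), U in (0, 1]
        sample = [w_min * (1.0 - rng.random()) ** (-1.0 / alpha)
                  for _ in range(n)]
        log_sum = sum(math.log(w / w_min) for w in sample)
        estimates["ML"].append(n / log_sum)
        # van der Wijk: mean / w_min = alpha / (alpha - 1), solved for alpha
        mean_ratio = sum(sample) / (n * w_min)
        estimates["W"].append(mean_ratio / (mean_ratio - 1.0))
    result = {}
    for name, values in estimates.items():
        mean = sum(values) / runs
        mse = sum((v - alpha) ** 2 for v in values) / runs
        result[name] = (mean, mse)
    return result


for name, (mean, mse) in simulate().items():
    print(name, mean, mse)
```

The estimator based on van der Wijk’s law relies on the sample mean, which converges slowly for a power as small as 1.5 because the variance of the distribution is infinite; this is one way to understand its larger MSE.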
It is interesting to note that adding an intercept term in the linear regression did not improve the least squares fit, but led both to a larger MSE and to poorer goodness-of-fit values compared to a linear regression without an intercept term (see the values for $\hat{\alpha}_{LS'}$ in Table 2). This seems surprising at first sight because one would expect that an additional fit parameter improves the fit. This does not hold in this case, however, because the model with the additional parameter no longer represents the underlying distribution.
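| estimator | criterion (averaged HFCS) | $\hat{\alpha}$ (averaged HFCS) | $w_{min}$ (averaged HFCS) | criterion (Manager Magazin) | $\hat{\alpha}$ (Manager Magazin) | $w_{min}$ (Manager Magazin) |
|---|---|---|---|---|---|---|
| $\hat{\alpha}_{ML}$ | KS = 0.023479 | 1.4735 | 561400 € | KS = 0.057548 | 1.5168 | 500 Mio € |
| $\hat{\alpha}_{ML}$ | CM = 0.000093 | 1.4877 | 559100 € | CM = 0.000764 | 1.5777 | 550 Mio € |
| $\hat{\alpha}_{LS}$ | KS = 0.025494 | 1.5179 | 538292 € | KS = 0.077164 | 1.4377 | 500 Mio € |
| $\hat{\alpha}_{LS}$ | CM = 0.000111 | 1.5091 | 557932 € | CM = 0.000679 | 1.4377 | 500 Mio € |
| $\hat{\alpha}_{LS'}$ | KS = 0.028566 | 1.5315 | 538292 € | KS = 0.078456 | 1.4326 | 500 Mio € |
| $\hat{\alpha}_{LS'}$ | CM = 0.000133 | 1.5313 | 535300 € | CM = 0.000687 | 1.4326 | 500 Mio € |
| $\hat{\alpha}_{W}$ | KS = 0.040020 | 1.6032 | 546800 € | KS = 0.066083 | 1.6255 | 550 Mio € |
| $\hat{\alpha}_{W}$ | CM = 0.000282 | 1.5955 | 540500 € | CM = 0.000893 | 1.6255 | 550 Mio € |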
5.2 Estimation of $\alpha$
Table 3 shows the results of the different estimators on the averaged HFCS data for Germany. The values for $w_{min}$ have been determined by selecting the best fit (lowest KS, or CM, respectively) from all wealth values in the data. The ranking of the different estimators is the same as in the simulations in the preceding section: the maximum-likelihood estimator shows the best goodness-of-fit with respect to both criteria, and the estimator based on van der Wijk’s law shows the poorest goodness-of-fit. In agreement with Fig. 4, both goodness-of-fit criteria yield similar results for $w_{min}$. It should be noted that the value reported by Bach et al. in [9] is the value obtained by linear regression with an intercept term.
Table 3 also gives the results for the Manager Magazin rich list for Germany. It is interesting to note that $\hat{\alpha}_{ML}$ is higher than for the HFCS data, while $\hat{\alpha}_{LS}$ is lower, albeit not as low as reported by Bach et al. in [9]. This discrepancy stems from the way in which Bach et al. used the rich list: they did not aggregate the household data by wealth, such that each value only had weight one, with the same wealth listed as often as the number of households. This has no effect on $\hat{\alpha}_{ML}$, but introduced systematic bias into $\hat{\alpha}_{LS}$. The results reported for $\hat{\alpha}$ by Bach et al. were therefore systematically too small when the Manager Magazin data was included in their calculation.
The estimators for $\alpha$ obtained from the rich list are less reliable than the estimators obtained from the HFCS data for two reasons:
- The rich list data with $w \ge w_{min}$ only represents about 200 households, while the HFCS data with $w \ge w_{min}$ represents about 2.5 Mio households. This not only makes the fitting process less robust, but also means that missing data in the rich list have a stronger effect.
- The wealth values in the rich list are only rough estimates and there are considerable gaps between the reported values. This makes the exact placement of the threshold $w_{min}$ and thus also the estimator for $\alpha$ less accurate than for the tightly lying HFCS data. The choice $w_{min} = 475$ Mio € instead of 500 Mio €, e.g., which includes exactly the same number of households because it is the midpoint between 500 Mio € and the next reported wealth, leads to a different estimate.
It is therefore advisable to use the rich list data only in combination with the HFCS data, thereby leading to a refinement of the estimator obtained from the HFCS data. For reasons explained in section 4.1, only the maximum-likelihood estimator can be used for combined data.
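| data set | criterion | $w_{min}$ | $\hat{\alpha}_{ML}$ (HFCS) | $\hat{\alpha}_{ML}$ (HFCS and Manager Magazin) |
|---|---|---|---|---|
| avg | KS | 561 400 € | 1.4735 | 1.4723 |
| avg | CM | 559 100 € | 1.4877 | 1.4865 |
| 1 | KS | 561 500 € | 1.4664 | 1.4652 |
| 1 | CM | 558 000 € | 1.4810 | 1.4798 |
| 2 | KS | 564 030 € | 1.4802 | 1.4790 |
| 2 | CM | 563 430 € | 1.4827 | 1.4814 |
| 3 | KS | 597 000 € | 1.4680 | 1.4666 |
| 3 | CM | 595 800 € | 1.4691 | 1.4677 |
| 4 | KS | 619 800 € | 1.4499 | 1.4485 |
| 4 | CM | 618 000 € | 1.4604 | 1.4590 |
| 5 | KS | 561 500 € | 1.4454 | 1.4443 |
| 5 | CM | 561 500 € | 1.4622 | 1.4611 |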
The results for the combined fits are shown in Table 4. Note that the “average” value is not the average of the values given below, but the maximum-likelihood estimator obtained from the averaged individual wealth values. That the resulting estimator from the averaged data with the Cramér-von Mises goodness-of-fit criterion is greater than the estimator for each individual dataset is surprising, but can be explained by the highly non-linear relationship between the estimator and the wealth values $w_i$ and $w_{min}$.
Taking the Manager Magazin rich list into account decreases $\hat{\alpha}_{ML}$, albeit only marginally: these changes to the fourth significant digit are smaller than the variation due to the variety of imputations, which affects the third significant digit. Bach et al. reported in [9] a much greater decrease of $\hat{\alpha}$ when the rich list data was taken into account (from 1.535 to 1.370), but their results were based on a least squares fit on the data points with the assumption that the missing data between the highest HFCS and the lowest Manager Magazin wealth had zero weight. Such an assumption leads to additional bias of the least squares fit estimator $\hat{\alpha}_{LS}$. The maximum-likelihood estimators reported in Table 4 do not suffer from this problem and lead to the conclusion that $\alpha$ lies between 1.443 and 1.481 for Germany, depending on the imputation variant.
5.3 Estimation of one percent share
Table 5 shows the estimated wealth share of the richest percent in Germany when the different ways for normalizing the Pareto distribution are applied: $C_B$ stands for the method by Bach et al. given by Eq. (20), $C_E$ for the method by Eckerstorfer et al. given by Eq. (21), and $C_R$ for the new method utilizing the rich list as given by Eq. (27). The transition thresholds $w_0$ have been determined with the continuity condition (28). These have been computed for all five imputed datasets and the averaged dataset separately, but are given in Table 5 only for the averaged dataset.
| | $C_B$ | $C_E$ | $C_R$ |
|---|---|---|---|
| $w_0$ (avg) | 820 790 | 839 135 | 1 791 065 |
The results for normalizations $C_B$ and $C_E$ are comparable, but the shares obtained with the normalization $C_R$ based on the rich list are 4 to 6 percentage points higher. This is not unexpected, because the normalization methods by Eckerstorfer et al. and by Bach et al. both remove weights above $w_0$ and assign them to higher wealth. The underlying assumption is that the total sum of weights assigned to wealth values greater than $w_{min}$ (or greater than $w_0$ in Bach’s case) is estimated correctly in the HFCS data. The new method based on the rich list assumes, however, that the number of households in the rich list is correct instead, which means that the total number of households with high wealth is estimated too low in the HFCS data. The author considers the latter to be a more realistic assumption; actually this is the main reason for using the rich list at all.
The maximum-likelihood estimator for the power $\alpha$ of the Pareto distribution showed the best performance both in the Monte-Carlo simulations and the goodness-of-fit tests on the HFCS data, and it is therefore recommended for fitting a Pareto distribution to wealth survey data. Combining the HFCS data with data from a rich list has little impact on the maximum-likelihood estimator for $\alpha$. Neither does the resulting wealth share of the richest percent increase when rich list data is taken into account only for estimating the power $\alpha$. If the rich list is used, however, for estimating the normalization constant $C$ in the Pareto distribution, the wealth share increases by about four percentage points in comparison to an estimation of this normalization constant from the HFCS data.
The new method for determining the transition threshold $w_0$ by imposing a continuity condition on the wealth distribution density provides a way for removing the arbitrariness of this parameter. It relies on a kernel density estimator for the wealth density. Theoretically, this kernel density estimator can also be utilized for computing the top percent wealth share by numeric integration. It would be interesting to investigate whether this would make the dependency of the wealth share on the choice of $w_0$ less noisy.
The author is grateful to the ECB for providing the HFCS survey data, and to Andreas Thiemann from the DIW for providing the extended Manager Magazin rich list and for explaining some aspects of the DIW study [9].
- [1] R. Lucas, “The industrial revolution: past and future,” Annual Report Essay 2003, Federal Reserve Bank of Minneapolis, May 2004.
- [2] R. H. Wade, “The Piketty phenomenon and the future of inequality,” Real-World Economics Review, vol. 69, pp. 2–17, 2014.
- [3] T. Piketty, Capital in the Twenty-First Century. Cambridge: Harvard University Press, 2014.
- [4] R. Wilkinson and K. Pickett, The Spirit Level: Why Greater Equality Makes Society Stronger. London: Allen Lane, 2009.
- [5] J. Stiglitz, The Price of Inequality. London: Penguin, 2012.
- [6] European Central Bank, “The Eurosystem Household Finance and Consumption Survey - methodological report for the first wave,” Statistics paper 1, European Central Bank, Frankfurt, April 2013.
- [7] A. Kennickell and D. McManus, “Sampling for household financial characteristics using frame information on past income,” Working paper, Federal Reserve Board, August 1993.
- [8] P. Vermeulen, “How fat is the top tail of the wealth distribution?,” Working paper 1692, European Central Bank, Frankfurt, July 2014.
- [9] S. Bach, A. Thiemann, and A. Zucco, “The top tail of the wealth distribution in Germany, France, Spain, and Greece,” Discussion paper 1502, Deutsches Institut für Wirtschaftsforschung, Berlin, 2015.
- [10] P. Eckerstorfer, J. Halak, J. Kapeller, B. Schütz, F. Springholz, and R. Wildauer, “Correcting for the missing rich: An application to wealth survey data,” Review of Income and Wealth, 2015.
- [11] F. Gisbert and J. Goerlich, “Weighted samples, kernel density estimators and convergence,” Empirical Economics, vol. 28, no. 2, pp. 335–351, 2003.
- [12] A. Clauset, C. R. Shalizi, and M. E. Newman, “Power-law distributions in empirical data,” SIAM Review, vol. 51, no. 4, pp. 661–703, 2009.
- [13] S. Sheather and M. Jones, “A reliable data-based bandwidth selection method for kernel density estimation,” Journal of the Royal Statistical Society, Series B, vol. 53, pp. 683–690, 1991.
- [14] C. Dutang, V. Goulet, and M. Pigeon, “actuar: An R package for actuarial science,” Journal of Statistical Software, vol. 25, no. 7, p. 38, 2008.
- [15] X. Gabaix, “Power laws in economics and finance,” Annual Review of Economics 2009, vol. 1, pp. 255–293, 2009.