Mind the wealth gap: a new allocation method to match micro and macro statistics for household wealth

01/04/2021
by Michele Cantarella, et al.
Helsingin yliopisto

The financial and economic crisis recently experienced by many European countries has increased demand for timely, coherent and consistent distributional information for the household sector. In the Euro area, most national central banks (NCBs) collect such information through income and wealth surveys, which are often used to inform their decisions. These surveys, however, often suffer from biases, usually caused by non-response and under-reporting, which lead to a mismatch with macroeconomic aggregates. In this paper, we develop a novel allocation method that combines information from a power law (Pareto) model and imputation procedures so as to address these issues simultaneously when only limited external information is available. We make two main contributions: first, we adjust the weights of observed survey households for non-response bias; second, we correct for measurement error. Finally, we produce distributional indicators for four Euro-area countries.


1 Introduction

The financial and economic crisis recently experienced by many European countries has increased demand for timely, coherent, and consistent distributional information relating to household income and wealth. Such information is receiving high priority, especially on the agenda of national central banks (NCBs), which use it in several ways (hfcs2009). Distributional information is used for financial stability purposes, for example, to assess how much debt is concentrated in the hands of financially vulnerable households (see, for instance, AMPUDIA2016; qef_369_16). Moreover, distributional information makes it possible to estimate the aggregate consumption response to wealth shocks when individual responses are heterogeneous (Pa2007; GPV) and, more generally, to understand the interplay between monetary policy measures, especially non-standard ones, and income and wealth inequality (cas; joes; COIBION201770). Recent years have also been characterised by a surge of interest in the dynamics of wealth accumulation over the last century (gp2018; Alvaredo2018; Frémeaux2020).

Sample surveys are the main source of distributional information on household wealth. In the Euro area, most NCBs conduct the Eurosystem Household Finance and Consumption Survey (HFCS) which collects harmonized household-level data on households’ finances and consumption (hfcs2009).

The second source of information relating to household wealth that is produced by NCBs is the system of financial accounts (FAs) of the household sector, which reports the total financial asset holdings and liabilities of all households within a country.

In theory, since the HFCS is designed to be representative of all households, aggregating its microdata should reproduce the macro aggregates. In practice, however, the differences are large: aggregate totals based on surveys are often substantially below those found in national accounts. Before using the distributional information from survey data, it is therefore crucial to explain and, where possible, eliminate the differences between the two sources of information.[1]

[1] In 2015, the European System of Central Banks (ESCB) established an expert group with the aim of comparing and bridging macro data (i.e. national accounts/financial accounts) and microdata (i.e. the Household Finance and Consumption Survey) on wealth.

There are several possible reasons for the differences (egdna). On the survey side, two relevant issues are unit non-response and measurement error. There is substantial evidence that the decision whether to participate is not random. In particular, rich households are known to have a higher non-response rate (Chakraborty19; AK2008; Ken19; vermeulen2018fat). Since these households hold a large share of total wealth, differential non-response is likely to lead to an under-representation of wealthy households in surveys and, eventually, as the response rate approaches zero, to truncation at the very top of the distribution.

Moreover, wealth surveys generally include both complex and sensitive items. As a consequence, respondents are not always able or even willing to report the correct amount of wealth they hold. Similar to non-response, measurement error is not random and differs across population subgroups and portfolio items (Neri15; Ran11).

The ideal solution to these problems would be to link survey data with administrative records (such as tax records or credit registers, as in blanchetetal2018; gp2018; garbinti2017accounting). Alternatives to data linkage rely either on the direct use of wealth (tax) records (Alva2009) or on capital income information from tax records, from which wealth is estimated by assuming certain rates of return (saez16).

Unfortunately, even when such administrative records exist and are not limited in scope, they are usually unavailable for confidentiality reasons. For this reason, the recent literature has developed methods that combine survey data with the limited external information that is publicly available, such as aggregate totals from national accounts or rich lists, in order to produce distributional indicators. Vermeulen (vermeulen2018fat; Vermeulen2016) combines the Forbes World Billionaires lists with several wealth surveys to estimate, using a Pareto model, the total wealth held by rich households. He finds that using the rich list improves the quality of the results compared with estimating a Pareto model from survey data alone. Building on this approach, Chakraborty19, Waltl2018, and chakraborty2018missing extended the analysis by benchmarking survey results to the National Accounts. In another recent study, Bach2019 implemented these methodologies to impute rich list data into wealth surveys.

The common characteristic of all these studies is that the main reason for the gap between micro and macro data is assumed to be unit non-response of very rich, and often unobserved, households (i.e. the "missing rich").

This paper contributes to this recent strand of literature, which tries to produce distributional indicators of wealth from survey data that are consistent with the national accounts. We present a methodology that draws on existing, well-established methods. We contribute to the literature in four ways.

First, by focusing on the missing part of the tail, previous studies treated existing survey observations as representative. In our case, we have reason to believe that differential non-response also affects the representativeness of existing survey observations, which in turn affects estimates of the total number of households in the Pareto tail and of their total wealth. We contribute to the literature by implementing a correction for differential non-response that accounts for the missing rich but focuses on observed survey households. This correction by no means substitutes for the imputation (Bach2019) or simulation (Waltl2018) procedures developed in the literature; rather, it complements them by allowing for the correction of non-response bias among existing survey observations.

Second, while existing papers focus only on non-response at the tail of the distribution, we present a methodology that also corrects for measurement error. Dealing with both aspects simultaneously is important even when the research purpose is to estimate the share of total wealth held by the top of the distribution. Indeed, some rich households may not be willing to report their true wealth and could therefore be misclassified in the survey. Failure to identify rich households correctly may bias the results, whatever adjustment method is used. Furthermore, our approach enables us to compute distributional indicators that refer to "non-rich" households, such as those relating to financial vulnerability.

Our third contribution is that, although the methods we apply are well established (the Pareto model, imputation, and calibration), we show how to combine them in a single framework.

The fourth contribution is a modified and readily usable dataset in which survey values are adjusted for the above-mentioned quality issues and, by construction, the totals add up to the national accounts. While existing papers mainly focus on methods to estimate total wealth held at the top, our adjusted dataset can be used to estimate any distributional indicator of interest.

The paper is structured as follows. Section 2 describes the data sources used in our application and motivating example. Section 3 presents the Pareto approach (section 3.1) and calibration (section 3.2) and the methodologies we use to combine them in a single approach (sections 3.3, 3.4, 3.5). Section 4 describes the tools used to assess the properties of the proposed methods. Section 5 describes how the method is applied to our data, while section 6 discusses the results and the main findings of the application. Section 7 provides some conclusions and directions for future research.

2 Data

This paper uses the Household Finance and Consumption Survey (HFCS) and two sources of auxiliary information, namely the national accounts, which include both financial and non-financial accounts, and rich list data.

The Household Finance and Consumption Survey (HFCS) is a joint project of all the national central banks (NCBs) of the Eurosystem and several national statistical institutes (NSIs). The survey collects detailed household-level data on various aspects of household balance sheets and related economic and demographic variables, including income, private pensions, employment, and measures of consumption. The HFCS is conducted in a decentralised manner: the European Central Bank (ECB), in conjunction with the Household Finance and Consumption Network (HFCN), coordinates the whole project, ensuring the cross-country comparability of the final data.

We use the second wave of the HFCS (2014) and restrict our analysis to four countries: Italy, France, Germany, and Finland. This choice is motivated by two considerations. First, rich lists and non-financial accounts are available for this subset of countries. Second, these surveys present methodological differences that can be used to evaluate our method. For example, some countries over-sample rich households using individual tax records (as in the French and Finnish surveys) or regional-level information (as in the German survey), while others do not over-sample (as in the Italian case). Moreover, in some cases the survey is linked with administrative data (as in the Finnish case). Where rich households are over-sampled or administrative records are used, we expect the adjustment method to have a smaller effect.

Our variable of interest is household net wealth defined as the sum of deposits, bonds, shares, mutual funds, money owed to the household, the value of insurance policies and pension funds, business wealth, and housing wealth, minus debts.

The second source of information is the national accounts. For financial assets, we use financial accounts. These are produced by NCBs and provide the total financial assets and liabilities held by households, classified by financial instrument in order of liquidity, based on original maturity and negotiability (from cash and deposits to insurance and pension instruments). Non-financial accounts are produced by NSIs and contain the total value of dwellings, other buildings and structures, and land owned by households. Although national accounts figures may suffer from quality issues and may adopt concepts and definitions that differ from those used in the survey, we use them as a benchmark to correct survey data.

Rich lists are our third source of information. They have already been used in the literature to adjust for missing rich households (vermeulen2018fat; chakraborty2018missing). Their use may raise concerns, since the methodology adopted is often opaque and usually only figures for net worth are provided, with no breakdown by financial instrument. Some studies have tried to overcome these issues by using different types of Pareto adjustments (blanchet2017generalized; Waltl2018). Other studies (such as Schröder2019) have also explored new ways of sampling high-wealth individuals with adequate precision. However, these methods can only be employed in specific instances, when information on these households exists and is easily accessible. When such sources are not available, rich lists remain a reliable alternative source of auxiliary information, and evidence from Waltl2018 indicates that, after integration with rich lists, there may be little difference between the wealth estimated by different Pareto adjustments.

In our case, we use data on wealthy households from the 2014 Forbes Billionaires List. When available, this information has been replaced by that from larger country-specific lists, such as the 2014 editions of Challenges' "Les 500 plus grandes fortunes de France" for France, Manager Magazin's list for Germany, and Arvopaperi's list for Finland. We also adjust these rich list data by estimating debts and portfolio composition based on portfolio shares from the top wealth observations in the HFCS.[2]

[2] This is a simplifying assumption. An improvement over this form of portfolio allocation can be offered by the approach used in chakraborty2018missing.

In this way, estimates for portfolio compositions among top fortunes can be obtained, and rich list data can be fully integrated with the HFCS for estimation purposes.

3 Methodology

Let $y$ be household net wealth and $Y$ the population total to be estimated using survey data. Let $\hat{Y} = \sum_{i \in s} d_i y_i$ be the Horvitz-Thompson estimator, where $d_i$ is the sampling weight and $y_i$ the net wealth of each household $i$ in the sample of respondents $s$, ordered by net wealth rank.

Because of unit non-response and measurement error, the expected value of the Horvitz-Thompson estimator is generally lower than $Y^{NA}$, the corresponding macro aggregate. Unit non-response occurs when some households refuse to participate in the survey. If this decision is related to household wealth (i.e. richer households are more difficult to enrol than others), the sample of respondents may not adequately represent the upper tail of the distribution. Measurement error occurs when the information collected in the survey, $y_i$, differs from the true unknown value $y_i^{*}$. The error term may depend on many factors, such as respondents' difficulty in recalling the required information or their unwillingness to report their true wealth.

Our methodology to address these issues is based on two techniques that are well-established in the literature. We use the Pareto distribution to compensate for the low coverage of rich households in the survey (section 3.1), and the calibration methods commonly used in survey sampling to deal with the issue of measurement error (section 3.2).

The two correction methods are interdependent and can be implemented simultaneously. The Pareto correction starts with an assessment of the rich households available in the survey. Because of measurement error, some households could be misclassified, so a preliminary calibration adjustment is required. Conversely, calibration is then used again to adjust for measurement error across the whole distribution, which requires that the survey adequately represents the upper tail of the distribution.

Our solution for conducting the two adjustments simultaneously is to run them in an iterative process, based on the procedure described in the following sections.

The final product of the methodology is an adjusted survey dataset whose estimated totals of net wealth, real assets, financial assets, and liabilities match the aggregate figures in the national accounts balance sheet. This dataset can be used to compute several distributional indicators of interest.

Before applying the method, we reclassify some definitions of wealth items used in the survey data in order to remove as many of the conceptual differences with national accounts as possible (see for instance eglmm; Chakraborty19). In particular, we remove from national accounts totals the wealth held by non-profit institutions serving households (NPISHs), and we only focus on the items with the highest level of comparability.

3.1 Pareto tail estimation

The Pareto adjustment assumes that, above a certain wealth threshold $y_{min}$, the complementary cumulative distribution function (CCDF) of wealth is approximated by a power law, which (for $y \geq y_{min}$) can be expressed as:

$$1 - F(y) = \left(\frac{y_{min}}{y}\right)^{\alpha} \tag{1}$$

where the parameter $\alpha$ indicates the shape of the tail. The lower the value of $\alpha$, the fatter the tail and the more concentrated wealth is.

The first step of the adjustment is the estimation of the threshold $y_{min}$. Previous research has often adopted the arbitrary threshold of €1 million and, as a robustness check, of 1.5 or 2 million. We relax this assumption by using a less arbitrary method based on the properties of the mean excess function (Yang1978):

$$e(u) = E\left[\, y - u \mid y > u \,\right] \tag{2}$$

where $u$ is a candidate threshold. The expectation in this function is estimated by the weighted mean of the deviation from $u$ of all observations whose wealth exceeds $u$. Essentially, every observed value of wealth is treated as a possible threshold $u$ at which the corresponding expected value $e(u)$ is estimated.

A useful property of this function is that it is linear in $u$ if the distribution is Pareto (Yang1978; DavisonSmith1990): for a Pareto distribution with $\alpha > 1$, $e(u) = u/(\alpha - 1)$ for $u \geq y_{min}$. Using this property, we estimate $\hat{e}(y_i)$ for each value of $y_i$ in $s$, and then find the threshold above which the mean excess function is linear in $y$. This can be achieved by selecting the value $y_{min}$ for which the R-squared of the linear regression of $\hat{e}(y_i)$ on $y_i$, for $y_i \geq y_{min}$, is maximised (Langousis2016).

It is worth stressing that the threshold $y_{min}$ is the point where the Pareto distribution starts, which differs from the truncation point ($y_{max}$) above which the survey has no rich households. Indeed, survey data will generally include observations in the bottom part of the Pareto tail while missing those at the very top of the distribution. In the presence of truncation, the estimated mean excesses become increasingly biased downwards as wealth approaches the truncation point (see Aban2006). To account for this issue, we weight each point in the regression of $\hat{e}(y_i)$ on $y_i$ by the sum of the survey weights of all observations with wealth at least $y_i$.[3]

[3] As a robustness check, we also used the €1 million threshold to estimate the Pareto shape parameter and ran all our adjustment methods afterwards. These conservative estimates are very close to those obtained using our estimated threshold and are available on request.
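The threshold search can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes plain lists of wealth values `y` and survey weights `d`, treats every observed wealth value as a candidate threshold, weights each regression point by the sum of survey weights at or above it (one reading of the weighting described above), and requires a hypothetical minimum of `min_points` points per fit. All function names are illustrative.

```python
def weighted_mean_excess(y, d, u):
    """Survey-weighted mean of (y_j - u) over households with y_j > u (None if none)."""
    num = sum(dj * (yj - u) for yj, dj in zip(y, d) if yj > u)
    den = sum(dj for yj, dj in zip(y, d) if yj > u)
    return num / den if den > 0 else None


def weighted_r2(x, e, v):
    """R-squared of the weighted least-squares line e ~ a + b * x with weights v."""
    sv = sum(v)
    mx = sum(vi * xi for vi, xi in zip(v, x)) / sv
    me = sum(vi * ei for vi, ei in zip(v, e)) / sv
    sxx = sum(vi * (xi - mx) ** 2 for vi, xi in zip(v, x))
    see = sum(vi * (ei - me) ** 2 for vi, ei in zip(v, e))
    sxe = sum(vi * (xi - mx) * (ei - me) for vi, xi, ei in zip(v, x, e))
    if sxx == 0 or see == 0:
        return None
    return sxe ** 2 / (sxx * see)


def select_threshold(y, d, min_points=5):
    """Candidate threshold whose upper tail gives the most linear mean excess plot."""
    points = []
    for u in sorted(set(y)):  # every observed wealth value is a candidate threshold
        e = weighted_mean_excess(y, d, u)
        if e is None:
            continue  # nothing lies above the largest value
        # regression weight: sum of survey weights of households with wealth >= u
        v = sum(dj for yj, dj in zip(y, d) if yj >= u)
        points.append((u, e, v))
    best_u, best_r2 = None, -1.0
    for k in range(len(points) - min_points + 1):
        tail = points[k:]  # fit only the mean excesses at or above candidate k
        r2 = weighted_r2([p[0] for p in tail], [p[1] for p in tail], [p[2] for p in tail])
        if r2 is not None and r2 > best_r2:
            best_u, best_r2 = tail[0][0], r2
    return best_u, best_r2
```

The nested loops make the sketch quadratic in the sample size, which is acceptable for illustration; cumulative sums over the sorted sample would make it linear.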

After the threshold is found, the shape parameter $\alpha$ can be estimated using the method described in vermeulen2018fat.

Define $s_p$ as the sub-sample of respondents with wealth higher than $y_{min}$. The rich list and the sample $s_p$ are appended, creating a new file with $n$ observations. For simplicity, we drop the sample subscript from now on. Households are again ordered by wealth rank, where a lower rank means higher household wealth: the rank of the richest household in the sample is one, the rank of the second richest is two, and so on until household $n$, whose wealth equals the threshold $y_{min}$, is reached.

Survey weights are taken into account by assigning weight $d_i = 1$ to observations in the rich list, while survey observations (a subset of the sample $s$) retain their original survey weight. Denote by $\bar{d}$ the average weight of all observations in the sample (i.e. $\bar{d} = \frac{1}{n}\sum_{j=1}^{n} d_j$). Denote the sum of all weights as $N = \sum_{j=1}^{n} d_j$, representing an estimate of the number of households that have wealth at least as high as $y_{min}$. Define $\bar{d}_i$ as the average weight of the first $i$ sample points (i.e. $\bar{d}_i = \frac{1}{i}\sum_{j=1}^{i} d_j$).

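Linear estimates for $\alpha$ can then be obtained through the following least squares specification (see also Gabaix2011), where $C$ is a constant:

$$\ln\left(\frac{(i - \tfrac{1}{2})\,\bar{d}_i}{\bar{d}}\right) = C - \alpha \ln y_i, \qquad i = 1, \dots, n \tag{3}$$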
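A minimal Python sketch of the rank regression in equation (3) follows. It assumes the tail sample (survey tail plus appended rich-list entries with weight 1) is given as plain lists; the function name is illustrative. Because the regression is fitted by ordinary least squares, $\alpha$ is minus the slope, and the normalisation inside the logarithm only shifts the intercept.

```python
import math


def pareto_alpha_rank_regression(y, d):
    """OLS estimate of alpha from ln((i - 1/2) * dbar_i / dbar) = C - alpha * ln(y_i).

    y: wealth of the tail sample (survey tail plus appended rich-list entries)
    d: weights (survey weights; 1 for rich-list entries)
    """
    pairs = sorted(zip(y, d), key=lambda p: p[0], reverse=True)  # rank 1 = richest
    n = len(pairs)
    dbar = sum(w for _, w in pairs) / n  # average weight of all n observations
    xs, zs = [], []
    cum = 0.0
    for i, (yi, wi) in enumerate(pairs, start=1):
        cum += wi
        dbar_i = cum / i  # average weight of the i richest observations
        zs.append(math.log((i - 0.5) * dbar_i / dbar))
        xs.append(math.log(yi))
    mx, mz = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    return -sxz / sxx  # the regression slope is -alpha
```

On exact Pareto quantiles with equal weights, $y_i = y_{min}\,(n/(i - \tfrac{1}{2}))^{1/\alpha}$, the left-hand side is exactly linear in $\ln y_i$, so the sketch recovers $\alpha$ up to rounding.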

As discussed earlier, chakraborty2018missing and Waltl2018 showed that this estimator produces unbiased and consistent estimates of $\alpha$ when information on top tail observations is provided. The rich list sample is used only for estimating the Pareto tail parameters. Afterwards, the adjustment method is applied to the survey sample $s$.

The third step of the adjustment consists of estimating the total wealth in the top tail by multiplying the total number of rich households $N$ resulting from the sample by the mean of the estimated Pareto distribution (given by $\frac{\alpha}{\alpha - 1}\,y_{min}$ for $\alpha > 1$). We will later use this information to calibrate the sampling weights of rich households in the survey to the total wealth implied by the Pareto adjustment.

This approach assumes that the sample estimate of $N$ (the total number of rich households) is unbiased. In practice, this assumption is unlikely to hold, because some households (the missing tail from now on) have zero probability of being included in the survey once wealth exceeds the truncation point $y_{max}$. This may be due to the difficulty of contacting such rich households even to negotiate an interview, or to a specific decision by the data producer to exclude them for operational or confidentiality reasons. Appended rich list observations will rarely be representative of all missing households. In addition, differential non-response implies that observed households in the Pareto tail are also under-represented, as the probability of a household being interviewed approaches zero the closer its wealth is to the truncation point.

As a result of the underestimation of the number of households in the Pareto tail, total wealth in the tail will also be underestimated. We therefore propose a novel method for estimating the number of missing rich households and their wealth.

Consider the sample of Pareto-tailed households ordered by wealth, and recall that $y_{max}$ is the truncation point above which there are no rich households in the sample. By the Glivenko-Cantelli theorem, the empirical cumulative distribution function of this sample converges to the distribution from which the sample is actually drawn; because of the truncation, this differs from the theoretical distribution implied by the Pareto adjustment.

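In particular, the following relation holds for $y_{min} \leq y \leq y_{max}$:

$$\hat{F}_s(y) = \frac{1}{N_s}\sum_{j \in s_p:\, y_j \leq y} d_j \;\geq\; \frac{1}{N_s + N_{max}}\sum_{j \in s_p:\, y_j \leq y} d_j = F(y) \tag{4}$$

where $F(y)$ is the CDF implied by the Pareto adjustment, $N_{max}$ is the sum of weights of all households richer than $y_{max}$ ($y > y_{max}$), and $N_s$ is the sum of the survey weights of observations in the survey Pareto tail (so that $N = N_s + N_{max}$). This relation means that the empirical CDF always has a bias greater than or equal to zero, since units whose wealth exceeds $y_{max}$ are unobserved.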

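The theoretical Pareto CDF, $F(y) = 1 - (y_{min}/y)^{\alpha}$, can then be used to correct the survey-based estimate by dividing the cumulative sum of survey weights at any point $y_i$ by the value of the Pareto CDF at that point:

$$\hat{N}_i = \frac{\sum_{j \in s_p:\, y_j \leq y_i} d_j}{1 - \left(y_{min}/y_i\right)^{\alpha}} \tag{5}$$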

Analytically, the estimate from equation (5) should be the same for each $i$-th observation in the tail. In practice, with empirical data, variability in survey weights will affect the estimate of the number of households in the tail. Because of differential non-response, this problem is particularly relevant, as weight quality can deteriorate the closer observed wealth gets to the truncation point. The estimate can then be improved by computing $\hat{N}_i$ for each value of wealth over a range of $m$ top tail observations and taking their mean:

$$\hat{N} = \frac{1}{m}\sum_{i=1}^{m} \hat{N}_i \tag{6}$$

An estimator of the number of missing, unobserved households above the truncation point can be computed as $\hat{N}_{max} = \hat{N}\,(y_{min}/y_{max})^{\alpha}$. To account for these missing households, the total number of observable households will be estimated as $\hat{N}_{obs} = \hat{N} - \hat{N}_{max}$.
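The count correction in equations (5) and (6) can be sketched as follows. The sketch rests on assumptions beyond the text: the Pareto CDF is taken as $F(y) = 1 - (y_{min}/y)^{\alpha}$, missing households are obtained from the Pareto survival share above $y_{max}$, and the function names are hypothetical. Averaging only over the `top_m` richest observations avoids points close to $y_{min}$, where $F(y)$ is near zero and the ratio is unstable.

```python
def estimate_tail_count(y, d, y_min, alpha, top_m):
    """Average of N_i = (sum of weights with y_j <= y_i) / F(y_i) over the top_m richest
    tail observations, with F(y) = 1 - (y_min / y) ** alpha (equations 5 and 6)."""
    tail = [(yi, di) for yi, di in zip(y, d) if yi >= y_min]
    richest = sorted({yi for yi, _ in tail}, reverse=True)[:top_m]
    estimates = []
    for yi in richest:
        cdf = 1.0 - (y_min / yi) ** alpha
        if cdf <= 0.0:
            continue  # F(y_min) = 0: the ratio is undefined at the threshold
        cum = sum(dj for yj, dj in tail if yj <= yi)  # weighted count up to y_i
        estimates.append(cum / cdf)
    if not estimates:
        raise ValueError("no usable observations above the threshold")
    return sum(estimates) / len(estimates)


def split_missing(n_hat, y_min, y_max, alpha):
    """Split N_hat into observable (y <= y_max) and missing (y > y_max) households,
    using the Pareto survival share (y_min / y_max) ** alpha (an assumption here)."""
    n_max = n_hat * (y_min / y_max) ** alpha
    return n_hat - n_max, n_max
```

If the tail weights were exactly right, every $\hat{N}_i$ would coincide; in real data they differ, which is why equation (6) averages them.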

Finally, the total wealth in the top tail can be estimated by the product of the estimated number of households and the Pareto mean:

$$\hat{Y}_P = \hat{N}\,\frac{\alpha}{\alpha - 1}\,y_{min} \tag{7}$$

Wealth in the missing part of the tail can similarly be computed as $\hat{Y}_{max} = \hat{N}_{max}\,\frac{\alpha}{\alpha - 1}\,y_{max}$, setting the new threshold at the truncation point $y_{max}$.[4]

[4] This is possible because the Pareto shape parameter does not change along the Pareto distribution.
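As a worked illustration of equation (7) and of the missing-tail formula, the following sketch computes total, missing, and observable tail wealth. Treating observable wealth as total minus missing is this sketch's assumption, and the function names are hypothetical.

```python
def pareto_mean(threshold, alpha):
    """Mean of a Pareto distribution starting at `threshold`; finite only if alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("the Pareto mean is infinite for alpha <= 1")
    return alpha / (alpha - 1.0) * threshold


def tail_wealth(n_hat, n_max, y_min, y_max, alpha):
    """Return (total, missing, observable) wealth in the Pareto tail."""
    total = n_hat * pareto_mean(y_min, alpha)    # equation (7)
    missing = n_max * pareto_mean(y_max, alpha)  # same alpha, threshold moved to y_max
    return total, missing, total - missing
```

With $\alpha = 1.5$ and $y_{min}$ = €1 million, the Pareto mean is €3 million, so 1,000 tail households hold about €3 billion. The missing share of tail wealth equals $(y_{min}/y_{max})^{\alpha - 1}$, which stays large for $\alpha$ close to one even when few households are missing.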

3.2 Calibration

Calibration is a method that corrects the sampling weights through re-weighting while keeping the individual responses unchanged (DevilleSarndal92; sarndal2007calibration). In the literature, this approach is referred to as design-based, and it is mainly used to force consistency of certain survey estimates with known population quantities, to reduce non-sampling errors such as non-response and coverage errors, and to improve the precision of estimates (haziza2017construction).

Alternatively, the so-called model-based approach aims at adjusting the individual responses collected through the survey while sampling weights are left unchanged. It requires a model for the distribution of the measurement error and auxiliary information to estimate the parameters of the model. Among the several models available in the literature, those most suitable for our purposes are imputation methods. For a general description, see the seminal works by Rubin (Rubin76, Rubin1987).

The two approaches have some shared traits, so that the distinction is not always clear-cut. For example, the weighting adjustment can also be seen as a method of imputation consisting of compensating for the missing responses by using those of the respondents with the most similar characteristics; in the same way, the imputation of plausible estimates in lieu of respondents’ claimed values can be thought of as a re-weighting method.

The choice of the method of adjustment is driven by three factors. First, it depends on the estimator of interest. For example, if the interest is to estimate the share of total wealth held by rich households, the use of the Pareto method (as described in section 3.1) could be sufficient. Second, the choice depends on the magnitude of the gap to fill and the reasons behind it. If the gap is considerable and depends on both measurement error and non-response, one single approach may not be sufficient. Therefore, one may need to combine several methods. Finally, the choice depends on the information that is available. If, for example, the only available auxiliary information is in the form of population totals, then the calibration approach might be the only feasible way. However, if auxiliary data are available at the individual level, then the model-based methods may represent the most effective solution.

In our case, the design-based approach offers a standard solution to the non-response problem through survey calibration, as we intend to adjust the survey's weighted empirical distribution function to account both for the truncation and for the decline in data quality as wealth increases.

In the design-based approach, estimating the population total of a variable of interest by calibration amounts to finding a new set of weights $w_i$ that solves the following optimisation problem:

$$\min_{w_i} \sum_{i \in s} \frac{G(w_i, d_i)}{q_i} \quad \text{subject to} \quad \sum_{i \in s} w_i \mathbf{x}_i = \mathbf{t}_x \tag{8}$$

where $w_i = d_i g_i$, $G(\cdot, \cdot)$ is a distance function between the basic design weights and the new calibrated weights, the $q_i$ are known constants whose role will be discussed in more detail later, and $\mathbf{x}_i$ represents an auxiliary variable, possibly vector valued. The adjustment factor $g_i$ is a function of the values on the sample of the variables used in the calibration procedure, $\mathbf{x}_i$, and it is computed so that the final weights meet the benchmark constraints, $\sum_{i \in s} w_i \mathbf{x}_i = \mathbf{t}_x$, while at the same time being kept as close as possible to the initial ones. Closeness can be defined by means of several distance functions (see table 1 in DevilleSarndal92), the most common being the chi-squared type

$$G(w_i, d_i) = \frac{(w_i - d_i)^2}{2\,d_i} \tag{9}$$

for which an analytical solution exists as long as the calibration variables are not collinear in the sample. The benchmark constraints are defined with respect to $\mathbf{t}_x$, the known vector of population totals or counts of the calibration variables.

The final output is a single new set of weights to be used for all variables. The magnitude of the adjustment factors, and therefore the variability of the final set of weights, is a function of the number of constraints (the dimension of the vector $\mathbf{x}_i$) and of the imbalance (the difference between the Horvitz-Thompson estimate and the population total). Highly variable weights degrade the quality of final estimates for sub-populations and for variables that are not involved in the calibration procedure. For these reasons, weights are usually required to meet range restrictions, such as being positive and/or lying within a chosen range. This can be achieved by suitably choosing and tuning the distance function $G$.
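The chi-squared case of (8) and (9) has a closed form: minimising the Lagrangian gives $w_i = d_i\,(1 + q_i\,\mathbf{x}_i^{\top}\boldsymbol{\lambda})$, where $\boldsymbol{\lambda}$ solves $\left(\sum_{i} d_i q_i \mathbf{x}_i \mathbf{x}_i^{\top}\right)\boldsymbol{\lambda} = \mathbf{t}_x - \sum_{i} d_i \mathbf{x}_i$. The pure-Python sketch below implements this linear calibration with hypothetical function names. It does not enforce range restrictions, so it can return negative weights when the imbalance is large; bounded distance functions would be needed for that.

```python
def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("singular system: calibration variables are collinear")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def linear_calibration(d, X, totals, q=None):
    """Chi-squared calibration: w_i = d_i * (1 + q_i * x_i' lam) with sum_i w_i x_i = totals."""
    n, p = len(d), len(totals)
    q = [1.0] * n if q is None else q
    ht = [sum(d[i] * X[i][k] for i in range(n)) for k in range(p)]  # Horvitz-Thompson totals
    T = [[sum(d[i] * q[i] * X[i][k] * X[i][l] for i in range(n)) for l in range(p)]
         for k in range(p)]
    lam = solve(T, [totals[k] - ht[k] for k in range(p)])
    return [d[i] * (1.0 + q[i] * sum(X[i][k] * lam[k] for k in range(p))) for i in range(n)]
```

For example, four households with design weight 10 and wealth 1, 2, 3, 4, calibrated to a count of 50 and a wealth total of 140, receive weights 8, 11, 14, and 17.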

The method was originally proposed to improve the efficiency of estimators and to ensure coherence with population information, but it has since been widely applied to adjust for non-response (Sarndal2005). For example, little2005does showed that if the variables used to construct the weights are associated both with non-participation and with the variable of interest, both the bias and the variance of the estimator are reduced.

The main problem with the use of household balance sheet data in re-weighting methods is that wealth is generally skewed and concentrated in the hands of a small group of the population that has both low propensity to participate in the survey and different socio-demographic characteristics from the average population.

3.3 Adjusting for non-response: Pareto-calibration

We begin by exploiting the information obtained after fitting a Pareto distribution, as in subsection 3.1, to adjust the wealth distribution in the survey for differential non-response using the calibration methods described in section 3.2.

We proceed by using $\alpha$, $y_{min}$ and equation (6) to estimate the total number of observable households above the threshold, $\hat{N}_{obs}$, and their total net wealth, $\hat{Y}_{obs}$ (and the corresponding figures for households below the threshold $y_{min}$).

We then calibrate the sampling weights of sample $s$ using the following constraints:

$$\sum_{i \in s} w_i \mathbf{x}_i = \left(\hat{N}_{obs},\; N_{\bar{P}},\; \hat{Y}_{obs},\; \hat{\mathbf{Y}}_{\bar{P}}^{\top},\; \mathbf{t}_z^{\top}\right)^{\top} \tag{10}$$

where $\hat{N}_{obs}$ is the estimated number of observed households in the Pareto tail, $N_{\bar{P}}$ relates to the observations not in the tail, $\hat{Y}_{obs}$ is the estimated observable wealth in the Pareto tail, $\hat{\mathbf{Y}}_{\bar{P}}$ is a vector of Horvitz-Thompson estimators decomposing the initial wealth of observations below the threshold into their corresponding portfolio items,[5] and $\mathbf{t}_z$ is a vector of population counts for demographic characteristics.

[5] Because weights in the bottom part of the distribution are calibrated to the initial, unadjusted wealth in that part of the survey, average wealth among these observations will increase. To account for this issue, the calibration benchmark could be adjusted downwards accordingly. However, the disparity between the number of households in the Pareto tail and in the bottom part of the distribution is so large that this adjustment is unlikely to affect our analysis. Therefore, in order not to over-stress the computational requirements of the model, and to focus on the part of the distribution where the effect of Pareto-calibration is significant, wealth in the non-Pareto part of the survey has been kept fixed.

Let the indicator variable take the value 1 for households in the Pareto tail (that is, above the threshold) and 0 otherwise; the auxiliary variables vector for calibration is then set to

(11)

After calibrating the survey data to these benchmarks, we obtain non-response-adjusted weights. From now on, this approach will be referred to as ‘Pareto-calibration’.
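The Pareto-calibration step can be sketched as a linear (chi-square distance) calibration in the Deville-Särndal style. This is an illustration only: the toy data, the threshold and the benchmark totals are hypothetical, the auxiliary vector is a simplified stand-in for (11) (tail indicator, non-tail indicator, tail wealth and one demographic count, without the portfolio decomposition), and the paper's own distance function and bounds may differ. Wealth is expressed in millions of euros to keep the linear system well conditioned.

```python
import numpy as np

def linear_calibration(d, X, totals, q=None):
    """Linear (chi-square distance) calibration of design weights.

    d      : (n,) initial weights
    X      : (n, p) auxiliary variables, one row per household
    totals : (p,) benchmark totals to be reproduced
    q      : (n,) optional constants of the distance function (default 1)
    Returns w = d * g with g_i = 1 + q_i * x_i' lam, so that X' w = totals.
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    q = np.ones_like(d) if q is None else np.asarray(q, dtype=float)
    # Normal equations: (sum_i d_i q_i x_i x_i') lam = totals - sum_i d_i x_i
    M = X.T @ (X * (d * q)[:, None])
    lam = np.linalg.solve(M, np.asarray(totals, dtype=float) - X.T @ d)
    return d * (1.0 + q * (X @ lam))

# Hypothetical toy survey: net wealth (EUR million), design weights, home-ownership dummy.
wealth = np.array([0.05, 0.12, 0.30, 0.80, 1.50, 4.00])
d = np.array([400.0, 350.0, 300.0, 200.0, 60.0, 20.0])
owner = np.array([0.0, 1.0, 1.0, 1.0, 1.0, 1.0])
threshold = 0.70                              # hypothetical Pareto threshold
tail = (wealth >= threshold).astype(float)    # 1 if the household is in the Pareto tail

# Simplified auxiliary vector: tail count, non-tail count, tail wealth, demographic count.
X = np.column_stack([tail, 1.0 - tail, tail * wealth, owner])
# Hypothetical benchmarks (Pareto-based tail estimates plus a population count).
totals = np.array([350.0, 1000.0, 420.0, 930.0])
w = linear_calibration(d, X, totals)
```

Linear calibration can produce negative or extreme weights in less benign data; bounded distance functions are commonly used in practice to prevent this.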

Should the survey suffer from differential non-response only, this step might be sufficient to close the gap with the financial accounts. However, this is not always the case: provided that we have a good approximation of the wealth distribution in the tail, the remaining differences in coverage between the estimate obtained in equation (7) and the national accounts can then be attributed to measurement error.

3.4 Adjusting for non-response and measurement error: Simultaneous approach

In order to correct for measurement error, we combine the adjustment for differential non-response described in subsection 3.3 with the following procedure.

The first step is to run the Pareto-calibration adjustment, as described earlier; its output is the final set of weights from the non-response adjustment procedure.

As a second step, we run a calibration procedure as in (8), in which the non-response-adjusted weights are treated as the basic weights and the benchmark constraints are given by the macro aggregates. The household-specific adjustment factor obtained by this procedure is such that

(12)

We apply this adjustment factor directly to the variables of interest so that

(13)

This approach shares some traits with the reverse calibration introduced by chambers2004outlier to deal with outlier-robust imputation.

Recall that the variable of interest is vector-valued, with one component per portfolio item. This calibration is therefore multivariate: it handles all constraints with respect to the macro estimates in a single procedure and thus accounts for the multivariate structure of the variables in that vector. In addition, every household receives its own adjustment factor, which depends on all the components of its vector.
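A minimal sketch of the multivariate calibration in (12) and (13), assuming a chi-square distance with all constants set to 1: the adjustment factors are computed exactly as if weights were being calibrated, but are then applied to the reported values rather than to the weights. The function name `calibration_factors`, the two instruments and all figures are hypothetical.

```python
import numpy as np

def calibration_factors(w, Y, totals, q):
    """Linear calibration factors g_i such that sum_i w_i g_i y_i = totals.

    w      : (n,) non-response-adjusted weights, treated as basic weights
    Y      : (n, k) reported portfolio items, one column per instrument
    totals : (k,) macro aggregates (e.g. financial accounts totals)
    q      : (n,) constants of the chi-square distance function
    """
    M = Y.T @ (Y * (w * q)[:, None])
    lam = np.linalg.solve(M, totals - Y.T @ w)
    return 1.0 + q * (Y @ lam)

# Hypothetical toy data: 5 households, 2 instruments (deposits, shares), EUR thousand.
w = np.array([300.0, 250.0, 200.0, 100.0, 50.0])
Y = np.array([[10.0, 0.0], [20.0, 5.0], [15.0, 20.0], [40.0, 60.0], [80.0, 300.0]])
macro = np.array([24000.0, 40000.0])   # hypothetical macro totals (survey: 19000, 26250)

g = calibration_factors(w, Y, macro, q=np.ones(len(w)))
Y_imputed = g[:, None] * Y             # one factor per household, applied to every item
```

Because the weights are left untouched and only the values move, the weighted item totals of `Y_imputed` match the macro aggregates while each household keeps a single, household-specific factor.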

A special case of multivariate calibration is proportional allocation, which consists of allocating the gap by multiplying each component of the vector of variables of interest by the inverse of the corresponding item-specific coverage ratio.[6]

[6] In fact, if we focus on a single item, the adjustment factor used by proportional allocation can be obtained as the solution to a univariate calibration procedure in which the starting weights are again the non-response-adjusted weights, there is only one benchmark constraint (the macro aggregate for that item), and the distance function is chi-squared as in (9), with constants equal to the inverse of the item's reported value. The proof is omitted for brevity, but it is close in spirit to Example 1 in DevilleSarndal92.

This equivalence sheds some light on the role of the constants in the distance function (8). In univariate calibration, if they are chosen to be the inverse of the variable in the constraint, the adjustment factors are shrunk towards a common value for all households, as in proportional allocation. By contrast, if they are set to be constant, the adjustment factors are roughly proportional to the values of the item. For this reason, in the proposed multivariate calibration for imputation, we allow the constants to depend on the wealth of the household, that is

(14)

where the parameter can be seen as a shrinkage factor: larger values yield adjustment factors that are more uniform across households, while values towards 0 yield adjustment factors with higher variability and a stronger correlation with wealth.[7]

[7] For this work, we fix the value of the shrinkage factor. Future research might seek to retrieve information on it using external data where no misreporting behaviour is present.
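The two limiting choices of the constants can be checked numerically in the univariate case. The hypothetical sketch below uses one benchmark and a chi-square distance: with constants equal to the inverse of the item, every household receives the same factor, namely the inverse of the coverage ratio (proportional allocation); with constant constants, the factors grow with the reported value. It does not reproduce the exact form of (14), and all figures are made up.

```python
import numpy as np

def univariate_factors(w, x, total, q):
    """Chi-square calibration with a single benchmark: g_i = 1 + q_i * x_i * lam."""
    lam = (total - np.sum(w * x)) / np.sum(w * q * x**2)
    return 1.0 + q * x * lam

w = np.array([100.0, 80.0, 50.0, 20.0])   # hypothetical basic weights
x = np.array([5.0, 10.0, 40.0, 200.0])    # hypothetical reported item values
total = 12000.0                           # hypothetical macro aggregate

coverage = np.sum(w * x) / total                              # survey total / macro total
g_prop = univariate_factors(w, x, total, q=1.0 / x)           # constants = inverse of the item
g_flat = univariate_factors(w, x, total, q=np.ones_like(x))   # constant constants
```

Both sets of factors hit the same total; they differ only in how the gap is spread across households.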

To account for the missing wealthy households, a single observation is created and imputed at the top of the sample, with weight equal to the estimated number of unobserved households and wealth equal to their estimated average wealth. Its portfolio is allocated using the portfolio shares in the Pareto tail of the distribution.
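A minimal sketch of how such a pseudo-household could be built, assuming, for illustration only, that the unobserved households are those above a hypothetical `cutoff` in a Pareto tail with shape `alpha`, threshold `w_min` and `n_tail` households; the paper's own definition of the missing tail (subsection 3.1) may differ. It relies on two standard Pareto results: the share of tail households above a level u is (u / w_min)^(-alpha), and their mean wealth is alpha * u / (alpha - 1) when alpha > 1.

```python
import numpy as np

def missing_tail_observation(n_tail, alpha, w_min, cutoff, tail_shares):
    """Hypothetical pseudo-household for the unobserved top of a Pareto tail.

    Returns (weight, mean wealth, portfolio split with the tail portfolio shares).
    """
    if alpha <= 1:
        raise ValueError("the Pareto mean is finite only for alpha > 1")
    weight = n_tail * (cutoff / w_min) ** (-alpha)   # households above the cutoff
    mean_wealth = alpha * cutoff / (alpha - 1.0)     # E[W | W > cutoff]
    portfolio = mean_wealth * np.asarray(tail_shares, dtype=float)
    return weight, mean_wealth, portfolio

# Hypothetical usage: 200,000 tail households, alpha = 1.5, threshold EUR 1m, cutoff EUR 100m.
weight, mean_wealth, portfolio = missing_tail_observation(
    n_tail=200_000, alpha=1.5, w_min=1e6, cutoff=1e8, tail_shares=[0.2, 0.5, 0.3])
```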

At the end of the multivariate calibration the gap is filled. However, the distribution of wealth has changed, because its components have changed. Some households initially classified as not rich may have moved into the top tail of the wealth distribution. Therefore, we need to find the new Pareto threshold and apply the Pareto-calibration procedure described earlier again. This requires an iterative procedure that alternates a Pareto-calibration step, which improves coverage, and a multivariate calibration step, which addresses measurement error. The two steps are iterated until convergence, which is assessed on the Pareto shape parameter: if the estimates in two consecutive steps differ by less than a small predefined threshold, the procedure stops.[8]

[8] It is worth stressing that convergence is not guaranteed, especially when the gap to be filled is sizeable.
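The alternating procedure can be sketched as a generic loop. The two step functions are hypothetical callables standing in for the Pareto-calibration and the multivariate calibration; the default tolerance of 0.05 matches the value reported for the application in section 5, while the iteration cap is an assumption added so that the loop stops when convergence fails.

```python
def iterate_until_alpha_converges(data, pareto_calibration_step, measurement_error_step,
                                  tol=0.05, max_iter=20):
    """Alternate the two adjustment steps until the Pareto shape estimate stabilises.

    pareto_calibration_step(data) -> (data, alpha_hat)   # hypothetical
    measurement_error_step(data)  -> data                # hypothetical
    Returns (adjusted data, last alpha estimate, number of iterations).
    """
    data, alpha_prev = pareto_calibration_step(data)
    for iteration in range(1, max_iter + 1):
        data = measurement_error_step(data)
        data, alpha = pareto_calibration_step(data)
        if abs(alpha - alpha_prev) < tol:
            return data, alpha, iteration
        alpha_prev = alpha
    raise RuntimeError("Pareto shape parameter did not converge")

# Toy stand-ins: `data` is just an iteration counter and alpha drifts towards 1.5.
def toy_pareto_step(k):
    return k, 1.5 + 0.5 ** k

def toy_measurement_step(k):
    return k + 1

result = iterate_until_alpha_converges(0, toy_pareto_step, toy_measurement_step)
```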

3.5 A special case: Single-iteration approach

If one is willing to assume (1) that the relative error is independent of observed wealth, at least among the very rich, and (2) that the relative error converges in probability to a constant, so that, on average, the unobserved ‘true’ total wealth is given by reported wealth rescaled by that constant, the method simplifies.

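By Slutsky’s theorem, survey wealth would then still be Pareto distributed, with the same tail parameter, after adjusting for measurement error. It follows that total wealth in the survey would scale up by the same constant, and the Pareto CDF would simply be rescaled: the threshold is multiplied by that constant, while the shape parameter is unchanged.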

Simplifying this last formula and updating equation (7) for measurement error, we obtain the following estimate for total wealth:

(15)

This means that our estimate of the Pareto shape parameter does not depend on the scaling of the variables. In this case, the ratio between national accounts total wealth and the Pareto-adjusted survey estimate (the inverse of the Pareto-adjusted coverage ratio) yields the scalar by which reported survey wealth is re-allocated. It follows directly that, to account for the missing wealth, wealth should be scaled by this factor, an expression that simplifies further after Pareto-calibration.

As the Pareto shape parameter is unaffected by the re-scaling, the iterative procedure is no longer needed: the adjustments for measurement error and for non-response at the tail of the distribution can be run independently of each other.
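The invariance argument can be illustrated with a weighted maximum-likelihood (Hill-type) estimator of the Pareto shape parameter, used here as a stand-in for the paper's estimator: rescaling all wealth values and the threshold by the same factor leaves every ratio of wealth to threshold, and hence the estimate, unchanged. The simulated data and the national accounts total are hypothetical.

```python
import numpy as np

def pareto_alpha_mle(wealth, weights, threshold):
    """Weighted ML (Hill-type) estimate of the Pareto shape from observations >= threshold."""
    wealth = np.asarray(wealth, dtype=float)
    weights = np.asarray(weights, dtype=float)
    tail = wealth >= threshold
    return weights[tail].sum() / np.sum(weights[tail] * np.log(wealth[tail] / threshold))

rng = np.random.default_rng(0)
threshold = 1.0e6
# numpy draws Lomax (Pareto II); adding 1 and multiplying by the scale gives classical Pareto.
wealth = threshold * (1.0 + rng.pareto(1.5, size=5000))
weights = rng.uniform(50, 150, size=5000)

na_total = 1.3 * np.sum(weights * wealth)      # hypothetical national accounts total
scale = na_total / np.sum(weights * wealth)    # inverse of the coverage ratio
alpha_before = pareto_alpha_mle(wealth, weights, threshold)
alpha_after = pareto_alpha_mle(scale * wealth, weights, scale * threshold)
```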

In theory, under the assumptions above, whatever adjustment method for measurement error is used, the final data should still be Pareto distributed among rich households.

In practice, to make sure that this is the case, it is advisable to correct for measurement error using calibration in a slightly different manner from the one described in section 3.2. Traditional calibration methods find the optimal adjustment factor that minimises the quadratic distortion of new weights relative to prior ones. We propose instead to change the objective function so that the adjustment factor minimises a quadratic loss function defined on reported wealth values, as follows:

(16)

In this method, the correction for measurement error is based on univariate calibration using total wealth as the sole benchmark. As the objective function minimises distortions relative to the initially reported values, the final imputed data will remain Pareto distributed.
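To see why a loss defined on values can preserve the Pareto shape, suppose, as an assumption since the exact form of (16) is not reproduced here, that the loss is the chi-square distance between imputed values y*_i and reported values y_i > 0, with weights w_i and macro total T:

```latex
\begin{aligned}
&\min_{y^*_1,\dots,y^*_n}\ \sum_i w_i \frac{(y^*_i - y_i)^2}{y_i}
  \quad\text{subject to}\quad \sum_i w_i y^*_i = T,\\
&\mathcal{L} = \sum_i w_i \frac{(y^*_i - y_i)^2}{y_i} - 2\lambda\Big(\sum_i w_i y^*_i - T\Big),\\
&\frac{\partial \mathcal{L}}{\partial y^*_i} = 2 w_i \frac{y^*_i - y_i}{y_i} - 2\lambda w_i = 0
  \;\Longrightarrow\; y^*_i = (1+\lambda)\, y_i,\\
&\sum_i w_i (1+\lambda)\, y_i = T \;\Longrightarrow\; y^*_i = \frac{T}{\sum_j w_j y_j}\, y_i .
\end{aligned}
```

Under this assumed loss, every household's wealth is multiplied by the same factor, so the tail keeps its shape parameter and only the threshold is rescaled.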

4 Assessment of the method

The ideal approach for assessing the quality of the results would be to compare them with an external benchmark, for instance, coming from highly reliable administrative records. Without such auxiliary information, we can assess the method in two ways. First, we assess the robustness of our results by comparing them with other estimators based on different assumptions. Second, we assess the precision of our results by estimating their variability.

Beyond our simultaneous approach, we compute five alternative estimators:

  • ‘Survey & missing tail’. The results are produced using the unadjusted survey data, plus an estimate of the total wealth held by rich households with zero probability of being in the survey (missing tail).

  • ‘Pareto-calibration & missing tail’. Survey data are adjusted with the Pareto-calibration model. Survey weights are calibrated and the total wealth of the missing tail is included in the estimate.

  • ‘Par-cal, proportional allocation & missing tail’. This method adds to the previous one a correction for measurement error based on proportional allocation, as in oecd2013. This is a very naive method, based on the assumption that measurement error is equal across households and depends only on the instrument. Moreover, it does not allow for adjustments for non-reporting.

  • ‘Single-iteration approach & missing tail’. In this method, the correction for measurement error is based on the univariate calibration method described in subsection 3.5. Adjustments are applied to the y variable (gross wealth). After rescaling the threshold to account for measurement error, the missing tail is re-estimated and included.

  • ‘Single-iteration approach, portfolio calibration & missing tail’. This method extends the previous one by adding an extra step in which portfolios are calibrated using financial accounts totals – adjusted to account for the missing part of the tail – and Pareto distributional information as benchmarks. Calibration in the extra step works again on weights.

Variance estimation in our methodology has two main components. The first is the sampling variance, which reflects the variability introduced by observing a sample instead of enumerating the whole population, assuming that the information collected in the survey is otherwise exactly correct. The second is the imputation variance, which reflects the fact that the methodology for filling the gap can produce several different plausible imputed datasets. The uncertainty due to the imputation process adds to the sampling variance.

To estimate the overall variability, we use the Rao-Wu rescaled bootstrap weights released with the HFCS data to account for sampling variability (hfcs2020a). For each of the 1,000 sets of bootstrap weights, we replicate all the methods previously described. In each replication, the parameters of the Pareto distribution are re-estimated, introducing additional variability. We then obtain the mean and standard deviation from all successful simulations[9] and compute the coefficient of variation to evaluate the robustness of our methods and derive a measure of their variability.

[9] A simulation is flagged as unsuccessful, and discarded, whenever a calibration procedure fails to converge under the chosen constraints.
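The replication scheme can be sketched as follows. `estimator` is a hypothetical callable that would re-run the whole adjustment (including the Pareto re-estimation) on one set of replicate weights; the Rao-Wu weights themselves are taken as given, since they are released with the HFCS. Failed calibrations are assumed to surface as exceptions and are discarded, mirroring the rule for unsuccessful simulations.

```python
import numpy as np

def bootstrap_cv(estimator, data, replicate_weights):
    """Mean, SD and coefficient of variation of an estimator over replicate weights.

    Replications whose estimator raises (e.g. calibration non-convergence) are dropped.
    Returns (mean, sd, cv, number of successful replications).
    """
    estimates = []
    for weights in replicate_weights:
        try:
            estimates.append(estimator(data, weights))
        except (RuntimeError, np.linalg.LinAlgError):
            continue  # unsuccessful simulation: discarded
    estimates = np.asarray(estimates, dtype=float)
    mean = estimates.mean()
    sd = estimates.std(ddof=1)
    return mean, sd, sd / abs(mean), len(estimates)

# Toy usage: a weighted mean stands in for the full adjustment pipeline.
wealth = np.array([10.0, 20.0, 30.0, 40.0])

def weighted_mean(x, w):
    if w.sum() <= 0:
        raise RuntimeError("calibration failed")  # stands in for non-convergence
    return np.sum(w * x) / np.sum(w)

reps = [np.array([1.0, 1, 1, 1]), np.array([2.0, 1, 1, 0]),
        np.array([0.0, 0, 0, 0]), np.array([0.0, 1, 1, 2])]
mean, sd, cv, n_ok = bootstrap_cv(weighted_mean, wealth, reps)
```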

5 Application to the HFCS

The method described in the previous sections has been applied to the second (2014) wave of the HFCS. The first step consists of estimating the parameters of the Pareto distribution (the shape parameter and the threshold). Figure 1 illustrates the automatic selection of the threshold for the four selected countries, showing the estimated threshold and, given this threshold, linear fits for the mean excess conditional on wealth. Table 1 summarises the final results. This approach provides benefits over an arbitrary threshold selection: in all cases, the new threshold is found to be lower than €1 million, so that more observations fall in the tail and subsequent estimates of tail behaviour should gain considerably in precision.
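For reference, a weighted empirical mean excess function, the quantity plotted in Figure 1, can be computed as below. For a Pareto distribution with shape alpha > 1, the mean excess above any level u beyond the threshold equals u / (alpha - 1), which is why a linear fit is informative. The sketch only computes the curve; the automatic threshold-selection rule is not reproduced, and the data are hypothetical.

```python
import numpy as np

def weighted_mean_excess(wealth, weights, thresholds):
    """Weighted empirical mean excess e(u) = E[W - u | W > u] for each u in `thresholds`."""
    wealth = np.asarray(wealth, dtype=float)
    weights = np.asarray(weights, dtype=float)
    values = []
    for u in thresholds:
        above = wealth > u
        if not above.any():
            values.append(np.nan)  # no observations beyond u
            continue
        values.append(np.sum(weights[above] * (wealth[above] - u)) / np.sum(weights[above]))
    return np.array(values)

# Simulated Pareto data (shape 3, threshold 1): the slope should be near 1 / (3 - 1) = 0.5.
rng = np.random.default_rng(1)
sim = 1.0 + rng.pareto(3.0, size=20_000)
grid = np.linspace(1.0, 3.0, 9)
slope, intercept = np.polyfit(grid, weighted_mean_excess(sim, np.ones_like(sim), grid), 1)
```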

Figure 2 illustrates the outcome of the Pareto-calibration process, showing the empirical CCDF on a log-log scale before and after the adjustment. Re-weighted figures are produced using the proposed Pareto-calibration method. The figure reports two estimates of the Pareto shape parameter: one obtained by imputing the rich list, and one obtained from survey data only.

Table 2 shows coverage ratios between survey wealth estimates and the financial accounts. Column (1) shows initial coverage ratios, column (2) displays coverage ratios for the adjusted data, and column (3) grosses up survey wealth by estimating total wealth after truncation and adding it to the previous estimate. Columns (4) and (5) show the estimated number of households in the Pareto tail, along with the number of “missing rich”.

Overall, these figures suggest that the proposed Pareto-calibration approach can produce substantial improvements in survey coverage, especially in the absence of over-sampling. In the case of Finland and Germany, the discrepancies between micro and macro figures virtually disappear after calibrating survey data and accounting for the unobservable households. Coverage is also significantly improved for Italy and France, but the persistence of a mismatch between survey data and financial accounts points to the presence of measurement error.

Having re-estimated the number of households in the Pareto tail of the survey, our method also shows substantial improvements in coverage over the grossing up methods already explored in the literature, and suggests that adjustments for non-response should also focus on correcting the number of households in the Pareto tail, rather than only the wealth contained in it.

After dealing with the issue of nonresponse at the tail of the distribution, we use multivariate calibration to adjust for measurement error along the whole distribution.

As benchmark constraints we use the financial instruments with high conceptual comparability between survey and financial accounts – namely, deposits, bonds, shares, funds, insurances and pensions, money owed to the household and liabilities – following from the comparability scale provided by eglmm. The resulting adjustment factors are then applied to financial instruments with lower comparability – business and housing wealth – which, assuming that measurement error is comparable within comparable financial instruments, should ensure that the adjustment will not be biased by the presence of instruments with low comparability.

We then iterate the Pareto-calibration and the multivariate calibration until convergence, which is assessed on the shape parameter of the Pareto distribution: if the estimated values in two consecutive steps differ by less than a small predefined tolerance,[10] the procedure stops. Convergence is usually achieved in a few steps (between 1 and 3 in the application at hand).

[10] In the current application, this tolerance was set at 0.05.

Table 3 shows the average values of the adjustment factors (along with their coefficients of variation) by gross wealth percentile at the end of the iterative procedure for the four countries. These are the overall adjustments of the survey variables at the end of the procedure, obtained as the ratio between the final imputed values and the original survey values.

6 Results

Table 4 shows distributional results indicating the proportion of net wealth held by the top 1, 5, 10, and 20 weighted percentiles, along with the bottom 50%. Weighted Gini inequality indices are also presented in column (6), while column (7) provides the estimated Pareto tail parameter given the data. These figures are reproduced under each allocation method. The bootstrap-based coefficient of variation is reported in parentheses for each estimate.
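The indicators in Table 4 can be computed from weighted micro data along the lines sketched below. This follows one common convention (households are split at the percentile boundary, and the Gini index treats weights as population frequencies); the paper's exact implementation is not stated, so the function names and conventions are assumptions. With negative net wealth the formulas still compute, but the Gini index may exceed 1.

```python
import numpy as np

def top_share(wealth, weights, top):
    """Share of total wealth held by the richest `top` fraction (e.g. 0.01) of the population."""
    wealth = np.asarray(wealth, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(wealth)[::-1]               # richest first
    x, w = wealth[order], weights[order]
    cum = np.cumsum(w) / w.sum()                   # population share up to each household
    prev = np.concatenate(([0.0], cum[:-1]))
    # Fraction of each household's weight inside the top group (split at the boundary).
    inside = np.clip((top - prev) / (cum - prev), 0.0, 1.0)
    return np.sum(inside * w * x) / np.sum(w * x)

def weighted_gini(wealth, weights):
    """Gini index treating weights as population frequencies (O(n^2) pairwise form)."""
    x = np.asarray(wealth, dtype=float)
    w = np.asarray(weights, dtype=float)
    mean = np.sum(w * x) / w.sum()
    pair_diffs = np.abs(x[:, None] - x[None, :])
    return np.sum(w[:, None] * w[None, :] * pair_diffs) / (2.0 * w.sum() ** 2 * mean)

# The bottom-50% share is 1 - top_share(wealth, weights, 0.5) under this convention.
```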

The first set of rows (‘Base Survey’) presents distributional figures from the unadjusted HFCS data. As is well known, truncation in the top of the wealth distribution and measurement error can cause survey estimates to understate the true level of wealth inequality, and the figures presented in the table support this possibility. Indeed, estimates from the unadjusted HFCS would suggest wealth inequality in Italy, which has one of the largest micro-macro gaps, to be close to the inequality level in Finland, where the gap is smaller.

Column (7) displays the Pareto tail coefficients. In the first set of rows, the parameter is estimated using survey data only, meaning that this is the Pareto estimate that survey data yields when truncation is not corrected through the imputation of a rich list.

For all following sets of rows, which correspond to the alternative estimators discussed in section 4, we also include an adjustment for the unobserved part of the Pareto tail as presented in section 3.1. To do so, these missing households are imputed as a single observation in which the weight and wealth are respectively equal to the estimated number of unobserved households and the estimated average wealth in the unobserved Pareto tail.

The second set of rows (‘Survey & missing tail’) displays estimates produced using the un-adjusted survey data, plus the missing tail households. Depending on the size of the truncation in the Pareto tail, inequality estimates can be affected considerably. For surveys, such as the Italian and German ones, in which truncation bias is particularly pronounced, the sole inclusion of these unobserved households increases the proportion of wealth held by the top 1% households by at least 10.7 and 10.3 percentage points, respectively. This increase is much less pronounced for the French and Finnish surveys, where the truncation is also much more modest.

The inclusion of the unobserved tail raises inequality levels for all the surveys considered, but again these increases are proportional to the size of the truncation. Finally, estimates for the Pareto tail parameter are now corrected for the truncation by imputing the rich list and using the estimation procedure described in section 3. These are the same parameters earlier shown in Figure 2.

Survey weights are then adjusted using the proposed Pareto-calibration method to produce the figures shown in the third set of rows (‘Pareto-calibration & missing tail’). After this adjustment, between-country differences across distributional indicators start to decline. This time, an increase in inequality, while less remarkable than in the previous step, can still be noted across all surveys. Should there be a reason to suspect that survey weights degrade due to differential non-response, this increase suggests that the proposed adjustment can make an important contribution in the measurement of inequality through the adjustment of existing survey data points.

After applying the adjustment, the tail parameters are re-estimated and shown in column (7). Their closeness to the initial Pareto estimates, shown in the previous set of rows, suggests that the calibration process does minimise distortions from the estimated Pareto distribution, even where truncation and differential non-response are more severe. Improvements can also be noted in the parameters obtained without imputation of the rich list, which are now closer to the rich-list-imputed parameters, as shown in Figure 2.

The row sets from fourth to sixth adjust the survey applying the estimators described in section 4. For countries like Finland and Germany, where measurement error seems to be a negligible issue, these adjustments might not be needed, and remaining divergences in portfolio item coverage against macroeconomic aggregates should be treated as sampling issues and adjusted through weight calibration, as detailed in section 4, and shown in the last set of rows in table 4.

In the fourth set of rows (‘Par-cal, proportional allocation & missing tail’), portfolio items are scaled proportionally to the Financial Accounts aggregates. Proportional allocation, however, seems to be an inadequate solution. While proportionally allocated items do not generate severe distortions in the estimated Pareto distribution, proportional allocation is likely to affect the portfolio allocation within each household. Since it rests on very unreliable assumptions, this method should only be considered when the gap to be filled is minimal.

The fifth (‘Par-cal, Wealth calibration, & missing tail’) and sixth (‘Par-cal, Wealth/portfolio calibration, & missing tail’) sets of rows show how distributional figures are affected by the single-iteration approach described in subsection 3.5.

In both cases, substantial differences from proportional allocation can be noted. First, the Pareto tail parameter is always closer to the initial estimate, meaning that this time the reallocation process leaves the distributional features of the survey intact. Second, inequality figures are much closer to the estimates produced in the previous steps. Indeed, the final output shows comparable results across all surveys, in which the increases in inequality, compared with the initial survey data, are proportional to the severity of both truncation and measurement error problems.

Most importantly, the Pareto tail parameter remains close to the one estimated initially, suggesting once again that neither adjustment introduces unnecessary distortions in the tail of the wealth distribution. This is a relevant result, as it is consistent with the assumption that the relative error converges in probability to a constant.

While wealth calibration should not be treated as a substitute for proper models for adjusting for measurement error, especially when this error is linked to socio-economic or behavioural factors, these calibration-based methods can still assist in the production of distributional figures without exposing the researcher to the risk of misrepresenting the distribution of household wealth and individual asset compositions.

Also, portfolio calibration (as in the penultimate set of rows) can help when measurement error is assumed to be negligible (Finland and, to a lesser degree, Germany), as well as when models have already been used to address this problem. In these cases, the wealth calibration step can be skipped entirely, while the portfolio calibration can be paired with Pareto-calibration within the same step, so that the weighted sum of each portfolio item is kept consistent with the corresponding macroeconomic aggregate, producing consistent distributional figures.

The results obtained using the simultaneous approach are presented in the final set of rows. Here, the distributional estimates are broadly in line with the results produced by the other methods. In particular, they are very similar to those of the single-iteration approach, suggesting that the latter's simplifying assumptions are likely to hold, at least in the four countries used in the analysis.

Overall, all the methods consistently show that household finance surveys underestimate the level of wealth inequality. Moreover, the larger the wealth gap between micro and macro data, the larger the increase in the measures of inequality.

As to variance estimation, the adjustment methods generally produce a decrease in the reliability of the results. This is expected since they add some additional variability because of the imputation process.

For each method, precision is higher for statistics relating to the bottom or middle of the wealth distribution. The estimators of the wealth share held by the top 1 percent have low precision in all countries.

Compared with the other methods, the simultaneous approach produces the smallest increase in variability. This is partly due to the use of multivariate calibration, a method originally developed to increase the precision of estimators. The final coefficients of variation are not very different from those based on the unadjusted survey data, especially for the statistics that do not relate to the top tail of the distribution.

7 Conclusions

In this paper, we show how a combination of well-established methodologies for fitting a Pareto distribution and calibrating survey data can be used to adjust survey wealth for non-response and misreporting when only limited external information is available. These methods build upon existing methods for estimating the missing rich in wealth surveys, and complement them by focusing on the observable part of the household population.

We apply these methods to the HFCS data, using the 2014 Finnish, French, German, and Italian surveys, and employing rich list data from Forbes or national press sources, along with household sector aggregates from the financial accounts, as auxiliary sources of information.
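The rich-list data enter through the fit of the Pareto tail. The Python sketch below shows a rank-size regression in the spirit of Vermeulen's approach, under stated assumptions: survey households above a known threshold keep their survey weights, rich-list entries get a weight of one, and the log of each household's mid-point weighted rank (divided by the tail total) is regressed on log wealth, so that the slope estimates minus α. It is not a reproduction of Vermeulen's exact specification or of the authors' code; the simulated population, the threshold and the function name are hypothetical.

```python
import numpy as np


def pareto_alpha_regression(wealth, weights, threshold):
    """Rank-size estimate of the Pareto tail parameter alpha above `threshold`.

    Under a Pareto tail, log P(W >= w) = const - alpha * log(w), so the slope of
    log(weighted rank / tail total) on log(wealth) estimates -alpha.
    """
    wealth = np.asarray(wealth, dtype=float)
    weights = np.asarray(weights, dtype=float)
    keep = wealth >= threshold
    x, w = wealth[keep], weights[keep]
    order = np.argsort(x)[::-1]                      # richest first
    x, w = x[order], w[order]
    rank = np.cumsum(w) - 0.5 * w                    # mid-point weighted rank
    survival = rank / w.sum()                        # empirical P(W >= x) within the tail
    slope, _ = np.polyfit(np.log(x), np.log(survival), 1)
    return -slope


# Hypothetical tail population above a threshold of 500,000.
rng = np.random.default_rng(1)
alpha_true, threshold = 1.5, 500_000.0
population = (rng.pareto(alpha_true, size=200_000) + 1.0) * threshold
sorted_pop = np.sort(population)

# Hypothetical rich list: the 100 richest households, each with weight one.
rich_list = sorted_pop[-100:]
# Survey drawn from households outside the rich list, so the two sources do not overlap.
below_list = sorted_pop[:-100]
survey = rng.choice(below_list, size=2_000, replace=False)
survey_w = np.full(survey.size, below_list.size / survey.size)

alpha_survey = pareto_alpha_regression(survey, survey_w, threshold)
alpha_combined = pareto_alpha_regression(
    np.concatenate([survey, rich_list]),
    np.concatenate([survey_w, np.ones(rich_list.size)]),
    threshold,
)
print(f"survey only: {alpha_survey:.3f}, survey + rich list: {alpha_combined:.3f}")
```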

We show that these adjustments can play a particularly important role in the production of distributional national accounts for the household sector, and suggest that inequality estimates from the original survey data may understate the population parameters, depending on the severity of both non-response and measurement error. We also discuss how to assess the quality of these distributional indicators.

Further work is needed to refine this approach. For example, the estimation of the number of households in the tail could be further validated and improved by using alternatives to rich lists (such as tax records) or by applying additional methods (such as the Type II Pareto distribution or the Estate Multiplier Method). The correction of measurement error could also be improved by enriching the vector of auxiliary variables with more granular external information, where available.

Nonetheless, our framework has the advantage of offering a set of adaptable tools that can be switched on case by case. Indeed, both the Pareto-calibration adjustment and the multivariate calibration method can be enhanced with external information and can be run separately when needed.

Moreover, our contribution makes it possible to compute several distributional indicators from the adjusted micro dataset, whereas most studies currently focus on providing aggregate estimates of the missing wealth.

Acknowledgements

The paper has greatly benefited from the discussions with all members of the EG-LMM and EG-DFA working groups.

References

8 Appendix: Tables and figures

Figure 1: Pareto Threshold detection. Mean excess plots for gross recorded wealth in the HFCS. Predicted Pareto thresholds and linear fits estimated using the proposed methodology.
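Figure 1 relies on mean excess plots. For a Pareto tail with parameter α > 1 above a threshold w0, the mean excess function e(u) = E[W - u | W > u] equals u/(α - 1) for every u ≥ w0, so the plot becomes linear above the threshold with slope 1/(α - 1). The Python sketch below computes a weighted empirical mean excess function and backs out α from a linear fit. It does not implement the paper's threshold-detection procedure, and the simulated data (with α = 2.5, chosen so that the empirical means are stable) and the function name are hypothetical.

```python
import numpy as np


def mean_excess(wealth, weights, levels):
    """Weighted empirical mean excess e(u) = E[W - u | W > u] at each level u."""
    wealth = np.asarray(wealth, dtype=float)
    weights = np.asarray(weights, dtype=float)
    out = np.empty(len(levels))
    for k, u in enumerate(levels):
        above = wealth > u                           # assumes at least one observation above u
        out[k] = np.average(wealth[above] - u, weights=weights[above])
    return out


# Hypothetical weighted sample from a Pareto tail with alpha = 2.5 above 100,000.
rng = np.random.default_rng(7)
alpha_true = 2.5
wealth = (rng.pareto(alpha_true, size=100_000) + 1.0) * 100_000.0
weights = rng.uniform(0.5, 1.5, size=wealth.size)

levels = np.quantile(wealth, np.linspace(0.10, 0.95, 18))
me = mean_excess(wealth, weights, levels)

# Above the threshold e(u) = u / (alpha - 1): fit a line and invert its slope.
slope, intercept = np.polyfit(levels, me, 1)
alpha_hat = 1.0 + 1.0 / slope
print(f"slope {slope:.3f} (theory {1 / (alpha_true - 1):.3f}), alpha_hat {alpha_hat:.2f}")
```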
Figure 2: Pareto tail re-weighting. Empirical cumulative distribution functions (log scale) of survey wealth in the Pareto tail. Re-weighting is achieved with the Pareto-calibration method, using the calibration benchmarks from equation (10). Tail parameters are estimated using survey data only and, alternatively, using Vermeulen's (vermeulen2018fat) regression method with the imputed rich list.

Country (1) (2) (3)

IT 1.952 1.491 310,084
FR 1.771 1.537 567,378
DE 1.499 1.362 254,000
FI 2.145 1.718 880,806


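Notes: Pareto tail parameters and estimated thresholds. Columns (1) and (2) report the tail parameters estimated, respectively, using survey data only and using Vermeulen's (vermeulen2018fat) regression method with the imputed rich list; column (3) reports the estimated Pareto threshold.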
Table 1: The missing gap: Pareto tail parameters and estimated thresholds

Coverage Ratios Estimated tail households
Base Adjusted Implied Total 95% C.I. Missing
Country (1) (2) (3) (4) (5) (6)

IT 0.553 0.647 0.727 5,483,837 243.406 19366.110
FR 0.673 0.762 0.779 3,003,389 60.255 289.339
DE 0.827 0.908 1.039 9,889,923 463.185 7913.568
FI 0.917 1.019 1.037 108,507 36.287 91.788


Notes: Coverage ratios and estimated number of households in the Pareto tail. Re-weighting is achieved with the Pareto-calibration method, using the calibration benchmarks from equation (10).
Table 2: The missing gap: Pareto adjustments for gross wealth
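Coverage ratios compare a weighted survey aggregate with the corresponding financial-accounts total. The exact definitions behind columns (1) to (3) rest on equations not shown in this section, so the Python sketch below is only one plausible reading: the base and adjusted ratios use the original and the re-weighted survey weights, while the implied ratio adds the expected wealth of the missing tail households, assuming they all lie above a given wealth level m in a Pareto tail with α > 1 (expected wealth per household αm/(α - 1)). All numbers and function names are hypothetical.

```python
import numpy as np


def coverage_ratio(wealth, weights, macro_total, extra_wealth=0.0):
    """Weighted survey aggregate (plus any imputed extra wealth) over the macro benchmark."""
    return (np.dot(weights, wealth) + extra_wealth) / macro_total


def pareto_wealth_above(n_households, level, alpha):
    """Expected total wealth of n households lying above `level` in a Pareto(alpha) tail, alpha > 1."""
    return n_households * alpha * level / (alpha - 1.0)


# Hypothetical four-household survey, with original and re-weighted weights.
wealth = np.array([100e3, 250e3, 1e6, 4e6])
base_w = np.array([500.0, 300.0, 150.0, 50.0])
adjusted_w = np.array([450.0, 300.0, 170.0, 80.0])
macro_total = 8e8                                            # hypothetical financial-accounts aggregate

base = coverage_ratio(wealth, base_w, macro_total)           # 0.59375
adjusted = coverage_ratio(wealth, adjusted_w, macro_total)   # 0.7625
missing = pareto_wealth_above(5, wealth.max(), alpha=1.5)    # 6e7 for 5 missing households
implied = coverage_ratio(wealth, adjusted_w, macro_total, extra_wealth=missing)  # 0.8375
```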

Percentile
0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00
Country (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

IT (0.110100478) (0.128386592) (0.185638498) (0.053985327) (0.019116371) (0.016473403) (0.015599364) (0.021732155) (0.01627191) (0.097799845)
FR (0.102114312) (0.151260448) (0.066618173) (0.195827762) (0.046845418) (0.005779879) (0.004145259) (0.006719404) (0.008334635) (0.112678639)
DE (0.117598313) (0.077011781) (0.040544981) (0.024094583) (0.04290484) (0.06090941) (0.017420882) (0.009559354) (0.006931982) (0.017240653)
FI (0.092597014) (0.090366998) (0.057036395) (0.043957307) (0.004562964) (0.003244137) (0.004485249) (0.002541498) (0.00319745) (0.028715268)

Notes: Mean and coefficient of variation (in parentheses) of the overall adjustment factors, equations (12) and (13), from the multivariate calibration approach for imputation, by gross wealth percentile.
Table 3: Simultaneous approach: final multivariate calibration adjustment factors
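Table 3 summarises the adjustment factors by gross wealth percentile. Assuming one adjustment factor per household (the paper's equations (12) and (13) are not reproduced here), the weighted mean and coefficient of variation within weighted wealth deciles could be computed as in the hypothetical Python sketch below; the toy data and function name are illustrative only.

```python
import numpy as np


def factor_stats_by_group(wealth, weights, factors, n_groups=10):
    """Weighted mean and coefficient of variation of adjustment factors by weighted wealth group.

    Assumes every group contains at least one household.
    """
    wealth = np.asarray(wealth, dtype=float)
    weights = np.asarray(weights, dtype=float)
    factors = np.asarray(factors, dtype=float)
    order = np.argsort(wealth)
    cum = np.cumsum(weights[order]) / weights.sum()          # weighted percentile rank
    group = np.empty(wealth.size, dtype=int)
    # small tolerance so that households exactly at a group boundary stay in the lower group
    group[order] = np.clip(np.ceil(cum * n_groups - 1e-9).astype(int) - 1, 0, n_groups - 1)
    means = np.empty(n_groups)
    cvs = np.empty(n_groups)
    for k in range(n_groups):
        sel = group == k
        m = np.average(factors[sel], weights=weights[sel])
        sd = np.sqrt(np.average((factors[sel] - m) ** 2, weights=weights[sel]))
        means[k], cvs[k] = m, sd / m
    return means, cvs


# Hypothetical toy data: 100 equally weighted households whose factors rise with wealth.
wealth = np.arange(1.0, 101.0)
weights = np.ones(100)
factors = 1.0 + 0.01 * wealth
means, cvs = factor_stats_by_group(wealth, weights, factors)
print(np.round(means, 3), np.round(cvs, 3))
```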


Wealth Shares
Top 1% Top 5% Top 10% Top 20% Bot 50% Gini Tail S.r.
Country (1) (2) (3) (4) (5) (6) (7) (8)
Base Survey
IT (0.074043924) (0.032349375) (0.020676699) (0.012581805) (0.038947602) (0.01230447) (0.074651195)
FR (0.089322942) (0.036388704) (0.021976953) (0.011554199) (0.039716146) (0.011225063) (0.052288883)
DE (0.111291626) (0.049686226) (0.030760529) (0.015098774) (0.072603913) (0.013842171) (0.159378921)
FI (0.050382835) (0.018772414) (0.011529657) (0.006889984) (0.025254053) (0.007033199) (0.068581422)

Survey & missing tail
IT (0.140352891) (0.068805967) (0.044394832) (0.024330073) (0.056430522) (0.019044346) (0.015357159)
FR (0.25414802) (0.110739881) (0.066960475) (0.03445033) (0.078570837) (0.020877417) (0.030471995)
DE (0.077118545) (0.037726654) (0.02430974) (0.012291083) (0.073533691) (0.011802833) (0.017136201)
FI (0.071817044) (0.026250988) (0.015714254) (0.008674642) (0.026563672) (0.00876459) (0.039229624)

Pareto-calibration & missing tail
IT (0.087269404) (0.043929358) (0.028611299) (0.014907495) (0.045318048) (0.013948789) (0.015357159)
FR (0.181652505) (0.083733685) (0.050389792) (0.024677847) (0.038990646) (0.023211861) (0.029178852)
DE (0.083589631) (0.039739964) (0.023516879) (0.010000952) (0.070521832) (0.01039992) (0.017136201)
FI (0.164869007) (0.075492925) (0.046222362) (0.022306239) (0.029968827) (0.019608429) (0.039229624)

Par-cal, Proportional allocation & missing tail
IT (0.10791799) (0.054076974) (0.035016892) (0.018325945) (0.048250771) (0.017345783) (0.009200605)
FR (0.18129983) (0.085520974) (0.052837983) (0.027007022) (0.048272211) (0.025463006) (0.032084263)
DE (0.092227307) (0.043820285) (0.025985923) (0.011021125) (0.070916238) (0.011560138) (0.016247937)
FI (0.171791997) (0.079373743) (0.048710654) (0.023626459) (0.030878846) (0.020948357) (0.039096703)

Par-cal, Wealth calibration & missing tail
IT (0.225486224) (0.117403392) (0.077133026) (0.042237522) (0.102170295) (0.041774632) (0.009196682)
FR (0.229959407) (0.110774328) (0.067829808) (0.033365981) (0.067739478) (0.033046372) (0.033996809)
DE (0.084869232) (0.045522824) (0.030377721) (0.015828329) (0.079638546) (0.014616543) (0.01611418)
FI (0.167844416) (0.084166918) (0.056439226) (0.031979879) (0.049493716) (0.027975696) (0.038684284)

Single-iteration approach & missing tail
IT (0.296798357) (0.121360125) (0.076636548) (0.042065957) (0.110417105) (0.042278155) (0.009143348)
FR (0.245881102) (0.104212265) (0.059169995) (0.026051977) (0.070896338) (0.026391168) (0.038278876)
DE (0.114989255) (0.050432023) (0.031934494) (0.016388484) (0.090627147) (0.015674941) (0.020903145)
FI (0.109527528) (0.043780021) (0.028971436) (0.01681636) (0.031674222) (0.0125746) (0.02869323)

Simultaneous approach
IT (0.080747521) (0.038100521) (0.023442986) (0.013136038) (0.055437696) (0.013130281) (0.005709876)
FR (0.148302721) (0.065476277) (0.040231525) (0.020834759) (0.044095909) (0.018487718) (0.018619528)
DE (0.09207948) (0.042801025) (0.025120159) (0.011536941) (0.068353716) (0.012167132) (0.018151911)
FI (0.114902346) (0.048577984) (0.033591862) (0.020074588) (0.049705703) (0.017467699) (0.016293128)
Table 4: The missing gap: distributional wealth indicators
Notes: Wealth shares by percentile group, Gini inequality coefficients and Pareto tail parameters for Italy, France, Germany and Finland, estimated using different adjustments to the HFCS data and accounting for the unobserved part of the Pareto tail. Bootstrap-based coefficients of variation are reported in parentheses. Success rates (S.r.) report the observed probability of convergence of the calibration.
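The Gini coefficients in Table 4 are computed on weighted micro data. A minimal Python sketch of one common weighted estimator, based on the trapezoidal area under the Lorenz curve, G = 1 - Σ p_i (L_i + L_{i-1}), where households are sorted by wealth, p_i are population shares and L_i cumulative wealth shares. It is not necessarily the estimator used by the authors, and the function name is hypothetical; with negative net wealth the coefficient can exceed one.

```python
import numpy as np


def weighted_gini(wealth, weights):
    """Weighted Gini coefficient from the trapezoidal area under the Lorenz curve."""
    x = np.asarray(wealth, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(x)                            # poorest first
    x, w = x[order], w[order]
    p = w / w.sum()                                  # population shares
    lorenz = np.cumsum(w * x) / np.sum(w * x)        # cumulative wealth shares L_i
    lorenz_prev = np.concatenate(([0.0], lorenz[:-1]))
    return 1.0 - np.sum(p * (lorenz + lorenz_prev))


print(weighted_gini([1, 1, 1, 1], [1, 1, 1, 1]))     # perfect equality: 0.0
print(weighted_gini([0, 0, 0, 1], [1, 1, 1, 1]))     # one household owns everything: 0.75
```

A weight of 2 on a household is equivalent to duplicating it, so `weighted_gini([1, 2], [2, 1])` matches `weighted_gini([1, 1, 2], [1, 1, 1])`.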