Data Fusion for Joining Income and Consumption Information Using Different Donor-Recipient Distance Metrics

11/30/2020
by Florian Meinfelder, et al.

Data fusion describes the method of combining data from (at least) two initially independent data sources to allow for joint analysis of variables which are not jointly observed. The fundamental idea is to base inference on identifying assumptions and on common variables, which provide information that is jointly observed in all the data sources. A popular class of methods dealing with this particular missing-data problem is based on nearest neighbour matching. However, exact matches become unlikely as the amount of common information increases, and the specification of the distance function can influence the results of the data fusion. In this paper we compare two different approaches to nearest neighbour hot deck matching: one, Random Hot Deck, is a variant of the covariate-based matching methods which was proposed by Eurostat and can be considered a 'classical' statistical matching method, whereas the alternative approach is based on Predictive Mean Matching. We discuss results from a simulation study that investigates benefits and potential drawbacks of both variants, and our findings suggest that Predictive Mean Matching tends to outperform Random Hot Deck.
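To make the contrast between the two distance specifications concrete, the following is a minimal Python sketch and not the implementation used in the paper. It assumes numeric common variables, a Euclidean distance for the covariate-based variant, an ordinary least squares model for the predictive means, and a random draw among the k closest donors. The function names `covariate_hot_deck` and `predictive_mean_matching` and the parameter `k` are hypothetical and chosen for illustration only.

```python
import numpy as np


def _draw_from_nearest(dist, y_don, k, rng):
    """Pick, for each recipient (row of dist), one of its k closest donors at random."""
    rng = np.random.default_rng(rng)
    nearest = np.argsort(dist, axis=1)[:, :k]        # k closest donors per recipient
    pick = rng.integers(0, k, size=dist.shape[0])    # random position among those k
    return y_don[nearest[np.arange(dist.shape[0]), pick]]


def covariate_hot_deck(x_rec, x_don, y_don, k=1, rng=None):
    """Covariate-based matching: distance is computed on the common variables.

    Assumption: plain Euclidean distance on unscaled numeric covariates.
    """
    dist = np.linalg.norm(x_rec[:, None, :] - x_don[None, :, :], axis=2)
    return _draw_from_nearest(dist, y_don, k, rng)


def predictive_mean_matching(x_rec, x_don, y_don, k=1, rng=None):
    """Predictive mean matching: distance is computed on predicted values of y.

    Assumption: a linear model with intercept, fitted by least squares on the donor file.
    """
    design_don = np.column_stack([np.ones(len(x_don)), x_don])
    design_rec = np.column_stack([np.ones(len(x_rec)), x_rec])
    beta, *_ = np.linalg.lstsq(design_don, y_don, rcond=None)
    pred_don = design_don @ beta
    pred_rec = design_rec @ beta
    dist = np.abs(pred_rec[:, None] - pred_don[None, :])
    return _draw_from_nearest(dist, y_don, k, rng)
```

In both functions the imputed values are always observed donor values, which is the defining property of hot deck methods. The only difference is the space in which the donor-recipient distance is measured: the full set of common variables in the first case, and a one-dimensional predicted mean in the second.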

research
03/29/2019

Statistical matching of non-Gaussian data

The statistical matching problem is a data integration problem with stru...
research
01/22/2021

Revisiting Identifying Assumptions for Population Size Estimation

The problem of estimating the size of a population based on a subset of ...
research
09/01/2021

Bayesian data combination model with Gaussian process latent variable model for mixed observed variables under NMAR missingness

In the analysis of observational data in social sciences and businesses,...
research
07/11/2022

A blended distance to define "people-like-me"

Curve matching is a prediction technique that relies on predictive mean ...
research
06/05/2018

The Value of Information in Retrospect

In the course of any statistical analysis, it is necessary to consider i...
research
10/05/2022

Fused mean structure learning in data integration with dependence

Motivated by image-on-scalar regression with data aggregated across mult...
research
01/07/2021

Distances with mixed type variables some modified Gower's coefficients

Nearest neighbor methods have become popular in official statistics, mai...
