Bayesian data combination model with Gaussian process latent variable model for mixed observed variables under NMAR missingness

09/01/2021
by Masaki Mitsuhiro, et al.

In the analysis of observational data in the social sciences and business, it is difficult to obtain a "(quasi) single-source dataset" in which all variables of interest are observed simultaneously. Instead, multiple-source datasets are typically acquired for different individuals or units. Various methods, e.g., matching and latent variable modeling, have been proposed to investigate the relationships between the variables in these datasets. Doing so requires treating the datasets as a single-source dataset with missing variables. Existing methods assume either that the datasets to be integrated are drawn from the same population or that the sampling depends only on covariates. In terms of missingness, this assumption corresponds to missing at random (MAR). However, as will be shown in the application studies, this assumption is unlikely to hold in actual data analysis, and the results obtained under it may be biased. We propose a data fusion method that does not assume that the datasets are homogeneous. We use a Gaussian process latent variable model for data that are not missing at random (NMAR). This model assumes that both the variables of concern and the probability of being missing depend on latent variables. A simulation study and a real-world data analysis show that the proposed method, which combines a missing-data mechanism with the latent Gaussian process, yields valid estimates, whereas an existing method yields severely biased estimates. This is the first study in which non-random assignment to datasets is considered and resolved under reasonable assumptions in the data fusion problem.
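The following is a minimal illustrative sketch, not the authors' model or code, of why NMAR missingness driven by a latent variable biases naive estimates. It assumes a single unobserved latent variable z that drives both the variable of interest y (through a linear trend plus a smooth function drawn approximately from an RBF-kernel Gaussian process prior via random Fourier features) and the probability that y is observed. The function names (`simulate`, `sample_gp_function`) and all settings (n = 5000, lengthscale 1.0, the 2.0 coefficients) are hypothetical choices made only for illustration.

```python
import math
import random


def sample_gp_function(lengthscale, n_features, rng):
    """Draw one function approximately from a zero-mean GP prior with a
    unit-variance RBF kernel, using random Fourier features (an
    approximation, not an exact GP draw)."""
    omegas = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    weights = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def f(z):
        return scale * sum(w * math.cos(o * z + p)
                           for w, o, p in zip(weights, omegas, phases))

    return f


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def mean(values):
    return sum(values) / len(values)


def simulate(n=5000, seed=0):
    """Return (population mean, complete-case mean under NMAR,
    complete-case mean under MCAR) for one simulated dataset."""
    rng = random.Random(seed)
    f = sample_gp_function(lengthscale=1.0, n_features=100, rng=rng)
    ys, kept_nmar, kept_mcar = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                   # unobserved latent variable
        y = 2.0 * z + f(z) + rng.gauss(0.0, 0.1)  # y depends on z (assumed linear trend + GP term)
        ys.append(y)
        # NMAR: the chance that y is observed also depends on the latent z
        if rng.random() < sigmoid(2.0 * z):
            kept_nmar.append(y)
        # MCAR baseline: y is observed with probability 0.5, independent of z
        if rng.random() < 0.5:
            kept_mcar.append(y)
    return mean(ys), mean(kept_nmar), mean(kept_mcar)


if __name__ == "__main__":
    true_mean, nmar_mean, mcar_mean = simulate()
    print(f"population mean      : {true_mean:+.3f}")
    print(f"complete-case (NMAR) : {nmar_mean:+.3f}")
    print(f"complete-case (MCAR) : {mcar_mean:+.3f}")
```

Under these assumptions, the NMAR complete-case mean should sit well above the population mean, because units with large z are both more likely to be observed and tend to have larger y, while the MCAR complete-case mean should match the population mean up to sampling noise. Removing this bias requires modeling the missingness mechanism jointly with the latent variables, as the abstract proposes.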


research
11/19/2018

Mixed Likelihood Gaussian Process Latent Variable Model

We present the Mixed Likelihood Gaussian process latent variable model (...
research
06/14/2023

Bayesian Non-linear Latent Variable Modeling via Random Fourier Features

The Gaussian process latent variable model (GPLVM) is a popular probabil...
research
06/30/2021

Relational VAE: A Continuous Latent Variable Model for Graph Structured Data

Graph Networks (GNs) enable the fusion of prior knowledge and relational...
research
11/30/2020

Data Fusion for Joining Income and Consumption Information Using Different Donor-Recipient Distance Metrics

Data fusion describes the method of combining data from (at least) two i...
research
01/29/2020

TPLVM: Portfolio Construction by Student's t-process Latent Variable Model

Optimal asset allocation is a key topic in modern finance theory. To rea...
research
08/09/2022

Analysis of Longitudinal Data with Missing Values in the Response and Covariates Using the Stochastic EM Algorithm

In longitudinal data a response variable is measured over time, or under...
research
01/14/2020

Analysis of Bayesian Inference Algorithms by the Dynamical Functional Approach

We analyze the dynamics of an algorithm for approximate inference with l...
