A Data Analytics Framework for Aggregate Data Analysis

09/16/2018
by Sanket Tavarageri, et al.

In many contexts, aggregate data is available but individual-level data is not. For example, medical studies sometimes report only aggregate statistics about disease prevalence because of privacy concerns. Even so, it is often desirable, and sometimes necessary, to infer individual-level characteristics from aggregate data. For instance, other researchers who want to perform a more detailed analysis of disease characteristics would require individual-level data. Similar challenges arise in other fields, such as politics and marketing. In this paper, we present an end-to-end pipeline that processes aggregate data to derive individual-level statistics and then uses the inferred data to train machine learning models that answer questions of interest. We describe a novel algorithm for reconstructing fine-grained data from summary statistics. This step creates multiple candidate datasets, which form the input to the machine learning models. The advantage of the highly parallel architecture we propose is that uncertainty in the generated fine-grained data is compensated for by the use of multiple candidate fine-grained datasets. Consequently, the answers derived from the machine learning models will be more valid and usable. We validate our approach using data from a challenging medical problem called Acute Traumatic Coagulopathy.
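The abstract does not spell out the reconstruction algorithm, so the following is only a minimal sketch of the general idea of training on multiple candidate datasets. It assumes a single numeric feature whose count, mean, and standard deviation are reported per outcome class. The sampling scheme, the decision-stump model, and all names and numbers (`reconstruct`, `candidate_dataset`, `fit_stump`, `ensemble_predict`, the `aggregates` values) are illustrative assumptions, not the method from the paper.

```python
import random
import statistics

def reconstruct(n, mean, std, rng):
    """Draw n values, then shift and rescale them so the sample mean and
    sample standard deviation match the reported aggregates (needs n >= 2)."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = statistics.fmean(raw)
    s = statistics.stdev(raw)
    return [mean + std * (x - m) / s for x in raw]

def candidate_dataset(aggregates, rng):
    """Build one candidate individual-level dataset as (value, label) rows."""
    rows = []
    for label, (n, mean, std) in aggregates.items():
        rows += [(x, label) for x in reconstruct(n, mean, std, rng)]
    return rows

def fit_stump(rows):
    """Pick the threshold t with the fewest training errors for the rule
    'predict 1 if x > t'. Assumes class 1 tends to have larger values."""
    best_t, best_err = None, len(rows) + 1
    for t, _ in rows:
        err = sum((x > t) != bool(y) for x, y in rows)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def ensemble_predict(thresholds, x):
    """Fraction of per-candidate models that vote for class 1."""
    return sum(x > t for t in thresholds) / len(thresholds)

# Hypothetical aggregates: label -> (count, mean, std). Made-up numbers.
aggregates = {0: (40, 1.0, 0.3), 1: (20, 1.8, 0.4)}
rng = random.Random(0)

# Each candidate dataset is independent, so this loop could run in parallel.
thresholds = [fit_stump(candidate_dataset(aggregates, rng)) for _ in range(25)]

low_score = ensemble_predict(thresholds, 0.5)
high_score = ensemble_predict(thresholds, 2.5)
print(low_score, high_score)
```

Every candidate reproduces the reported aggregates, yet the individual values differ, so each fitted threshold differs slightly. Averaging the votes is one simple way to keep the final answer from depending on any single reconstruction.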


research
01/31/2022

Holistic Fine-grained GGS Characterization: From Detection to Unbalanced Classification

Recent studies have demonstrated the diagnostic and prognostic values of...
research
06/24/2021

SecureDL: Securing Code Execution and Access Control for Distributed Data Analytics Platforms

Distributed data analytics platforms such as Apache Spark enable cost-ef...
research
03/30/2023

Establishing baselines and introducing TernaryMixOE for fine-grained out-of-distribution detection

Machine learning models deployed in the open world may encounter observa...
research
06/29/2019

An aggregate learning approach for interpretable semi-supervised population prediction and disaggregation using ancillary data

Census data provide detailed information about population characteristic...
research
05/20/2023

Commodity-specific triads in the Dutch inter-industry production network

Triadic motifs are the smallest building blocks of higher-order interact...
research
03/01/2018

Recover Fine-Grained Spatial Data from Coarse Aggregation

In this paper, we study a new type of spatial sparse recovery problem, t...
research
02/04/2018

Using Poisson Binomial GLMs to Reveal Voter Preferences

We present a new modeling technique for solving the problem of ecologica...
