Categorical exploratory data analysis on goodness-of-fit issues

11/19/2020
by Sabrina Enriquez, et al.

If George Box's aphorism "All models are wrong" continues to hold in data analysis, particularly for real-world data, then we should annotate this wisdom with visible and explainable data-driven patterns. Such annotations can shed light on the validity as well as the limitations of statistical modeling as a data analysis approach. To avoid holding real data to potentially unattainable or even unrealistic theoretical structures, we propose using the data analysis paradigm called Categorical Exploratory Data Analysis (CEDA). We illustrate the merits of this proposal with two real-world data sets from the perspective of goodness-of-fit. In both data sets, the Normal distribution's bell shape seemingly fits rather well at first glance. We apply CEDA to bring out where and how each data set fits or deviates from the model shape via several important distributional aspects. We also demonstrate that CEDA affords a version of a tree-based p-value, and we compare it with p-values based on traditional statistical approaches. Throughout our data analysis, we invest computational effort in graphic displays that illuminate the advantages of using CEDA as a primary way of data analysis in Data Science education.
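For readers unfamiliar with the kind of traditional goodness-of-fit p-value the abstract uses as a point of comparison, the following is a minimal sketch. It is not the paper's CEDA procedure or its tree-based p-value. It assumes a binned chi-square statistic against a fitted Normal, with a simulation-based p-value. The function names `binned_chi_square` and `simulated_p_value`, the bin count, and the synthetic sample are all hypothetical choices made for illustration.

```python
import random
from bisect import bisect_left
from statistics import NormalDist, fmean, stdev


def binned_chi_square(data, n_bins=10):
    """Chi-square statistic of `data` against a Normal fitted to it.

    The real line is cut into n_bins bins of equal probability under the
    fitted Normal, so every bin has the same expected count.
    """
    fitted = NormalDist(fmean(data), stdev(data))
    # Interior bin edges are quantiles of the fitted Normal.
    edges = [fitted.inv_cdf(k / n_bins) for k in range(1, n_bins)]
    counts = [0] * n_bins
    for x in data:
        counts[bisect_left(edges, x)] += 1
    expected = len(data) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)


def simulated_p_value(data, n_bins=10, n_sim=500, seed=0):
    """Share of Normal samples whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = binned_chi_square(data, n_bins)
    mu, sigma, n = fmean(data), stdev(data), len(data)
    exceed = 0
    for _ in range(n_sim):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if binned_chi_square(sample, n_bins) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    demo_rng = random.Random(1)
    # Synthetic, clearly non-Normal data, used only for illustration.
    skewed = [demo_rng.expovariate(1.0) for _ in range(300)]
    print("simulated p-value:", simulated_p_value(skewed))
```

A single number of this kind says whether the bell shape is rejected, but not where the data deviate from it, which is the gap the abstract says CEDA is meant to address.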


