The Landscape of R Packages for Automated Exploratory Data Analysis

03/27/2019
by Mateusz Staniak, et al.

The increasing availability of large but noisy data sets with many heterogeneous variables has led to growing interest in automating common data analysis tasks. The most time-consuming part of this process is Exploratory Data Analysis (EDA), which is crucial for better domain understanding, data cleaning, data validation, and feature engineering. A growing number of libraries attempt to automate typical EDA tasks to make the search for new insights easier and faster. In this paper, we present a systematic review of existing tools for Automated Exploratory Data Analysis (autoEDA). We explore the features of twelve popular R packages to identify which parts of the analysis can be effectively automated with current tools and to point out new directions for further autoEDA development.

research
03/12/2019

SmartEDA: An R Package for Automated Exploratory Data Analysis

This paper introduces SmartEDA, which is an R package for performing Exp...
research
11/01/2019

Goals, Process, and Challenges of Exploratory Data Analysis: An Interview Study

How do analysis goals and context affect exploratory data analysis (EDA)...
research
06/07/2023

Towards High-Performance Exploratory Data Analysis (EDA) Via Stable Equilibrium Point

Exploratory data analysis (EDA) is a vital procedure for data science pr...
research
12/04/2018

JOVIAL: Notebook-based Astronomical Data Analysis in the Cloud

Performing astronomical data analysis using only personal computers is b...
research
09/05/2022

Explaining the optimistic performance evaluation of newly proposed methods: a cross-design validation experiment

The constant development of new data analysis methods in many fields of ...
research
01/03/2020

Towards Scalable Dataframe Systems

Dataframes are a popular and convenient abstraction to represent, struct...
research
08/31/2021

DoGR: Disaggregated Gaussian Regression for Reproducible Analysis of Heterogeneous Data

Quantitative analysis of large-scale data is often complicated by the pr...