FeatureEnVi: Visual Analytics for Feature Engineering Using Stepwise Selection and Semi-Automatic Extraction Approaches

03/26/2021
by Angelos Chatzimparmpas, et al.

The machine learning (ML) life cycle involves a series of iterative steps, from the effective gathering and preparation of the data, including complex feature engineering processes, to the presentation and improvement of results, with various algorithms to choose from in every step. Feature engineering in particular can be very beneficial for ML, leading to improvements such as better predictive results, shorter computation times, less noise, and greater transparency behind the decisions taken during training. Nevertheless, while several visual analytics tools exist to monitor and control the different stages of the ML life cycle (especially those related to data and algorithms), feature engineering support remains inadequate. In this paper, we present FeatureEnVi, a visual analytics system specifically designed to assist with the feature engineering process. Our proposed system helps users to choose the most important features, to transform the original features into powerful alternatives, and to experiment with different feature generation combinations. Additionally, data space slicing allows users to explore the impact of features on both local and global scales. FeatureEnVi utilizes multiple automatic feature selection techniques; furthermore, it visually guides users with statistical evidence about the influence of each feature (or subsets of features). The final outcome is the extraction of heavily engineered features, evaluated by multiple validation metrics. The usefulness and applicability of FeatureEnVi are demonstrated with two use cases, using a popular red wine quality data set and publicly available data on recognizing vehicles from their silhouettes. We also report feedback from interviews with ML experts and a visualization researcher who assessed the effectiveness of our system.
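To make the idea of transforming a feature and judging the result with statistical evidence more concrete, the following is a minimal sketch, not FeatureEnVi's actual implementation. It assumes that the absolute Pearson correlation with the target is an acceptable stand-in for "influence" and that the candidate transformations are the four listed; the names `pearson`, `TRANSFORMS`, and `rank_transformations` are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no linear signal
    return cov / math.sqrt(var_x * var_y)


# Hypothetical candidate transformations (assumption: the feature values
# are non-negative, so log1p and sqrt are defined for all of them).
TRANSFORMS = {
    "identity": lambda v: v,
    "log1p": lambda v: math.log1p(v),
    "square": lambda v: v * v,
    "sqrt": lambda v: math.sqrt(v),
}


def rank_transformations(values, target):
    """Return (name, |r|) pairs sorted from strongest to weakest correlation."""
    scores = []
    for name, fn in TRANSFORMS.items():
        transformed = [fn(v) for v in values]
        scores.append((name, abs(pearson(transformed, target))))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data (made up for illustration): the target is the square of the feature.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
target = [v * v for v in feature]

ranking = rank_transformations(feature, target)
best_name, best_score = ranking[0]
```

On this toy data the squared feature ranks first because it matches the target exactly. A real workflow would combine several measures and validate the transformed features with a model, as the abstract describes, rather than rely on a single correlation score.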


research
05/04/2020

StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics

In machine learning (ML), ensemble methods such as bagging, boosting, an...
research
09/14/2023

SMARTFEAT: Efficient Feature Construction through Feature-Level Foundation Model Interactions

Before applying data analytics or machine learning to a data set, a vita...
research
03/29/2022

HardVis: Visual Analytics to Handle Instance Hardness Using Undersampling and Oversampling Techniques

Despite the tremendous advances in machine learning (ML), training with ...
research
12/02/2020

VisEvol: Visual Analytics to Support Hyperparameter Search through Evolutionary Optimization

During the training phase of machine learning (ML) models, it is usually...
research
06/29/2023

Statistically Enhanced Learning: a feature engineering framework to boost (any) learning algorithms

Feature engineering is of critical importance in the field of Data Scien...
research
12/07/2022

MetaStackVis: Visually-Assisted Performance Evaluation of Metamodels

Stacking (or stacked generalization) is an ensemble learning method with...
research
05/31/2023

Managed Geo-Distributed Feature Store: Architecture and System Design

Companies are using machine learning to solve real-world problems and ar...
