SMARTFEAT: Efficient Feature Construction through Feature-Level Foundation Model Interactions

09/14/2023
by Yin Lin, et al.

Before applying data analytics or machine learning to a data set, a vital step is usually the construction of an informative set of features from the data. In this paper, we present SMARTFEAT, an efficient automated feature engineering tool that assists data users, including non-experts, in constructing useful features. Leveraging Foundation Models (FMs), our approach creates new features from the data based on contextual information and open-world knowledge. To achieve this, our method incorporates an intelligent operator selector that selects a subset of operators, avoiding the exhaustive combination of original features that is typical of traditional automated feature engineering tools. Moreover, we address the limitations of performing data tasks through row-level interactions with FMs, which can lead to significant delays and costs due to excessive API calls. To tackle this, we introduce a function generator that obtains efficient data transformations, such as dataframe built-in methods or lambda functions, so that SMARTFEAT can generate new features for large datasets. With SMARTFEAT, dataset users can efficiently search for and apply transformations to obtain new features, leading to improvements in the AUC of downstream ML classification by up to 29.8%.
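
To make the cost argument concrete, the following is a minimal sketch of the difference between row-level and feature-level FM interaction. It is not the SMARTFEAT implementation: `StubFM`, its two methods, the column names, and the BMI-style feature are all hypothetical stand-ins chosen for illustration, and the "model" is a local stub that only counts how often it is called.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class StubFM:
    """Hypothetical stand-in for a foundation model API; it only counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def row_level(self, row: Row) -> float:
        # Row-level interaction: one call per row, the model returns the value.
        self.calls += 1
        return row["weight_kg"] / (row["height_m"] ** 2)

    def feature_level(self, columns: List[str]) -> Callable[[Row], float]:
        # Feature-level interaction: one call per new feature. The model sees
        # only column names and returns a reusable transformation (a lambda).
        self.calls += 1
        return lambda r: r["weight_kg"] / (r["height_m"] ** 2)


# Illustrative data set with 1000 rows (values are made up).
rows: List[Row] = [
    {"height_m": 1.6 + 0.001 * i, "weight_kg": 60.0 + 0.01 * i}
    for i in range(1000)
]

# Row-level: the number of calls grows with the number of rows.
row_fm = StubFM()
bmi_row_level = [row_fm.row_level(r) for r in rows]

# Feature-level: a single call yields a function that is applied locally.
feat_fm = StubFM()
bmi_fn = feat_fm.feature_level(["height_m", "weight_kg"])
bmi_feature_level = [bmi_fn(r) for r in rows]

print(row_fm.calls, feat_fm.calls)
```

Both paths produce the same feature values in this sketch, but the row-level path issues one call per row while the feature-level path issues one call in total, which is the property that keeps the approach practical for large datasets.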

