One button machine for automating feature engineering in relational databases

06/01/2017
by   Hoang Thanh Lam, et al.

Feature engineering is one of the most important and time-consuming tasks in predictive analytics projects. It involves understanding domain knowledge and exploring data to discover relevant hand-crafted features from raw data. In this paper, we introduce a system called One Button Machine, or OneBM for short, which automates feature discovery in relational databases. OneBM automatically performs a key activity of data scientists, namely joining database tables and applying advanced data transformations to extract useful features from data. We validated OneBM in Kaggle competitions in which OneBM achieved performance as good as top 16 competitions. More importantly, OneBM outperformed the state-of-the-art system in a Kaggle competition in terms of prediction accuracy and ranking on the Kaggle leaderboard. The results show that OneBM can be useful for both data scientists and non-experts. It helps data scientists reduce data exploration time, allowing them to try out many ideas by trial and error in a short time. It also enables non-experts, who are not familiar with data science, to quickly extract value from their data with little effort, time and cost.
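To make the idea of "joining tables and applying transformations" concrete, here is a minimal sketch of what one such step might look like. It is not OneBM's implementation: the tables, column names, the helper `aggregate_child`, and the choice of count, sum and mean as transformations are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy tables; names and columns are illustrative only.
customers = [  # main table, one row per entity to predict on
    {"customer_id": 1, "churned": 0},
    {"customer_id": 2, "churned": 1},
]
orders = [  # child table, many rows per customer
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 5.0},
]


def aggregate_child(main_rows, child_rows, key, column):
    """Join child rows to main rows on `key` and summarise `column`.

    Returns one feature dict per main row (assumed transformations:
    count, sum and mean of the joined values).
    """
    # Group the child column values by the join key.
    grouped = defaultdict(list)
    for row in child_rows:
        grouped[row[key]].append(row[column])

    features = []
    for row in main_rows:
        values = grouped.get(row[key], [])
        features.append({
            key: row[key],
            f"{column}_count": len(values),
            f"{column}_sum": sum(values),
            # No joined rows means the mean is undefined.
            f"{column}_mean": mean(values) if values else None,
        })
    return features


features = aggregate_child(customers, orders, "customer_id", "amount")
```

Each entry of `features` holds one row per customer with the aggregated order amounts, which could then be passed to any predictive model. An automated system would presumably enumerate many such join paths and transformations rather than a single hand-picked one.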

research
01/16/2018

Learning Features For Relational Data

Feature engineering is one of the most important but tedious tasks in da...
research
02/06/2020

Supervised Learning on Relational Databases with Graph Neural Networks

The majority of data scientists and machine learning practitioners use r...
research
09/12/2019

Augmented Data Science: Towards Industrialization and Democratization of Data Science

Conversion of raw data into insights and knowledge requires substantial ...
research
03/02/2023

A Vision for Semantically Enriched Data Science

The recent efforts in automation of machine learning or data science has...
research
09/07/2020

A Lightweight Algorithm to Uncover Deep Relationships in Data Tables

Many data we collect today are in tabular form, with rows as records and...
research
09/14/2023

SMARTFEAT: Efficient Feature Construction through Feature-Level Foundation Model Interactions

Before applying data analytics or machine learning to a data set, a vita...
research
06/28/2019

MLFriend: Interactive Prediction Task Recommendation for Event-Driven Time-Series Data

Most automation in machine learning focuses on model selection and hyper...
