AutoSmart: An Efficient and Automatic Machine Learning framework for Temporal Relational Data

09/09/2021
by Zhipeng Luo, et al.

Temporal relational data, perhaps the most commonly used data type in industrial machine learning applications, requires labor-intensive feature engineering and data analysis to produce accurate model predictions. An automatic machine learning framework is needed to reduce the manual effort of fine-tuning models, so that experts can focus on problems that genuinely need human engagement, such as problem definition, deployment, and business services. However, there are three main challenges in building automatic solutions for temporal relational data: 1) how to effectively and automatically mine useful information from multiple tables and the relations among them; 2) how to self-adjust so that time and memory consumption stay within a given budget; and 3) how to provide generic solutions for a wide range of tasks. In this work, we propose a solution that addresses these issues in an end-to-end automatic way. The proposed framework, AutoSmart, is the winning solution of the AutoML Track of KDD Cup 2019, one of the largest AutoML competitions to date (860 teams with around 4,955 submissions). The framework includes automatic data processing, table merging, feature engineering, and model tuning, with a time and memory controller for building the models efficiently and automatically. The proposed framework significantly outperforms the baseline solution on several datasets from various domains.
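To make the first challenge more concrete, the following is a minimal sketch of one common way to merge time-stamped tables without leaking future information: a backward-looking ("as-of") join. It is an illustration under stated assumptions, not AutoSmart's actual merging procedure, which the abstract does not detail. The function name `asof_join`, the column names, and the sample rows are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def asof_join(main_rows, aux_rows, key, time_col, value_col):
    """Attach to each main row the latest auxiliary value that shares its key
    and has a timestamp at or before the main row's timestamp.

    Rows with no earlier auxiliary record get None, so no information
    from the future is merged into a training example.
    """
    # Group auxiliary rows by key, keeping each group sorted by time.
    history = defaultdict(list)
    for row in sorted(aux_rows, key=lambda r: r[time_col]):
        history[row[key]].append((row[time_col], row[value_col]))

    # Pre-extract the sorted timestamps per key for binary search.
    times = {k: [t for t, _ in recs] for k, recs in history.items()}

    joined = []
    for row in main_rows:
        recs = history.get(row[key], [])
        # Number of auxiliary records at or before this row's timestamp.
        idx = bisect_right(times.get(row[key], []), row[time_col])
        value = recs[idx - 1][1] if idx > 0 else None
        joined.append({**row, f"last_{value_col}": value})
    return joined


# Hypothetical data: a main table of events and an auxiliary table of purchases.
main = [
    {"user": "a", "t": 5},
    {"user": "a", "t": 1},
    {"user": "b", "t": 3},
]
aux = [
    {"user": "a", "t": 2, "amount": 10},
    {"user": "a", "t": 4, "amount": 20},
    {"user": "a", "t": 9, "amount": 99},
]
result = asof_join(main, aux, key="user", time_col="t", value_col="amount")
```

In this sketch the purchase at `t = 9` is never attached to the event at `t = 5`, which is the property that distinguishes a temporal join from an ordinary key join.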
