AutoCross: Automatic Feature Crossing for Tabular Data in Real-World Applications

04/29/2019
by Luo Yuanfei, et al.

Feature crossing captures interactions among categorical features and is useful for enhancing learning from tabular data in real-world businesses. In this paper, we present AutoCross, an automatic feature crossing tool provided by 4Paradigm to its customers, which range from banks and hospitals to Internet corporations. By performing beam search in a tree-structured space, AutoCross enables efficient generation of high-order cross features, which existing work has not yet explored. Additionally, we propose successive mini-batch gradient descent and multi-granularity discretization to further improve efficiency and effectiveness, while ensuring simplicity so that no machine learning expertise or tedious hyper-parameter tuning is required. Furthermore, the algorithms are designed to reduce the computation, transmission, and storage costs involved in distributed computing. Experimental results on both benchmark and real-world business datasets demonstrate the effectiveness and efficiency of AutoCross. The results show that AutoCross can significantly enhance the performance of both linear and deep models.
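To make the terminology concrete, the following is a minimal Python sketch of what a cross feature and a multi-granularity discretization could look like on toy data. It is not AutoCross's implementation: the helper names (`cross`, `discretize`, `multi_granularity`), the equal-width binning scheme, the chosen granularities, and the toy columns are all assumptions made for illustration. The beam search and successive mini-batch gradient descent mentioned in the abstract are not shown.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def cross(*columns: Sequence[Hashable]) -> List[Tuple[Hashable, ...]]:
    """Row-wise cross of categorical columns (hypothetical helper).

    Each row's cross value is the tuple of that row's original values,
    so crossing k columns yields a k-th order cross feature.
    """
    if len({len(c) for c in columns}) != 1:
        raise ValueError("all columns must have the same number of rows")
    return list(zip(*columns))


def discretize(values: Sequence[float], n_bins: int) -> List[int]:
    """Equal-width binning into integer bin ids (an assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def multi_granularity(
    values: Sequence[float], granularities: Sequence[int] = (2, 4)
) -> Dict[int, List[int]]:
    """Discretize one numerical column at several granularities at once."""
    return {g: discretize(values, g) for g in granularities}


# Toy data (made up for illustration).
gender = ["F", "M", "F", "M"]
city = ["NY", "NY", "SF", "SF"]
age = [20.0, 35.0, 50.0, 80.0]

second_order = cross(gender, city)              # gender x city
age_bins = multi_granularity(age)               # categorical views of age
third_order = cross(gender, city, age_bins[2])  # gender x city x coarse age
```

In this sketch, discretizing the numerical column first is what lets it take part in a cross with the categorical columns. Keeping several granularities side by side avoids committing to a single bin count up front.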


