AC/DC: In-Database Learning Thunderstruck

03/20/2018
by Mahmoud Abo Khamis, et al.

We report on the design and implementation of the AC/DC gradient descent solver for a class of optimization problems over normalized databases. AC/DC decomposes an optimization problem into a set of aggregates over the join of the database relations. It then uses the answers to these aggregates to iteratively improve the solution to the problem until it converges. The challenges faced by AC/DC are the large database size, the mixture of continuous and categorical features, and the large number of aggregates to compute. AC/DC addresses these challenges by employing a sparse data representation, factorized computation, problem reparameterization under functional dependencies, and a data structure that supports shared computation of aggregates. To train polynomial regression models and factorization machines of up to 141K features over the join of a real-world dataset of up to 86M tuples, AC/DC needs up to 30 minutes on one core of a commodity machine. This is up to three orders of magnitude faster than its competitors R, MadLib, libFM, and TensorFlow in the cases where they finish at all, that is, where they do not exceed the memory limit, hit the 24-hour timeout, or run into internal design limitations.
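
The two-phase idea in the abstract (compute aggregates over the join once, then iterate using only those aggregates) can be illustrated with a toy least-squares model. The sketch below is not AC/DC's implementation: the relations `sales` and `stores`, the feature vector x = (1, area), and the helper names are all hypothetical, and the per-store pre-aggregation is only a simplified stand-in for the factorized computation the abstract mentions.

```python
from collections import defaultdict

# Hypothetical toy relations, invented for illustration (not from the paper).
sales = [("s1", 3.0), ("s1", 5.0), ("s2", 10.0), ("s2", 12.0), ("s2", 11.0)]  # (store, units)
stores = {"s1": 1.0, "s2": 3.0}  # store -> area


def aggregates(sales, stores):
    """Return (n, Sigma, c) over Sales JOIN Stores without materializing the join.

    With x = (1, area) and y = units per joined tuple:
      Sigma = sum of x x^T,  c = sum of x * y,  n = number of joined tuples.
    """
    # Pre-aggregate Sales per join key: tuple count and sum of the label.
    count = defaultdict(int)
    total = defaultdict(float)
    for store, units in sales:
        count[store] += 1
        total[store] += units

    n = 0
    sigma = [[0.0, 0.0], [0.0, 0.0]]
    c = [0.0, 0.0]
    # Combine each Stores tuple with the pre-aggregated Sales values for its key.
    for store, area in stores.items():
        k = count.get(store, 0)
        x = (1.0, area)
        n += k
        for i in range(2):
            c[i] += x[i] * total.get(store, 0.0)
            for j in range(2):
                sigma[i][j] += k * x[i] * x[j]
    return n, sigma, c


def gradient_descent(n, sigma, c, step=0.1, iters=5000):
    """Minimize the mean squared error using only the aggregates.

    The gradient of (1 / 2n) * sum (theta . x - y)^2 is (Sigma theta - c) / n,
    so no pass over the data is needed inside the loop.
    """
    theta = [0.0, 0.0]
    for _ in range(iters):
        grad = [
            (sum(sigma[i][j] * theta[j] for j in range(2)) - c[i]) / n
            for i in range(2)
        ]
        theta = [theta[i] - step * grad[i] for i in range(2)]
    return theta


n, sigma, c = aggregates(sales, stores)
theta = gradient_descent(n, sigma, c)
print(n, sigma, c, theta)
```

In this sketch each gradient step costs time proportional to the number of aggregates rather than to the size of the join, which is the property the abstract relies on. The fixed step size and iteration count are arbitrary choices for the toy data.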


research
06/20/2019

A Layered Aggregate Engine for Analytics Workloads

This paper introduces LMFAO (Layered Multiple Functional Aggregate Optim...
research
03/29/2023

An inexact linearized proximal algorithm for a class of DC composite optimization problems and applications

This paper is concerned with a class of DC composite optimization proble...
research
12/06/2019

Communication-Efficient Network-Distributed Optimization with Differential-Coded Compressors

Network-distributed optimization has attracted significant attention in ...
research
04/15/2019

Single Machine Graph Analytics on Massive Datasets Using Intel Optane DC Persistent Memory

Intel Optane DC Persistent Memory is a new kind of byte-addressable memo...
research
11/06/2019

DC-S3GD: Delay-Compensated Stale-Synchronous SGD for Large-Scale Decentralized Neural Network Training

Data parallelism has become the de facto standard for training Deep Neur...
research
05/29/2019

3D Multi-Drone-Cell Trajectory Design for Efficient IoT Data Collection

Drone cell (DC) is an emerging technique to offer flexible and cost-effe...
research
07/30/2022

Optimizing Differentially-Maintained Recursive Queries on Dynamic Graphs

Differential computation (DC) is a highly general incremental computatio...
