Accelerating Machine Learning Queries with Linear Algebra Query Processing

06/14/2023
by   Wenbo Sun, et al.

The rapid growth of large-scale machine learning (ML) models has led many companies to use ML models to generate predictions that support business decision-making. Data processing and model prediction, the two primary components of traditional predictive pipelines, often run in separate execution environments, which leads to redundant engineering and computation. In addition, the diverging mathematical foundations of data processing and machine learning hinder cross-optimizations that combine the two components, so opportunities to speed up predictive pipelines are missed. In this paper, we propose an operator fusion method based on GPU-accelerated linear algebraic evaluation of relational queries. Our method uses the properties of linear algebra computation to merge operators in machine learning prediction and data processing, accelerating predictive pipelines by up to 317x. We perform a complexity analysis to give quantitative insight into the advantages of operator fusion across various data and model dimensions. Furthermore, we extensively evaluate query processing based on matrix multiplication using the widely used Star Schema Benchmark. Through comprehensive evaluations, we demonstrate the effectiveness and potential of our approach in improving the efficiency of data processing and machine learning workloads on modern hardware.
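The idea of merging a relational operator with a model prediction can be illustrated with a small sketch. The abstract does not describe the paper's actual operators, so everything below is an assumption made for illustration: a toy key/foreign-key join encoded as a 0/1 matrix, a hypothetical linear model `weights`, and NumPy on the CPU instead of a GPU. It shows only that associativity of matrix multiplication lets the model be applied to the small dimension table before the join, without materializing the joined feature matrix.

```python
import numpy as np

def one_hot(keys, domain_size):
    """Return a (len(keys), domain_size) 0/1 matrix with a 1 at each row's key."""
    m = np.zeros((len(keys), domain_size))
    m[np.arange(len(keys)), keys] = 1.0
    return m

# Hypothetical toy data (not from the paper): a fact table whose rows
# reference a dimension table through a foreign key.
fact_fk = np.array([2, 0, 1, 2])            # foreign key of each fact row
dim_pk = np.array([0, 1, 2])                # primary key of each dimension row
dim_features = np.array([[1.0, 2.0],
                         [3.0, 4.0],
                         [5.0, 6.0]])       # feature columns of the dimension table
weights = np.array([0.5, -1.0])             # hypothetical linear model

# Join as matrix multiplication: J[i, j] == 1 iff fact row i matches dim row j.
J = one_hot(fact_fk, 3) @ one_hot(dim_pk, 3).T

# Unfused: materialize the joined feature matrix, then run the model.
unfused = (J @ dim_features) @ weights

# Fused: run the model on the dimension table first, then apply the join.
fused = J @ (dim_features @ weights)

assert np.allclose(unfused, fused)
```

In this sketch the fused order multiplies the join matrix by a vector rather than by a full feature matrix, which is where the savings would come from when the fact table is much larger than the dimension table.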


