Extending the R Language with a Scalable Matrix Summarization Operator

Analysts prefer simple interpreted languages such as R, Python, and Matlab for programming their computations. At the same time, they want to compute mathematical models as fast as possible, especially on large data sets. Data summarization remains a fundamental technique for accelerating machine learning computations. Motivated by this, we propose a novel summarization mechanism computed via a single matrix multiplication in the statistical R language. We show that our summarization benefits a large family of linear models, including Linear Regression, PCA, and Naive Bayes. We present a subsystem that exploits summarization by detecting Gramian matrix products in R. We optimize existing R source code by overriding the internal R matrix multiplication algorithm with ours. Our solution can be plugged into R and speeds up any computation in which a similar matrix multiplication appears, without RAM limitations. Moreover, our solution benefits from the fact that the summarization matrix can be computed in parallel. We present an experimental validation showing that our subsystem incurs little overhead, since it works on source code, while running much faster than the R language built-in functions. To round out our comparisons, we also compare our subsystem with Spark on parallel machines. For our solution, we assume that data can reside in HDFS, on disk, or already be partitioned. Our solution outperforms Spark in most cases, showing that it can also compete in the big data space.
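To make the idea of summarizing data with a single Gramian product concrete, here is a minimal sketch in plain Python. It is not the authors' subsystem or their exact summarization matrix. It assumes each data row is augmented as z = [1, x_1, ..., x_p, y] and that the summary is the Gramian G = Z^T Z. All function names (`gramian`, `add_gramians`, `solve`, `regression_from_summary`) are hypothetical and exist only for illustration. The sketch shows two properties the abstract relies on: the summary of a partitioned data set is the sum of the per-partition summaries, and a linear regression can be fit from the summary alone, without revisiting the rows.

```python
# Illustrative sketch only: all names here are hypothetical, not from the paper.

def gramian(rows, d):
    """Accumulate G = Z^T Z row by row; each row z is [1, x_1, ..., x_p, y]."""
    g = [[0.0] * d for _ in range(d)]
    for z in rows:
        for i in range(d):
            for j in range(d):
                g[i][j] += z[i] * z[j]
    return g

def add_gramians(a, b):
    """Combine the summaries of two data partitions by element-wise addition."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def regression_from_summary(g):
    """Fit least squares from the summary: the top-left block of G is X^T X
    (with intercept column) and the last column holds X^T y."""
    p = len(g) - 1  # index of the y column
    xtx = [row[:p] for row in g[:p]]
    xty = [row[p] for row in g[:p]]
    return solve(xtx, xty)

# Synthetic data generated exactly from y = 2 + 3*x1 - x2 (made up for this sketch).
points = [(1.0, 2.0), (2.0, 0.0), (3.0, 5.0), (4.0, 1.0), (0.0, 3.0), (5.0, 4.0)]
data = [[1.0, x1, x2, 2.0 + 3.0 * x1 - x2] for x1, x2 in points]

# Summarize two partitions independently, then merge the summaries.
g_part1 = gramian(data[:3], 4)
g_part2 = gramian(data[3:], 4)
g_total = add_gramians(g_part1, g_part2)

beta = regression_from_summary(g_total)  # intercept and two slopes
```

Because the synthetic data is exactly linear, `beta` recovers the generating coefficients up to floating-point rounding. The entry `g_total[0][0]` equals the row count, which hints at how one small matrix can hold the counts, sums, and cross-products that several linear models need. In R itself, the corresponding Gramian products are written as `t(Z) %*% Z` or `crossprod(Z)`.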
