ROSA: R Optimizations with Static Analysis

04/10/2017
by Rathijit Sen, et al.

R is a popular language and programming environment for data scientists. It is increasingly co-packaged with both relational and Hadoop-based data platforms and can often be the dominant computational component in data analytics pipelines. Recent work has highlighted inefficiencies in executing R programs, in terms of both execution time and memory requirements, which in practice limit the size of data that R can analyze. This paper presents ROSA, a static analysis framework to improve the performance and space efficiency of R programs. ROSA analyzes input programs to determine program properties such as reaching definitions, live variables, aliased variables, and types of variables. These inferred properties enable program transformations such as C++ code translation, strength reduction, vectorization, and code motion, in addition to interpretive optimizations such as avoiding redundant object copies and performing in-place evaluations. An empirical evaluation shows that ROSA substantially reduces execution time and memory consumption relative to both CRAN R and Microsoft R Open.
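Live-variable analysis is one of the properties the abstract lists. The Python sketch below shows the textbook backward dataflow formulation, LIVE_in(b) = USE(b) ∪ (LIVE_out(b) − DEF(b)), iterated to a fixpoint. It runs over a hypothetical toy IR in which each block is a list of `(assigned_variable, read_variables)` pairs, with the R statements they stand for noted in comments. It is not ROSA's implementation, and the names `use_def` and `live_variables` are illustrative only.

```python
# Textbook backward live-variable analysis (a sketch, not ROSA's code).
# Hypothetical toy IR: each statement is (variable assigned or None, variables read).
from typing import Dict, List, Optional, Set, Tuple

Stmt = Tuple[Optional[str], Set[str]]


def use_def(stmts: List[Stmt]) -> Tuple[Set[str], Set[str]]:
    """USE = variables read before any write in the block; DEF = variables written."""
    use: Set[str] = set()
    defs: Set[str] = set()
    for target, reads in stmts:
        use |= reads - defs          # reads are evaluated before the assignment
        if target is not None:
            defs.add(target)
    return use, defs


def live_variables(blocks: Dict[str, List[Stmt]],
                   succ: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Iterate LIVE_in(b) = USE(b) | (LIVE_out(b) - DEF(b)) until nothing changes."""
    summary = {b: use_def(s) for b, s in blocks.items()}
    live_in: Dict[str, Set[str]] = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # LIVE_out(b) is the union of LIVE_in over all successors of b.
            out: Set[str] = set()
            for s in succ.get(b, []):
                out |= live_in[s]
            use, defs = summary[b]
            new_in = use | (out - defs)
            if new_in != live_in[b]:
                live_in[b] = new_in
                changed = True
    return live_in


# Toy R program modeled as three blocks:
#   entry: x <- rnorm(1e6); y <- x * 2
#   loop:  y <- y + x        (may repeat)
#   exit:  print(y)
blocks: Dict[str, List[Stmt]] = {
    "entry": [("x", set()), ("y", {"x"})],
    "loop": [("y", {"y", "x"})],
    "exit": [(None, {"y"})],
}
succ = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}

live = live_variables(blocks, succ)
for name in blocks:
    print(name, sorted(live[name]))
```

Here `x` is live on entry to `loop` but not on entry to `exit`, so once the loop finishes its vector is no longer needed. Facts of this kind are what allow an optimizer to skip defensive copies or reuse storage in place; how ROSA itself applies them is not shown by this sketch.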
