fplyr: the split-apply-combine strategy for big data in R

05/10/2020
by Federico Marotta, et al.

We present fplyr, a new package for the R language for working with big files. It allows users to easily implement the split-apply-combine strategy for files that are too big to fit into the available memory, without relying on databases or introducing non-native R classes. A custom function can be applied independently to each group of observations, and the results may be either returned or written directly to one or more output files.
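
The general idea of applying a function group by group to a file that does not fit in memory can be illustrated with a minimal sketch. This is not fplyr's implementation or API, and it is written in Python rather than R. The names `apply_by_group` and `mean_of_second_column` are hypothetical, and the sketch assumes a delimited file in which all rows of a group are contiguous, with the grouping key in the first column.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def apply_by_group(lines, func, key_col=0, delimiter="\t"):
    """Yield (key, func(rows)) for each run of consecutive rows sharing a key.

    Only one group is materialised in memory at a time. Assumes that rows
    belonging to the same group are contiguous in the input.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    for key, rows in groupby(reader, key=itemgetter(key_col)):
        yield key, func(list(rows))


def mean_of_second_column(rows):
    """Example custom function: mean of the numeric second column."""
    values = [float(r[1]) for r in rows]
    return sum(values) / len(values)


# Stand-in for a large file on disk; any iterable of lines works,
# including an open file handle.
sample = io.StringIO("a\t1\na\t3\nb\t10\nb\t20\nb\t30\n")
results = list(apply_by_group(sample, mean_of_second_column))
```

Because `apply_by_group` is a generator, each result could equally be written to an output file as soon as it is produced instead of being collected into a list.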
