An "On The Fly" Framework for Efficiently Generating Synthetic Big Data Sets

03/12/2019
by Karl Mason et al.

Collecting, analyzing and gaining insight from large volumes of data is now the norm in an ever-increasing number of industries. Data analytics techniques, such as machine learning, are powerful tools for analyzing these large volumes of data. Synthetic data sets are routinely used to train and develop such methods for several reasons, including to generate larger data sets than are available, to generate diverse data sets, and to preserve anonymity in data sets containing sensitive information. Processing, transmitting and storing data are key issues when handling large data sets. This paper presents an "On the fly" framework for generating big synthetic data sets suitable for these data analytics methods, one that is both computationally efficient and applicable to a diverse set of problems. An example application of the proposed framework is presented, along with a mathematical analysis of its computational efficiency that demonstrates its effectiveness.
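The abstract does not describe the framework's internals. As a rough illustration of the general "on the fly" idea only (producing each synthetic record at the moment it is consumed, so the full set never has to be stored or transmitted), the following Python sketch assumes a hypothetical record schema (a feature `x` and a noisy linear target `y`) and hypothetical helpers `record_at` and `stream`. It is not the authors' framework. Because every record is derived from a seed and its index, any record can be regenerated on demand instead of being kept on disk.

```python
import random
from typing import Iterator, Tuple

Record = Tuple[int, float, float]  # (index, x, y): hypothetical schema


def record_at(index: int, seed: int = 0) -> Record:
    """Generate record `index` deterministically from (seed, index).

    A string seed gives each (seed, index) pair its own reproducible RNG,
    so no record ever has to be stored in order to be recovered later.
    """
    rng = random.Random(f"{seed}:{index}")
    x = rng.uniform(0.0, 10.0)
    # Hypothetical target: y = 2x + 1 plus standard Gaussian noise.
    y = 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)
    return index, x, y


def stream(n: int, seed: int = 0) -> Iterator[Record]:
    """Lazily yield n records; only one record is held in memory at a time."""
    for i in range(n):
        yield record_at(i, seed)


if __name__ == "__main__":
    # Consume a large synthetic set without materializing it in memory.
    total = sum(y for _, _, y in stream(100_000, seed=42))
    print(f"mean y over 100000 records: {total / 100_000:.3f}")
```

The generator-based design keeps memory use constant regardless of the requested size. Per-index seeding also lets separate workers regenerate disjoint slices of the same data set independently.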
