scatteR: Generating instance space based on scagnostics

09/10/2022
by Janith C. Wanniarachchi, et al.

Modern synthetic data generators are largely model-based: the focus is on tuning the parameters of the model rather than on specifying the structure of the data itself. Scagnostics is an exploratory graphical method that captures the structure of bivariate data through graph-theoretic measures. An inverse scagnostic procedure would therefore provide an entry point for generating datasets from the characteristics of the instance space rather than from a model-based simulation. scatteR is a novel data generation method whose output has controllable characteristics specified as scagnostic measurements. It applies a Generalized Simulated Annealing optimizer iteratively, searching in each iteration for the arrangement of data points that minimizes the distance between the current and target measurements. As a pedagogical tool, scatteR can be used to generate datasets for teaching statistical methods. In this study, scatteR generated 50 data points in under 30 seconds with a Root Mean Squared Error of 0.05 on average.
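The idea of rearranging points until a measurement matches its target can be illustrated with a small sketch. It is not scatteR's implementation and rests on several simplifying assumptions: it uses plain simulated annealing rather than Generalized Simulated Annealing, it targets a single scagnostic (Monotonic, taken here as the squared Spearman rank correlation) instead of a full vector of measures, and the function names `ranks`, `monotonic`, and `generate` are hypothetical.

```python
import math
import random


def ranks(values):
    """Return 1-based ranks of the values (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def monotonic(points):
    """Monotonic measure: squared Spearman rank correlation of x and y."""
    rx = ranks([p[0] for p in points])
    ry = ranks([p[1] for p in points])
    mean = (len(points) + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)  # same for ry: both are permutations
    return (cov / var) ** 2


def generate(n_points, target, steps=4000, seed=0, tol=1e-3):
    """Anneal a random scatter until its Monotonic value is close to target.

    Assumed illustrative setup: points live in the unit square and each
    move replaces one point with a fresh uniform draw.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n_points)]
    current = abs(monotonic(points) - target)
    best_points, best = list(points), current
    t_start, t_end = 0.05, 1e-4  # assumed geometric cooling schedule
    for step in range(steps):
        if best <= tol:
            break
        temperature = t_start * (t_end / t_start) ** (step / steps)
        i = rng.randrange(n_points)
        old = points[i]
        points[i] = (rng.random(), rng.random())
        candidate = abs(monotonic(points) - target)
        # Accept improvements always, worse moves with Boltzmann probability.
        if candidate <= current or rng.random() < math.exp(
            (current - candidate) / temperature
        ):
            current = candidate
            if current < best:
                best_points, best = list(points), current
        else:
            points[i] = old  # reject the move
    return best_points, best


if __name__ == "__main__":
    pts, err = generate(n_points=30, target=0.8)
    print(f"Monotonic = {monotonic(pts):.3f}, absolute error = {err:.4f}")
```

With several target measures, the scalar error would become a distance between the vectors of current and target measurements, which is the quantity the abstract says the optimizer minimizes.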


