A tool framework for tweaking features in synthetic datasets

01/11/2018
by J. W. Zhang, et al.

Researchers and developers use benchmarks to compare their algorithms and products. A database benchmark must have a dataset D. To be application-specific, this dataset D should be empirical. However, D may be too small, or too large, for the benchmarking experiments, so it must be scaled to the desired size. To ensure the scaled D' is similar to D, previous work typically specifies or extracts a fixed set of features F = {F_1, F_2, ..., F_n} from D, then uses F to generate synthetic data for D'. However, this approach (D -> F -> D') becomes increasingly intractable as F gets larger, so a new solution is necessary. Unlike existing approaches, this paper proposes ASPECT, which scales D first and enforces similarity afterwards. ASPECT first uses a size-scaler (S0) to scale D to D'. The user then selects a set of desired features F'_1, ..., F'_n. For each desired feature F'_k, there is a tweaking tool T_k that tweaks D' so that it has the required feature F'_k. ASPECT coordinates the tweaking of D' by T_1, ..., T_n, so that T_n(...(T_1(D'))...) has the required features F'_1, ..., F'_n. By shifting from D -> F -> D' to D -> D' -> F', data scaling becomes flexible: the user can customise the scaled dataset with the features they are interested in. Extensive experiments on real datasets show that ASPECT can enforce similarity in the dataset effectively and efficiently.
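The D -> D' -> F' pipeline described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the dataset is a flat list of numbers, `size_scale` is a naive replication-based stand-in for the size-scaler S0, and `mean_tool` and `spread_tool` are hypothetical tweaking tools for two simple features (mean and standard deviation). The sketch applies the tools in plain sequence; how ASPECT actually coordinates tools whose changes may interfere with each other is not covered here.

```python
import statistics
from functools import reduce
from typing import Callable, List, Sequence

Dataset = List[float]
Tool = Callable[[Dataset], Dataset]


def size_scale(d: Dataset, target_size: int) -> Dataset:
    """Hypothetical stand-in for S0: cycle through D until the target size is reached."""
    return [d[i % len(d)] for i in range(target_size)]


def mean_tool(target_mean: float) -> Tool:
    """Hypothetical tool T_1: shift every value so the mean equals target_mean."""
    def tweak(d: Dataset) -> Dataset:
        shift = target_mean - statistics.fmean(d)
        return [x + shift for x in d]
    return tweak


def spread_tool(target_std: float) -> Tool:
    """Hypothetical tool T_2: rescale deviations around the mean to reach target_std.

    Scaling around the mean leaves the mean unchanged, so this tool does not
    undo the feature enforced by mean_tool.
    """
    def tweak(d: Dataset) -> Dataset:
        mu = statistics.fmean(d)
        sigma = statistics.pstdev(d)
        if sigma == 0:
            return list(d)  # a constant dataset cannot be rescaled
        factor = target_std / sigma
        return [mu + (x - mu) * factor for x in d]
    return tweak


def apply_tools(d_scaled: Dataset, tools: Sequence[Tool]) -> Dataset:
    """Compute T_n(...(T_1(D'))...) by applying the tools in order."""
    return reduce(lambda data, tool: tool(data), tools, d_scaled)


d = [1.0, 2.0, 2.0, 3.0, 10.0]             # empirical dataset D (toy values)
d_scaled = size_scale(d, 50)               # D -> D'
result = apply_tools(d_scaled, [mean_tool(5.0), spread_tool(2.0)])  # D' -> F'
```

The two tools were chosen so that the second preserves the feature set by the first. With tools that do conflict, a plain left-to-right composition like this one would not be enough, which is the situation the coordination step in the abstract addresses.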


