Synthetic Embedding-based Data Generation Methods for Student Performance

01/03/2021
by Dom Huh, et al.

Student performance datasets are inherently class-imbalanced, so samples at the edges of the target class distribution are difficult for predictive machine learning algorithms to learn. In this paper, we introduce a general framework for synthetic embedding-based data generation (SEDG), a search-based approach that uses embeddings to generate new synthetic samples that optimally correct the detrimental effects of class imbalance. We compare the SEDG framework to earlier synthetic data generation methods, including deep generative models and traditional sampling methods. Our results show that SEDG outperforms the traditional re-sampling methods for deep neural networks and performs competitively for common machine learning classifiers on the student performance task across several standard performance metrics.
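
For readers unfamiliar with the traditional re-sampling baselines mentioned in the abstract, the following is a minimal sketch of random oversampling. It is not the SEDG method, whose details are given only in the full text. The function name `random_oversample` and the toy data are assumptions made purely for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until every class is as large as the largest.

    Hypothetical helper for illustration only; this is a traditional
    re-sampling baseline, not the SEDG framework.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        # All original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Draw duplicates (with replacement) until the class reaches the target size.
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy, made-up data: most students fall in the middle performance class.
X = [[0.1], [0.2], [0.5], [0.5], [0.6], [0.55], [0.9]]
y = ["low", "low", "mid", "mid", "mid", "mid", "high"]

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

Because this baseline only repeats existing rows, it adds no new information about the edge classes. That limitation is what synthetic generation approaches aim to address.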


research
06/20/2022

Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets

Data is commonly stored in tabular format. Several fields of research (e...
research
06/29/2023

Synthetic Demographic Data Generation for Card Fraud Detection Using GANs

Using machine learning models to generate synthetic data has become comm...
research
08/02/2023

Exploiting Synthetic Data for Data Imbalance Problems: Baselines from a Data Perspective

We live in a vast ocean of data, and deep neural networks are no excepti...
research
01/14/2022

Synthesising Electronic Health Records: Cystic Fibrosis Patient Group

Class imbalance can often degrade predictive performance of supervised l...
research
05/07/2020

Minority Class Oversampling for Tabular Data with Deep Generative Models

In practice, data scientists are often confronted with imbalanced data. ...
research
05/04/2021

Out-of-distribution Detection and Generation using Soft Brownian Offset Sampling and Autoencoders

Deep neural networks often suffer from overconfidence which can be partl...
research
04/25/2017

Deep Over-sampling Framework for Classifying Imbalanced Data

Class imbalance is a challenging issue in practical classification probl...
