Exploiting Synthetically Generated Data with Semi-Supervised Learning for Small and Imbalanced Datasets

03/24/2019
by Maria Perez-Ortiz, et al.

Data augmentation is rapidly gaining attention in machine learning. Synthetic data can be generated either by simple transformations of existing patterns or by sampling from the data distribution. In the latter case, the main challenge is to estimate the label associated with each new synthetic pattern. This paper studies the effect of generating synthetic data by convex combination of patterns and of using these data as unlabeled information in a semi-supervised learning framework with support vector machines, thus avoiding the need to label synthetic examples. We perform experiments on a total of 53 binary classification datasets. Our results show that this type of over-sampling supports the well-known cluster assumption in semi-supervised learning, with particularly strong results for small high-dimensional datasets and imbalanced learning problems.
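The core idea in the abstract, creating synthetic patterns as convex combinations of existing ones and adding them to the training set without labels, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: pairs are drawn uniformly at random from the minority class, the mixing weight lam is drawn from Uniform(0, 1), and `None` marks an unlabeled pattern. The helper names (`convex_combination`, `generate_unlabeled`, `build_semi_supervised_set`) are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def convex_combination(x: Sequence[float], y: Sequence[float], lam: float) -> Vector:
    """Return lam * x + (1 - lam) * y, a point on the segment between x and y."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(x) != len(y):
        raise ValueError("patterns must have the same dimensionality")
    return [lam * a + (1.0 - lam) * b for a, b in zip(x, y)]


def generate_unlabeled(
    patterns: Sequence[Sequence[float]], n_new: int, seed: int = 0
) -> List[Vector]:
    """Create n_new synthetic patterns by convex combination of random pairs.

    Assumption (not from the paper): pairs are sampled uniformly without
    replacement and lam ~ Uniform(0, 1).
    """
    if len(patterns) < 2:
        raise ValueError("need at least two patterns to combine")
    rng = random.Random(seed)  # seeded for reproducibility
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i, j = rng.sample(range(len(patterns)), 2)  # two distinct patterns
        lam = rng.random()  # mixing weight in [0, 1)
        synthetic.append(convex_combination(patterns[i], patterns[j], lam))
    return synthetic


def build_semi_supervised_set(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    minority_label: int,
    n_new: int,
    seed: int = 0,
) -> Tuple[List[Vector], List[Optional[int]]]:
    """Append synthetic minority-class patterns as unlabeled data (label None)."""
    minority = [x for x, t in zip(X, y) if t == minority_label]
    X_syn = generate_unlabeled(minority, n_new, seed)
    X_all = [list(x) for x in X] + X_syn
    y_all: List[Optional[int]] = list(y) + [None] * len(X_syn)
    return X_all, y_all


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
    y = [1, 1, 0, 0, 0]  # class 1 is the (hypothetical) minority class
    X_all, y_all = build_semi_supervised_set(X, y, minority_label=1, n_new=3)
    for x, t in zip(X_all, y_all):
        print(x, t)
```

The augmented set would then be passed to a semi-supervised SVM (for example, a transductive SVM), which treats the `None` entries as unlabeled points. Because each synthetic point lies on a segment between two real patterns of the same class, it tends to fall in regions already occupied by that class, which is one intuition for why such points can support the cluster assumption without needing an estimated label.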
