Boost AI Power: Data Augmentation Strategies with unlabelled Data and Conformal Prediction, a Case in Alternative Herbal Medicine Discrimination with Electronic Nose

02/05/2021
by Li Liu, et al.

The electronic nose has proven effective in alternative herbal medicine classification, but because of the supervised nature of the learning, previous research relies on labelled training data, which are time-costly and labor-intensive to collect. Considering the inadequacy of training data in real-world applications, this study aims to improve classification accuracy via data augmentation strategies. We simulated two scenarios to investigate the effectiveness of five data augmentation strategies under different degrees of training data inadequacy: in the noise-free scenario, different availabilities of unlabelled data were simulated, and in the noisy scenario, different levels of Gaussian noise and translational shifts were added to simulate sensor drift. The augmentation strategies (noise-adding data augmentation, semi-supervised learning, classifier-based online learning, inductive conformal prediction (ICP) online learning, and the novel ensemble ICP online learning proposed in this study) were compared against a supervised learning baseline, with Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) as the classifiers. We found that in each task at least one strategy significantly improved the classification accuracy with LDA (p ≤ 0.05) and showed non-decreasing classification accuracy with SVM. Moreover, our novel strategy, ensemble ICP online learning, outperformed the others by showing non-decreasing classification accuracy on all tasks and significant improvement on most tasks (25/36 tasks, p ≤ 0.05). This study provides a systematic analysis of augmentation strategies, and we provide users with recommended strategies for specific circumstances. Furthermore, our newly proposed strategy showed both effectiveness and robustness in boosting the generalizability of the classification model, and it can also be employed in other machine learning applications.
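To make the simulated drift and the ICP online-learning idea more concrete, here is a minimal, stdlib-only Python sketch. It is not the authors' implementation and it does not reproduce their ensemble ICP method. Several parts are assumptions chosen for illustration: a nearest-centroid classifier stands in for LDA/SVM, the nonconformity measure is a centroid-distance ratio, and the confidence/credibility thresholds are arbitrary. Every function name (`add_drift`, `noise_augment`, `icp_online`, and so on) is hypothetical. In the sketch, an unlabelled sample receives a pseudo-label and joins the training set only when its conformal confidence (1 minus the second-largest p-value) and credibility (the largest p-value) are both high.

```python
import math
import random


def add_drift(x, noise_sd, shift, rng):
    """Simulated sensor drift (assumed form): a constant translational
    shift plus i.i.d. Gaussian noise added to every feature."""
    return [v + shift + rng.gauss(0.0, noise_sd) for v in x]


def noise_augment(X, y, copies, noise_sd, rng):
    """Noise-adding augmentation: append `copies` jittered duplicates
    of every labelled sample, keeping its label."""
    X_aug, y_aug = list(X), list(y)
    for x, label in zip(X, y):
        for _ in range(copies):
            X_aug.append(add_drift(x, noise_sd, 0.0, rng))
            y_aug.append(label)
    return X_aug, y_aug


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: the mean vector of each class."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {c: [sum(col) / len(pts) for col in zip(*pts)]
            for c, pts in groups.items()}


def score(x, label, cents):
    """Assumed nonconformity measure: distance to the hypothesised class
    centroid divided by distance to the nearest other centroid."""
    own = math.dist(x, cents[label])
    other = min(math.dist(x, c) for k, c in cents.items() if k != label)
    return own / (other + 1e-12)


def p_values(x, cents, cal_scores):
    """Inductive conformal p-value of x for every candidate label."""
    n = len(cal_scores)
    out = {}
    for label in cents:
        a = score(x, label, cents)
        out[label] = (sum(s >= a for s in cal_scores) + 1) / (n + 1)
    return out


def icp_online(X_tr, y_tr, X_cal, y_cal, X_unlab, min_conf=0.9, min_cred=0.2):
    """Plain ICP online learning: refit, recalibrate, and add each
    unlabelled sample only if it is confidently and credibly labelled."""
    X_tr, y_tr, accepted = list(X_tr), list(y_tr), []
    for x in X_unlab:
        cents = fit_centroids(X_tr, y_tr)
        cal = [score(xc, yc, cents) for xc, yc in zip(X_cal, y_cal)]
        p = p_values(x, cents, cal)
        best, second = sorted(p, key=p.get, reverse=True)[:2]
        if 1 - p[second] >= min_conf and p[best] >= min_cred:
            X_tr.append(x)
            y_tr.append(best)
            accepted.append(best)
        else:
            accepted.append(None)  # rejected: left unlabelled
    return X_tr, y_tr, accepted


def run_demo(seed=0):
    """Toy two-class data; the unlabelled samples are drifted."""
    rng = random.Random(seed)
    means = {"A": (0.0, 0.0), "B": (5.0, 5.0)}

    def draw(label, k):
        return [[rng.gauss(m, 0.5) for m in means[label]] for _ in range(k)]

    X_tr, y_tr = draw("A", 5) + draw("B", 5), ["A"] * 5 + ["B"] * 5
    X_cal, y_cal = draw("A", 10) + draw("B", 10), ["A"] * 10 + ["B"] * 10
    truth = ["A"] * 5 + ["B"] * 5
    X_un = [add_drift(x, 0.3, 0.2, rng) for x in draw("A", 5) + draw("B", 5)]
    X_out, _, acc = icp_online(X_tr, y_tr, X_cal, y_cal, X_un)
    added = sum(a is not None for a in acc)
    correct = all(a == t for a, t in zip(acc, truth) if a is not None)
    return added, correct, len(X_out)
```

The credibility check matters in this sketch. When a drifted sample is atypical for every class, all of its p-values collapse to the minimum, so a confidence test alone would accept it under an arbitrary label.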



research
07/02/2020

Can We Achieve More with Less? Exploring Data Augmentation for Toxic Comment Classification

This paper tackles one of the greatest limitations in Machine Learning: ...
research
08/10/2022

Classifier Transfer with Data Selection Strategies for Online Support Vector Machine Classification with Class Imbalance

Objective: Classifier transfers usually come with dataset shifts. To ove...
research
05/22/2010

Incremental Training of a Detector Using Online Sparse Eigen-decomposition

The ability to efficiently and accurately detect objects plays a very cr...
research
07/04/2020

Building a Competitive Associative Classifier

With the huge success of deep learning, other machine learning paradigms...
research
06/18/2018

An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification

In this work, we propose an ensemble of classifiers to distinguish betwe...
research
02/07/2022

SODA: Self-organizing data augmentation in deep neural networks – Application to biomedical image segmentation tasks

In practice, data augmentation is assigned a predefined budget in terms ...
research
09/02/2021

Two Shifts for Crop Mapping: Leveraging Aggregate Crop Statistics to Improve Satellite-based Maps in New Regions

Crop type mapping at the field level is critical for a variety of applic...
