Integrating Transformer and Autoencoder Techniques with Spectral Graph Algorithms for the Prediction of Scarcely Labeled Molecular Data

11/12/2022
by Nicole Hayes, et al.

In the molecular and biological sciences, experiments are expensive, time-consuming, and often subject to ethical constraints. Consequently, one often faces the challenging task of predicting desirable properties from small or scarcely labeled data sets. Although transfer learning can be advantageous, it requires the existence of a related large data set. This work introduces three graph-based models incorporating Merriman-Bence-Osher (MBO) techniques to tackle this challenge. Specifically, graph-based modifications of the MBO scheme are integrated with state-of-the-art techniques, including a home-made transformer and an autoencoder, in order to deal with scarcely labeled data sets. In addition, a consensus technique is detailed. The proposed models are validated using five benchmark data sets. We also provide a thorough comparison to other competing methods, such as support vector machines, random forests, and gradient boosted decision trees, which are known for their good performance on small data sets. The performances of the various methods are analyzed using residue-similarity (R-S) scores and R-S indices. Extensive computational experiments and theoretical analysis show that the new models perform very well even when as little as 1% of the data set is used as labeled data.
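To make the idea of a graph-based MBO scheme concrete, the following is a minimal sketch of a generic semi-supervised graph MBO iteration: a few implicit diffusion steps on a graph Laplacian with a fidelity term on the labeled nodes, followed by thresholding each node to its most likely class. It is not the authors' implementation. The function names, the Gaussian similarity kernel, the dense linear solve (where published graph MBO methods typically use a truncated spectral decomposition), and all parameter values are assumptions chosen for illustration, and the transformer and autoencoder feature extractors mentioned above are not included.

```python
import numpy as np


def gaussian_weights(X, sigma=1.0):
    """Dense Gaussian similarity graph on the rows of X (illustrative choice)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def graph_mbo(W, labels, n_classes, dt=0.1, mu=10.0, n_iter=30, n_diffuse=3):
    """Generic graph MBO sketch; labels uses -1 for unlabeled nodes.

    All parameter values are hypothetical defaults, not taken from the paper.
    """
    n = W.shape[0]
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized graph Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    known = labels >= 0
    U_hat = np.zeros((n, n_classes))
    U_hat[known, labels[known]] = 1.0  # one-hot rows for labeled nodes

    # Start unlabeled nodes at the uniform distribution over classes
    U = np.full((n, n_classes), 1.0 / n_classes)
    U[known] = U_hat[known]

    step = dt / n_diffuse
    A = np.eye(n) + step * L  # implicit diffusion operator
    for _ in range(n_iter):
        # Diffusion: implicit in the Laplacian, explicit in the fidelity term
        for _ in range(n_diffuse):
            rhs = U - step * mu * known[:, None] * (U - U_hat)
            U = np.linalg.solve(A, rhs)
        # Thresholding: project each row onto the nearest one-hot vector
        U = np.eye(n_classes)[np.argmax(U, axis=1)]
    return np.argmax(U, axis=1)
```

Called with a feature matrix and a label vector in which only a handful of entries are non-negative, `graph_mbo(gaussian_weights(X), labels, n_classes)` returns a predicted class for every node. The dense solve keeps the sketch short but scales poorly with the number of nodes, so it is only suitable for small data sets.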


