Utilizing Semantic Textual Similarity for Clinical Survey Data Feature Selection

08/19/2023
by Benjamin C. Warner, et al.

Survey data often contain many features but comparatively few examples. Machine learning models trained to predict outcomes from such data can overfit and generalize poorly. One remedy is feature selection, which seeks an optimal subset of features to learn from. A relatively unexplored source of information for feature selection is the textual names of the features, which may indicate semantically which features are relevant to a target outcome. Language models (LMs) can evaluate the relationships between feature names and target names to produce semantic textual similarity (STS) scores, which can then be used to select features. We examine the performance of STS when used to select features directly and when used within the minimal-redundancy-maximal-relevance (mRMR) algorithm. STS is evaluated as a feature selection metric on preliminary survey data collected as part of a clinical study on persistent post-surgical pain (PPSP). The results suggest that features selected with STS can yield higher-performing models than traditional feature selection algorithms.
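To make the two selection modes concrete, the following is a minimal sketch, not the authors' implementation. It assumes that feature names and the target name have already been embedded by some language model, that STS is scored as cosine similarity between embeddings, and that mRMR uses the common greedy "relevance minus mean redundancy" form. The feature names, the toy three-dimensional vectors, and the use of name-embedding similarity as the redundancy term are all hypothetical stand-ins; the abstract does not specify these details.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sts_scores(feature_vecs, target_vec):
    """Score each feature name by its similarity to the target name."""
    return {name: cosine(vec, target_vec) for name, vec in feature_vecs.items()}


def select_top_k(scores, k):
    """Direct selection: keep the k features with the highest STS score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def mrmr_select(relevance, redundancy, k):
    """Greedy mRMR (difference form).

    relevance:  dict mapping feature name -> relevance score (here, STS).
    redundancy: function (feature_a, feature_b) -> redundancy score.
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:

        def gain(feature):
            if not selected:
                return relevance[feature]
            mean_red = sum(redundancy(feature, s) for s in selected) / len(selected)
            return relevance[feature] - mean_red

        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy embeddings; a real pipeline would obtain these from an LM.
target_vec = [1.0, 0.0, 0.0]  # stands in for the target name, e.g. persistent pain
feature_vecs = {
    "pain_intensity": [0.9, 0.1, 0.0],
    "pain_severity": [0.85, 0.15, 0.0],
    "sleep_quality": [0.6, 0.0, 0.8],
    "shoe_size": [0.0, 1.0, 0.0],
}

scores = sts_scores(feature_vecs, target_vec)
direct_picks = select_top_k(scores, 2)

# Assumption: redundancy is approximated by similarity between feature-name
# embeddings; a data-driven measure such as mutual information could be used instead.
mrmr_picks = mrmr_select(
    scores,
    lambda a, b: cosine(feature_vecs[a], feature_vecs[b]),
    2,
)

print("direct:", direct_picks)
print("mRMR:  ", mrmr_picks)
```

In this toy setup, direct selection keeps the two near-synonymous pain features, whereas the mRMR variant penalizes the second one for overlapping with the first and picks a less redundant feature instead. That contrast is the motivation for combining a relevance score with a redundancy term.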
