Empirical Analysis of Lifelog Data using Optimal Feature Selection based Unsupervised Logistic Regression (OFS-ULR) Model with Spark Streaming

04/04/2022
by   Sadhana Tiwari, et al.

Recent advances in pervasive healthcare monitoring systems generate huge amounts of lifelog data in real time. Chronic diseases are among the most serious health challenges in both developing and developed countries. According to the WHO, this accounts for 73 [...] 60 [...] now harnessing the potential of lifelog data to explore better healthcare practices. This paper constructs an optimal feature selection based unsupervised logistic regression (OFS-ULR) model to classify chronic diseases. Because lifelog data are sensitive, their analysis is delicate, and conventional classification models show limited performance. New classifiers for chronic disease classification using lifelog data are therefore needed. Building a good model depends on pre-processing the dataset, identifying important features, and then training a learning algorithm with suitable hyperparameters. The proposed approach improves on the performance of existing methods through a series of steps: (i) removing redundant or invalid instances, (ii) labelling the data by clustering and partitioning it into classes, (iii) identifying a suitable subset of features by applying either domain knowledge or a selection algorithm, (iv) tuning model hyperparameters to get the best results, and (v) evaluating performance in a Spark Streaming environment. Two time-series datasets are used in the experiments to compute accuracy, recall, precision, and F1-score. The experimental analysis supports the suitability of the proposed approach compared with conventional classifiers: the newly constructed model achieved the highest accuracy and reduced training complexity among all the models compared.
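The cluster-then-classify idea in steps (i), (ii), and (v) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy readings, the 2-means clustering, the plain gradient-descent logistic regression, and all function names are assumptions chosen for illustration. Feature selection, hyperparameter tuning, and Spark Streaming are omitted, and the model is scored on the same rows it was trained on, so the metrics only show how the quantities are computed.

```python
import math

def dedupe(rows):
    """Step (i): drop exact duplicate rows, keeping first occurrences."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            out.append(list(r))
    return out

def two_means(rows, iters=20):
    """Step (ii): 2-means clustering; returns a 0/1 pseudo-label per row."""
    # Deterministic start: the rows with the smallest and largest coordinate sums.
    centers = [list(min(rows, key=sum)), list(max(rows, key=sum))]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [
            min((0, 1), key=lambda c: sum((x - m) ** 2 for x, m in zip(r, centers[c])))
            for r in rows
        ]
        for c in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for r, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, r)) + b) - y
            gw = [g + err * xi for g, xi in zip(gw, r)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(rows, w, b):
    """Threshold the predicted probability at 0.5."""
    return [1 if sigmoid(sum(wi * xi for wi, xi in zip(w, r)) + b) >= 0.5 else 0
            for r in rows]

def scores(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical two-feature readings (made up for illustration); one row is repeated.
raw_rows = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [1.0, 1.2],
            [5.0, 5.1], [4.8, 5.3], [5.2, 4.9]]

rows = dedupe(raw_rows)                     # (i) remove redundant instances
labels = two_means(rows)                    # (ii) pseudo-labels from clustering
weights, bias = fit_logistic(rows, labels)  # train on the pseudo-labels
preds = predict(rows, weights, bias)
report = scores(labels, preds)              # (v) evaluation metrics
print(report)
```

Because the labels come from clustering rather than from ground truth, the metrics here measure agreement between the classifier and the cluster assignment. A real evaluation would need held-out data and, where available, clinically verified labels.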

