Analyzing patterns in vital signs can help detect or predict physiological abnormalities. For patients in critical care units, continuous monitoring and analysis of vital signs are essential for detecting or predicting a medical emergency. For example, a hypotensive episode is an episode of abnormally low blood pressure that may hamper the supply of oxygen and vital nutrients to organs and cause organ failure, which may be fatal [3, 4].
Several works [1, 5] show that, by continuously analyzing patterns in blood pressure, it is possible to predict a hypotensive episode well ahead of time. A multimodal approach that includes vital signs as one of its inputs has also been used for detecting pain in neonates. That method combines predictions from different types of data such as facial expression, crying sound, body movements, and vital signs, and it shows that pain can be detected by analyzing vital signs. Apart from medical conditions, vital signs also carry information about a person's stress level. For example, several works [7, 8] have shown that analyzing patterns in vital signs can help detect whether a person is under mental stress. Different machine learning algorithms have been used to recognize patterns in vital signs. For example, in [9], vital sign signals (time-series data) are first encoded as the pixels of an image using Gramian angular fields and then analyzed. Spectrograms have been used to represent ECG signals as images; for example, spectrograms have been used to analyze ECG signals for detecting obstructive sleep apnea. There is also some research on spectrogram analysis of radar signals for detecting vital signs [11, 12]. To the best of our knowledge, no prior work analyzes patterns in vital signs, specifically heart rate, respiratory rate, and blood pressure, using spectrograms. This paper explores the use of spectrograms for pattern recognition from vital signs.
Vital signs have low variability, a low sampling frequency, and almost no seasonality, which makes it difficult to extract patterns from them directly using spectrograms. To solve this issue, we propose a novel solution based on frequency modulation. Frequency modulation translates the amplitude of a vital sign signal into the frequency of another signal, so that the frequency of the second signal carries the information about the amplitude of the vital sign signal. In this way, we introduce high-frequency variability into the signal while encoding the amplitude of the vital signs as frequency, which can then be analyzed using spectrograms.
A spectrogram translates a time-domain signal into a frequency-domain representation. It divides the whole signal into individual portions, or windows, calculates the Fourier transform of each window, and stitches the results into a single image that reveals the dominant frequencies in the different windows. Spectrograms are typically applied to audio signals, as these are well suited to frequency-domain analysis. They can also be applied to time-series signals that exhibit seasonality. In Figure 1, we can observe that, because the audio signal has high variation in frequency and amplitude, many different frequency components, and a high sampling frequency, frequency-domain information is easily obtained using spectrograms. In contrast, the vital sign signal has little variability and a low sampling frequency, and its spectrogram does not exhibit any useful patterns.
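The windowed Fourier analysis described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy and SciPy are available; the helper name `dominant_frequencies`, the test signal, and the window length are hypothetical choices and are not taken from the experiments in this paper.

```python
import numpy as np
from scipy.signal import spectrogram

def dominant_frequencies(x, fs, nperseg=256):
    """Return window centre times and the strongest frequency in each window."""
    # scipy splits x into windows and computes a Fourier transform per window
    freqs, times, power = spectrogram(x, fs=fs, nperseg=nperseg)
    return times, freqs[np.argmax(power, axis=0)]

# Hypothetical test signal: 50 Hz during the first second, 120 Hz afterwards
fs = 1000
t = np.arange(0, 2.0, 1.0 / fs)
x = np.where(t < 1.0, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 120 * t))

times, peaks = dominant_frequencies(x, fs)
for time, peak in zip(times, peaks):
    print(f"t = {time:.3f} s, dominant frequency = {peak:.1f} Hz")
```

Because the frequency resolution is `fs / nperseg`, the reported peaks fall on the nearest frequency bin rather than exactly on 50 Hz or 120 Hz.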
II-B Frequency Modulation
Frequency Modulation (FM) translates the information, or amplitude, of one signal into the frequency of another signal. FM involves three signals: the carrier signal, the modulating signal, and the modulated signal. The carrier signal is typically a high-frequency signal that is to be modulated. The modulating signal is the message signal, which carries information in its amplitude. Using FM, the information in the message signal is encoded as a proportional change in the frequency of the carrier signal. In essence, the frequency of the carrier signal varies in proportion to the amplitude of the message signal, thereby carrying the information in its frequency. A frequency-domain analysis of the FM signal therefore reveals the information in the original message signal. The equation for frequency modulation expresses the frequency-modulated signal in the time domain in terms of the carrier signal amplitude, the amplitude of the message (modulating) signal, the instantaneous time, and the modulation index, which determines by what factor the frequency of the carrier wave varies with the amplitude of the modulating signal. The modulation index is in turn determined by the frequency deviation, which specifies how much the frequency of the carrier wave should change in accordance with the amplitude of the modulating signal.
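For reference, the relationship described above is commonly written in the following textbook form. The notation is an assumption made for this sketch and may differ from the symbols used in the paper's own equation: $y(t)$ is the frequency-modulated signal, $A_c$ and $f_c$ are the carrier amplitude and frequency, $x(t)$ is the message signal normalised to $|x(t)| \le 1$, $\Delta f$ is the frequency deviation, and $\beta$ is the modulation index for a sinusoidal message of frequency $f_m$.

```latex
% General form: the carrier phase accumulates the message signal
y(t) = A_c \cos\!\left( 2\pi f_c t + 2\pi \Delta f \int_0^{t} x(\tau)\, d\tau \right)

% The instantaneous frequency follows the message amplitude
f_i(t) = f_c + \Delta f \, x(t)

% Special case of a sinusoidal message x(t) = \cos(2\pi f_m t)
y(t) = A_c \cos\!\left( 2\pi f_c t + \beta \sin(2\pi f_m t) \right),
\qquad \beta = \frac{\Delta f}{f_m}
```

The second line is the property exploited here: a spectrogram of $y(t)$ traces $f_i(t)$ over time, and $f_i(t)$ is an affine function of the message amplitude.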
II-C Deep Neural Networks and Data Augmentation
Convolutional Neural Networks (CNNs) are used for feature extraction from images. Popular CNN architectures include VGG16 [19, 14] and ResNet [20, 14]. CNNs require a large number of data points, which is rare in medical datasets [21, 16, 22, 23]. To increase the number of samples, data augmentation techniques are applied as described in [14, 24]. In particular, we add Gaussian noise, a set of random numbers drawn from a normal distribution with a given mean and standard deviation. An example of this noise addition is shown in Figure 2. It can be observed that the signal after adding noise preserves the characteristics of the original signal well.
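The noise-based augmentation can be sketched as follows. This is a minimal illustration assuming NumPy; the function name `augment_with_gaussian_noise`, the heart-rate values, and the fixed random seed are hypothetical and only serve to make the example reproducible.

```python
import numpy as np

def augment_with_gaussian_noise(signal, n_copies, std, mean=0.0, seed=0):
    """Return n_copies noisy versions of a 1-D signal, one per row."""
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    # Draw an independent noise value for every sample of every copy
    noise = rng.normal(loc=mean, scale=std, size=(n_copies, signal.size))
    return signal + noise

# Hypothetical heart-rate trace (beats per minute), one value per second
heart_rate = np.array([120, 122, 125, 130, 128, 126, 124, 123, 121, 120])
augmented = augment_with_gaussian_noise(heart_rate, n_copies=14, std=3.0)
print(augmented.shape)  # one row per augmented copy
```

Each row keeps the overall shape of the original trace while differing from it sample by sample, which is the property the augmentation relies on.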
The proposed approach has two main steps. The first is to reconstruct the vital sign signal using the FM technique and generate a spectrogram from the reconstructed signal. The second is to extract features from the spectrogram using a CNN. Figure 3 shows the overall pipeline of the proposed approach.
III-A Reconstruction of Vital Sign Signal Using FM
As mentioned earlier, vital signs have a low sampling rate and low variability, which makes them poorly suited to direct spectrogram analysis, as shown in Figure 5. Because a vital sign signal is a time-series signal, we need to analyze its instantaneous values at different points in time. Because frequency modulation encodes amplitude as the frequency of another signal, frequency-domain analysis of the new signal provides information about the original signal. For the experiments presented in this paper, the carrier signal frequency and the frequency deviation were each set to a fixed value.
Figure 4 shows how the frequency of the frequency-modulated wave varies in proportion to the amplitude of a sinusoidal modulating signal. Applying a spectrogram to the frequency-modulated signal recovers the modulating sinusoid, as shown in the fourth tile of the figure. Figure 5 and Figure 6 plot the vital sign signals, the corresponding reconstructed FM signals, and the spectrograms of the reconstructed signals for two different classes from the USF-MNPAD-I dataset. We can see that the frequency of the reconstructed signal varies in proportion to the amplitude of the vital sign signal. The information in the original vital sign signal is thus encoded as frequency and can be retrieved by applying a spectrogram to the reconstructed signal. In comparison, the spectrogram of the original signal does not show any patterns. This works because a spectrogram uncovers the frequency components of a time-series signal, and the frequency of the FM signal carries the amplitude information of the input signal.
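The reconstruction step can be illustrated with a short sketch, assuming NumPy and SciPy. It is not the authors' implementation: the function name `fm_reconstruct`, the sample-and-hold upsampling, the min-max scaling of the message, and the carrier frequency and frequency deviation values (100 Hz and 40 Hz) are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import spectrogram

def fm_reconstruct(vital, fs_in, fs_out, f_carrier, f_dev):
    """Frequency-modulate a carrier with a slowly sampled vital sign signal."""
    vital = np.asarray(vital, dtype=float)
    span = vital.max() - vital.min()
    if span > 0:
        # Scale to [-1, 1] so the frequency stays within f_carrier +/- f_dev
        message = 2 * (vital - vital.min()) / span - 1
    else:
        message = np.zeros_like(vital)
    # Hold each reading constant while the carrier is sampled at fs_out
    message = np.repeat(message, int(fs_out / fs_in))
    inst_freq = f_carrier + f_dev * message
    # The phase is the running integral of the instantaneous frequency
    phase = 2 * np.pi * np.cumsum(inst_freq) / fs_out
    return np.cos(phase), inst_freq

# Hypothetical input: 10 heart-rate readings sampled once per second
heart_rate = [120, 122, 125, 130, 128, 126, 124, 123, 121, 120]
fm_signal, inst_freq = fm_reconstruct(
    heart_rate, fs_in=1, fs_out=1000, f_carrier=100, f_dev=40
)

# Non-overlapping 0.5 s windows, so each window sees a single reading
freqs, times, power = spectrogram(fm_signal, fs=1000, nperseg=500, noverlap=0)
recovered = freqs[np.argmax(power, axis=0)]  # dominant frequency per window
print(recovered)
```

In this sketch the dominant frequency in each window rises and falls with the heart-rate readings, which is the pattern the spectrogram image exposes to the CNN.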
III-B Classification Using CNN
The generated spectrograms are fed into CNNs for pattern recognition. A VGG16 network with a fixed input image size has been predominantly used in this paper for the classification of spectrograms. Subject-wise cross-validation has been used to validate the model on different portions of the data and obtain a reliable estimate of its performance. In this experiment, we use a Snapshot Ensemble, which takes multiple snapshots of the model weights at different epochs during training. Because all snapshots come from a single training run, this reduces the consumption of computational resources compared with training several models separately. For the final predictions, an odd number of snapshots are ensembled by majority voting. Further details can be found in the individual experiment sections.
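The voting step over snapshots can be sketched as below, assuming NumPy and a binary classification task. The function name `majority_vote`, the 0.5 decision threshold, and the probability values are hypothetical; the snapshots themselves would come from the trained network.

```python
import numpy as np

def majority_vote(snapshot_probs, threshold=0.5):
    """Combine binary predictions from an odd number of snapshots by voting.

    snapshot_probs has shape (n_snapshots, n_samples) and holds each
    snapshot's predicted probability of the positive class.
    """
    probs = np.asarray(snapshot_probs, dtype=float)
    n_snapshots = probs.shape[0]
    if n_snapshots % 2 == 0:
        raise ValueError("use an odd number of snapshots to avoid ties")
    votes = (probs >= threshold).astype(int)  # hard label from each snapshot
    return (votes.sum(axis=0) > n_snapshots // 2).astype(int)

# Hypothetical outputs of three snapshots for four samples
snapshot_probs = [
    [0.9, 0.4, 0.2, 0.6],
    [0.8, 0.6, 0.1, 0.4],
    [0.7, 0.7, 0.6, 0.3],
]
labels = majority_vote(snapshot_probs)
print(labels)  # [1 1 0 0]
```

Using an odd number of snapshots guarantees that every sample receives a strict majority in the two-class case.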
IV Experimental Results and Discussion
To evaluate the efficacy of the proposed approach, the method is tested on four medical datasets: the MIMIC-III dataset [21, 27, 28], the Pediatric Intensive Care (PIC) dataset [16, 28], USF-MNPAD-I, and the Non-EEG dataset [22, 28]. Among these, MIMIC-III and PIC are used for prediction tasks, while USF-MNPAD-I and the Non-EEG dataset are used for detection tasks.
IV-A1 MIMIC-III Dataset
MIMIC-III [21, 27, 28] consists of de-identified medical data of over 40,000 patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. It includes information such as demographics, vital sign measurements made at the bedside (one data point per hour), mortality, medications, and so on. The data covers 38,597 distinct adult patients and 49,785 hospital admissions. The median age of adult patients is 65.8 years (Q1–Q3: 52.8–77.8). In this work, this dataset is used to predict an imminent hypotensive episode, with mean arterial pressure as the feature obtained from the dataset.
IV-A2 Paediatric Intensive Care (PIC) Dataset
The Paediatric Intensive Care (PIC) dataset [16, 28] consists of de-identified medical information of children admitted to critical care units at a children’s hospital in China. It includes vital sign measurements, medications, laboratory measurements, fluid balance recordings, and so on. The total number of patients in this dataset is 13,941. The dataset records the specific age category of each pediatric patient, with subject ages ranging from 0 to 18 years. A separate section contains vital sign measurements taken during surgery, which is used here to predict a hypotensive episode. For the purpose of the experiment, only neonates are considered. In this work, we use the systolic pressure obtained during surgery as the feature to predict an imminent hypotensive episode.
IV-A3 USF-MNPAD-I Dataset
The USF-MNPAD-I dataset contains data from 58 neonates, collected during procedural (acute) and postoperative (prolonged acute) procedures. Among them, 13 procedural subjects have vital sign sequences, which include heart rate, blood pressure, and oxygen saturation. Earlier work has already shown that vital signs reveal patterns related to the pain state of a neonate. In this experiment, similar to that earlier work, classification of pain and no-pain based on vital signs is explored using the reconstructed-signal spectrogram approach. Because oxygen saturation changes little and even its reconstructed signals yield few patterns, it has been discarded for this experiment. Every sequence is a 10-second sample with one data point per second, along with a ground truth value indicating whether the baby is experiencing pain or no-pain.
IV-A4 A Non-EEG Biosignals Dataset (NEBD)
The Non-EEG Biosignals Dataset for Assessment and Visualization of Neurological Status (NEBD) [22, 28] contains non-EEG signals such as heart rate, SpO2, EDA, temperature, and accelerometer readings from 20 healthy subjects who alternate between phases of relaxation and stress every 5 minutes. Different signals are sampled at different frequencies. In this work, we use the heart rate signal as the feature to detect and classify stress and relaxation.
|Dataset|Method|Class|Precision (%)|Recall (%)|F1-score (%)|Accuracy (%)|AUC|
|---|---|---|---|---|---|---|---|
|MIMIC-III|Moghadam et al.|Non-hypotensive|90.55|92.18|91.32|90.67|0.96|
|PIC|Moghadam et al.|Non-hypotensive|86.90|88.24|87.36|86.90|0.95|
|NEBD|Jafari et al.|Relaxation|75.00|80.00|77.42|76.67|0.77|
|USF-MNPAD-I (Heart Rate)|Zamzmi et al.|No-pain|72.58|78.95|75.63|60.82|0.49|
|USF-MNPAD-I (Respiration Rate)|Zamzmi et al.|No-pain|82.26|89.47|85.71|78.73|0.69|
IV-B Experimental Details
IV-B1 Experiment on MIMIC-III
This experiment aims to predict an imminent hypotensive episode using mean arterial pressure data obtained from the MIMIC-III dataset. It has been demonstrated in [1, 5] that an imminent hypotensive episode can be predicted by learning patterns from blood pressure. MIMIC-III has been chosen for this experiment because it has sufficient data points to learn such patterns. Adult patients aged 17 to 90 years are considered. We divide the data into three parts: an observation window, a gap window, and a target window. The model is trained on the data points in the observation window, which has a duration of 2 hours. We then predict whether a hypotensive episode will occur in the target window, which follows a gap window of 1 hour. A hypotensive episode is defined as a mean arterial pressure below 60 mm Hg for at least 30 minutes. This definition provides the ground truth for our research.
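The window split and the threshold-based labelling can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the function names `split_windows` and `is_hypotensive` are hypothetical, window lengths are expressed in samples, and the example assumes readings every 10 minutes so that three consecutive low readings stand in for the 30-minute criterion.

```python
def split_windows(series, n_obs, n_gap, n_target):
    """Split a series into observation and target windows (lengths in samples)."""
    if len(series) < n_obs + n_gap + n_target:
        raise ValueError("series is shorter than the three windows combined")
    observation = series[:n_obs]
    # The gap window is skipped: it separates what the model sees from the target
    target = series[n_obs + n_gap : n_obs + n_gap + n_target]
    return observation, target

def is_hypotensive(target, threshold=60.0, min_run=1):
    """True if at least min_run consecutive readings fall below threshold."""
    run = 0
    for value in target:
        run = run + 1 if value < threshold else 0
        if run >= min_run:
            return True
    return False

# Hypothetical mean arterial pressure readings (mm Hg), one every 10 minutes:
# 2 h observation (12 samples), 1 h gap (6 samples), 1 h target (6 samples)
series = [75] * 12 + [70] * 6 + [66, 59, 58, 57, 62, 64]
observation, target = split_windows(series, n_obs=12, n_gap=6, n_target=6)
label = is_hypotensive(target, threshold=60.0, min_run=3)
print(len(observation), target, label)
```

Only the observation window is passed to the model; the label is computed from the target window alone, so the gap window acts as the prediction horizon.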
In short, the final dataset for this experiment includes 142 unique patients (72 hypotensive, 70 non-hypotensive), one sample per patient, and a sampling frequency of one data point per hour. Each sample of both classes is augmented 14 times by adding random Gaussian noise with mean 0 and standard deviation 3. The total number of samples is therefore 2130 = 70 (non-hypotensive original) + 72 (hypotensive original) + 70*14 (non-hypotensive augmented) + 72*14 (hypotensive augmented). VGG16 models are trained for 12 epochs, with a snapshot of the weights taken every 2 epochs, and final predictions are obtained by majority voting over the snapshots from epochs 8, 10, and 12. The RMSProp optimizer is used with a learning rate of 0.0001. The number of epochs is small because the validation loss starts increasing after 12 epochs. The model is validated using subject-wise 10-fold cross-validation.
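Subject-wise cross-validation keeps all samples of a subject on the same side of each split. A minimal pure-Python sketch is given below; the function name `subject_wise_folds` and the round-robin fold assignment are hypothetical, and the sketch assumes that augmented copies stay with the subject they were derived from.

```python
import random

def subject_wise_folds(subject_ids, n_folds, seed=0):
    """Yield (train_idx, test_idx) so that no subject appears in both sets."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    for k in range(n_folds):
        held_out = set(subjects[k::n_folds])  # every n_folds-th subject
        test_idx = [i for i, s in enumerate(subject_ids) if s in held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s not in held_out]
        yield train_idx, test_idx

# Layout matching the counts above: 142 subjects, each with 1 original
# sample and 14 augmented copies, giving 2130 samples in total
subject_ids = [subject for subject in range(142) for _ in range(15)]
folds = list(subject_wise_folds(subject_ids, n_folds=10))
print(len(subject_ids), [len(test_idx) for _, test_idx in folds])
```

Setting `n_folds` to the number of subjects turns the same routine into subject-wise leave-one-out cross-validation.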
IV-B2 Experiment on PIC
In this experiment, systolic blood pressure values obtained during surgery are used to predict an imminent hypotensive episode 5 minutes before its occurrence. As with the MIMIC-III dataset, the data points are split into an observation window (20 minutes), a gap window (5 minutes), and a target window (25 minutes). According to the University of Iowa Stead Family Children’s Hospital, hypotension in neonates is defined as a systolic pressure below 60 mmHg. This threshold has been used to generate the ground truth for hypotensive and non-hypotensive episodes. The systolic pressure signal is split into three windows, namely an observation window of 20 minutes, a gap window of 5 minutes, and a target window of 20 minutes. Using the data from the observation window, we predict, with a gap of 5 minutes, whether the target window will be a hypotensive episode.
In short, the final dataset for this experiment includes 300 unique patients (225 hypotensive, 75 non-hypotensive), a variable number of samples per patient, and a sampling frequency of one data point per 5 minutes. After data augmentation with Gaussian noise of mean 0 and standard deviation 3, the dataset includes 5094 samples = 255 (hypotensive original) + 318 (non-hypotensive original) + 255*9 (hypotensive augmented) + 318*7 (non-hypotensive augmented). VGG16 models are trained for 24 epochs, with a snapshot of the weights taken every 3 epochs, and final predictions are obtained by majority voting over the snapshots from epochs 18, 21, and 24. The RMSProp optimizer is used with a learning rate of 0.00001. The model is validated using subject-wise 10-fold cross-validation.
IV-B3 Experiment on USF-MNPAD-I
The data considered for this experiment are 10-second samples with one data point per second. The data are collected in different states: 1 minute before the operation, during the operation, and up to 5 minutes after the operation. Every state has a 10-second sample along with a ground truth value indicating whether the baby is experiencing pain or no-pain.
In short, the final dataset for this experiment includes 12 unique patients with a total of 78 samples (57 pain, 21 no-pain), a variable number of samples per patient, and a sampling frequency of one data point per second. After data augmentation, the dataset includes 1137 samples = 57 (no-pain original) + 21 (pain original) + 57*9 (no-pain augmented) + 21*26 (pain augmented). Data augmentation is done by adding Gaussian noise with mean 0 and standard deviation 3. Because the number of subjects and samples, and consequently the number of spectrograms, is small, VGG16 may not be a good choice: with its millions of parameters, it is too complex and may overfit. To overcome this problem, a shallow model (Table II) with fewer parameters is adopted. The model shown in Table II has 113,489 parameters, far fewer than VGG16, and does not tend to overfit. The model is trained for 45 epochs, with a snapshot of the weights taken every 3 epochs, and final predictions are obtained by majority voting over the snapshots from epochs 39, 42, and 45. The RMSProp optimizer is used with a learning rate of 0.0001. The model is validated using subject-wise leave-one-out cross-validation, with the snapshot ensemble method applied on every fold.
IV-B4 Experiment on NEBD
This experiment classifies the stressed and relaxed states of 20 subjects based on their heart rate signal. In short, the final dataset includes 20 unique subjects with a total of 120 samples (60 stress, 60 relaxation), 6 samples per subject, and a sampling frequency of one data point per second. The dataset provides ground truth information regarding stress and relaxation. After data augmentation with Gaussian noise (mean 0, standard deviation 1), the dataset includes 3480 samples = 60 (stress original) + 60 (relaxation original) + 60*28 (stress augmented) + 60*28 (relaxation augmented). VGG16 models are trained for 30 epochs, with a snapshot of the weights taken every 3 epochs, and final predictions are obtained by majority voting over the snapshots from epochs 24, 27, and 30. The RMSProp optimizer is used with a learning rate of 0.00001. The model is validated using subject-wise leave-one-out cross-validation.
IV-C Result Analysis
The experiments (Table I) show that the proposed method, which uses the spectrogram of the reconstructed signal instead of a regular spectrogram image, can perform well on different vital sign signals such as blood pressure, heart rate, and respiratory rate.
In the prediction tasks, for the MIMIC-III dataset, the proposed method has better precision, recall, and F1-score for the hypotensive class and provides an accuracy of 91.55%, compared with 90.67% for the baseline method. For the PIC dataset, to the best of our knowledge, no existing research predicts an imminent hypotensive episode for neonates during surgery. The proposed method has better overall precision, recall, and F1-score and provides an accuracy of 87.61%, compared with 86.90% for the baseline method.
For the classification tasks, the proposed approach shows both better accuracy and better AUC. For the NEBD dataset, which involves classifying stress and relaxation, the proposed method outperforms the baseline method on all metrics. For the USF-MNPAD-I heart rate data, the proposed method has better overall precision, recall, F1-score, accuracy, and AUC than the existing baseline approach. For respiratory rate, however, the baseline approach performs better than the proposed method on all metrics. This suggests that, for respiration rate, the proposed method is not able to generate a good pattern from the spectrogram features.
Spectrograms are well suited to capturing frequency variations in audio signals. For vital signs, however, it is difficult to use regular spectrograms as input to a CNN, because they do not show the patterns and variations seen in audio signals. To solve this issue, in this paper we proposed applying frequency modulation to vital signs and computing spectrograms of the resulting FM signals. The proposed method shows better performance in terms of several metrics on several datasets. This method can be helpful in a multimodal approach that combines results from audio, video, and time-series data using CNNs. The proposed method can also be extended to other univariate time-series signals such as temperature, EDA, and accelerometer readings.
-  E. Tsur, M. Last, V. F. Garcia, R. Udassin, M. Klein, and E. Brotfain, “Hypotensive episode prediction in icus via observation window splitting,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2018, pp. 472–487.
-  M. Sahni and S. Jain, “Hypotension in neonates,” NeoReviews, vol. 17, no. 10, pp. e579–e589, 2016.
-  M. J. Patel and J. A. De Lemos, “Hypotension,” in Decision Making in Medicine (Third Edition), 3rd ed., S. B. Mushlin and H. L. Greene, Eds. Philadelphia: Mosby, 2010, pp. 76–77.
-  M. S. Zenati, T. R. Billiar, R. N. Townsend, A. B. Peitzman, and B. G. Harbrecht, “A brief episode of hypotension increases mortality in critically ill trauma patients,” Journal of Trauma and Acute Care Surgery, vol. 53, no. 2, pp. 232–237, 2002.
-  M. C. Moghadam, E. Masoumi, S. Kendale, and N. Bagherzadeh, “Predicting hypotension in the icu using noninvasive physiological signals,” Computers in Biology and Medicine, vol. 129, p. 104120, 2021.
-  G. Zamzmi, C.-Y. Pai, D. Goldgof, R. Kasturi, T. Ashmeade, and Y. Sun, “An approach for automated multimodal analysis of infants’ pain,” in 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2016, pp. 4148–4153.
-  C. Eggert, O. D. Lara, and M. A. Labrador, “Recognizing mental stress in chess players using vital sign data,” in 2013 Proceedings of IEEE Southeastcon. IEEE, 2013, pp. 1–4.
-  M. Koussaifi, C. Habib, and A. Makhoul, “Real-time stress evaluation using wireless body sensor networks,” in 2018 Wireless Days (WD). IEEE, 2018, pp. 37–39.
-  Z. Wang and T. Oates, “Encoding time series as images for visual inspection and classification using tiled convolutional neural networks,” in Workshops at the twenty-ninth AAAI conference on artificial intelligence, 2015.
-  J. McNames and A. Fraser, “Obstructive sleep apnea classification based on spectrogram patterns in the electrocardiogram,” in Computers in Cardiology 2000. Vol. 27 (Cat. 00CH37163). IEEE, 2000, pp. 749–752.
-  L. Ren, H. Wang, K. Naishadham, Q. Liu, and A. E. Fathy, “Non-invasive detection of cardiac and respiratory rates from stepped frequency continuous wave radar measurements using the state space method,” in 2015 IEEE MTT-S International Microwave Symposium. IEEE, 2015, pp. 1–4.
-  J.-K. Park, Y. Hong, H. Lee, C. Jang, G.-H. Yun, H.-J. Lee, and J.-G. Yook, “Noncontact rf vital sign sensor for continuous monitoring of driver status,” IEEE transactions on biomedical circuits and systems, vol. 13, no. 3, pp. 493–502, 2019.
-  A. V. Oppenheim, Discrete-time signal processing. Pearson Education India, 1999.
-  M. S. Salekin, G. Zamzmi, R. Paul, D. Goldgof, R. Kasturi, T. Ho, and Y. Sun, “Harnessing the power of deep learning methods in healthcare: Neonatal pain assessment from crying sound,” in 2019 IEEE Healthcare Innovations and Point of Care Technologies (HI-POCT). IEEE, 2019, pp. 127–130.
-  R. N. Bracewell, The Fourier transform and its applications. McGraw-Hill New York, 1986.
-  X. Zeng, G. Yu, Y. Lu, L. Tan, X. Wu, S. Shi, H. Duan, Q. Shu, and H. Li, “Pic, a paediatric-specific intensive care database,” Scientific data, vol. 7, no. 1, pp. 1–8, 2020.
-  G. C. Bjorklund, M. Levenson, W. Lenth, and C. Ortiz, “Frequency modulation (fm) spectroscopy,” Applied Physics B, vol. 32, no. 3, pp. 145–152, 1983.
-  M. S. Salekin, G. Zamzmi, D. Goldgof, R. Kasturi, T. Ho, and Y. Sun, “Multi-channel neural network for assessing neonatal pain from videos,” in 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). IEEE, 2019, pp. 1551–1556.
-  K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
-  G. B. Moody and R. G. Mark, “A database to support development and evaluation of intelligent intensive care monitoring,” in Computers in Cardiology 1996. IEEE, 1996, pp. 657–660.
-  J. Birjandtalab, D. Cogan, M. B. Pouyan, and M. Nourani, “A non-eeg biosignals dataset for assessment and visualization of neurological status,” in 2016 IEEE International Workshop on Signal Processing Systems (SiPS). IEEE, 2016, pp. 110–114.
-  M. S. Salekin, G. Zamzmi, J. Hausmann, D. Goldgof, R. Kasturi, M. Kneusel, T. Ashmeade, T. Ho, and Y. Sun, “Multimodal neonatal procedural and postoperative pain assessment dataset,” Data in Brief, vol. 35, p. 106796, 2021.
-  M. S. Salekin, G. Zamzmi, D. Goldgof, R. Kasturi, T. Ho, and Y. Sun, “Multimodal spatio-temporal deep learning approach for neonatal postoperative pain assessment,” Computers in Biology and Medicine, vol. 129, p. 104150, 2021.
-  G. Huang, Y. Li, G. Pleiss, Z. Liu, J. E. Hopcroft, and K. Q. Weinberger, “Snapshot ensembles: Train 1, get M for free,” in 5th International Conference on Learning Representations, ICLR 2017, Conference Track Proceedings. OpenReview.net, 2017.
-  T. G. Dietterich, “Ensemble methods in machine learning,” in International workshop on multiple classifier systems. Springer, 2000, pp. 1–15.
-  A. E. Johnson, T. J. Pollard, L. Shen, L.-W. H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. A. Celi, and R. G. Mark, “Mimic-iii, a freely accessible critical care database,” Scientific data, vol. 3, no. 1, pp. 1–9, 2016.
-  A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals,” circulation, vol. 101, no. 23, pp. e215–e220, 2000.
-  M. C. Moghadam, E. Masoumi, N. Bagherzadeh, D. Ramsingh, and Z. N. Kain, “Supervised machine-learning algorithms in real-time prediction of hypotensive events,” in 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 2020, pp. 5468–5471.
-  A. Jafari, A. Ganesan, C. S. K. Thalisetty, V. Sivasubramanian, T. Oates, and T. Mohsenin, “Sensornet: A scalable and low-power deep convolutional neural network for multimodal data classification,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 66, no. 1, pp. 274–287, 2018.