Preterm birth, defined as birth before 37 weeks of gestation, is a major and growing challenge for public health systems. Nearly 15 million babies, or about 10% of all births worldwide, are born prematurely each year, and about one million of them die from complications of preterm birth. Currently, the lack of a comprehensive understanding of how uterine contractions are initiated hinders effective early-stage treatment of preterm birth. Once delivery starts, it cannot be prevented. Thus, early detection and preemptive treatment are a promising direction for preventing premature births. Frequently used preterm diagnosis methods include tocography, intra-uterine pressure catheters, fetal fibronectin testing, and cervical length measurement, but none of these provides reliable results.
The expulsion of a fetus is a direct consequence of strong periodic uterine contractions, which result from the generation and propagation of action potentials. The corresponding electrical signals can be recorded by electrodes placed on the abdomen of pregnant women, using the electrohysterogram (EHG) technique. Because of the close relation between uterine contractions and the underlying electrical activity, EHG provides a new direction for the development of preterm diagnosis methods [4, 5].
Taking advantage of recent progress in machine learning, a set of new preterm diagnosis methods has been proposed [6, 7, 8]. Overall, preterm diagnosis can be treated as a classification problem, i.e., deciding or predicting whether a patient (pregnant woman) is at risk of preterm birth, based on a set of physical examination data (samples) and the features contained therein. It is well recognized that both the abundance of samples in each class and the quality of the features that distinguish the classes are vital to achieving satisfactory classification results.
In recent years, the TPEHG (Term Preterm EHG) database has been widely used for training and testing various machine-learning-based preterm diagnosis methods. Although there are millions of preterm babies worldwide, the fraction of preterm births is quite small compared to the total number of births. This fact is reflected in the composition of the TPEHG database, a publicly available database that contains 300 EHG samples of pregnant women. Only 38 EHG samples were collected from patients whose pregnancies resulted in preterm delivery, while the other 262 EHG samples are from patients with normal term delivery. Because of this strong difference between the numbers of preterm and term samples, conventional machine learning algorithms applied to such extremely imbalanced data tend to classify minority samples into the majority class, i.e., there is a bias towards the majority, which is likely to result in inaccurate diagnoses.
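To make the bias towards the majority concrete, the following minimal sketch evaluates a trivial classifier that labels every recording as term, using only the class counts quoted above. The function name `majority_baseline` is illustrative and not part of the original work.

```python
# Class counts reported for the TPEHG database in the text.
N_PRETERM = 38   # minority (positive) class
N_TERM = 262     # majority (negative) class


def majority_baseline(n_pos, n_neg):
    """Metrics of a trivial classifier that labels every sample as negative."""
    total = n_pos + n_neg
    return {
        "accuracy": n_neg / total,  # every negative sample is counted as correct
        "sensitivity": 0.0,         # no positive (preterm) sample is ever detected
        "specificity": 1.0,         # every negative (term) sample is detected
    }


baseline = majority_baseline(N_PRETERM, N_TERM)
print(f"accuracy = {baseline['accuracy']:.3f}")  # prints "accuracy = 0.873"
```

An overall accuracy of about 87% is thus reachable without detecting a single preterm case, which is why accuracy alone is a poor guide on this dataset.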
State-of-the-art methods for dealing with imbalanced data can be categorized into two directions: 1) over-sampling the minority class or under-sampling the majority class in order to compensate for the imbalance between the classes to be identified; 2) synthesizing artificial samples from the minority class. The former is of limited use when the dataset is small. Specifically, under-sampling can significantly reduce the number of samples available for training the model, potentially leading to under-estimation, while over-sampling can magnify the feature variations of the training samples, possibly resulting in over-estimation.
On the other hand, synthetic sampling with data generation methods aims to generate synthetic data that originate from the minority class. The synthesis procedure mimics the random distribution of the sampled data in the feature space of the minority class, so that the generated samples are assumed to be close to the actual distribution of the minority in its feature space. Including these samples in the minority training set eliminates the imbalance in the original dataset and removes the classification bias towards the majority. Frequently used synthetic algorithms, such as SMOTE and ADASYN, have shown certain advantages in real applications of preterm diagnosis and in other problems [12, 15].
Since both the abundance of training examples and the quality of features are key factors for the precision of classification, it is also important to extract new features from EHG signals, so that the performance of machine-learning-based preterm prediction algorithms can be improved by combining these new features [17, 18, 19, 20]. However, when new features are adopted in the training process, the effect of imbalance may become worse. Also, when more synthetic data of the minority class are generated, the representation ability of the features may change. In addition, as the number of synthetic samples increases, the noise in the original samples might be amplified, and a classifier trained on such a dataset may overfit. Therefore, adding synthetic samples may affect the quality of the features used by the algorithm in a complex way and, as a result, may alter the classification performance. Although some works are devoted to optimizing synthetic algorithms [23, 24], to the best of our knowledge, few works address the effect of the number of synthetic samples on classifier accuracy. For this reason, it is necessary to explore the relation between the amount of synthetic data and the quality of the features, and to investigate a unified formulation.
This paper investigates the trade-off between bias elimination and the reduction of feature importance when synthetic samples are introduced into the training process. We determine an optimal strategy for synthesizing the minority dataset by quantifying feature importance and the classification precision obtained with synthetic samples. The rest of the paper is organized as follows. In Section II, we introduce the basic elements of the problem and analyze the underlying principles of the prevalent synthetic algorithms SMOTE and ADASYN, from which we demonstrate the importance of finding an optimal inter-class sample ratio when learning from an imbalanced dataset. Section III further quantifies the effect of synthetic samples and formulates the problem of determining the optimal sample balance coefficient. In Section IV, we verify the effectiveness of the proposed method numerically by applying it to EHG-based preterm prediction. Section V concludes the paper.
II Problem Statement
As explained in the introduction, the strong imbalance between pathological and normal outcomes in the available database can make classification algorithms inaccurate. To avoid this, machine-learning-based algorithms typically introduce a certain number of synthetic preterm samples to mitigate the bias towards the majority (normal delivery). However, the probability of misclassifying term samples increases at the same time. Thus, the number of synthetic samples to be added is directly related to the performance of a machine-learning-based diagnosis method.
II-A Sample balance coefficient and feature score
As stated in Section I, the abundance of training samples in all classes is essential to the performance of classification methods. For instance, the aforementioned TPEHG database contains far more term samples than preterm ones (262:38), so it is natural to use data synthesis techniques to generate samples of the minority class, i.e., artificial preterm samples. However, it is not well understood how many minority class samples should be synthesized for a fixed number of majority class samples, or how the classification performance is affected if more minority samples than necessary are synthesized.
Enriching the minority class with synthetic data improves the classification performance to some extent. However, introducing synthetic data might alter the original pattern of the sample distribution in feature space, depending on the underlying synthesis mechanism; in particular, the boundary between classes in feature space might be blurred. To quantify the contribution of different features to classification, we introduce the following feature score (F-score),

$$F_i = \frac{\left(\bar{x}_i^{(+)} - \bar{x}_i\right)^2 + \left(\bar{x}_i^{(-)} - \bar{x}_i\right)^2}{\dfrac{1}{N_+ - 1}\sum_{k=1}^{N_+}\left(x_{k,i}^{(+)} - \bar{x}_i^{(+)}\right)^2 + \dfrac{1}{N_- - 1}\sum_{k=1}^{N_-}\left(x_{k,i}^{(-)} - \bar{x}_i^{(-)}\right)^2}, \tag{1}$$

where $x_{k,i}^{(+)}$ and $x_{k,i}^{(-)}$ denote the measured physical value of feature $i$ from sample $k$ in the positive (minority or preterm) class and the negative (majority or term) class, respectively. $\bar{x}_i$ is the average value over all samples, $\bar{x}_i^{(+)}$ is the average value over all positive (minority or preterm) samples, and $\bar{x}_i^{(-)}$ is the average value over all negative samples. According to (1), $F_i$ records the discrimination between the minority class (+) and the majority class (-) by measuring how far apart the samples of the two classes are in feature space. It is worth mentioning that the feature score depends explicitly on the number of samples. We therefore introduce the sample balance coefficient,

$$\alpha = \frac{N_+}{N_-}, \tag{2}$$

where $N_+$ and $N_-$ are the numbers of samples in the minority and majority class after applying the synthetic algorithm, respectively. Note that $N_+$ includes the synthesized samples.
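As an illustration of these two quantities, the sketch below assumes that the feature score is the standard Fisher-type F-score (between-class separation of the means divided by the sum of the within-class sample variances) and that the sample balance coefficient is the minority-to-majority ratio. Both are assumptions made here for illustration, and the function names are hypothetical.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of a single feature.

    pos, neg: feature values of the positive (minority) and negative
    (majority) samples; each needs at least two values with nonzero spread.
    """
    mean_all = mean(list(pos) + list(neg))
    mean_pos, mean_neg = mean(pos), mean(neg)
    # Numerator: squared distance of each class mean from the overall mean.
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Denominator: sample variances (1/(n-1) normalisation) of both classes.
    within = variance(pos) + variance(neg)
    return between / within


def balance_coefficient(n_minority, n_majority):
    """Assumed definition: minority count (synthetic included) over majority count."""
    return n_minority / n_majority


def synthetic_needed(alpha_target, n_minority_orig, n_majority):
    """Synthetic minority samples required to reach alpha_target (never negative)."""
    return max(0, round(alpha_target * n_majority) - n_minority_orig)
```

With the TPEHG counts, `balance_coefficient(38, 262)` is about 0.145 before any synthesis, and `synthetic_needed(1.0, 38, 262)` gives the 224 artificial preterm samples that full balance would require.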
Moreover, for any classification problem, the features extracted from the samples jointly contribute to the final classification result. Following (1) and (2), it is reasonable to define the global feature score as the weighted sum of the individual feature scores, i.e.,

$$F = \sum_{i=1}^{n} w_i F_i, \tag{3}$$

where the weight $w_i$ represents the importance of feature $i$ to the classification, and $n$ is the number of features used in the final classification. By construction, (3) links the number of synthetic samples to the quality of the features and provides a unified performance metric, which is essential for further investigation. To obtain the weights, all features are first used to build a random forest, which yields a baseline classification accuracy. The reduction in classification accuracy caused by randomly permuting the values of a feature then gives a reliable measure of that feature's importance.
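The weighting step can be sketched as follows. This is a minimal illustration, assuming that importance is measured as the drop in accuracy after shuffling one feature column; `predict` stands for any trained classifier (for instance a random forest), and all names are hypothetical rather than taken from the original implementation.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows of X for which predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, feature, rng):
    """Accuracy lost when the values of one feature are randomly permuted."""
    base = accuracy(predict, X, y)
    column = [row[feature] for row in X]
    rng.shuffle(column)  # break the link between this feature and the labels
    X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
    return base - accuracy(predict, X_perm, y)


def global_feature_score(scores, weights):
    """Weighted sum of the individual feature scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


# Toy check: a rule that only looks at feature 0, so feature 1 carries no weight.
X_demo = [[0.9, 5.0], [0.8, 1.0], [0.1, 4.0], [0.2, 2.0]]
y_demo = [1, 1, 0, 0]


def rule(row):
    return int(row[0] > 0.5)


unused_importance = permutation_importance(rule, X_demo, y_demo, 1, random.Random(0))
```

Rows are plain lists here; in practice the importances would be averaged over several permutations and over the trees of the forest.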
II-B Size of the synthesized data and distinguishability of the features
Having defined the sample balance coefficient and the individual and global feature scores, we can now investigate the properties of conventional data synthesis algorithms such as SMOTE and ADASYN, and consider their suitability for preterm diagnosis using the TPEHG database.
Although data synthesis algorithms tend to mimic the natural distribution of samples in feature space, the random synthesis of minority samples inevitably affects the ability of the features to discriminate between classes. For instance, ADASYN synthesizes more data from minority samples that are surrounded by more majority samples. As shown in Fig. 1(a), the synthetic samples are more likely to appear on the left, where more majority samples surround each minority sample. Although this is intended to ease classification by emphasizing samples that are hard to learn, it can also make originally separable datasets harder to discriminate in feature space.
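The allocation rule behind this behaviour can be sketched as follows: each minority sample receives a share of the synthetic samples proportional to the fraction of majority samples among its nearest neighbours. This is a simplified reading of ADASYN under assumed names and a Euclidean distance, not the reference implementation.

```python
import math


def adasyn_allocation(minority, majority, n_synthetic, k=5):
    """Return how many synthetic samples to create around each minority point."""
    # Label 0 = minority, 1 = majority. Minority points come first, so a
    # minority point has the same index in `minority` and in `points`.
    points = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    ratios = []
    for i, x in enumerate(minority):
        neighbours = sorted(
            ((math.dist(x, p), label) for j, (p, label) in enumerate(points) if j != i),
            key=lambda pair: pair[0],
        )[:k]
        # Fraction of majority samples among the k nearest neighbours.
        ratios.append(sum(label for _, label in neighbours) / k)
    total = sum(ratios)
    if total == 0:
        # No minority point has majority neighbours: spread the samples evenly.
        return [round(n_synthetic / len(minority))] * len(minority)
    # Rounding means the counts may not add up to n_synthetic exactly.
    return [round(n_synthetic * r / total) for r in ratios]


# One isolated minority point among majority points, three clustered ones.
minority_demo = [(0.0, 0.0), (10.0, 10.0), (10.0, 10.2), (10.2, 10.0)]
majority_demo = [(0.1, 0.0), (0.0, 0.1), (-0.1, 0.0)]
allocation = adasyn_allocation(minority_demo, majority_demo, n_synthetic=10, k=2)
```

In this toy configuration all ten synthetic samples are assigned to the isolated minority point that sits among majority points, which is the concentration effect described above.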
Contrary to ADASYN, SMOTE does not take into account the surroundings of the minority samples. For any minority sample $x_i$, it randomly selects another minority sample $x_j$ among its nearest neighbors and synthesizes an artificial sample $x_{\mathrm{new}} = x_i + \lambda (x_j - x_i)$ with a random number $\lambda \in [0, 1]$. As a result, the synthesized samples concentrate in the regions containing more minority samples (see Fig. 1(b)), which implies that SMOTE maintains the original distribution pattern without diminishing the contribution of the features to classification.
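The interpolation step of SMOTE can be sketched in a few lines. This is a simplified version that draws the base sample at random instead of looping over every minority sample, uses a Euclidean distance, and assumes at least two minority samples; the function name and signature are illustrative.

```python
import math
import random


def smote(minority, n_synthetic, k=5, rng=None):
    """Generate n_synthetic points by interpolating between minority neighbours."""
    rng = rng or random.Random()
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest minority neighbours of x (x itself excluded).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        x_nn = minority[rng.choice(neighbours)]
        lam = rng.random()  # random position on the segment from x to x_nn
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, x_nn)))
    return synthetic


corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(corners, n_synthetic=20, k=2, rng=random.Random(0))
```

Because every new point lies on a segment joining two existing minority samples, the synthetic data stay inside the region already occupied by the minority class.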
To see the effect of synthetic sampling on the features' contribution to classification, we apply the two synthetic algorithms to the TPEHG database with frequently used features extracted from the EHG signals: 1) the root mean square value of the signal; 2) the median and peak frequencies of the power spectrum; 3) the sample entropy of the signal. Recordings with a gestational age of less than 37 weeks are used as the preterm (minority class) samples.
Fig. 2 shows how the features' contribution to classification, represented by the feature score, varies after synthetic sampling. The peak frequency clearly gives the highest feature score among the four features; its effectiveness for classification has been confirmed by other authors [2, 28, 29]. Somewhat surprisingly, both techniques tend to deteriorate the ability of the features to separate the samples. SMOTE performs better in this respect: the feature scores are higher after applying SMOTE than after applying ADASYN. This is consistent with the previous analysis of the synthesis mechanisms of SMOTE and ADASYN.
It is also worth noting that the classification ability of the features is sensitive to the number of synthetic samples introduced. This can be seen from Fig. 2, which shows the feature scores after adding synthetic data at two different sample balance coefficients (see Eq. (2)), in panels (a) and (b). As the figure shows, adding synthetic samples degrades the capability of the features to distinguish the classes. However, synthetic samples are required to eliminate the classification bias against the minority. As a result, a trade-off must be found between the number of synthetic samples and the quality of the features in order to optimize the final performance of classifiers trained with these data.
III Determination of optimal sample balance coefficient
As discussed in Section II, machine-learning-based preterm diagnosis cannot avoid synthetic sampling of the minority class, and the balance between the amount of synthesized data and the feature quality must be considered. Intuitively, increasing the number of minority samples by generating synthesized data should increase the prediction precision on the minority class and reduce the bias towards the majority. On the other hand, the prediction precision on the majority class may fall if there are too many minority samples. Ideally, we would expect an unbiased learning system when the two classes are perfectly balanced. In real applications, however, because of the imbalance between classes in the available original samples, the optimum may differ from this ideal balance. To this end, we introduce two functions, an activation function and an inactivation function, describing the putative biases induced by the sampling on the minority and the majority class, respectively (Eq. (4)).
In the present work, a fixed parameter in Eq. (4) describes how the prediction is affected by the sample balance coefficient. Eq. (4) also involves the original sample balance coefficient, i.e., its value before any synthetic data are generated.
Eq. (4) implies that when the sample balance coefficient is small, the majority class is well described. On the other hand, when the sample balance coefficient is large, the minority class is accurately described, but the description of the majority class is degraded.
For most classification purposes, high precision is demanded on all classes, without bias towards any of them. For learning from imbalanced data, this would require re-sampling the minority training examples to match the number of majority examples, in order to remove the potential bias towards the majority. However, recall that in the synthesis procedure the additional training examples are generated from the original minority data and can be seen as a kind of “copy” of the original data. As a consequence, adding too many synthetic data improves only the precision on the minority class, while degrading the accuracy on the majority class.
To account for these effects, we introduce a constant in the definitions of the two functions in Eq. (4). When there are enough original minority samples, this constant goes to 0, so there is no need to synthesize training examples. When the original dataset is imbalanced, the optimal balance coefficient deviates from the ideal value.
To take into account the bias described above, we multiply the global feature score defined in Eq. (3) by the bias functions of Eq. (4) to obtain an effective feature score. Combining the requirements of high prediction performance on both preterm and term samples, we take as the optimal sample balance coefficient the value at which the effective feature score reaches its maximum.
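Numerically, locating the optimum amounts to a one-dimensional search over candidate balance coefficients. The sketch below shows only that search. The three curves are hypothetical placeholders chosen to mimic the qualitative trends described in the text (a feature score that decreases with added synthetic samples, a minority term that grows, a majority term that shrinks); they are not the functions used in this work, and taking the plain product of the three is likewise an assumption.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


# Placeholder curves, NOT the paper's Eq. (3)-(4): the shapes and the
# constants 10.0, 0.3 and 0.8 are arbitrary illustrative choices.
def feature_score(alpha):
    return 1.0 / (1.0 + alpha)               # decreases as alpha grows


def activation(alpha):
    return logistic(10.0 * (alpha - 0.3))    # minority term, grows with alpha


def inactivation(alpha):
    return logistic(-10.0 * (alpha - 0.8))   # majority term, shrinks with alpha


def effective_score(alpha):
    return feature_score(alpha) * activation(alpha) * inactivation(alpha)


def optimal_alpha(score, alphas):
    """Candidate balance coefficient with the highest score."""
    return max(alphas, key=score)


alphas = [0.15 + 0.01 * i for i in range(86)]   # grid from 0.15 to 1.00
best = optimal_alpha(effective_score, alphas)
```

With measured feature scores, `effective_score` would be replaced by a lookup of the values computed at each candidate coefficient; the search itself is unchanged.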
IV Experiment verification
The above analysis suggests a way to improve the precision of preterm diagnosis by determining the optimal sample balance coefficient without sacrificing the precision of term prediction. In this section, we provide experimental results to verify the effectiveness of the proposed method; in particular, we describe a numerical procedure for determining the optimal sample balance coefficient. The general experimental procedure is summarized in Fig. 3.
IV-A Material and Terminologies
Electrohysterogram (EHG) data obtained from the TPEHG database on PhysioNet are used. The root mean square value, the peak and median frequencies, and the sample entropy are extracted from the recorded EHG signals. Based on the recorded gestational age, the 300 samples are split into two groups using the 37-week criterion, which gives a minority class (preterm) of 38 samples and a majority class (term) of 262 samples.
Since the goal is to predict gestation status, frequently used simple but powerful classifiers, such as the Support Vector Classifier (SVC), Linear Discriminant Classifier (LDC), Logistic Regression Classifier (LRC), Decision Tree Classifier (DTC), and Gradient Boosting Classifier (GBC), are used to verify the proposed method. Following previous work, holdout validation is used: 80% of the whole dataset is designated for training the classifiers and the remaining 20% for testing, from which we calculate the sensitivity (True Positive (preterm) Rate, TPR) and the specificity (True Negative (term) Rate, TNR). Neither of these two quantities alone represents our requirement of high performance on both positive and negative prediction. For this purpose, we introduce a further combined metric and the Overall Accuracy as performance metrics.
Both metrics are computed from TP and TN, the numbers of correctly predicted positive and negative test samples, and from FP and FN, the numbers of incorrect positive and negative predictions, respectively. Classifiers that provide the most accurate prediction on both the preterm and term classes typically have the highest values of these two metrics. Besides these quantities, the Area Under the Receiver Operating Characteristic Curve (AUC) is also used to verify the proposed method.
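For reference, the quantities built from TP, TN, FP and FN can be computed as in the following sketch, which uses the standard definitions of sensitivity, specificity and overall accuracy. The helper names are illustrative, and the positive label is assumed to be 1 (preterm).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positive (preterm) rate, TPR."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative (term) rate, TNR."""
    return tn / (tn + fp)


def overall_accuracy(tp, fp, tn, fn):
    """Fraction of all test samples that are predicted correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
```

In this toy example two of the three preterm cases and four of the five term cases are recovered, giving a sensitivity of 2/3, a specificity of 0.8 and an overall accuracy of 0.75.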
IV-B Optimal synthetic preterm samples
Based on the intuitive analysis given in the previous sections, SMOTE is better than ADASYN at preserving the importance of the features in classification. We therefore first apply SMOTE to generate synthetic minority-class samples to match a given sample balance coefficient.
Fig. 4(a) shows the measured feature scores for different sample balance coefficients. From these curves alone it is hard to determine how many synthetic samples should be generated, as the features have well-separated scores that vary with the sample balance coefficient.
After synthesizing enough artificial samples, we use the features to build a random forest, from which we obtain a baseline classification accuracy. The reduction in classification accuracy caused by randomly permuting the values of a feature gives a reliable measure of that feature’s importance (weight). Calculating the importance (weight) of each feature at each sample balance coefficient allows us to examine the combined effect of synthetic samples through the global feature score. As shown in Fig. 4(b), the global feature score decreases as the number of synthetic samples increases, illustrating the drawback of synthetic sampling. This also underlines the importance of determining the optimal sample balance coefficient.
Combining the activation and inactivation functions previously introduced (Fig. 5(a)), the effective feature score shows a trend that makes it easy to determine the optimal sample balance coefficient. As shown in Fig. 5(b), the effective feature score first increases with the sample balance coefficient, manifesting the effect of synthetic samples in eliminating the bias towards the majority (term) samples. After reaching its peak, it starts to decrease, owing to the weakening of the feature scores and the degraded term prediction at large sample balance coefficients. The position of the peak gives the optimal sample balance coefficient. Fig. 5 demonstrates that, at the optimal value, the capability of the features to distinguish between classes is largely preserved while the bias towards the majority is reduced.
To verify the obtained optimal sample balance coefficient, the same features extracted from 80% of the samples in the TPEHG database are used to train an SVC classifier, and the remaining 20% are used for verification. As training and testing samples are randomly selected, the calculated performance measures vary from run to run. For this reason, we repeat the training-testing process 100 times at each sample balance coefficient. Fig. 6 shows the variation of the calculated system performance as more synthetic samples are added to the dataset, i.e., as the sample balance coefficient increases. As expected, the prediction precision on the minority increases, while that on the majority decreases. It is worth noting that the two curves intersect at the optimal sample balance coefficient determined previously. At this point the trained classifier eliminates most of the bias towards the majority (term) and increases the precision of minority (preterm) prediction without sacrificing too much precision on term prediction. This is confirmed by the accompanying variation of the combined performance metric and the AUC, see Fig. 6(b), both of which reach their apex at the optimal value.
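The repeated 80/20 protocol can be sketched as follows. The split is stratified by class here so that every test set contains preterm samples; this is an assumption of the sketch, since the text only states that samples are selected at random. `evaluate` is a hypothetical callback that trains a classifier on the training indices and returns a score on the test indices.

```python
import random


def stratified_holdout(labels, train_fraction, rng):
    """Split sample indices into train and test sets, class by class."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def repeated_holdout(labels, evaluate, n_repeats=100, train_fraction=0.8, seed=0):
    """Average evaluate(train_idx, test_idx) over n_repeats random splits."""
    rng = random.Random(seed)
    scores = [
        evaluate(*stratified_holdout(labels, train_fraction, rng))
        for _ in range(n_repeats)
    ]
    return sum(scores) / len(scores), scores


labels = [1] * 38 + [0] * 262          # class sizes of the original database
train_idx, test_idx = stratified_holdout(labels, 0.8, random.Random(0))
```

With the original class sizes each split contains 240 training and 60 test recordings; averaging over the 100 repetitions smooths out the run-to-run variation mentioned above.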
The receiver operating characteristic (ROC) curves and the associated AUC values shown in Fig. 7 indicate the trade-off between the true positive and false positive rates at different sample balance levels. With the optimal sample balance coefficient, the SVC classifier shows better performance. Compared with the case of ideal balance, training with the optimal number of synthetic samples leads to a large improvement in AUC.
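The AUC itself can be computed directly from classifier scores through its rank interpretation: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that standard identity; the function name is illustrative.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve from the scores of positive and negative samples."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))


example_auc = auc([0.9, 0.3], [0.4, 0.1])   # 3 of the 4 pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of preterm and term samples.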
The advantages of determining the optimal sample balance coefficient observed for the SVC hold for other classifiers as well. Table I compares frequently used measures of classifier performance. With the previously determined optimal sample balance coefficient, all the classifiers show a clear improvement in performance, especially the SVC-based ones.
The optimal sample balance coefficient is also effective with ADASYN. Although this method is less able to preserve the classification ability of the features, combining it with the proposed activation and inactivation functions still gives an easily identifiable optimal sample balance coefficient. As shown in Fig. 8, the term and preterm prediction accuracy of an SVC classifier trained on such a dataset is optimal at this value. However, as the effective feature score obtained with ADASYN is lower than that obtained with SMOTE, as indicated in Fig. 5, the classifier trained with ADASYN can be expected to underperform the one trained with SMOTE.
V Conclusion & Discussion
Machine-learning-based automatic disease diagnosis provides a promising direction for modern healthcare. In these applications, the availability of healthcare data and the effectiveness of the features extracted from the samples play a crucial role. However, healthcare data are typically imbalanced, with most samples being healthy (majority) and only a few being diseased (minority). Classifiers trained with an imbalanced dataset are typically biased towards the majority, making the automatic diagnosis system less useful. To solve this problem, data synthesis algorithms are used to generate artificial samples from the minority. However, this is typically accompanied by a reduced ability of the features to separate the classes. In this paper, we propose a method for determining the optimal number of artificial samples that should be synthesized. To this end, we measure the contributions of the features and their weights in class separation when different amounts of synthetic samples are introduced. Combining these with the activation and inactivation functions introduced to describe the effect of sample abundance on classification precision, we obtain the optimal sample balance coefficient, which balances the benefit of synthetic samples in eliminating bias against the side effect of weakening feature importance. We apply the proposed method to preterm prediction using features extracted from the publicly available TPEHG database. After applying the synthetic algorithms, system performance is compared under different scenarios, and the results highlight the importance of the optimal sample balance coefficient proposed in this work.
One might argue that it is more serious for an automatic diagnosis system to misidentify a real preterm patient than to misidentify a term patient, considering the serious complications that preterm babies may suffer. On this view, adding more synthetic samples would be of greater interest, as suggested by Figs. 6 and 8. However, special care is needed before drawing this conclusion. Since there is no practical test for any EHG-based preterm diagnosis system, its performance is typically verified using data samples randomly chosen from the total sample set. The data used to check the performance on preterm prediction are synthesized from the same minority class as those used for training. As more synthetic samples are included, the validation samples become closer and closer to the training samples, which leads to unrealistically high values of preterm prediction accuracy compared with what can be expected in real applications. In the proposed method, the activation and inactivation functions account for the original size of the minority class and thereby suppress this side effect. We therefore expect the validation results to be close to those of real applications.
-  (2017) Automated detection of premature delivery using empirical mode and wavelet packet decomposition techniques with uterine electromyogram signals. Computers in Biology and Medicine 85, pp. 33–42. Cited by: §I.
-  (2015) Big data for health. IEEE Journal of Biomedical and Health Informatics 19 (4), pp. 1193–1208. Cited by: §I.
-  (2010) Class prediction for high-dimensional class-imbalanced data. BMC Bioinformatics 11 (1), pp. 523–523. Cited by: §I.
-  (2018) Identification of preterm birth based on rqa analysis of electrohysterograms. Computer Methods and Programs in Biomedicine 153, pp. 227–236. Cited by: §I.
-  (2012) DBSMOTE: density-based synthetic minority over-sampling technique. Applied Intelligence 36 (3), pp. 664–684. Cited by: §I.
Mapping complex traits using random forests. BMC Genetics 4 (1), pp. 1–5. Cited by: §II-A, §IV-B.
SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16 (1), pp. 321–357. Cited by: §I.
-  (2006) EVALUATION of classifiers for an uneven class distribution problem. Applied Artificial Intelligence 20 (5), pp. 381–417. Cited by: §I.
-  (2006) Gene selection and classification of microarray data using random forest. BMC Bioinformatics 7 (1), pp. 3–3. Cited by: §II-A.
-  (2006) An introduction to roc analysis. Pattern Recognition Letters 27 (8), pp. 861–874. Cited by: §IV-A.
-  (2008) A comparison of various linear and non-linear signal processing techniques to separate uterine emg records of term and pre-term delivery groups. Medical Biological Engineering Computing 46 (9), pp. 911–922. Cited by: §I.
-  (2013) Prediction of preterm deliveries from ehg signals using machine learning.. PLOS ONE 8 (10). Cited by: §II-B, §IV-A.
Advanced artificial neural network classification for detecting preterm births using ehg records. Neurocomputing 188, pp. 42–49. Cited by: §I, §I.
-  (2013-04) Better pregnancy monitoring using nonlinear correlation analysis of external uterine electromyography. IEEE Transactions on Biomedical Engineering 60 (4), pp. 1160–1166. External Links: Cited by: §I.
-  (2011) Combination of canonical correlation analysis and empirical mode decomposition applied to denoising the labor electrohysterogram. IEEE Transactions on Biomedical Engineering 58 (9), pp. 2441–2447. Cited by: §I.
-  (2004) The problem of overfitting. Journal of Chemical Information and Computer Sciences 44 (1), pp. 1–12. Cited by: §I.
-  (2008) ADASYN: adaptive synthetic sampling approach for imbalanced learning. In International Symposium on Neural Networks, pp. 1322–1328. Cited by: §I, §II-B.
-  (2009) Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering 21 (9), pp. 1263–1284. Cited by: §I.
-  (2013) Born too soon: preterm birth matters. Reproductive Health 10 (1), pp. 1–9. Cited by: §I.
Classification of trojan nets based on scoap values using supervised learning. In 2019 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–5. External Links: Cited by: §I.
-  (2008) Patterns of electrical propagation in the intact pregnant guinea pig uterus. American Journal of Physiology-regulatory Integrative and Comparative Physiology 294 (3). Cited by: §I.
-  (1999) Use of the electrohysterogram signal for characterization of contractions during pregnancy. IEEE Transactions on Biomedical Engineering 46 (10), pp. 1222–1229. Cited by: §I.
-  (2011) Use of uterine electromyography to diagnose term and preterm labor.. Acta Obstetricia et Gynecologica Scandinavica 90 (2), pp. 150–157. Cited by: §I, §II-B.
-  (2003) Predicting term and preterm delivery with transabdominal uterine electromyography. Obstetrics & Gynecology 101 (6), pp. 1254 – 1260. External Links: Cited by: §II-B.
-  (2018) Using sub-sampling and ensemble clustering techniques to improve performance of imbalanced classification. Neurocomputing 276, pp. 55–66. Cited by: §I.
-  (2010) Modeling and identification of the electrohysterographic volume conductor by high-density electrodes. IEEE Transactions on Biomedical Engineering 57 (3), pp. 519–527. Cited by: §I.
-  (2015) Improved prediction of preterm delivery using empirical mode decomposition analysis of uterine electromyography signals. PLOS ONE 10 (7). Cited by: §I.
-  (2018) Detection of preterm labor by partitioning and clustering the ehg signal. Biomedical Signal Processing and Control 45, pp. 109–116. Cited by: §I.
-  (2017) Feature selection based on fda and f-score for multi-class classification. Expert Systems with Applications 81, pp. 22 – 27. External Links: Cited by: §II-A.
-  (2019) A critical look at studies applying over-sampling on the tpehgdb dataset. In ARTIFICIAL INTELLIGENCE IN MEDICINE, AIME 2019, D. Riaño, S. Wilk, and A. ten Teije (Eds.), Vol. 11526, pp. 355–364. External Links: Cited by: §V.
-  (2019) A parameter-free cleaning method for smote in imbalanced classification. IEEE Access 7, pp. 23537–23548. External Links: Cited by: §I, §I.
-  C. P. Howson, M. V. Kinney, L. Mcdougall, and J. E. Lawn, “Born too soon: Preterm birth matters,” Reproductive Health, vol. 10, no. 1, pp. 1–9, 2013.
-  M. Lucovnik, R. J. Kuon, L. R. Chambliss, W. L. Maner, S. Shi, L. Shi, J. Balducci, and R. E. Garfield, “Use of uterine electromyography to diagnose term and preterm labor.” Acta Obstetricia et Gynecologica Scandinavica, vol. 90, no. 2, pp. 150–157, 2011.
-  W. J. E. P. Lammers, H. Mirghani, B. Stephen, S. Dhanasekaran, A. Wahab, M. A. H. A. Sultan, and F. Abazer, “Patterns of electrical propagation in the intact pregnant guinea pig uterus,” American Journal of Physiology-regulatory Integrative and Comparative Physiology, vol. 294, no. 3, 2008.
-  H. Leman, C. Marque, and J. Gondry, “Use of the electrohysterogram signal for characterization of contractions during pregnancy,” IEEE Transactions on Biomedical Engineering, vol. 46, no. 10, pp. 1222–1229, 1999.
-  M. Hassan, S. Boudaoud, J. Terrien, B. Karlsson, and C. Marque, “Combination of canonical correlation analysis and empirical mode decomposition applied to denoising the labor electrohysterogram,” IEEE Transactions on Biomedical Engineering, vol. 58, no. 9, pp. 2441–2447, 2011.
-  P. Ren, S. Yao, J. Li, P. A. Valdes-Sosa, and K. M. Kendrick, “Improved prediction of preterm delivery using empirical mode decomposition analysis of uterine electromyography signals,” PLOS ONE, vol. 10, no. 7, 2015.
-  P. Fergus, I. O. Idowu, A. J. Hussain, and C. Dobbins, “Advanced artificial neural network classification for detecting preterm births using EHG records,” Neurocomputing, vol. 188, pp. 42–49, 2016.
-  U. R. Acharya, K. V. Sudarshan, S. Q. Rong, Z. Tan, C. M. Lim, J. E. W. Koh, S. Nayak, and S. V. Bhandary, “Automated detection of premature delivery using empirical mode and wavelet packet decomposition techniques with uterine electromyogram signals,” Computers in Biology and Medicine, vol. 85, pp. 33–42, 2017.
-  G. Fele-Žorž, G. Kavšek, Ž. Novak-Antolič, and F. Jager, “A comparison of various linear and non-linear signal processing techniques to separate uterine EMG records of term and pre-term delivery groups,” Medical & Biological Engineering & Computing, vol. 46, no. 9, pp. 911–922, 2008.
-  S. Daskalaki, I. Kopanas, and N. M. Avouris, “Evaluation of classifiers for an uneven class distribution problem,” Applied Artificial Intelligence, vol. 20, no. 5, pp. 381–417, 2006.
-  H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263–1284, 2009.
-  Y. Yan, R. Liu, Z. Ding, X. Du, J. Chen, and Y. Zhang, “A parameter-free cleaning method for SMOTE in imbalanced classification,” IEEE Access, vol. 7, pp. 23537–23548, 2019.
-  N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, no. 1, pp. 321–357, 2002.
-  H. He, Y. Bai, E. A. Garcia, and S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” in International Symposium on Neural Networks, 2008, pp. 1322–1328.
-  C. H. Kok, C. Y. Ooi, M. Moghbel, N. Ismail, H. S. Choo, and M. Inoue, “Classification of Trojan nets based on SCOAP values using supervised learning,” in 2019 IEEE International Symposium on Circuits and Systems (ISCAS), May 2019, pp. 1–5.
-  J. Andreu-Perez, C. C. Y. Poon, R. Merrifield, S. T. C. Wong, and G. Yang, “Big data for health,” IEEE Journal of Biomedical and Health Informatics, vol. 19, no. 4, pp. 1193–1208, 2015.
-  C. C. Rabotti, M. M. Mischi, L. Beulen, S. G. Oei, and J. J. Bergmans, “Modeling and identification of the electrohysterographic volume conductor by high-density electrodes,” IEEE Transactions on Biomedical Engineering, vol. 57, no. 3, pp. 519–527, 2010.
-  M. Hassan, J. Terrien, C. Muszynski, A. Alexandersson, C. Marque, and B. Karlsson, “Better pregnancy monitoring using nonlinear correlation analysis of external uterine electromyography,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 4, pp. 1160–1166, April 2013.
-  M. Borowska, E. Brzozowska, P. Kuć, E. Oczeretko, R. Mosdorf, and P. Laudański, “Identification of preterm birth based on RQA analysis of electrohysterograms,” Computer Methods and Programs in Biomedicine, vol. 153, pp. 227–236, 2018.
-  M. Shahrdad and M. C. Amirani, “Detection of preterm labor by partitioning and clustering the EHG signal,” Biomedical Signal Processing and Control, vol. 45, pp. 109–116, 2018.
-  R. Blagus and L. Lusa, “Class prediction for high-dimensional class-imbalanced data,” BMC Bioinformatics, vol. 11, no. 1, p. 523, 2010.
-  D. M. Hawkins, “The problem of overfitting,” Journal of Chemical Information and Computer Sciences, vol. 44, no. 1, pp. 1–12, 2004.
-  C. Bunkhumpornpat, K. Sinapiromsaran, and C. Lursinsap, “DBSMOTE: Density-based synthetic minority over-sampling technique,” Applied Intelligence, vol. 36, no. 3, pp. 664–684, 2012.
-  S. Nejatian, H. Parvin, and E. Faraji, “Using sub-sampling and ensemble clustering techniques to improve performance of imbalanced classification,” Neurocomputing, vol. 276, pp. 55–66, 2018.
-  Q. Song, H. Jiang, and J. Liu, “Feature selection based on FDA and F-score for multi-class classification,” Expert Systems with Applications, vol. 81, pp. 22–27, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0957417417301203
-  R. Díaz-Uriarte and S. Alvarez de Andrés, “Gene selection and classification of microarray data using random forest,” BMC Bioinformatics, vol. 7, no. 1, p. 3, 2006.
-  A. Bureau, J. Dupuis, B. Hayward, K. Falls, and P. Van Eerdewegh, “Mapping complex traits using random forests,” BMC Genetics, vol. 4, no. 1, pp. 1–5, 2003.
-  W. L. Maner, R. E. Garfield, H. Maul, G. Olson, and G. Saade, “Predicting term and preterm delivery with transabdominal uterine electromyography,” Obstetrics & Gynecology, vol. 101, no. 6, pp. 1254–1260, 2003. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0029784403003417
-  P. Fergus, P. Cheung, A. J. Hussain, D. Al-Jumeily, C. Dobbins, and S. Iram, “Prediction of preterm deliveries from EHG signals using machine learning,” PLOS ONE, vol. 8, no. 10, 2013.
-  T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006.
-  G. Vandewiele, I. Dehaene, O. Janssens, F. Ongenae, F. De Backere, F. De Turck, K. Roelens, S. Van Hoecke, and T. Demeester, “A critical look at studies applying over-sampling on the TPEHGDB dataset,” in Artificial Intelligence in Medicine, AIME 2019, D. Riaño, S. Wilk, and A. ten Teije, Eds., vol. 11526. Springer, 2019, pp. 355–364.