1. Introduction
Distance metrics and their nonlinear variants play a fundamental role in machine learning tasks. They measure the degree of linear or nonlinear similarity between objects, grouping similar objects together (e.g., k-means clustering) or assigning class labels based on nearest neighbors (e.g., nearest neighbor classification) [11]. There are multiple choices of distance function for specific problems. Generally, Euclidean distance is used for the majority of classification problems, whereas cosine distance is more suitable for document classification [23, 22]. Distance metrics can also be part of kernel function design [6]. In [30], the authors use Euclidean distance as part of a graph kernel to capture time-difference similarity between patients. In [12], the authors demonstrate the positive definiteness of a distance substitution kernel. Usually, the kernel function for two data objects x and y and a user-defined parameter t is defined as:

K(x, y) = exp(−d(x, y)² / t),

where d(x, y) is the Euclidean distance between x and y, and t is a constant depending on time.
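As a concrete sketch, this distance substitution kernel can be written in a few lines of Python (the function names are ours, and the exact parameterization in [12, 30] may differ):

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def distance_substitution_kernel(x, y, t=1.0):
    """Gaussian-type distance substitution kernel K(x, y) = exp(-d(x, y)^2 / t).

    `t` is the user-defined, time-like parameter from the text.
    """
    d = euclidean(x, y)
    return math.exp(-d * d / t)
```

Identical objects yield K = 1, and the kernel decays toward 0 as the Euclidean distance grows, at a rate controlled by t.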
Distance metric learning, on the other hand, tries to find the optimal distance metric or embedding space given a task-specific objective for a set of data objects [17]. Under this scope, semantically similar objects are encouraged to be closer to each other, while dissimilar objects are pushed further apart. The prevalence of deep learning has greatly propelled the development of deep metric learning, where nonlinear relationships can be captured, particularly in unsupervised representation learning for images (e.g., computer vision
[13, 21]). It has also motivated deep kernel learning [15, 28]. In [31], a deep metric learning-based graph kernel is proposed to solve the outcome prediction problem in complex chronic disease treatment planning, where cosine distance was shown to be superior to the Euclidean distance measure. In practice, both distance metrics and kernel functions are used to solve complex real-world problems, especially in medicine. A problem of interest is drug prescription efficacy prediction [1, 16]. Accurate predictive models for drug prescription improve healthcare: they can reduce medication errors and identify possible drug prescription pathways for clinical personnel to pursue. In [29], a framework is proposed to predict success and failure outcomes of a given drug prescription for antibiotic-treatment-based diseases. The approach is further extended to overcome biased data distributions [30] and to chronic disease treatment plans [31]. Moreover, Euclidean and cosine distance measures exhibit different performance behaviors: in [31], cosine distance is found superior to the Euclidean measure under highly imbalanced chronic disease data, whereas Euclidean distance is used for short-term diseases in [30]. To further investigate such differences, we establish a unified framework, which integrates the two model structures proposed in the aforementioned works, to conduct a rigorous empirical evaluation on all diseases investigated in previous studies [30, 31], in addition to a theoretical discussion. The aforementioned prescription efficacy prediction approaches are now under commercial licensing.
Our contributions are as follows:

We propose a scalable unified framework for prescription efficacy prediction.

We evaluate performance using 10-fold cross-validation on a large-scale, real-world electronic health record dataset that includes common and rare, short-term and chronic illnesses.

We investigate the difference between Euclidean and cosine distance in the learned embedding space.

We provide a theoretical explanation from a geometric perspective, generalizing Euclidean and cosine distance.
2. Preliminaries
Our previous efforts [29, 30, 31] present a graph kernel-based system for outcome prediction of drug prescriptions, specifically the success or failure of treatment, for short-term and chronic diseases. In [30], Multiple Graph Kernel Fusion (MGKF) is proposed to overcome noise effects in short-term diseases. A deep graph kernel learning approach, the Cross-Global Attention Graph Kernel Network (CrossGlobal), is proposed in [31] to handle long-term chronic diseases. In short, we initially determine success and failure patients for the target disease treatment as training data within a user-defined time quantum, from which a set of medical events is extracted. Then, we construct a patient graph, a graphical representation of the patient's EHRs, from the extracted medical events. Finally, we perform binary graph classification as prediction through a graph kernel and a kernel-based classifier. We detail each part of the prediction framework in this section.
2.1. Outcome Selection
We define a drug prescription or treatment plan for a disease diagnosis as a failure if certain events occur within a predefined time period, and as a success otherwise. For short-duration diseases, the observed failure event is a similar or identical disease diagnosis, while for chronic diseases, the observed target is a severe complication defined via medication guidelines. We call a disease short-term if it involves a single medication with immediate outcome observation and recent medical history (e.g., 2 months prior to the diagnosis). For a chronic disease, we consider a multiple-medication treatment plan with long-term outcome observation and medical history (e.g., 10 years prior to the diagnosis). We refer readers to MGKF [30] and CrossGlobal [31] for greater detail.
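A minimal sketch of this labeling rule, with illustrative event codes and windows (the actual target events come from the diagnoses and medication guidelines described above, not from this code):

```python
from datetime import date, timedelta

def label_outcome(prescription_date, events, target_codes, window_days):
    """Label a treatment 'failure' if any target event (e.g., a recurrence of
    the same diagnosis, or a defined complication) occurs within the
    observation window after the prescription; otherwise 'success'.

    `events` is a list of (event_date, code) pairs from the patient's EHR.
    All names here are illustrative, not the authors' actual implementation.
    """
    window_end = prescription_date + timedelta(days=window_days)
    for event_date, code in events:
        if prescription_date < event_date <= window_end and code in target_codes:
            return "failure"
    return "success"
```

For a short-term disease the window would be small (days to weeks); for a chronic disease it would span years and the target codes would describe complications rather than recurrences.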
2.2. Patient Graph
A subset of a patient's EHRs is formulated as a directed acyclic graph where a node represents each medical event and an edge, with the time difference (e.g., in days) used as a weight, connects two consecutive medical events. Patient demographic information, such as gender and age, is included by connecting it to the first medical event with age as the edge weight (to simplify model assumptions, we only use gender and age as demographic information). We define a patient graph here as in [30] and [31]:
Definition 2.1 (Patient Graph).
Given n medical events, the set E = {(e_1, t_1), …, (e_n, t_n)} represents a patient's EHR, with e_i denoting a medical event such as a diagnosis and t_i denoting the time of e_i. The patient graph of these events is a weighted directed acyclic graph whose vertices contain all events and whose edges contain all pairs of consecutive events (e_i, e_{i+1}). The edge weight from node i to node j is defined as t_j − t_i, the time interval between e_i and e_j.
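The construction of Definition 2.1 can be sketched in plain Python (the event codes and the demographic-node encoding here are hypothetical; [30, 31] give the authoritative construction):

```python
def build_patient_graph(events, gender, age):
    """Construct a patient graph from time-ordered medical events.

    `events` is a list of (code, day) pairs; consecutive events are linked
    with the time difference (in days) as the edge weight. A demographic
    node is linked to the first event with age as the weight, per the text.
    """
    events = sorted(events, key=lambda e: e[1])  # order by time
    nodes = [f"demo:{gender}"] + [code for code, _ in events]
    edges = []
    if events:
        edges.append((f"demo:{gender}", events[0][0], age))
    for (c1, t1), (c2, t2) in zip(events, events[1:]):
        edges.append((c1, c2, t2 - t1))  # weight = time interval in days
    return nodes, edges
```

Because events are sorted by time and only consecutive pairs are linked, the resulting graph is acyclic by construction.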
2.3. Graph Kernel
For kernel-based binary graph classification, building a pairwise kernel matrix between patient graphs is the first step. A graph kernel computes the similarity between pairs of graphs. It is a positive definite or semidefinite kernel defined on graphs, which implicitly performs an inner product by mapping data points from an input space to a Hilbert space. It can also be treated as a similarity measurement between two data objects (e.g., graphs). We point readers to [18] for a more in-depth graph kernel discussion and to [6] for a better understanding of graph kernel design principles and their associated feature maps. In [29, 30], several graph kernels are proposed to solve the drug prescription outcome prediction problem as patient graph classification. Please refer to [29, 30] for more in-depth descriptions of the kernel definitions.
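The first step above is a symmetric Gram-matrix computation, which can be sketched as follows (the toy kernel in the test is ours for illustration, not one of the kernels from [29, 30]):

```python
def gram_matrix(graphs, kernel):
    """Pairwise kernel (Gram) matrix K[i][j] = kernel(G_i, G_j).

    `kernel` is any symmetric graph kernel; symmetry lets us halve
    the number of kernel evaluations.
    """
    n = len(graphs)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            K[i][j] = K[j][i] = kernel(graphs[i], graphs[j])
    return K
```

For a valid (positive semidefinite) kernel, the resulting matrix can be handed directly to a kernel-based classifier such as a kernelized SVM.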
2.4. Prediction Framework
We then formulate a binary graph classification problem on the resulting patient graphs by using a kernelized Support Vector Machine (KSVM) [29], Multiple Graph Kernel Fusion (MGKF) [30], and the Cross-Global Attention Graph Kernel Network (CrossGlobal) [31]. As mentioned in Section 1, cosine distance is superior to its Euclidean counterpart under highly imbalanced chronic disease data (CrossGlobal), while both the Euclidean and cosine distance measures achieve high prediction performance on short-term diseases (MGKF). We now ask how they relate to each other. It is not suitable to directly compare CrossGlobal and MGKF, for the following reasons:

Different datasets (from different database providers) are used in MGKF and CrossGlobal.

Different model structures and optimization perspectives: in MGKF, optimization aims at generating an optimal kernel fusion, while CrossGlobal seeks an optimal graph embedding.

Different balance and imbalance ratios between short-term and chronic disease data.
To fairly compare MGKF and CrossGlobal with Euclidean and cosine distance on short-term and chronic diseases, a unified framework is required. Here, we extend and generalize previous efforts to differentiate the behavior of Euclidean and cosine distance, in addition to a theoretical discussion. A unified framework for graph kernel-based drug prescription outcome prediction is presented to conduct a rigorous empirical evaluation on all diseases from previous works, on very large-scale real-world EHRs.
3. Discussion from the Geometric Point of View
3.1. Riemannian and Sub-Riemannian Geometries
To discuss differences between Euclidean and cosine distance, we first establish some mathematical properties of these distances from a geometric point of view. In fact, we may consider this problem in a more general setting.
Let X_1, …, X_k be linearly independent vector fields on an n-dimensional real manifold M, spanning a subbundle of the tangent bundle TM. To find a good kernel function describing the diffusion (energy flow) between two points in M, we need to solve the heat equation associated with the sum of squares of the X_j's:

∂u/∂t = Δ_X u,  where Δ_X = X_1² + ⋯ + X_k².

When k = n, the operator Δ_X is elliptic. In this case (under a suitable nondegeneracy assumption on the vector fields), we have a natural volume element which yields the adjoint vector fields X_j* for j = 1, …, n. More precisely, the resulting second-order operator is the classical Laplace-Beltrami operator, whose second-order part agrees with the operator Δ_X. This suggests that we may use the given differential operator to introduce a geometry on M which may help us to solve Δ_X and hence the heat equation. Hence, for small t > 0, the solving kernel for the heat operator takes the form

P(x, y, t) ≈ (4πt)^{−n/2} e^{−d(x, y)²/(4t)} (a_0(x, y) + a_1(x, y) t + ⋯).

The a_j's are functions of x and y. Here d(x, y) represents the induced Riemannian distance between the points x and y in M, and the omitted terms stand for a negligible error. Furthermore, the action f = d(x, y)²/(4t) satisfies

∂f/∂t + H(∇f) = 0,

i.e., f is a solution of the Hamilton-Jacobi equation. Here H is the Hamiltonian function associated with Δ_X.
The simplest example is the Euclidean distance, for which the kernel is the Gaussian (see [3]). In this paper, we use another nontrivial example of a Riemannian metric. Given a large sample space, we first embed the samples in an n-dimensional sphere of radius r. Given two points x and y on the sphere, we define the "distance" between them as the angle θ with cos θ = ⟨x, y⟩/r², where r is the Euclidean distance between each point and the origin. This is the so-called cosine metric. In other words, x and y lie on the same sphere; hence x and y are located on a "great circle" determined by the center of the sphere and these two points. Instead of measuring the arclength (which may be huge), we consider the angle between x and y. This metric provides better estimates for the kernel in applications of the drug prescription prediction system for long-term diseases.
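The cosine metric just described can be computed directly (a sketch; the clamp guards acos against floating-point round-off, and both points are assumed to lie on the same origin-centered sphere):

```python
import math

def angular_distance(x, y):
    """Angle between two points on a common origin-centered sphere.

    cos(theta) = <x, y> / r^2, where r is the shared Euclidean norm.
    This is the "cosine metric" of the text: it measures the angle along
    the great circle through x and y rather than Euclidean displacement.
    """
    r2 = sum(a * a for a in x)  # assumes |x| == |y| == r
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot / r2)))
```

The angle is bounded by π regardless of the sphere's radius, which is why it stays well behaved even when the arclength between embedded samples is huge.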
When k < n, the operator Δ_X is non-elliptic. In this case, the subspace H_x = span{X_1(x), …, X_k(x)} is called the "horizontal subspace" of T_xM, and the vectors in H_x are called horizontal vectors at x. Sometimes, we call the distribution H the horizontal distribution. The sections of the horizontal bundle are called horizontal vector fields; they are smooth assignments x ↦ X(x) ∈ H_x. The set of horizontal vector fields on M will be denoted by Γ(H). If U is an open subset of M, the set of horizontal vector fields on U will be denoted by Γ(H, U). We call the complement of H_x the "missing direction" at x.
Now we encounter new problems: since k < n, we cannot define arclength in general. We overcome this difficulty by assuming the bracket-generating property, namely that "the horizontal vector fields and their brackets span TM". We can then apply Chow's theorem [10] to conclude that given any two points x, y ∈ M, there is a piecewise horizontal curve γ joining them, with γ(0) = x and γ(1) = y. This yields a distance, and therefore a geometry, which we shall call sub-Riemannian geometry. Sub-Riemannian geometry was first discussed in the field of thermodynamics in the 1800s. Carnot discovered the principle of an engine in 1824 involving two isotherms and two adiabatic processes, Joule studied adiabatic processes, and Clausius formulated the existence of entropy in the second law of thermodynamics in 1854. In 1909 Carathéodory made the point regarding the relationship between the connectivity of two states by adiabatic processes and the non-integrability of a distribution defined by the one-form of work. Chow proved the general global connectivity result in 1934, which was used in the study of partial differential equations. There are significant differences between Riemannian and sub-Riemannian geometries. Nevertheless, this geometry can be applied in many situations in our daily life. For more details, readers may consult the book by Calin and Chang
[2]. A sub-Riemannian structure over a manifold M is a pair (H, ⟨·, ·⟩), where H is a bracket-generating distribution and ⟨·, ·⟩ a fibre inner product defined on H. The length of a horizontal curve γ: [0, 1] → M is

ℓ(γ) = ∫₀¹ ⟨γ̇(s), γ̇(s)⟩^{1/2} ds.

The shortest length is called the Carnot-Carathéodory distance between x and y, given by

d_{CC}(x, y) = inf ℓ(γ),

where the infimum is taken over all absolutely continuous horizontal curves joining x and y [5].
3.2. Horizontal Connectivity
In the outcome prediction task for drug prescription, one of the main difficulties is to distinguish features under short-term and long-term disease progression. Moreover, for long-term diseases, we need to avoid low-efficacy (or even useless or dangerous) drugs. Mathematically, the first task is to address the following question: given any two points on a topologically connected sub-Riemannian manifold, under what conditions can we join them by a horizontal curve? The answer to this question not only helps us better characterize the embedding spaces inferred by Euclidean and cosine distances, but also leads to different optimization formulations under their respective embedding properties.
To answer this question, we need to prove the following two results. Readers can find more detailed discussions in the book by Calin and Chang [2].
Proposition 3.1. Let U be an open set and H be a differentiable distribution of rank k on U. Then for any point p ∈ U there is a manifold S such that

(1) p ∈ S;

(2) dim S = k;

(3) any two points of S can be joined by a piecewise horizontal curve.
Proof.
Let be the vector fields in local coordinates. Consider the ODE system
(3.1) 
where with , is a system with parameters.
The solutions of (3.1) are horizontal curves with controls . Let be the initial conditions of system (3.1). Standard theorems of ODE system provide the existence and local uniqueness of the solutions, which can be expressed by
for , with . Since the vector field components are differentiable, a general theorem states that the functions are twice differentiable with respect to the time variable and locally continuously differentiable with respect to the initial conditions.
Since system (3.1) is autonomous, a simple application of the chain rule shows that the functions verify the relations, where and .
Applying the theorem on differentiability with respect to a parameter to system (3.1) yields that the solutions are continuously differentiable with respect to the parameters if these are sufficiently small.
If we let , for , then the formulas
for define a k-dimensional manifold passing through the point. To finish the proof we need to show that the rank of the Jacobian is maximal, i.e., equal to k. This is equivalent to the fact that the vector fields
are linearly independent. Since , it suffices to show that
are linearly independent. Since
it follows that
which are linearly independent vector fields for . It follows that
The proof of this proposition is therefore complete. ∎
Proposition 3.2. Let H be a non-integrable distribution. Assume that through each point of the domain passes a connected manifold defined by the equations

(3.2)

where the defining functions are continuously differentiable on a domain, such that the stated rank condition holds. Then there is a subdomain such that

(1) through every point of it passes a connected manifold of one dimension higher;

(2) the functions that define these manifolds have the same properties as the functions in (3.2).
Proof.
Let be the horizontal distribution and
be the extrinsic ideal associated with H. Since the distribution H is not integrable, the Pfaff system is not integrable; i.e., it cannot have integral manifolds of the corresponding dimension.
Proof of statement (1). For any point, there is a horizontal vector not tangent to the manifold passing through it.

The proof is by contradiction. Fix a point and assume that every horizontal vector field near it is tangent to the manifold. Then the horizontal subspace is contained in the tangent space of the manifold, and since their dimensions agree, the inclusion is in fact an equality. Since the one-forms vanish on the tangent spaces, the manifold is an integral manifold of the Pfaff system, which is a contradiction, because the distribution is not integrable. This proves assertion (1).
Let be a point with coordinates and be the vector given by ; i.e., and . Let be such that
The numbers will be kept constant for the rest of the proof.
Proof of statement . The matrix
has rank at the point .
The first rows of the matrix are the components of the coordinate vector fields on the manifold, which are tangent to it, linearly independent, and span its tangent space. The last row of the matrix contains the components of the chosen vector, which is transversal to the manifold, so all the vectors are linearly independent at the point.
Since all the elements of the matrix are continuous functions of the coordinates of the point , while are still kept constant, there is a subdomain such that and on .
From the nonvanishing Jacobian condition on it follows that the following vector fields
(3.3) 
are linearly independent on .
From the preceding discussion, the following vector fields
(3.4) 
are linearly independent on . We can complete system (3.4) with elements of set (3.3), say
(3.5) 
are linearly independent on .
In the following we shall deal with the construction of a higher-dimensional manifold passing through the point, which depends on parameters. In equation (3.2) consider the parameters frozen. Let coordinates on this new manifold be given. Then
(3.6) 
where is continuous differentiable with respect to and and
The equation of the integral curves of the vector field on are given by
(3.7) 
We shall construct a higher-dimensional manifold by pushing the manifold in the direction of the integral curves of the vector field. This can be done by substituting the variables given by (3.6) into the expressions provided by (3.7). We obtain expressions in which the frozen parameters are kept constant and which are continuously differentiable functions of the remaining variables.
To show that the equations
(3.8) 
define a manifold of the stated dimension, we need to show that
(3.9) 
on some neighborhood of included in .
Applying the chain rule yields
Using that the vector fields (3.4) are linearly independent on a neighborhood of the point yields that
are linearly independent, which means that (3.9) holds.
Using that (3.5) are linearly independent on , it follows that the vector fields
are linearly independent on a subdomain , which contains . Therefore
and hence the functions have the same properties as the functions in (3.2).
In conclusion, through each point of passes a connected manifold defined by equations (3.8), and each manifold depends on parameters. We finish the proof of this proposition.
∎
Now we are in a position to prove the local connectivity property. This result was proved in 1957 by Teleman [26] for Pfaff systems that do not contain integrable combinations. Here we shall prove it from the point of view of distributions.
Theorem 3.3. Let H be a non-integrable differentiable distribution on an open set. Then any domain contains a subdomain such that for any two points in the subdomain, there is a piecewise horizontal curve that joins them.
Proof.
From Proposition 3.1, for any point, there is a connected manifold passing through it. Applying Proposition 3.2 finitely many times yields a subdomain such that for each of its points, there is an n-dimensional connected manifold passing through it.
Let two arbitrary points be given, and let a path joining them be contained in the subdomain (the path is not necessarily a horizontal curve). Since the manifolds cover the compact path, there is a finite subcovering; i.e., we can choose points on the path
such that
We can choose the points such that any two consecutive points belong to the same manifold. Since the manifolds are connected, each pair of consecutive points can be joined by a horizontal curve. In this way, the two endpoints can be joined by a piecewise horizontal curve. ∎
3.3. Subelliptic Heat Kernel
Now we need to use Hamiltonian or Lagrangian formalisms to construct the fundamental solution of the subelliptic heat operator. In other words, we are interested in finding the solving kernels for the operators under consideration. Inspired by the Gaussian, it is reasonable to expect the kernel to have the form:

The modified complex action function plays the role of the Riemannian action and satisfies the Hamilton-Jacobi equation

In general, when we deal with a subelliptic heat operator, the heat kernel will depend on parameters (or Lagrange multipliers). Furthermore, after calculation, one may see that the action function can be written as

We look for a heat kernel in the form above. The heat kernel should not depend on these parameters, so we use an age-old technique to remove them by summing over them. Since the parameters are continuous, we shall look for a heat kernel in the following form:

Here the so-called volume element is an appropriate measure that makes the integral convergent. Now we may apply properties of the heat kernel and reduce the problem to solving the transport equation. Once we obtain the solution, the leading index can be determined, which depends on the Hausdorff dimension of the sub-Riemannian manifold. When k = n, the manifold is an n-dimensional Riemannian manifold and the Hausdorff dimension equals the topological dimension of the manifold. In this case, the volume element is just the zero section, which recovers the results in the elliptic case. For more details, we refer readers to the books [3], [4] and a forthcoming research article.
4. Unified Framework
To compare how distance metrics affect prediction performance, we present a unified framework for a graph kernel-based drug prescription prediction system in support of a rigorous empirical evaluation. We consider all disease and distance metric configurations for both balanced and imbalanced data ratios. Motivated by MGKF and CrossGlobal, a hybrid model is formulated to leverage the advantages of both models. Following the same MGKF three-kernel architecture, namely a Weisfeiler-Lehman subtree kernel [24], a temporal topological kernel [30], and a vertex histogram kernel [25], we generate a fused kernel embedding with the distance metric loss from CrossGlobal as regularization.
Specifically, three pairwise kernel matrices are constructed via the aforementioned graph kernels. A single representation, the fused kernel embedding, is generated through a deep neural network for subsequent classification. The distance regularization, achieved via contrastive loss [14], is integrated to harness the power of deep metric learning: we force the kernel embedding to preserve the chosen distance property. Semantically similar embeddings are encouraged to be closer to each other, and dissimilar ones to lie further apart in the kernel space. With this setting, the kernel embedding is optimized jointly with the classification loss and the contrastive loss, deriving a single representation with multiple views and the selected distance property. We discuss how embedding and prediction performance differ under different distance metrics in the next section. Given a set of patients with patient graphs G_1, …, G_N and associated class labels y_1, …, y_N with y_i ∈ {0, 1}, we compute their pairwise kernel Gram matrices via the Weisfeiler-Lehman subtree, temporal topological, and vertex histogram kernels, respectively. Letting f be a deep neural network parameterized by weights W and g a single-layer sigmoid classifier parameterized by θ, we define the unified framework as the following optimization problem: (4.1)
(4.2) 
(4.3) 
where m is a constant margin threshold, the pair label equals 1 if the two patients share a class label and 0 otherwise, and D is a pairwise distance metric between kernel embeddings calculated by (4.4) or (4.5).
Let u and v be two kernel embeddings. We also have:

(4.4) D_cos(u, v) = 1 − ⟨u, v⟩ / (‖u‖ ‖v‖)

(4.5) D_euc(u, v) = ‖u − v‖₂

where D_cos is the cosine distance and D_euc is the Euclidean distance. As usual, ⟨·, ·⟩ is the standard inner product. D in (4.2) can be either D_cos or D_euc.
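The contrastive regularizer with the two pluggable distances (4.4) and (4.5) can be sketched as follows (a simplified, unbatched version with illustrative names; the actual model optimizes it jointly with the classification loss):

```python
import math

def euclidean_d(u, v):
    """Euclidean distance, as in (4.5)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_d(u, v):
    """Cosine distance 1 - <u, v> / (|u||v|), as in (4.4)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def contrastive_loss(u, v, same_class, dist=euclidean_d, margin=1.0):
    """Contrastive loss [14]: pull same-class embeddings together,
    push different-class embeddings at least `margin` apart.
    `dist` is swappable between the Euclidean and cosine distances."""
    d = dist(u, v)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

Swapping `dist` between `euclidean_d` and `cosine_d` is exactly the design choice the empirical evaluation in Section 6 compares.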
The optimization problem in (4.1) can be solved by mini-batch Stochastic Gradient Descent (SGD). Once we find the optimal W and θ, we can perform prediction. Given a new incoming patient with patient graph G', we compute the pairwise kernel matrices between G' and all patient graphs in the training set for each of the three kernels. Then, we have the following decision function: (4.6) where the output is the predicted class label (e.g., success or failure) of the new patient.
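The prediction step can be sketched as follows (all names here are illustrative stand-ins for the learned components, not the authors' implementation):

```python
def predict(new_graph, train_graphs, kernels, embed, classify):
    """Prediction for a new patient graph, in the spirit of (4.6).

    For each graph kernel, compute the kernel row between the new graph and
    all training graphs, fuse the rows through the learned network `embed`,
    then apply the sigmoid classifier `classify`. `embed` and `classify`
    stand in for the optimized network f and sigmoid layer g.
    """
    rows = [[k(new_graph, g) for g in train_graphs] for k in kernels]
    z = embed(rows)   # fused kernel embedding
    p = classify(z)   # probability of success
    return 1 if p >= 0.5 else 0
```

The usage below plugs in trivial toy components just to exercise the control flow; real kernels, `embed`, and `classify` come from the trained model.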
The problem reduces to finding good Riemannian or sub-Riemannian structures to handle a huge and complicated dataset under certain constraints. In other words, we need to handle the related optimization problem (4.1) by finding horizontal vector fields and then constructing the solving kernel of the heat operator associated with the subelliptic operator. In this paper, we consider a Reproducing Kernel Hilbert Space (RKHS) with certain geometric properties derived from Euclidean or cosine distances.
Table 1. Disease list, data statistics, and outcome distribution.

Disease  Number of cases  Number of failures  Number of successes  Failure:success ratio

Urinary tract infection  1,501,310  703,646  797,664  47%:53%
Acute otitis media  151,522  72,264  79,258  48%:52%
Pneumonia  95,796  37,724  58,072  39%:61%
Acute cystitis  733,119  301,902  431,217  41%:59%
Hypertension  235,695  104,936  130,759  45%:55%
Hyperlipidemia  123,380  26,043  97,337  21%:79%
Diabetes  131,997  34,414  97,583  26%:74%
5. Dataset and Evaluation Protocol
To investigate how the distance metric relates to kernel embedding and prediction, we conduct a rigorous empirical evaluation with our proposed unified framework under different data balance-imbalance ratios on very large-scale real-world EHRs, a subset of the Taiwanese National Health Insurance Research Database (NHIRD, https://nhird.nhri.org.tw/en/).
Our sample of the NHIRD contains a 20-plus-year, complete medical history for over one million randomly sampled patients. The database is provided by Taiwan's National Health Insurance Administration and the Ministry of Health and Welfare. Data are composed of registration files and original claim data for reimbursement to hospitals that participate in the National Health Insurance (NHI) program. The International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) code indicates the disease diagnosed. A unique identifier is used per drug and can be further linked to the Anatomical Therapeutic Chemical (ATC) code. For privacy purposes, the NHIRD contains no patient personal information such as name, contact information, or exact birth date (only birth year and month are recorded). Also, all identification numbers for patients and hospitals are de-identified to prevent possible information leaks. Institutional Review Board (IRB) approvals for our research were granted by all associated institutions.
We select four short-term diseases and the three most prevalent chronic diseases in Taiwan. We follow our previous efforts in setting an observation window for each type of disease. Refer to Table 1 for the complete disease list, data statistics, and outcome observation setup. To validate the claim from CrossGlobal [31] that cosine distance is superior under data imbalance in chronic disease, and to examine such a case for short-term disease in MGKF [30], we prepare balanced and imbalanced datasets. For the balanced setting, we downsample the majority cases to the size of the minority cases, mimicking rare diseases; for the imbalanced setting, we construct a 70%:30% majority-to-minority mix. A pairwise t-test with p-value threshold 0.01 is used to reject the null hypothesis and measure the statistical significance of comparisons.
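Our reading of this sampling protocol can be sketched as follows (the exact sampling in the original pipeline may differ; the 70:30 interpretation is ours):

```python
import random

def make_split(majority, minority, balanced=True, ratio=0.7, seed=0):
    """Prepare balanced or imbalanced cohorts as described in the text.

    Balanced: downsample the majority class to the minority-class size.
    Imbalanced: size the minority so the combined cohort is roughly
    `ratio` majority to `1 - ratio` minority.
    """
    rng = random.Random(seed)  # fixed seed for reproducible splits
    if balanced:
        maj = rng.sample(majority, len(minority))
        return maj, list(minority)
    # minority count that makes it (1 - ratio) of the combined cohort
    n_min = min(len(minority), int(len(majority) * (1 - ratio) / ratio))
    return list(majority), rng.sample(minority, n_min)
```

Fixing the random seed keeps the folds reproducible across the 10-fold cross-validation runs.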
We compare Euclidean (Euclidean) and cosine (Cosine) distance in our unified framework. In our framework, we set 5000 dimensions for the first embedding layer of each kernel and 50 dimensions for the kernel fusion layer, as a two-layer multilayer perceptron (MLP), and 50 dimensions for the sigmoid classifier. During training, we use the Adamax optimizer [19] with a fixed learning rate of 0.0001 and a batch size of 64 for 1000 epochs, with an early stopping criterion on batch loss. Two machine learning models are included as baselines: Support Vector Machine (SVM) and Logistic Regression (LR), with all regularization constants set to 1. All patients are represented as documents containing all medical codes from all visits and transferred to low-dimensional embeddings via Paragraph Vector [20] with embedding size 512. Accuracy (ACC), macro F1-score (F1), and area under the receiver operating characteristic curve (AUC) are used as our evaluation metrics. All models are developed with TensorFlow and scikit-learn in Python. The experiments are executed on an Intel Core i9 CPU with 64GB memory and one Nvidia Titan RTX GPU with 24GB memory. We do not perform hyperparameter tuning for any model. Accuracy comparisons with other learning models, including a diversity of the latest neural configurations, are found in [30, 31].

6. Main Results
Tables 2, 3, 4, and 5 show evaluation results for short-term diseases under data balance-imbalance and different models; chronic diseases are reported in Tables 6, 7, and 8. In the tables, shaded regions indicate statistical equivalence (light) and significant difference (dark) between the Euclidean and cosine measures, and a designation marks statistical significance over all baselines (SVM and LR) at p-value 0.01. Under a balanced setting, Euclidean and cosine distance perform comparably on both short-term and chronic disease evaluations; Euclidean distance even outperforms cosine distance in 5 out of 7 diseases, implying that Euclidean distance can achieve favorable results when data variation is small, regardless of short- or long-term disease progression. In the imbalanced setting, the Euclidean distance measure is superior to the cosine measure for all short-term diseases. This confirms our premise that Euclidean distance is applicable to local problems, namely short disease progression. On the other hand, cosine distance is preferable for imbalanced long-term chronic disease, outperforming Euclidean distance especially in F1 score (an indicator of model performance on imbalanced datasets). It is worth noting that the evaluation margin between Euclidean and cosine distance is considerable for all chronic diseases. The degree of outcome variation (e.g., comorbidity) in the long-term chronic disease patient group is larger than in the short-term disease group, which again reflects that Euclidean distance is more applicable to low-variation datasets. The comparison to the baseline models validates our unified framework. Note that we did not perform any hyperparameter tuning, nor did we customize to any specific disease group; the purpose of this evaluation is strictly to investigate model behavior under different distance metrics.
Urinary tract infection
Balanced
Model  ACC  F1  AUC
Euclidean  0.6220 ± 0.0212  0.6186 ± 0.0229  0.6220 ± 0.0212
Cosine  0.6243 ± 0.0284  0.6216 ± 0.0283  0.6243 ± 0.0284
SVM  0.5047 ± 0.0257  0.5046 ± 0.0258  0.5047 ± 0.0257
LR  0.5210 ± 0.0240  0.4874 ± 0.0300  0.4988 ± 0.0296
Imbalanced
Euclidean  0.6280 ± 0.0469  0.5465 ± 0.0290  0.5895 ± 0.0249
Cosine  0.6165 ± 0.0532  0.5632 ± 0.0356  0.6194 ± 0.0182
SVM  0.5023 ± 0.0266  0.5051 ± 0.0266  0.5053 ± 0.0265
LR  0.5208 ± 0.0240  0.4872 ± 0.0260  0.4896 ± 0.0297

Acute otitis media
Balanced
Model  ACC  F1  AUC
Euclidean  0.6245 ± 0.0200  0.6224 ± 0.0218  0.6245 ± 0.0200
Cosine  0.6138 ± 0.0183  0.6097 ± 0.0185  0.6137 ± 0.0185
SVM  0.5023 ± 0.0204  0.5021 ± 0.0203  0.5023 ± 0.0203
LR  0.5011 ± 0.0211  0.5010 ± 0.0211  0.5011 ± 0.0212
Imbalanced
Euclidean  0.6570 ± 0.0203  0.5453 ± 0.0342  0.6037 ± 0.0324
Cosine  0.6238 ± 0.0306  0.5554 ± 0.0258  0.6042 ± 0.0346
SVM  0.5165 ± 0.0196  0.4803 ± 0.0212  0.4899 ± 0.0246
LR  0.5170 ± 0.0177  0.4804 ± 0.0201  0.4898 ± 0.0237

Pneumonia
Balanced
Model  ACC  F1  AUC
Euclidean  0.6013 ± 0.0279  0.5922 ± 0.3112  0.6013 ± 0.0279
Cosine  0.6023 ± 0.0211  0.5918 ± 0.0263  0.6023 ± 0.0211
SVM  0.4976 ± 0.0127  0.4975 ± 0.0127  0.4976 ± 0.0126
LR  0.4979 ± 0.0130  0.4978 ± 0.0130  0.4979 ± 0.0129
Imbalanced
Euclidean  0.6398 ± 0.0688  0.5626 ± 0.0423  0.6028 ± 0.0270
Cosine  0.6255 ± 0.0470  0.5712 ± 0.0209  0.6220 ± 0.0250
SVM  0.5430 ± 0.0243  0.5070 ± 0.0246  0.5179 ± 0.0268
LR  0.5430 ± 0.0243  0.5074 ± 0.0242  0.5186 ± 0.0261

Acute cystitis
Balanced
Model  ACC  F1  AUC
Euclidean  0.6143 ± 0.0189  0.6087 ± 0.0245  0.6143 ± 0.0189
Cosine  0.6095 ± 0.0182  0.6068 ± 0.0199  0.6095 ± 0.0182
SVM  0.5049 ± 0.0231  0.5048 ± 0.0231  0.5049 ± 0.0231
LR  0.5037 ± 0.0228  0.5037 ± 0.0228  0.5037 ± 0.0228
Imbalanced
Euclidean  0.6353 ± 0.0346  0.5607 ± 0.0231  0.5763 ± 0.0325
Cosine  0.6280 ± 0.0405  0.5632 ± 0.0201  0.5839 ± 0.0267
SVM  0.5235 ± 0.0227  0.4871 ± 0.0219  0.4965 ± 0.0233
LR  0.5230 ± 0.0248  0.4860 ± 0.0232  0.4957 ± 0.0242

Hypertension
Balanced
Model  ACC  F1  AUC
Euclidean  0.7315 ± 0.0126  0.7305 ± 0.0131  0.7315 ± 0.0126
Cosine  0.7290 ± 0.0131  0.7282