Outline
- Introduction
- Data Science Overview
  - Supervised Learning (Inferential Statistics)
    - Classifications
    - Regression
- Phylogenetics to Phylogenomics
  - Phylogenetic Trees
  - Space of Phylogenetic Trees
- Basics in Tropical Geometry
- Tropical Unsupervised Learning (Tropical Descriptive Statistics)
  - Tropical Fermat Weber Points
  - Tropical Fréchet Means
  - Tropical Principal Component Analysis (PCA)
- Tropical Supervised Learning (Tropical Inferential Statistics)
  - Tropical Classifications
    - Tropical Support Vector Machines
    - Tropical Linear Discriminant Analysis
  - Tropical Regression
Key Words: Machine Learning Models, Max-Plus Algebra, Phylogenetics, Phylogenomics, Tropical Geometry, Ultrametrics.
1 Introduction
Due to the increasing amount of data available today, data science is one of the most exciting fields. It finds applications in statistics, computer science, business, biology, data security, physics, and so on. Most statistical models in data science assume that data points in an input sample are distributed over a Euclidean space if they have numerical measurements. However, in some cases this assumption fails. For example, a space of phylogenetic trees with a fixed set of leaves is a union of lower dimensional cones over $\mathbb{R}^e$, where $e = \binom{N}{2}$ and $N$ is the number of leaves [2]. Since the space of phylogenetic trees is a union of lower dimensional cones, we cannot directly apply classical statistical models in data science to a set of phylogenetic trees [20].
There has been much work on spaces of phylogenetic trees. In 2001, Billera, Holmes, and Vogtmann (BHV) developed a notion of a space of phylogenetic trees with a fixed set of labels for leaves [4]. This space is the set of all possible phylogenetic trees with the fixed set of labels on the leaves, and it is a union of orthants, where each orthant contains all phylogenetic trees with a fixed tree topology. In this space, two orthants are adjacent if the tree topology for one orthant is one nearest neighbor interchange (NNI) move away from the tree topology for the other orthant. They also showed that this space is a CAT(0) space, so that there is a unique shortest connecting path, or geodesic, between any two points in the space under the BHV metric. There is some work on the development of machine learning models with the BHV metric. For example, Nye defined a notion of the first order principal component geodesic as the unique geodesic with the BHV metric over the tree space which minimizes the sum of residuals between the geodesic and each data point [14]. However, we cannot use a convex hull under the BHV metric for higher principal components because Lin et al. showed that the convex hull of three points with the BHV metric over the tree space can have arbitrarily high dimension [10].
In 2004, Speyer and Sturmfels showed that a space of phylogenetic trees with a given set of labels on their leaves is a tropical Grassmannian [18], which is a tropicalization of a linear space defined by a set of linear equations [20] with the max-plus algebra. The tropical metric with the max-plus algebra on the tree space is known to behave very well [1, 6]. For example, in contrast to the BHV metric, the dimension of the convex hull of $s$ tropical points is at most $s - 1$.
Thus, this paper focuses on the tropical metric over tree spaces. We review some developments in statistical learning models with the tropical metric with the max-plus algebra on tree spaces as well as on the tropical projective space, and we give an overview of some open problems.
2 Data Science Overview
In this section, we briefly review statistical models in data science. For more details, we recommend An Introduction to Statistical Learning with Applications in R, http://faculty.marshall.usc.edu/gareth-james/ISL/.
Data science has roughly two sub-branches: unsupervised learning and supervised learning (Figure 1). In unsupervised learning, our goal is to compute descriptive statistics to see how data points are distributed over the sample space or how data points are clustered together. In statistics, unsupervised learning corresponds to descriptive statistics. In supervised learning, our goal is to predict or infer the response variable from explanatory variables. In statistics, supervised learning corresponds to inferential statistics. As with unsupervised and supervised learning, several notions have different names in machine learning and in statistics. We summarize some of the differences in Table 1.

Table 1.
Statistics | Data Science |
---|---|
descriptive statistics | unsupervised learning |
inferential statistics | supervised learning |
response variable | target variable |
explanatory variable | predictor variable, feature |

2.1 Basic Definitions
- Response variable – the variable of interest in a study or experiment. It is also called a dependent variable. In machine learning it is also called a target variable.
- Explanatory variable – a variable that explains the changes in the response variable. It is also called an independent variable. In machine learning it is also called a feature or predictor.
2.2 Unsupervised Learning
Since unsupervised learning is descriptive, there are no response variables. In unsupervised learning, we try to learn how data points are distributed and how they relate to each other. There are two main categories: clustering and dimensionality reduction.
- Clustering – grouping data points into subsets by their “similarity”. The similarity measure is defined by the user. These groups are called clusters.
- Dimensionality reduction – reducing the dimension of data points while minimizing the loss of information. One of the most commonly used methods is principal component analysis (PCA), a dimension reduction procedure via linear algebra.
2.3 Supervised Learning
Supervised learning is inferential. Thus, an input data set contains a response variable and explanatory variables. Depending on the scale of the response variable, supervised learning separates into two groups: classification and regression. In classification, the response variable has a categorical scale, and in regression, the response variable has a numerical (interval) scale.
- Classification – the response variable is categorical. Classification algorithms include logistic regression, support vector machines, linear discriminant analysis, classification trees, random forests, AdaBoost, etc.
- Regression – the response variable is numerical. Regression algorithms include linear regression, regression trees, lasso, ridge regression, random forests, AdaBoost, etc.
For more details, see the following reference:
- Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. An Introduction to Statistical Learning with Applications in R, http://faculty.marshall.usc.edu/gareth-james/ISL/.
3 Phylogenetics to Phylogenomics
In this section we give an overview of the basics of phylogenetics and of the basic problem in phylogenomics.
3.1 Phylogenetic Trees
Evolutionary trees, or phylogenetic trees, show the evolutionary relationships among organisms over time through the use of tree diagrams. Phylogenetic trees consist of vertices (nodes) and edges (branches). Each node in a phylogenetic tree represents a past or present taxon or population: exterior nodes represent taxa or populations at present, and interior nodes represent their ancestors. Edges in a phylogenetic tree have weights, and the weight of each edge represents the mutation rate multiplied by the evolutionary time from an ancestor to a taxon.
In Figure 2, an exterior vertex (leaf or tip) represents a current taxon. An interior vertex represents an extinct taxon where ancestors split into two subgroups. Vertices and edges can be labeled; however, only the leaves or tips are labeled in a phylogenetic tree, because the past taxa are often inferred and not exactly known. Vertices in phylogenetic trees can be DNA sequences, shared genes or interrelated species, depending on the context of the tree. The root of the tree represents the common ancestor of all leaves.

Phylogenetic trees are trees. Thus, they are acyclic and connected. In terms of evolutionary biology, these properties are intuitive, since a species must evolve from something and, as time progresses, species can only evolve forward. Weights on edges in a tree represent the notion of time. The distance measures the dissimilarity between Species 1 and Species 2 with respect to time.
Let $N$ be the number of leaves of a phylogenetic tree. If the total weight of all edges in the path from the root to a leaf $i$ in a rooted phylogenetic tree is the same for all leaves $i \in [N] := \{1, \ldots, N\}$, then we call the phylogenetic tree an equidistant tree. The height of an equidistant tree is the total weight of all edges in a path from the root to each leaf in the tree.
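The equidistant condition can be checked directly from the root-to-leaf path lengths. The following is a minimal sketch in Python; the nested-list tree encoding, the function names, and the example branch lengths are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a leaf is its label (a string); an internal node is
# a list of (child, branch_length) pairs.
Tree = Union[str, List[Tuple["Tree", float]]]


def leaf_depths(tree: Tree, depth: float = 0.0) -> Dict[str, float]:
    """Return the total branch length from the root to every leaf."""
    if isinstance(tree, str):
        return {tree: depth}
    depths: Dict[str, float] = {}
    for child, length in tree:
        depths.update(leaf_depths(child, depth + length))
    return depths


def is_equidistant(tree: Tree, tol: float = 1e-9) -> bool:
    """True if all root-to-leaf path lengths agree up to the tolerance."""
    values = list(leaf_depths(tree).values())
    return max(values) - min(values) <= tol


# Illustrative rooted tree on the leaves a, b, c, d.
example = [
    ([("a", 0.2), ("b", 0.2)], 0.8),
    ([("c", 0.6), ("d", 0.6)], 0.4),
]

print(leaf_depths(example))
print(is_equidistant(example))
```

When `is_equidistant` returns `True`, the common value of the root-to-leaf path lengths is the height of the tree.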
3.1.1 Phylogenetic Tree Reconstruction
Phylogenetic reconstruction uses genetic data to infer an evolutionary (phylogenetic) tree. The changing characters are the mutations in DNA sequences, and the DNA sequences represent a shared gene across multiple species. Trees are well suited to representing the evolutionary changes of this shared gene through node splits and leaves.
Although we do not discuss the details of phylogenetic tree reconstruction in this paper, multiple steps and techniques are involved in the reconstruction process, and there are several types of tree reconstruction methods:
- Maximum likelihood estimation (MLE) methods – These methods describe evolution in terms of a discrete-state continuous-time Markov process.
- Maximum parsimony – Reconstructs the tree with the fewest evolutionary changes that explain the data.
- Bayesian inference for trees – Uses Bayes' theorem and MCMC to estimate the posterior distribution rather than obtaining a point estimate.
- Distance based methods – Reconstruct a tree from a distance matrix.
3.2 Space of Phylogenetic Trees
There are several ways to define a space of phylogenetic trees with different metrics. One of the best-known tree spaces is the Billera-Holmes-Vogtmann tree space. In 2001, Billera, Holmes, and Vogtmann (BHV) introduced a continuous space which models the set of rooted phylogenetic trees with edge lengths on a fixed set of leaves. In this space, edge lengths in a tree are continuous and we assign a coordinate to each interior edge. Note that unrooted trees can be accommodated by designating a fixed leaf node as the root. The BHV tree space is not Euclidean, but it is non-positively curved, and thus has the property that any two points are connected by a unique shortest path through the space, called a geodesic. The distance between two trees is defined as the length of the geodesic connecting them. We do not consider the BHV tree space in this paper; interested readers should see [4].
Throughout this paper, we assume that all phylogenetic trees are equidistant trees. An equidistant tree is a rooted phylogenetic tree such that the sum of all branch lengths in the unique path from the root to each leaf in the tree, called the height of the tree, is the same for all leaves in the tree. In phylogenetics this assumption is fairly mild, since the multispecies coalescent model assumes that all gene trees have the same height.
Example 1.
Suppose . Consider two rooted phylogenetic trees with the set of labels on the leaves in Figure 3. Note that for each tree, the sum of branch lengths in the unique path from the root to each leaf is . Therefore they are equidistant trees with their height are equal to .

For the space of equidistant trees with a fixed set of labels on their leaves, the BHV tree space might not be appropriate [7]. Therefore, we consider the space of ultrametrics. To define ultrametrics and their relation to equidistant trees, we need to define dissimilarity maps.
Definition 2.
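[Dissimilarity Map] Let $[N] := \{1, \ldots, N\}$. A dissimilarity map $w$ is a function $w : [N] \times [N] \to \mathbb{R}_{\geq 0}$ such that
$$w(i, i) = 0 \quad \text{and} \quad w(i, j) = w(j, i) \geq 0$$
for all $i, j \in [N]$. If a dissimilarity map $w$ additionally satisfies the triangle inequality, that is:
$$w(i, j) \leq w(i, k) + w(k, j)$$
for all $i, j, k \in [N]$, then $w$ is called a metric. If there exists a phylogenetic tree $T$ such that $w(i, j)$ coincides with the total branch length of the edges in the unique path from a leaf $i$ to a leaf $j$ for all leaves $i, j \in [N]$, then we call $w$ a tree metric. If a metric $w$ is a tree metric and $w(i, j)$ is the total branch length of all edges in the path from a leaf $i$ to a leaf $j$ for all leaves $i, j \in [N]$ in a phylogenetic tree $T$, then we say $w$ realises the phylogenetic tree $T$, or $w$ is a realisation of the phylogenetic tree $T$.
Since $w(i, j) = w(j, i)$ and $w(i, i) = 0$, to simplify we write
$$w = \big(w(1, 2), w(1, 3), \ldots, w(N-1, N)\big) \in \mathbb{R}^{e}_{\geq 0}, \quad e = \binom{N}{2}.$$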
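The vector form of a dissimilarity map can be illustrated with a few lines of Python. This is a minimal sketch: the function name `to_vector` and the $4 \times 4$ matrix `D` are hypothetical, chosen only to show the lexicographic ordering of the pairs $(i, j)$ with $i < j$.

```python
from itertools import combinations
from math import comb


def to_vector(matrix):
    """Flatten a symmetric matrix with zero diagonal into the vector
    (w(1,2), w(1,3), ..., w(N-1,N)) of its upper-triangular entries."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


# Hypothetical dissimilarity map on N = 4 leaves.
D = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

w = to_vector(D)
print(w)                      # [2, 4, 4, 4, 4, 2]
print(len(w) == comb(4, 2))   # True
```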
Example 3.
Definition 4 (Three Point Condition).
If a metric $w$ satisfies the following condition: for every three distinct leaves $i, j, k \in [N]$,
$$\max\{w(i, j), w(i, k), w(j, k)\}$$
is attained at least twice, then we say that $w$ satisfies the three point condition.
Definition 5 (Ultrametrics).
If a metric $w$ satisfies the three point condition, then $w$ is called an ultrametric.
Theorem 6 ([8]).
A dissimilarity map $w$ is an ultrametric if and only if $w$ is a realisation of an equidistant tree with leaf labels $[N]$. In addition, for each equidistant tree there exists a unique ultrametric. Conversely, for each ultrametric, there exists a unique equidistant tree.
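The three point condition is straightforward to test numerically. The sketch below assumes the input is a symmetric matrix with zero diagonal and non-negative entries; the function name and both example matrices are illustrative assumptions.

```python
from itertools import combinations


def is_ultrametric(matrix, tol=1e-9):
    """Check the three point condition: for every triple of distinct leaves,
    the maximum of the three pairwise values is attained at least twice."""
    n = len(matrix)
    for i, j, k in combinations(range(n), 3):
        _, second, largest = sorted(
            [matrix[i][j], matrix[i][k], matrix[j][k]]
        )
        if largest - second > tol:
            return False
    return True


# Hypothetical examples: the first satisfies the condition, the second fails.
D_ultra = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
D_not = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]

print(is_ultrametric(D_ultra))  # True
print(is_ultrametric(D_not))    # False
```

By Theorem 6, a matrix passing this check corresponds to an equidistant tree on the same leaf labels.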
Example 7.
We again consider equidistant trees in Figure 3. The dissimilarity map obtained from the left tree in Figure 3 is
Similarly, the dissimilarity map obtained from the right tree in Figure 3 is
Since these phylogenetic trees are equidistant trees, these dissimilarity maps are ultrametrics by Theorem 6.
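By Theorem 6, we consider the space of ultrametrics with labels $[N]$ as the space of all equidistant trees with the label set $[N]$. Let $\mathcal{U}_N$ be the space of ultrametrics for equidistant trees with the leaf labels $[N]$. In fact, we can write $\mathcal{U}_N$ as the tropicalization of the linear space generated by linear equations.
Let $L_N \subseteq \mathbb{R}^e$ be the linear subspace defined by the linear equations
$$x_{ij} - x_{ik} + x_{jk} = 0 \tag{1}$$
for $1 \leq i < j < k \leq N$. For the linear equations (1) spanning the linear space $L_N$, the max-plus tropicalization $\mathrm{Trop}(L_N) \subseteq \mathbb{R}^e / \mathbb{R}\mathbf{1}$ of the linear space $L_N$ is the tropical linear space consisting of the points $w \in \mathbb{R}^e$ such that
$$\max\{w_{ij}, w_{ik}, w_{jk}\}$$
is attained at least twice for all $1 \leq i < j < k \leq N$. Note that this is exactly the three point condition defined in Definition 4.
Theorem 8.
[20, Theorem 2.18] The image of $\mathcal{U}_N$ in the tropical projective torus $\mathbb{R}^e / \mathbb{R}\mathbf{1}$ coincides with $\mathrm{Trop}(L_N)$.
For example, if $N = 4$, the space of ultrametrics $\mathcal{U}_4$ is a two-dimensional fan with $15$ maximal cones.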
4 Basics in Tropical Geometry
Here we review some basics of tropical arithmetic and geometry, and we set up the notation used throughout this paper.
Definition 9 (Tropical arithmetic operations).
Throughout this paper we perform arithmetic over the max-plus tropical semiring $(\mathbb{R} \cup \{-\infty\}, \oplus, \odot)$. Over this tropical semiring, the basic tropical arithmetic operations of addition and multiplication are defined as follows:
$$a \oplus b := \max\{a, b\}, \qquad a \odot b := a + b, \qquad \text{where } a, b \in \mathbb{R} \cup \{-\infty\}.$$
Over this tropical semiring, $-\infty$ is the identity element under addition and $0$ is the identity element under multiplication.
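As a small numerical illustration of max-plus arithmetic, the following Python sketch implements the two operations and checks the identity elements. The function names and the numbers are arbitrary choices made for illustration.

```python
NEG_INF = float("-inf")  # identity element for tropical addition


def trop_add(a, b):
    """Tropical addition: the maximum of the two arguments."""
    return max(a, b)


def trop_mul(a, b):
    """Tropical multiplication: the ordinary sum of the two arguments."""
    return a + b


print(trop_add(3, 5))        # 5
print(trop_mul(3, 5))        # 8
print(trop_add(NEG_INF, 7))  # 7, since -inf is the additive identity
print(trop_mul(0, 7))        # 7, since 0 is the multiplicative identity

# Tropical multiplication distributes over tropical addition.
print(trop_mul(2, trop_add(3, 5)) == trop_add(trop_mul(2, 3), trop_mul(2, 5)))
```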
Example 10.
Suppose we have . Then
Definition 11 (Tropical scalar multiplication and vector addition).
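For any scalars $a, b \in \mathbb{R} \cup \{-\infty\}$ and for any vectors $v = (v_1, \ldots, v_e), w = (w_1, \ldots, w_e) \in (\mathbb{R} \cup \{-\infty\})^e$, tropical scalar multiplication and tropical vector addition are defined as:
$$a \odot v = (a + v_1, a + v_2, \ldots, a + v_e),$$
$$a \odot v \oplus b \odot w = (\max\{a + v_1, b + w_1\}, \ldots, \max\{a + v_e, b + w_e\}).$$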
Example 12.
Suppose we have
and . Then we have
and
Throughout this paper we consider the tropical projective torus, that is, the projective space $\mathbb{R}^e / \mathbb{R}\mathbf{1}$, where $\mathbf{1} := (1, 1, \ldots, 1)$ is the all-one vector.
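The vector operations and the identification of points in the tropical projective torus can be sketched as follows. The function names are illustrative, and normalizing by the first coordinate is one convenient (assumed) choice of representative; any other coordinate would serve equally well.

```python
def trop_scalar_mul(a, v):
    """Tropical scalar multiplication: add the scalar a to every coordinate."""
    return [a + vi for vi in v]


def trop_vec_add(v, w):
    """Tropical vector addition: coordinate-wise maximum."""
    return [max(vi, wi) for vi, wi in zip(v, w)]


def normalize(v):
    """Representative of v in the tropical projective torus whose first
    coordinate is 0 (vectors differing by a multiple of the all-one vector
    are identified)."""
    return [vi - v[0] for vi in v]


v = [1, 2, 3]
w = [3, 0, 1]

# Tropical linear combination (2 ⊙ v) ⊕ (1 ⊙ w).
combo = trop_vec_add(trop_scalar_mul(2, v), trop_scalar_mul(1, w))
print(combo)             # [4, 4, 5]
print(normalize(combo))  # [0, 0, 1]

# v and v + 10 * (1, 1, 1) are the same point of the torus.
print(normalize(v) == normalize(trop_scalar_mul(10, v)))  # True
```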
Example 13.
Consider . Then let
Then over we have the following equality:
Note that $\mathbb{R}^e / \mathbb{R}\mathbf{1}$ is isometric to $\mathbb{R}^{e-1}$.
Example 14.
Consider . Then let
Also let . Then we have
In order to conduct a statistical analysis we need a distance measure between two vectors in the space. Thus we discuss a distance between two vectors in the tropical projective space. In fact the following distance between two vectors in the tropical projective space is a metric.
Definition 15 (Generalized Hilbert projective metric).
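For any two points $v, w \in \mathbb{R}^e / \mathbb{R}\mathbf{1}$, the tropical distance $d_{\mathrm{tr}}(v, w)$ between $v$ and $w$ is defined as:
$$d_{\mathrm{tr}}(v, w) := \max_{1 \leq i < j \leq e} \big| v_i - w_i - v_j + w_j \big| = \max_{i} \{v_i - w_i\} - \min_{i} \{v_i - w_i\}, \tag{2}$$
where $v = (v_1, \ldots, v_e)$ and $w = (w_1, \ldots, w_e)$. This distance is a metric on $\mathbb{R}^e / \mathbb{R}\mathbf{1}$. Therefore, we call $d_{\mathrm{tr}}$ the tropical metric.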
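The tropical distance is cheap to compute, since it only needs the largest and smallest coordinate of the difference vector. The sketch below implements both expressions for the distance and checks that they agree; the function names and the sample vectors are illustrative assumptions.

```python
from itertools import combinations


def tropical_distance(v, w):
    """Tropical distance as max minus min of the coordinate-wise differences."""
    diffs = [vi - wi for vi, wi in zip(v, w)]
    return max(diffs) - min(diffs)


def tropical_distance_pairwise(v, w):
    """Equivalent pairwise form: max over i < j of |v_i - w_i - v_j + w_j|."""
    diffs = [vi - wi for vi, wi in zip(v, w)]
    return max(abs(a - b) for a, b in combinations(diffs, 2))


v = [0, 0, 0]
w = [0, 3, 1]

print(tropical_distance(v, w))           # 3
print(tropical_distance_pairwise(v, w))  # 3

# The distance does not depend on the chosen representative in the torus.
shifted = [wi + 5 for wi in w]
print(tropical_distance(v, shifted))     # 3
```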
Example 16.
Suppose such that
Then the tropical distance between is
Similar to the BHV metric over the BHV tree space, we need to define a geodesic over the space of ultrametrics. In order to define a tropical geodesic we need to define a tropical polytope:
Definition 17.
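Suppose we have a finite subset $V = \{v^1, \ldots, v^s\} \subset \mathbb{R}^e / \mathbb{R}\mathbf{1}$. The tropical convex hull or tropical polytope of $V$ is the smallest tropically-convex subset containing $V$, written as the set of all tropical linear combinations of $V$:
$$\mathrm{tconv}(V) = \{a_1 \odot v^1 \oplus a_2 \odot v^2 \oplus \cdots \oplus a_s \odot v^s\},$$
where $a_1, \ldots, a_s \in \mathbb{R}$. A tropical line segment between two points $v^1, v^2$ is the tropical convex hull of the two points $\{v^1, v^2\}$.
Note that the length along the tropical line segment between two points $v^1, v^2$ equals the tropical distance $d_{\mathrm{tr}}(v^1, v^2)$. In this paper we define the tropical line segment between two points to be the tropical geodesic between these points.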
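A tropical line segment is a concatenation of ordinary line segments, and its breakpoints can be listed explicitly. The sketch below uses one standard parametrization: the points $u \oplus (\lambda \odot v)$ for $\lambda$ running through the sorted values $u_i - v_i$. The function names and the example points are illustrative assumptions.

```python
def tropical_distance(v, w):
    """Tropical distance as max minus min of the coordinate-wise differences."""
    diffs = [vi - wi for vi, wi in zip(v, w)]
    return max(diffs) - min(diffs)


def tropical_line_segment(u, v):
    """Breakpoints of the tropical line segment from u to v.

    Each breakpoint is max(u, lam + v) taken coordinate-wise, where lam runs
    through the sorted distinct values of u_i - v_i. The first point is u and
    the last point represents v in the tropical projective torus."""
    lambdas = sorted(set(ui - vi for ui, vi in zip(u, v)))
    return [[max(ui, lam + vi) for ui, vi in zip(u, v)] for lam in lambdas]


u = [0, 0, 0]
v = [0, 3, 1]

segment = tropical_line_segment(u, v)
print(segment)  # [[0, 0, 0], [0, 2, 0], [0, 3, 1]]

# The pieces add up to the tropical distance between the endpoints.
length = sum(tropical_distance(p, q) for p, q in zip(segment, segment[1:]))
print(length, tropical_distance(u, v))  # 3 3
```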
Example 18.
Suppose such that
From the previous example, the tropical distance between is
Also the tropical line segment between is a line segment between these three points:
The length of the line segment is
Example 19.

For more details, see the following reference:
- D. Maclagan and B. Sturmfels. Introduction to Tropical Geometry [13].
5 Tropical Unsupervised Learning
Unsupervised learning is descriptive, and not much is known about descriptive statistics using tropical geometry with the max-plus algebra, for example, tropical Fermat-Weber (FW) points and tropical Fréchet means.
In this section we discuss tropical FW points and tropical Fréchet means: what they are, what we know, and what we do not know. At the end of this section, we discuss tropical principal component analysis (PCA). Throughout this section we consider the tropical projective torus $\mathbb{R}^e / \mathbb{R}\mathbf{1}$.
5.1 Tropical Fermat Weber Points
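Suppose we have a sample $\{v^1, \ldots, v^n\}$ over $\mathbb{R}^e / \mathbb{R}\mathbf{1}$. A tropical Fermat-Weber point $x^*$ minimizes the sum of tropical distances to the given points:
$$x^* := \operatorname*{arg\,min}_{x \in \mathbb{R}^e / \mathbb{R}\mathbf{1}} \sum_{i=1}^{n} d_{\mathrm{tr}}(x, v^i). \tag{3}$$
Tropical Fermat-Weber points of a sample over $\mathbb{R}^e / \mathbb{R}\mathbf{1}$ have the following properties.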
Proposition 20.
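Suppose $v^1, \ldots, v^n \in \mathbb{R}^e / \mathbb{R}\mathbf{1}$.
Then the set of tropical Fermat-Weber points of the sample $\{v^1, \ldots, v^n\}$ over $\mathbb{R}^e / \mathbb{R}\mathbf{1}$ is a convex polytope.
It consists of all optimal solutions $x = (x_1, \ldots, x_e)$ to the following linear program:
$$\begin{array}{ll} \text{minimize} & d_1 + d_2 + \cdots + d_n \\ \text{subject to} & d_i \geq x_k - x_l - v^i_k + v^i_l \quad \text{for all } i = 1, \ldots, n \text{ and all } 1 \leq k, l \leq e. \end{array} \tag{4}$$
By Proposition 20, there can be infinitely many tropical Fermat-Weber points of a sample.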
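The non-uniqueness of tropical Fermat-Weber points is already visible for a sample of two points: by the triangle inequality the objective is at least the tropical distance between them, and this bound is attained at the breakpoints of the tropical line segment. The sketch below evaluates the objective only; it does not solve the linear program. The function names and the sample are illustrative assumptions.

```python
def tropical_distance(v, w):
    """Tropical distance as max minus min of the coordinate-wise differences."""
    diffs = [vi - wi for vi, wi in zip(v, w)]
    return max(diffs) - min(diffs)


def fermat_weber_objective(x, sample):
    """Sum of tropical distances from x to every point of the sample."""
    return sum(tropical_distance(x, v) for v in sample)


# Hypothetical sample of two points in R^3 / R1.
sample = [[0, 0, 0], [0, 3, 1]]

# Three points on the tropical line segment between the two sample points.
on_segment = [[0, 0, 0], [0, 2, 0], [0, 3, 1]]
print([fermat_weber_objective(x, sample) for x in on_segment])  # [3, 3, 3]

# A point away from the segment has a strictly larger objective value.
print(fermat_weber_objective([0, 0, 2], sample))  # 6
```

For larger samples, the linear program in Proposition 20 can be passed to any LP solver, with the objective function above serving as a check on the returned solution.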
If we focus on the space of ultrametrics $\mathcal{U}_N$ for equidistant trees with $N$ leaves, then we have the following proposition:
Proposition 21.
If a sample over the space of ultrametrics , then tropical Fermat-Weber points are in .
In [12], we showed explicitly how to compute the set of all possible Fermat-Weber points in . However, we do not know the minimal set of inequalities needed to define the set of all tropical Fermat-Weber points of a given sample. Thus here is an open problem:
Problem 22.
What is the minimal set of inequalities needed to define the set of all tropical Fermat-Weber points of a given sample? What is the time complexity to compute the set of tropical Fermat-Weber points of a sample of points in ? Is there a polynomial time algorithm to compute the vertices of the polytope of tropical Fermat-Weber points of a sample of points in in and ?
For more details, see the following paper:
- B. Lin and R. Yoshida. Tropical Fermat–Weber Points [12].
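5.2 Tropical Fréchet Means
Suppose we have a sample $\{v^1, \ldots, v^n\}$ over $\mathbb{R}^e / \mathbb{R}\mathbf{1}$. A tropical Fréchet mean $\mu$ minimizes the sum of squared tropical distances to the given points:
$$\mu := \operatorname*{arg\,min}_{x \in \mathbb{R}^e / \mathbb{R}\mathbf{1}} \sum_{i=1}^{n} d_{\mathrm{tr}}(x, v^i)^2. \tag{5}$$
Just as we formulated computing a tropical Fermat-Weber point as a linear programming problem, we can formulate computing a tropical Fréchet mean as a quadratic programming problem:
$$\begin{array}{ll} \text{minimize} & d_1^2 + d_2^2 + \cdots + d_n^2 \\ \text{subject to} & d_i \geq x_k - x_l - v^i_k + v^i_l \quad \text{for all } i = 1, \ldots, n \text{ and all } 1 \leq k, l \leq e. \end{array} \tag{6}$$
While we know some properties of tropical Fermat-Weber points, we do not know much about tropical Fréchet means. Here are some basics on tropical Fréchet means.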
Proposition 23.
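Suppose $v^1, \ldots, v^n \in \mathbb{R}^e / \mathbb{R}\mathbf{1}$. Then the set of tropical Fréchet means of the sample $\{v^1, \ldots, v^n\}$ over $\mathbb{R}^e / \mathbb{R}\mathbf{1}$ is a convex polytope. It consists of all optimal solutions $x = (x_1, \ldots, x_e)$ to the following quadratic program:
$$\begin{array}{ll} \text{minimize} & d_1^2 + d_2^2 + \cdots + d_n^2 \\ \text{subject to} & d_i \geq x_k - x_l - v^i_k + v^i_l \quad \text{for all } i = 1, \ldots, n \text{ and all } 1 \leq k, l \leq e. \end{array} \tag{7}$$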
Still we do not know much about tropical Fréchet means. First we have the following problem.
Problem 24.
If a sample over the space of ultrametrics , then are tropical Fréchet means in ?
We still do not know how to compute tropical Fréchet means efficiently. So we have the following problem:
Problem 25.
Suppose we have over . Is there an algorithm to compute all tropical Fréchet means in ?
5.3 Tropical Principal Component Analysis (PCA)
Principal component analysis (PCA) is one of the most popular methods to reduce the dimensionality of input data and to visualize them. Classical PCA takes data points in a high-dimensional Euclidean space and represents them in a lower-dimensional plane in such a way that the residual sum of squares is minimized. We cannot directly apply classical PCA to a set of phylogenetic trees because the space of phylogenetic trees with a fixed number of leaves is not Euclidean; it is a union of lower dimensional polyhedral cones in $\mathbb{R}^{\binom{N}{2}}$, where $N$ is the number of leaves.
There is a statistical method similar to PCA over the space of phylogenetic trees with a fixed set of leaves in terms of the Billera-Holmes-Vogtmann (BHV) metric. In 2001, Billera, Holmes, and Vogtmann developed the space of phylogenetic trees with fixed labeled leaves, and they showed that it is a CAT(0) space [5]. Therefore, a geodesic between any two points in the space of phylogenetic trees is unique. Later, Nye gave an algorithm in [15] to compute the first order principal component over the space of phylogenetic trees with $N$ leaves under the BHV metric. Nye in [15] used a convex hull of two points, i.e., the geodesic, on the tree space as the first order PCA. However, this idea cannot be generalized to higher order principal components with the BHV metric, since the convex hull of three points with the BHV metric over the tree space can have arbitrarily high dimension [11].
On the other hand, the tropical metric in the tree space in terms of the max-plus algebra is well-studied and well-behaved [13]. For example, the dimension of the convex hull of $s$ points in terms of the tropical metric is at most $s - 1$. Using the tropical metric, Yoshida et al. in [20] introduced a statistical method similar to PCA with the max-plus tropical arithmetic in two ways: the tropical principal linear space, that is, the best-fit Stiefel tropical linear space of fixed dimension closest to the data points in the tropical projective torus; and the tropical principal polytope, that is, the best-fit tropical polytope with a fixed number of vertices closest to the data points. The authors showed that the latter object can be computed by solving a mixed-integer programming problem, and they applied the second definition to data sets consisting of collections of phylogenetic trees. Nevertheless, computing the best-fit tropical polytope exactly can be expensive due to the high dimensionality of the mixed-integer programming problem.
Definition 26.
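Let $\mathcal{P} = \mathrm{tconv}(D^{(1)}, \ldots, D^{(s)}) \subseteq \mathbb{R}^e / \mathbb{R}\mathbf{1}$ be a tropical polytope with its vertices $\{D^{(1)}, \ldots, D^{(s)}\} \subset \mathbb{R}^e / \mathbb{R}\mathbf{1}$, and let $S = \{u_1, \ldots, u_n\}$ be a sample from the space of ultrametrics $\mathcal{U}_N$. Let $\Pi_{\mathcal{P}}(S) := \sum_{i=1}^{|S|} d_{\mathrm{tr}}(u_i, u'_i)$, where $u'_i$ is the tropical projection of $u_i$ onto the tropical polytope $\mathcal{P}$. Then the vertices $D^{(1)}, \ldots, D^{(s)}$ of the tropical polytope $\mathcal{P}$ are called the $(s-1)$-th order tropical principal polytope of $S$ if the tropical polytope $\mathcal{P}$ minimizes $\Pi_{\mathcal{P}}(S)$ over all possible tropical polytopes with $s$ many vertices.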
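The objective in Definition 26 can be evaluated once the tropical projection onto a polytope is available. The sketch below uses the standard closed formula for this projection (see [13]): the projection of $x$ onto $\mathrm{tconv}(v^1, \ldots, v^s)$ is $\lambda_1 \odot v^1 \oplus \cdots \oplus \lambda_s \odot v^s$ with $\lambda_k = \min_i (x_i - v^k_i)$. It only evaluates the objective for given vertices and does not search for the optimal polytope. The function names and the example data are illustrative assumptions.

```python
def tropical_distance(v, w):
    """Tropical distance as max minus min of the coordinate-wise differences."""
    diffs = [vi - wi for vi, wi in zip(v, w)]
    return max(diffs) - min(diffs)


def tropical_projection(x, vertices):
    """Project x onto the tropical polytope spanned by the given vertices."""
    lambdas = [min(xi - vi for xi, vi in zip(x, v)) for v in vertices]
    return [
        max(lam + v[i] for lam, v in zip(lambdas, vertices))
        for i in range(len(x))
    ]


def projection_cost(sample, vertices):
    """Sum of tropical distances between sample points and their projections."""
    return sum(
        tropical_distance(u, tropical_projection(u, vertices)) for u in sample
    )


# Hypothetical polytope (a tropical line segment) and sample in R^3 / R1.
vertices = [[0, 0, 0], [0, 3, 1]]
sample = [[0, 2, 0], [0, 1, 3]]

print(tropical_projection([0, 2, 0], vertices))  # [0, 2, 0], already inside
print(tropical_projection([0, 1, 3], vertices))  # [0, 1, 0]
print(projection_cost(sample, vertices))         # 3
```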
In [16], Page et al. developed a heuristic method to compute tropical principal polytopes, and they applied it to empirical data sets: genome data of influenza collected from New York City, Apicomplexa, and African coelacanth genome data sets.
Page et al. also showed the following theorem and lemma:
Theorem 27 ([16]).
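Let $\mathcal{P} = \mathrm{tconv}(D^{(1)}, \ldots, D^{(s)}) \subseteq \mathbb{R}^e / \mathbb{R}\mathbf{1}$ be a tropical polytope spanned by ultrametrics in $\mathcal{U}_N$. Then $\mathcal{P} \subseteq \mathcal{U}_N$, and any two points $x$ and $y$ in the same cell of $\mathcal{P}$ are also ultrametrics with the same tree topology.
Lemma 28 ([16]).
Let $\mathcal{P} = \mathrm{tconv}(D^{(1)}, \ldots, D^{(s)})$ be a tropical polytope spanned by ultrametrics. The origin $\mathbf{0}$ is contained in $\mathcal{P}$ if and only if the path between each pair of leaves $i, j$ passes through the root of some $D^{(k)}$.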
There are still some open problems on tropical PCA. Here is one question we can work on:
Conjecture 29.
There exists a tropical Fermat-Weber point of a sample of ultrametric trees which is contained in the th order tropical PCA of the dataset for .
6 Tropical Supervised Learning
For tropical supervised learning, not much has been done, although there is some work on classification. Recently Tang et al. in [19] introduced a notion of tropical support vector machines (SVMs). In this section we discuss tropical SVMs and we introduce a notion of tropical linear discriminant analysis (LDA).
6.1 Tropical Classifications
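For tropical classification, we consider a binary response variable. Suppose we have a data set
$$\{(x_1, y_1), \ldots, (x_n, y_n)\},$$
where $x_i \in \mathbb{R}^e / \mathbb{R}\mathbf{1}$ and $y_i \in \{0, 1\}$. Therefore, the response variable is binary. Thus, we can partition the sample of data points $\{x_1, \ldots, x_n\}$ into two sets $P$ and $Q$ such that
$$P = \{x_i \mid y_i = 0\} \quad \text{and} \quad Q = \{x_i \mid y_i = 1\}.$$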
6.1.1 Tropical Support Vector Machines (SVMs)
A support vector machine (SVM) is a supervised learning model to predict a categorical response variable. For a binary response variable, a classical linear SVM classifies data points by finding a linear hyperplane that separates the data points into two groups. In this paper we refer to a classical linear SVM over a Euclidean space $\mathbb{R}^d$ with the $L_2$ norm as a classical SVM. Over $\mathbb{R}^d$, there are two types of SVMs: hard margin SVMs and soft margin SVMs. A hard margin SVM is a model with the assumption that all data points can be separated by a linear hyperplane into two groups without errors. A soft margin SVM is a model which maximizes the margin and also allows some data points to lie on the wrong side of the hyperplane.
Similar to a classical SVM over a Euclidean space, a tropical SVM is a supervised learning model which classifies data points by finding a tropical hyperplane to separate them. In [19], as for classical SVMs, Tang et al. defined two types of tropical SVMs: hard margin tropical SVMs and soft margin tropical SVMs. A hard margin tropical SVM, introduced in [3], is, similar to a classical hard margin SVM, a model that finds a tropical hyperplane which maximizes the margin, i.e., the minimum tropical distance from the data points to the tropical hyperplane (shown in Figure 5), and separates these data points into open sectors. Note that an open sector of a tropical hyperplane can be seen as a tropical version of an open half space defined by a hyperplane. A soft margin tropical SVM, introduced in [19], is a model that finds a tropical hyperplane which maximizes the margin but also allows some data points to lie in a wrong open sector.
The authors in [3] showed that computing a tropical hyperplane for a hard margin tropical SVM from a given sample in the tropical projective space can be formulated as a linear programming problem. Again, note that, similar to classical hard margin SVMs, hard margin tropical SVMs assume that there exists a tropical hyperplane which separates all data points in the tropical projective space so that each class lies in its own open sector (see the left panel of Figure 5).
Figure 5: Tropical SVMs: hard margin (left panel) and soft margin (right panel).
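In order to discuss tropical SVMs in detail, we need to define a tropical hyperplane and its open sectors.
Definition 30.
Suppose $\omega := (\omega_1, \ldots, \omega_e) \in \mathbb{R}^e / \mathbb{R}\mathbf{1}$. The tropical hyperplane defined by $\omega$, denoted by $H_\omega$, is the set of all points $x \in \mathbb{R}^e / \mathbb{R}\mathbf{1}$ such that
$$\max\{\omega_1 + x_1, \ldots, \omega_e + x_e\}$$
is attained at least twice. $\omega$ is called the normal vector of $H_\omega$.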
Definition 31.
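A tropical hyperplane $H_\omega$ divides the tropical projective space $\mathbb{R}^e / \mathbb{R}\mathbf{1}$ into $e$ components. These components are called the open sectors of $H_\omega$ and are given by
$$S^i_\omega := \{x \in \mathbb{R}^e / \mathbb{R}\mathbf{1} \mid \omega_i + x_i > \omega_j + x_j \ \text{for all } j \neq i\}, \quad i = 1, \ldots, e.$$
Example 32.
Consider $\mathbb{R}^3 / \mathbb{R}\mathbf{1}$. Then a tropical hyperplane in $\mathbb{R}^3 / \mathbb{R}\mathbf{1}$ has three open sectors, as shown in Figure 5. Note that $\mathbb{R}^3 / \mathbb{R}\mathbf{1}$ is isometric to $\mathbb{R}^2$.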
Now we define the tropical distance from a point to a tropical hyperplane.
Definition 33.
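The tropical distance from a point $x \in \mathbb{R}^e / \mathbb{R}\mathbf{1}$ to the tropical hyperplane $H_\omega$ is defined as:
$$d_{\mathrm{tr}}(x, H_\omega) := \min\{d_{\mathrm{tr}}(x, y) \mid y \in H_\omega\}.$$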
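A hard margin tropical SVM assumes that all points are separated by a tropical hyperplane and that all data points with the same category of the response variable lie in the same open sector. Thus, to compute a hard margin tropical hyperplane for a tropical SVM, we want to find the normal vector $\omega$ of a tropical hyperplane $H_\omega$ attaining
$$\max_{\omega \in \mathbb{R}^e} \ \min_{x} \big\{ (x + \omega)_{i(x)} - (x + \omega)_{j(x)} \big\},$$
where the minimum is taken over all data points $x$ in the sample, and $(x + \omega)_{i(x)}$ and $(x + \omega)_{j(x)}$ are the largest and the second largest coordinates of the vector $x + \omega$ for all data points $x$.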
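The quantity inside the max-min problem above is easy to compute for a fixed normal vector. The sketch below returns, for a point $x$, the index of the open sector containing it and the gap between the largest and second largest coordinates of $x + \omega$, which is the margin used in the text. The function name, the 0-based sector indexing, and the example data are illustrative assumptions; the sketch does not optimize over $\omega$.

```python
def sector_and_margin(x, omega):
    """Return (sector, margin) for the point x and the normal vector omega.

    margin is the largest minus the second largest coordinate of x + omega.
    sector is the 0-based index of the largest coordinate, or None when the
    maximum is attained at least twice, i.e. x lies on the hyperplane."""
    shifted = [oi + xi for oi, xi in zip(omega, x)]
    order = sorted(range(len(shifted)), key=lambda i: shifted[i], reverse=True)
    i, j = order[0], order[1]
    margin = shifted[i] - shifted[j]
    sector = i if margin > 0 else None
    return sector, margin


omega = [0, 0, 0]  # hypothetical normal vector

print(sector_and_margin([0, 3, 1], omega))  # (1, 2)
print(sector_and_margin([4, 0, 1], omega))  # (0, 3)
print(sector_and_margin([2, 2, 0], omega))  # (None, 0), on the hyperplane

# Margin of a small labelled sample: the minimum gap over all points.
sample = [[0, 3, 1], [0, 5, 1], [4, 0, 1]]
print(min(sector_and_margin(x, omega)[1] for x in sample))  # 2
```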
Theorem 34 ([19]).
The normal vector of the tropical hard margin for a tropical SVM is the optimal solution of the following linear programming problem:
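Here, for each data point $x$, the indices $i(x)$ and $j(x)$ of the largest and the second largest coordinates of $x + \omega$ are fixed in advance.
$$\max_{(z;\, \omega)} \ z \tag{8}$$
$$\text{subject to} \quad z + x_{j(x)} + \omega_{j(x)} - x_{i(x)} - \omega_{i(x)} \leq 0 \quad \text{for all data points } x, \tag{9}$$
$$\omega_{j(x)} - \omega_{i(x)} \leq x_{i(x)} - x_{j(x)} \quad \text{for all data points } x, \tag{10}$$
$$\omega_{l} - \omega_{j(x)} \leq x_{j(x)} - x_{l} \quad \text{for all data points } x \text{ and all } l \neq i(x), j(x). \tag{11}$$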
As discussed earlier, soft margin tropical SVMs are similar to hard margin tropical SVMs. They try to find a tropical hyperplane which maximizes the margin, but they also allow some points to lie in a wrong open sector by introducing extra variables (see Figure 5). Tang et al. showed in [19] that a soft margin tropical hyperplane for a tropical SVM is the optimal solution of the following linear programming problem:
(12)–(16)
There are still many open questions about tropical SVMs. In general, if we use the methods developed in [19] to find a hard margin or soft margin tropical hyperplane, then we have to go through exponentially many linear programming problems. However, we do not know the exact time complexity of finding a hard margin or soft margin tropical hyperplane for a tropical SVM.
Problem 35.
What is the time complexity of computing a hard or a soft margin tropical hyperplane for a tropical SVM over the tropical projective torus? Is it NP-hard?
In addition, the authors in [19] focused on tropical hyperplanes for tropical SVMs over the tropical projective torus $\mathbb{R}^e / \mathbb{R}\mathbf{1}$, not over the space of ultrametrics $\mathcal{U}_N$. Again note that the space of ultrametrics $\mathcal{U}_N$ is a union of $(N-1)$-dimensional cones over $\mathbb{R}^e$. Thus we are interested in how $\mathcal{U}_N$ and a tropical SVM over $\mathbb{R}^e / \mathbb{R}\mathbf{1}$ relate to each other. More specifically:
Problem 36.
Can we describe geometrically how a hard or soft margin tropical hyperplane for a tropical SVM over the tropical projective torus $\mathbb{R}^e / \mathbb{R}\mathbf{1}$ separates points in the space of ultrametrics $\mathcal{U}_N$?
We are also interested in defining a tropical SVM over the space of ultrametrics $\mathcal{U}_N$ and developing algorithms to compute it.
Problem 37.
Define hard and soft margin tropical “hyperplanes” for tropical SVMs over the space of ultrametrics $\mathcal{U}_N$. To define them, can we use a tropical polytope instead of a tropical hyperplane? How can we compute them? Can we formulate the computation as an optimization problem?
For more details, see the following paper:
- Tang, Wang, and Yoshida. Tropical Support Vector Machine and its Applications to Phylogenomics [19].
6.1.2 Tropical Linear Discriminant Analysis (LDA)
In this section we discuss tropical linear discriminant analysis (LDA). LDA is one of the classical statistical methods to classify a data set into two or more classes while at the same time reducing the dimensionality.
LDA is related to PCA in a Euclidean space, and these relations are shown in Figure 6. The difference between PCA and LDA is how the direction of the linear plane is found.

For two classes of samples , the linear space for the classical LDA can be found as the optimal solution of an optimization problem such that
(17) |
Here we use the max-plus algebra in the tropical setting, and we consider the tropical projective space $\mathbb{R}^e / \mathbb{R}\mathbf{1}$ for now. Let $d_{\mathrm{tr}}$ be the tropical distance between two points in the tropical projective space $\mathbb{R}^e / \mathbb{R}\mathbf{1}$. Then we can formulate the tropical linear space for tropical LDA in Equation (17) as
(18) |
Problem 38.
Can we define a tropical LDA over the tropical projective space? If so how can we find a tropical linear space (or tropical polytope) for a tropical LDA?
Problem 39.
Can we define a tropical LDA over the space of ultrametrics ?
6.2 Tropical Regression
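For classical multiple linear regression, with the observed data set
$$\{(x_1, y_1), \ldots, (x_n, y_n)\},$$
where $x_i = (x_{i1}, \ldots, x_{id}) \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$, we try to find a vector $\beta = (\beta_0, \beta_1, \ldots, \beta_d)$ such that
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_d X_d + \epsilon,$$
where $\epsilon \sim N(0, \sigma)$ with $N(0, \sigma)$ the Gaussian distribution with mean $0$ and standard deviation $\sigma$, $Y$ is the response variable, and $X_1, \ldots, X_d$ are the explanatory variables, with the smallest value of
$$\sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{d} \beta_j x_{ij} \Big)^2. \tag{19}$$
The value in Equation (19) is called the sum of squared residuals. Thus, for classical multiple linear regression over a Euclidean space, we try to find the linear hyperplane with the smallest sum of squared residuals.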
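For a single explanatory variable, the least squares fit has a closed form, which makes the sum of squared residuals easy to illustrate. The sketch below covers only this special case; the function names and the toy data are illustrative assumptions.

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared residuals of the line on the data."""
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


# Toy data lying exactly on the line y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope)                                # 1.0 2.0
print(sum_squared_residuals(xs, ys, intercept, slope)) # 0.0

# Any other line has a larger sum of squared residuals on this data.
print(sum_squared_residuals(xs, ys, intercept + 0.5, slope))  # 1.0
```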
For tropical regression over the tropical projective space, one can define a tropical regression ”polytope” as the tropical polytope with
Nothing has been done yet on tropical regression. Thus, it would be interesting to see how one can define it in the tropical projective space as well as in the space of ultrametrics.
References
- [1] M. Akian, S. Gaubert, N. Viorel, and I. Singer. Best approximation in max-plus semimodules. Linear Algebra Appl., 435:3261–3296, 2011.
- [2] F. Ardila and C. J. Klivans. The Bergman complex of a matroid and phylogenetic trees. Journal of Combinatorial Theory, Series B, 96(1):38–49, 2006.
- [3] B. Gärtner and M. Jaggi. Tropical support vector machines, 2006.
- [4] L.J. Billera, S.P. Holmes, and K. Vogtmann. Geometry of the space of phylogenetic trees. Adv Appl Math, 27(4):733–767, 2001.
- [5] Louis J. Billera, Susan P. Holmes, and Karen Vogtmann. Geometry of the Space of Phylogenetic Trees. Advances in Applied Mathematics, 27(4):733–767, 2001.
- [6] G. Cohen, S. Gaubert, and J.P. Quadrat. Duality and separation theorems in idempotent semimodules. Linear Algebra Appl., 379:395–422, 2004.
- [7] A. Gavryushkin and A.J. Drummond. The space of ultrametric phylogenetic trees. Journal of Theoretical Biology, 403:197–208, 2016.
- [8] C.J. Jardine, N. Jardine, and R. Sibson. The Structure and Construction of Taxonomic Hierarchies. Mathematical Biosciences, 1(2):173–179, 1967.
- [9] M. Joswig. Essentials of tropical combinatorics, 2017.
- [10] B. Lin, B. Sturmfels, X. Tang, and R. Yoshida. Convexity in tree spaces. SIAM J. Discrete Math., 31(3):2015–2038, 2017.
- [11] Bo Lin, Bernd Sturmfels, Xiaoxian Tang, and Ruriko Yoshida. Convexity in Tree Spaces. SIAM Journal on Discrete Mathematics, 31(3):2015–2038, 2017.
- [12] Bo Lin and Ruriko Yoshida. Tropical Fermat–Weber Points. SIAM Journal on Discrete Mathematics, 2018. To appear. Available at arXiv:1604.04674.
- [13] D. Maclagan and B. Sturmfels. Introduction to Tropical Geometry, volume 161 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2015.
- [14] T. M. W. Nye. Principal components analysis in the space of phylogenetic trees. Ann. Stat., 39(5):2716–2739, 2011.
- [15] Tom M. W. Nye. Principal Components Analysis in the Space of Phylogenetic Trees. The Annals of Statistics, 39(5):2716–2739, 2011.
- [16] R. Page, R. Yoshida, and L. Zhang. Tropical principal component analysis on the space of ultrametrics, 2019.
- [17] C. Semple and M. Steel. Phylogenetics, volume 24 of Oxford Lecture Series in Mathematics and its Applications. Oxford University Press, 2003.
- [18] D. Speyer and B. Sturmfels. Tropical mathematics. Mathematics Magazine, 82:163–173, 2009.
- [19] X. Tang, H. Wang, and R. Yoshida. Tropical support vector machines and its applications to phylogenomics, 2020.
- [20] R. Yoshida, L. Zhang, and X. Zhang. Tropical principal component analysis and its application to phylogenetics. Bulletin of Mathematical Biology, 81:568–597, 2019.