Phylo2Vec: a vector representation for binary trees

04/25/2023
by Matthew J. Penn, et al.

Binary phylogenetic trees inferred from biological data are central to understanding the shared evolutionary history of organisms. Inferring the placement of latent nodes in a tree under any optimality criterion (e.g., maximum likelihood) is an NP-hard problem, which has propelled the development of myriad heuristic approaches. Yet these heuristics often lack a systematic means of sampling random trees uniformly or of effectively exploring a tree space that grows factorially, both of which are crucial to optimisation problems such as those arising in machine learning. Accordingly, we present Phylo2Vec, a new parsimonious representation of a phylogenetic tree. Phylo2Vec maps any binary tree with n leaves to an integer vector of length n. We prove that Phylo2Vec is both well-defined and bijective to the space of phylogenetic trees. The advantages of Phylo2Vec are twofold: i) easy uniform sampling of binary trees and ii) a systematic ability to traverse tree space in very large or very small jumps. As a proof of concept, we use Phylo2Vec for maximum likelihood inference on five real-world datasets and show that a simple hill-climbing-based optimisation efficiently traverses the vastness of tree space from a random to an optimal tree.
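
To give a sense of the factorial growth of tree space mentioned above, the following minimal sketch counts rooted, leaf-labelled binary trees using the standard double-factorial formula (2n - 3)!!. It is an illustration only and not code from the paper: the function name num_rooted_binary_trees is hypothetical, and the choice to count rooted, leaf-labelled trees is an assumption, since the abstract does not state which convention is used.

```python
def num_rooted_binary_trees(n_leaves: int) -> int:
    """Count rooted binary trees on n labelled leaves.

    Uses the standard combinatorial result (2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3).
    Assumption: trees are rooted and leaves are distinctly labelled.
    """
    if n_leaves < 1:
        raise ValueError("n_leaves must be at least 1")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for odd in range(3, 2 * n_leaves - 2, 2):
        count *= odd
    return count


if __name__ == "__main__":
    # Show how quickly the number of candidate trees grows with the leaf count.
    for n in (3, 4, 5, 10, 20, 50):
        print(f"{n:>3} leaves: {num_rooted_binary_trees(n)} trees")
```

With only 10 leaves there are already 34,459,425 such trees, which is why exhaustive search is impractical and why uniform sampling and controlled jumps through tree space matter for optimisation.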
