Modelling phylogeny in 16S rRNA gene sequencing datasets using string kernels

10/14/2022
by Jonathan Ish-Horowicz, et al.

Motivation: Bacterial community composition is commonly quantified using 16S rRNA (ribosomal ribonucleic acid) gene sequencing. A defining characteristic of these datasets is the phylogenetic relationships that exist between the variables. Here, we demonstrate the utility of modelling these phylogenetic relationships in two tasks (two-sample testing and host trait prediction) using a novel application of string kernels.

Results: We show via simulation studies that a kernel two-sample test using string kernels is sensitive to the phylogenetic scale of the difference between the two populations, and that it is more powerful than tests using kernels based on popular microbial distance metrics. Using simulations and two real host trait prediction tasks, we also demonstrate how Gaussian process modelling can be used to infer the distribution of bacterial-host effects across the phylogenetic tree.
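As a rough illustration of the kind of kernel two-sample test the abstract describes, the sketch below combines a k-mer spectrum kernel over sequences with a permutation test on the maximum mean discrepancy (MMD). It is not the authors' implementation. The abstract does not say which string kernel is used, so the spectrum kernel, the linear form Z S Zᵀ for comparing abundance vectors, and all names and toy data (`spectrum_kernel`, `mmd2`, `permutation_test`, the sequences and Dirichlet-sampled abundances) are assumptions made for illustration only.

```python
from collections import Counter

import numpy as np


def spectrum_features(seq, k):
    """Count every length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(seqs, k=3):
    """Gram matrix of inner products between k-mer count vectors."""
    feats = [spectrum_features(s, k) for s in seqs]
    n = len(seqs)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            # Counter returns 0 for k-mers absent from the other sequence.
            S[i, j] = S[j, i] = sum(c * feats[j][m] for m, c in feats[i].items())
    return S


def mmd2(K, n_x):
    """Biased squared-MMD estimate; the first n_x rows of K are sample X."""
    Kxx, Kyy, Kxy = K[:n_x, :n_x], K[n_x:, n_x:], K[:n_x, n_x:]
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()


def permutation_test(K, n_x, n_perm=500, seed=0):
    """Permutation p-value for the MMD statistic under shuffled group labels."""
    rng = np.random.default_rng(seed)
    observed = mmd2(K, n_x)
    n = K.shape[0]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        if mmd2(K[np.ix_(p, p)], n_x) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data (hypothetical): four short "sequences" forming two similar pairs.
seqs = ["ACGTACGT", "ACGTACGA", "TTGACCAA", "TTGACCAT"]
S = spectrum_kernel(seqs, k=3)

# Hypothetical relative abundances: group X favours the first pair of
# sequences, group Y favours the second pair.
rng = np.random.default_rng(1)
X = rng.dirichlet([5, 5, 1, 1], size=20)
Y = rng.dirichlet([1, 1, 5, 5], size=20)
Z = np.vstack([X, Y])

# Assumed sample-level kernel: abundance vectors compared through S.
K = Z @ S @ Z.T
stat, p_value = permutation_test(K, n_x=20)
print(f"MMD^2 = {stat:.4f}, permutation p-value = {p_value:.4f}")
```

In this sketch the sequence-level kernel S determines which abundance differences count as large: shifting mass between sequences that share many k-mers changes K less than shifting mass between dissimilar ones. That gives one simple way for sequence similarity to inform a two-sample test.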

research
06/07/2015

String Gaussian Process Kernels

We introduce a new class of nonstationary kernels, which we derive as co...
research
02/18/2018

Scalable Alignment Kernels via Space-Efficient Feature Maps

String kernels are attractive data analysis tools for analyzing string d...
research
04/22/2022

Gene Function Prediction with Gene Interaction Networks: A Context Graph Kernel Approach

Predicting gene functions is a challenge for biologists in the post geno...
research
09/22/2015

Graph Kernels exploiting Weisfeiler-Lehman Graph Isomorphism Test Extensions

In this paper we present a novel graph kernel framework inspired by the ...
research
08/25/2020

A Kernel Two-Sample Test for Functional Data

We propose a nonparametric two-sample test procedure based on Maximum Me...
research
10/02/2020

BOSS: Bayesian Optimization over String Spaces

This article develops a Bayesian optimization (BO) method which acts dir...
research
05/28/2021

Detecting the hosts of bacteriophages using GCN-based semi-supervised learning

Motivation: Bacteriophages (aka phages) are viruses that infect bacteria...