Deep Learning for Reference-Free Geolocation for Poplar Trees

01/31/2023
by Cai W. John, et al.

A core task in precision agriculture is identifying the climatic and ecological conditions that favor a given crop. The most succinct approach is geolocation: locating the native region of a sample from its genetic makeup. Here, we investigate genomic geolocation of Populus trichocarpa, or poplar, which the US Department of Energy has identified as a fast-rotation biofuel crop to be harvested nationwide. In particular, we approach geolocation from a reference-free perspective, which avoids compute-intensive steps such as alignment and variant calling. Our model, MashNet, predicts latitude and longitude for poplar trees from randomly sampled, unaligned sequence fragments. We show that our model performs comparably to Locator, a state-of-the-art method based on aligned whole-genome sequence data: MashNet achieves an error of 34.0 km² compared to Locator's 22.1 km². MashNet allows growers to quickly and efficiently identify, from genotype alone, the natural varieties that will be most productive in their growth environment. This paper explores geolocation for precision agriculture while providing a framework and data source for further development by the machine learning community.
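The abstract does not describe MashNet's internals, so the following is only a minimal sketch of what "reference-free" input can look like, under loudly stated assumptions. It assumes, based solely on the model's name, a Mash-style MinHash featurization: every k-mer in the unaligned fragments is put in canonical form (the lexicographically smaller of the k-mer and its reverse complement, since reads come from either strand), hashed, and only the s smallest hashes are kept, so two samples can be compared without ever aligning to a reference. The defaults k=21 and s=1000 mirror the Mash tool's defaults; BLAKE2b stands in for Mash's MurmurHash3. The great-circle (haversine) distance is included as one common way to score a latitude/longitude prediction; the paper's own error metric (reported in km²) is not specified here. All function names (`bottom_s_sketch`, `jaccard_estimate`, `sample_fragments`, `haversine_km`) are hypothetical.

```python
import hashlib
import math
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement (strand-agnostic)."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq, k):
    """Yield every length-k substring of an unaligned DNA fragment."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def _hash64(kmer):
    # Stable 64-bit hash; a stand-in for the MurmurHash3 used by the Mash tool.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def bottom_s_sketch(fragments, k=21, s=1000):
    """Bottom-s MinHash sketch: the s smallest distinct canonical k-mer hashes."""
    hashes = {_hash64(canonical(km)) for frag in fragments for km in kmers(frag, k)}
    return sorted(hashes)[:s]


def jaccard_estimate(a, b, s=1000):
    """Estimate Jaccard similarity of two k-mer sets from their bottom-s sketches."""
    union_sketch = sorted(set(a) | set(b))[:s]
    if not union_sketch:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in union_sketch if h in shared) / len(union_sketch)


def sample_fragments(genome, n, length, rng):
    """Draw n random, unaligned fragments of a fixed length from one genome."""
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(n)]
    return [genome[i:i + length] for i in starts]


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


if __name__ == "__main__":
    rng = random.Random(0)
    # Two synthetic "individuals" with unrelated random genomes.
    genome_a = "".join(rng.choice("ACGT") for _ in range(20000))
    genome_b = "".join(rng.choice("ACGT") for _ in range(20000))
    a1 = bottom_s_sketch(sample_fragments(genome_a, 200, 150, rng))
    a2 = bottom_s_sketch(sample_fragments(genome_a, 200, 150, rng))
    b1 = bottom_s_sketch(sample_fragments(genome_b, 200, 150, rng))
    print("same genome:", jaccard_estimate(a1, a2))
    print("different genomes:", jaccard_estimate(a1, b1))
    print("1 degree of longitude at the equator (km):", haversine_km(0, 0, 0, 1))
```

Two independent random samples of fragments from the same genome yield sketches with high estimated similarity, while samples from unrelated genomes share almost no hashes. A learned model would then map such fixed-size, alignment-free features to coordinates; how MashNet actually does this is not stated in the abstract.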
