OMXWare, A Cloud-Based Platform for Studying Microbial Life at Scale

11/05/2019
by Edward E. Seabolt, et al.

The rapid growth in biological sequence data is revolutionizing our understanding of genotypic diversity and challenging conventional approaches to informatics. As more genomic data becomes available, traditional bioinformatic tools require substantial computational time and the creation of ever larger indices each time a researcher seeks insight from the data. To address these challenges, we pre-compute important relationships between biological entities and capture this information in a relational database. The database can be queried across millions of entities and returns results in a fraction of the time required by traditional methods. In this paper, we describe OMXWare, a comprehensive database relating genotype to phenotype for bacterial life. Continually updated, OMXWare today contains data derived from 200,000 curated, self-consistently assembled genomes. The database stores functional data for over 68 million genes, 52 million proteins, and 239 million domains, with associated biological activity annotations from Gene Ontology, KEGG, MetaCyc, and Reactome. OMXWare maps the connections between biological entities, including the originating genome, gene, protein, and protein domain. Microbial studies ranging from infectious disease to environmental health can benefit from the rich data and relationships within OMXWare. We describe the data selection, the pipeline used to create and update OMXWare, and the developer tools (a Python SDK and REST APIs) that allow researchers to efficiently study microbial life at scale.
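The idea of pre-computing relationships between genomes, genes, proteins, and domains and then querying them relationally can be illustrated with a toy example. The sketch below is not the OMXWare schema or SDK: the table names, column names, and the helper functions `build_demo_db` and `genomes_with_domain` are all hypothetical, and the rows are made-up placeholders. It only shows how link tables let one join answer "which genomes carry a protein with this domain?" without re-scanning sequence data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical entity/link schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE genome  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE gene    (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE domain  (id INTEGER PRIMARY KEY, name TEXT);
        -- Pre-computed relationships, stored once and reused by every query.
        CREATE TABLE genome_gene    (genome_id INTEGER, gene_id INTEGER);
        CREATE TABLE gene_protein   (gene_id INTEGER, protein_id INTEGER);
        CREATE TABLE protein_domain (protein_id INTEGER, domain_id INTEGER);
        """
    )
    # Placeholder rows; none of these names come from the paper.
    conn.executemany("INSERT INTO genome VALUES (?, ?)",
                     [(1, "genome_A"), (2, "genome_B")])
    conn.executemany("INSERT INTO gene VALUES (?, ?)",
                     [(1, "gene_1"), (2, "gene_2")])
    conn.executemany("INSERT INTO protein VALUES (?, ?)",
                     [(1, "protein_1"), (2, "protein_2")])
    conn.executemany("INSERT INTO domain VALUES (?, ?)",
                     [(1, "kinase_like"), (2, "transporter_like")])
    conn.executemany("INSERT INTO genome_gene VALUES (?, ?)",
                     [(1, 1), (2, 1), (2, 2)])
    conn.executemany("INSERT INTO gene_protein VALUES (?, ?)",
                     [(1, 1), (2, 2)])
    conn.executemany("INSERT INTO protein_domain VALUES (?, ?)",
                     [(1, 1), (2, 2)])
    conn.commit()
    return conn


def genomes_with_domain(conn, domain_name):
    """Return the sorted names of genomes linked to the given domain."""
    rows = conn.execute(
        """
        SELECT DISTINCT genome.name
        FROM domain
        JOIN protein_domain ON protein_domain.domain_id = domain.id
        JOIN gene_protein   ON gene_protein.protein_id = protein_domain.protein_id
        JOIN genome_gene    ON genome_gene.gene_id = gene_protein.gene_id
        JOIN genome         ON genome.id = genome_gene.genome_id
        WHERE domain.name = ?
        ORDER BY genome.name
        """,
        (domain_name,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    demo = build_demo_db()
    print(genomes_with_domain(demo, "kinase_like"))
```

In a real deployment the link tables would be indexed on their foreign keys, which is what would keep such joins fast across millions of entities; the toy data here is too small for that to matter.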


