Comprehensive and Sensitive Proteogenomics Data Analysis Strategy based on Complementary Multi-Stage Database Search

11/24/2020
by Inamul Hasan Madar, et al.

Proteogenomics provides opportunities for proteomic validation of gene structures, genomic alterations, and the functional relevance of novel findings obtained from genomic data analysis. However, effective proteogenomic data integration requires extensive proteome profiling that approaches the gene coverage of genomics data. Here we developed a multi-stage database search method for comprehensive proteomics data analysis to complement whole transcriptome sequencing data. The method uses two complementary database search engines, MS-GF+ and MODa/MODi, in tandem. The MS/MS data were first subjected to an MS-GF+ database search (1st stage search), and the MS/MS data left unidentified by the 1st stage search were then analyzed with the combined use of MODa and MODi (2nd stage search), tools for blind and unrestrictive modification search, respectively. When combined with mPE-MMR, a tool for accurate and extensive assignment of precursor masses to MS/MS data, the multi-stage method yielded a significant increase in the numbers of identified peptides, modified peptides, mutated peptides, identified proteins, and coding genes compared to a conventional single-stage method. With the increased coverage of the proteome profile, the genomics and proteomics data obtained from the same gastric tumor tissue were effectively integrated, as evidenced by proBAMsuite analysis results, which showed abundant examples of peptides uniquely mapped to genomic locations as well as increased coverage of exon-exon junctions and coding regions with the multi-stage search method.
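The control flow of the multi-stage search can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `multi_stage_search`, the type aliases, and the two search callables are hypothetical stand-ins, and the sketch does not invoke MS-GF+, MODa, MODi, or mPE-MMR. It assumes each stage returns only identifications that have already passed that stage's own filtering, and it shows just the hand-off described above, in which spectra left unidentified by the 1st stage are passed to the 2nd stage.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical types: a spectrum is referred to by an ID string, and a
# search stage maps a list of spectrum IDs to {spectrum ID: peptide}.
Spectrum = str
SearchFn = Callable[[List[Spectrum]], Dict[Spectrum, str]]


def multi_stage_search(
    spectra: Iterable[Spectrum],
    first_stage: SearchFn,
    second_stage: SearchFn,
) -> Tuple[Dict[Spectrum, Tuple[str, int]], List[Spectrum]]:
    """Run two search stages in tandem.

    Returns (identifications, still_unidentified), where identifications
    maps each spectrum ID to (peptide, stage number that identified it).
    """
    spectra = list(spectra)

    # 1st stage: search every spectrum (stand-in for the MS-GF+ search).
    stage1 = first_stage(spectra)

    # Only spectra the 1st stage could not identify go to the 2nd stage.
    leftover = [s for s in spectra if s not in stage1]

    # 2nd stage: stand-in for the combined MODa/MODi modification search.
    stage2 = second_stage(leftover) if leftover else {}

    combined: Dict[Spectrum, Tuple[str, int]] = {
        s: (peptide, 1) for s, peptide in stage1.items()
    }
    for s, peptide in stage2.items():
        # A 1st stage identification is never overwritten by the 2nd stage.
        combined.setdefault(s, (peptide, 2))

    still_unidentified = [s for s in leftover if s not in stage2]
    return combined, still_unidentified
```

In practice each callable would wrap an external search run and parse its filtered results; keeping the stage number alongside each peptide makes it possible to report how many identifications each stage contributed.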


