PDBMine: A Reformulation of the Protein Data Bank to Facilitate Structural Data Mining

11/19/2019
by Casey A Cole, et al.

Large-scale initiatives such as the Human Genome Project and Structural Genomics, together with individual research teams, have produced large deposits of genomic and proteomic data. Turning these data into knowledge remains a challenge, largely because the data are captured in databases that are designed for archiving rather than mining. In this research, we have targeted the Protein Data Bank (PDB) and demonstrated a transformation of its content, named PDBMine, that reduces storage space by an order of magnitude and allows for powerful mining related to protein structure determination. We have demonstrated the utility of PDBMine by exploring the prevalence of dimeric and trimeric amino acid sequences and by providing a mechanism for predicting protein structure.
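To make the idea of dimeric and trimeric sequence prevalence concrete, the following is a minimal sketch of counting overlapping amino acid k-mers across a set of sequences. It is not PDBMine's schema or query interface, which this abstract does not describe; the function name `count_kmers` and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping window of length k across all sequences.

    Hypothetical helper for illustration only; not part of PDBMine.
    """
    counts = Counter()
    for seq in sequences:
        # Slide a window of width k along the sequence, one residue at a time.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Toy one-letter amino acid sequences (made up, not taken from the PDB).
toy_sequences = ["MKVLAAGI", "AAGIVK"]

dimers = count_kmers(toy_sequences, 2)   # two-residue windows
trimers = count_kmers(toy_sequences, 3)  # three-residue windows

print(dimers.most_common(3))
print(trimers.most_common(3))
```

A real analysis would draw the sequences from PDB entries and would likely normalize the counts, for example by the total number of windows, before comparing prevalence.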


