ASAP-SML: An Antibody Sequence Analysis Pipeline Using Statistical Testing and Machine Learning

03/08/2020
by   Xinmeng Li, et al.

Antibodies can bind individual antigens potently and specifically and, in some cases, disrupt their functions. The key challenge in generating antibody-based inhibitors is the lack of fundamental information relating antibody sequences to their unique properties as inhibitors. We develop a pipeline, Antibody Sequence Analysis Pipeline using Statistical testing and Machine Learning (ASAP-SML), to identify features that distinguish one set of antibody sequences from the antibody sequences in a reference set. The pipeline extracts feature fingerprints from sequences. The fingerprints represent germline, CDR canonical structure, isoelectric point, and frequent positional motifs. Machine learning and statistical significance testing techniques are applied to the antibody sequences and the extracted feature fingerprints to identify distinguishing feature values and combinations thereof. To demonstrate how it works, we applied the pipeline to sets of antibody sequences known to bind or inhibit the activities of matrix metalloproteinases (MMPs), a family of zinc-dependent enzymes that promote cancer progression and undesired inflammation under pathological conditions, and compared them against reference datasets of sequences that do not bind or inhibit MMPs. ASAP-SML identifies features and combinations of feature values found in the MMP-targeting sets that are distinct from those in the reference sets.
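The idea of comparing feature fingerprints between a target set and a reference set can be illustrated with a minimal sketch. It is not the ASAP-SML implementation: the abstract does not specify which statistical test is used, so a one-sided Fisher's exact test is assumed here, and the function names (`motif_fingerprint`, `fisher_one_sided`, `enriched_features`), the motif list, and the significance threshold are all hypothetical. Motifs are matched anywhere in the sequence for simplicity, whereas the pipeline describes positional motifs.

```python
from math import comb


def motif_fingerprint(seqs, motifs):
    """Return one binary row per sequence: 1 if the motif occurs in it, else 0.

    Simplification (assumption): motifs are matched anywhere in the sequence,
    not at a fixed position.
    """
    return [[1 if m in s else 0 for m in motifs] for s in seqs]


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: target / reference. Columns: has feature / lacks feature.
    Returns P(X >= a) under the hypergeometric null (enrichment in target).
    """
    n_target, n_ref = a + b, c + d
    n_feature = a + c
    denom = comb(n_target + n_ref, n_feature)
    upper = min(n_target, n_feature)
    tail = sum(
        comb(n_target, k) * comb(n_ref, n_feature - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


def enriched_features(target, reference, motifs, alpha=0.05):
    """List (motif, p_value) pairs enriched in the target set.

    No multiple-testing correction is applied in this sketch.
    """
    t_fp = motif_fingerprint(target, motifs)
    r_fp = motif_fingerprint(reference, motifs)
    hits = []
    for j, motif in enumerate(motifs):
        a = sum(row[j] for row in t_fp)  # target sequences with the motif
        c = sum(row[j] for row in r_fp)  # reference sequences with the motif
        p = fisher_one_sided(a, len(t_fp) - a, c, len(r_fp) - c)
        if p < alpha:
            hits.append((motif, p))
    return hits
```

A real analysis would test many features at once, so a multiple-testing correction would be needed before any feature is reported as distinguishing.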


