
ASAP-SML: An Antibody Sequence Analysis Pipeline Using Statistical Testing and Machine Learning

by Xinmeng Li, et al.

Antibodies can bind individual antigens potently and specifically and, in some cases, disrupt their functions. The key challenge in generating antibody-based inhibitors is the lack of fundamental information relating antibody sequences to their properties as inhibitors. We develop a pipeline, Antibody Sequence Analysis Pipeline using Statistical testing and Machine Learning (ASAP-SML), to identify features that distinguish one set of antibody sequences from the antibody sequences in a reference set. The pipeline extracts feature fingerprints from sequences; the fingerprints represent germline, CDR canonical structure, isoelectric point, and frequent positional motifs. Machine learning and statistical significance testing techniques are then applied to the antibody sequences and the extracted feature fingerprints to identify distinguishing feature values and combinations thereof. To demonstrate the pipeline, we applied it to sets of antibody sequences known to bind or inhibit the activities of matrix metalloproteinases (MMPs), a family of zinc-dependent enzymes that promote cancer progression and undesired inflammation under pathological conditions, and compared them against reference datasets of sequences that do not bind or inhibit MMPs. ASAP-SML identifies features and combinations of feature values found in the MMP-targeting sets that are distinct from those in the reference sets.
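The idea of comparing feature fingerprints between a target set and a reference set can be illustrated with a minimal Python sketch. It is not the ASAP-SML implementation: the sequences are made-up toy strings, the only feature type shown is a positional motif, and a one-sided Fisher's exact test is assumed as the significance test because the abstract does not name one. The function names `fingerprint`, `fisher_one_sided`, and `enriched_features` are hypothetical.

```python
from math import comb

# Toy data (hypothetical, not from the paper): short CDR-like strings.
TARGET = ["GADYW", "GADYF", "GADYY", "AADYW", "AADYF", "AADYY"]
REFERENCE = ["GASSW", "GATTF", "GAKKY", "AASSW", "AATTF", "AAKKY"]

# Each feature is (start position, motif); chosen here purely for illustration.
MOTIFS = [(2, "DY"), (0, "G")]


def fingerprint(seq, motifs):
    """Return one 0/1 bit per feature: 1 if the motif occurs at that position."""
    return tuple(int(seq[pos:pos + len(m)] == m) for pos, m in motifs)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are target/reference, columns are feature present/absent.
    Returns P(X >= a) under the hypergeometric null (enrichment in target).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


def enriched_features(target, reference, motifs, alpha=0.05):
    """List (feature, p-value) pairs that are enriched in the target set."""
    t_fp = [fingerprint(s, motifs) for s in target]
    r_fp = [fingerprint(s, motifs) for s in reference]
    hits = []
    for i, feat in enumerate(motifs):
        a = sum(fp[i] for fp in t_fp)   # target sequences with the feature
        c = sum(fp[i] for fp in r_fp)   # reference sequences with the feature
        p = fisher_one_sided(a, len(t_fp) - a, c, len(r_fp) - c)
        if p < alpha:
            hits.append((feat, p))
    return hits


if __name__ == "__main__":
    for feat, p in enriched_features(TARGET, REFERENCE, MOTIFS):
        print(feat, p)
```

In this toy example the motif "DY" at position 2 occurs in every target sequence and in no reference sequence, so it is reported as distinguishing, while "G" at position 0 is equally common in both sets and is not. A real analysis would also need a multiple-testing correction across features, which the sketch omits.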



