ImmunoLingo: Linguistics-based formalization of the antibody language

09/26/2022
by Mai Ha Vu, et al.

Apparent parallels between natural language and biological sequences have led to a recent surge in the application of deep language models (LMs) to the analysis of antibody and other biological sequences. However, the lack of a rigorous linguistic formalization of biological sequence languages, which would define basic components such as the lexicon (i.e., the discrete units of the language) and the grammar (i.e., the rules that link sequence well-formedness, structure, and meaning), has led to largely domain-unspecific applications of LMs that do not take into account the underlying structure of the biological sequences studied. A linguistic formalization, on the other hand, establishes linguistically informed and thus domain-adapted components for LM applications. It would facilitate a better understanding of how differences and similarities between natural language and biological sequences influence the quality of LMs, which is crucial for the design of interpretable models with extractable sequence-function relationship rules, such as the ones underlying the antibody specificity prediction problem. Deciphering the rules of antibody specificity is crucial to accelerating rational and in silico biotherapeutic drug design. Here, we formalize the properties of the antibody language and thereby establish a foundation not only for the application of linguistic tools in adaptive immune receptor analysis but also for systematic immunolinguistic studies of immune receptor specificity in general.

research
07/03/2022

Advancing protein language models with linguistics: a roadmap for improved interpretability

Deep neural-network-based language models (LMs) are increasingly applied...
research
05/26/2020

Guiding Symbolic Natural Language Grammar Induction via Transformer-Based Sequence Probabilities

A novel approach to automated learning of syntactic rules governing natu...
research
12/01/2020

Statistical patterns of word frequency suggesting the probabilistic nature of human languages

Traditional linguistic theories have largely regarded language as a formal...
research
04/06/2023

Biological Sequence Kernels with Guaranteed Flexibility

Applying machine learning to biological sequences - DNA, RNA and protein...
research
02/18/2016

Corpus analysis without prior linguistic knowledge - unsupervised mining of phrases and subphrase structure

When looking at the structure of natural language, "phrases" and "words"...
research
09/20/2023

Exploring the Relationship between LLM Hallucinations and Prompt Linguistic Nuances: Readability, Formality, and Concreteness

As Large Language Models (LLMs) have advanced, they have brought forth n...
