SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model

by Yikang Zhang, et al.

Many inorganic and organic compounds can bind DNA and form complexes, and drug-related molecules are an important class among them. Changes in chromatin accessibility not only directly affect drug-DNA interactions, but also promote or inhibit the expression of critical genes associated with drug resistance by affecting the DNA-binding capacity of transcription factors (TFs) and transcriptional regulators. However, biological experimental techniques for measuring chromatin accessibility are expensive and time-consuming. In recent years, several kinds of computational methods have been proposed to identify accessible regions of the genome, but existing computational models mostly ignore the contextual information of bases in gene sequences. To address these issues, we propose a new solution named SemanticCAP. It introduces a gene language model that models the context of gene sequences and can therefore provide an effective representation of a given site in a gene sequence. We merge the features provided by the gene language model into our chromatin accessibility model, and we designed several methods to make this feature fusion smoother. On public benchmarks, our model performed better than the other systems it was compared with.




SimpleChrome: Encoding of Combinatorial Effects for Predicting Gene Expression

Due to recent breakthroughs in state-of-the-art DNA sequencing technolog...

Epigenomic language models powered by Cerebras

Large scale self-supervised pre-training of Transformer language models ...

MIRACLE: Multi-task Learning based Interpretable Regulation of Autoimmune Diseases through Common Latent Epigenetics

DNA methylation is a crucial regulator of gene transcription and has bee...

DPCIPI: A pre-trained deep learning model for estimation of cross-immunity between drifted strains of Influenza A/H3N2

Motivation: This study aims to develop a novel model called DNA Pretrain...

Investigating some attributes of periodicity in DNA sequences via semi-Markov modelling

DNA segments and sequences have been studied thoroughly during the past ...

Generative Language Models on Nucleotide Sequences of Human Genes

Language models, primarily transformer-based ones, obtained colossal suc...

Mechanism of Evolution Shared by Gene and Language

We propose a general mechanism for evolution to explain the diversity of...
