SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model

04/05/2022
by Yikang Zhang, et al.

A large number of inorganic and organic compounds can bind DNA and form complexes, among which drug-related molecules are especially important. Changes in chromatin accessibility not only directly affect drug-DNA interactions, but also promote or inhibit the expression of critical genes associated with drug resistance by affecting the DNA-binding capacity of transcription factors (TFs) and transcriptional regulators. However, biological experimental techniques for measuring chromatin accessibility are expensive and time-consuming. In recent years, several kinds of computational methods have been proposed to identify accessible regions of the genome, but existing computational models mostly ignore the contextual information of bases in gene sequences. To address these issues, we propose a new solution named SemanticCAP. It introduces a gene language model that models the context of gene sequences and can therefore provide an effective representation of a given site in a gene sequence. We merge the features provided by the gene language model into our chromatin accessibility model, and we design several methods to make this feature fusion smoother. On public benchmarks, our model outperforms other systems.
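To make the idea of merging context features into a per-site representation concrete, the following is a minimal sketch and not the SemanticCAP implementation. It assumes a simple concatenation-style fusion, and the function `context_features` is a hypothetical stand-in (local base frequencies in a small window) for the features a trained gene language model would supply. All names here (`one_hot`, `context_features`, `fuse`, `window`) are illustrative assumptions.

```python
from typing import List

BASES = "ACGT"

def one_hot(seq: str) -> List[List[float]]:
    """Encode each base as a 4-dim one-hot vector; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def context_features(seq: str, window: int = 2) -> List[List[float]]:
    """Hypothetical stand-in for language-model features (an assumption, not the paper's model):
    for each site, the frequency of each base within +/- `window` positions."""
    seq = seq.upper()
    feats = []
    for i in range(len(seq)):
        ctx = seq[max(0, i - window): i + window + 1]
        feats.append([ctx.count(base) / len(ctx) for base in BASES])
    return feats

def fuse(seq: str, window: int = 2) -> List[List[float]]:
    """Concatenate the per-site one-hot vector with the per-site context vector."""
    return [a + b for a, b in zip(one_hot(seq), context_features(seq, window))]

fused = fuse("ACGTN")
print(len(fused), len(fused[0]))  # one 8-dim vector per site
```

In a real pipeline, the fused per-site vectors would be passed to a downstream accessibility classifier; plain concatenation is only the simplest possible fusion and is used here purely for illustration.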


