Natural language processing for clusterization of genes according to their functions

07/17/2022
by Vladislav Dordiuk, et al.

There are hundreds of methods for analyzing data obtained from mRNA sequencing. Most of them focus on a small number of genes. In this study, we propose an approach that reduces the analysis of several thousand genes to the analysis of several clusters. The list of genes is enriched with information from open databases. The gene descriptions are then encoded as vectors using a pretrained language model (BERT) and some text processing approaches. The encoded gene functions then pass through dimensionality reduction and clustering. To find the most efficient pipeline, we analyzed 180 pipeline variants that use different methods at the major pipeline steps. Performance was evaluated with clustering indices and expert review of the results.
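The encode, cluster, and evaluate steps described above can be illustrated with a minimal sketch. This is not the authors' pipeline: bag-of-words counts stand in for BERT embeddings, the dimensionality reduction step is omitted for brevity, k-means and the silhouette coefficient are assumed choices of clustering method and clustering index, and the gene names and descriptions are hypothetical placeholders rather than database entries.

```python
import math
from collections import Counter

# Hypothetical gene descriptions; real ones would come from an open database.
DESCRIPTIONS = {
    "GENE_A": "dna repair enzyme for damaged dna",
    "GENE_B": "dna repair and damage response protein",
    "GENE_C": "glucose transport across cell membrane",
    "GENE_D": "membrane transport of glucose and ions",
}

def encode(texts):
    """Bag-of-words count vectors (a simple stand-in for BERT embeddings)."""
    vocab = sorted({w for t in texts for w in t.split()})
    return [[Counter(t.split())[w] for w in vocab] for t in texts]

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(dist(v, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assign each vector to its nearest center.
        labels = [min(range(k), key=lambda j: dist(v, centers[j])) for v in vectors]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(vectors, labels):
    """Mean silhouette coefficient (assumes at least two clusters)."""
    scores = []
    for i, v in enumerate(vectors):
        same = [dist(v, w) for j, w in enumerate(vectors)
                if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: score defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dist(v, w) for j, w in enumerate(vectors) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

genes = list(DESCRIPTIONS)
vectors = encode(list(DESCRIPTIONS.values()))
labels = kmeans(vectors, k=2)
score = silhouette(vectors, labels)
print(dict(zip(genes, labels)), round(score, 3))
```

In a comparison of pipeline variants, a score like this would be computed once per variant, so that different encoders and clustering methods can be ranked on the same gene list before expert review.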


