Evaluation of large language models for discovery of gene set function

09/07/2023
by Mengzhou Hu, et al.

Gene set analysis is a mainstay of functional genomics, but it relies on manually curated databases of gene functions that are incomplete and unaware of biological context. Here we evaluate the ability of OpenAI's GPT-4, a Large Language Model (LLM), to develop hypotheses about common gene functions from its embedded biomedical knowledge. We created a GPT-4 pipeline to label gene sets with names that summarize their consensus functions, substantiated by analysis text and citations. Benchmarking against named gene sets in the Gene Ontology, GPT-4 generated very similar names in 50% of cases, while in most remaining cases it recovered the name of a more general concept. In gene sets discovered in 'omics data, GPT-4 names were more informative than gene set enrichment, with supporting statements and citations that were largely verified in human review. The ability to rapidly synthesize common gene functions positions LLMs as valuable functional genomics assistants.


