covLLM: Large Language Models for COVID-19 Biomedical Literature

by Yousuf A. Khan, et al.

The COVID-19 pandemic led to 1.1 million deaths in the United States, despite the explosion of coronavirus research. These new findings are slow to translate into clinical interventions, which leads to poorer patient outcomes and unnecessary deaths. One reason is that clinicians, overwhelmed by patients, struggle to keep pace with the rate at which new coronavirus literature appears. A potential solution is a tool for evaluating coronavirus literature with large language models (LLMs), neural networks deployed for natural language processing that can summarize text and extract user-specified information. The growing availability and capability of LLMs, together with pre-processed coronavirus literature databases, make it possible to assist clinicians through a coronavirus-literature-specific LLM (covLLM): a tool that takes a research article and a user query as input and returns an answer. Using the COVID-19 Open Research Dataset (CORD-19), we produced two datasets: (1) synCovid, which combines handwritten prompts with synthetic prompts generated using OpenAI, and (2) real abstracts, which contains abstract and title pairs. Starting from LLaMA 7B as the base model, we trained three covLLM variants on (1) the Alpaca and synCovid datasets, (2) the synCovid dataset alone, and (3) the synCovid and real abstract datasets. Two human evaluators and ChatGPT evaluated these models. Results show that covLLM trained on the synCovid and real abstract datasets performs competitively with ChatGPT and outperforms covLLM trained primarily on the Alpaca dataset.
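The abstract describes covLLM as taking an article plus a user query, trained partly on Alpaca-style data and on CORD-19 title/abstract pairs. The sketch below illustrates one plausible way such pairs could be turned into instruction records and how datasets might be combined for the three training configurations. It is not the authors' code: the Alpaca-style prompt template, the instruction wording ("Suggest a title..."), and the helper names `abstract_to_record`, `format_prompt`, `build_training_text`, `mix_datasets`, and `to_jsonl` are all hypothetical.

```python
import json
import random
from dataclasses import dataclass, asdict

# Assumption: an Alpaca-style template. The abstract does not state the exact
# prompt format covLLM uses.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


@dataclass
class Record:
    """One instruction-tuning example (instruction, input, target output)."""
    instruction: str
    input: str
    output: str


def abstract_to_record(title: str, abstract: str) -> Record:
    """Turn one title/abstract pair into a training record.

    Hypothetical task framing: the model reads the abstract and produces the title.
    """
    return Record(
        instruction="Suggest a title for the following research abstract.",
        input=abstract.strip(),
        output=title.strip(),
    )


def format_prompt(article: str, query: str) -> str:
    """Build the model input from an article (context) and a user query (instruction)."""
    return PROMPT_TEMPLATE.format(instruction=query.strip(), input=article.strip())


def build_training_text(rec: Record) -> str:
    """Full supervised training string: the prompt followed by the target response."""
    return format_prompt(rec.input, rec.instruction) + rec.output


def mix_datasets(*datasets, seed: int = 0) -> list:
    """Concatenate several datasets and shuffle them reproducibly."""
    combined = [rec for ds in datasets for rec in ds]
    random.Random(seed).shuffle(combined)
    return combined


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(r)) for r in records)


if __name__ == "__main__":
    # Toy placeholder pairs, not real CORD-19 entries.
    pairs = [("Example title", "Example abstract text.")]
    real_abstracts = [abstract_to_record(t, a) for t, a in pairs]
    syn_covid: list = []  # hypothetical placeholder for synCovid records
    # Configuration (3) from the abstract: synCovid + real abstracts.
    train_set = mix_datasets(syn_covid, real_abstracts)
    print(build_training_text(train_set[0]))
```

Mapping the abstract to the `### Input:` slot and the user query to the `### Instruction:` slot mirrors how the abstract frames covLLM's inputs (an article plus a query). Swapping in synCovid or Alpaca records for the other two configurations would only change the lists passed to `mix_datasets`.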




