Text2Cohort: Democratizing the NCI Imaging Data Commons with Natural Language Cohort Discovery

05/12/2023
by Pranav Kulkarni, et al.

The Imaging Data Commons (IDC) is a cloud-based database that provides researchers with open access to cancer imaging data, with the goal of facilitating collaboration in medical imaging research. However, querying the IDC database for cohort discovery and access to imaging data has a steep learning curve for researchers because of the database's complexity. We developed Text2Cohort, a large language model (LLM) based toolkit that facilitates user-friendly and intuitive natural language cohort discovery in the IDC. Text2Cohort translates user input into IDC database queries using prompt engineering and autocorrection, and returns the query's response to the user. Autocorrection resolves errors in queries by passing the errors back to the model for interpretation and correction. We evaluated Text2Cohort on 50 natural language user inputs ranging from information extraction to cohort discovery. The resulting queries and outputs were verified by two computer scientists to measure Text2Cohort's accuracy and F1 score. Text2Cohort successfully generated queries and their responses with 88% accuracy; it failed to generate queries for 6/50 (12%) user inputs, with failures that included semantic errors. Our results indicate that Text2Cohort succeeded at generating queries with correct responses, but occasionally failed due to a lack of understanding of the data schema. Despite these shortcomings, Text2Cohort demonstrates that LLMs can enable researchers to discover and curate cohorts from data hosted on the IDC with high accuracy, using natural language in a more intuitive and user-friendly way.
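The autocorrection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`autocorrect_query`, `stub_model`, `run`), the retry limit, and the feedback message wording are all assumptions made for illustration, and an in-memory SQLite table stands in for the IDC database so the example runs without cloud access. A stub callable plays the role of the LLM.

```python
import sqlite3
from typing import Any, Callable, List, Tuple


def autocorrect_query(
    user_input: str,
    generate: Callable[[List[str]], str],
    execute: Callable[[str], Any],
    max_attempts: int = 3,
) -> Tuple[str, Any, int]:
    """Generate a query, run it, and feed any error back to the model.

    `generate` stands in for the LLM (hypothetical interface): it receives
    the conversation so far and returns a query string. `execute` runs the
    query and raises an exception on failure.
    """
    messages = [user_input]
    last_error = None
    for attempt in range(1, max_attempts + 1):
        query = generate(messages)
        try:
            return query, execute(query), attempt
        except Exception as err:  # the database rejected the query
            last_error = err
            # Pass the error text back so the model can interpret and fix it.
            messages.append(
                f"The query `{query}` failed with error: {err}. Please correct it."
            )
    raise RuntimeError(f"no valid query after {max_attempts} attempts: {last_error}")


def demo() -> Tuple[str, Any, int]:
    # Toy stand-in for the real database (assumed schema, not the IDC schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE series (patient_id TEXT, modality TEXT)")
    conn.executemany(
        "INSERT INTO series VALUES (?, ?)",
        [("p1", "CT"), ("p2", "MR"), ("p3", "CT")],
    )

    def stub_model(messages: List[str]) -> str:
        # First attempt misspells a column; after seeing the error, it fixes it.
        if len(messages) == 1:
            return "SELECT COUNT(*) FROM series WHERE modalty = 'CT'"
        return "SELECT COUNT(*) FROM series WHERE modality = 'CT'"

    def run(query: str) -> Any:
        return conn.execute(query).fetchall()

    return autocorrect_query("How many CT series are there?", stub_model, run)


if __name__ == "__main__":
    print(demo())
```

Note that a loop of this kind can only recover from errors the database actually reports. A query that runs but returns the wrong cohort raises no exception, which is consistent with the abstract's observation that some failures stemmed from a lack of understanding of the data schema.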


