Benchmark datasets driving artificial intelligence development fail to capture the needs of medical professionals

01/18/2022
by Kathrin Blagec, et al.

Publicly accessible benchmarks that allow for assessing and comparing model performances are important drivers of progress in artificial intelligence (AI). While recent advances in AI capabilities hold the potential to transform medical practice by assisting and augmenting the cognitive processes of healthcare professionals, the coverage of clinically relevant tasks by AI benchmarks is largely unclear. Furthermore, there is a lack of systematized meta-information that allows clinical AI researchers to quickly determine accessibility, scope, content and other characteristics of datasets and benchmark datasets relevant to the clinical domain. To address these issues, we curated and released a comprehensive catalogue of datasets and benchmarks pertaining to the broad domain of clinical and biomedical natural language processing (NLP), based on a systematic review of literature and online resources. A total of 450 NLP datasets were manually systematized and annotated with rich metadata, such as targeted tasks, clinical applicability, data types, performance metrics, accessibility and licensing information, and availability of data splits. We then compared tasks covered by AI benchmark datasets with relevant tasks that medical practitioners reported as highly desirable targets for automation in a previous empirical study. Our analysis indicates that AI benchmarks of direct clinical relevance are scarce and fail to cover most work activities that clinicians want to see addressed. In particular, tasks associated with routine documentation and patient data administration workflows are not represented despite significant associated workloads. Thus, currently available AI benchmarks are improperly aligned with desired targets for AI automation in clinical settings, and novel benchmarks should be created to fill these gaps.


