Hybrid Approaches for our Participation to the n2c2 Challenge on Cohort Selection for Clinical Trials

03/19/2019
by   Xavier Tannier, et al.

Objective: Natural language processing can help minimize human intervention in identifying patients who meet eligibility criteria for clinical trials, but a general, systematic approach that is useful to researchers remains a long way off. We describe two methods that take a step in this direction and present the results they obtained in the n2c2 challenge on cohort selection for clinical trials.

Materials and Methods: The first method is weakly supervised. It uses an unlabeled corpus (MIMIC) to build a silver standard: a small, very precise set of rules, produced semi-automatically, detects some samples of positive and negative patients. This silver standard is then used to train a traditional supervised model. The second method is terminology-based: a medical expert selects the appropriate concepts, and a procedure is defined to search for the terms and check the structural or temporal constraints.

Results: On the n2c2 dataset, which contains annotated data for 13 selection criteria on 288 patients, we obtained an overall F1-measure of 0.8969. This is the third best result out of 45 participating teams, with no statistically significant difference from the best-ranked team.

Discussion: Both approaches obtained very encouraging results, and they apply to different types of criteria. The weakly supervised method requires explicit descriptions of positive and negative examples in some reports. The terminology-based method is very efficient when medical concepts carry most of the relevant information.

Conclusion: It is unlikely that much more annotated data will soon be available for the task of identifying a wide range of patient phenotypes. One must focus on weakly supervised or unsupervised learning methods that use both structured and unstructured data and rely on a comprehensive representation of the patients.
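To make the weakly supervised method more concrete, here is a minimal sketch of the general idea: a few high-precision rules label part of an unlabeled corpus, and the resulting silver standard trains a supervised classifier. Everything in it is an assumption made for illustration and is not taken from the paper: the example criterion (whether the patient speaks English), the regular expressions, the toy sentences standing in for MIMIC notes, and the choice of a small hand-written Naive Bayes model in place of whatever supervised model the authors used.

```python
import math
import re
from collections import Counter

# Hypothetical high-precision rules for one illustrative criterion.
POSITIVE_RULES = [re.compile(r"\bspeaks english\b", re.I)]
NEGATIVE_RULES = [
    re.compile(r"\bdoes not speak english\b", re.I),
    re.compile(r"\binterpreter (was )?(present|required)\b", re.I),
]

def silver_label(text):
    """Return 1 or 0 when exactly one rule family fires, else None."""
    pos = any(r.search(text) for r in POSITIVE_RULES)
    neg = any(r.search(text) for r in NEGATIVE_RULES)
    if pos == neg:  # no rule fired, or the rules conflict: leave unlabeled
        return None
    return 1 if pos else 0

def build_silver_standard(corpus):
    """Keep only the documents that the rules can label."""
    labeled = [(text, silver_label(text)) for text in corpus]
    return [(text, label) for text, label in labeled if label is not None]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train_naive_bayes(examples):
    """Train a bag-of-words Naive Bayes model on (text, label) pairs."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    if doc_counts[0] == 0 or doc_counts[1] == 0:
        raise ValueError("silver standard needs both positive and negative samples")
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the most probable label, using add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                count = word_counts[label][tok] + 1
                score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy stand-in for an unlabeled corpus (invented sentences).
UNLABELED = [
    "Patient speaks English and lives with her daughter.",
    "He speaks English fluently; retired teacher.",
    "Patient does not speak English, Spanish interpreter present.",
    "Interpreter required for the visit; family translated.",
    "Follow-up for hypertension, no complaints.",
]

silver = build_silver_standard(UNLABELED)
model = train_naive_bayes(silver)
```

Calling `predict(model, "Spanish interpreter translated for the family")` classifies a sentence that none of the rules match, which is the point of training on the silver standard: the model can pick up context words that co-occur with the rule matches. On a corpus this small the effect is only illustrative.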

