PMC-Patients: A Large-scale Dataset of Patient Notes and Relations Extracted from Case Reports in PubMed Central

02/28/2022
by   Zhengyun Zhao, et al.
13

We present PMC-Patients, a dataset consisting of 167k patient notes with 3.1M relevant article annotations and 293k similar patient annotations. The patient notes are extracted by identifying certain sections from case reports in PubMed Central, and those with at least CC BY-NC-SA license are re-distributed. Patient-article relevance and patient-patient similarity are defined by citation relationships in PubMed. We also perform four tasks with PMC-Patients to demonstrate its utility, including Patient Note Recognition (PNR), Patient-Patient Similarity (PPS), Patient-Patient Retrieval (PPR), and Patient-Article Retrieval (PAR). In summary, PMC-Patients provides the largest-scale patient notes with high quality, diverse conditions, easy access, and rich annotations.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset