BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients

In this work we describe the BIMCV-COVID-19+ dataset, a large dataset from the Valencian Region Medical ImageBank (BIMCV) containing chest X-ray (CXR) images, both computed radiography (CR) and digital radiography (DX), and computed tomography (CT) images of COVID-19+ patients. These come with the patients' radiological findings and their locations, pathologies, radiological reports (in Spanish), DICOM metadata, polymerase chain reaction (PCR) test results, and immunoglobulin G (IgG) and immunoglobulin M (IgM) diagnostic antibody test results. The findings are mapped onto standard Unified Medical Language System (UMLS) terminology and cover a wide spectrum of thoracic entities, in contrast to the much smaller number of entities annotated in previous datasets. Images are stored in high resolution, and entities are localized with anatomical labels in a Medical Imaging Data Structure (MIDS) format. In addition, a team of radiologists annotated 10 images with semantic segmentations of radiological findings. This first iteration of the database includes 1,380 CR, 885 DX and 163 CT studies from 1,311 COVID-19+ patients. To the best of our knowledge, this is the largest COVID-19+ image dataset available in an open format. The dataset can be downloaded from


