Large-scale data extraction from the UNOS organ donor documents

08/30/2023
by Marek Rychlik, et al.

The scope of our study is all UNOS data on organ donors in the USA since 2008. Until now, this data could not be analyzed at scale because it was captured in PDF documents known as "Attachments", whereby every donor is represented by dozens of PDF documents in heterogeneous formats. To make the data analyzable, one needs to convert the content of these PDFs to an analyzable data format, such as a standard SQL database. In this paper we focus on the 2022 UNOS data, comprising ≈ 400,000 PDF documents spanning millions of pages. The totality of UNOS data covers 15 years (2008–2022), and our results will be quickly extended to the entire data set. Our method captures a portion of the data in DCD flowsheets, kidney perfusion data, and data captured during the patient's hospital stay (e.g., vital signs and ventilator settings). The current paper assumes that the reader is familiar with the content of the UNOS data. An overview of the types of data and the challenges they present is the subject of another paper. Here we focus on demonstrating that building a comprehensive, analyzable database from UNOS documents is an attainable goal, and we provide an overview of our methodology. Even in this preliminary phase, the project has produced datasets far larger than those previously available.
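To make the target format concrete, the following is a minimal sketch of what "an analyzable data format, such as a standard SQL database" could look like for values extracted from the PDFs. It is not the authors' schema or pipeline: the table name `vital_signs`, its columns, the helper functions, and the sample rows are all hypothetical placeholders, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical schema (an assumption, not taken from the paper):
# one row per measurement extracted from one page of one PDF attachment.
SCHEMA = """
CREATE TABLE IF NOT EXISTS vital_signs (
    donor_id    TEXT    NOT NULL,  -- donor identifier
    source_pdf  TEXT    NOT NULL,  -- attachment the value was read from
    page        INTEGER NOT NULL,  -- page number inside that attachment
    measured_at TEXT    NOT NULL,  -- ISO 8601 timestamp of the reading
    name        TEXT    NOT NULL,  -- e.g. 'heart_rate'
    value       REAL    NOT NULL
);
"""


def store_measurements(conn, rows):
    """Create the table if needed and insert the extracted rows."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO vital_signs VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()


def mean_by_donor(conn, name):
    """Return (donor_id, mean value) pairs for one measurement type."""
    cur = conn.execute(
        "SELECT donor_id, AVG(value) FROM vital_signs "
        "WHERE name = ? GROUP BY donor_id ORDER BY donor_id",
        (name,),
    )
    return cur.fetchall()


# Made-up placeholder rows standing in for the output of an extraction step.
sample_rows = [
    ("D001", "attachment_a.pdf", 3, "2022-01-05T08:00:00", "heart_rate", 80.0),
    ("D001", "attachment_a.pdf", 3, "2022-01-05T09:00:00", "heart_rate", 90.0),
    ("D002", "attachment_b.pdf", 1, "2022-02-10T14:30:00", "heart_rate", 70.0),
]

conn = sqlite3.connect(":memory:")
store_measurements(conn, sample_rows)
result = mean_by_donor(conn, "heart_rate")
```

Keeping the source document and page alongside each value is one possible design choice: it would let any extracted number be traced back to the PDF it came from when the extraction is audited.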

research
05/15/2018

Corpus Conversion Service: A machine learning platform to ingest documents at scale [Poster abstract]

Over the past few decades, the amount of scientific articles and technic...
research
05/24/2018

Corpus Conversion Service: A Machine Learning Platform to Ingest Documents at Scale

Over the past few decades, the amount of scientific articles and technic...
research
11/04/2020

Handwriting Classification for the Analysis of Art-Historical Documents

Digitized archives contain and preserve the knowledge of generations of ...
research
12/04/2019

A Method of Fluorescent Fibers Detection on Identity Documents under Ultraviolet Light

In this work we consider the problem of the fluorescent security fibers ...
research
10/23/2018

Towards a Ranking Model for Semantic Layers over Digital Archives

Archived collections of documents (like newspaper archives) serve as imp...
research
03/17/2018

Experiments with Neural Networks for Small and Large Scale Authorship Verification

We propose two models for a special case of authorship verification prob...
research
08/02/2023

A Large-Scale Study of Phishing PDF Documents

Phishing PDFs are malicious PDF documents that do not embed malware but ...
