Anonymization of Whole Slide Images in Histopathology for Research and Education

11/11/2022
by Tom Bisson, et al.

Objective: The exchange of health-related data is subject to regional laws and regulations, such as the General Data Protection Regulation (GDPR) in the EU or the Health Insurance Portability and Accountability Act (HIPAA) in the United States, which creates non-trivial challenges for researchers and educators working with these data. In pathology, the digitization of diagnostic tissue samples inevitably generates identifying data, which can comprise both sensitive and acquisition-related information, stored in vendor-specific file formats. Whole Slide Images (WSIs) are usually distributed and used outside the clinic in these formats, because industry-wide standards such as DICOM have so far been adopted only tentatively and slide scanner vendors currently do not provide anonymization functionality.

Methods: We developed a guideline for the proper handling of histopathological image data, particularly for research and education, with regard to the GDPR. In this context, we evaluated existing anonymization methods and examined proprietary format specifications to identify all sensitive information for the most common WSI formats. This work resulted in a software library that enables GDPR-compliant anonymization of WSIs while preserving their native formats.

Results: Based on the analysis of proprietary formats, all occurrences of sensitive information were identified for the file formats frequently used in clinical routine. We then developed an open-source programming library with an executable CLI tool and wrappers for different programming languages.

Conclusions: Our analysis showed that there is no straightforward software solution for anonymizing WSIs in a GDPR-compliant way while maintaining the data format. We closed this gap with our extensible open-source library, which works instantaneously and offline.
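To illustrate what anonymizing a file while preserving its native format can mean in practice, here is a minimal Python sketch. It is not the authors' library or its API. It assumes a toy metadata header made of `|Key = Value` pairs (a hypothetical layout chosen for illustration) and overwrites selected values in place with filler bytes of the same length, so the file size and all byte offsets stay unchanged. The function name `blank_values` and the sample keys are assumptions.

```python
import re


def blank_values(buf: bytearray, keys: list[bytes], filler: bytes = b"X") -> int:
    """Overwrite the values of the given keys in place; return how many were blanked.

    Assumes a toy '|Key = Value' header layout (hypothetical, for illustration).
    Each value is replaced by the same number of filler bytes, so the buffer
    length and every offset after the value remain unchanged.
    """
    if len(filler) != 1:
        raise ValueError("filler must be exactly one byte to preserve length")

    blanked = 0
    for key in keys:
        # A value runs from after 'Key = ' up to the next '|' or NUL terminator.
        pattern = re.compile(rb"\|" + re.escape(key) + rb" = ([^|\x00]*)")
        # Match against a snapshot; same-length writes keep the spans valid.
        for match in pattern.finditer(bytes(buf)):
            start, end = match.span(1)
            buf[start:end] = filler * (end - start)
            blanked += 1
    return blanked


# Usage on a toy header: identifying fields are blanked, acquisition data is kept.
header = bytearray(b"Toy Header|Filename = patient_123.svs|AppMag = 20|Date = 11/11/22\x00")
original_length = len(header)
count = blank_values(header, [b"Filename", b"Date"])
print(count, len(header) == original_length)
print(header)
```

Overwriting with same-length filler, rather than deleting the field, is one simple way to avoid recomputing offsets elsewhere in the file. Real WSI formats keep sensitive data in several places (for example label or macro images), which this sketch does not cover.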


