Multimodal Multitask Representation Learning for Pathology Biobank Metadata Prediction

by Wei-Hung Weng, et al.

Metadata are general characteristics of data in a well-curated, condensed format, and have proven useful for decision making, knowledge discovery, and the organization of heterogeneous biobank data. Among all data types in a biobank, pathology is the key component and also serves as the gold standard of diagnosis. To maximize the utility of biobanks and enable rapid progress in biomedical science, it is essential to organize the data with well-populated pathology metadata. However, manual annotation of such information is tedious and time-consuming. In this study, we develop a multimodal multitask learning framework to predict four major slide-level metadata fields of pathology images. The framework learns generalizable representations across tissue slides, pathology reports, and case-level structured data. We demonstrate improved performance across all four tasks with the proposed method compared to a single-modal, single-task baseline on two test sets: one external test set from a distinct data source (TCGA) and one internal held-out test set (TTH). On the test sets, the improvements in area under the receiver operating characteristic curve, averaged across the four tasks, are 16.48 [...]. Such a metadata prediction system may be adopted to reduce the effort of expert annotation and ultimately accelerate data-driven research through better utilization of the pathology biobank.




