MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification

12/16/2020
by   Te-Lin Wu, et al.

We introduce a new dataset, MELINDA, for Multimodal biomEdicaL experImeNt methoD clAssification. The dataset is collected through fully automated distant supervision: the labels are obtained from an existing curated database, and the actual contents are extracted from the papers associated with each record in the database. We benchmark various state-of-the-art NLP and computer vision models, including unimodal models that take only caption text or only images as input, and multimodal models. Extensive experiments and analysis show that multimodal models, despite outperforming unimodal ones, still need improvement, especially in less-supervised grounding of visual concepts in language and in transferability to low-resource domains. We release our dataset and the benchmarks to facilitate future research in multimodal learning, and in particular to motivate targeted improvements for applications in scientific domains.


