
New Arabic Medical Dataset for Diseases Classification

by Jaafar Hammoud, et al.

The Arabic language suffers from a severe shortage of datasets suitable for training deep learning models, and the existing ones cover only general, non-specialized categories. In this work, we introduce a new Arabic medical dataset of two thousand medical documents collected from several Arabic medical websites and from the Arab Medical Encyclopedia. The dataset was built for text classification and covers 10 disease classes: Blood, Bone, Cardiovascular, Ear, Endocrine, Eye, Gastrointestinal, Immune, Liver, and Nephrological. Experiments on the dataset were performed by fine-tuning three pre-trained models: BERT from Google; AraBERT, which is based on BERT and trained on a large Arabic corpus; and AraBioNER, which is based on AraBERT and trained on an Arabic medical corpus.
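To make the classification setup concrete, here is a minimal Python sketch of how the ten classes named in the abstract might be mapped to integer labels and split into training and test sets before fine-tuning. It is an illustration only: the integer ids, the `stratified_split` helper, the 20% test fraction, and the `(text, label)` pair layout are assumptions, not details taken from the paper.

```python
import random
from collections import defaultdict

# The ten class names come from the abstract; the integer ids are an arbitrary choice.
CLASSES = [
    "Blood", "Bone", "Cardiovascular", "Ear", "Endocrine",
    "Eye", "Gastrointestinal", "Immune", "Liver", "Nephrological",
]
LABEL2ID = {name: i for i, name in enumerate(CLASSES)}
ID2LABEL = {i: name for name, i in LABEL2ID.items()}


def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split (text, class_name) pairs so every class keeps the same test share.

    Hypothetical helper: the paper does not describe its split procedure.
    Returns two lists of (text, label_id) pairs.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, LABEL2ID[label]))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in CLASSES:
        items = by_label[label]
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```

The resulting `(text, label_id)` pairs, together with `LABEL2ID` and `ID2LABEL`, are the kind of input a sequence-classification fine-tuning loop typically expects; the fine-tuning itself is left out because the abstract gives no hyperparameters or model checkpoints.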


ArabGlossBERT: Fine-Tuning BERT on Context-Gloss Pairs for WSD

Using pre-trained transformer models such as BERT has proven to be effec...

Leveraging BERT Language Model for Arabic Long Document Classification

Given the number of Arabic speakers worldwide and the notably large amou...

Supporting Undotted Arabic with Pre-trained Language Models

We observe a recent behaviour on social media, in which users intentiona...

New Results for the Text Recognition of Arabic Maghribī Manuscripts – Managing an Under-resourced Script

HTR models development has become a conventional step for digital humani...

Optimizing Deep Learning Model Parameters with the Bees Algorithm for Improved Medical Text Classification

This paper introduces a novel mechanism to obtain the optimal parameters...