DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew

08/31/2023
by Shaltiel Shmidman, et al.

We present DictaBERT, a new state-of-the-art pre-trained BERT model for modern Hebrew, outperforming existing models on most benchmarks. Additionally, we release two fine-tuned versions of the model, designed to perform two specific foundational tasks in the analysis of Hebrew texts: prefix segmentation and morphological tagging. These fine-tuned models allow any developer to perform prefix segmentation and morphological tagging of a Hebrew sentence with a single call to a HuggingFace model, without the need to integrate any additional libraries or code. In this paper we describe the details of the training as well as the results on the different benchmarks. We release the models to the community, along with sample code demonstrating their use, as part of our goal to help further research and development in Hebrew NLP.
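
The "single call to a HuggingFace model" described in the abstract might look roughly like the sketch below. This is not the authors' sample code. The model identifiers (`dicta-il/dictabert-seg`, `dicta-il/dictabert-morph`) and the `predict(sentences, tokenizer)` method are assumptions about how the released checkpoints are published, and the helper name `analyze` is hypothetical. Only `AutoTokenizer.from_pretrained` and `AutoModel.from_pretrained` are standard `transformers` calls.

```python
from typing import List

# Assumed HuggingFace model IDs; they are not stated in the abstract.
SEG_MODEL_ID = "dicta-il/dictabert-seg"      # prefix segmentation
MORPH_MODEL_ID = "dicta-il/dictabert-morph"  # morphological tagging


def analyze(sentences: List[str], model_id: str = SEG_MODEL_ID):
    """Run a fine-tuned DictaBERT checkpoint over Hebrew sentences (hypothetical helper)."""
    if not sentences:
        return []  # nothing to analyze, so skip downloading the model

    # Imported lazily so this module loads even where transformers is absent.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # trust_remote_code=True lets the model repository supply its own model class.
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    model.eval()

    # Assumption: the repository's custom class exposes predict(sentences, tokenizer).
    return model.predict(sentences, tokenizer)
```

Under these assumptions, `analyze(["הילד הלך לבית הספר"])` would request prefix segmentation, and passing `model_id=MORPH_MODEL_ID` would request morphological tagging instead. The structure of the returned value depends on the released model code and is not specified here.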

research
11/15/2021

Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data

Recent advancements in end-to-end speech synthesis have made it possible...
research
01/28/2020

PEL-BERT: A Joint Model for Protocol Entity Linking

Pre-trained models such as BERT are widely used in NLP tasks and are fin...
research
11/28/2022

Large Pre-Trained Models with Extra-Large Vocabularies: A Contrastive Analysis of Hebrew BERT Models and a New One to Outperform Them All

We present a new pre-trained language model (PLM) for modern Hebrew, ter...
research
08/17/2023

Chinese Spelling Correction as Rephrasing Language Model

This paper studies Chinese Spelling Correction (CSC), which aims to dete...
research
08/03/2022

Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language

We present a new pre-trained language model (PLM) for Rabbinic Hebrew, t...
research
11/12/2021

MS-LaTTE: A Dataset of Where and When To-do Tasks are Completed

Tasks are a fundamental unit of work in the daily lives of people, who a...
research
04/27/2020

ColBERT: Using BERT Sentence Embedding for Humor Detection

Automatic humor detection has interesting use cases in modern technologi...
