San-BERT: Extractive Summarization for Sanskrit Documents using BERT and its variants

04/04/2023
by Kartik Bhatnagar, et al.

In this work, we develop language models for the Sanskrit language, namely Bidirectional Encoder Representations from Transformers (BERT) and its variants, A Lite BERT (ALBERT) and Robustly Optimized BERT (RoBERTa), using a Devanagari Sanskrit text corpus. We then extract features for a given text from these models and apply dimensionality reduction and clustering techniques to the features to generate an extractive summary of the given Sanskrit document. Alongside the extractive text summarization techniques, we have also created and publicly released a Sanskrit Devanagari text corpus.
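The abstract outlines a three-stage pipeline: sentence features from a BERT-style encoder, dimensionality reduction, then clustering to select summary sentences. Below is a minimal Python sketch of that pipeline; the checkpoint path is a placeholder, and the specific choices of PCA for reduction and k-means with nearest-to-centroid selection are assumptions for illustration, not details confirmed by the paper.

```python
# Sketch: BERT features -> dimensionality reduction -> clustering -> summary.
# MODEL_NAME is hypothetical; substitute an actual Sanskrit BERT checkpoint.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

MODEL_NAME = "path/to/san-bert"  # placeholder checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

def embed_sentences(sentences):
    """Mean-pool the final hidden states to get one feature vector per sentence."""
    vecs = []
    for s in sentences:
        inputs = tokenizer(s, return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, dim)
        vecs.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.stack(vecs)

def extractive_summary(sentences, n_clusters=3, n_components=50):
    feats = embed_sentences(sentences)
    # Reduce feature dimensionality (PCA here; the paper's method is assumed).
    n_components = min(n_components, len(sentences), feats.shape[1])
    reduced = PCA(n_components=n_components).fit_transform(feats)
    # Cluster sentences; the sentence nearest each centroid joins the summary.
    km = KMeans(n_clusters=min(n_clusters, len(sentences)), n_init=10).fit(reduced)
    picked = []
    for c in range(km.n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(reduced[members] - km.cluster_centers_[c], axis=1)
        picked.append(members[np.argmin(dists)])
    return [sentences[i] for i in sorted(picked)]
```

Selecting one sentence per cluster and emitting them in document order is a common recipe for embedding-based extractive summarization; the number of clusters effectively sets the summary length.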

