San-BERT: Extractive Summarization for Sanskrit Documents using BERT and its variants

04/04/2023
by Kartik Bhatnagar, et al.

In this work, we develop language models for the Sanskrit language, namely Bidirectional Encoder Representations from Transformers (BERT) and its variants, A Lite BERT (ALBERT) and Robustly Optimized BERT (RoBERTa), using a Devanagari Sanskrit text corpus. We then extract features for a given text from these models and apply dimensionality reduction and clustering techniques to the features to generate an extractive summary of the given Sanskrit document. Alongside the extractive text summarization techniques, we have also created and publicly released a Sanskrit Devanagari text corpus.
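The abstract outlines a three-stage pipeline: sentence features from a BERT-style encoder, dimensionality reduction, then clustering to select summary sentences. Below is a minimal Python sketch of that pipeline; the checkpoint path is a placeholder, and the specific choices of PCA for reduction and k-means with nearest-to-centroid selection are assumptions for illustration, not details confirmed by the paper.

```python
# Sketch: BERT features -> dimensionality reduction -> clustering -> summary.
# MODEL_NAME is hypothetical; substitute an actual Sanskrit BERT checkpoint.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

MODEL_NAME = "path/to/san-bert"  # placeholder checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

def embed_sentences(sentences):
    """Mean-pool the final hidden states to get one feature vector per sentence."""
    vecs = []
    for s in sentences:
        inputs = tokenizer(s, return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, dim)
        vecs.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.stack(vecs)

def extractive_summary(sentences, n_clusters=3, n_components=50):
    feats = embed_sentences(sentences)
    # Reduce feature dimensionality (PCA here; the paper's method is assumed).
    n_components = min(n_components, len(sentences), feats.shape[1])
    reduced = PCA(n_components=n_components).fit_transform(feats)
    # Cluster sentences; the sentence nearest each centroid joins the summary.
    km = KMeans(n_clusters=min(n_clusters, len(sentences)), n_init=10).fit(reduced)
    picked = []
    for c in range(km.n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(reduced[members] - km.cluster_centers_[c], axis=1)
        picked.append(members[np.argmin(dists)])
    return [sentences[i] for i in sorted(picked)]
```

Selecting one sentence per cluster and emitting them in document order is a common recipe for embedding-based extractive summarization; the number of clusters effectively sets the summary length.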

