Detecting Potential Topics In News Using BERT, CRF and Wikipedia

02/26/2020
by Swapnil Ashok Jadhav, et al.

For a news content distribution platform like Dailyhunt, Named Entity Recognition is a pivotal task for building better user recommendation and notification algorithms. Apart from identifying names, locations and organisations in news for 13+ Indian languages and using them in these algorithms, we also need to identify n-grams which do not necessarily fit the definition of a named entity, yet are important. Examples include "me too movement", "beef ban" and "alwar mob lynching". In this exercise, given an English-language text, we try to detect case-less n-grams which convey important information and can be used as topics and/or hashtags for a news article. The model is built using Wikipedia titles data, a private English news corpus, the BERT-Multilingual pre-trained model, and a Bi-GRU and CRF architecture. It shows promising results when compared with the industry-best Flair, spaCy and Stanford-caseless-NER in terms of F1 and especially recall.
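
The abstract does not say how the Wikipedia titles enter the pipeline. As an illustration only, one plausible way to use them is as a dictionary that marks case-less n-grams in text with B/I/O tags, which is the kind of per-token labelling a CRF sequence tagger consumes. The sketch below assumes that setup; the function names `bio_tag` and `decode_topics`, the `max_len` parameter and the tiny title set are hypothetical and do not come from the paper.

```python
from typing import List, Set, Tuple


def bio_tag(tokens: List[str], titles: Set[Tuple[str, ...]], max_len: int = 4) -> List[str]:
    """Greedy longest-match B/I/O tagging of lower-cased tokens against a title set.

    Assumption: `titles` holds lower-cased, tokenised titles (hypothetical format).
    """
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(lowered):
        matched = 0
        # Try the longest candidate n-gram starting at position i first.
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            if tuple(lowered[i:i + n]) in titles:
                matched = n
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched
        else:
            i += 1
    return tags


def decode_topics(tokens: List[str], tags: List[str]) -> List[str]:
    """Turn a B/I/O tag sequence back into case-less topic strings."""
    topics: List[str] = []
    current: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                topics.append(" ".join(current))
            current = [tok.lower()]
        elif tag == "I" and current:
            current.append(tok.lower())
        else:
            if current:
                topics.append(" ".join(current))
            current = []
    if current:
        topics.append(" ".join(current))
    return topics


# Hypothetical title set built from two of the abstract's example n-grams.
titles = {("me", "too", "movement"), ("beef", "ban")}
tokens = "The Me Too Movement and the beef ban made headlines".split()
tags = bio_tag(tokens, titles)
topics = decode_topics(tokens, tags)
```

In this sketch the dictionary match only produces labels. The model described in the abstract (BERT-Multilingual with Bi-GRU and CRF) would be what predicts such tags on unseen text, including n-grams that are not in any title list.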

research
11/22/2021

Namesakes: Ambiguously Named Entities from Wikipedia and News

We present Namesakes, a dataset of ambiguously named entities obtained f...
research
09/23/2019

Portuguese Named Entity Recognition using BERT-CRF

Recent advances in language representation using neural networks have ma...
research
08/28/2023

ANER: Arabic and Arabizi Named Entity Recognition using Transformer-Based Approach

One of the main tasks of Natural Language Processing (NLP), is Named Ent...
research
09/17/2018

Similarity measure for Public Persons

For the webportal "Who is in the News!" with statistics about the appear...
research
12/14/2022

Building and Evaluating Universal Named-Entity Recognition English corpus

This article presents the application of the Universal Named Entity fram...
research
11/21/2019

Global Health Monitor: A Web-based System for Detecting and Mapping Infectious Diseases

We present the Global Health Monitor, an online Web-based system for det...
research
07/02/2020

NLNDE: Enhancing Neural Sequence Taggers with Attention and Noisy Channel for Robust Pharmacological Entity Detection

Named entity recognition has been extensively studied on English news te...
