Modeling Noisiness to Recognize Named Entities using Multitask Neural Networks on Social Media

06/10/2019
by Gustavo Aguilar, et al.

Recognizing named entities in a document is a key task in many NLP applications. Although current state-of-the-art approaches to this task reach high performance on clean text (e.g., newswire genres), they degrade dramatically when moved to noisy environments such as social media domains. We present two systems that address the challenges of processing social media data using character-level phonetics and phonology, word embeddings, and Part-of-Speech tags as features. The first model is a multitask end-to-end Bidirectional Long Short-Term Memory (BLSTM)-Conditional Random Field (CRF) network whose output layer contains two CRF classifiers. The second model uses a multitask BLSTM network as a feature extractor that transfers the learning to a CRF classifier for the final prediction. Our systems outperform the current F1 scores of the state of the art on the Workshop on Noisy User-generated Text 2017 dataset by 2.45 suitable approach for social media environments.
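Both systems end in a CRF classifier, which labels a whole sentence jointly rather than token by token. The sketch below illustrates that idea with standard Viterbi decoding for a linear-chain CRF. It is a minimal illustration, not the authors' implementation: the label set, the emission scores (standing in for BLSTM outputs), and the transition scores are all hypothetical values chosen for the example.

```python
# Hypothetical label set and scores; none of these values come from the paper.
LABELS = ["O", "B", "I"]

# Transition scores between consecutive labels. Everything is neutral except
# O -> I, which is heavily penalized because "inside" should not follow "outside".
TRANSITIONS = {(p, y): 0.0 for p in LABELS for y in LABELS}
TRANSITIONS[("O", "I")] = -10.0

# Per-token emission scores, standing in for the output of a feature extractor.
EMISSIONS = [
    {"O": 2.0, "B": 1.0, "I": 0.0},
    {"O": 0.0, "B": 0.5, "I": 2.0},
    {"O": 2.0, "B": 0.0, "I": 0.0},
]


def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence under a linear-chain CRF."""
    # best[t][y]: score of the best path that ends with label y at position t.
    best = [{y: emissions[0][y] for y in labels}]
    back = []  # back[t - 1][y]: label at position t - 1 on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[(p, y)])
            scores[y] = best[-1][prev] + transitions[(prev, y)] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final label and follow the back-pointers.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def greedy_decode(emissions):
    """Pick the best label for each token independently, ignoring transitions."""
    return [max(scores, key=scores.get) for scores in emissions]


if __name__ == "__main__":
    print("greedy :", greedy_decode(EMISSIONS))
    print("viterbi:", viterbi_decode(EMISSIONS, TRANSITIONS, LABELS))
```

With these scores, per-token greedy decoding yields the inconsistent sequence O, I, O, while Viterbi decoding returns B, I, O because the transition penalty makes the first sequence costly as a whole.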


research
11/09/2019

Hate Speech Detection on Vietnamese Social Media Text using the Bidirectional-LSTM Model

In this paper, we describe our system which participates in the shared t...
research
01/27/2018

Deep Neural Networks In Fully Connected CRF For Image Labeling With Social Network Metadata

We propose a novel method for predicting image labels by fusing image co...
research
10/24/2015

Combine CRF and MMSEG to Boost Chinese Word Segmentation in Social Media

In this paper, we propose a joint algorithm for the word segmentation on...
research
12/06/2017

Discourse-Aware Rumour Stance Classification in Social Media Using Sequential Classifiers

Rumour stance classification, defined as classifying the stance of speci...
research
01/30/2020

An Efficient Architecture for Predicting the Case of Characters using Sequence Models

The dearth of clean textual data often acts as a bottleneck in several n...
research
01/24/2023

Multitask Instruction-based Prompting for Fallacy Recognition

Fallacies are used as seemingly valid arguments to support a position an...
research
12/17/2019

To What Extent are Name Variants Used as Named Entities in Turkish Tweets?

Social media texts differ from regular texts in various aspects. One of ...
