Improving Joint Layer RNN based Keyphrase Extraction by Using Syntactical Features

by Miftahul Mahfuzh, et al.

Keyphrase extraction, the task of identifying important words or phrases in a text, is a crucial step in identifying the main topics of texts from a social media platform. In our study, we focus on text written in Indonesian and taken from Twitter. The original joint layer recurrent neural network (JRNN) outputs one sequence of keywords and uses only word embeddings. Here we propose to modify the input layer of JRNN so that it extracts more than one sequence of keywords by adding syntactical features, namely part of speech, named entity types, and dependency structures. Since JRNN in general requires a large number of training examples and creating those examples is expensive, we used a data augmentation method to increase the number of training examples. Our experiments showed that our method outperformed the baseline methods, achieving 0.9597 in accuracy and 0.7691 in F1.
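One way to picture the modified input layer is as a per-token vector that concatenates the word embedding with encodings of the three syntactical features. The sketch below is only an illustration under stated assumptions: the tag inventories, the one-hot encoding, the embedding values, and the names `one_hot` and `build_input_vector` are all hypothetical and are not taken from the paper, which may encode these features differently.

```python
# Hypothetical tag inventories; the abstract does not list the actual tag sets.
# The last entry of each list doubles as a catch-all for unknown labels.
POS_TAGS = ["NOUN", "VERB", "ADJ", "OTHER"]
NE_TYPES = ["PERSON", "LOCATION", "ORGANIZATION", "O"]
DEP_RELS = ["root", "nsubj", "obj", "other"]


def one_hot(label, inventory):
    """Return a one-hot list for `label`; unknown labels map to the last slot."""
    index = inventory.index(label) if label in inventory else len(inventory) - 1
    return [1.0 if i == index else 0.0 for i in range(len(inventory))]


def build_input_vector(embedding, pos, ne, dep):
    """Concatenate a word embedding with one-hot syntactical features."""
    return (
        list(embedding)
        + one_hot(pos, POS_TAGS)
        + one_hot(ne, NE_TYPES)
        + one_hot(dep, DEP_RELS)
    )


# Toy 3-dimensional embedding for a single token (made-up values).
token_vector = build_input_vector([0.12, -0.40, 0.33], "NOUN", "LOCATION", "nsubj")
print(len(token_vector))  # 3 embedding dims + 4 + 4 + 4 feature dims
```

In this sketch, a sequence of such vectors (one per token in a tweet) would stand in for the plain word-embedding sequence as the network input. Learned feature embeddings would be an equally plausible alternative to one-hot vectors.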






