Code and Named Entity Recognition in StackOverflow

05/04/2020
by Jeniya Tabassum, et al.

There is increasing interest in studying natural language and computer code together, as large corpora of programming texts become readily available on the Internet. For example, StackOverflow currently has over 15 million programming-related questions written by 8.5 million users. Meanwhile, there is still a lack of fundamental NLP techniques for identifying code tokens or software-related named entities that appear within natural language sentences. In this paper, we introduce a new named entity recognition (NER) corpus for the computer programming domain, consisting of 15,372 sentences annotated with 20 fine-grained entity types. We also present the SoftNER model, which combines contextual information with domain-specific knowledge using an attention network. The code token recognizer, combined with an entity segmentation model that we propose, consistently improves the performance of the named entity tagger. Our proposed SoftNER tagger outperforms the BiLSTM-CRF model with an absolute increase of +9.73 in F1 score on StackOverflow data.
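To make the evaluation setting concrete, the following is a minimal sketch of how NER output is commonly scored, assuming BIO-style tags and the conventional span-level exact-match F1. The abstract does not state the tagging scheme or the scoring script, so both are assumptions here. The tag names `Function` and `Language`, the example sentence, and the helpers `bio_to_spans` and `span_f1` are hypothetical illustrations, not taken from the paper or its corpus.

```python
from typing import List, Optional, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect (start, end, type) spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    # A trailing "O" sentinel closes any span that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Span-level F1: a prediction counts only if boundaries and type match."""
    gold_spans, pred_spans = set(), set()
    for sent_idx, (g, p) in enumerate(zip(gold, pred)):
        gold_spans |= {(sent_idx,) + s for s in bio_to_spans(g)}
        pred_spans |= {(sent_idx,) + s for s in bio_to_spans(p)}
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence mixing prose with a code token.
tokens = ["Call", "json.loads", "in", "Python"]
gold = [["O", "B-Function", "O", "B-Language"]]
pred = [["O", "B-Function", "O", "O"]]  # the tagger misses "Python"
print(f"span-level F1: {span_f1(gold, pred):.3f}")
```

In this toy case the tagger recovers one of the two gold entities, so precision is 1.0 and recall is 0.5. An absolute gain such as the +9.73 reported above would, under this assumption, be a difference between two such scores expressed on a 0 to 100 scale.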
