Software Entity Recognition with Noise-Robust Learning

08/21/2023
by Tai Nguyen, et al.

Recognizing software entities such as library names in free-form text is essential to many software engineering (SE) technologies, including traceability link recovery, automated documentation, and API recommendation. Many approaches have been proposed for this problem, but they suffer from small entity vocabularies or noisy training data, which limits their ability to recognize software entities mentioned in sophisticated narratives. To address this challenge, we leverage the Wikipedia taxonomy to build a comprehensive entity lexicon of 79K unique software entities in 12 fine-grained types, together with a large labeled dataset of over 1.7M sentences. We then propose self-regularization, a noise-robust learning approach that accounts for many dropouts, for training our software entity recognition (SER) model. Results show that models trained with self-regularization outperform both their vanilla counterparts and state-of-the-art approaches on our Wikipedia benchmark and two Stack Overflow benchmarks. We release our models, data, and code for future research.
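
The abstract does not spell out the self-regularization objective, so the following is only a minimal sketch under a stated assumption: that "accounting for many dropouts" means running the same input through the model several times with different dropout masks and penalizing disagreement between the resulting predictions, on top of the usual cross-entropy. The toy linear classifier, the function names, and the parameters `k`, `rate`, and `alpha` are all hypothetical and are not taken from the paper or its released code.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def dropout(features, rate, rng):
    """Inverted dropout: zero each feature with probability `rate`."""
    keep = 1.0 - rate
    return [f / keep if rng.random() < keep else 0.0 for f in features]


def forward(features, weights, rate, rng):
    """Toy linear classifier applied to a dropped-out copy of the input."""
    hidden = dropout(features, rate, rng)
    logits = [sum(w * x for w, x in zip(row, hidden)) for row in weights]
    return softmax(logits)


def self_regularized_loss(features, label, weights, k=2, rate=0.1,
                          alpha=1.0, rng=None):
    """Assumed objective: mean cross-entropy over k dropout passes plus
    alpha times the mean KL between the averaged prediction and each pass."""
    rng = rng or random.Random(0)
    preds = [forward(features, weights, rate, rng) for _ in range(k)]
    ce = -sum(math.log(p[label] + 1e-12) for p in preds) / k
    mean = [sum(p[i] for p in preds) / k for i in range(len(preds[0]))]
    agree = sum(kl(mean, p) for p in preds) / k
    return ce + alpha * agree, ce, agree


if __name__ == "__main__":
    weights = [[0.5, -0.2, 0.1], [0.1, 0.3, -0.4]]  # 2 classes, 3 features
    loss, ce, agree = self_regularized_loss(
        [1.0, 2.0, 3.0], 0, weights, k=4, rate=0.5, rng=random.Random(1))
    print(f"loss={loss:.4f} ce={ce:.4f} agreement_penalty={agree:.4f}")
```

With `rate=0.0` every pass is identical, so the agreement penalty vanishes and the loss reduces to plain cross-entropy. The intuition behind this family of objectives is that a noisy label tends to be fit inconsistently across dropout masks, so the penalty discourages memorizing it.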

