An Ensemble Approach for Annotating Source Code Identifiers with Part-of-speech Tags

09/01/2021
by Christian D. Newman, et al.

This paper presents an ensemble part-of-speech tagging approach for source code identifiers. Ensemble tagging is a technique that uses machine learning and the output of multiple part-of-speech taggers to annotate natural language text at a higher quality than any of those taggers can achieve independently. Our ensemble uses three state-of-the-art part-of-speech taggers: SWUM, POSSE, and Stanford. We study the quality of the ensemble's annotations on five different types of identifier names (function, class, attribute, parameter, and declaration statement) at the level of both individual words and full identifier names. We also study and discuss the weaknesses of our tagger to promote the future amelioration of these problems through further research. Our results show that the ensemble achieves 75% accuracy at the identifier level and 84-86% accuracy at the word level. This is an increase of 17 percentage points at the identifier level over the closest independent part-of-speech tagger.
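To make the two evaluation levels concrete, here is a minimal Python sketch. It is not the paper's method: the abstract says the ensemble uses machine learning to combine tagger output, whereas this sketch substitutes a plain majority vote purely for illustration. The three tagger outputs, the example identifiers, and the tag labels (`V`, `N`, `NM`) are made-up assumptions, not results from SWUM, POSSE, or Stanford.

```python
from collections import Counter

def majority_vote(tags_per_tagger):
    """Combine per-word tags from several taggers by simple majority.

    tags_per_tagger holds one tag sequence per tagger, all the same length.
    Ties go to the earliest tagger in the list. This is a stand-in for the
    learned combiner described in the abstract, not a reimplementation.
    """
    combined = []
    for word_tags in zip(*tags_per_tagger):
        counts = Counter(word_tags)
        best = max(counts.values())
        combined.append(next(t for t in word_tags if counts[t] == best))
    return combined

def word_accuracy(predicted, gold):
    """Fraction of individual words whose tag matches the gold tag."""
    total = sum(len(g) for g in gold)
    correct = sum(p == g
                  for ps, gs in zip(predicted, gold)
                  for p, g in zip(ps, gs))
    return correct / total

def identifier_accuracy(predicted, gold):
    """Fraction of identifiers whose whole tag sequence matches the gold one."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# Hypothetical gold annotations for two identifiers:
# get_user_name and max_buffer_size, already split into words.
gold = [["V", "NM", "N"], ["NM", "NM", "N"]]

# Hypothetical outputs of three independent taggers for the same identifiers.
tagger_outputs = [
    [["V", "N", "N"], ["NM", "N", "N"]],   # tagger A
    [["V", "NM", "N"], ["N", "N", "N"]],   # tagger B
    [["N", "NM", "N"], ["NM", "NM", "N"]], # tagger C
]

# Vote identifier by identifier across the three taggers.
ensemble = [majority_vote(list(per_id)) for per_id in zip(*tagger_outputs)]

print("word-level accuracy:", word_accuracy(ensemble, gold))
print("identifier-level accuracy:", identifier_accuracy(ensemble, gold))
```

On this toy data the vote tags five of six words correctly but only one of two identifiers, because a single wrong word makes the whole identifier count as wrong. That is why an identifier-level figure can sit well below the word-level figure for the same tagger.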


research
02/15/2022

Unsupervised word-level prosody tagging for controllable speech synthesis

Although word-level prosody modeling in neural text-to-speech (TTS) has ...
research
04/01/2017

Topic modeling of public repositories at scale using names in source code

Programming languages themselves have a limited number of reserved keywo...
research
07/15/2020

On the Generation, Structure, and Semantics of Grammar Patterns in Source Code Identifiers

Identifiers make up a majority of the text in code. They are one of the ...
research
04/06/2022

Yunshan Cup 2020: Overview of the Part-of-Speech Tagging Task for Low-resourced Languages

The Yunshan Cup 2020 track focused on creating a framework for evaluatin...
research
08/01/2017

Improving Part-of-Speech Tagging for NLP Pipelines

This paper outlines the results of sentence level linguistics based rule...
research
07/09/2020

Automatic Personality Prediction; an Enhanced Method Using Ensemble Modeling

Human personality is significantly represented by those words which he/s...
research
02/07/2022

Over-the-Air Ensemble Inference with Model Privacy

We consider distributed inference at the wireless edge, where multiple c...
