E-SC4R: Explaining Software Clustering for Remodularisation

by Alvin Jian Jia Tan, et al.

Maintaining existing software requires a large amount of time spent comprehending the source code. The architecture of a software system, however, may not be clear to maintainers if up-to-date documentation is not available. Software clustering is often used as a remodularisation and architecture recovery technique to help recover a semantic representation of the software design. Because software systems differ widely in domain, structure, and behaviour, the suitability of different clustering algorithms for different software systems has not been investigated thoroughly. Studies that introduce new clustering techniques usually validate their approaches on a specific domain, which might limit their generalisability. If the chosen test subjects represent only a narrow slice of the whole picture, researchers risk being unable to address the external validity of their findings. This work aims to fill this gap by introducing a new approach, Explaining Software Clustering for Remodularisation (E-SC4R), to evaluate the effectiveness of different software clustering approaches. It focuses on hierarchical clustering and Bunch clustering algorithms and provides information about their suitability according to the features of the software, which in turn enables the selection of the most suitable algorithm and configuration from the existing pool of choices for a particular software system. The proposed framework is tested on 30 open source software systems of varying sizes and domains, and demonstrates that it can characterise both the strengths and weaknesses of the analysed software clustering algorithms using software features extracted from the code. The proposed approach also provides a better understanding of the algorithms' behaviour through the application of dimensionality reduction techniques.
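To make the idea of hierarchical (agglomerative) software clustering concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that each class is described by the set of classes it depends on, that dissimilarity is the Jaccard distance between those sets, and that the linkage rule is a configurable choice. The class names, the `agglomerative` function, and its parameters are all hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative(features, n_clusters, linkage="average"):
    """Merge the two closest clusters until only n_clusters remain.

    features: dict mapping an entity name to its set of dependencies
              (an illustrative feature choice, not taken from the paper).
    linkage:  "single", "complete", or "average".
    """
    # Start with every entity in its own cluster.
    clusters = [[name] for name in sorted(features)]

    def cluster_distance(c1, c2):
        pairwise = [
            jaccard_distance(features[x], features[y]) for x in c1 for y in c2
        ]
        if linkage == "single":
            return min(pairwise)
        if linkage == "complete":
            return max(pairwise)
        return sum(pairwise) / len(pairwise)

    while len(clusters) > n_clusters:
        # Find the indices of the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return sorted(clusters)


# Toy system: two parsing-related classes and two networking-related classes.
toy_features = {
    "Parser": {"Token", "Lexer"},
    "Lexer": {"Token"},
    "HttpClient": {"Socket", "Url"},
    "HttpServer": {"Socket", "Url", "Router"},
}

modules = agglomerative(toy_features, n_clusters=2)
print(modules)
```

On this toy input the two parsing classes and the two networking classes end up in separate modules whichever linkage is chosen. On real systems the linkage rule and distance measure can change the result, which is the kind of configuration sensitivity the abstract refers to.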





On the Effect of Semantically Enriched Context Models on Software Modularization

Many of the existing approaches for program comprehension rely on the li...

Uncovering the epistemological and ontological assumptions of software designers

The ontological and epistemological positions adopted by information sys...

Recovery of Architecture Module Views using an Optimized Algorithm Based on Design Structure Matrices

Design structure matrices (DSMs) are useful to represent high-level syst...

Recover and RELAX: Concern-Oriented Software Architecture Recovery for Systems Development and Maintenance

The stakeholders of a system are legitimately interested in whether and ...

Discovery of Layered Software Architecture from Source Code Using Ego Networks

Software architecture refers to the high-level abstraction of a system i...

Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation

A good clustering can help a data analyst to explore and understand a da...