Monolith Development History for Microservices Identification: a Comparative Analysis
Recent research has proposed different approaches to the automated identification of candidate microservices in monolith systems, which vary in the monolith representation, similarity criteria, and quality metrics used. However, these approaches are generally evaluated on a limited number of codebases and decompositions, and few comparisons between approaches exist. Considering the emerging trend in software engineering toward techniques based on the analysis of codebases' evolution, we compare a representation based on the monolith code structure, in particular the sequences of accesses to domain entities, with representations based on the monolith development history (file changes and changes authorship). From the analysis of a total of 468k decompositions of 28 codebases, using five quality metrics that evaluate modularity, minimization of the number of transactions per functionality, and reduction of teams and communication, we conclude that the best decompositions for each metric were obtained by combining data from the sequences of accesses and the development history representations. We also found that the changes authorship representation of codebases with many authors achieves comparable or better results than the sequences of accesses representation of codebases with few authors with respect to minimizing the number of transactions per functionality and reducing the number of teams.
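To make the file changes representation more concrete, the following is a minimal sketch of how a similarity between files could be derived from commit history. The abstract does not state the similarity measure used, so the Jaccard-style co-change ratio, the function name `cochange_similarity`, and the sample commits are all illustrative assumptions rather than the authors' method.

```python
from collections import Counter
from itertools import combinations


def cochange_similarity(commits):
    """Return a Jaccard-style co-change score for every pair of files.

    `commits` is an iterable of sets of file names changed together.
    Hypothetical helper: the measure is an assumption, not taken from the paper.
    """
    changes = Counter()   # number of commits touching each file
    together = Counter()  # number of commits touching both files of a pair
    for files in commits:
        files = set(files)
        changes.update(files)
        for a, b in combinations(sorted(files), 2):
            together[(a, b)] += 1
    # shared commits divided by commits touching either file
    return {
        (a, b): n / (changes[a] + changes[b] - n)
        for (a, b), n in together.items()
    }


# Made-up commit history, for illustration only.
commits = [
    {"Order.java", "Invoice.java"},
    {"Order.java", "Invoice.java", "User.java"},
    {"User.java"},
]
sim = cochange_similarity(commits)
```

Under these assumptions, files that always change together score 1.0, and such scores could feed a clustering step that groups files into candidate microservices. An authorship-based variant could be built the same way by replacing each commit's file set with the set of files touched by each author.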