Longitudinal Analysis of Code Smell Evolution and Refactoring Effectiveness

Date

2025-07-24

Abstract

Refactoring and code smells are closely linked concepts in software engineering. Refactoring improves the internal structure of code without altering its external behavior, whereas code smells are symptoms of deeper design or implementation problems. Although refactoring is commonly assumed to remove code smells, the extent to which it does so in practice remains unclear. This thesis empirically investigates how code smells and refactorings evolve over time. As a preliminary effort, we contributed to a machine learning-based approach that identifies extract method refactoring candidates using semantic code representations derived from a pre-trained large language model; this approach achieves a 30% improvement in F1 score over traditional metric-based baselines. The main contribution of this thesis is a comprehensive empirical study of the evolution of code smells and of how effectively refactorings address them. We analyzed over 212,000 commits from 87 open-source Java projects to examine how code smells appear, persist, and are removed. We first constructed an initial mapping between code smells and refactorings from their collocation patterns within commit histories. To deepen this understanding, we performed an extensive manual analysis, combining human judgment with the assistance of a large language model to interpret complex cases. Our findings reveal that, although extract method refactoring is among the more impactful techniques, most refactorings do not directly remove code smells. Many smells are removed without being explicitly linked to a refactoring action, and design smells tend to persist significantly longer than implementation smells. These insights advance our understanding of the interplay between code smells and refactoring in software evolution and can inform the development of more effective tools and practices for software maintenance.
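To make the idea of a collocation-based mapping more concrete, the following is a minimal sketch, not the thesis's actual pipeline. It assumes a hypothetical per-commit record (`Commit`, with `refactorings` and `removed_smells` sets) and counts how often each refactoring type appears in the same commit as the removal of each smell type. Collocation here is assumed to be at commit granularity; the study may use a finer scope, such as the same method or class. Smell removals in commits with no detected refactoring are counted separately as an illustrative stand-in for removals "not explicitly linked to a refactoring action". All names and labels are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical commit record: refactoring types detected in the commit
    and smell types whose instances disappear in that same commit."""
    sha: str
    refactorings: set[str] = field(default_factory=set)
    removed_smells: set[str] = field(default_factory=set)


def collocation_counts(commits):
    """Count (refactoring, smell) pairs that co-occur in the same commit.

    Returns:
      pair_counts:        Counter of (refactoring, smell) -> number of commits
      refactoring_counts: Counter of refactoring -> number of commits containing it
      unlinked_removals:  Counter of smell -> removals in commits with no refactoring
    """
    pair_counts = Counter()
    refactoring_counts = Counter()
    unlinked_removals = Counter()
    for commit in commits:
        for ref in commit.refactorings:
            refactoring_counts[ref] += 1
        for smell in commit.removed_smells:
            if commit.refactorings:
                # Every refactoring in the commit is collocated with this removal.
                for ref in commit.refactorings:
                    pair_counts[(ref, smell)] += 1
            else:
                # Removal with no refactoring detected in the same commit.
                unlinked_removals[smell] += 1
    return pair_counts, refactoring_counts, unlinked_removals


def collocation_ratio(pair_counts, refactoring_counts):
    """Fraction of commits containing refactoring r that also remove smell s."""
    return {
        (ref, smell): n / refactoring_counts[ref]
        for (ref, smell), n in pair_counts.items()
    }


# Illustrative input only; these are not data from the study.
commits = [
    Commit("a1", {"Extract Method"}, {"Long Method"}),
    Commit("b2", {"Extract Method", "Rename Variable"}, set()),
    Commit("c3", set(), {"God Class"}),
]
pairs, refs, unlinked = collocation_counts(commits)
ratios = collocation_ratio(pairs, refs)
print(ratios)    # {('Extract Method', 'Long Method'): 0.5}
print(unlinked)  # Counter({'God Class': 1})
```

A raw co-occurrence ratio like this only suggests candidate links between refactorings and smell removals. It cannot tell whether a refactoring actually caused a removal, which is one reason a mapping of this kind would need the manual analysis described above.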

Keywords

code smell, refactoring, software quality, software evolution