Semantic and Execution-Aware Extract Method Refactoring via Self-Supervised Learning and Reinforcement Learning-Based Model Alignment

dc.contributor.author: Palit, Indranil
dc.contributor.copyright-release: Yes
dc.contributor.degree: Master of Computer Science
dc.contributor.department: Faculty of Computer Science
dc.contributor.ethics-approval: Not Applicable
dc.contributor.external-examiner: n/a
dc.contributor.manuscripts: Yes
dc.contributor.thesis-reader: Dr. Masud Rahman
dc.contributor.thesis-reader: Dr. Janarthanan Rajendran
dc.contributor.thesis-supervisor: Dr. Tushar Sharma
dc.date.accessioned: 2024-12-16T15:44:39Z
dc.date.available: 2024-12-16T15:44:39Z
dc.date.defence: 2024-12-09
dc.date.issued: 2024-12-15
dc.description.abstract: Software code refactoring is essential for maintaining and improving code quality, yet it remains challenging for practitioners. While modern tools help identify where code needs refactoring, current implementation techniques often miss meaningful refactoring opportunities. This accumulates technical debt over time, making software increasingly difficult to maintain and evolve. This thesis presents an automated hybrid approach that identifies refactoring candidates and generates refactored code by leveraging language models and reinforcement learning. The first major contribution of the thesis addresses the shortcomings of automatic refactoring-candidate identification by training machine learning classifiers on rich code semantics. Unlike traditional approaches that rely on metrics and commit messages, we developed a self-supervised learning approach that identifies negative samples using state-of-the-art GraphCodeBERT embeddings. This approach achieves a 30% improvement in F1 score over existing metric-based techniques for automatically identifying extract method refactoring candidates. Our second contribution introduces a novel approach to automated code refactoring using reinforcement learning, with a specific focus on extract method refactoring. While recent advances in large language models have shown promise for code transformation, traditional supervised learning approaches often fail to produce reliable results. These models typically struggle to maintain code integrity because they treat code generation like text generation, overlooking crucial aspects such as compilability and functional correctness. To address this limitation, we develop a method that fine-tunes state-of-the-art pre-trained code language models (e.g., CodeT5) with Proximal Policy Optimization (PPO), creating a code-aware transformation framework. Our approach uses carefully designed reward signals based on successful compilation and adherence to established refactoring guidelines, moving beyond simple text-based metrics. When tested against conventional supervised learning methods, our system demonstrates significant improvements in both the quality and the quantity of generated refactorings.
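The compilation- and guideline-based reward signal mentioned in the abstract can be sketched in simplified form. This is a minimal illustration, not the thesis implementation: the function name, the reward weights, and the use of Python's `ast.parse` as a stand-in for an actual compiler check are all assumptions made for the sketch.

```python
import ast

def refactoring_reward(generated_code: str, extracted_method_count: int) -> float:
    """Illustrative reward for an extract method refactoring episode.

    A compilable result earns a base reward; adhering to the guideline of
    extracting exactly one method earns a bonus. Non-compilable output is
    penalized outright, so the policy learns that syntactic validity is a
    prerequisite for any positive reward.
    """
    try:
        # Stand-in for a real compilation check (e.g., invoking javac).
        ast.parse(generated_code)
    except SyntaxError:
        return -1.0  # non-compilable output is penalized
    reward = 1.0  # base reward for compilable code
    if extracted_method_count == 1:
        reward += 0.5  # bonus for following the one-extraction guideline
    return reward
```

In a PPO setup, a scalar reward like this would be attached to each generated sequence before the policy-update step; the actual thesis combines compilation success with established refactoring guidelines rather than these illustrative weights.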
dc.identifier.uri: https://hdl.handle.net/10222/84790
dc.language.iso: en_US
dc.subject: extract method refactoring
dc.subject: deep learning
dc.subject: code representation
dc.subject: reinforcement learning
dc.subject: large language models
dc.title: Semantic and Execution-Aware Extract Method Refactoring via Self-Supervised Learning and Reinforcement Learning-Based Model Alignment

Files

Original bundle

Name: IndranilPalit2024.pdf
Size: 2.11 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.03 KB
Format: Item-specific license agreed upon to submission