Semantic and Execution-Aware Extract Method Refactoring via Self-Supervised Learning and Reinforcement Learning-Based Model Alignment

dc.contributor.author: Palit, Indranil
dc.contributor.copyright-release: Yes
dc.contributor.degree: Master of Computer Science
dc.contributor.department: Faculty of Computer Science
dc.contributor.ethics-approval: Not Applicable
dc.contributor.external-examiner: n/a
dc.contributor.manuscripts: Yes
dc.contributor.thesis-reader: Dr. Masud Rahman
dc.contributor.thesis-reader: Dr. Janarthanan Rajendran
dc.contributor.thesis-supervisor: Dr. Tushar Sharma
dc.date.accessioned: 2024-12-16T15:44:39Z
dc.date.available: 2024-12-16T15:44:39Z
dc.date.defence: 2024-12-09
dc.date.issued: 2024-12-15
dc.description.abstract: Software code refactoring is essential for maintaining and improving code quality, yet it remains challenging for practitioners. While modern tools help identify where code needs refactoring, current implementation techniques often miss meaningful refactoring opportunities. This accumulates technical debt over time, making software increasingly difficult to maintain and evolve. This thesis presents an automated hybrid approach that identifies refactoring candidates and generates refactored code by leveraging language models and reinforcement learning. The first major contribution of the thesis addresses the shortcomings of automatic refactoring-candidate identification by training machine learning classifiers on rich code semantics. Unlike traditional approaches that rely on metrics and commit messages, we developed a self-supervised learning approach that identifies negative samples using state-of-the-art GraphCodeBERT embeddings. This approach achieves a 30% improvement in F1 score over existing metric-based techniques for automatically identifying extract method refactoring candidates. Our second contribution introduces a novel approach to automated code refactoring using reinforcement learning, with a specific focus on extract method refactoring. While recent advances in large language models have shown promise for code transformation, traditional supervised learning approaches often fail to produce reliable results. These models typically struggle to maintain code integrity because they treat code generation like text generation, overlooking crucial aspects such as compilability and functional correctness. To address this limitation, we develop a method that fine-tunes state-of-the-art pre-trained code language models (e.g., CodeT5) with Proximal Policy Optimization (PPO), creating a code-aware transformation framework. Our approach uses carefully designed reward signals based on successful compilation and adherence to established refactoring guidelines, moving beyond simple text-based metrics. When tested against conventional supervised learning methods, our system demonstrates significant improvements in both the quality and the quantity of generated refactorings.
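The compilation- and guideline-based reward signal mentioned in the abstract can be sketched in simplified form. This is a minimal illustration, not the thesis implementation: the function name, the reward weights, and the use of Python's `ast.parse` as a stand-in for an actual compiler check are all assumptions made for the sketch.

```python
import ast

def refactoring_reward(generated_code: str, extracted_method_count: int) -> float:
    """Illustrative reward for an extract method refactoring episode.

    A compilable result earns a base reward; adhering to the guideline of
    extracting exactly one method earns a bonus. Non-compilable output is
    penalized outright, so the policy learns that syntactic validity is a
    prerequisite for any positive reward.
    """
    try:
        # Stand-in for a real compilation check (e.g., invoking javac).
        ast.parse(generated_code)
    except SyntaxError:
        return -1.0  # non-compilable output is penalized
    reward = 1.0  # base reward for compilable code
    if extracted_method_count == 1:
        reward += 0.5  # bonus for following the one-extraction guideline
    return reward
```

In a PPO setup, a scalar reward like this would be attached to each generated sequence before the policy-update step; the actual thesis combines compilation success with established refactoring guidelines rather than these illustrative weights.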
dc.identifier.uri: https://hdl.handle.net/10222/84790
dc.language.iso: en_US
dc.subject: extract method refactoring
dc.subject: deep learning
dc.subject: code representation
dc.subject: reinforcement learning
dc.subject: large language models
dc.title: Semantic and Execution-Aware Extract Method Refactoring via Self-Supervised Learning and Reinforcement Learning-Based Model Alignment

Files

Original bundle

Name: IndranilPalit2024.pdf
Size: 2.11 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.03 KB
Format: Item-specific license agreed upon to submission