GRPO-Rad: Group Relative Policy Optimization for Radiology Report Summarization
| dc.contributor.author | Nassiri, Fargol | |
| dc.contributor.copyright-release | Not Applicable | |
| dc.contributor.degree | Master of Computer Science | |
| dc.contributor.department | Faculty of Computer Science | |
| dc.contributor.ethics-approval | Not Applicable | |
| dc.contributor.external-examiner | N/A | |
| dc.contributor.manuscripts | No | |
| dc.contributor.thesis-reader | Vlado Keselj | |
| dc.contributor.thesis-reader | Hassan Sajjad | |
| dc.contributor.thesis-supervisor | Frank Rudzicz | |
| dc.date.accessioned | 2025-12-11T15:22:40Z | |
| dc.date.available | 2025-12-11T15:22:40Z | |
| dc.date.defence | 2025-11-28 | |
| dc.date.issued | 2025-12-10 | |
| dc.description.abstract | Radiology report summarization requires condensing detailed findings into concise impressions, a task in which traditional supervised fine-tuning (SFT) often struggles to balance lexical fidelity, clinical accuracy, and brevity. This thesis investigates Group Relative Policy Optimization (GRPO) as a superior alternative that directly optimizes a composite reward combining ROUGE-L similarity with a length constraint. Using the MIMIC-III dataset and Qwen3 decoder-only models (0.6B and 1.7B parameters) with parameter-efficient LoRA fine-tuning, we systematically evaluate 24 configurations that vary model size, prompting strategy, and few-shot learning. Results show that GRPO consistently outperforms both the zero-shot baseline and SFT on lexical (ROUGE-L) and clinical (F1-RadGraph) metrics. The best GRPO configuration achieves 32.65 ROUGE-L and 30.28 F1-RadGraph, a statistically significant 16% improvement over SFT (p < 0.05). This work presents the first application of GRPO to medical text, establishing it as a robust framework for clinical documentation tasks that require multi-objective optimization. | |
| dc.identifier.uri | https://hdl.handle.net/10222/85559 | |
| dc.language.iso | en | |
| dc.subject | Group Relative Policy Optimization | |
| dc.subject | Medical Text Summarization | |
| dc.subject | Reinforcement Learning from Human Feedback | |
| dc.title | GRPO-Rad: Group Relative Policy Optimization for Radiology Report Summarization | |
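The abstract's core mechanism, GRPO driven by a composite reward of ROUGE-L plus a length constraint, can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis implementation. The weight `alpha`, the word budget `max_words`, the linear length penalty, and all function names are hypothetical. ROUGE-L is computed from scratch on lowercased whitespace tokens rather than with an official scorer. Advantages follow the standard GRPO recipe: each sampled completion's reward is normalized by the mean and standard deviation of its group.

```python
import statistics
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 on lowercased whitespace tokens (no stemming)."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def length_reward(candidate: str, max_words: int = 30) -> float:
    """HYPOTHETICAL length term: 1.0 within the word budget, decaying linearly to 0 beyond it."""
    n = len(candidate.split())
    if n <= max_words:
        return 1.0
    return max(0.0, 1.0 - (n - max_words) / max_words)


def composite_reward(candidate: str, reference: str,
                     alpha: float = 0.8, max_words: int = 30) -> float:
    """HYPOTHETICAL weighted sum of ROUGE-L F1 and the length term."""
    return (alpha * rouge_l_f1(candidate, reference)
            + (1 - alpha) * length_reward(candidate, max_words))


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: (r_i - group mean) / (group std + eps).

    Uses the population standard deviation; implementations differ on this detail.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    reference = "No acute cardiopulmonary process."
    # Hypothetical group of impressions sampled from the policy for one report.
    group = [
        "No acute cardiopulmonary process.",
        "No acute process in the chest.",
        "Mild cardiomegaly without acute findings.",
        "The lungs are clear bilaterally and there is no evidence of focal "
        "consolidation, pleural effusion, or pneumothorax, and the cardiac "
        "and mediastinal silhouettes are within normal limits on this study.",
    ]
    rewards = [composite_reward(c, reference) for c in group]
    advantages = group_relative_advantages(rewards)
    for cand, r, a in zip(group, rewards, advantages):
        print(f"reward={r:.3f} advantage={a:+.3f}  {cand[:40]}")
```

Because the advantage is relative to the group, no learned value model is needed: completions that beat their siblings on the composite reward are reinforced, and the rest are pushed down. In a full trainer these advantages would weight the clipped policy-gradient loss on the LoRA parameters.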
