GRPO-Rad: Group Relative Policy Optimization for Radiology Report Summarization

Abstract

Radiology report summarization requires condensing detailed findings into concise impressions, a task in which traditional supervised fine-tuning (SFT) often struggles to balance syntactic correctness, clinical accuracy, and brevity. This thesis investigates Group Relative Policy Optimization (GRPO) as an alternative that directly optimizes a composite reward function combining ROUGE-L syntactic similarity and a length constraint. Using the MIMIC-III dataset and Qwen 3.0 decoder-only models (0.6B and 1.7B parameters) with parameter-efficient LoRA fine-tuning, we systematically evaluate 24 configurations that vary model size, prompting, and few-shot learning. The results show that GRPO consistently outperforms both the zero-shot baseline and SFT on syntactic (ROUGE-L) and clinical (F1-RadGraph) metrics. The best GRPO configuration achieves 32.65 ROUGE-L and 30.28 F1-RadGraph, a statistically significant 16% improvement over SFT (p < 0.05). This work presents the first application of GRPO to medical text and establishes it as a robust framework for clinical documentation tasks that require multi-objective optimization.
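
The abstract does not give the exact reward formulation, so the following is only a minimal sketch of how a composite reward of this kind and the group-relative advantage used by GRPO could be computed. The weights `w_rouge` and `w_len`, the token budget `max_tokens`, the linear length decay, and all function names are assumptions made for illustration, not the thesis's actual settings. ROUGE-L is approximated here with whitespace tokens and no stemming.

```python
from statistics import mean, pstdev


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 over lowercased whitespace tokens (a simplification)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def length_score(candidate, max_tokens=40):
    """Assumed length term: 1.0 within budget, linear decay to 0 at 2x budget."""
    n = len(candidate.split())
    if n <= max_tokens:
        return 1.0
    return max(0.0, 1.0 - (n - max_tokens) / max_tokens)


def composite_reward(candidate, reference, w_rouge=0.8, w_len=0.2, max_tokens=40):
    """Weighted sum of ROUGE-L F1 and the length term (weights are hypothetical)."""
    return (w_rouge * rouge_l_f1(candidate, reference)
            + w_len * length_score(candidate, max_tokens))


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In use, several candidate impressions would be sampled for the same findings section, each scored with `composite_reward`, and the resulting list passed to `group_relative_advantages`. Candidates that score above the group mean receive positive advantages, so no separate value model is needed.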

Keywords

Group Relative Policy Optimization, Medical Text Summarization, Reinforcement Learning from Human Feedback
