Repository logo

A Graph-Based Machine Learning Approach for Feature Set Matching: Towards Multi-Factorial Donor-Recipient Matching for Kidney Transplantation

dc.contributor.authorMajouni, Sheida
dc.contributor.copyright-releaseNot Applicable
dc.contributor.degreeDoctor of Philosophy
dc.contributor.departmentFaculty of Computer Science
dc.contributor.ethics-approvalNot Applicable
dc.contributor.external-examinerDr. Jose Manuel Juarez Herrero
dc.contributor.manuscriptsNot Applicable
dc.contributor.thesis-readerDr. Hassan Sajjad
dc.contributor.thesis-readerDr. Samina Abidi
dc.contributor.thesis-supervisorDr. Syed Sibte Raza Abidi
dc.contributor.thesis-supervisorDr. Karthik Tennankore
dc.date.accessioned2026-03-11T12:29:35Z
dc.date.available2026-03-11T12:29:35Z
dc.date.defence2026-03-06
dc.date.issued2026-03-10
dc.descriptionMany decision-support tasks require determining optimal correspondences between two heterogeneous sets of entities, services to clients, and donors to recipients. Heterogeneous feature set matching (FSM) problems involve complex relationships, multiple criteria, and constraints that challenge traditional machine learning (ML) methods. These models usually use flat feature vectors and overlook interaction networks. We present a novel graph-based representation-learning framework that predicts optimal matches. The primary contribution is the construction of a bipartite knowledge graph integrating structured domain knowledge, temporal dynamics, and multi-relational interactions as diverse multi-typed edges. To address scalability and training instability in large, dense graphs, we introduce an edge-pruning strategy that preserves the connections. We formulate FSM as link prediction and develop a Graph Neural Network (GNN) architecture that learns embeddings for heterogeneous entities. The model captures attribute-level similarity and relational context, enabling predictions, rankings, and interpretable matching. A key component is a novel, temporal-aware graph partitioning technique that addresses limitations of standard graph splitting, preventing information leakage. We validate this framework through a large-scale case study on donor-recipient matching (DRM) problem in kidney transplantation (KT), involving 27,000 patients and 49 million relationships, using the Scientific Registry of Transplant Recipients (SRTR) dataset. The framework outperforms well-established ML models (e.g., XGBoost, random forest, logistic regression) and shows the ability to discover hidden compatibility structure. Beyond predictive accuracy, this thesis emphasizes interpretability. By integrating domain knowledge into the graph structure and using attention-based message passing, our framework examines how compatibility is acquired. Through layer-wise embedding analysis, attention weight inspection, and neighborhood influence tracing, we demonstrate which donor and recipient features contribute most to the model’s predictions. This provides transparency into the learned matching function, revealing compatibility patterns that align with domain expectations. Additionally, we design a simulation-based waitlist evaluation protocol to replicate real allocation scenarios by reconstructing counterfactual waitlists and ranking candidate matches for each donor. These simulated waitlists enable evaluating whether the model suggests better matches than historical allocations. Overall, this work introduces a temporally aware framework for complex, constrained FSM by contributing: (i) a generalizable multi-relational graph construction framework; (ii) a scalable and temporally consistent GNN-based matching model; and (iii) a comprehensive evaluation methodology that combines predictive accuracy, interpretability, and real-world inference tasks. This provides a novel solution for decision support in healthcare and other critical domains.
dc.description.abstractMany decision-support tasks require determining optimal correspondences between heterogeneous entities, such as services to clients or donors to recipients. These heterogeneous feature set matching (FSM) problems involve complex relationships, multiple criteria, and domain constraints that challenge traditional machine learning methods based on flat feature vectors. This thesis presents a graph-based representation-learning framework for predicting optimal matches. We construct a bipartite knowledge graph integrating domain knowledge, temporal dynamics, and multi-relational interactions, and introduce an edge-pruning strategy to improve scalability in dense graphs. FSM is formulated as a link prediction problem using a Graph Neural Network (GNN) that learns embeddings capturing both attribute similarity and relational context. A temporal-aware graph partitioning strategy prevents information leakage during evaluation. The framework is validated on donor-recipient matching in kidney transplantation using the SRTR dataset (27,000 patients and 49 million relationships), outperforming traditional ML models. Interpretability analysis and simulation-based waitlist evaluation further demonstrate the framework’s ability to reveal compatibility patterns and support real-world allocation decisions.
dc.identifier.urihttps://hdl.handle.net/10222/85853
dc.language.isoen
dc.subjectGraph Neural Networks
dc.subjectDonor Recipient Matching
dc.subjectKidney Transplantation
dc.subjectGraph Attention Networks
dc.subjectEdge Prediction
dc.subjectKnowledge-Driven Graph
dc.titleA Graph-Based Machine Learning Approach for Feature Set Matching: Towards Multi-Factorial Donor-Recipient Matching for Kidney Transplantation

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
SheidaMajouni2026.pdf
Size:
8.36 MB
Format:
Adobe Portable Document Format

License bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
2.12 KB
Format:
Item-specific license agreed upon to submission
Description: