A Graph-Based Machine Learning Approach for Feature Set Matching: Towards Multi-Factorial Donor-Recipient Matching for Kidney Transplantation
Loading...
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Many decision-support tasks require determining optimal correspondences between heterogeneous entities, such as services to clients or donors to recipients. These heterogeneous feature set matching (FSM) problems involve complex relationships, multiple criteria, and domain constraints that challenge traditional machine learning methods based on flat feature vectors.
This thesis presents a graph-based representation-learning framework for predicting optimal matches. We construct a bipartite knowledge graph integrating domain knowledge, temporal dynamics, and multi-relational interactions, and introduce an edge-pruning strategy to improve scalability in dense graphs. FSM is formulated as a link prediction problem using a Graph Neural Network (GNN) that learns embeddings capturing both attribute similarity and relational context.
A temporal-aware graph partitioning strategy prevents information leakage during evaluation. The framework is validated on donor-recipient matching in kidney transplantation using the SRTR dataset (27,000 patients and 49 million relationships), outperforming traditional ML models. Interpretability analysis and simulation-based waitlist evaluation further demonstrate the framework’s ability to reveal compatibility patterns and support real-world allocation decisions.
Description
Many decision-support tasks require determining optimal correspondences between two heterogeneous sets of entities, services to clients, and donors to recipients. Heterogeneous feature set matching (FSM) problems involve complex relationships, multiple criteria, and constraints that challenge traditional machine learning (ML) methods. These models usually use flat feature vectors and overlook interaction networks.
We present a novel graph-based representation-learning framework that predicts optimal matches. The primary contribution is the construction of a bipartite knowledge graph integrating structured domain knowledge, temporal dynamics, and multi-relational interactions as diverse multi-typed edges. To address scalability and training instability in large, dense graphs, we introduce an edge-pruning strategy that preserves the connections.
We formulate FSM as link prediction and develop a Graph Neural Network (GNN) architecture that learns embeddings for heterogeneous entities. The model captures attribute-level similarity and relational context, enabling predictions, rankings, and interpretable matching. A key component is a novel, temporal-aware graph partitioning technique that addresses limitations of standard graph splitting, preventing information leakage. We validate this framework through a large-scale case study on donor-recipient matching (DRM) problem in kidney transplantation (KT), involving 27,000 patients and 49 million relationships, using the Scientific Registry of Transplant Recipients (SRTR) dataset. The framework outperforms well-established ML models (e.g., XGBoost, random forest, logistic regression) and shows the ability to discover hidden compatibility structure.
Beyond predictive accuracy, this thesis emphasizes interpretability. By integrating domain knowledge into the graph structure and using attention-based message passing, our framework examines how compatibility is acquired. Through layer-wise embedding analysis, attention weight inspection, and neighborhood influence tracing, we demonstrate which donor and recipient features contribute most to the model’s predictions. This provides transparency into the learned matching function, revealing compatibility patterns that align with domain expectations. Additionally, we design a simulation-based waitlist evaluation protocol to replicate real allocation scenarios by reconstructing counterfactual waitlists and ranking candidate matches for each donor. These simulated waitlists enable evaluating whether the model suggests better matches than historical allocations.
Overall, this work introduces a temporally aware framework for complex, constrained FSM by contributing: (i) a generalizable multi-relational graph construction framework; (ii) a scalable and temporally consistent GNN-based matching model; and (iii) a comprehensive evaluation methodology that combines predictive accuracy, interpretability, and real-world inference tasks. This provides a novel solution for decision support in healthcare and other critical domains.
Keywords
Graph Neural Networks, Donor Recipient Matching, Kidney Transplantation, Graph Attention Networks, Edge Prediction, Knowledge-Driven Graph
