Data-centric Prediction Explanation and Model Editing for Deep Neural Networks
Abstract
Over the past decade, complex black-box models have excelled at a wide range of tasks, but their lack of transparency undermines trust in their predictions. This work contributes to Explainable AI (XAI) by introducing data-centric post-hoc explainers. We present two frameworks, FEHAN and DICTA, that locally explain text classifiers through interpretable surrogate models; experimental evaluations on four datasets demonstrate their effectiveness while simplifying the explanation process. We further explore the explainability of Graph Convolutional Networks (GCNs) applied to molecular structures, offering multiple perspectives on their predictions. We also introduce HD-Explain, a post-hoc, model-aware, example-based explanation method for neural classifiers that uses Kernelized Stein Discrepancy (KSD) to identify influential training data points and potential distribution mismatches. Together, these contributions advance the understanding of how individual data points shape machine learning models and address the emerging challenge of Machine Unlearning (MU) by leveraging insights into data-model interactions.