Husain, Syed Mohammad Baqir
2024-08-30
http://hdl.handle.net/10222/84532

This research introduces the Conceptual Document Clustering Explanation Model (CDCEM), a novel model for explaining unsupervised textual clustering. CDCEM explains both the discovered clusters and individual document assignments. Furthermore, it ensures faithfulness, meaning its explanations accurately reflect the decision-making process, by building on the core elements of black-box textual clustering, such as document embeddings and k-means centroids. This faithfulness and comprehensiveness boost user trust and understanding and aid in debugging clustering results. Using Wikipedia, CDCEM first performs wikification, which extracts real-world concepts from the text. It then evaluates these concepts' significance for cluster assignment to produce concept-based explanations. CDCEM determines the importance of each concept within a cluster by measuring the cosine similarity between the concept's embedding (representing its contextual meaning) and the cluster centroid (representing the cluster's theme), both of which it derives from the black-box model (ELMo for embeddings, k-means for clustering). These concept importance scores enable concept-based explanations at two levels: cluster-level explanations, which describe the concepts that best represent each cluster, and document-level explanations, which clarify why the black-box model assigns a document to a particular cluster. We quantitatively evaluate the faithfulness of CDCEM on the AG News, DBpedia, and Reuters-21578 datasets, comparing it with explainable classification methods (Decision Tree, Logistic Regression, and Naive Bayes) by treating clusters as classes and computing the agreement between the black-box model's predictions and the explanations.
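The scoring and agreement check described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: random vectors stand in for ELMo concept embeddings and k-means centroids, and the function names (`concept_importance`, `explain_document`) are hypothetical.

```python
import numpy as np

def concept_importance(concept_embs, centroids):
    """Cosine similarity of each concept embedding to each cluster centroid.

    Rows index concepts, columns index clusters; a high score means the
    concept is representative of that cluster's theme.
    """
    E = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    C = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return E @ C.T

def explain_document(concept_embs, centroids):
    """Document-level explanation sketch: average the document's concept
    importance scores per cluster, then predict the cluster whose theme the
    concepts match best."""
    scores = concept_importance(concept_embs, centroids)
    per_cluster = scores.mean(axis=0)
    return per_cluster, int(np.argmax(per_cluster))

# Toy data: 3 cluster centroids in an 8-dim embedding space, and a document
# whose 4 wikified concepts lie near cluster 2's centroid by construction.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(3, 8))
doc_concepts = centroids[2] + 0.1 * rng.normal(size=(4, 8))

per_cluster, predicted = explain_document(doc_concepts, centroids)
# Faithfulness check: the explanation's predicted cluster should agree with
# the black-box (k-means) assignment, here cluster 2 by construction.
print(predicted)
```

Aggregating per-concept similarities and checking the argmax against the black-box assignment mirrors the agreement-based faithfulness evaluation: when the explanation's implied cluster matches the clustering model's own prediction, the explanation is faithful to the decision it describes.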
Additionally, a user study compared CDCEM with the best-performing baseline in terms of comprehensiveness, accuracy, usefulness, user satisfaction, and usability of the explanation visualization tool on the AG News dataset. CDCEM showed higher faithfulness than the baseline in the quantitative evaluations, indicating that it accurately explains unsupervised clustering decisions. The qualitative evaluation revealed that users preferred CDCEM's cluster-level and document-level explanations for their accuracy, clarity, logic, and comprehensibility.

en
Explanation Model; Document Clustering; Faithfulness
Faithful Concept-based Explanations For Partition-based Document Clustering