TOP2LABEL: EXPLAINABLE ZERO SHOT TOPIC LABELLING USING KNOWLEDGE GRAPHS

Date

2022-12-14

Authors

Chaudhary, Akhil

Abstract

Automatic topic labelling aims to generate sound, interpretable, and meaningful labels for topics. A topic is usually represented by a list of terms and documents ranked by probability; we use Top2Vec for topic modelling. Automatic topic labelling is intended to reduce the effort needed to interpret topics during analysis. In this study, we introduce a novel three-phase zero-shot topic labelling framework that uses the ConceptNet knowledge graph (a freely available semantic network of words and phrases) and language models as external sources of information. The first phase extends the neighbourhood of the top n topic words (selected by semantic similarity), fills missing connections and information gaps by querying ConceptNet, and builds a candidate sub-graph from which candidate labels are generated. In the second phase, the framework builds a neighbourhood graph for each candidate label, scores each node by its semantic similarity to the topic, and retains the best-scoring sub-graph. In the third phase, a language model determines the labels using the final graph as input. The knowledge graph and language model extend the available knowledge beyond the topic documents, so that discovered topics can be described with more representative terms while retaining the topic information. The proposed framework reduces the computational burden by using a zero-shot approach and reduces the cognitive and interpretation load on the end user by creating three types of labels for each topic: a one-word label, a sentence label, and a summary label. The experimental results show that our model significantly outperforms the unsupervised baselines and classic topic labelling models and is comparable to supervised baseline topic labelling models.
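
The first two phases described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the dictionary `TOY_GRAPH` is a hypothetical stand-in for ConceptNet queries, `EMBEDDINGS` holds made-up three-dimensional vectors in place of a real embedding model, and the function names and the `keep` parameter are assumptions introduced only for illustration.

```python
import math

# Hypothetical toy embeddings; a real system would use a trained embedding model.
EMBEDDINGS = {
    "goal": [0.9, 0.1, 0.0],
    "striker": [0.8, 0.2, 0.1],
    "league": [0.7, 0.3, 0.0],
    "football": [0.85, 0.15, 0.05],
    "sport": [0.7, 0.2, 0.2],
    "bank": [0.0, 0.1, 0.9],
}

# Hypothetical stand-in for a knowledge graph: term -> related concepts.
TOY_GRAPH = {
    "goal": ["football", "bank"],
    "striker": ["football", "sport"],
    "league": ["sport"],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(words):
    """Average the embeddings of the given words to represent the topic."""
    vectors = [EMBEDDINGS[w] for w in words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def candidate_labels(top_words):
    """Phase 1 (sketch): extend the top words with their graph neighbours."""
    candidates = set()
    for word in top_words:
        candidates.update(TOY_GRAPH.get(word, []))
    return candidates - set(top_words)


def rank_candidates(top_words, candidates, keep=2):
    """Phase 2 (sketch): score candidates against the topic, keep the best."""
    topic_vector = mean_vector(top_words)
    scored = sorted(
        ((cosine(EMBEDDINGS[c], topic_vector), c) for c in candidates),
        reverse=True,
    )
    return [label for _, label in scored[:keep]]


top_words = ["goal", "striker", "league"]
candidates = candidate_labels(top_words)
best = rank_candidates(top_words, candidates)
print(best)
```

In this toy setting the off-topic neighbour `bank` is pulled in by the graph expansion but scores low against the topic vector, which is the kind of filtering the second phase is described as performing. The third phase (label generation with a language model) is not sketched here.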

Description

We present the first zero-shot model to generate all three types of textual labels (One Word Label, Sentence Label, and Summary Label) for automatically generated topics. We define our evaluation metric based on BERTScore, which measures the similarity between the generated label and the gold-standard labels for the One Word Label, and between the original article and the generated label for the Sentence and Summary Labels. Our zero-shot approach is sound and produces appropriate labels.
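
For readers unfamiliar with BERTScore, the sketch below shows the greedy-matching idea behind it: each token in one text is matched to its most similar token in the other, and the matches are averaged into precision, recall, and F1. It is a simplified illustration, not the evaluation code used in this work. The function name `greedy_match_scores` and the two-dimensional token vectors are assumptions; BERTScore itself uses contextual embeddings from a pretrained model and offers optional refinements such as IDF weighting, which are omitted here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_match_scores(candidate_vectors, reference_vectors):
    """Return (precision, recall, f1) from greedy token matching.

    Precision averages, over candidate tokens, the best similarity to any
    reference token; recall does the same over reference tokens.
    """
    precision = sum(
        max(cosine(c, r) for r in reference_vectors) for c in candidate_vectors
    ) / len(candidate_vectors)
    recall = sum(
        max(cosine(r, c) for c in candidate_vectors) for r in reference_vectors
    ) / len(reference_vectors)
    if precision + recall <= 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up token vectors for a generated label and a gold-standard label.
generated = [[1.0, 0.0], [0.0, 1.0]]
gold = [[1.0, 0.0], [0.0, 1.0]]
precision, recall, f1 = greedy_match_scores(generated, gold)
print(precision, recall, f1)
```

Under the setup described above, the reference would be the gold-standard label for a one-word label and the original article for sentence and summary labels.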

Keywords

Topic Labelling, Topic Modelling, Document Labelling, Summarization, ConceptNet, Knowledge Graph, Explainable, Zero-Shot

Citation