TOP2LABEL: EXPLAINABLE ZERO SHOT TOPIC LABELLING USING KNOWLEDGE GRAPHS
Date
2022-12-14
Authors
Chaudhary, Akhil
Abstract
Automatic topic labelling aims to generate sound, interpretable, and meaningful labels
that help users interpret topics. A topic is usually represented by a list of terms and
documents ranked by their probability; we use Top2Vec for topic modelling.
Automatic topic labelling is intended to reduce the effort needed to interpret
topics while investigating them. In this study, we introduce a novel three-phase zero-shot
topic labelling framework using the ConceptNet knowledge graph (a freely available
semantic network of words and phrases) and language models as external sources
of information. The first phase uses the knowledge graph to extend the neighbourhood
of the top n words (based on semantic similarity), fills missing connections
and information gaps by querying ConceptNet, and produces a candidate sub-graph
from which candidate labels are generated. In the second phase, it develops a neighbourhood graph
for each candidate label, scores each node based on its semantic similarity with the
topic and retains the best sub-graph based on semantic similarity. In the third phase,
we utilize the language model to determine the labels using the final graph as input.
We use a knowledge graph and language model to extend the knowledge beyond
topic documents to optimize discovered topics with better representative terms while
retaining the topic information. The proposed framework decreases the computation
burden by utilizing a zero-shot approach and reduces the cognitive and interpretation
load of the end-user by creating three types of labels for each topic, i.e., a one-word
label, sentence label and summary label. The experimental results showed that our
model significantly outperforms the unsupervised baselines and classic topic labelling
models and is comparable to supervised baseline topic labelling models.
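
The second phase described above (scoring each candidate label's neighbourhood by semantic similarity with the topic and keeping the best sub-graph) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the toy 3-dimensional embeddings, the use of a mean topic vector, and the averaging of node scores are all assumptions made for illustration. A real pipeline would take its vectors from Top2Vec and its neighbourhoods from ConceptNet.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def score_candidates(topic_words, neighbourhoods, embeddings):
    """Score each candidate label by the mean similarity of its nodes to the topic.

    Assumption: the topic is summarised as the mean of its top-word vectors.
    """
    topic_vec = mean_vector([embeddings[w] for w in topic_words])
    scores = {}
    for label, nodes in neighbourhoods.items():
        node_scores = [cosine(embeddings[n], topic_vec) for n in nodes if n in embeddings]
        scores[label] = sum(node_scores) / len(node_scores) if node_scores else 0.0
    return scores


# Toy embeddings (hypothetical values, chosen so that sports terms cluster together).
embeddings = {
    "goal": [1.0, 0.1, 0.0],
    "striker": [0.9, 0.2, 0.0],
    "league": [0.8, 0.3, 0.1],
    "football": [0.95, 0.15, 0.0],
    "stadium": [0.7, 0.4, 0.1],
    "finance": [0.0, 0.2, 1.0],
    "bank": [0.1, 0.1, 0.9],
}
topic_words = ["goal", "striker", "league"]

# Hypothetical neighbourhood graphs, one node list per candidate label.
neighbourhoods = {
    "football": ["football", "stadium"],
    "finance": ["finance", "bank"],
}

scores = score_candidates(topic_words, neighbourhoods, embeddings)
best_label = max(scores, key=scores.get)  # candidate whose sub-graph is retained
print(best_label, scores)
```

Averaging node scores is only one possible aggregation; a maximum or a similarity-weighted sum would fit the same description.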
Description
We present the first zero-shot model that generates all three types of textual labels (a one-word label, a sentence label, and a summary label) for automatically generated topics. Our evaluation metric is based on BERTScore, which measures the similarity between the generated label and the gold-standard labels for one-word labels, and between the original article and the generated label for sentence and summary labels. Our zero-shot approach is sound and produces appropriate labels.
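
To make the BERTScore-based evaluation more concrete, the sketch below shows the greedy-matching idea behind the score: each candidate token is matched to its most similar reference token (precision), each reference token to its most similar candidate token (recall), and the two are combined into an F1 value. It is an illustration under stated assumptions, not the evaluation code used in this work. The token vectors are hand-made 2-dimensional toys, whereas BERTScore uses contextual BERT embeddings and supports optional IDF weighting and baseline rescaling, both omitted here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_match_score(candidate_vecs, reference_vecs):
    """Return (precision, recall, f1) using greedy cosine matching of token vectors."""
    # Precision: how well each candidate token is covered by the reference.
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    # Recall: how well each reference token is covered by the candidate.
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Toy token vectors (hypothetical): the generated label has one extra,
# unrelated token compared with the gold-standard label.
generated_label = [[1.0, 0.0], [0.0, 1.0]]
gold_label = [[1.0, 0.0]]

precision, recall, f1 = greedy_match_score(generated_label, gold_label)
print(precision, recall, f1)
```

In this toy case the extra token lowers precision while recall stays perfect, which is the behaviour one would expect when a generated label is longer than its gold-standard counterpart.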
Keywords
Topic Labelling, Topic Modelling, Document Labelling, Summarization, Knowledge Graph, Explainable, Zero-Shot, ConceptNet