Tag Generalization For Facet-Based Search
Date
2013-08-23
Authors
Niewiarowski, Tomasz
Abstract
In this project we address the over-specification of tags, a common problem in modern
tag-based document management systems. In such systems, tags are essential for
document retrieval, and retrieval accuracy depends mainly on the “human
factor”, i.e., the quality of the tags assigned by users. While tagging, users are likely to
pick only very specific tags that describe the content of a resource, forgetting about
general concepts that represent the resource. To address this problem, we propose
an automatic tag generalization algorithm that assigns general tags to
newly tagged resources. The objective of the algorithm is to provide a layer of tags
consisting of general concepts and thereby to better support the system's users. The
method automatically tags resources with tags that are similar to, but more general
than, the user-assigned tags. It is unsupervised and domain independent. The
proposed tag generalization method consists of three stages: (1) disambiguation
and concept mapping, which maps specific tags to Wikipedia articles representing the
same concept; (2) link-based tag generalization, which finds similar and more
general articles using the Wikipedia link structure; and (3) concept unification,
in which the system assigns tags based on the list of general articles. Evaluation on four
real-life tag data sets demonstrates that the proposed method is domain independent
and outperforms supervised tag recommendation systems for practical training data
set sizes.
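
The three stages described above can be illustrated with a minimal Python sketch. This is not the author's implementation: the abstract does not specify how disambiguation, link scoring, or unification are performed, so the sketch assumes a dictionary lookup for concept mapping, a simple "shared link target" heuristic for generalization, and a second lookup for unification. All names (`TAG_TO_ARTICLE`, `LINKS`, `ARTICLE_TO_TAG`, `generalize_tags`, `min_support`) and all data are hypothetical stand-ins for Wikipedia.

```python
from collections import Counter

# Hypothetical toy data standing in for Wikipedia (not from the original work).
# Stage 1 lookup: user tag -> article representing the same concept.
TAG_TO_ARTICLE = {
    "python": "Python (programming language)",
    "django": "Django (web framework)",
}

# Toy link structure: article -> articles it links to.
LINKS = {
    "Python (programming language)": ["Programming language", "Software"],
    "Django (web framework)": ["Web framework", "Programming language", "Software"],
}

# Stage 3 lookup: general article -> tag to assign.
ARTICLE_TO_TAG = {
    "Programming language": "programming",
    "Software": "software",
    "Web framework": "web",
}


def map_tags_to_articles(tags):
    """Stage 1 (assumed): map each known tag to an article; skip unknown tags."""
    return [TAG_TO_ARTICLE[t.lower()] for t in tags if t.lower() in TAG_TO_ARTICLE]


def generalize(articles, min_support=2):
    """Stage 2 (assumed heuristic): keep link targets shared by at least
    `min_support` of the mapped articles, treating them as general concepts."""
    counts = Counter()
    for article in articles:
        # A set ensures each article votes at most once per link target.
        counts.update(set(LINKS.get(article, [])))
    return sorted(target for target, n in counts.items() if n >= min_support)


def unify(general_articles):
    """Stage 3 (assumed): turn the list of general articles into tags."""
    return [ARTICLE_TO_TAG[a] for a in general_articles if a in ARTICLE_TO_TAG]


def generalize_tags(tags, min_support=2):
    """Run the three stages in order and return the general tags."""
    articles = map_tags_to_articles(tags)
    return unify(generalize(articles, min_support))


print(generalize_tags(["Python", "Django"]))  # ['programming', 'software']
```

In this toy setup, a resource tagged only with the specific tags "Python" and "Django" also receives the general tags "programming" and "software", because both mapped articles link to the corresponding general articles. A real system would replace the dictionaries with actual Wikipedia data and a proper disambiguation step.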