MICROBLOG TEXT PARSING: A COMPARISON OF STATE-OF-THE-ART PARSERS
Date
2015
Authors
Abbas, Syed Muhammad Faisal
Abstract
Parsing is a natural language processing task in which the relationships between words are
deduced. It is essential for higher levels of semantic analysis, especially when predicates
must be extracted from the text.
Parsing is a well-established task, and considerable effort has gone into devising good
methods for it, resulting in reasonably accurate parsers. However,
most of this work has been limited to formally written text such as news articles or
discussion groups. Microblog text is a significant body of text written by laypeople
in informal language that differs enough from formal written
language to require special consideration. Many applications in the
analysis of microblog text, such as identification of user intentions, require
high-quality and fast parsing.
When dealing with large amounts of microblog text, we need to consider the running-time
performance of the methods for several reasons: the amount of microblog text is huge,
new text is generated at an overwhelming pace, and the life span of its
significance is very short.
In this thesis we evaluated eight (8) state-of-the-art parsers and their parsing
performance on microblog text. Five (5) of these
parsers are inherently constituency (phrase-structure) parsers, while three (3)
are dependency parsers. We compared all of the parsers after converting the output
of the constituency parsers to dependency trees, and evaluated their performance using the
Unlabelled Attachment Score (UAS). In addition, we compared the constituency parsers
using the PARSEVAL and FREVAL measures. Finally, we evaluated the selected parsers
for their running-time performance.
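
For readers unfamiliar with UAS, the following is a minimal sketch of how the score is commonly computed: the fraction of tokens whose predicted head matches the gold-standard head, ignoring dependency labels. The function name, the head-index representation (0 for the artificial root), and the example sentence are illustrative assumptions and are not taken from the thesis.

```python
from typing import Sequence


def unlabelled_attachment_score(
    gold_heads: Sequence[int], predicted_heads: Sequence[int]
) -> float:
    """Return the fraction of tokens whose predicted head equals the gold head.

    Each sequence holds one head index per token (assumed convention:
    1-based token positions, with 0 denoting the artificial root).
    Dependency labels are ignored, which is what makes the score "unlabelled".
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(1 for g, p in zip(gold_heads, predicted_heads) if g == p)
    return correct / len(gold_heads)


# Hypothetical 5-token sentence: the parser attaches token 4 to the wrong head.
gold = [2, 0, 2, 5, 3]
pred = [2, 0, 2, 3, 3]
uas = unlabelled_attachment_score(gold, pred)
print(f"UAS = {uas:.2f}")  # 4 of 5 heads match
```

Because the score depends only on head indices, the output of a constituency parser can be scored the same way once it has been converted to a dependency tree.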
Keywords
Microblog, Parsing, Parser, Twitter