MICROBLOG TEXT PARSING: A COMPARISON OF STATE-OF-THE-ART PARSERS

Date

2015

Authors

Abbas, Syed Muhammad Faisal

Abstract

Parsing is a natural language processing task in which relationships between words are deduced. It is essential for higher levels of semantic analysis, especially when predicates must be extracted from the text. Parsing is a well-established task, and much effort has gone into devising good methods for it, which has resulted in reasonably accurate parsers. However, most of this work has been limited to formally written text such as news articles or discussion groups. Microblog text is a significant body of text written by laypeople in an informal language that differs enough from formal written language to require special consideration. Many applications that analyze microblog text, such as the identification of user intentions, require high-quality and fast parsing. When dealing with large amounts of microblog text, the running-time performance of the methods must also be considered, for several reasons: the volume of microblog text is huge, new text is generated at an overwhelming pace, and its relevance is short-lived. In this thesis we evaluated various parsers and their performance on microblog text: we evaluated eight (8) state-of-the-art parsers, five (5) of which are inherently constituency (phrase-structure) parsers and three (3) of which are dependency parsers. We compared all of the parsers after converting the output of the constituency parsers to dependency trees, and evaluated their performance using the Unlabelled Attachment Score (UAS). In addition, we compared the constituency parsers using the PARSEVAL and FREVAL measures. Finally, we also evaluated the selected parsers for their running-time performance.
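The abstract names UAS and PARSEVAL without defining them. As a minimal sketch (not the evaluation scripts used in the thesis), UAS is the fraction of tokens whose predicted head matches the gold head, ignoring dependency labels, and PARSEVAL compares labelled constituent brackets by precision, recall and F1. The functions `uas` and `parseval` and the `(label, start, end)` span encoding are hypothetical illustrations; standard tools such as evalb also apply conventions (e.g. excluding punctuation or preterminals) that are omitted here.

```python
from collections import Counter


def uas(gold_heads, pred_heads):
    """Unlabelled Attachment Score: share of tokens whose predicted head index
    equals the gold head index. Dependency labels are ignored.
    For a corpus-level score, concatenate the head lists of all sentences."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted token counts must match")
    if not gold_heads:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)


def parseval(gold_spans, pred_spans):
    """Simplified PARSEVAL labelled bracket precision, recall and F1.
    Each span is a hypothetical (label, start, end) tuple; a duplicated
    bracket can be matched at most as many times as it occurs in both."""
    gold = Counter(gold_spans)
    pred = Counter(pred_spans)
    matched = sum((gold & pred).values())  # multiset intersection
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Tokens: "i love tweets"; heads are 1-based indices, 0 marks the root.
    print(uas([2, 0, 2], [2, 0, 1]))  # two of three heads correct

    gold = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)]
    pred = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 2), ("NP", 2, 3)]
    print(parseval(gold, pred))  # three of four brackets match
```

Because UAS ignores labels, it lets the output of constituency parsers (after conversion to dependency trees) be compared with native dependency parsers on equal footing, which is the comparison the abstract describes.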

Keywords

Microblog, Parsing, Parser, Twitter
