On The Use of Vector Representation for Improved Accuracy and Currency of Twitter POS Tagging

Samuel, David

Files

Samuel-David-MCS-CSCI-Jan-2017.pdf (3.87 MB)

Date

2017-01-12T13:39:04Z

Authors

Samuel, David

Abstract

The scarcity of labelled text corpora has inspired alternative methods for harnessing data to train and develop Natural Language Processing systems for tasks such as Part-of-Speech (POS) tagging, chunking and semantic role labelling. Of particular interest is the performance of POS taggers on largely informal and unstructured corpora such as Twitter posts. In modern business, the expansion of social media networks has led to increased 'lead generation' activity, of which POS taggers form a significant part. We trained a neural-network-based POS tagger using commercially available, labelled Penn Treebank data together with word embeddings (vector representations) generated from tweets. We illustrate the value of harnessing tweets as an unlimited, freely available data source by demonstrating improved performance on tagging Twitter text.

Keywords

Twitter, POS Tagging, Natural language processing (Computer science), Part-of-speech tagging

URI

http://hdl.handle.net/10222/72631

Collections

Faculty of Graduate Studies Online Theses
