On The Use of Vector Representation for Improved Accuracy and Currency of Twitter POS Tagging

Date

2017-01-12T13:39:04Z

Authors

Samuel, David

Abstract

The scarcity of labelled text corpora has inspired alternative methods of harnessing data for training and developing Natural Language Processing systems geared toward tasks such as Part-of-Speech (POS) tagging, chunking and Semantic Role Labelling. Of particular interest is the performance of POS taggers on largely informal, unstructured corpora such as Twitter posts. In modern business, the expansion of social media networks has led to increased 'lead generation' activity, in which POS taggers play a significant part. We have trained a neural-network-based POS tagger using commercially available, labelled Penn Treebank data together with Twitter word embeddings. The word embeddings (or vector representations) are generated from tweets and used in training the POS tagger. We illustrate the value of harnessing tweets as an unlimited, freely available data source by demonstrating improved performance on the tagging of Twitter text.

Keywords

Twitter, POS Tagging, Natural language processing (Computer science), Part-of-speech tagging