On The Use of Vector Representation for Improved Accuracy and Currency of Twitter POS Tagging
Date
2017-01-12T13:39:04Z
Authors
Samuel, David
Abstract
The scarcity of labelled text corpora has inspired alternative methods for harnessing data for the training and development of Natural Language Processing systems geared toward tasks such as Part-of-Speech (POS) tagging, chunking and semantic role labelling. Of particular interest is the performance of POS taggers on largely informal and unstructured corpora such as Twitter posts. In modern business, the expansion of social media networks has led to increased ‘lead generation’ activity, of which POS taggers form a significant part. We have trained a neural-network-based POS tagger using commercially available, labelled Penn Treebank data together with Twitter word embeddings. The word embeddings (or vector representations) are generated from tweets and used in training the POS tagger. We illustrate the value of harnessing tweets as a freely available and effectively unlimited data source by demonstrating improved performance on the tagging of Twitter text.
Keywords
Twitter, POS Tagging, Natural language processing (Computer science), Part-of-speech tagging