On The Use of Vector Representation for Improved Accuracy and Currency of Twitter POS Tagging

dc.contributor.author: Samuel, David
dc.contributor.copyright-release: Not Applicable (en_US)
dc.contributor.degree: Master of Computer Science (en_US)
dc.contributor.department: Faculty of Computer Science (en_US)
dc.contributor.ethics-approval: Not Applicable (en_US)
dc.contributor.external-examiner: N/A (en_US)
dc.contributor.graduate-coordinator: Dr. Malcolm Heywood (en_US)
dc.contributor.manuscripts: Not Applicable (en_US)
dc.contributor.thesis-reader: Dr. Evangelos Milios (en_US)
dc.contributor.thesis-reader: Dr. James Blustein (en_US)
dc.contributor.thesis-supervisor: Dr. Stan Matwin (en_US)
dc.date.accessioned: 2017-01-12T13:39:04Z
dc.date.available: 2017-01-12T13:39:04Z
dc.date.defence: 2016-12-16
dc.date.issued: 2017-01-12T13:39:04Z
dc.description.abstract: The scarcity of labelled text corpora has inspired alternative methods for harnessing data for the training and development of Natural Language Processing systems geared toward tasks such as Part-of-Speech (POS) tagging, Chunking and Semantic Role Labelling. Of particular interest is the performance of POS taggers on corpora that are largely informal and unstructured, such as Twitter posts. In modern business activity, the expansion of social media networks has led to increased 'lead generation' activity; POS taggers form a significant part of such activities. We have trained a neural-network-based POS tagger using commercially available, labelled Penn Treebank data together with Twitter word embeddings. Word embeddings (or vector representations) are generated from tweets and used for training the POS tagger. We illustrate the value of harnessing tweets as an unlimited, freely available data source by demonstrating improved performance on the tagging of Twitter text. (en_US)
dc.identifier.uri: http://hdl.handle.net/10222/72631
dc.language.iso: en (en_US)
dc.subject: Twitter (en_US)
dc.subject: POS Tagging (en_US)
dc.subject: Natural language processing (Computer science)
dc.subject: Part-of-speech tagging
dc.title: On The Use of Vector Representation for Improved Accuracy and Currency of Twitter POS Tagging (en_US)
dc.type: Thesis (en_US)

Files

Original bundle

Name: Samuel-David-MCS-CSCI-Jan-2017.pdf
Size: 3.87 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission