Compromised Tweet Detection Using weighted sub-word embeddings

dc.contributor.author: Joshi, Mihir
dc.contributor.copyright-release: Not Applicable [en_US]
dc.contributor.degree: Master of Computer Science [en_US]
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.external-examiner: n/a [en_US]
dc.contributor.graduate-coordinator: Michael McAllister [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.thesis-reader: Dr. Srinivas Sampalli [en_US]
dc.contributor.thesis-reader: Dr. Malcolm Heywood [en_US]
dc.contributor.thesis-supervisor: Dr. Nur Zincir-Heywood [en_US]
dc.date.accessioned: 2019-08-07T18:18:46Z
dc.date.available: 2019-08-07T18:18:46Z
dc.date.defence: 2019-07-31
dc.date.issued: 2019-08-07T18:18:46Z
dc.description.abstract: Extracting features and writing styles from short text messages for compromised tweet detection has always been a challenge. Short messages such as tweets do not contain enough data to perform statistical authorship attribution. Moreover, the vocabulary used in these texts is sometimes improvised or misspelled. Therefore, in this thesis, I propose combining four feature extraction techniques: character n-grams, word n-grams, Flexible Patterns, and a new sub-word embedding based on the skip-gram model. The proposed system feeds these features, extracted from tweets, into a Multi-Layer Perceptron to analyze short text messages. This system achieves 85% accuracy, a considerable improvement over previous systems. Furthermore, Siamese networks are employed to learn representations of users' tweets so that their authors can be identified from a limited amount of ground truth data. The results show that the proposed system achieves promising accuracy as the number of authors increases. [en_US]
dc.identifier.uri: http://hdl.handle.net/10222/76216
dc.language.iso: en_US [en_US]
dc.subject: Natural Language Processing [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Security Management [en_US]
dc.title: Compromised Tweet Detection Using weighted sub-word embeddings [en_US]
dc.type: Thesis [en_US]
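
The abstract lists character n-grams and word n-grams among the four feature types. The sketch below illustrates only those two, under stated assumptions: lower-casing, whitespace tokenization, n=3 for characters and n=2 for words, and raw counts over a fixed vocabulary. The helper names (char_ngrams, word_ngrams, feature_vector) and all parameter choices are hypothetical and are not taken from the thesis. Flexible Patterns, the skip-gram sub-word embedding, the Multi-Layer Perceptron, and the Siamese networks are not shown.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (assumed lower-cased).

    Character n-grams still match partially when a word is misspelled
    or improvised, which is why they suit short, noisy texts.
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping word n-grams after naive whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def feature_vector(text: str, vocab: list) -> list:
    """Map one tweet to a fixed-length count vector over a hypothetical vocabulary.

    Character n-grams are strings and word n-grams are tuples, so the two
    Counters can be merged without key collisions.
    """
    counts = char_ngrams(text) + word_ngrams(text)
    return [counts[feature] for feature in vocab]


if __name__ == "__main__":
    tweet = "gr8 game tonite gr8 win"
    vocab = ["gr8", ("gr8", "game"), "xyz"]
    print(feature_vector(tweet, vocab))  # [2, 1, 0]
```

In a full pipeline, vectors like these would be concatenated with the other feature types before classification; how the thesis builds its vocabulary and weights features is not described in this record.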

Files

Original bundle

Name: Joshi-Mihir-MCS-CSCI-July-2019.pdf
Size: 1000.37 KB
Format: Adobe Portable Document Format
Description: Thesis Manuscript

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission