Authorship Attribution using Written and Read Documents
dc.contributor.author | Gujarati, Afsan | |
dc.contributor.copyright-release | Not Applicable | en_US |
dc.contributor.degree | Master of Electronic Commerce | en_US |
dc.contributor.department | Faculty of Computer Science | en_US |
dc.contributor.ethics-approval | Not Applicable | en_US |
dc.contributor.external-examiner | n/a | en_US |
dc.contributor.graduate-coordinator | Michael McAllister | en_US |
dc.contributor.manuscripts | Not Applicable | en_US |
dc.contributor.thesis-reader | Dr. Stan Matwin | en_US |
dc.contributor.thesis-reader | Dr. Evangelos Milios | en_US |
dc.contributor.thesis-supervisor | Dr. Vlado Keselj | en_US |
dc.date.accessioned | 2019-08-07T17:50:35Z | |
dc.date.available | 2019-08-07T17:50:35Z | |
dc.date.defence | 2019-07-05 | |
dc.date.issued | 2019-08-07T17:50:35Z | |
dc.description.abstract | In Authorship Attribution (AA), the task of identifying the author of an unseen document, it is often hard to obtain large amounts of training text written by an author. In this research, we analyze the influence of training data size and propose a novel alternative: using documents read by the authors for the AA task. Although identifying the author of an unseen document becomes significantly more difficult with less written data, classification performance can be drastically improved by using the documents read by the author. The Support Vector Machine method outperformed all other classifiers in the presence of the read documents, with an average accuracy of 94.35%, a 23.57% increase after the addition of the read documents. Feature analysis showed that a semantic similarity exists between the written and the read documents, which played an important role in the improved performance. | en_US |
dc.identifier.uri | http://hdl.handle.net/10222/76215 | |
dc.language.iso | en_US | en_US |
dc.subject | Authorship attribution | en_US |
dc.subject | machine learning | en_US |
dc.subject | document classification | en_US |
dc.subject | natural language processing | en_US |
dc.subject | n-grams approach | en_US |
dc.subject | data processing | en_US |
dc.subject | data collection | en_US |
dc.subject | limited training data | en_US |
dc.subject | read documents | en_US |
dc.title | Authorship Attribution using Written and Read Documents | en_US |
Files
Original bundle
- Name:
- Gujarati-Afsan-MEC-ECMM-August-2019.pdf
- Size:
- 16.83 MB
- Format:
- Adobe Portable Document Format
- Description:
- Thesis Submission