Authorship Attribution using Written and Read Documents

Gujarati, Afsan

Files

Gujarati-Afsan-MEC-ECMM-August-2019.pdf (16.83 MB)

Date

2019-08-07T17:50:35Z

Authors

Gujarati, Afsan

Abstract

In Authorship Attribution (AA), the task of identifying the author of an unseen document, it is often hard to obtain large amounts of training text written by an author. In our research, we analyze the influence of training data size and propose a novel alternative: using the documents read by the authors for the AA task. Although identifying the author of an unseen document becomes significantly more difficult with less written data, classification performance can be drastically improved by using the documents read by the author. The Support Vector Machine method outperformed all other classifiers in the presence of the read documents, with an average accuracy of 94.35%, a 23.57% increase after the addition of the read documents. Feature analysis showed that a semantic similarity between the written and the read documents played an important role in the improved performance.

Keywords

Authorship attribution, machine learning, document classification, natural language processing, n-grams approach, data processing, data collection, limited training data, read documents

URI

http://hdl.handle.net/10222/76215

Collections

Faculty of Graduate Studies Online Theses
