dc.contributor.author: Kosmajac, Dijana
dc.date.accessioned: 2020-04-07T12:40:24Z
dc.date.available: 2020-04-07T12:40:24Z
dc.date.issued: 2020-04-07T12:40:24Z
dc.identifier.uri: http://hdl.handle.net/10222/78390
dc.description.abstract: Over the past couple of decades, the growth of digital information and communication technologies has produced an information explosion, and these technologies are profoundly changing all aspects of modern society. The popularization of the Internet and mobile technologies fueled the rise of social media, which provide platforms for spreading information, generating content, and communicating interactively, and which have contributed to global data growth. Social media have also become one of the main outlets for obtaining information about the latest news, people, businesses, services, etc. Research on social media has gained traction, driven by growing interest in its applications and in the related technical and social science challenges and opportunities. One of the big challenges of widespread online textual data is its structure and size. Structurally, such text often departs from proper grammatical form and contains slang, emoticons, and improper sentences, reflecting the way we communicate daily. Size-wise, the text is usually very short. However, this is not only the case with online data; medical notes, open-ended survey questions, and various old-school maintenance reports are just some other examples. We particularly focus on the problem of author profiling on short texts in three different domains. Automatic author profiling is a set of methods for determining an author's (or a group of authors') gender, age, native language, personality type, and similar attributes, which can be useful in application contexts such as forensics, security, marketing, product personalisation, and socio-demographic analyses. In the first task, we explore fine-grained language dialect/variety identification and propose a new feature weighting scheme. In the second task, we work on bot detection on social media and propose a simple but efficient method based on statistical diversity measures. In the third task, we present some interesting findings on topic modelling in relation to the author on open-ended survey questions from the Canadian Longitudinal Study on Aging (CLSA). [en_US]
dc.language.iso: en [en_US]
dc.subject: Language Identification (LID) [en_US]
dc.subject: Social Media Bot Detection [en_US]
dc.subject: Topic Modelling on Short Texts [en_US]
dc.subject: Social Media Mining [en_US]
dc.title: Author and Language Profiling of Short Texts [en_US]
dc.type: Thesis [en_US]
dc.date.defence: 2020-03-02
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Doctor of Philosophy [en_US]
dc.contributor.external-examiner: Dr. Robert Mercer [en_US]
dc.contributor.graduate-coordinator: Dr. Michael McAllister [en_US]
dc.contributor.thesis-reader: Dr. Evangelos Milios [en_US]
dc.contributor.thesis-reader: Dr. Stan Matwin [en_US]
dc.contributor.thesis-supervisor: Dr. Vlado Keselj [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.copyright-release: Not Applicable [en_US]