NLP AND MACHINE LEARNING TECHNIQUES TO DETECT ONLINE HARASSMENT ON SOCIAL NETWORKING PLATFORMS

Date

2019-08-28T16:33:59Z

Authors

Sharifirad, Sima

Abstract

Social media has become an unavoidable part of our daily lives. It attracts different users with different mindsets. In particular, Twitter is a platform with a diverse audience who engage in different topics and interact with personalities from all walks of life. Even though Twitter may be considered as a mere reflection of the discourse between people, it paves a way for certain types of hostile behavior. Twitter not only amplifies marginalized voices but also harassment. Often this harassment is directed towards female users and the purpose is to silence women by threatening, insulting, ignoring or driving them away from Twitter, making it simply the newest wrinkle in that long history of exclusion from public spaces and conversations. Although there exist several recent articles focusing on toxicity, hateful speech and cyberbullying, they do not focus on women as their target. Due to the nature of Twitter, collecting tweets which represent online harassment requires specific filtering of the hashtags and the final dataset is usually scarce, valuable and imbalanced. In addition, these tweets carry different emotions and are enforced on the victims with different intensities. Therefore, understanding this language can help us flag the tweets, understand the mental state of the users and train smart machine learning algorithms which can help us detect and understand this language better. In this thesis, we revisited the problem of toxicity and hateful speech focusing on women as the audience and trying to understand different manifestations of it using machine learning and natural language processing techniques. The problem was formulated as a classification task and a dataset was formed for this purpose. After that, methods were proposed to increase the quality of the classification and test the effectual state of the users and a real case study was examined.

Keywords

Natural language processing, machine learning
