NLP AND MACHINE LEARNING TECHNIQUES TO DETECT ONLINE HARASSMENT ON SOCIAL NETWORKING PLATFORMS
Date
2019-08-28T16:33:59Z
Authors
Sharifirad, Sima
Abstract
Social media has become an unavoidable part of our daily lives. It attracts different
users with different mindsets. In particular, Twitter is a platform with a diverse
audience who engage in different topics and interact with personalities from all walks
of life. Even though Twitter may be considered as a mere reflection of the discourse
between people, it paves a way for certain types of hostile behavior. Twitter not only
amplifies marginalized voices but also harassment. Often this harassment is directed
towards female users and the purpose is to silence women by threatening, insulting,
ignoring or driving them away from Twitter, making it simply the newest wrinkle in
that long history of exclusion from public spaces and conversations. Although there
exist several recent articles focusing on toxicity, hateful speech and cyberbullying,
they do not focus on women as their target. Due to the nature of Twitter, collecting
tweets which represent online harassment requires specific filtering of the hashtags and
the final dataset is usually scarce, valuable and imbalanced. In addition, these tweets
carry different emotions and are enforced on the victims with different intensities.
Therefore, understanding this language can help us flag the tweets, understand the
mental state of the users and train smart machine learning algorithms which can
help us detect and understand this language better. In this thesis, we revisited
the problem of toxicity and hateful speech focusing on women as the audience and
trying to understand different manifestations of it using machine learning and natural
language processing techniques. The problem was formulated as a classification task
and a dataset was formed for this purpose. After that, methods were proposed to
increase the quality of the classification and test the effectual state of the users and
a real case study was examined.
Keywords
Natural language processing, machine learning