Abstract:
Social media platforms such as Facebook, Instagram, YouTube and Twitter are
among the most popular means of communication today. With the rise of social
media use, online interactions have become much more difficult to supervise,
in particular abusive comments containing hate speech. Hate speech can be a
motive for "cyber conflict", which can affect both individuals and communities.
Social media services therefore aim to limit such offensive comments
without violating the right to freedom of expression. However, identifying whether
a text contains hate speech remains a challenging task for both machines and
humans due to the complexity of human language. In this paper, we present
background on hate speech and related detection approaches. Furthermore, we
present our work on detecting and monitoring hate speech in tweets using
machine learning methods: SVM, logistic regression, Naive Bayes and sentiment
analysis classification. We explain in detail our proposed approach to identifying and
classifying abusive text in tweets from a Kaggle dataset into two categories (hate speech and
non-hate speech), and evaluate the performance of the applied models. Our results
show that logistic regression achieves the best scores, with an accuracy of 74%.