Automatic detection of hate speech in text: an overview of the topic and dataset annotation with hierarchical classes

Paula Cristina Teixeira Fortuna

doi:10.34626/hy8t-f260

Please use this identifier to cite or link to this item: https://hdl.handle.net/10216/106028

Full metadata record

DC Field	Value	Language
dc.creator	Paula Cristina Teixeira Fortuna
dc.date.accessioned	2025-11-06T16:26:52Z	-
dc.date.available	2025-11-06T16:26:52Z	-
dc.date.issued	2017-07-07
dc.date.submitted	2017-08-01
dc.identifier.other	sigarra:202853
dc.identifier.uri	https://hdl.handle.net/10216/106028	-
dc.description.abstract	People increasingly use social networks to communicate their opinions and to share information and experiences. On social networks people can feel deindividualized and may engage more frequently in aggressive communication. In this context, it is important that governments and social network platforms have tools to detect hate speech, because it is harmful to its targets. In our work we investigate the problem of detecting hate speech online. Our first goal is to provide a complete overview of the topic. However, describing the state of the art in hate speech research is not simple, because the topic is studied in different areas, such as text mining, social sciences, and law. Our literature review focuses on the perspective of computer science and engineering, which distinguishes it from the other works we found. We adopted an exhaustive and methodical approach, which we call a Systematic Literature Review. As a result, we concluded that the majority of studies tackle this problem as a machine learning classification task and use either general text mining features (e.g., n-grams, word2vec) or hate-speech-specific features (e.g., othering discourse). In the majority of studies new datasets are collected, but these remain private, which makes it more difficult to compare results across studies. We also concluded that this field is still at an early stage, with several open research opportunities. As we found no research on the topic in Portuguese, the second goal of this work was to annotate a dataset for this language. For the dataset annotation, we built a classification scheme with a hierarchical structure. This is an innovative way of approaching the problem of automatic hate speech classification. Its main advantage is that it better captures nuances in hate speech concepts.
We collected a dataset of 5,668 messages from 1,156 distinct users, annotated not only for hate speech but also for 83 additional subtypes of hate. Finally, we also try to show that the hierarchical structure of classes improves the performance of the classification models, since it is better suited to considering the different subtypes of hate speech and the intersections between those classes.
dc.language.iso	por
dc.rights	openAccess
dc.subject	Engenharia electrotécnica, electrónica e informática
dc.subject	Electrical engineering, Electronic engineering, Information engineering
dc.title	Automatic detection of hate speech in text: an overview of the topic and dataset annotation with hierarchical classes
dc.type	Dissertação
dc.contributor.uporto	Faculdade de Engenharia
dc.identifier.doi	10.34626/hy8t-f260
dc.identifier.tid	201801990
dc.subject.fos	Ciências da engenharia e tecnologias::Engenharia electrotécnica, electrónica e informática
dc.subject.fos	Engineering and technology::Electrical engineering, Electronic engineering, Information engineering
thesis.degree.discipline	Mestrado Integrado em Engenharia Informática e Computação
thesis.degree.grantor	Faculdade de Engenharia
thesis.degree.grantor	Universidade do Porto
thesis.degree.level	1
Appears in Collections:	FEUP - Dissertação

Files in This Item:

File	Description	Size	Format
202853.pdf	Automatic detection of hate speech in text: an overview of the topic and dataset annotation with hierarchical classes	1.61 MB	Adobe PDF