Please use this identifier to cite or link to this item:
https://hdl.handle.net/10216/106028

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.creator | Paula Cristina Teixeira Fortuna | |
| dc.date.accessioned | 2025-11-06T16:26:52Z | - |
| dc.date.available | 2025-11-06T16:26:52Z | - |
| dc.date.issued | 2017-07-07 | |
| dc.date.submitted | 2017-08-01 | |
| dc.identifier.other | sigarra:202853 | |
| dc.identifier.uri | https://hdl.handle.net/10216/106028 | - |
| dc.description.abstract | Nowadays, people increasingly use social networks to communicate their opinions and to share information and experiences. On social networks, people feel deindividualized and may engage more frequently in aggressive communication. In this context, it is important that governments and social network platforms have tools to detect hate speech, because it harms its targets. In our work we investigate the problem of detecting hate speech online. Our first goal is to provide a complete overview of the topic. However, describing the state of the art in hate speech is not simple, because the topic spans different fields, such as text mining, the social sciences, and law. Our literature review focuses on the perspective of computer science and engineering, which distinguishes it from the other reviews we found. We adopted an exhaustive and methodical approach: a Systematic Literature Review. As a result, we concluded that most studies tackle this problem as a machine learning classification task, using either general text mining features (e.g., n-grams, word2vec) or hate-speech-specific features (e.g., othering discourse). Most studies collect new datasets, but these remain private, which makes it more difficult to compare results across studies. We also concluded that the field is still at an early stage, with several open research opportunities. As we found no research on the topic in Portuguese, the second goal of this work was to annotate a dataset for this language. For the dataset annotation, we built a classification scheme with a hierarchical structure. This is an innovative approach to the automatic classification of hate speech; its main advantage is that it better captures nuances in hate speech concepts. We collected a dataset of 5,668 messages from 1,156 distinct users, annotated not only for hate speech but also for 83 additional subtypes of hate. Finally, we attempt to show that the hierarchical class structure can also improve the performance of classification models, since it is better suited to considering the different subtypes of hate speech and the intersections between those classes. | |
| dc.language.iso | por | |
| dc.rights | openAccess | |
| dc.subject | Engenharia electrotécnica, electrónica e informática | |
| dc.subject | Electrical engineering, Electronic engineering, Information engineering | |
| dc.title | Automatic detection of hate speech in text: an overview of the topic and dataset annotation with hierarchical classes | |
| dc.type | Dissertation | |
| dc.contributor.uporto | Faculdade de Engenharia | |
| dc.identifier.doi | 10.34626/hy8t-f260 | |
| dc.identifier.tid | 201801990 | |
| dc.subject.fos | Ciências da engenharia e tecnologias::Engenharia electrotécnica, electrónica e informática | |
| dc.subject.fos | Engineering and technology::Electrical engineering, Electronic engineering, Information engineering | |
| thesis.degree.discipline | Mestrado Integrado em Engenharia Informática e Computação (Integrated Master in Informatics and Computing Engineering) | |
| thesis.degree.grantor | Faculdade de Engenharia | |
| thesis.degree.grantor | Universidade do Porto | |
| thesis.degree.level | 1 | |
| Appears in Collections: | FEUP - Dissertação | |
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| 202853.pdf | Automatic detection of hate speech in text: an overview of the topic and dataset annotation with hierarchical classes | 1.61 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
