Dealing with Annotator Disagreement in Hate Speech Classification

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Somaiyeh Dehghan, Mehmet Umut Sen, Berrin Yanikoglu

arXiv:2502.08266v3 Abstract: Hate speech detection is a crucial task, especially on social media, where harmful content can spread quickly. Collecting social media content (tweets, etc.) to train machine learning models is easy, but detecting and categorizing hate speech is difficult because the task is inherently subjective. This subjectivity leads to frequent disagreement among annotators, particularly for subtle or borderline content. Traditional approaches either discard non-consensus samples or force a "gold standard" through expert adjudication, ignoring valuable information about uncertainty and diverse human perspectives. We examine the largely overlooked problem of annotator disagreement in hate speech classification and evaluate a range of aggregation methods, including majority voting and ordinal strategies (minimum, maximum, and mean), analyzing their impact across binary, 4-class, and 6-class classification tasks. In addition, we leverage annotators' perceived hate speech strength scores to explore regression-based and hybrid modeling approaches. Among other findings, we show that filtering out non-consensus samples yields over-optimistic results and that perceived strength provides a complementary signal that enhances classification performance. Finally, we establish new state-of-the-art results for hate speech detection in Turkish tweets and demonstrate that annotator disagreement, when properly modeled, is a valuable resource for building more robust and reliable systems.
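The aggregation strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the integer label encoding (0 for no hate speech, with higher values meaning more severe), the tie-breaking rule, the rounding rule for the mean, and every function name below are assumptions made for illustration only.

```python
import math
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Return the most frequent label.

    Ties are broken toward the lowest label; this tie rule is an
    assumption, not something stated in the abstract.
    """
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def aggregate(labels, strategy):
    """Collapse one sample's ordinal annotator labels into a single label."""
    if strategy == "majority":
        return majority_vote(labels)
    if strategy == "min":
        return min(labels)
    if strategy == "max":
        return max(labels)
    if strategy == "mean":
        # Round half up to the nearest class (assumed rounding rule).
        return math.floor(mean(labels) + 0.5)
    raise ValueError(f"unknown strategy: {strategy}")


def has_consensus(labels):
    """True when every annotator assigned the same label."""
    return len(set(labels)) == 1


def to_binary(label):
    """Map an ordinal label to hate (1) or not hate (0); assumed mapping."""
    return int(label > 0)


# Hypothetical annotations for three tweets on an assumed 4-class scale (0..3).
annotations = [[0, 1, 3], [2, 2, 3], [1, 1, 1]]
for labels in annotations:
    row = {s: aggregate(labels, s) for s in ("majority", "min", "max", "mean")}
    print(labels, row, "consensus" if has_consensus(labels) else "disagreement")
```

Keeping only the samples for which `has_consensus` returns true corresponds to the filtering approach that the abstract reports as producing over-optimistic results, since the remaining samples are the ones annotators found easiest to agree on.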
