Overview of the HASOC subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages

  • Amit Kumar Jaiswal
  • Thomas Mandl
  • Sandip Modha
  • Gautam Kishore Shahi
  • Hiren Madhu
  • Shrey Satapara
  • Prasenjit Majumder
  • Johannes Schäfer
  • Tharindu Ranasinghe
  • Marcos Zampieri
  • Durgesh Nandini

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

62 Citations (Scopus)
2 Downloads (Pure)

Abstract

The widespread presence of offensive content online, such as hate speech, poses a growing societal problem. AI tools are necessary to support the moderation process on online platforms. Evaluating these identification tools requires continuous experimentation with data sets in different languages. The HASOC track (Hate Speech and Offensive Content Identification) is dedicated to developing benchmark data for this purpose. This paper presents the HASOC subtrack for English, Hindi, and Marathi. The data set was assembled from Twitter. The subtrack has two sub-tasks. Task A is a binary classification problem (Hate and Not Offensive) offered for all three languages. Task B is a fine-grained classification problem with three classes, hate speech (HATE), OFFENSIVE, and PROFANITY, offered for English and Hindi. Overall, 652 runs were submitted by 65 teams. The best classification algorithms for Task A achieved F1 measures of 0.91, 0.78, and 0.83 for Marathi, Hindi, and English, respectively. This overview presents the tasks and the data development as well as the detailed results. The systems submitted to the competition applied a variety of technologies. The best-performing algorithms were mainly variants of transformer architectures.
Original language: English
Title of host publication: FIRE 2021 - Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation
Editors: Debasis Ganguly, Surupendu Gangopadhyay, Mandar Mitra, Prasenjit Majumder
Publisher: Association for Computing Machinery
Pages: 1-3
Number of pages: 3
Volume: 3159
ISBN (Electronic): 9781450395960
ISBN (Print): 9781450395960
Publication status: Published - 26 Jan 2022
Event: FIRE '21 : 13th Annual Meeting of the Forum for Information Retrieval Evaluation - Online
Duration: 13 Dec 2021 to 17 Dec 2021

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: FIRE '21 : 13th Annual Meeting of the Forum for Information Retrieval Evaluation
City: Online
Period: 13/12/21 to 17/12/21
Other: FIRE '21 : 13th Annual Meeting of the Forum for Information Retrieval Evaluation (13/12/2021-17/12/2021, Online)

Keywords

  • Deep learning
  • Hate Speech
  • Multilingual Text Classification
  • Offensive Language
  • Social Media
  • machine learning
  • Multilingual Datasets
  • Under-resourced language

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
