Overview of the HASOC subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages

  • Amit Kumar Jaiswal
  • Thomas Mandl
  • Sandip Modha
  • Gautam Kishore Shahi
  • Hiren Madhu
  • Shrey Satapara
Research Output: Chapter in Book/Report/Conference proceeding Conference contribution Peer-review

Open access

Abstract

The widespread presence of offensive content online, such as hate speech, poses a growing societal problem. AI tools are necessary for supporting the moderation process at online platforms. To evaluate these identification tools, continuous experimentation with data sets in different languages is necessary. The HASOC track (Hate Speech and Offensive Content Identification) is dedicated to developing benchmark data for this purpose. This paper presents the HASOC subtrack for English, Hindi, and Marathi. The data set was assembled from Twitter. The subtrack has two sub-tasks. Task A is a binary classification problem (Hate and Not Offensive) offered for all three languages. Task B is a fine-grained classification problem with three classes, hate speech (HATE), OFFENSIVE, and PROFANITY, offered for English and Hindi. Overall, 652 runs were submitted by 65 teams. The best classification algorithms for Task A achieved F1 measures of 0.91, 0.78, and 0.83 for Marathi, Hindi, and English, respectively. This overview presents the tasks and the data development as well as the detailed results. The systems submitted to the competition applied a variety of technologies. The best-performing algorithms were mainly variants of transformer architectures.

Publication Information

Output type

Research Output: Chapter in Book/Report/Conference proceeding Conference contribution Peer-review

Original language

English

Pages from-to (Number of pages)

Pages 1-3 (3 pages)

Publication milestones

  • Published - 26/01/2022

Publication status

Published - 26/01/2022

Volume

3159

Publisher

Association for Computing Machinery, United States

Publication series

  • Publication series name: ACM International Conference Proceeding Series

ISBN (Electronic)

9781450395960

External Publication IDs

  • handle.net: 10547/625661
  • Scopus: 85124344402

Host publication title

FIRE 2021 - Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation

Host publication editors

  • Debasis Ganguly
  • Surupendu Gangopadhyay
  • Mandar Mitra
  • Prasenjit Majumder