Classification of colloquial Arabic tweets in real-time to detect high-risk floods

  • Waleed Alabbas
  • Haider M. Al-Khateeb
  • Ali Mansour
  • Gregory Epiphaniou
  • Ingo Frommholz
  • University of Bedfordshire

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

26 Citations (Scopus)

Abstract

Twitter has eased real-time information flow for decision makers and is also one of the key enablers of Open-source Intelligence (OSINT). Tweet mining has recently been used in the context of incident response to estimate the location of, and damage caused by, hurricanes and earthquakes. We aim to research the detection of a specific type of high-risk natural disaster that occurs frequently and causes casualties in the Arabian Peninsula, namely floods. We investigate how to achieve accurate classification of short informal (colloquial) Arabic text, as typically used on Twitter, which is highly inconsistent and has received very little attention in this field. First, we provide a thorough technical demonstration consisting of the following stages: data collection (Twitter REST API), labelling, text pre-processing, data division and representation, and model training. This pipeline was implemented in R for our experiment. We then evaluate classifier performance via four experiments conducted to measure the impact of different stemming techniques on the following classifiers: SVM, J48, C5.0, NNET, NB and k-NN. The dataset consisted of 1434 tweets in total. Our findings show that the Support Vector Machine (SVM) was the strongest classifier (F1 = 0.933). Furthermore, McNemar's test shows that using SVM without stemming on colloquial Arabic is significantly better than using stemming techniques.
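The two evaluation measures named in the abstract, the F1 score and McNemar's test, can be illustrated with a minimal sketch. This is not the authors' code: the experiment was run in R, whereas the sketch below uses Python with only the standard library. The function names `f1_score` and `mcnemar_test` are hypothetical, and the continuity-corrected form of McNemar's statistic is an assumption, since the abstract does not say which variant was applied.

```python
import math


def f1_score(y_true, y_pred, positive=1):
    """F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcnemar_test(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar's test on two classifiers' predictions.

    Assumption: the corrected chi-square form with 1 degree of freedom.
    Returns (statistic, p_value).
    """
    triples = list(zip(y_true, pred_a, pred_b))
    # b: items classifier A gets right and B gets wrong; c: the reverse.
    b = sum(1 for t, a, o in triples if a == t and o != t)
    c = sum(1 for t, a, o in triples if a != t and o == t)
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree in correctness
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 d.o.f.: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical toy labels (1 = flood-related, 0 = not), not the paper's data.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    no_stem = [1, 1, 1, 0, 0, 0, 1, 0]
    stemmed = [1, 0, 1, 0, 1, 0, 0, 0]
    print(f1_score(truth, no_stem), f1_score(truth, stemmed))
    print(mcnemar_test(truth, no_stem, stemmed))
```

McNemar's test uses only the items on which the two classifiers disagree in correctness, which is why it suits comparing two pipelines (for example, with and without stemming) evaluated on the same test tweets.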
Original language: English
Title of host publication: 2017 International Conference On Social Media, Wearable And Web Analytics, Social Media 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-8
Number of pages: 8
ISBN (Electronic): 9781509050574
ISBN (Print): 9781509050574
Publication status: Published - 6 Oct 2017
Event: 2017 International Conference On Social Media, Wearable And Web Analytics, Social Media 2017 - London, United Kingdom
Duration: 19 Jun 2017 to 20 Jun 2017

Publication series

Name: 2017 International Conference On Social Media, Wearable And Web Analytics, Social Media 2017
Volume: 2017-June

Conference

Conference: 2017 International Conference On Social Media, Wearable And Web Analytics, Social Media 2017
Country/Territory: United Kingdom
City: London
Period: 19/06/17 to 20/06/17

Keywords

  • Arabic text classification
  • big data
  • Colloquialism
  • Event detection
  • Real-time
  • Stemming
  • SVM
  • Twitter

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
