
Arabic text classification methods: Systematic literature review of primary studies

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

29 Citations (Scopus)

Abstract

Recent research on Big Data has proposed and evaluated a number of advanced techniques for extracting meaningful information from the large, complex volume of data available on the World Wide Web. Accurate text analysis usually begins with a Text Classification (TC) method. A review of the recent literature in this area shows that most studies focus on English (and other scripts), while attempts at classifying Arabic text remain very limited. Hence, we contribute the first Systematic Literature Review (SLR) that uses a strict search protocol to summarize the key characteristics of the TC techniques and methods used to classify Arabic text. This work also aims to identify and share scientific evidence of the gaps in the current literature, to help suggest areas for further research. Our SLR explicitly uses empirical evidence as the deciding factor for including studies, and then concludes which classifier produced more accurate results. Further, our findings identify a lack of standardized corpora for Arabic text: authors compile their own. Most of the work focuses on Modern Arabic, with very little done on Colloquial Arabic despite its wide use on social media networks such as Twitter. In total, 1464 papers were surveyed, from which 48 primary studies were included and analyzed.
Original language: English
Title of host publication: 2016 4th IEEE International Colloquium on Information Science and Technology (CiSt)
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 361-367
Volume: 0
ISBN (Electronic): 9781509007516
ISBN (Print): 9781509007523
DOIs
Publication status: Published - 5 Jan 2017
Event: 4th IEEE International Colloquium on Information Science and Technology (CiSt) - Tangier
Duration: 24 Oct 2016 to 26 Oct 2016

Conference

Conference: 4th IEEE International Colloquium on Information Science and Technology (CiSt)
City: Tangier
Period: 24/10/16 to 26/10/16
Other: 4th IEEE International Colloquium on Information Science and Technology (CiSt) (24/10/2016-26/10/2016, Tangier)

Keywords

  • Arabic text classification
  • Big Data
  • Text corpus
  • data mining
  • systematic literature review
