Big Data reduction methods: a survey

Muhammad Habib Ur Rehman, Chee Sun Liew, Assad Abbas, Prem Prakash Jayaraman, Teh Ying Wah, Samee U. Khan

School of Computing Engineering and Creative Industries, University of Malaya, North Dakota State University, Swinburne University of Technology

Research Output: Contribution to journal Article Peer-review

Open access

Abstract

Research on big data analytics is entering a new phase called fast data, in which multiple gigabytes of data arrive in big data systems every second. Modern big data systems collect inherently complex data streams owing to the volume, velocity, value, variety, variability, and veracity of the acquired data, which together give rise to the 6Vs of big data. Reduced and relevant data streams are perceived to be more useful than raw, redundant, inconsistent, and noisy data. Another motivation for big data reduction is that big datasets with millions of variables suffer from the curse of dimensionality, which requires unbounded computational resources to uncover actionable knowledge patterns. This article reviews methods used for big data reduction. It also presents a detailed taxonomic discussion of big data reduction methods, including network theory, big data compression, dimension reduction, redundancy elimination, data mining, and machine learning methods. In addition, open research issues pertinent to big data reduction are highlighted.

Publication Information

Original language

English

Pages from-to (Number of pages)

Pages 265–284

Journal (Volume, Issue Number)

Data Science and Engineering (Volume 1)

Publication milestones

Accepted/In press - 18/11/2016
Published - 10/12/2016

Publication status

Published - 10/12/2016

ISSN

2364-1185

External Publication IDs

ORCID: /0000-0001-7428-2272/work/63090974
Scopus: 85029474107

Access to documents

10.1007/s41019-016-0022-0

https://link.springer.com/article/10.1007/s41019-016-0022-0

Final published version, 934.31 KB

License: CC BY

Publication metrics

Scopus: 186

PlumX

Social media: 185

Captures: 296