Enhancing audio classification through MFCC feature extraction and data augmentation with CNN and RNN models

  • Karim Mohammed Rezaul
  • Md Jewel
  • Md Shabiul Islam
  • Kazy Noor e.Alam Siddiquee
  • Nick Barua
  • Muhammad Azizur Rahman
  • Mohammad Shan-A-Khuda
  • Rejwan Bin Sulaiman
  • Md Sadeque Imam Shaikh
  • Md Abrar Hamim
  • F. M. Tanmoy
  • Afraz Ul Haque
  • Musarrat Saberin Nipun
  • Navid Dorudian
  • Amer Kareem
  • Ahmmed Khondokar Farid
  • Asma Mubarak
  • Tajnuva Jannat
  • Umme Fatema Tuj Asha
  • Wrexham University
  • Centre for Applied Research in Software & IT (CARSIT)
  • Multimedia University
  • State University of Bangladesh
  • Kobe Institute of Computing
  • Cardiff Metropolitan University
  • Leeds Beckett University
  • Northumbria University
  • Coventry University
  • Brunel University London
  • Canterbury Christ Church University

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)
1 Downloads (Pure)

Abstract

Sound classification is a multifaceted task: it typically requires collecting and processing large amounts of data and building machine learning models that can reliably distinguish between different sounds. In this work, we present a novel methodology for classifying both musical instrument sounds and environmental sounds using convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Features are extracted from the audio with Mel Frequency Cepstral Coefficients (MFCCs), whose mel-scaled frequency bands approximate human auditory perception and yield highly discriminative features. Because data processing strongly affects model quality, we applied a range of data augmentation and cleaning techniques to optimize the pipeline. Both the CNN and the RNN models achieved good accuracy. As machine learning and deep learning continue to transform image classification, the development of adaptable models for audio classification deserves comparable attention. Despite the challenges posed by a small dataset, we successfully built both models. Our approach to sound classification has implications for diverse domains, including speech recognition, music production, and healthcare, and we believe that further research can build on it toward advances in audio data classification and analysis.
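The abstract mentions "a range of data augmentation and cleaning techniques" without naming them. As a rough illustration only, the sketch below shows a few common waveform-level augmentations (random time shift, random gain in decibels, additive Gaussian noise at a target signal-to-noise ratio) and one simple cleaning step (trimming low-amplitude leading and trailing samples). These specific techniques, the helper names (`add_noise`, `time_shift`, `random_gain`, `trim_silence`, `augment`) and all parameter ranges are assumptions for illustration, not the authors' pipeline.

```python
import math
import random


def add_noise(signal, snr_db, rng):
    # Additive white Gaussian noise scaled so that signal power / noise power = snr_db (in dB).
    if not signal:
        return []
    p_signal = sum(s * s for s in signal) / len(signal)
    sigma = math.sqrt(p_signal / (10.0 ** (snr_db / 10.0)))
    return [s + rng.gauss(0.0, sigma) for s in signal]


def time_shift(signal, max_shift, rng):
    # Shift by a random number of samples in [-max_shift, max_shift], zero-filling the gap.
    max_shift = min(max_shift, len(signal))
    k = rng.randint(-max_shift, max_shift)
    if k > 0:
        return [0.0] * k + list(signal[:-k])
    if k < 0:
        return list(signal[-k:]) + [0.0] * (-k)
    return list(signal)


def random_gain(signal, min_db, max_db, rng):
    # Multiply by a random gain drawn uniformly in decibels.
    g = 10.0 ** (rng.uniform(min_db, max_db) / 20.0)
    return [s * g for s in signal]


def trim_silence(signal, threshold):
    # Simple cleaning step (assumed): drop leading/trailing samples with |x| below threshold.
    loud = [i for i, s in enumerate(signal) if abs(s) >= threshold]
    return list(signal[loud[0]:loud[-1] + 1]) if loud else []


def augment(signal, n_copies, seed=0):
    # Hypothetical pipeline: shift -> gain -> noise, with illustrative parameter ranges.
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        y = time_shift(signal, len(signal) // 10, rng)
        y = random_gain(y, -6.0, 6.0, rng)
        y = add_noise(y, rng.uniform(10.0, 30.0), rng)
        copies.append(y)
    return copies
```

Each augmented copy keeps the original length, so it can pass through the same MFCC feature extraction as the original clip. Seeding a dedicated `random.Random` instance keeps the augmented set reproducible; in practice augmentation is normally applied to the training split only.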

Original language: English
Pages (from-to): 37-53
Number of pages: 17
Journal: International Journal of Advanced Computer Science and Applications
Volume: 15
Issue number: 7
Publication status: Published - 2024

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • audio segmentation
  • CNN
  • data augmentation
  • Deep learning (artificial intelligence)
  • discrete cosine transform
  • fast Fourier transform
  • feature extraction
  • frame blocking
  • MFCC
  • RNN
  • signal processing
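The keywords name several stages of a standard MFCC pipeline: frame blocking, the fast Fourier transform and the discrete cosine transform, with windowing and mel filtering in between. A minimal pure-Python sketch of that pipeline follows. It assumes common textbook defaults (pre-emphasis coefficient 0.97, Hamming window, 512-sample frames with 50% overlap, 26 triangular filters on the HTK mel scale, 13 unnormalized DCT-II coefficients); the paper's actual parameters are not stated here, and the function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel formulas exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(signal, frame_len, hop):
    # Frame blocking: overlapping frames; an incomplete tail frame is dropped.
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def fft(x):
    # Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def mel_filterbank(n_filters, nfft, sr):
    # Triangular filters with edges equally spaced on the mel scale from 0 Hz to sr/2.
    top = hz_to_mel(sr / 2.0)
    mels = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((nfft + 1) * mel_to_hz(m) / sr) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (nfft // 2 + 1)
        for k in range(left, centre):
            filt[k] = (k - left) / (centre - left)
        for k in range(centre, right):
            filt[k] = (right - k) / (right - centre)
        bank.append(filt)
    return bank


def mfcc(signal, sr, frame_len=512, hop=256, n_filters=26, n_ceps=13, pre_emph=0.97):
    # Hypothetical defaults; returns one list of n_ceps coefficients per frame.
    if frame_len < 2 or frame_len & (frame_len - 1):
        raise ValueError("frame_len must be a power of two for this radix-2 FFT")
    if len(signal) < frame_len:
        return []
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - a * x[n-1].
    emph = [signal[0]] + [signal[n] - pre_emph * signal[n - 1] for n in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    bank = mel_filterbank(n_filters, frame_len, sr)
    features = []
    for frame in frame_signal(emph, frame_len, hop):
        spectrum = fft([s * w for s, w in zip(frame, window)])
        # Periodogram power estimate for the non-negative frequency bins.
        power = [abs(c) ** 2 / frame_len for c in spectrum[: frame_len // 2 + 1]]
        # Log mel-band energies, floored to avoid log(0).
        log_e = [math.log(max(sum(f * p for f, p in zip(filt, power)), 1e-10)) for filt in bank]
        # Unnormalized DCT-II decorrelates the log energies; keep the first n_ceps.
        ceps = [
            sum(log_e[m] * math.cos(math.pi * q * (m + 0.5) / n_filters) for m in range(n_filters))
            for q in range(n_ceps)
        ]
        features.append(ceps)
    return features


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s, 440 Hz
    feats = mfcc(tone, sr)
    print(len(feats), len(feats[0]))  # 61 13
```

In practice a library routine such as `librosa.feature.mfcc` would usually be used instead; its defaults differ from this sketch (for example, it applies an orthonormal DCT), so coefficient values will not match exactly. The resulting frames-by-coefficients matrix is the kind of 2-D input a CNN can treat as an image, or an RNN can read as a time sequence.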

ASJC Scopus subject areas

  • General Computer Science
