Entity-aware capsule network for multi-class classification of big data: a deep learning approach

  • Amit Kumar Jaiswal
  • Prayag Tiwari
  • Sahil Garg
  • M. Shamim Hossain
Research Output: Contribution to journal Article Peer-review

Abstract

Named entity recognition (NER) is one of the most challenging natural language processing (NLP) tasks, as its performance depends on constantly evolving languages and on expert (human) annotation. The diverse and dynamic content on the web significantly raises the need for a more generalized approach, one that is capable of correctly classifying terms in a corpus and feeding subsequent NLP tasks such as machine translation, query expansion, and many other applications. Although NER has been extensively researched in recent times, the variety of public corpora available nowadays provides room for new and more accurate methods to tackle the problem. This paper presents a novel method that uses deep learning techniques based on the capsule network architecture for predicting entities in a corpus. This type of network groups neurons into so-called capsules to detect specific features of an object without reducing the original input, unlike convolutional neural networks and their ‘max-pooling’ strategy. Our extensive evaluation on several benchmark datasets demonstrates how competitive our method is in comparison with state-of-the-art techniques and how the usage of the proposed architecture may represent a significant benefit to further NLP tasks, especially in cases where experts are needed. We also explore NER using a theoretical framework that leverages big data for security. For the sake of reproducibility, we make the codebase open-source.
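
The contrast the abstract draws between max-pooling and capsules can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `max_pool_1d` and `squash` and the toy feature values are assumptions made for illustration, and the squashing nonlinearity shown is the one commonly used in the capsule network literature, which the paper may or may not adopt unchanged.

```python
import math


def max_pool_1d(values, window):
    """Keep only the largest value in each non-overlapping window.

    Everything else in the window is discarded, so the input shrinks.
    """
    return [max(values[i:i + window]) for i in range(0, len(values), window)]


def squash(vector):
    """Capsule-style squashing (illustrative, not the paper's code).

    Rescales the vector's length into [0, 1) while keeping its direction,
    so every component still contributes to the output.
    """
    norm_sq = sum(x * x for x in vector)
    if norm_sq == 0.0:
        return [0.0 for _ in vector]
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [scale * x for x in vector]


# Hypothetical toy features, chosen only to make the arithmetic easy to follow.
features = [3.0, 4.0, 0.0, 0.0]

pooled = max_pool_1d(features, 2)      # two values survive out of four
capsule = squash(features[:2])         # both components survive, rescaled
capsule_length = math.sqrt(sum(x * x for x in capsule))
```

Here pooling reduces the four inputs to one value per window, whereas the squashed capsule keeps the 3:4 ratio between its components and encodes the strength of the feature in its length.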

Publication Information

Original language

English

Pages from-to (Number of pages)

Pages 1-11

Journal (Volume, Issue Number)

Future Generation Computer Systems (Volume 117)

Publication milestones

  • Accepted/In press - 14/11/2020
  • Published - 20/11/2020

Publication status

Published - 20/11/2020

ISSN

0167-739X

External Publication IDs

  • handle.net: 10547/624733
  • Scopus: 85097329829
