
ViMRT: a text-mining tool and search engine for automated virus mutation recognition

  • Yuantao Tong
  • Fanglin Tan
  • Honglian Huang
  • Zeyu Zhang
  • Hui Zong
  • Yujia Xie
  • Danqi Huang
  • Shiyang Cheng
  • Ziyi Wei
  • Meng Fang
  • James Crabbe
  • Ying Wang
  • Xiaoyan Zhang
  • Tongji University
  • Eastern Hepatobiliary Surgery Hospital, Shanghai
  • University of Oxford
  • Shanxi University

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)
2 Downloads (Pure)

Abstract

Motivation: Virus mutation is a central research topic because it plays a critical role in disease progression, and it has prompted a substantial body of scientific publications. Extracting mutations from the published literature has become an increasingly important task that benefits many downstream applications, such as vaccine design and drug usage. However, most existing approaches perform poorly at extracting virus mutations, both because precise virus mutation information is lacking and because the tools were developed for human gene mutations.

Results: We developed ViMRT, a text-mining tool and search engine for automated virus mutation recognition using natural language processing. ViMRT relies mainly on 8 optimized rules and 12 regular expressions, developed on a dataset of 830 papers covering 5 viruses associated with severe human disease. On a test dataset (1662 papers) it achieved an F1-score of 99.17%, higher than other tools, and it also performed well on two other viruses, influenza virus and severe acute respiratory syndrome coronavirus-2 (212 papers, F1-score of 96.99%). These results indicate that ViMRT is a high-performance method for extracting virus mutations from the biomedical literature. In addition, we present a search engine that lets researchers quickly and accurately find virus mutation-related information, including virus genes and related diseases.
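The abstract describes a rule- and regular-expression-based approach evaluated by F1-score. The following is a minimal sketch of that general idea, not ViMRT's actual rules or expressions, which are not given here. The pattern, the names `SUBSTITUTION`, `find_substitutions` and `f1_score`, and the sample sentence are all assumptions made for illustration.

```python
import re

# Hypothetical pattern, NOT one of ViMRT's 12 regular expressions.
# It matches a substitution written as wild-type residue, position,
# mutant residue (e.g. "D614G"), with an optional short lowercase
# gene/domain prefix (e.g. "rtM204V").
AA = "ACDEFGHIKLMNPQRSTVWY"  # one-letter amino acid codes
SUBSTITUTION = re.compile(
    rf"\b(?P<prefix>[a-z]{{1,3}})?(?P<wt>[{AA}])"
    rf"(?P<pos>[1-9]\d{{0,3}})(?P<mut>[{AA}])\b"
)


def find_substitutions(text):
    """Return (prefix, wild_type, position, mutant) tuples found in text."""
    hits = []
    for m in SUBSTITUTION.finditer(text):
        # Skip tokens whose "mutant" equals the wild type.
        if m.group("wt") == m.group("mut"):
            continue
        hits.append(
            (m.group("prefix") or "", m.group("wt"),
             int(m.group("pos")), m.group("mut"))
        )
    return hits


def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over two collections of mentions."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up example sentence, used only to exercise the sketch.
sentence = "The D614G substitution in spike and rtM204V in the HBV polymerase."
mentions = find_substitutions(sentence)
```

A single pattern like this is ambiguous on its own: a nucleotide change such as A1762T also matches, because A and T are valid amino acid codes. Resolving cases like that is presumably where additional rules and context would come in.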

Original language: English
Article number: btac721
Journal: Bioinformatics
Volume: 39
Issue number: 1
Publication status: Published - 7 Nov 2022

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • ViMRT
  • search engine
  • text-mining
  • virus mutation recognition
  • Data Mining/methods
  • Search Engine
  • Mutation
  • Viruses/genetics

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
