Large vision language model: enhanced-RSCLIP with exemplar-image prompting for uncommon object detection in satellite imagery

Research output: Contribution to journal › Article › peer-review

Abstract

Large Vision Language Models (LVLMs) have shown promise in remote sensing applications, yet struggle with “uncommon” objects that lack sufficient public labeled data. This paper presents Enhanced-RSCLIP, a novel dual-prompt architecture that combines text prompting with exemplar-image processing for cattle herd detection in satellite imagery. Our approach introduces a key innovation where an exemplar-image preprocessing module using crop-based or attention-based algorithms extracts focused object features which are fed as a dual stream to a contrastive learning framework that fuses textual descriptions with visual exemplar embeddings. We evaluated our method on a custom dataset of 260 satellite images across UK and Nigerian regions. Enhanced-RSCLIP with crop-based exemplar processing achieved 72% accuracy in cattle detection and 56.2% overall accuracy on cross-domain transfer tasks, significantly outperforming text-only CLIP (31% overall accuracy). The dual-prompt architecture enables effective few-shot learning and cross-regional transfer from data-rich (UK) to data-sparse (Nigeria) environments, demonstrating a 41% improvement over baseline approaches for uncommon object detection in satellite imagery.
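
The dual-prompt idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`crop_exemplar`, `fuse_prompts`, `classify`), the weighted-average fusion, and the `alpha` weight are all assumptions made for illustration, and the embeddings are toy vectors standing in for the outputs of real text and image encoders. It only shows the general shape of the pipeline: crop an exemplar, fuse its embedding with a text embedding, and score candidate image tiles by cosine similarity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def crop_exemplar(image, top, left, height, width):
    """Hypothetical crop-based preprocessing: keep only the box around the object.

    `image` is a 2D list of pixel values; a real pipeline would pass the crop
    to an image encoder to obtain the exemplar embedding.
    """
    return [row[left:left + width] for row in image[top:top + height]]


def fuse_prompts(text_emb, exemplar_emb, alpha=0.5):
    """Fuse the two prompt streams by a weighted average (an assumed fusion rule).

    `alpha` is a hypothetical weight: 1.0 uses text only, 0.0 uses the exemplar only.
    """
    if len(text_emb) != len(exemplar_emb):
        raise ValueError("embeddings must have the same dimension")
    t = l2_normalize(text_emb)
    e = l2_normalize(exemplar_emb)
    return l2_normalize([alpha * a + (1.0 - alpha) * b for a, b in zip(t, e)])


def cosine_score(query_emb, candidate_emb):
    """Cosine similarity between a fused query and a candidate tile embedding."""
    q = l2_normalize(query_emb)
    c = l2_normalize(candidate_emb)
    return sum(a * b for a, b in zip(q, c))


def classify(candidate_emb, class_queries):
    """Return the class name whose fused query is most similar to the candidate."""
    return max(class_queries, key=lambda name: cosine_score(class_queries[name], candidate_emb))


# Toy usage: two classes, each described by a text embedding and an exemplar embedding.
queries = {
    "cattle": fuse_prompts([1.0, 0.0, 0.2], [0.9, 0.1, 0.0]),
    "background": fuse_prompts([0.0, 1.0, 0.1], [0.1, 0.8, 0.3]),
}
tile_embedding = [0.8, 0.2, 0.1]  # stand-in for an encoded satellite image tile
predicted = classify(tile_embedding, queries)
```

In this sketch, setting `alpha=1.0` reduces the query to the text prompt alone, which mirrors the text-only baseline the abstract compares against, while lower values give the exemplar image more influence.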
Original language: English
Article number: 3071
Journal: Electronics
Volume: 14
Issue number: 15
DOIs
Publication status: Published - 31 Jul 2025

Keywords

  • LVLM
  • LVM
  • RS-CLIP
  • few-shot
  • remote sensing
  • satellite imagery

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Hardware and Architecture
  • Computer Networks and Communications
  • Electrical and Electronic Engineering
