Learning instruction-guided manipulation affordance via large models for embodied robotic tasks

  • Dayou Li
  • Chenkun Zhao
  • Shuo Yang
  • Lin Ma
  • Yibin Li
  • Wei Zhang
  • Shandong University
  • Qilu Hospital of Shandong University
  • Meituan

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

1 Citation (Scopus)

Abstract

We study the task of language instruction-guided robotic manipulation, in which an embodied robot is supposed to manipulate target objects based on language instructions. In previous studies, the predicted manipulation regions of the target object typically do not change with the specification given in the language instructions, which means that language perception and manipulation prediction are separate. However, in human behavioral patterns, the manipulation regions of the same object change for different language instructions. In this paper, we propose Instruction-Guided Affordance Net (IGANet) for predicting affordance maps of instruction-guided robotic manipulation tasks by utilizing powerful priors from vision and language encoders pre-trained on large-scale datasets. We develop a Vision-Language-Model (VLM)-based data augmentation pipeline, which can generate a large amount of data automatically for model training. In addition, with the help of Large Language Models (LLMs), actions can be effectively executed to finish the tasks defined by instructions. A series of real-world experiments revealed that our method can achieve better performance with generated data. Moreover, our model can generalize better to scenarios with unseen objects and language instructions.
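The central idea in the abstract, that the manipulation region of one object should shift with the instruction, can be illustrated with a toy sketch. This is not IGANet's architecture, which the abstract does not specify. It assumes a common stand-in technique: score each image cell by the cosine similarity between its visual feature and an instruction embedding, then normalize with a softmax. The grid, the 2-D features, the instruction vectors, and all function names are hypothetical placeholders for what pre-trained encoders would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def affordance_map(features, instruction_vec, temperature=0.1):
    """Softmax over per-cell cosine similarity to the instruction embedding."""
    scores = [[cosine(cell, instruction_vec) / temperature for cell in row]
              for row in features]
    peak = max(max(row) for row in scores)  # subtract the max for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def argmax_cell(heatmap):
    """Return the (row, col) index of the highest-scoring cell."""
    cells = [(r, c) for r, row in enumerate(heatmap) for c in range(len(row))]
    return max(cells, key=lambda rc: heatmap[rc[0]][rc[1]])


# Hypothetical 2x2 feature grid for a single object (e.g. a mug).
# Axis 0 loosely stands for "handle-like", axis 1 for "rim-like".
FEATURES = [
    [[1.0, 0.0], [0.7, 0.7]],
    [[0.1, 0.1], [0.0, 1.0]],
]

# Hypothetical instruction embeddings; a real system would use a language encoder.
INSTRUCTIONS = {
    "grasp": [1.0, 0.0],
    "pour": [0.0, 1.0],
}

if __name__ == "__main__":
    for name, vec in INSTRUCTIONS.items():
        heatmap = affordance_map(FEATURES, vec)
        print(name, argmax_cell(heatmap))
```

With these toy values the same object yields a different peak cell for each instruction, which is the behavior the abstract contrasts with instruction-independent prediction.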

Original language: English
Title of host publication: ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 662-667
Number of pages: 6
ISBN (Electronic): 9798350385724
Publication status: Published - 18 Oct 2024
Event: 9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024 - Tokyo, Japan
Duration: 8 Jul 2024 to 10 Jul 2024

Publication series

Name: ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics

Conference

Conference: 9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024
Country/Territory: Japan
City: Tokyo
Period: 8/07/24 to 10/07/24

ASJC Scopus subject areas

  • Artificial Intelligence
  • Electrical and Electronic Engineering
  • Mechanical Engineering
  • Safety, Risk, Reliability and Quality
  • Control and Optimization
  • Modeling and Simulation
