Detection of LLM deceptive behaviour triggered by the poisonous context injection: the problem demonstration

  • Stanislav Selitskiy*
  • Chihiro Inoue
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

This paper presents a focused demonstration of deceptive behaviour in Large Language Models (LLMs) arising under poisonous context injection. The case study is built around a Japanese haiku, selected for its inherent ambiguity, which serves as a probe of LLM alignment with the human model of the real world. When presented with a poisonous context, ChatGPT generated a translation, interpretation, and literary criticism that were not only incorrect but also internally inconsistent. The experiment highlights a fundamental risk: LLMs can produce outputs that are both linguistically convincing and semantically deceptive. The novelty of this work lies in framing LLM deception as a measurable phenomenon and in articulating the feasibility of automated detection through cross-verification with independent models. Its contribution is to establish the problem space by demonstrating how subtle poisoning can systematically induce deceptive generations. By formalising the problem and identifying a methodological direction, the study positions itself as an initial step in an ongoing research programme on trustworthy and self-aware AI. Proof-of-concept experiments showed that a committee of five major LLMs rates the trustworthiness of the poisoned-context haiku interpretations at 0.57 ± 0.33, while non-poisoned haiku interpretations are rated at 0.86 ± 0.15.
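
The committee-based cross-verification mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each judge model returns a trust score in [0, 1] and that the reported "±" figures are a mean and a standard deviation; the abstract does not state the aggregation method, so both are assumptions. The function names, the 0.7 threshold, and the scores below are hypothetical placeholders, not the paper's method or data.

```python
from statistics import mean, stdev


def committee_trust(scores):
    """Aggregate per-judge trust scores in [0, 1] into (mean, sample std).

    Assumption: mean and sample standard deviation are used as the summary;
    the source abstract does not specify how its figures were computed.
    """
    if len(scores) < 2:
        raise ValueError("need at least two judge scores")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    return mean(scores), stdev(scores)


def flag_suspect(scores, threshold=0.7):
    """Return True when the committee mean falls below a (hypothetical) threshold."""
    committee_mean, _ = committee_trust(scores)
    return committee_mean < threshold


# Made-up placeholder scores from five hypothetical judges (NOT the paper's data).
clean_scores = [0.9, 0.8, 0.9, 1.0, 0.9]
poisoned_scores = [0.2, 0.6, 0.4, 0.8, 0.5]

for label, scores in (("clean", clean_scores), ("poisoned", poisoned_scores)):
    m, s = committee_trust(scores)
    print(f"{label}: {m:.2f} ± {s:.2f}, suspect={flag_suspect(scores)}")
```

In this sketch a wide spread across judges is reported alongside the mean, since disagreement within the committee may itself be a useful signal; a real detector would need a calibrated threshold rather than the fixed value used here.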

Original language: English
Title of host publication: 2025 3rd International Conference on Foundation and Large Language Models, FLLM 2025
Editors: Kai Erenli, Christian Guetl, Yaser Jararweh, Jim Jansen
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 732-737
Number of pages: 6
ISBN (Electronic): 9798331594091
Publication status: Published - 25 Nov 2025
Event: 2025 3rd International Conference on Foundation and Large Language Models, FLLM 2025 - Vienna, Austria
Duration: 25 Nov 2025 to 28 Nov 2025

Publication series

Name: 2025 3rd International Conference on Foundation and Large Language Models, FLLM 2025

Conference

Conference: 2025 3rd International Conference on Foundation and Large Language Models, FLLM 2025
Country/Territory: Austria
City: Vienna
Period: 25/11/25 to 28/11/25

Keywords

  • Context alignment
  • LLM deception
  • agentic AI misalignment
  • deception detection
  • poisonous context injection

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Software
  • Linguistics and Language
