Detection of LLM deceptive behaviour triggered by the poisonous context injection: the problem demonstration
- Stanislav Selitskiy
Abstract
This paper presents a focused demonstration of deceptive behaviour in Large Language Models (LLMs) arising under poisonous context injection. The case study is built around a Japanese haiku, selected for its inherent ambiguity, which serves as a probe of how well an LLM aligns with the human model of the real world. When presented with a poisonous context, ChatGPT generated a translation, interpretation, and literary criticism that were not only incorrect but also internally inconsistent. This experiment highlights a fundamental risk: LLMs can produce outputs that are both linguistically convincing and semantically deceptive. The novelty of this work lies in framing LLM deception as a measurable phenomenon and in articulating the feasibility of automated detection through cross-verification with independent models. The work establishes the problem space by demonstrating how subtle poisoning can systematically induce deceptive generations. By formalising the problem and identifying a methodological direction, this study positions itself as an initial step in an ongoing research program on trustworthy and self-aware AI. Proof-of-concept experiments showed that a committee of five major LLMs rates the trustworthiness of the poisoned-context haiku interpretations at 0.57±0.33, whereas interpretations of the non-poisoned haiku are rated at 0.86±0.15.
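The committee-based cross-verification mentioned in the abstract can be pictured with a minimal sketch. It assumes each independent model returns a trustworthiness score in [0, 1] and that the reported "mean ± spread" is a mean and sample standard deviation; the paper's actual aggregation procedure is not given here. The function names, the 0.7 threshold, and the example scores are hypothetical placeholders, not values from the paper.

```python
from statistics import mean, stdev


def committee_trust(scores):
    """Aggregate per-model trust scores into (mean, sample standard deviation).

    Assumption: every score lies in [0, 1]; at least two scores are needed
    for a sample standard deviation.
    """
    if len(scores) < 2:
        raise ValueError("need at least two committee scores")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    return mean(scores), stdev(scores)


def flag_suspect(scores, threshold=0.7):
    """Flag an output as suspect when the committee mean falls below a threshold.

    The threshold is a hypothetical choice for illustration only.
    """
    committee_mean, _ = committee_trust(scores)
    return committee_mean < threshold


if __name__ == "__main__":
    # Made-up scores from five hypothetical reviewer models.
    example_scores = [0.9, 0.4, 0.2, 0.8, 0.5]
    m, s = committee_trust(example_scores)
    print(f"trust = {m:.2f} ± {s:.2f}, suspect = {flag_suspect(example_scores)}")
```

A wide spread across the committee is itself informative: strong disagreement between independent models may indicate that the judged text is ambiguous or that some reviewers were misled.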
Publication Information
Output type
Original language
English
Pages from-to (Number of pages)
Pages 732-737 (6 pages)
Publication milestones
- Published - 25/11/2025
Publication status
Publisher
Institute of Electrical and Electronics Engineers Inc., United States
Publication series
- Publication series name: 2025 3rd International Conference on Foundation and Large Language Models, FLLM 2025
ISBN (Electronic)
9798331594091
External Publication IDs
- Scopus: 105035890254
Host publication title
2025 3rd International Conference on Foundation and Large Language Models, FLLM 2025
Host publication editors
- Kai Erenli
- Christian Guetl
- Yaser Jararweh
- Jim Jansen
