TY - GEN
T1 - AI-enhanced landmark recognition for self-guided tour application using large language models
AU - Karmaker, Pronab
AU - Korre, Danai
AU - Rehman, Muhammad Habib Ur
AU - Khodadadzadeh, Massoud
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/9/21
Y1 - 2025/9/21
N2 - Artificial intelligence (AI), particularly Large Language Models (LLMs), has created opportunities to improve user experiences by enabling the development of more interactive applications in various implementation scenarios. This paper proposes a mobile application that serves as a virtual self-guided tour, enabling landmark recognition and enhanced user interaction through LLMs. A landmark classifier is employed for cloud-based image classification, with accuracy further improved by incorporating GPS-based matching of classification results. Preliminary tests showed that GPS-based location matching improved the results; for example, recognition accuracy for the London Eye increased from 82 to 88 percent. Subsequently, users are provided with audio information about the identified landmark and access to extended landmark details generated by the LLM. Users can also engage in text- or voice-based interactions with the system. The system architecture integrates real-time image processing, location optimisation, and generative AI, creating interactive and engaging user interfaces.
AB - Artificial intelligence (AI), particularly Large Language Models (LLMs), has created opportunities to improve user experiences by enabling the development of more interactive applications in various implementation scenarios. This paper proposes a mobile application that serves as a virtual self-guided tour, enabling landmark recognition and enhanced user interaction through LLMs. A landmark classifier is employed for cloud-based image classification, with accuracy further improved by incorporating GPS-based matching of classification results. Preliminary tests showed that GPS-based location matching improved the results; for example, recognition accuracy for the London Eye increased from 82 to 88 percent. Subsequently, users are provided with audio information about the identified landmark and access to extended landmark details generated by the LLM. Users can also engage in text- or voice-based interactions with the system. The system architecture integrates real-time image processing, location optimisation, and generative AI, creating interactive and engaging user interfaces.
UR - https://www.scopus.com/pages/publications/105031906544
U2 - 10.1145/3737821.3748524
DO - 10.1145/3737821.3748524
M3 - Conference contribution
AN - SCOPUS:105031906544
T3 - MobileHCI 2025 - Adjunct Proceedings of the 2025 Conference on Mobile Human-Computer Interaction
BT - MobileHCI '25 Adjunct: Adjunct Proceedings of the 27th International Conference on Mobile Human-Computer Interaction
A2 - Abdelrahman, Yomna
A2 - Elagroudy, Passant
A2 - Alt, Florian
PB - Association for Computing Machinery
T2 - 27th International Conference on Mobile Human-Computer Interaction, MobileHCI 2025
Y2 - 22 September 2025 through 25 September 2025
ER -