
AQUACOLD: Aggregated Query Understanding and Construction Over Linked Data

  • Nicholas Collis

Student thesis: Doctoral thesis

Abstract

Question Answering (QA) systems provide direct answers to natural language (NL) questions posed by humans. Linked data (LD) provides an ideal knowledge base for answering complex questions. The framework expresses the structure of, and relationships between, data, which assists in parsing the question, and the open 'web of data' or knowledge graph formed by interlinking LD nodes provides a vast and varied domain of knowledge to search over. Despite this, recent attempts at NL QA over LD struggle with complex questions because of the difficulty of automatically parsing natural language into a structured LD query language such as SPARQL. This forces end users to learn these languages, which can be challenging without a technical background. There is a need for a system that returns accurate answers to complex natural language questions over linked data, improving the accessibility of linked data search by abstracting away the complexity of SPARQL whilst retaining its expressivity. This thesis presents AQUACOLD (Aggregated Query Understanding And Construction Over Linked Data), a novel LD QA system that harnesses crowdsourcing to meet this need. Rather than relying on an algorithmic solution, AquaCold answers questions using query templates built by system users, and as such can handle queries of significant complexity. AquaCold's effectiveness as an NL LD QA system was evaluated using the standard IR metrics of precision, recall and F-score on the QALD-9 question set, a benchmark used by many comparable NL QA systems. Thirty participants took part in the study, attempting to answer a subset of QALD-9 questions using AquaCold. The results were analysed and compared against published results for similar NL LD QA systems, both for the AquaCold system overall and along two further dimensions: user IT skill, to evaluate its utility for non-technical users specifically, and the system's different crowdsourced components, to evaluate the utility of each.
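To make the evaluation metrics concrete, the sketch below computes per-question precision, recall and F-score from a set of gold answers and a set of system answers, then macro-averages them over questions. It is a minimal sketch under stated assumptions: the function names `prf` and `macro_average` are hypothetical, and it does not reproduce the official QALD-9 scorer's conventions. In particular, treating an empty answer to a question with no gold answers as fully correct is our own assumption.

```python
from __future__ import annotations

from typing import Iterable


def prf(gold: set[str], system: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F-score for one question, comparing answer sets."""
    if not gold and not system:
        # Assumption: returning no answers for a question with no gold answers counts as correct.
        return 1.0, 1.0, 1.0
    correct = len(gold & system)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score as the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


def macro_average(pairs: Iterable[tuple[set[str], set[str]]]) -> tuple[float, float, float]:
    """Average per-question precision, recall and F-score over all questions."""
    scores = [prf(gold, system) for gold, system in pairs]
    if not scores:
        raise ValueError("at least one question is required")
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


if __name__ == "__main__":
    # Question 1: one of two gold answers found, plus one wrong answer.
    # Question 2: the single gold answer found exactly.
    questions = [({"a", "b"}, {"a", "c"}), ({"x"}, {"x"})]
    print(macro_average(questions))  # (0.75, 0.75, 0.75)
```

Macro-averaging weights every question equally, so a system that answers many simple questions perfectly but fails on complex ones is not hidden behind a large answer count.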
AquaCold performed strongly in the QALD-9 benchmark study, recording higher F-score and query coverage results than comparable systems. Non-technical users achieved better scores when all or part of a question could be answered using a query template, but worse scores when no template was available and answers had to be obtained using the query builder component instead. This indicates a viable workflow in which technically skilled users create templates that less technically able users can then use to answer questions.
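The abstract does not specify how AquaCold represents a query template. Purely as a hypothetical illustration of the general idea, the sketch below pairs a natural language pattern containing one slot with a SPARQL skeleton, fills the slot with a DBpedia resource URI, and returns `None` when no template matches, the situation in which users would turn to a query builder. The question pattern, the illustrative `dbo:mayor` property, the naive label-to-URI mapping and the names `build_query` and `to_resource_uri` are all assumptions, not AquaCold's actual design.

```python
import re
from string import Template
from typing import Optional

# Hypothetical template: an NL question pattern with one named slot ("entity")...
QUESTION_PATTERN = re.compile(r"^who is the mayor of (?P<entity>.+?)\?$", re.IGNORECASE)

# ...paired with a SPARQL skeleton; $entity is replaced by a resource URI.
# The dbo:mayor property is an illustrative assumption.
SPARQL_TEMPLATE = Template(
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "SELECT ?answer WHERE { <$entity> dbo:mayor ?answer . }"
)


def to_resource_uri(label: str) -> str:
    """Naive assumption: a DBpedia resource name is its label with spaces as underscores."""
    return "http://dbpedia.org/resource/" + label.strip().replace(" ", "_")


def build_query(question: str) -> Optional[str]:
    """Fill the template if the question matches it; otherwise return None."""
    match = QUESTION_PATTERN.match(question.strip())
    if match is None:
        # No template covers this question; a query builder would be needed instead.
        return None
    return SPARQL_TEMPLATE.substitute(entity=to_resource_uri(match.group("entity")))


if __name__ == "__main__":
    print(build_query("Who is the mayor of Berlin?"))
    print(build_query("How tall is Mount Everest?"))  # None
```

In this sketch a question can be answered by template only when some stored pattern matches it; the SPARQL itself is written once by the template author and reused by anyone whose question fits the pattern.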
Date of Award: Aug 2021
Original language: English
Awarding Institution: University of Bedfordshire
Supervisor: Ingo Frommholz (Supervisor) & Hong Qing Yu (Second supervisor)

Keywords

  • Linked Data
  • Question Answering
  • SPARQL
  • Semantic Web
  • Natural Language Processing
  • Complex Questions
  • Crowdsourcing
  • Subject Categories: G560 Data Management
