
Human vs. generative artificial intelligence in writing assessment: investigating feedback alignment, score validity, and teacher agency

  • Tanzeela Anbreen*
  • Tuba Özturan*
  • Prithvi Shrestha*
  • Ammara Maqsood
  • *Corresponding author for this work
  • Erzincan University
  • Open University Milton Keynes
  • Minhaj University Lahore

Research output: Contribution to journal › Article › peer-review

Abstract

Recent advances in technology, such as the emergence of generative artificial intelligence (GenAI) tools, call for careful integration into education. In particular, comparing the feedback and scores produced by human raters and GenAI tools is crucial for assessing feedback alignment and score validity in L2 writing assessment. L2 writing teachers’ agency in collaborating with these tools is a further notable area of research. This mixed-methods study therefore addresses three research questions: the alignment of GenAI and human scores and feedback on the same writing task responses; the justifications given for scoring and feedback; and teachers’ agency in negotiating their roles in GenAI-supported assessment contexts. To that end, fifty essays (written in response to a retired IELTS Academic Writing Task 2 prompt) were rated by a human rater and ChatGPT-5 using the IELTS Task 2 criteria. The results showed a strong correlation between human and ChatGPT-5 scores, confirming scoring validity. The human rater was then asked, and ChatGPT-5 was prompted, to justify their scoring decisions; the justifications revealed a contrast between the two. These findings were interpreted following Kane’s argument-based approach to validity. Lastly, the thematic analysis of the semi-structured interview exploring teachers’ agency in GenAI-mediated writing assessment was in accord with Priestley’s ecological model of agency. Overall, the findings point to the need for a hybrid model, since blending GenAI-led surface-level evaluation with human-led cognitive, critical, and contextual evaluation is essential for comprehensive and valid writing assessment.

Original language: English
Journal: Educational Assessment, Evaluation and Accountability
Publication status: Published - 31 Mar 2026

Keywords

  • Feedback alignment
  • Generative artificial intelligence
  • L2 writing
  • Scoring validity
  • Teacher agency

ASJC Scopus subject areas

  • Education
  • Organizational Behavior and Human Resource Management
