Speaking test scores are increasingly being used to make high-stakes decisions (for employment, immigration, university admissions) about learners in many countries. Ensuring that these scores reflect a learner's skill fairly and accurately is critical. This mixed-methods study seeks to strengthen the socio-cognitive framework for test validation (Chalhoub-Deville & O'Sullivan, 2020; Taylor, 2011; Weir, 2005) and deepen our understanding of the complexities involved in deriving scores from L2 speaking tests. Adopting an interactionalist perspective, the research considers interview-format speaking tests as co-constructed events between candidate, examiner and rater. The research examines how certain elements of scoring validity (the rater characteristics of 'Agreeableness', 'Extraversion' and 'Test Experience Level') change how raters perceive or rate spoken performances and modulate their severity. Native-speaker English teachers from universities across Japan (n = 86) rated 12 video-recorded speaking test performances and afterwards completed a personality instrument. A hierarchical multiple regression showed that 'Test Experience Level' and 'Agreeableness' contributed significantly to the regression model, F(6, 79) = 3.126, p = .019, together accounting for 19% of the variation in rater severity. These predictors were negatively correlated with rater severity; higher levels predicted more lenient ratings. Trait 'Extraversion' explained a significant additional 4% of the variation, F(7, 78) = 3.426, p = .039. 'Extraversion' was positively correlated with rater severity; higher levels predicted more severe ratings. Finally, all raters provided written commentary on their rating procedures and three raters took part in stimulated recall interviews.
Thematic analyses of the two types of qualitative data suggested that lenient (experienced, agreeable, introverted) raters perceive different aspects of examiner performance from more severe (inexperienced, disagreeable, extraverted) raters, and that these perceptions sometimes affected how they cognitively approached the task of rating. In some instances, the differing perceptions and cognitive approaches may have affected raters' final proficiency scores. The findings offer suggestions for updating our understanding of the co-constructed nature of spoken interaction, as well as of the scoring validity component of the socio-cognitive framework. The study also makes practical recommendations for future rater training procedures that incorporate these findings.
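The hierarchical multiple regression reported above enters predictor blocks in steps and tests whether each step's R² change is significant. A minimal sketch of that logic, using synthetic data (the predictor and outcome variables here are illustrative stand-ins, not the thesis's dataset):

```python
import numpy as np

# Synthetic stand-ins for the study's variables (n = 86 raters)
rng = np.random.default_rng(0)
n = 86
experience = rng.normal(size=n)     # hypothetical 'Test Experience Level'
agreeableness = rng.normal(size=n)  # hypothetical 'Agreeableness'
extraversion = rng.normal(size=n)   # hypothetical 'Extraversion'
# Hypothetical severity measure: negative weights for experience and
# agreeableness (leniency), positive weight for extraversion (severity)
severity = (-0.4 * experience - 0.3 * agreeableness
            + 0.2 * extraversion + rng.normal(size=n))

def r_squared(y, *predictors):
    """Fit OLS with an intercept and return R^2."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / (((y - y.mean()) ** 2).sum())

# Step 1: enter experience and agreeableness
r2_step1 = r_squared(severity, experience, agreeableness)
# Step 2: add extraversion and compute the F-change for the added predictor
r2_step2 = r_squared(severity, experience, agreeableness, extraversion)
k_full, df_added = 3, 1
f_change = ((r2_step2 - r2_step1) / df_added) / ((1 - r2_step2) / (n - k_full - 1))
print(f"Step 1 R^2 = {r2_step1:.3f}, Step 2 R^2 = {r2_step2:.3f}, "
      f"F-change = {f_change:.2f}")
```

The F-change statistic is what the thesis reports at each step; comparing it against the F distribution with (1, n − k − 1) degrees of freedom gives the p-value for the added block.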
| Date of Award | Jul 2022 |
|---|---|
| Original language | English |
| Awarding Institution | University of Bedfordshire |
| Supervisor | Fumiyo Nakatsuhara (Supervisor) & Chihiro Inoue (Second supervisor) |
- Personality
- Testing
- Speaking
- Severity
- Raters
Examining the role of rater personality in L2 speaking tests
Roger, A. (Author). Jul 2022
Student thesis: Doctoral thesis