Teachers more likely to accept low AI grades than equivalent human grades, study finds
Gaby Clark, Scientific Editor
Andrew Zinin, Lead Editor
Teachers were more likely to accept an overly harsh grade given to a student when it came from AI than when the same unduly low grade came from a human. As AI is increasingly integrated into decision-making, concerns about AI errors are often countered by assurances that humans will oversee and check the algorithm's work.
Publishing in PNAS Nexus, Rigissa Megalokonomou and colleagues investigated patterns in human oversight of AI, using teachers grading student work as a model. More than 1,300 working teachers in Greece were asked to check identical student responses to open-ended questions, each accompanied by a score labeled as either AI-generated or assigned by a human colleague.
The authors worked with three educators, a psychologist, and a communication specialist to create realistic grading scenarios and plausible grading errors. Some of the presented scores were too harsh; others were too generous. The teachers were then asked to score the student response themselves.
In general, teachers were highly influenced by the suggested scores. When they did deviate from the suggested scores, they tended to correct overly lenient scores to the same extent whether the suggestion came from a human or an algorithm. However, they tended to correct overly harsh scores less often when the suggested score was said to come from AI than when it came from a human colleague. The resulting gap between the score the teachers gave and a fair score was 22% larger when the score was labeled as AI-generated than when it was labeled as human-generated.
Responses to survey questions suggest that teachers are more likely to accept a strict AI grade when they view the system as competent and accountable than when they see it as incompetent and unaccountable. According to the authors, the findings indicate that humans may be limited in their ability to effectively check the work of AI decision-makers.
Publication details
Why do experts miss AI's errors? Evidence from a randomized labeling experiment, PNAS Nexus (2026). doi.org/10.1093/pnasnexus/pgag146
Journal information: PNAS Nexus
Provided by PNAS Nexus