Consistency and Reproducibility of Grades in Higher Education: A Case Study in Deep Learning
Evaluating the performance of students in higher education is essential for gauging the effectiveness of teaching methods and for achieving greater equality of opportunity for all. In this study, we investigate the correlation between two teachers' grading practices in a master's-level deep learning course offered at CentraleSupélec. The two teachers, who have distinct teaching styles, were responsible for marking the oral presentation of the final project. Our results indicate a significant positive correlation (0.76) between the grades assigned by the two teachers, suggesting that their assessments of students' performance are consistent. Although consistent with each other, the grades do not seem to be fully reproducible from one examiner to the other, which suggests serious drawbacks to relying on a single examiner for oral projects. Furthermore, the maximum difference between the grades assigned by the two examiners was 12.5, which illustrates the impact of inter-examiner variability on students' final grades.
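The two statistics quoted in the abstract, a correlation between the examiners' grades and the largest gap between them, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which correlation coefficient was used, so Pearson's r is assumed here, and the grade lists are hypothetical values invented for the example, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length grade lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 grades")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and the two root sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined when one examiner gives constant grades")
    return cov / (dev_x * dev_y)


def max_abs_difference(x, y):
    """Largest absolute gap between the two examiners' grades for any student."""
    return max(abs(a - b) for a, b in zip(x, y))


# Hypothetical grades for five students (assumption: NOT the study's data).
examiner_a = [12.0, 15.0, 9.0, 17.0, 14.0]
examiner_b = [13.0, 14.0, 11.0, 16.0, 10.0]

r = pearson(examiner_a, examiner_b)
gap = max_abs_difference(examiner_a, examiner_b)
print(f"correlation: {r:.2f}, maximum difference: {gap}")
```

A high correlation and a large maximum gap can coexist: the correlation summarizes how well the two examiners rank students overall, while the maximum difference captures the worst case for an individual student.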