Abstract

Background: Although the proportion of female medical trainees has grown in recent years, disparities related to sex and gender persist, especially among students and physicians in surgical specialties. With clinical evaluations increasingly influencing residency selection, understanding potential gender bias is essential to fostering equity in ophthalmology training.

Purpose: To investigate potential gender bias in ophthalmology clerkship evaluations by examining both quantitative scores and narrative feedback for second- and fourth-year medical students, with attention to the influence of student and evaluator gender.

Methods: This retrospective mixed-methods study analyzed 643 ophthalmology clerkship evaluations from the Duke University School of Medicine. The data included two quantitative scores, house staff potential (HSP) and clerkship grade (CG), along with narrative evaluations. The MS4 dataset (n = 510) was used for quantitative analysis of HSP and CG scores. Both the MS2 (n = 133) and MS4 evaluation datasets were included in linguistic analyses. Quantitative outcomes were evaluated using Wilcoxon rank-sum tests and ordinal logistic regression models. Narrative evaluations were analyzed using word embedding models with statistical testing to identify gender-based differences in usage of specific words.

Results: No significant differences in HSP or CG scores were observed by student gender, evaluator gender, or their interaction (all p > 0.05). Narrative evaluations of female students had a higher average word count than those of male students; this difference was statistically significant in the MS4 dataset (p = 0.03) but not in the MS2 dataset. Descriptive word analysis revealed subtle differences in word choice: female students were more often described as “engaged” and “interested,” whereas male students were more frequently labeled “independent,” with additional significant differences in descriptors between cohorts. Evaluator gender did not affect quantitative or narrative outcomes.

Conclusion: No gender disparities were found in quantitative analyses of evaluation scores. However, subtle but significant differences in word choice by gender in both student cohorts suggest that implicit gender bias may play a role in the medical education system. Such linguistic variations may result in differences in perceptions of student performance. Further structuring narrative evaluations and raising awareness of implicit bias may enhance equity in clinical assessments within ophthalmology training.

Received Date

06/05/2025

Revised Date

18/07/2025

Accepted Date

06/08/2025
