Is AI the future of evaluation in medical education? AI vs. human evaluation in objective structured clinical examination

dc.authoridKORKMAZ, GUNES/0000-0002-9060-5972
dc.authoridUYSAL, IBRAHIM/0000-0002-7507-3322
dc.contributor.authorTekin, Murat
dc.contributor.authorYurdal, Mustafa Onur
dc.contributor.authorToraman, Cetin
dc.contributor.authorKorkmaz, Gunes
dc.contributor.authorUysal, Ibrahim
dc.date.accessioned2025-05-29T02:57:38Z
dc.date.available2025-05-29T02:57:38Z
dc.date.issued2025
dc.departmentÇanakkale Onsekiz Mart Üniversitesi
dc.description.abstractBackground: Objective Structured Clinical Examinations (OSCEs) are widely used in medical education to assess students' clinical and professional skills. Recent advancements in artificial intelligence (AI) offer opportunities to complement human evaluations. This study aims to explore the consistency between human and AI evaluators in assessing medical students' clinical skills during OSCEs. Methods: This cross-sectional study was conducted at a state university in Turkey, focusing on pre-clinical medical students (Years 1, 2, and 3). Four clinical skills (intramuscular injection, square knot tying, basic life support, and urinary catheterization) were evaluated during OSCEs at the end of the 2023-2024 academic year. Video recordings of the students' performances were assessed by five evaluators: a real-time human assessor, two video-based expert human assessors, and two AI-based systems (ChatGPT-4o and Gemini Flash 1.5). The evaluations were based on standardized checklists validated by the university. Data were collected from 196 students, with sample sizes ranging from 43 to 58 for each skill. Consistency among evaluators was analyzed using statistical methods. Results: AI models consistently assigned higher scores than human evaluators across all skills. For intramuscular injection, the mean total score given by AI was 28.23, while human evaluators averaged 25.25. For knot tying, AI scores averaged 16.07 versus 10.44 for humans. In basic life support, AI scores were 17.05 versus 16.48 for humans. For urinary catheterization, mean scores were similar (AI: 26.68; humans: 27.02) but showed considerable variance across individual criteria. Inter-rater consistency was higher for visually observable steps, while auditory tasks led to greater discrepancies between AI and human evaluators. Conclusions: AI shows promise as a supplemental tool for OSCE evaluation, especially for visually based clinical skills. However, its reliability varies depending on the perceptual demands of the skill being assessed. The higher and more uniform scores given by AI suggest potential for standardization, yet refinement is needed for accurate assessment of skills requiring verbal communication or auditory cues.
dc.identifier.doi10.1186/s12909-025-07241-4
dc.identifier.issn1472-6920
dc.identifier.issue1
dc.identifier.pmid40312328
dc.identifier.scopus2-s2.0-105003978208
dc.identifier.scopusqualityQ1
dc.identifier.urihttps://doi.org/10.1186/s12909-025-07241-4
dc.identifier.urihttps://hdl.handle.net/20.500.12428/30114
dc.identifier.volume25
dc.identifier.wosWOS:001479968600001
dc.identifier.wosqualityQ1
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.indekslendigikaynakPubMed
dc.language.isoen
dc.publisherBMC
dc.relation.ispartofBMC Medical Education
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Academic Staff
dc.rightsinfo:eu-repo/semantics/closedAccess
dc.snmzKA_WOS_20250529
dc.subjectOSCE
dc.subjectClinical skills assessment
dc.subjectArtificial intelligence
dc.subjectMedical education
dc.subjectEvaluator consistency
dc.subjectInterrater reliability
dc.titleIs AI the future of evaluation in medical education? AI vs. human evaluation in objective structured clinical examination
dc.typeArticle
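
Note on the consistency analysis: the abstract states that consistency among evaluators was analyzed using statistical methods but does not name the specific statistic. As a rough, non-authoritative illustration of how such an inter-rater consistency check could be run for this design (n students each scored by 5 evaluators), the Python sketch below computes ICC(2,1), the Shrout & Fleiss two-way random effects, absolute-agreement, single-rater coefficient. The function name icc_2_1 and all scores are hypothetical placeholders, not data from the study.

import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1) for a (n_subjects, k_raters) score matrix (Shrout & Fleiss)."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-student means
    col_means = scores.mean(axis=0)   # per-evaluator means

    ss_total = ((scores - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()   # between-subjects
    ss_cols = n * ((col_means - grand) ** 2).sum()   # between-raters
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Two-way random effects, absolute agreement, single rater
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical total checklist scores: 50 students x 5 evaluators
    # (1 real-time human, 2 video-based humans, 2 AI models).
    true_skill = rng.normal(25, 3, size=(50, 1))
    ratings = true_skill + rng.normal(0, 2, size=(50, 5))
    print(f"ICC(2,1) = {icc_2_1(ratings):.3f}")

Other agreement statistics (e.g., Cohen's or Fleiss' kappa on individual checklist items, or Krippendorff's alpha) would be equally plausible choices; the sketch is only meant to show the shape of such an analysis.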

Files