Orthodontic Biomechanical Reasoning with Multimodal Language Models: Performance and Clinical Utility

Arisan, Arda; Genc, Celal; Duran, Gokhan Serhat

dc.authorid	0009-0004-8920-8605
dc.authorid	0000-0003-4037-9783
dc.authorid	0000-0001-6152-6178
dc.contributor.author	Arisan, Arda
dc.contributor.author	Genc, Celal
dc.contributor.author	Duran, Gokhan Serhat
dc.date.accessioned	2026-02-03T12:00:01Z
dc.date.available	2026-02-03T12:00:01Z
dc.date.issued	2025
dc.department	Çanakkale Onsekiz Mart Üniversitesi
dc.description.abstract	Background: Multimodal large language models (LLMs) are increasingly being explored as clinical support tools, yet their capacity for orthodontic biomechanical reasoning has not been systematically evaluated. This retrospective study assessed their ability to analyze treatment mechanics and explored their potential role in supporting orthodontic decision-making. Methods: Five publicly available models (GPT-o3, Claude 3.7 Sonnet, Gemini 2.5 Pro, GPT-4.0, and Grok) analyzed 56 standardized intraoral photographs illustrating a diverse range of active orthodontic force systems commonly encountered in clinical practice. Three experienced orthodontists independently scored the outputs on a 5-point scale across four domains: observation, interpretation, biomechanics, and confidence. Inter-rater agreement and consistency were assessed, and the models were compared statistically. Results: GPT-o3 achieved the highest composite score (3.34/5.00; 66.8%), significantly outperforming all other models. It was followed by Claude (57.8%), Gemini (52.6%), GPT-4.0 (48.8%), and Grok (38.8%). Inter-rater reliability among the expert evaluators was excellent, with ICC values ranging from 0.786 (Confidence Evaluation) to 0.802 (Observation). The models' self-reported confidence was poorly calibrated against expert-rated output quality. Conclusions: Multimodal LLMs show emerging potential for assisting orthodontic biomechanical assessment. With expert-guided validation, these models may contribute meaningfully to clinical decision support across the diverse biomechanical scenarios encountered in routine orthodontic care.
dc.identifier.doi	10.3390/bioengineering12111165
dc.identifier.issn	2306-5354
dc.identifier.issue	11
dc.identifier.pmid	41301121
dc.identifier.scopus	2-s2.0-105023123253
dc.identifier.scopusquality	Q3
dc.identifier.uri	https://doi.org/10.3390/bioengineering12111165
dc.identifier.uri	https://hdl.handle.net/20.500.12428/34484
dc.identifier.volume	12
dc.identifier.wos	WOS:001624013900001
dc.identifier.wosquality	Q2
dc.indekslendigikaynak	Web of Science
dc.indekslendigikaynak	Scopus
dc.indekslendigikaynak	PubMed
dc.language.iso	en
dc.publisher	MDPI
dc.relation.ispartof	Bioengineering-Basel
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights	info:eu-repo/semantics/openAccess
dc.snmz	KA_WOS_20260130
dc.subject	orthodontics
dc.subject	orthodontic biomechanics
dc.subject	multimodal large language models
dc.subject	biomechanical reasoning
dc.subject	clinical decision support
dc.subject	AI in dental engineering
dc.title	Orthodontic Biomechanical Reasoning with Multimodal Language Models: Performance and Clinical Utility
dc.type	Article

Collections

WoS Indexed Publications Collection
PubMed Indexed Publications Collection
Scopus Indexed Publications Collection