Performance of Advanced Artificial Intelligence Models in Traumatic Dental Injuries in Primary Dentition: A Comparative Evaluation of ChatGPT-4 Omni, DeepSeek, Gemini Advanced, and Claude 3.7 in Terms of Accuracy, Completeness, Response Time, and Readability

Date

2025

Publisher

MDPI

Access Rights

info:eu-repo/semantics/openAccess

Abstract

This study aimed to evaluate and compare the performance of four advanced artificial intelligence-powered chatbots, ChatGPT-4 Omni (ChatGPT-4o), DeepSeek, Gemini Advanced, and Claude 3.7 Sonnet, in responding to questions related to traumatic dental injuries (TDIs) in the primary dentition. The assessment focused on accuracy, completeness, readability, and response time, in line with the 2020 International Association of Dental Traumatology guidelines. Twenty-five open-ended TDI questions were submitted to each model in two separate sessions. Responses were anonymized and evaluated by four pediatric dentists. Accuracy and completeness were rated using Likert scales; readability was assessed using five standard indices; and response times were recorded in seconds. ChatGPT-4o demonstrated significantly higher accuracy than Gemini Advanced (p = 0.005), while DeepSeek outperformed Gemini Advanced in completeness (p = 0.010). Response times differed significantly (p < 0.001), with DeepSeek being the slowest and ChatGPT-4o and Gemini Advanced the fastest. DeepSeek produced the most readable outputs of the four models, though none met public readability standards. Claude 3.7 generated the most complex texts (p < 0.001). A strong correlation existed between accuracy and completeness (rho = 0.701, p < 0.001). These findings underscore the need for caution when integrating artificial intelligence chatbots into pediatric dental care, given their varied performance. Clinical accuracy, completeness, and readability are critical when offering guideline-aligned information to support decisions in dental trauma management.

Keywords

artificial intelligence, chatbot, dental trauma, traumatic dental injuries, primary dentition, large language models, ChatGPT, DeepSeek, Claude, Gemini, accuracy, completeness

Source

Applied Sciences-Basel

WoS Quartile

Q2

Scopus Quartile

Q1

Volume

15

Issue

14

Citation