Evaluation of ChatGPT-4's performance on pediatric dentistry questions: accuracy and completeness analysis

dc.authorid0000-0001-9399-5761
dc.contributor.authorSezer, Berkant
dc.contributor.authorOkutan, Alev Eda
dc.date.accessioned2026-02-03T12:00:30Z
dc.date.available2026-02-03T12:00:30Z
dc.date.issued2025
dc.departmentÇanakkale Onsekiz Mart Üniversitesi
dc.description.abstractBackground: This study aimed to evaluate the accuracy and completeness of Chat Generative Pre-trained Transformer-4 (ChatGPT-4) responses to frequently asked questions (FAQs) posed by patients and parents, as well as curricular questions related to pediatric dentistry. Additionally, it sought to determine whether ChatGPT-4's performance varied across question topics. Methods: Responses from ChatGPT-4 to 30 FAQs from patients and parents and 30 curricular questions covering six pediatric dentistry topics (fissure sealants, fluoride, early childhood caries, oral hygiene practices, development of dentition and occlusion, and pulpal therapy) were evaluated by 30 pediatric dentists. Accuracy was rated on a five-point Likert scale and completeness on a three-point scale, capturing distinct aspects of response quality. Statistical analyses included Fisher's Exact test, the Mann-Whitney U test, the Kruskal-Wallis test, and Bonferroni-adjusted post hoc comparisons. Results: ChatGPT-4's responses demonstrated high overall accuracy across all question types. Mean accuracy scores were 4.21 ± 0.55 for FAQs and 4.16 ± 0.70 for curricular questions, indicating that responses were generally rated as good to excellent by pediatric dentists, with no statistically significant difference between the two groups (p = 0.942). Completeness scores were moderate overall, with means of 2.51 ± 0.40 (median: 3) and 2.61 ± 1.53 (median: 3) for FAQs and curricular questions, respectively (p = 0.563), reflecting generally acceptable response coverage. Accuracy scores for curricular questions varied significantly by topic (p = 0.007), with the highest score for fissure sealants (4.45 ± 0.62; median: 5) and the lowest for pulpal therapy (3.93 ± 0.93; median: 4). Conclusion: From a clinical perspective, ChatGPT-4 demonstrates promising accuracy and acceptable completeness in pediatric dental communication. However, its performance in certain curricular areas, particularly fluoride and pulpal therapy, warrants cautious interpretation and requires professional oversight.
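
The abstract names the study's statistical toolkit (Mann-Whitney U for the two question groups, Kruskal-Wallis across the six topics, Bonferroni-adjusted post hoc comparisons). Purely as an illustrative sketch, not the authors' analysis code, the following Python fragment shows how such comparisons could be run with SciPy; all scores, sample sizes, and variable names below are placeholders invented for the example.

    # Illustrative sketch of the abstract's comparisons using SciPy.
    # All data here are placeholders, not the study's ratings.
    from itertools import combinations
    from scipy.stats import mannwhitneyu, kruskal

    # Placeholder accuracy ratings (1-5 Likert) from reviewers.
    faq_accuracy = [4, 5, 4, 3, 5, 4, 4, 5]
    curricular_accuracy = [4, 4, 5, 3, 4, 5, 4, 3]

    # FAQ vs. curricular accuracy: Mann-Whitney U (two independent samples).
    u_stat, p_groups = mannwhitneyu(faq_accuracy, curricular_accuracy,
                                    alternative="two-sided")
    print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_groups:.3f}")

    # Accuracy across the six curricular topics: Kruskal-Wallis omnibus test.
    topic_scores = {
        "fissure sealants": [5, 4, 5, 4],
        "fluoride": [4, 3, 4, 4],
        "early childhood caries": [4, 4, 5, 4],
        "oral hygiene practices": [4, 5, 4, 4],
        "dentition and occlusion": [4, 4, 4, 5],
        "pulpal therapy": [4, 3, 4, 3],
    }
    h_stat, p_topics = kruskal(*topic_scores.values())
    print(f"Kruskal-Wallis H = {h_stat:.1f}, p = {p_topics:.3f}")

    # Bonferroni-adjusted pairwise follow-ups: with k = 6 topics there are
    # 15 pairwise comparisons, so each p-value is tested against alpha / 15.
    alpha = 0.05
    pairs = list(combinations(topic_scores, 2))
    for a, b in pairs:
        _, p = mannwhitneyu(topic_scores[a], topic_scores[b],
                            alternative="two-sided")
        if p < alpha / len(pairs):
            print(f"{a} vs {b}: p = {p:.4f} (significant after Bonferroni)")

The nonparametric tests match the ordinal rating scales described in the abstract; the Bonferroni division by the number of pairwise comparisons is the standard adjustment implied by "Bonferroni-adjusted post hoc comparisons."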
dc.identifier.doi10.1186/s12903-025-06791-9
dc.identifier.issn1472-6831
dc.identifier.issue1
dc.identifier.pmid40993703
dc.identifier.scopus2-s2.0-105016909321
dc.identifier.scopusqualityQ2
dc.identifier.urihttps://doi.org/10.1186/s12903-025-06791-9
dc.identifier.urihttps://hdl.handle.net/20.500.12428/34629
dc.identifier.volume25
dc.identifier.wosWOS:001580458000007
dc.identifier.wosqualityQ1
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.indekslendigikaynakPubMed
dc.language.isoen
dc.publisherBMC
dc.relation.ispartofBMC Oral Health
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_WOS_20260130
dc.subjectArtificial intelligence
dc.subjectGenerative artificial intelligence
dc.subjectLarge language models
dc.subjectChatbot
dc.subjectFissure sealant
dc.subjectFluoride
dc.subjectDental pulp
dc.subjectDental caries
dc.subjectOral hygiene
dc.subjectDental occlusion
dc.titleEvaluation of ChatGPT-4's performance on pediatric dentistry questions: accuracy and completeness analysis
dc.typeArticle

Files