Evaluation of ChatGPT-4's performance on pediatric dentistry questions: accuracy and completeness analysis

dc.authorid0000-0001-9399-5761
dc.contributor.authorSezer, Berkant
dc.contributor.authorOkutan, Alev Eda
dc.date.accessioned2026-02-03T12:00:30Z
dc.date.available2026-02-03T12:00:30Z
dc.date.issued2025
dc.departmentÇanakkale Onsekiz Mart Üniversitesi
dc.description.abstractBackground: This study aimed to evaluate the accuracy and completeness of Chat Generative Pre-trained Transformer-4 (ChatGPT-4) responses to frequently asked questions (FAQs) posed by patients and parents, as well as curricular questions related to pediatric dentistry. Additionally, it sought to determine whether ChatGPT-4's performance varied across question topics. Methods: Responses from ChatGPT-4 to 30 FAQs from patients and parents and 30 curricular questions covering six pediatric dentistry topics (fissure sealants, fluoride, early childhood caries, oral hygiene practices, development of dentition and occlusion, and pulpal therapy) were evaluated by 30 pediatric dentists. Accuracy was rated on a five-point Likert scale and completeness on a three-point scale, capturing distinct aspects of response quality. Statistical analyses included Fisher's Exact test, the Mann-Whitney U test, the Kruskal-Wallis test, and Bonferroni-adjusted post hoc comparisons. Results: ChatGPT-4's responses demonstrated high overall accuracy across all question types. Mean accuracy scores were 4.21 ± 0.55 for FAQs and 4.16 ± 0.70 for curricular questions, indicating that responses were generally rated as good to excellent by pediatric dentists, with no statistically significant difference between the two groups (p = 0.942). Completeness scores were moderate overall, with means of 2.51 ± 0.40 (median: 3) and 2.61 ± 1.53 (median: 3) for FAQs and curricular questions, respectively (p = 0.563), reflecting generally acceptable response coverage. Accuracy scores for curricular questions varied significantly by topic (p = 0.007), with the highest score for fissure sealants (4.45 ± 0.62; median: 5) and the lowest for pulpal therapy (3.93 ± 0.93; median: 4). Conclusion: From a clinical perspective, ChatGPT-4 demonstrates promising accuracy and acceptable completeness in pediatric dental communication. However, its performance in certain curricular areas, particularly fluoride and pulpal therapy, warrants cautious interpretation and requires professional oversight.
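
The abstract names the study's statistical toolkit (Mann-Whitney U for the two question groups, Kruskal-Wallis across the six topics, Bonferroni-adjusted post hoc comparisons). Purely as an illustrative sketch, not the authors' analysis code, the following Python fragment shows how such comparisons could be run with SciPy; all scores, sample sizes, and variable names below are placeholders invented for the example.

    # Illustrative sketch of the abstract's comparisons using SciPy.
    # All data here are placeholders, not the study's ratings.
    from itertools import combinations
    from scipy.stats import mannwhitneyu, kruskal

    # Placeholder accuracy ratings (1-5 Likert) from reviewers.
    faq_accuracy = [4, 5, 4, 3, 5, 4, 4, 5]
    curricular_accuracy = [4, 4, 5, 3, 4, 5, 4, 3]

    # FAQ vs. curricular accuracy: Mann-Whitney U (two independent samples).
    u_stat, p_groups = mannwhitneyu(faq_accuracy, curricular_accuracy,
                                    alternative="two-sided")
    print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_groups:.3f}")

    # Accuracy across the six curricular topics: Kruskal-Wallis omnibus test.
    topic_scores = {
        "fissure sealants": [5, 4, 5, 4],
        "fluoride": [4, 3, 4, 4],
        "early childhood caries": [4, 4, 5, 4],
        "oral hygiene practices": [4, 5, 4, 4],
        "dentition and occlusion": [4, 4, 4, 5],
        "pulpal therapy": [4, 3, 4, 3],
    }
    h_stat, p_topics = kruskal(*topic_scores.values())
    print(f"Kruskal-Wallis H = {h_stat:.1f}, p = {p_topics:.3f}")

    # Bonferroni-adjusted pairwise follow-ups: with k = 6 topics there are
    # 15 pairwise comparisons, so each p-value is tested against alpha / 15.
    alpha = 0.05
    pairs = list(combinations(topic_scores, 2))
    for a, b in pairs:
        _, p = mannwhitneyu(topic_scores[a], topic_scores[b],
                            alternative="two-sided")
        if p < alpha / len(pairs):
            print(f"{a} vs {b}: p = {p:.4f} (significant after Bonferroni)")

The nonparametric tests match the ordinal rating scales described in the abstract; the Bonferroni division by the number of pairwise comparisons is the standard adjustment implied by "Bonferroni-adjusted post hoc comparisons."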
dc.identifier.doi10.1186/s12903-025-06791-9
dc.identifier.issn1472-6831
dc.identifier.issue1
dc.identifier.pmid40993703
dc.identifier.scopus2-s2.0-105016909321
dc.identifier.scopusqualityQ2
dc.identifier.urihttps://doi.org/10.1186/s12903-025-06791-9
dc.identifier.urihttps://hdl.handle.net/20.500.12428/34629
dc.identifier.volume25
dc.identifier.wosWOS:001580458000007
dc.identifier.wosqualityQ1
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.indekslendigikaynakPubMed
dc.language.isoen
dc.publisherBMC
dc.relation.ispartofBMC Oral Health
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_WOS_20260130
dc.subjectArtificial intelligence
dc.subjectGenerative artificial intelligence
dc.subjectLarge language models
dc.subjectChatbot
dc.subjectFissure sealant
dc.subjectFluoride
dc.subjectDental pulp
dc.subjectDental caries
dc.subjectOral hygiene
dc.subjectDental occlusion
dc.titleEvaluation of ChatGPT-4's performance on pediatric dentistry questions: accuracy and completeness analysis
dc.typeArticle

Files