
Yazar "Yildiz, Cemil" seçeneğine göre listele

Now showing 1 - 1 of 1
  • Item
    Evaluating Large Language Model Adherence to AAOS Knee Osteoarthritis Guidelines: A Comparative Study of ChatGPT and NotebookLM
    (Springer Heidelberg, 2025) Yildiz, Cemil; Gokmen, Mehmet Yigit; Utlu, Cetin; Karabay, Elif Sude; Tanik, Ugur; Pazarci, Ozhan
    Purpose: This study evaluated how closely large language models (LLMs), specifically ChatGPT (OpenAI) and NotebookLM (Google), adhere to orthopedic guidelines. The objective was to determine whether AI-generated reasoning aligns with the 2021-2022 American Academy of Orthopaedic Surgeons (AAOS) clinical practice guidelines for knee osteoarthritis (OA).
    Methods: A mixed-methods design combined quantitative concordance scoring with qualitative content analysis. Thirty-three decision points covering non-arthroplasty and surgical management were derived from the AAOS guidelines. Structured Population-Intervention-Comparison-Outcome (PICO) prompts were presented to each model. Two orthopedic surgeons independently rated all outputs using a four-domain rubric assessing accuracy, evidence reasoning, additional information, and knowledge integration (0-4 scale). Concordance was classified as full (4), partial (3), or discordant (≤ 2), with disagreements resolved through consensus. Inter-rater reliability was almost perfect (weighted kappa = 0.87).
    Results: ChatGPT achieved a mean composite score of 3.67 ± 0.92 and NotebookLM 3.55 ± 0.87, with no significant difference between models (p = 0.18). Full concordance with AAOS recommendations occurred in 84.8% of ChatGPT responses and 75.8% of NotebookLM responses. Both models performed consistently in high-evidence domains such as NSAID therapy, tranexamic acid use, and weight-loss counseling. Variability increased in limited-evidence or technology-driven areas. Partial concordance reflected the omission of evidence qualifiers, while discordant responses involved overstated or speculative interpretations.
    Conclusion: Both LLMs demonstrated strong alignment with evidence-based orthopedic reasoning. ChatGPT showed slightly higher fidelity to recommendation strength, whereas NotebookLM provided broader contextual interpretation. Structured, guideline-oriented prompting may enhance AI reasoning consistency and support the potential role of LLMs as complementary tools for evidence translation and orthopedic education.
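
The methods described above amount to a simple scoring pipeline: each model response receives a 0-4 rubric score from two raters, scores map to concordance classes (full = 4, partial = 3, discordant ≤ 2), and inter-rater agreement is summarized with a weighted kappa. The Python sketch below illustrates that pipeline under stated assumptions: the rater scores are invented placeholders rather than study data, it uses scikit-learn's cohen_kappa_score, and quadratic weighting is an assumption, since the abstract does not say which weighting scheme the authors applied.

  # A minimal sketch, not the authors' code. Rater scores below are
  # invented placeholders illustrating the 0-4 rubric from the abstract.
  from sklearn.metrics import cohen_kappa_score

  def concordance_class(score: int) -> str:
      # Map a 0-4 rubric score to the study's concordance classes.
      if score == 4:
          return "full"
      if score == 3:
          return "partial"
      return "discordant"  # scores <= 2

  # Hypothetical ratings for a few of the 33 AAOS decision points.
  rater_a = [4, 4, 3, 4, 2, 4, 3, 4]
  rater_b = [4, 3, 3, 4, 2, 4, 4, 4]

  # Quadratic weighting penalizes larger disagreements more heavily;
  # the study reports an almost perfect weighted kappa of 0.87.
  kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
  print(f"weighted kappa = {kappa:.2f}")
  print([concordance_class(s) for s in rater_a])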
