Diabetes Risk Prediction with Machine Learning Models

dc.contributor.authorÖzsezer, Gözde
dc.contributor.authorMermer, Gülengül
dc.date.accessioned2025-05-29T05:39:13Z
dc.date.available2025-05-29T05:39:13Z
dc.date.issued2022
dc.departmentÇanakkale Onsekiz Mart Üniversitesi
dc.description.abstractDiabetes mellitus (DM) is one of the most common chronic diseases worldwide and a major public health problem. The aim of this study is to predict DM risk with machine learning (ML) models using available data. This analytical study used the “Diabetes Health Indicators Dataset”, which consists of 253,680 records and 21 variables collected annually by the CDC. The open-access dataset was retrieved from Kaggle on March 5, 2022. Data analysis was performed in the Python 3.0 programming language using the numpy, pandas, matplotlib, seaborn, scikit-learn, and imblearn libraries. During data pre-processing, outliers and missing data were removed. Five ML algorithms were used for predictive modeling: KNN, logistic regression, decision tree, random forest, and naive Bayes. The predictive performance of the algorithms was evaluated with accuracy, precision, recall, and F1 score. No permission was required because the data were open access. KNN achieved an accuracy of 0.74, precision 0.31, recall 0.55, and F1 score 0.39; logistic regression, accuracy 0.72, precision 0.33, recall 0.74, and F1 score 0.46; decision tree, accuracy 0.84, precision 0.54, recall 0.15, and F1 score 0.24; random forest, accuracy 0.84, precision 0.56, recall 0.16, and F1 score 0.25; naive Bayes, accuracy 0.84, precision 0.52, recall 0.19, and F1 score 0.28. In this study, ML algorithms were used for DM risk estimation. According to the experimental results, when the dataset is randomly split into training (80%) and testing (20%) sets, the accuracy values of the random forest and decision tree algorithms are very close to each other (RF: 0.848, DT: 0.847). Therefore, judged by accuracy, random forest and decision tree can be said to be the two best algorithms for diabetes risk estimation.
dc.identifier.issn2757-9778
dc.identifier.issue2
dc.identifier.startpage1-9
dc.identifier.urihttps://hdl.handle.net/20.500.12428/32211
dc.identifier.volume2
dc.language.isoen
dc.relation.ispartofArtificial Intelligence Theory and Applications
dc.relation.publicationcategoryArticle - National Refereed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_DergiPark_20250529
dc.subjectdiabetes
dc.subjectrisk
dc.subjectprediction
dc.subjectmachine learning
dc.subjectartificial intelligence
dc.titleDiabetes Risk Prediction with Machine Learning Models
dc.typeResearch Article
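
The abstract reports precision, recall, and F1 score for each model. Since F1 is the harmonic mean of precision and recall, the reported triples can be cross-checked with a few lines of Python. The following is a minimal sketch, not the authors' code: the names `f1_from_precision_recall`, `REPORTED`, and `recompute_all` are illustrative assumptions, and the only data used are the rounded figures quoted in the abstract.

```python
# Illustrative sketch: recompute F1 from the precision and recall values
# quoted in the abstract. Function and variable names are assumptions,
# not taken from the study.

def f1_from_precision_recall(precision, recall):
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) exactly as rounded in the abstract.
REPORTED = {
    "KNN": (0.31, 0.55, 0.39),
    "Logistic regression": (0.33, 0.74, 0.46),
    "Decision tree": (0.54, 0.15, 0.24),
    "Random forest": (0.56, 0.16, 0.25),
    "Naive Bayes": (0.52, 0.19, 0.28),
}


def recompute_all(reported):
    """Map each model name to the F1 recomputed from its rounded inputs."""
    return {
        name: round(f1_from_precision_recall(p, r), 3)
        for name, (p, r, _) in reported.items()
    }


if __name__ == "__main__":
    for name, f1 in recompute_all(REPORTED).items():
        reported_f1 = REPORTED[name][2]
        print(f"{name}: recomputed F1 = {f1}, reported F1 = {reported_f1}")
```

Because the inputs are already rounded to two decimals, the recomputed values can differ from the reported ones in the second decimal place. The check also makes the trade-off visible: the three models with accuracy 0.84 have recall below 0.20, which is why their F1 scores are lower than that of logistic regression.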
