Prediction of drinking water quality with machine learning models: A public health nursing approach
Citation
Özsezer, G., & Mermer, G. (2024). Prediction of drinking water quality with machine learning models: A public health nursing approach. Public Health Nursing, 41(1), 175–191. https://doi.org/10.1111/phn.13264
Abstract
Objective: The aim of this study is to use machine learning (ML) models to predict drinking water quality from a public health nursing approach. Design: Machine learning study. Sample: The "Water Quality Dataset" was used in the study. The dataset contains physical and chemical measurements of water quality for 2400 different water bodies. The process consists of four stages: data processing with the Synthetic Minority Oversampling Technique, hyperparameter tuning with 10-fold cross-validation, modeling, and comparative analysis. 80% of the dataset was allocated as training data and 20% as test data. Seven ML algorithms (logistic regression, K-nearest neighbor, support vector machine, random forest, XGBoost, AdaBoost classifier, and decision tree) were used for water quality prediction. The accuracy, precision, recall, F1 score, and AUC of the ML models were compared. To evaluate the performance of the models, 10-fold cross-validation was used and a comparative analysis was performed. The p-values of the models were also compared. Results: In this study, where drinking water quality was predicted with seven different ML algorithms, XGBoost and Random Forest were the best classification models on all performance metrics. There is a significant difference in all ML algorithms according to the p-value. The H0 hypothesis is accepted for these algorithms. According to the H0 hypothesis, there is no difference between actual values and predicted values. Conclusion: The use of ML models in the prediction of drinking water quality can help nurses greatly improve access to clean water, which is a human right, become more knowledgeable about water quality, and protect the health of individuals.
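As a rough illustration of the comparison metrics named in the abstract, the sketch below computes accuracy, precision, recall, and F1 score from binary labels using only the Python standard library. It is not the authors' code: the function name `binary_metrics`, the label encoding (1 for the positive class, 0 for the negative class), and the toy label lists are assumptions made up for this example, and AUC is left out because it requires predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration; not taken from the study.
    """
    pairs = list(zip(y_true, y_pred))
    # Count the cells of the confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for demonstration only (not data from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a setting like the one described, a function of this kind would be applied to each model's predictions on the held-out 20% test split, so that the seven classifiers can be compared on the same footing.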