Discovery of Marker Genes in Adult T Cell Leukemia (ATL) Pathogenesis with Machine Learning Models and Performance Comparison

dc.contributor.author: Kılıçarslan, Sabire
dc.contributor.author: Yucebas, Sait Can
dc.date.accessioned: 2026-02-03T11:50:25Z
dc.date.available: 2026-02-03T11:50:25Z
dc.date.issued: 2025
dc.department: Çanakkale Onsekiz Mart Üniversitesi
dc.description.abstract: Hematologic cancers are often diagnosed only after symptoms become apparent, which makes the disease harder to control and treat effectively. Studying gene expression profiles is therefore vital for early diagnosis and for developing treatment strategies for hematologic cancers such as T-cell leukemia. This study aims to reveal the molecular mechanisms underlying the pathogenesis of Adult T-cell Leukemia (ATL) by comparing whole gene expression profiles of ATL cells with those of CD4+ T cells from healthy individuals. To this end, eight machine learning algorithms were used: Naive Bayes, K-Nearest Neighbor, Support Vector Machine (SVM), Random Forest (RF), C4.5, Logistic Regression, Linear Discriminant Analysis, and Artificial Neural Network (ANN). Their performance was compared on the GSE33615 dataset using 5-fold cross-validation with stratified sampling. Among these, ANN stood out with an AUC of 0.98 and an F1 score of 0.93, followed by SVM with an AUC of 0.97 and an F1 score of 0.957. In addition to the performance comparison, the information gain ratio, the Shapley metric, and correlation values were calculated to detect genes involved in ATL. The three best-performing models (ANN, SVM, RF) were then selected, and the ten most significant genes were identified for each. The intersection of these gene sets yielded ZSCAN18, PLK3, and NELL2 as genes associated with ATL. These genes may contribute to ATL pathogenesis through their roles in cell cycle regulation, transcriptional control, and oncogenic signaling. Further investigation is needed to clarify their precise molecular mechanisms in the disease.
dc.identifier.doi: 10.31466/kfbd.1597865
dc.identifier.endpage: 1069
dc.identifier.issn: 2564-7377
dc.identifier.issue: 3
dc.identifier.startpage: 1046
dc.identifier.trdizinid: 1342512
dc.identifier.uri: https://doi.org/10.31466/kfbd.1597865
dc.identifier.uri: https://search.trdizin.gov.tr/tr/yayin/detay/1342512
dc.identifier.uri: https://hdl.handle.net/20.500.12428/34097
dc.identifier.volume: 15
dc.indekslendigikaynak: TR-Dizin
dc.language.iso: en
dc.relation.ispartof: Karadeniz Fen Bilimleri Dergisi
dc.relation.publicationcategory: Article - National Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_TR_20260130
dc.subject: Machine learning
dc.subject: Variable importance
dc.subject: Adult T-cell Leukemia (ATL)
dc.subject: Microarray study
dc.title: Discovery of Marker Genes in Adult T Cell Leukemia (ATL) Pathogenesis with Machine Learning Models and Performance Comparison
dc.type: Article
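
The evaluation and gene-selection steps described in the abstract (stratified 5-fold splitting, then intersecting the top-ranked genes of several models) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `stratified_folds` and `shared_top_genes`, the round-robin fold assignment, the sample labels, the gene labels G1 to G4, and all importance scores are hypothetical placeholders, and the toy example uses the top two genes per model rather than the study's top ten.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping class proportions similar.

    Assumption: indices of each class are shuffled and dealt round-robin,
    so per-class counts differ by at most one between folds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def top_k_genes(scores, k):
    """Return the k genes with the highest importance score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def shared_top_genes(per_model_scores, k):
    """Intersect the top-k gene sets of every model."""
    gene_sets = [top_k_genes(s, k) for s in per_model_scores.values()]
    return set.intersection(*gene_sets)


# Hypothetical class labels: 7 disease samples and 3 controls.
toy_labels = ["ATL"] * 7 + ["CTRL"] * 3
toy_folds = stratified_folds(toy_labels, k=5, seed=0)

# Hypothetical importance scores; not values from the study.
toy_scores = {
    "ANN": {"G1": 0.9, "G2": 0.8, "G3": 0.1, "G4": 0.3},
    "SVM": {"G1": 0.7, "G2": 0.2, "G3": 0.6, "G4": 0.1},
    "RF": {"G1": 0.5, "G2": 0.4, "G3": 0.45, "G4": 0.1},
}
toy_shared = shared_top_genes(toy_scores, k=2)

if __name__ == "__main__":
    print("fold sizes:", [len(fold) for fold in toy_folds])
    print("shared top genes:", sorted(toy_shared))
```

Taking the intersection keeps only genes ranked highly by every model, so a gene that a single model favors does not pass on its own. With real data, the importance scores would come from whichever ranking method is applied to each trained model.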
