Machine Learning-Based Diagnosis of Early and Advanced Stages in Triple-Negative Breast Cancer

Publication year: 1403 (Solar Hijri calendar)
Document type: Conference paper
Language: English

The full text of this paper has not been provided and is not available.

National scientific document ID:

ICGCS02_465

Indexing date: 17 Dey 1403

Abstract:

Triple-negative breast cancer (TNBC) is characterized by the absence of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2). TNBC has unique clinical and pathological traits. It presents a significant clinical challenge because of its generally poor prognosis, its aggressive nature, and the lack of targeted treatment options, which leaves chemotherapy as the primary treatment approach. Diagnosing TNBC at an early stage allows better disease management. Recently, machine learning combined with transcriptomics data has emerged as a promising approach for training diagnostic models with high accuracy. In the present study, we trained a machine learning model to differentiate early from advanced TNBC based on gene expression data.

Methods: RNA sequencing data for TNBC were obtained from The Cancer Genome Atlas (TCGA) using the TCGAbiolinks R package. Based on the clinical information of the samples, a count matrix restricted to TNBC samples was created. Differentially expressed genes (DEGs) between the advanced and early stages of TNBC were identified using the DESeq2 R package. A protein-protein interaction (PPI) network of the DEGs was reconstructed using STRING, and network analysis was performed using Cytoscape. In addition, LASSO regression was performed on the DEG data using the glmnet R package. Hub genes from the PPI network and feature genes selected by LASSO were combined as the input features for machine learning (ML) model training. A support vector machine (SVM) model was trained using the scikit-learn library in Python, and an XGBoost model was trained using the XGBoost Python library.

Results: The top 20 hub genes were identified based on betweenness centrality, and 40 feature genes were selected using LASSO. The expression data for these 60 genes formed the dataset for ML model training. Two algorithms (SVM and XGBoost) were used to train ML models. XGBoost reached an accuracy of 91%, while SVM reached an accuracy of 97%.

Conclusion: By analyzing RNA sequencing data, we trained a machine learning model for distinguishing early and advanced stages of TNBC based on the expression pattern of 60 genes. The best-performing model (SVM) has an accuracy of 97%.
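The feature-selection and training steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract reports glmnet (R), STRING, and Cytoscape, whereas this sketch substitutes scikit-learn's `Lasso` and networkx's `betweenness_centrality`. The function names, the toy PPI edge list, the synthetic expression matrix, the 0/1 stage coding treated as a numeric response, the `alpha` value, the linear kernel, and the 70/30 split are all assumptions made for illustration.

```python
import networkx as nx
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def top_hub_genes(ppi_edges, k):
    """Return the k genes with the highest betweenness centrality in a PPI network."""
    graph = nx.Graph()
    graph.add_edges_from(ppi_edges)
    centrality = nx.betweenness_centrality(graph)
    # Sort by descending centrality; break ties by gene name for determinism.
    ranked = sorted(centrality, key=lambda gene: (-centrality[gene], gene))
    return ranked[:k]


def lasso_feature_genes(X, y, gene_names, alpha=0.05):
    """Return genes whose LASSO coefficient is non-zero (stage coded 0/1, an assumption)."""
    X_scaled = StandardScaler().fit_transform(X)
    lasso = Lasso(alpha=alpha).fit(X_scaled, y)
    return [gene for gene, coef in zip(gene_names, lasso.coef_) if coef != 0.0]


def train_svm(X, y, seed=0):
    """Fit a standardized linear SVM and return it with its held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Synthetic stand-in for an expression matrix: 200 samples x 30 hypothetical genes.
rng = np.random.default_rng(0)
gene_names = [f"G{i}" for i in range(30)]
y = np.repeat([0, 1], 100)            # 0 = early stage, 1 = advanced stage
X = rng.normal(size=(200, 30))
X[:, :5] += 2.0 * y[:, None]          # make the first five genes stage-dependent

# Hypothetical PPI edges: G10 and G20 are the two nodes that bridge the network.
toy_edges = [("G10", g) for g in ("G11", "G12", "G13", "G14", "G20")]
toy_edges += [("G20", "G21"), ("G20", "G22")]

hub_genes = top_hub_genes(toy_edges, k=2)
lasso_genes = lasso_feature_genes(X, y, gene_names)

# Combine both gene lists (ordered union) and train on those columns only.
selected = list(dict.fromkeys(hub_genes + lasso_genes))
columns = [gene_names.index(gene) for gene in selected]
model, accuracy = train_svm(X[:, columns], y)

print("Hub genes:", hub_genes)
print("LASSO genes:", lasso_genes)
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here describes only the synthetic data and says nothing about the reported 91% and 97% figures. An XGBoost classifier could be substituted for the SVM in `train_svm` through the same `fit`/`score` interface, for example `xgboost.XGBClassifier`, if that package is installed.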

Authors

Mehrdad Ameri

Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran

Aryan Ghorbani

Genetics Department, Faculty of Science, Shahrekord University, Shahrekord, Iran

Amin Ramezani

Shiraz Institute for Cancer Research, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran