Machine Learning-Based Diagnosis of Early and Advanced Stages in Triple-Negative Breast Cancer

  • Publication year: 1403 (Solar Hijri calendar, 2024/25)
  • Venue: Second International Congress on Cancer Genomics
  • COI (CIVILICA Object Identifier): ICGCS02_465
  • Paper language: English

Authors

Mehrdad Ameri

Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran

Aryan Ghorbani

Genetics Department, Faculty of Science, Shahrekord University, Shahrekord, Iran

Amin Ramezani

Shiraz Institute for Cancer Research, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran

Abstract

Triple-negative breast cancer (TNBC) is characterized by the absence of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) expression. TNBC has distinct clinical and pathological traits and presents a significant clinical challenge because of its generally poor prognosis, aggressive nature, and lack of targeted treatment options, which leaves chemotherapy as the primary treatment approach. Diagnosing TNBC at an early stage allows better disease management. Recently, machine learning combined with transcriptomics data has emerged as a promising approach for training diagnostic models with high accuracy. In the present study, we trained a machine learning model to differentiate early from advanced TNBC based on gene expression data.

Methods: RNA sequencing data for TNBC were obtained from The Cancer Genome Atlas (TCGA) using the TCGAbiolinks R package. Based on the clinical information of the samples, a count matrix restricted to TNBC samples was created. Differentially expressed genes (DEGs) between the advanced and early stages of TNBC were identified using the DESeq2 R package. A protein-protein interaction (PPI) network of the DEGs was reconstructed using STRING, and network analysis was performed using Cytoscape. In addition, LASSO regression was performed on the DEG data using the glmnet R package. Hub genes from the PPI network and feature genes selected by LASSO were combined for machine learning (ML) model training. A support vector machine (SVM) model was trained using the scikit-learn library in Python, and an XGBoost model was trained using the XGBoost Python library.

Results: The top 20 hub genes were identified based on betweenness centrality, and 40 feature genes were selected using LASSO. The expression data for these 60 genes formed the dataset for ML model training. Two algorithms (SVM and XGBoost) were used to train ML models. XGBoost reached an accuracy of 91%, while SVM reached an accuracy of 97%.

Conclusion: By analyzing RNA sequencing data, we trained a machine learning model for distinguishing early from advanced stages of TNBC based on the expression pattern of 60 genes. The best-performing model (SVM) reached an accuracy of 97%.
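The hub-gene step in the abstract ranks PPI-network nodes by betweenness centrality. The abstract reports that this analysis was done in Cytoscape; the Python sketch below is only an illustration of the same ranking idea, not the authors' workflow. It implements Brandes' algorithm for an unweighted, undirected graph using the standard library alone. The gene names (`GENE_A` to `GENE_F`), the edge list, and the function names are hypothetical and chosen purely for the example.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness (Brandes' algorithm), unweighted undirected graph."""
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in adjacency[node]:
                if distance[neighbor] < 0:  # first visit
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    path_count[neighbor] += path_count[node]
                    predecessors[neighbor].append(node)
        # Back-propagate pair dependencies from the farthest nodes inward.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    # Every undirected pair was counted once from each endpoint.
    return {node: score / 2.0 for node, score in scores.items()}


def top_hub_genes(edges, n):
    """Return the n genes with the highest betweenness, plus all scores."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    scores = betweenness_centrality(adjacency)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return ranked[:n], scores


# Hypothetical toy PPI network (not data from the study).
toy_edges = [
    ("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_D"), ("GENE_D", "GENE_E"), ("GENE_D", "GENE_F"),
]
hubs, scores = top_hub_genes(toy_edges, n=2)
print(hubs)
```

In this toy network, `GENE_D` and `GENE_C` bridge the two halves of the graph and therefore rank highest. With a real STRING export, the same function could be called with `n=20` to mirror the top-20 selection described in the Results.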

Keywords

Triple-Negative Breast Cancer, Machine Learning, Differentially Expressed Genes, LASSO, Hub Genes
