Integrative analysis of DNA methylation and gene expression to identify gastric cancer diagnostic biomarkers via machine learning approaches

Publication year: 1400 (Solar Hijri; 2021-2022)
Document type: Conference paper
Language: English

The full text of this paper has not been provided and is not available.

National scientific document ID:

IBIS10_001

Indexing date: 5 Tir 1401 (26 June 2022)

Abstract:

Gastric cancer is the second leading cause of cancer-related deaths. Most patients are diagnosed at a late stage because there are no adequate methods for early detection. Therefore, there is an urgent need for diagnostic biomarkers for this cancer. In the current study, our purpose was to integrate methylation and gene expression data from The Cancer Genome Atlas (TCGA) to find potential diagnostic biomarkers for gastric cancer via bioinformatics and machine learning approaches. DNA methylation and gene expression data for gastric cancer were downloaded from TCGA using the TCGAbiolinks R package. First, probes located on sex chromosomes, containing single nucleotide polymorphisms (SNPs), or having missing values were removed. Then, we used the ChAMP R package to find differentially methylated CpGs (DMCs). CpGs with adjusted p-values < 0.05 and |delta β| > 0.25 were considered DMCs. The gene expression dataset was normalized with the DESeq2 R package. Genes were considered differentially expressed genes (DEGs) if they satisfied the thresholds of |log2 fold change| > 1 and adjusted p-value < 0.05. Since promoter hypermethylation of tumor suppressor genes is one of the most important observations in cancer, we continued only with hypermethylated CpGs (38 probes) located in the promoters of downregulated genes. Recursive feature elimination with cross-validation (RFECV) was used to find the features with the highest discriminative power between tumor and normal samples, resulting in 4 final probes: cg10604646, cg22083047, cg07730329 and cg12741420. These features were then used to construct a logistic regression model. We validated these markers in an independent dataset from the GEO database (GSE30601). The area under the curve (AUC) of the model was 0.904, indicating that the four markers could achieve excellent performance in distinguishing tumor and normal gastric samples.
Overall, the four high-performance diagnostic markers built through machine learning approaches could improve gastric cancer precision management upon prospective clinical validation.
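
The selection and modelling steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the full text is unavailable, so the use of scikit-learn, the helper name `select_candidates`, the synthetic beta values, the 5-fold cross-validation, the ROC-AUC scoring and the held-out split (standing in for an independent cohort such as GSE30601) are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split


def select_candidates(delta_beta, adj_p, min_delta=0.25, alpha=0.05):
    """Hypothetical helper: mask of hyper-methylated CpGs.

    Keeps probes with delta beta > min_delta (tumor minus normal) and
    adjusted p-value < alpha, mirroring the thresholds in the abstract.
    """
    delta_beta = np.asarray(delta_beta, dtype=float)
    adj_p = np.asarray(adj_p, dtype=float)
    return (delta_beta > min_delta) & (adj_p < alpha)


# Synthetic stand-in for a samples x probes matrix of methylation beta values.
# 38 columns echo the 38 candidate probes; the data themselves are invented.
rng = np.random.default_rng(0)
n_samples, n_probes = 200, 38
y = np.repeat([0, 1], n_samples // 2)      # 0 = normal, 1 = tumor
X = rng.normal(0.3, 0.1, size=(n_samples, n_probes))
X[y == 1, :4] += 0.3                       # assume four informative probes
X = np.clip(X, 0.0, 1.0)                   # beta values lie in [0, 1]

# Held-out split used here in place of an independent validation dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Recursive feature elimination with cross-validation around logistic regression.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
)
selector.fit(X_train, y_train)

# Refit a logistic regression on the retained probes and score the held-out set.
model = LogisticRegression(max_iter=1000)
model.fit(selector.transform(X_train), y_train)
test_scores = model.predict_proba(selector.transform(X_test))[:, 1]
auc = roc_auc_score(y_test, test_scores)

print("probes kept:", selector.n_features_)
print("held-out AUC:", round(auc, 3))
```

On real data, `X` would hold the beta values of the candidate probes and `y` the tumor/normal labels; the number of probes RFECV retains depends on the data and the scoring choice, so this sketch should not be expected to reproduce the four reported probes or the reported AUC.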

Authors

Maryam Sadat Hosseini

Department of Genetics and Molecular Biology, Faculty of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran

Maryam Lotfi Shahreza

Department of Computer Engineering, Shahreza Campus, University of Isfahan, Isfahan, Iran

Parvaneh Nikpour

Department of Genetics and Molecular Biology, Faculty of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran