Identification of gene signature in RNA-Seq liver cancer data using Clustering Algorithms

Publication year: 1403 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper is available as a 10-page PDF.

National scientific document ID:

DSAI01_022

Indexing date: 4 Tir 1403

Abstract:

Aim: This study aimed to detect gene signatures in RNA-sequencing (RNA-seq) data using Pareto-optimal cluster size identification.

Background: RNA-seq has emerged in recent years as an important technology for transcriptome profiling. Gene expression signatures comprising tens of genes have been shown to be predictive of disease type and patient response to treatment.

Methods: The data came from a liver cancer RNA-seq dataset comprising 35 paired hepatocellular carcinoma (HCC) and non-tumor tissue samples. First, differentially expressed genes (DEGs) were identified after pre-filtering and normalization. Next, a multi-objective optimization technique, Multi-objective optimization for collecting cluster alternatives (MOCCA), was used to find the Pareto-optimal cluster size for these DEGs. K-means clustering was then performed on the RNA-seq data. The best cluster, taken as a signature for the disease, was found by calculating the average pairwise Spearman's correlation score across all genes in the module. All analyses were performed in R 4.1.1 in a virtual environment with 100 GB of RAM.

Results: Using MOCCA, eight Pareto-optimal clusters were obtained. The two clusters with the highest average Spearman's correlation coefficient were chosen as the gene signature. Eleven prognostic genes involved in abnormal metabolism in HCC were identified. In addition, three pathways were found to be differentially expressed between tumor and non-tumor tissues.

Conclusion: The identified metabolic prognostic genes can provide more powerful prognostic information and improve survival prediction for HCC patients. In addition, Pareto-optimal cluster size identification is suggested for detecting gene signatures in other RNA-seq data.
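The cluster-scoring step described in the Methods (average pairwise Spearman's correlation over all genes in a cluster, then keeping the highest-scoring clusters) can be illustrated with a minimal sketch. The authors report working in R 4.1.1; the Python below is not their code. It uses only the standard library, and the function names, the `top_n` parameter, the gene labels, and the toy expression values are all hypothetical, chosen purely for illustration.

```python
from itertools import combinations
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for a constant vector."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's correlation, computed as Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def average_pairwise_spearman(cluster):
    """cluster maps gene name -> expression values across samples."""
    if len(cluster) < 2:
        raise ValueError("a cluster needs at least two genes to form a pair")
    return mean(spearman(a, b) for a, b in combinations(cluster.values(), 2))


# Toy data (hypothetical): two clusters, three genes each, five samples.
clusters = {
    "cluster_1": {
        "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "gene_b": [2.1, 3.9, 6.2, 8.0, 9.7],
        "gene_c": [0.5, 0.9, 1.4, 2.2, 2.8],
    },
    "cluster_2": {
        "gene_d": [1.0, 2.0, 3.0, 4.0, 5.0],
        "gene_e": [5.0, 3.0, 4.0, 1.0, 2.0],
        "gene_f": [2.0, 1.0, 5.0, 3.0, 4.0],
    },
}

scores = {name: average_pairwise_spearman(genes) for name, genes in clusters.items()}

# Keep the top_n clusters by score (hypothetical parameter; the abstract keeps two of eight).
top_n = 1
ranked = sorted(scores, key=scores.get, reverse=True)
signature_clusters = ranked[:top_n]
print(scores, signature_clusters)
```

In this toy input every gene in `cluster_1` rises monotonically across samples, so its pairwise correlations are all close to 1 and it ranks first, whereas the genes in `cluster_2` are ordered inconsistently and score lower. In R, the same quantity could be obtained from `cor(..., method = "spearman")` applied to a cluster's expression matrix.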

Authors

Akbar Biglarian

Department of Biostatistics and Epidemiology, Social Determinants of Health Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran

Toktam Akbari Khalaj

Department of Statistics, Emergency Management Services, Mashhad University of Medical Sciences, Mashhad, Iran

Taiebe Kenarangi

Department of Biostatistics and Epidemiology, Faculty of Statistics, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran

Enayatolah Bakhshi

Department of Biostatistics and Epidemiology, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran

Kolsoum Inanloo Rahatloo

Department of Cell and Molecular Biology, School of Biology, College of Science, University of Tehran, Tehran, Iran

Morteza Lotfi

Executive Vice President, Emergency Management Services, Mashhad University of Medical Sciences, Mashhad, Iran