Application of machine learning in building a diagnostic model for gastric cancer based on a survival-related competitive endogenous RNA (ceRNA) network

  • Year of publication: 1402 (Solar Hijri calendar)
  • Venue: First International Congress on Artificial Intelligence in Medical Sciences
  • COI code: AIMS01_029
  • Language of the paper: English

Authors

Maryam Hosseini

Department of Genetics and Molecular Biology, Faculty of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran

Basireh Bahrami

Department of Genetics and Molecular Biology, Faculty of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran

Parvaneh Nikpour

Department of Genetics and Molecular Biology, Faculty of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran

Abstract

Background and aims: Gastric cancer (GC) is a highly aggressive malignancy whose development is influenced by both environmental and genetic factors. Among the genetic factors, competitive endogenous RNAs (ceRNAs) have been shown to affect cancer development. The aim of this study was to find diagnostic biomarkers for GC based on a ceRNA network by utilizing machine learning approaches.

Methods: The RNA-seq and clinical data of 335 GC tumor and 30 non-tumor samples were downloaded using the TCGAbiolinks R package. Differentially expressed long non-coding RNAs (lncRNAs) (DELs), miRNAs (DEmiRs), and mRNAs (DEMs) were extracted with the R package DESeq2 based on |log2 fold change| > 1 and adjusted p < 0.05. Using univariate Cox regression, the survival-related DELs, DEmiRs, and DEMs were detected with a threshold of p < 0.05. The multiMiR R package and DIANA-LncBase v3.0 were used to predict the miRNA–mRNA and miRNA–lncRNA interactions. A lncRNA-miRNA-mRNA ceRNA network was then constructed. Machine learning analyses were conducted using the lncRNAs of the network. First, the data were split into training and test groups with a ratio of 0.7 to 0.3, and the samples in the training group were then resampled using the SMOTETomek method. Recursive Feature Elimination (RFE) was used as the feature selection technique, and the selected features were used to build a diagnostic model with the support vector machine (SVM) algorithm.

Results: Differential expression analysis between tumor and non-tumor GC samples detected 3947 DELs, 266 DEmiRs, and 4388 DEMs, of which 187 DELs, 24 DEmiRs, and 524 DEMs were associated with the overall survival of GC patients. By integrating the interactions that shared common miRNAs, we constructed a ceRNA network consisting of 12 DELs, 11 DEmiRs, and 70 mRNAs. After balancing the training cohort and applying RFE, four lncRNAs (ENSG00000213279, ENSG00000248103, ENSG00000249001, and ENSG00000262061) were selected as the final diagnostic signature. An SVM diagnostic model was then constructed, with an area under the curve (AUC) of 0.98 in the test group.

Conclusion: In this study, using ceRNA network construction and machine learning analysis, we identified four diagnostic lncRNAs for GC patients that were also survival-related. Since machine learning approaches are powerful methods for identifying candidate biomarkers, our future efforts will focus on the experimental and clinical validation of these biomarkers.

Keywords

Machine learning, Non-coding RNAs, Stomach neoplasms, Systems Biology
