Predicting with the quantify intensities of transcription factor-target genes binding using random forest technique

  • Year of publication: 1400 (Solar Hijri calendar)
  • Published in: International Journal of Nonlinear Analysis and Applications, Volume 12, Issue 2
  • Unique COI code: JR_IJNAA-12-2_012
  • Article language: English

Authors


University of Babylon, Hilla, Iraq


University of Babylon, Hilla, Iraq

Abstract

Rapid technological development led to the emergence of microarray technology, which lets researchers observe the expression levels of thousands of genes simultaneously in a single experiment. These developments have also produced powerful tools for identifying interactions between target genes and regulatory factors. The main aim of this study is to build models that predict the relationship (interaction) between transcription factor (TF) proteins and target genes after selecting a subset of important (relevant) genes from the original dataset. The proposed methodology comprises three major stages: gene selection, dataset merging, and prediction. The computational space of the gene data is reduced by a proposed mutual information method that selects genes based on their expression data. In the prediction stage, regression techniques are used to predict the binding rate between a single TF and its target gene. The efficiency of two regression techniques is compared: Linear Regression and Random Forest Regression. Two publicly available datasets are used to achieve the objectives of this study: the Yeast Cell Cycle gene expression dataset and a Transcription Factors dataset. Prediction performance is evaluated with two measures, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), using 10-fold cross-validation.
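The gene selection and evaluation steps summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: expression profiles are discretized into equal-width bins before mutual information is computed (the abstract does not say how the proposed method handles continuous values), and a one-feature least-squares line stands in for the regression model. All helper names (`discretize`, `mutual_information`, `select_genes`, `fit_linear`, `cross_validate`) are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical helpers sketching the pipeline; not the authors' code.


def discretize(values, bins=3):
    """Equal-width binning of a continuous expression profile (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # min(..., bins - 1) keeps the maximum value inside the top bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # Sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def select_genes(expression, target, k, bins=3):
    """Keep the k gene names whose profiles share the most MI with `target`."""
    t = discretize(target, bins)
    scores = {g: mutual_information(discretize(v, bins), t)
              for g, v in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(xs, ys, fit, k=10, seed=0):
    """k-fold CV; returns (mean RMSE, mean MAE) over the k held-out folds."""
    if len(xs) < k:
        raise ValueError("need at least k samples")
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    rmses, maes = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        y_true = [ys[i] for i in fold]
        y_pred = [model(xs[i]) for i in fold]
        rmses.append(rmse(y_true, y_pred))
        maes.append(mae(y_true, y_pred))
    return sum(rmses) / k, sum(maes) / k
```

Because `cross_validate` accepts any function that maps training inputs and targets to a predictor, a random forest regressor (for example, scikit-learn's `RandomForestRegressor` behind a small adapter) could be compared against `fit_linear` on the same folds, mirroring the linear-versus-random-forest comparison described in the abstract.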

Keywords

Microarray Technology, Gene Expression, Genes Selection, Prediction Techniques, Transcription Factors Proteins
