Deep Learning Framework for Splice Site Prediction Across Multiple Species

  • Publication year: 1402 (Solar Hijri calendar)
  • Published in: 12th National and 3rd International Conference on Bioinformatics
  • COI (CIVILICA Object Identifier) code: IBIS12_116
  • Paper language: English

Authors

Mohammad Reza Rezvan

Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran

Ali Ghanbari Sorkhi

Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran

Jamshid Pirgazi

Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran

Abstract

Splice site prediction remains a pivotal challenge in bioinformatics, necessitating accurate and efficient computational methods to understand genetic regulation and expression. This study introduces a novel deep learning framework for the prediction of splice sites, leveraging the genetic sequences from three distinct datasets: Arabidopsis thaliana, Homo sapiens, and HS3D. Our methodology commences with the preprocessing of sequence data using a two-gram approach followed by one-hot encoding to transform genetic sequences into a numerical format amenable to deep learning techniques. We employ a Residual Convolutional Neural Network (ResidualConv1D) for robust feature extraction, capitalizing on its ability to learn hierarchical representations of sequence motifs. To address the high dimensionality of the feature space, Principal Component Analysis (PCA) is utilized for dimensionality reduction, enhancing computational efficiency and model interpretability. The feature-rich, dimensionally reduced data is then classified using a Support Vector Machine (SVM), chosen for its effectiveness in handling high-dimensional data and its capacity for achieving high accuracy in binary classification tasks. Our approach showcases a significant improvement in splice site prediction accuracy, demonstrating the potential of integrating deep learning architectures with traditional machine learning techniques for bioinformatics applications. The study not only contributes to the advancement of computational genomics but also opens new avenues for the application of deep learning in genetic data analysis.
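
The preprocessing step described in the abstract (two-gram tokenization followed by one-hot encoding) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say whether the two-grams overlap or how ambiguous bases are handled, so the stride of 1, the all-zero row for two-grams containing symbols outside A, C, G, T, and the names `TWO_GRAMS`, `INDEX`, and `two_gram_one_hot` are all assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"
# All 16 possible two-grams over the DNA alphabet, in a fixed order
# (AA, AC, AG, AT, CA, ..., TT).
TWO_GRAMS = ["".join(p) for p in product(ALPHABET, repeat=2)]
INDEX = {gram: i for i, gram in enumerate(TWO_GRAMS)}


def two_gram_one_hot(sequence):
    """Encode a DNA string as a list of 16-dimensional one-hot rows.

    Assumption: two-grams overlap (stride 1), so a sequence of length n
    yields n - 1 rows. A two-gram containing a symbol outside ACGT
    (for example N) is encoded as an all-zero row.
    """
    seq = sequence.upper()
    rows = []
    for i in range(len(seq) - 1):
        row = [0] * len(TWO_GRAMS)
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


# Hypothetical toy input; real inputs would be fixed-length windows
# around candidate splice sites.
encoded = two_gram_one_hot("ACGTN")
```

Under these assumptions, each sequence becomes a matrix with one row per two-gram position and 16 columns, which is the kind of numerical input a one-dimensional convolutional network can consume before the PCA and SVM stages.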

Keywords

Splice site; ResidualConv1D; robust feature extraction; Principal Component Analysis; Support Vector Machine
