RDLHeart: Statistical Gene Selection with Deep Learning and SMOTE for Heart Failure Classification from Bulk RNA Sequencing

Publication year: 1404 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper is available as an 11-page PDF.

National scientific document ID:

ICTBC09_133

Indexing date: 26 Khordad 1405 (Solar Hijri)

Abstract:

Cardiovascular diseases are the leading cause of global mortality. Accurate molecular classification of heart failure etiology remains challenging due to extreme class imbalance and ultra-high dimensionality of genomic data. This study proposes a deep learning framework for multi-class heart failure subtype classification directly from real Next-Generation Sequencing (NGS) RNA-sequencing (RNA-seq) profiles of human left ventricular tissue. We utilized the GSE116250 dataset consisting of 64 samples: 14 Non-Failing (NF) controls, 37 Dilated Cardiomyopathy (DCM), and 13 Ischemic Cardiomyopathy (ICM) cases, yielding 54,675 gene expression features per sample after preprocessing. To prevent data leakage, stratified train/test splitting (80/20) was performed prior to any augmentation. The Synthetic Minority Over-Sampling Technique (SMOTE) was then applied exclusively to the training set to address severe class imbalance. Three sequential models were evaluated: one-dimensional Convolutional Neural Network (1D-CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). On the held-out test set, the proposed 1D-CNN with SMOTE achieved 99.68% accuracy and 0.5255 loss, significantly outperforming the same architecture without SMOTE (87.50% accuracy, 0.6978 loss). LSTM and GRU models also showed consistent gains with SMOTE (approximately 11-13% accuracy improvement), confirming the general benefit of balanced training in NGS-based cardiology applications. This work demonstrates, for the first time, the successful integration of leakage-free SMOTE with deep sequential models on real human cardiac RNA-seq data and establishes a strong, reproducible benchmark for etiology-specific heart failure diagnosis directly from NGS transcriptomes.
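
The split-then-oversample order described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code. The helper names `stratified_split` and `smote_like`, the 5-dimensional random toy features (standing in for the 54,675 expression values), the seeds, and `k=5` are all assumptions made for illustration. `smote_like` is a simplified SMOTE-style interpolation; in practice one would more likely use scikit-learn's `train_test_split` with its `stratify` argument and imbalanced-learn's `SMOTE`. Only the class counts (14 NF, 37 DCM, 13 ICM) and the 80/20 ratio come from the abstract.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test, holding out test_fraction of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def smote_like(X, y, k=5, seed=0):
    """Simplified SMOTE-style oversampling (hypothetical helper).

    Each minority class is grown to the majority-class size by interpolating
    between a random sample and one of its k nearest same-class neighbours.
    Assumes every class has at least two samples.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        members = [x for x, label in zip(X, y) if label == cls]
        for _ in range(target - n):
            base = rng.choice(members)
            # k nearest same-class neighbours by squared Euclidean distance
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
            y_out.append(cls)
    return X_out, y_out


# Class counts from the abstract; the features are random toy values.
labels = ["NF"] * 14 + ["DCM"] * 37 + ["ICM"] * 13
toy_rng = random.Random(42)
features = [[toy_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in labels]

# Step 1: split first, so no test sample can influence a synthetic sample.
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
X_train = [features[i] for i in train_idx]
y_train = [labels[i] for i in train_idx]
X_test = [features[i] for i in test_idx]
y_test = [labels[i] for i in test_idx]

# Step 2: oversample the training portion only; the test set stays untouched.
X_bal, y_bal = smote_like(X_train, y_train, k=5, seed=42)

print("train before:", dict(Counter(y_train)))
print("train after: ", dict(Counter(y_bal)))
print("test:        ", dict(Counter(y_test)))
```

With per-class rounding as implemented here, the training set holds 11 NF, 30 DCM, and 10 ICM samples before oversampling and 30 of each afterwards, while the 13 held-out samples (3 NF, 7 DCM, 3 ICM) remain real measurements. A library splitter may allocate the per-class counts slightly differently.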

Authors

Alireza Assadzadeh

Department of Computer Engineering, Ta. C., Islamic Azad University, Tabriz, Iran

Masoud Kargar

Department of Computer Engineering, Ta. C., Islamic Azad University, Tabriz, Iran

Fatemeh Imani

Department of Computer Engineering, SR.C, Islamic Azad University, Tehran, Iran

Sina Abbaskhani

Department of Computer Engineering, SR.C, Islamic Azad University, Tehran, Iran

Shahin Sharbaf Movassaghpour

Department of Computer Engineering, Ta. C., Islamic Azad University, Tabriz, Iran

Ali Bayani

Department of Computer Engineering, SR.C, Islamic Azad University, Tehran, Iran