Splicing Detection In DNA Sequences With A Focus On Balancing Visual Data And Deep Neural Networks

Publication year: 1402 (Solar Hijri)
Document type: Conference paper
Language: English

National scientific document ID: IBIS12_126

Indexing date: 12 Aban 1403

Abstract:

Accurate detection of splicing in DNA and proteins is of great medical and biological importance. Splicing enhances genetic diversity and helps regulate exon skipping, and maintaining the correct order of nucleotides during this process is crucial for producing proteins correctly. Given this importance, extensive research has been conducted in recent years to identify exon and intron regions. Work in this domain faces several challenges, such as imbalanced datasets and the absence of visual data. To address them, this article proposes a novel multi-stage method for detecting these regions. Splicing detection is performed with the ResNet-32 architecture, which is known for its effectiveness in training deep networks. ResNet-32 is built from residual blocks, each with a shortcut connection that lets the model learn the difference between the input and output of the skipped layers; this design mitigates problems such as vanishing gradients during training. To handle the data imbalance, three methods are employed: Reweighting, Resampling, and DRW (Dynamic Reweighting). Resampling oversamples the minority class during training so that the classes are equally represented in each batch. Reweighting adjusts sample weights to give higher importance to the minority class, preventing bias toward the majority class during training. DRW adapts the reweighting strategy dynamically by changing the beta value over epochs, allowing the model to adjust gradually to the imbalanced dataset at different stages of training. Alongside these techniques, to overcome the lack of visual data, each nucleotide in the sequence is transformed into a one-hot encoded vector, forming a sequence of zeros and ones.
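The three balancing strategies named above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' implementation: the abstract gives neither the weighting formula nor the beta schedule, so the sketch assumes the "effective number of samples" class-balanced weights of Cui et al. (2019) and a two-stage beta schedule similar to the deferred re-weighting scheme of Cao et al. (2019). The function names, the beta value 0.9999, the switch epoch, and the class counts are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def class_balanced_weights(class_counts, beta):
    """Per-class weights based on the 'effective number' of samples (assumed formula).

    Assumes every count is positive. beta = 0 gives uniform weights;
    beta close to 1 approaches inverse-frequency weights. The weights are
    normalised so that they sum to the number of classes.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(class_counts) / sum(raw)
    return [w * scale for w in raw]


def drw_beta(epoch, switch_epoch, betas=(0.0, 0.9999)):
    """Hypothetical DRW schedule: uniform weights first, class-balanced after switch_epoch."""
    return betas[0] if epoch < switch_epoch else betas[1]


def weighted_cross_entropy(probs, labels, class_weights):
    """Reweighting: batch mean of -w[y] * log p(y), so minority-class errors cost more."""
    total = sum(-class_weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / len(labels)


def oversampled_indices(labels, num_samples, seed=0):
    """Resampling: draw indices with probability inversely proportional to class size,
    so each class is expected to appear equally often among the drawn indices."""
    counts = Counter(labels)
    per_sample = [1.0 / counts[y] for y in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=per_sample, k=num_samples)


# Illustrative (made-up) counts: 900 negative and 100 positive examples.
counts = [900, 100]
for epoch in (0, 10):
    beta = drw_beta(epoch, switch_epoch=5)
    print(epoch, [round(w, 3) for w in class_balanced_weights(counts, beta)])
```

Before the switch epoch both classes get weight 1.0; afterwards the minority class receives the larger weight. Deferring the reweighting lets the network first learn general features from the data as it is before the loss is tilted toward the minority class.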
The resulting binary sequence is then converted into a matrix that serves as a binary image, giving each DNA sequence a visual representation. The performance of the proposed method is evaluated on the standard Caenorhabditis elegans dataset, and the results obtained show that the proposed approach performs satisfactorily.
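The encoding step can be illustrated with a short Python sketch. The abstract does not specify the channel order, how non-ACGT symbols are handled, or the matrix dimensions, so the ACGT order, the all-zero vector for unknown bases, and the near-square, zero-padded layout below are assumptions; `one_hot_bits` and `to_binary_image` are hypothetical helper names.

```python
import math

ALPHABET = "ACGT"  # assumed channel order; any other symbol (e.g. 'N') becomes all zeros


def one_hot_bits(sequence):
    """Flatten a DNA string into a 0/1 list, four bits per nucleotide."""
    bits = []
    for base in sequence.upper():
        bits.extend(1 if base == letter else 0 for letter in ALPHABET)
    return bits


def to_binary_image(sequence, width=None):
    """Reshape the one-hot bit sequence into a 2-D 0/1 matrix (a binary image).

    If width is None, the smallest square side that holds all bits is used.
    The final row is padded with zeros when the bits do not fill it.
    """
    bits = one_hot_bits(sequence)
    if not bits:
        raise ValueError("sequence must not be empty")
    if width is None:
        width = math.isqrt(len(bits))
        if width * width < len(bits):
            width += 1
    height = -(-len(bits) // width)  # ceiling division
    bits += [0] * (height * width - len(bits))
    return [bits[row * width:(row + 1) * width] for row in range(height)]


# "ACGT" has 16 bits, so the default layout is 4x4: the identity matrix.
for row in to_binary_image("ACGT"):
    print(row)
```

Passing `width=4` instead yields an L-by-4 matrix in which each row is one nucleotide, another common way to present one-hot DNA to a convolutional network; which layout the authors used is not stated in the abstract.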

Authors

E Keshavarzi Khouzani

Department of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran

A Ghanbari Sorkhi

Department of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran