Language Model–Based Representation Learning for Venom Protein Identification and Therapeutic Target Discovery in Cancer

Meisam Ahmadi; Mohammad Reza Jahed-Motlagh; Ehsaneddin Asgari; Adel Torkaman Rahmani

Published in: Cancer Research Journal (مجله تحقیقات سرطان), Volume 8, Issue 2

Year of publication: 1403 (Solar Hijri)

Document type: Journal article

Language: English

The full text of this article is available as an 11-page PDF.

Permanent link to this article:

https://civilica.com/doc/2467083

National scientific document ID:

JR_MCIJO-8-2_003

Indexing date: 7 Dey 1404

Abstract:

Venom is a complex mixture of bioactive molecules produced by venomous organisms for predation, defense, or intraspecific competition, often eliciting specific physiological responses in target organisms. Venom-derived peptides and proteins have recently attracted attention in biomedical research for their potential therapeutic applications, including anticancer drug discovery. However, venom sequences constitute a highly divergent class of proteins, which makes both machine-learning-based and homology-based identification particularly challenging. To address this, we propose ToxVec, a transfer-learning-based framework for automatic representation learning of protein sequences aimed at improving venom identification. Our approach leverages pre-trained protein language models to capture sequence-level information without manual feature engineering. ToxVec outperforms existing feature-based models, achieving a macro-F1 score of 0.89. Furthermore, an ensemble model trained on multiple balanced subsets raises performance to a macro-F1 of 0.93, a 7% improvement over the state of the art. Beyond benchmark performance, screening of experimentally validated anticancer peptides from the CancerPPD2 dataset revealed that many exhibit high venom-like signatures according to ToxVec, supporting the notion that toxin-inspired molecular architectures may underlie anticancer bioactivity. We further discuss how language model–based representation learning embodies a Cognitive Mind–Body–Inspired interpretation, linking abstract sequence semantics (the "mind") to biological function (the "body"). By enabling more accurate large-scale identification of venom proteins, ToxVec provides a foundation for systematically exploring venom-derived bioactive peptides as potential therapeutic candidates, including those targeting pathways implicated in breast cancer progression and metastasis. This automated approach thus bridges computational protein informatics with translational oncology, supporting future efforts in bioactive-peptide-based anticancer research.
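
The abstract mentions two ideas that a small example may make more concrete: an ensemble trained on multiple balanced subsets, and evaluation by macro-F1. The following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a binary task, uses synthetic 2-D vectors in place of real protein language model embeddings, and swaps in a simple nearest-centroid classifier as a stand-in for whatever model ToxVec actually uses. All function names are hypothetical.

```python
import random
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def balanced_subsets(X, y, n_subsets, seed=0):
    """Binary case: keep every minority sample and pair it with an
    equally sized random draw from the majority class."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, lab in enumerate(y) if lab == minority]
    maj_idx = [i for i, lab in enumerate(y) if lab != minority]
    subsets = []
    for _ in range(n_subsets):
        idx = min_idx + rng.sample(maj_idx, len(min_idx))
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets


def fit_centroids(X, y):
    """Stand-in base model (an assumption): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for vec, lab in zip(X, y):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for j, v in enumerate(vec):
            acc[j] += v
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict_centroid(centroids, vec):
    """Assign the label of the closest class centroid."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))


def ensemble_predict(models, vec):
    """Majority vote over the base models."""
    votes = Counter(predict_centroid(m, vec) for m in models)
    return votes.most_common(1)[0][0]


# Synthetic, imbalanced toy data: 200 "non-venom" (0) vs 20 "venom" (1) vectors.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
y = [0] * 200 + [1] * 20

subsets = balanced_subsets(X, y, n_subsets=5)
models = [fit_centroids(Xs, ys) for Xs, ys in subsets]
y_pred = [ensemble_predict(models, vec) for vec in X]
score = macro_f1(y, y_pred)
print(f"macro-F1 on the toy data: {score:.3f}")
```

Macro-F1 weights each class equally regardless of its size, which is why it is a common choice when one class (here, venom proteins) is much rarer than the other. An odd number of subsets avoids tied votes in the binary case.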

Keywords:

Venom protein identification, Protein language model, Transfer learning, Representation learning, Anticancer peptides

Authors

Meisam Ahmadi

Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Mohammad Reza Jahed-Motlagh

Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Ehsaneddin Asgari

Qatar Computing Research Institute

Adel Torkaman Rahmani

Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
