Machine Learning-Driven Discovery of JAK2 Inhibitors from ChEMBL Databank

Year of publication: 1403 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper has not been published; only the abstract or extended abstract is available in the database.

National scientific document ID:

IBIS13_003

Indexing date: 10 Ordibehesht 1404

Abstract:

This study aims to identify potent Janus kinase 2 (JAK2) inhibitors by using machine learning models for virtual screening of the small-molecule subset of the ChEMBL databank. JAK2 is a key player in the JAK-STAT signaling pathway, which regulates immune responses and inflammation. Hence, JAK2 inhibitors have significant potential for treating autoimmune disorders, inflammatory diseases, and certain cancers (Lv and Qi, 2024). The machine learning models used in this study were Support Vector Machine (SVM), XGBoost, and Random Forest (RF). To train the models, a dataset of 6,847 active JAK2 inhibitors was compiled from ChEMBL (Gaulton and Hersey, 2017), BindingDB (Gilson and Liu, 2016), and PubChem (Kim and Chen, 2019), along with 6,500 inactive compounds from the DUD-E database (Mysinger and Carchia, 2012). Each compound was labeled with an activity status (1 for active, 0 for inactive). Molecules were represented by extended-connectivity fingerprints (ECFP4) (Baptista and Correia, 2022) together with molecular descriptors such as molecular weight, polar surface area (PSA), and logP; both the fingerprints and the descriptors were computed with RDKit. Model performance was evaluated with several metrics, including accuracy and area under the curve (AUC). The Random Forest model performed best, with a test-set accuracy of 99.70% and an AUC of 99.96%. The SVM model achieved an accuracy of 99.63% and an AUC of 99.93%, while the XGBoost model had an accuracy of 99.55% and an AUC of 99.85%. Based on these results, the Random Forest model was used for virtual screening of a large-scale compound database containing 1,930,555 molecules. The model classified 99,653 compounds as potential JAK2 inhibitors (active) and the remaining 1,830,902 as inactive.
These findings indicate that a Random Forest model trained on ECFP4 fingerprints and molecular descriptors is an effective tool for machine learning-driven virtual screening and can accelerate drug discovery for JAK2 inhibitors.
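
The two evaluation metrics named in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes that "AUC" means the area under the ROC curve, which the abstract does not state explicitly. The function names, the toy labels and scores, and the 0.5 decision threshold are hypothetical and are not taken from the study.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random active outscores a random inactive.

    Ties count as one half (Mann-Whitney U formulation). The pairwise loop
    is quadratic, so this is only meant for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = active, 0 = inactive (not from the study).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]   # assumed predicted probabilities
y_pred = [int(s >= 0.5) for s in scores]   # assumed 0.5 decision threshold

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"ROC AUC  = {roc_auc(y_true, scores):.4f}")
```

On this toy data, four of six labels are predicted correctly (accuracy 4/6) and the actives outrank the inactives in eight of nine pairs (AUC 8/9). Accuracy depends on the chosen threshold, whereas AUC depends only on how the scores rank the compounds, which is one reason to report both.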

Authors

Negar Abdolmaleki

Bioinformatics Lab., Department of Biology, School of Sciences, Razi University, Kermanshah, Iran

Hamid Mahdiuni

Bioinformatics Lab., Department of Biology, School of Sciences, Razi University, Kermanshah, Iran