Functional Annotation of Missense Mutations Based on Protein Features
Year of publication: 1400 (Solar Hijri calendar)
Document type: Conference paper
Language: English
The full text of this paper has not been provided and is not available.
National scientific document ID:
IBIS10_037
Indexing date: 5 Tir 1401 (Solar Hijri calendar)
Abstract:
Background: Among all genetic alterations in cancer, single nucleotide variants (SNVs) are the most common mutations. Still, identifying cancer-driving SNVs (driver mutations) among the many non-driving ones (passenger mutations) remains a challenge, owing to the inherent bias in their populations and the fact that most driver mutations are rare. We present a random forest tool that annotates missense mutations as driver or passenger based on protein information and helps find novel driver mutations.
Materials and Methods: Pan-cancer mutation data from the TCGA database (n = 600k) were fetched and labeled as passenger or driver based on their prevalence. From these mutations, a feature set with five categories was built: 1) physicochemical changes caused by the amino acid substitution, 2) changes in the pseudo-amino acid composition of the 21-mer sequence around the mutation site, 3) disorder of the mutated region, 4) regions/functions reported in UniProt for the mutation site, and 5) whether the gene is reported to be an oncogene, a tumor suppressor, or neither. A random forest model was trained on the feature set using the ranger package in R.
Results: The accuracy of the method on test data is 99% (sensitivity = 99%, specificity = 54%). The method was evaluated against other cancer missense annotation tools (CHASMplus, CHASM, Mutation Assessor, PolyPhen-2, and VEST) on experimentally labeled cancer missense mutations. The areas under the receiver operating characteristic curve (auROC) of these methods were 88%, 67%, 67%, 59%, and 72%, respectively, and the auROC of our method was 83%. The method was also tested against a cancer SNV gold standard based on extensive literature and database review, on which its accuracy was 72% (sensitivity = 74%, specificity = 71%).
Conclusion: We developed a random forest method that discriminates driver from passenger missense mutations. As the method is based solely on protein descriptors, it can give insight into a mutation's mode of action.
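The gap between the reported test accuracy (99%) and specificity (54%) is the pattern one would expect when one class heavily outnumbers the other. The sketch below illustrates that arithmetic only. It is written in Python for illustration (the abstract states that the model itself was trained with the ranger package in R), the confusion-matrix counts are hypothetical and were chosen solely to reproduce the reported percentages, and treating the majority class as the "positive" class is an assumption, since the abstract does not say which class is positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary classifier's confusion matrix."""
    tp: int  # positives predicted as positive
    fn: int  # positives predicted as negative
    tn: int  # negatives predicted as negative
    fp: int  # negatives predicted as positive


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of actual positives that were recovered."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of actual negatives that were recovered."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def balanced_accuracy(c: ConfusionCounts) -> float:
    """Mean of sensitivity and specificity; insensitive to class sizes."""
    return (sensitivity(c) + specificity(c)) / 2


# Hypothetical counts (NOT from the paper): 9,900 majority-class cases
# treated as "positive" and 100 minority-class cases treated as "negative".
counts = ConfusionCounts(tp=9801, fn=99, tn=54, fp=46)

print(f"sensitivity       = {sensitivity(counts):.2f}")
print(f"specificity       = {specificity(counts):.2f}")
print(f"accuracy          = {accuracy(counts):.2f}")
print(f"balanced accuracy = {balanced_accuracy(counts):.3f}")
```

With counts this skewed, accuracy is dominated by the majority class, so a class-balanced summary such as balanced accuracy or the auROC values quoted in the Results gives a fairer basis for comparing methods.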
Keywords:
Authors
Motahareh Hakiminejad
Department of Biophysics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
Hesam Montazeri
Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
Bahram Goliaei
Department of Biophysics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran