Functional Annotation of Missense Mutations Based on Protein Features

Publication year: 1400 (Solar Hijri)
Document type: Conference paper
Language: English
The full text of this paper has not been provided and is not available.

National scientific document ID:

IBIS10_037

Indexing date: 5 Tir 1401 (Solar Hijri)

Abstract:

Background: Among all genetic alterations in cancers, single nucleotide variants (SNVs) are the most common mutations. Still, identifying cancer-driving SNVs (driver mutations) among the many non-driving ones (passenger mutations) remains a challenge because of the inherent bias in their relative populations and the fact that most driver mutations are rare. We present a random forest tool that annotates missense mutations as driver or passenger based on protein information and helps find novel driver mutations.

Materials and Methods: Pan-cancer mutation data from the TCGA database (n = 600k) were fetched and labeled as passenger or driver based on their prevalence. From these mutations, a feature set with five categories was built: 1) physicochemical changes of the substituted amino acid; 2) changes in the pseudo-amino acid composition of the 21-mer sequence around the point of mutation; 3) disorder of the mutated region; 4) regions/functions reported in UniProt for the mutation site; and 5) whether the gene is reported to be an oncogene, a tumor suppressor, or neither. A random forest model was trained on the feature set with the ranger package in R.

Results: The accuracy of the method on test data is 99% (sensitivity = 99%, specificity = 54%). The method was evaluated against other cancer missense annotation tools, namely CHASMplus, CHASM, Mutation Assessor, PolyPhen-2, and VEST, on experimentally labeled cancer missense mutations. The areas under the receiver operating characteristic curve (auROC) of these methods were 88%, 67%, 67%, 59%, and 72%, respectively, and the auROC of our method was 83%. The method was also tested against a cancer SNV gold standard based on extensive literature and database review, on which its accuracy was 72% (sensitivity = 74%, specificity = 71%).

Conclusion: We developed a random forest method that discriminates driver from passenger missense mutations. As the method is based solely on protein descriptors, it can give insight into a mutation's mode of action.
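The test-set figures in the Results (99% accuracy and 99% sensitivity alongside 54% specificity) are the pattern one would expect when one class heavily outnumbers the other. The following is a minimal Python sketch of how the three metrics relate to a confusion matrix. The counts are hypothetical and chosen only to reproduce that pattern; the abstract does not give the class proportions or say which class is treated as positive, and the function name `summarize` is an illustration, not part of the authors' tool.

```python
def summarize(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # fraction of all calls that are correct
        "sensitivity": tp / (tp + fn),      # fraction of positives recovered
        "specificity": tn / (tn + fp),      # fraction of negatives recovered
    }


# Hypothetical counts (NOT from the paper): 10,000 positives versus 100 negatives.
metrics = summarize(tp=9900, fn=100, tn=54, fp=46)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, accuracy stays close to sensitivity because the majority class dominates the total, while specificity reflects only the 100 minority-class cases. This is one reason balanced evaluations such as auROC and the gold-standard comparison are informative next to raw accuracy.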

Authors

Motahareh Hakiminejad

Department of Biophysics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

Hesam Montazeri

Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

Bahram Goliaei

Department of Biophysics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran