Improving the prediction of physical protein interaction by Balanced Random Forest interprotein residue contact predictions using sequence covariation information
Year of publication: 1400 (Solar Hijri)
Document type: Conference paper
Language: English
The full text of this paper has not been provided and is not available.
National scientific document ID:
IBIS10_126
Indexing date: 5 Tir 1401 (Solar Hijri)
Abstract:
Protein-protein interactions are essential for most cellular processes. A large number of protein interactions exist, and many protein sequences have unknown interacting partners. Predicting protein interactions from sequence information has always been a great challenge. The prediction becomes even more challenging when the goal is to detect physical, rather than functional, protein interactions specifically. Developing new approaches for accurate sequence-based prediction of physical protein interactions could therefore be an important advance in computational biology. Inter-protein residue positions that are spatially related exhibit correlated patterns of sequence evolution in multiple sequence alignments. These co-evolutionary signals can be exploited for the prediction of physical protein interactions.
It has been shown that feeding norm values of the whole covariation information of protein heterodimers into Support Vector Machines (SVM) can accurately predict the possibility of physical interaction of those dimers from sequence information. In the present study, Balanced Random Forest (BRF) models were trained with the covariations of inter-protein residues at different hypothetical interacting sites, and the models were then employed to predict possible inter-protein residue contacts. Instead of considering the whole co-evolutionary information, these BRF predictions make it possible to take into account only the covariation information of the more probable physically interacting residues when predicting protein dimers at the higher, protein-level scale. BRF predicted the more probable contacting residues as the positive class and all other pairs of amino acids as the negative class. After the BRF predictions, the previously computed covariation scores of negatively predicted residue partners were set to zero, which removed the contribution of those pairs from the final calculation of the norm values. The results of the current study indicated that feeding the updated norm values of the residue-residue covariation matrices, obtained after the BRF predictions, into SVM models could significantly increase the accuracy of the final protein interaction predictions at the protein family level.
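The zeroing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which matrix norm is used, so the Frobenius norm is assumed here, and the matrix values, the contact labels, and the function names `frobenius_norm` and `zero_predicted_negatives` are all hypothetical. The BRF and SVM models themselves are not reproduced; the contact labels simply stand in for BRF output.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a 2-D list of covariation scores (assumed norm)."""
    return math.sqrt(sum(score * score for row in matrix for score in row))


def zero_predicted_negatives(cov, contact_pred):
    """Return a copy of cov with scores set to 0.0 where the prediction is negative."""
    return [
        [score if keep else 0.0 for score, keep in zip(cov_row, pred_row)]
        for cov_row, pred_row in zip(cov, contact_pred)
    ]


# Hypothetical inter-protein covariation matrix:
# rows are residues of protein A, columns are residues of protein B.
cov = [
    [0.0, 3.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 12.0],
]

# Hypothetical residue-contact labels standing in for BRF output:
# 1 = predicted contact (positive class), 0 = negative class.
contact_pred = [
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
]

norm_before = frobenius_norm(cov)                   # uses all covariation scores
masked = zero_predicted_negatives(cov, contact_pred)
norm_after = frobenius_norm(masked)                 # uses predicted contacts only

print(norm_before, norm_after)
```

In a pipeline of the kind the abstract describes, a value such as `norm_after` would be the feature passed to the SVM for each candidate protein pair, in place of `norm_before`.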
Authors
Sara Salmanian
Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
Hamid Pezeshk
School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran (currently visiting Department of Mathematics and Statistics, Concordia University, Montreal, Canada)- School of Biological Sciences, Institute
Mehdi Sadeghi
National Institute of Genetic Engineering and Biotechnology, Tehran, Iran