Task failure prediction in cloud computing systems

Publication year: 1404 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper is available as a 9-page PDF.

National scientific document ID: ICIRT01_021

Indexing date: 9 Azar 1404

Abstract:

As cloud data centers grow in scale and complexity, ensuring high service reliability and minimizing failures have become critical challenges. Despite technological advances, hardware and software failures persist, disrupting tasks, wasting resources, and degrading service reliability. Accurately predicting task or job failures before they occur is essential to reducing downtime and unnecessary resource usage. Traditional fault-tolerance methods such as checkpointing and replication are insufficient for the complexity of modern systems. Consequently, machine learning and deep learning techniques have been adopted to analyze system logs and predict failures more accurately. Federated learning extends this approach by enabling decentralized data analysis across nodes, preserving privacy while improving prediction accuracy through collaborative learning. In this paper, we propose a fault prediction mechanism based on federated learning and a deep neural network to identify patterns leading to task failures. Our model achieved a prediction accuracy of 95.3%, making it a robust solution for failure prediction in cloud computing environments.
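
As a rough illustration of the collaborative step that federated learning relies on, the sketch below averages model parameters reported by several nodes, weighting each node by its number of local samples. It is a generic FedAvg-style aggregation and not the authors' implementation; the abstract does not state which aggregation rule the paper uses. The function name `federated_average` and the flat list-of-floats parameter layout are assumptions made for this example.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Sample-weighted average of per-client parameter vectors.

    Hypothetical helper: each client trains locally on its own logs and
    shares only its parameters, never the raw data.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must report vectors of the same length")

    averaged = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        # Clients with more local samples contribute proportionally more.
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


# Two nodes with flattened parameters; the second holds three times as much data.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a full training loop, a server would send `global_weights` back to the nodes for the next round of local training; only parameters cross the network, which is the privacy property the abstract refers to.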

Authors

Milad Mahdudi

School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran

Pooya Jamshidi

School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran

Shahpour Rahmani

School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran

Nasser Yazdani

School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran