A Big Data Processing System for Spam Analysis in Apache Spark
Year of publication: 1403 (Solar Hijri)
Document type: Conference paper
Language: English
The full text of this paper is available as a 14-page PDF.
National scientific document ID: CONFIT01_0815
Indexing date: 4 Mehr 1403 (Solar Hijri)
Abstract:
Spam consists of unwanted emails that threaten users' security. Because they are highly efficient, machine learning methods are a common and effective way to classify emails in spam detection systems. However, these methods cannot manage large volumes of high-dimensional data. To address this problem, this study uses a dimension reduction method based on the "butterfly algorithm", which reduces the sample space by 43.2%. In addition, the decision tree and random forest methods are run in the Spark cloud environment to increase processing speed and concurrency. The experimental results show that the spam detection error of the proposed method is greater than that of the decision tree, random forest, support vector machine, and Bayesian network methods. The results also show that when the samples are dimension-reduced, the decision tree and random forest methods run faster in Spark.
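The abstract does not explain how the "butterfly algorithm" selects features. The sketch below assumes it refers to the Butterfly Optimization Algorithm (BOA) used as a wrapper-style feature selector, and shows one way that could work in pure Python. It is not the author's implementation. The following are all illustrative assumptions: the function name `binary_boa`, the sigmoid transfer function that turns positions into 0/1 masks, the greedy replacement rule, the parameter values (c = 0.01, a = 0.1, p = 0.8), and the toy fitness function. In the setting the abstract describes, the fitness would instead come from training a classifier such as a decision tree on the selected features.

```python
import math
import random


def binary_boa(n_features, fitness, n_butterflies=20, n_iter=50,
               c=0.01, a=0.1, p=0.8, seed=0):
    """Sketch of a binary Butterfly Optimization Algorithm for feature selection.

    Each butterfly holds a real-valued position; a sigmoid transfer function
    (an assumption) turns it into a 0/1 mask (1 = keep feature). `fitness` is maximized.
    """
    rng = random.Random(seed)

    def to_mask(x):
        # Keep feature j with probability sigmoid(x_j).
        return [1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0 for v in x]

    pos = [[rng.uniform(-4.0, 4.0) for _ in range(n_features)]
           for _ in range(n_butterflies)]
    masks = [to_mask(x) for x in pos]
    scores = [fitness(m) for m in masks]
    b = max(range(n_butterflies), key=lambda i: scores[i])
    best_pos, best_mask, best_score = pos[b][:], masks[b][:], scores[b]

    for _ in range(n_iter):
        for i in range(n_butterflies):
            # Fragrance f = c * I^a; stimulus intensity I is taken from fitness.
            frag = c * max(scores[i], 1e-9) ** a
            r = rng.random()
            if rng.random() < p:
                # Global phase: move toward the best butterfly g* found so far.
                new = [x + (r * r * g - x) * frag for x, g in zip(pos[i], best_pos)]
            else:
                # Local phase: step built from two randomly chosen butterflies.
                j, k = rng.sample(range(n_butterflies), 2)
                new = [x + (r * r * xj - xk) * frag
                       for x, xj, xk in zip(pos[i], pos[j], pos[k])]
            mask = to_mask(new)
            score = fitness(mask)
            # Greedy replacement (an assumed variant): keep the move if not worse.
            if score >= scores[i]:
                pos[i], masks[i], scores[i] = new, mask, score
                if score > best_score:
                    best_pos, best_mask, best_score = new[:], mask[:], score
    return best_mask, best_score


# Hypothetical fitness: reward keeping the "informative" features, penalize mask size.
INFORMATIVE = {0, 3, 4, 9}


def toy_fitness(mask):
    hits = sum(mask[j] for j in INFORMATIVE)
    return hits / len(INFORMATIVE) - 0.02 * sum(mask)


mask, score = binary_boa(n_features=12, fitness=toy_fitness)
reduction = 1 - sum(mask) / len(mask)
print("selected features:", [j for j, m in enumerate(mask) if m])
print(f"feature-space reduction: {reduction:.1%}")
```

The printed reduction is the fraction of features dropped. It is the same kind of quantity as the 43.2% reported in the abstract, but this toy run says nothing about the paper's data. In a Spark pipeline, the selected column indices could then be used to project the feature vectors before the decision tree and random forest models are trained.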
Authors
Nasrin Aghaee-Maybodi
Department of Computer Engineering, Maybod Branch, Islamic Azad University, Maybod, Iran