A Big Data Processing System for Spam Analysis in Apache Spark

Spam is unwanted email that threatens users’ security. Because of their high efficiency, machine learning methods are a common and effective way to classify emails in a spam detection system, but these methods cannot handle a high volume of high-dimensional data. To address this problem, this study uses a dimension reduction-based method called the “butterfly algorithm”, which can reduce the sample space by 43.2%. In addition, the decision tree and random forest methods are run in the Spark cloud environment to increase processing speed and concurrency. The experimental results show that the spam detection error of the proposed method is greater than that of the decision tree, random forest, support vector machine, and Bayesian network methods; the results also show that, when the samples are dimensionally reduced, the decision tree and random forest methods run faster on Spark.
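
As a rough illustration of the dimension-reduction idea in the abstract, the following minimal Python sketch applies a binary feature mask to email feature vectors and reports the percentage of dimensions removed. It is not the paper's butterfly algorithm: the mask is fixed by hand rather than found by a search, and the names `apply_feature_mask`, `reduction_percent`, and the toy data are hypothetical, introduced only for this example.

```python
from typing import List, Sequence


def apply_feature_mask(rows: Sequence[Sequence[float]],
                       mask: Sequence[bool]) -> List[List[float]]:
    """Keep only the feature columns whose mask entry is True."""
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]


def reduction_percent(mask: Sequence[bool]) -> float:
    """Percentage of feature dimensions dropped by the mask."""
    dropped = sum(1 for keep in mask if not keep)
    return 100.0 * dropped / len(mask)


# Hypothetical toy feature vectors for two emails (five features each).
rows = [
    [0.1, 0.0, 3.0, 1.0, 0.5],
    [0.4, 0.0, 1.0, 0.0, 0.2],
]

# Assumed mask: in a real system a feature selection search would produce it.
mask = [True, False, True, False, True]

reduced = apply_feature_mask(rows, mask)
print(reduced)
print(reduction_percent(mask))
```

In a setup like the one the abstract describes, the reduced vectors would then be passed to the decision tree or random forest classifier, so the classifier trains on fewer dimensions per email.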

Keywords:

Spam, advertising email, machine learning, feature selection, Apache Spark

Authors

Nasrin Aghaee-Maybodi

Department of Computer Engineering, Maybod Branch, Islamic Azad University, Maybod, Iran

This paper is classified under the following subject category:

Artificial Intelligence > Machine Learning

Permanent link to this paper:

https://civilica.com/doc/2084112

National scientific document ID:

CONFIT01_0815

Indexing date: 4 Mehr 1403 (Solar Hijri calendar)

How to cite this paper:

To reference this paper in your own research, use the following entry in the references section:

Aghaee-Maybodi, Nasrin,1403,A Big Data Processing System for Spam Analysis in Apache Spark,1st international conference on information technology, management and computer,Sari,https://civilica.com/doc/2084112

In the body text, wherever a statement or finding from this paper is mentioned, the following details are given in parentheses after it.
First citation: (Aghaee-Maybodi, Nasrin, 1403)
Second and later citations: (Aghaee-Maybodi, 1403)
