A Method for Anomaly Detection in Big Data based on Support Vector Machine

Year of publication: 1398 (Solar Hijri calendar)
Published in: International Journal of Communications and Information Technology, Volume: 11, Issue: 3
Dedicated COI code: JR_ITRC-11-3_005
Article language: English
Views: 166

Authors

University of Science and Culture

Abstract

In recent years, data mining has played an essential role in improving the performance and functionality of computer systems. Anomaly detection is one of the most important and influential data mining techniques: it identifies abnormal system behavior, which helps in locating system problems and troubleshooting them. Intrusion detection and the fraud detection services used by credit card companies are real-world examples of anomaly detection. As dataset volumes grow into big data, traditional data mining approaches no longer produce sufficiently efficient results. Various platforms, frameworks, and algorithms for big data mining have been proposed to address this deficiency; Hadoop and Spark are among the most widely used frameworks in this field. Support Vector Machine (SVM) is one of the most popular approaches to anomaly detection and, thanks to its distributed and parallel extensions, is widely used in big data mining. In this research, Mutual Information is used for feature selection, and the kernel function of the one-class support vector machine is improved, which in turn improves anomaly detection performance. The approach is implemented on Spark and evaluated on the NSL-KDD dataset, where it achieves an accuracy of more than 80 percent. Compared with other similar anomaly detection approaches, the results are improved.
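The two-stage idea described in the abstract (rank features by mutual information, then fit a one-class SVM on normal traffic only) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn on a single machine instead of Spark, uses the stock RBF kernel instead of the improved kernel (which this listing does not specify), and runs on synthetic data instead of NSL-KDD. The helper names (`make_toy_data`, `select_features_by_mi`, `fit_one_class_svm`, `predict_anomaly`) and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def make_toy_data(seed=0):
    """Synthetic stand-in for a labelled traffic dataset (assumption: not NSL-KDD)."""
    rng = np.random.default_rng(seed)
    n_normal, n_anomaly, n_features = 400, 40, 10
    X_normal = rng.normal(0.0, 1.0, size=(n_normal, n_features))
    X_anomaly = rng.normal(0.0, 1.0, size=(n_anomaly, n_features))
    # Only the first three features carry signal: anomalies are shifted there.
    X_anomaly[:, :3] += 6.0
    X = np.vstack([X_normal, X_anomaly])
    y = np.concatenate([np.zeros(n_normal, dtype=int), np.ones(n_anomaly, dtype=int)])
    return X, y


def select_features_by_mi(X, y, k, seed=0):
    """Return the indices of the k features with the highest mutual information with y."""
    mi = mutual_info_classif(X, y, random_state=seed)
    return np.sort(np.argsort(mi)[-k:])


def fit_one_class_svm(X_normal, nu=0.05):
    """Fit a scaler and a one-class SVM on normal samples only.

    Assumption: stock RBF kernel, not the improved kernel from the paper.
    """
    scaler = StandardScaler().fit(X_normal)
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
    model.fit(scaler.transform(X_normal))
    return scaler, model


def predict_anomaly(scaler, model, X):
    """Map OneClassSVM output (+1 inlier, -1 outlier) to 0 = normal, 1 = anomaly."""
    return (model.predict(scaler.transform(X)) == -1).astype(int)


X, y = make_toy_data()
selected = select_features_by_mi(X, y, k=3)
scaler, model = fit_one_class_svm(X[y == 0][:, selected])
y_pred = predict_anomaly(scaler, model, X[:, selected])
accuracy = float((y_pred == y).mean())
print("selected features:", selected.tolist())
print("accuracy:", round(accuracy, 3))
```

The labels are used only to rank features; the one-class SVM itself sees normal samples alone, which is what distinguishes it from a two-class SVM. The accuracy on this toy data says nothing about the NSL-KDD figure reported in the abstract.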

Keywords

Anomaly detection, support vector machine, big data, improvement of anomaly detection, one-class support vector machine, Mutual Information

More information about COI

COI stands for CIVILICA Object Identifier, that is, the CIVILICA identifier for documents. A COI is a code that, according to the place of publication, is assigned to papers from domestic conferences and journals when they are indexed in the CIVILICA citation database.

The COI serves as the national code for documents indexed in CIVILICA. It is unique and permanent, so it can always be cited and tracked.