Using an Automatic Weighted Keywords Dictionary for Intelligent Web Content Filtering
Year of publication: 1393 (Solar Hijri)
Document type: Journal article
Language: English
The full text of this article is available as a 14-page PDF file.
National scientific document ID:
JR_JACR-6-1_008
Indexing date: 16 Shahrivar 1395
Abstract:
Filtering web pages with inappropriate content is one of the major issues in intelligent network security. Any country seeking to control users' access to the web needs an intelligent filtering method with high accuracy and speed, and the problem has therefore been studied by many researchers. Representing web pages in a machine-understandable form is one of the most important preprocessing steps, so a lower-dimensional description of web pages would be very effective, especially for deciding whether a page should be filtered out. In this paper, we propose an automatic method to detect forbidden keywords in web pages. We then define a new vector representation of web pages, named RWSF, which consists of the weighted sum and frequency of forbidden keywords in different parts of a web page. For this purpose, a ranking dictionary of keywords, including forbidden keywords, is used. To evaluate the proposed method, 2643 pages were used, consisting of 1311 normal pages and 1332 forbidden pages. Of these, 1851 pages were used to train the system and 792 pages to evaluate it. The system was assessed using several classifiers: k-Nearest Neighbor, Support Vector Machines, Decision Tree, and Artificial Neural Networks. The evaluation results indicate high efficiency and accuracy of the proposed method with all classifiers.
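To make the RWSF idea concrete, the following is a minimal sketch of how a page could be turned into a vector of weighted sums and frequencies of forbidden keywords, one pair per page part. It is not the authors' implementation: the abstract does not give the exact formula, the page parts, or the dictionary contents, so the names `KEYWORD_WEIGHTS`, `PAGE_PARTS`, `tokenize`, and `rwsf_vector`, as well as the sample keywords and weights, are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical ranking dictionary: keyword -> weight (assumed values,
# not taken from the paper).
KEYWORD_WEIGHTS = {"casino": 3.0, "betting": 2.0, "jackpot": 1.5}

# Hypothetical page parts; the abstract does not say which parts are used.
PAGE_PARTS = ("title", "meta", "body")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rwsf_vector(page, weights=KEYWORD_WEIGHTS, parts=PAGE_PARTS):
    """Return [weighted_sum, frequency] for each page part, concatenated.

    `page` maps a part name to its text; missing parts count as empty.
    """
    vector = []
    for part in parts:
        counts = Counter(tokenize(page.get(part, "")))
        # Frequency: total occurrences of dictionary keywords in this part.
        frequency = sum(counts[k] for k in weights)
        # Weighted sum: each occurrence contributes its keyword's weight.
        weighted_sum = sum(weights[k] * counts[k] for k in weights)
        vector.extend([float(weighted_sum), float(frequency)])
    return vector


page = {
    "title": "Best casino betting tips",
    "meta": "casino",
    "body": "Win the jackpot at our casino. Casino betting is open daily.",
}
print(rwsf_vector(page))  # [5.0, 2.0, 3.0, 1.0, 9.5, 4.0]
```

With three assumed page parts, every page is reduced to six numbers regardless of its length, which illustrates the low-dimensional representation the abstract describes. Such vectors could then be passed to any of the classifiers the paper mentions.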
Keywords:
Authors
Najibeh Farzi Veijouyeh
Islamic Azad University, Shabestar Branch, Shabestar, Iran
Jamshid Bagherzadeh
Assistant Professor, Computer Science and Engineering Department, Urmia University, Urmia, Iran