Using an Automatic Weighted Keywords Dictionary for Intelligent Web Content Filtering

Publication year: 1393 (Solar Hijri)
Document type: Journal article
Language: English

The full text of this article is available as a 14-page PDF file.

National scientific document ID:

JR_JACR-6-1_008

Indexing date: 16 Shahrivar 1395 (Solar Hijri)

Abstract:

Filtering web pages with inappropriate content is one of the major issues in intelligent network security. Every country needs an intelligent filtering method with high accuracy and speed in order to control users' access to the web, so the problem has been studied by many researchers. Representing web pages in a form that machines can understand is one of the most important preprocessing steps. A way to describe web pages with fewer dimensions is therefore very effective, especially for deciding whether a page should be filtered out. In this paper, we propose an automatic method for detecting forbidden keywords in web pages. We then define a new vector representation of web pages, named RWSF, which consists of the weighted sum and frequency of forbidden keywords in different parts of a web page. For this purpose, a ranking dictionary of keywords, including forbidden keywords, is used. To evaluate the proposed method, 2643 pages were used: 1311 normal pages and 1332 forbidden pages. Of these, 1851 pages were used to train the system and 792 pages to evaluate it. The system was assessed with several classifiers: k-Nearest Neighbor, Support Vector Machines, Decision Tree and Artificial Neural Networks. The evaluation results indicate high efficiency and accuracy of the proposed method with all classifiers.
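
The abstract describes the page representation only in outline. As a rough illustration, the following Python sketch builds a vector that holds, for each part of a page, the weighted sum and the raw frequency of dictionary keywords. It is not the authors' implementation: the dictionary entries and weights in `KEYWORD_WEIGHTS`, the choice of page parts in `PAGE_PARTS`, the tokenizer, and the function name `page_vector` are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical ranking dictionary: keyword -> weight (values are invented).
KEYWORD_WEIGHTS = {"casino": 0.9, "betting": 0.7, "poker": 0.5}

# Hypothetical page parts; the abstract does not say which parts are used.
PAGE_PARTS = ("title", "meta", "body")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def page_vector(page, weights=KEYWORD_WEIGHTS, parts=PAGE_PARTS):
    """Return [weighted_sum, frequency] for each part, concatenated in order."""
    vector = []
    for part in parts:
        counts = Counter(tokenize(page.get(part, "")))
        # Number of dictionary keyword occurrences in this part.
        frequency = sum(counts[k] for k in weights)
        # Same occurrences, each scaled by its dictionary weight.
        weighted_sum = sum(weights[k] * counts[k] for k in weights)
        vector.extend([weighted_sum, float(frequency)])
    return vector


example_page = {"title": "Casino poker", "body": "betting betting on poker"}
example_vector = page_vector(example_page)
```

With three parts the vector has six components regardless of page length, which is the kind of low-dimensional description the abstract argues for. A vector like this could then be passed to any of the classifiers the abstract lists.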

Authors

Najibeh Farzi Veijouyeh

Islamic Azad University, Shabestar Branch, Shabestar, Iran

Jamshid Bagherzadeh

Assistant Professor, Computer Science and Engineering Department, Urmia University, Urmia, Iran