Approaches Used in Focused Web Crawlers: A Systematic Mapping Study

Publication year: 1403
Document type: Conference paper
Language: English

The full text of this paper is available as a 9-page PDF.

National scientific document ID: IRANWEB10_013

Indexing date: 14 Mordad 1403

Abstract:

Today, one of the most common uses of the Internet is searching the web and retrieving information from it. General search engines such as Google and Bing are used for this purpose on a daily basis. The web crawler is the most important part of a search engine: it traverses web content by following the links on web pages and extracts the content it finds. Focused web crawlers are a type of web crawler that restricts the crawling process to a specific portion of online content; they are used in vertical search engines. For example, a focused crawler may retrieve only certain types of media (such as PowerPoint files). In this paper, a systematic mapping study is conducted: the approaches used in the development of focused web crawlers are reviewed, and the advantages and disadvantages of each are discussed. In addition, two new approaches are identified and introduced. The study shows that the approach based on "ontology or semantics" is the most widely used in the development of focused web crawlers. The decision to use any of the introduced approaches depends on the available resources and the existing constraints on development.
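To make the idea of restricting a crawl to a topic concrete, here is a minimal sketch of a focused crawler. It is an illustration only and is not taken from the paper: the in-memory `web` dictionary standing in for real pages, the keyword-overlap scorer `keyword_relevance`, and the `threshold` and `max_pages` parameters are all assumptions made for the example. The crawler follows links only from pages it judges on-topic, and it visits links found on higher-scoring pages first.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for the web: URL -> (page text, outgoing links).
Page = Tuple[str, List[str]]


def keyword_relevance(text: str, topic_terms: Set[str]) -> float:
    """Toy relevance score: fraction of topic terms present in the text."""
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def focused_crawl(
    web: Dict[str, Page],
    seeds: List[str],
    topic_terms: Set[str],
    threshold: float = 0.5,
    max_pages: int = 100,
) -> List[str]:
    """Return URLs judged relevant, in the order they were visited."""
    # heapq is a min-heap, so priorities are negated to pop the best first.
    frontier: List[Tuple[float, str]] = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen: Set[str] = set(seeds)
    relevant: List[str] = []
    fetched = 0

    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        page = web.get(url)
        if page is None:
            continue  # dead link: nothing to fetch
        fetched += 1
        text, links = page
        score = keyword_relevance(text, topic_terms)
        if score < threshold:
            continue  # off-topic page: do not follow its links
        relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return relevant
```

A real focused crawler would replace the keyword scorer with one of the approaches surveyed in the paper (for example an ontology-based or semantic relevance measure) and would fetch pages over HTTP rather than from a dictionary.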

Authors

Amir Noorzadeh

Master's Student in Computer Engineering, Islamic Azad University, Karaj, Iran