CIVILICA We Respect the Science
(Specialized publisher of the country's conferences / Publishing license number from the Ministry of Culture and Islamic Guidance: 8971)

Approaches Used in Focused Web Crawlers: A Systematic Mapping Study

Paper title: Approaches Used in Focused Web Crawlers: A Systematic Mapping Study
National paper ID: IRANWEB10_013
Published in the 10th International Conference on Web Research, 1403 (Solar Hijri calendar)
Author details:

Amir Noorzadeh - Master's Student in Computer Engineering, Islamic Azad University, Karaj, Iran

Abstract:
Today, one of the most common uses of the Internet is searching the web and retrieving information from it, and general-purpose search engines such as Google and Bing are used for this on a daily basis. Web crawlers are the most important part of a search engine: they traverse web content by following the links on web pages and extract the content they find. Focused web crawlers are a type of web crawler that restricts the crawling process to a specific portion of online content; they are used in vertical search engines. For example, a focused crawler may retrieve only certain types of media (such as PowerPoint files). In this paper, a systematic mapping study is conducted: the approaches used in the development of focused web crawlers are reviewed, and the advantages and disadvantages of each are discussed. In addition, two new approaches are identified and introduced. The study shows that the approach based on "ontology or semantics" is the most widely used in the development of focused web crawlers, and that the decision to use any of the introduced approaches depends on the available resources and the existing constraints on development.
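
To make the idea of restricting a crawl to on-topic content concrete, the following is a minimal sketch of a best-first focused crawl. It is not one of the approaches catalogued in the paper: the in-memory TOY_WEB graph, the TOPIC_TERMS set, the keyword-overlap relevance function, and the 0.3 threshold are all hypothetical choices made for illustration, and a real crawler would fetch pages over HTTP and would likely use a richer relevance model (for example, an ontology-based one).

```python
import heapq

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# This toy graph stands in for real HTTP fetching so the sketch runs offline.
TOY_WEB = {
    "a": ("web crawler search engine index", ["b", "c"]),
    "b": ("focused crawler topical crawling relevance", ["d"]),
    "c": ("cooking recipes pasta sauce", ["e"]),
    "d": ("vertical search engine crawler", []),
    "e": ("dessert recipes cake", []),
}

# Assumed topic description: a plain set of keywords.
TOPIC_TERMS = {"crawler", "crawling", "search", "engine"}


def relevance(text, topic_terms):
    """Fraction of the page's words that are topic terms (assumed scoring rule)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_terms for w in words) / len(words)


def focused_crawl(seed, web, topic_terms, threshold=0.3, max_pages=10):
    """Best-first crawl that keeps and expands only on-topic pages."""
    frontier = [(-1.0, seed)]  # min-heap on negated priority = max-heap
    seen = {seed}
    collected = []
    visited = 0
    while frontier and visited < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue  # dangling link in the toy graph
        text, links = web[url]
        visited += 1
        score = relevance(text, topic_terms)
        if score < threshold:
            continue  # off-topic page: neither kept nor expanded
        collected.append((url, round(score, 2)))
        for link in links:
            if link not in seen:
                seen.add(link)
                # Unvisited links inherit the parent's score as their priority.
                heapq.heappush(frontier, (-score, link))
    return collected


if __name__ == "__main__":
    print(focused_crawl("a", TOY_WEB, TOPIC_TERMS))
```

In this toy graph, the off-topic page "c" is visited but discarded, so its link to "e" is never followed. That pruning is what distinguishes a focused crawl from a general one, which would follow every link regardless of topic.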

Keywords:
Focused Web Crawlers, Topical Web Crawlers, Vertical Search Engines, Approaches, Systematic Mapping Study.

Paper page and full-text download: https://civilica.com/doc/2040303/