GWC: A tool for automatic web data extraction
Published in: 13th International Symposium on Advances in Science and Technology: Sustainable Land, Advances in Computer and Information Technology
Year of publication: 1397 (Solar Hijri calendar)
Document type: Conference paper
Language: English
The full text of this paper is available as a 10-page PDF file.
National scientific document ID:
COMPUTER05_003
Indexing date: 22 Ordibehesht 1398
Abstract:
Extracting data from the web is time-consuming and labor-intensive, so there have been many efforts to automate it. Many data extraction systems have been developed for this purpose, and they approach the task in different ways. One of these ways is to develop programs called web wrappers, which crawl web pages, extract the desired data, and store it in a known data format such as XML or CSV. Web wrappers can be developed manually in general-purpose programming languages, or with tools or languages created specifically for this purpose. We define GWC, a high-level language for web data extraction, and develop an interpreter that can compile and run programs written in it. GWC is defined at a high level of abstraction and provides useful constructs that are applicable to the web data extraction process.
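To make the notion of a web wrapper concrete, the following is a minimal sketch of one written by hand in a general-purpose language (Python), which is the manual approach the abstract contrasts with GWC. It is not GWC code and does not reflect the paper's interpreter. The names TableWrapper, extract_to_csv, and SAMPLE_PAGE are hypothetical, the page is an inline string rather than a crawled URL, and the sketch assumes the desired data sits in a simple HTML table.

```python
import csv
import io
from html.parser import HTMLParser


class TableWrapper(HTMLParser):
    """Hypothetical wrapper: collects table cell text, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside an open cell is kept; everything else is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed markup


def extract_to_csv(html):
    """Extract table rows from an HTML string and return them as CSV text."""
    parser = TableWrapper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Stand-in for a fetched page; a real wrapper would download this by crawling.
SAMPLE_PAGE = """
<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>GWC</td><td>1397</td></tr>
</table>
"""

if __name__ == "__main__":
    print(extract_to_csv(SAMPLE_PAGE), end="")
```

Even for this small case, the wrapper has to track parsing state by hand, which is the kind of low-level work a dedicated high-level extraction language is meant to hide.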
Keywords:
Authors
Mahmood Farokhian
Computer Engineering Department, Shahid Chamran University of Ahvaz, Ahvaz, Iran
Javid Shoaei
Computer Engineering Department, Shahid Chamran University of Ahvaz, Ahvaz, Iran
Sajad Esfandyari
Computer Engineering Department, Arak University, Arak, Iran