CIVILICA We Respect the Science
(Specialized publisher of national conferences / Publishing license number from the Ministry of Culture and Islamic Guidance: 8971)

Evaluating preprocessing by Turing Machine in text categorization

Paper title: Evaluating preprocessing by Turing Machine in text categorization
National paper ID: ICS12_203
Published in: 12th Iranian National Conference on Intelligent Systems, 1392 (Solar Hijri calendar)
Authors:

Razieh Abbasi Ghalehtaki - Department of Computer Engineering, Hamedan Branch, Islamic Azad University, Science and Research Campus, Hamedan, Iran
Hassan Khotanlou - Department of Computer Engineering, Bu-Ali Sina University, Hamedan, Iran
Mansour Esmaeilpour - Department of Computer Engineering, Hamedan Branch, Islamic Azad University, Hamedan, Iran

Abstract:
With the growth of the World Wide Web, text categorization has become a key technology for handling and organizing large numbers of documents. Automatic text categorization is a method for coping with massive amounts of data. Its basic phases are preprocessing, extracting relevant features and matching them against the features in a database, and finally assigning a set of documents to predefined categories. In this article, we propose a new preprocessing method based on a Turing Machine. All four preprocessing steps (sentence segmentation, tokenization, stop-word removal, and word stemming) are performed by the Turing Machine. To assess the importance of Turing Machine preprocessing for the text classification problem, we applied the support vector machine paradigm to the Reuters and PAGOD datasets. In searching for the best document representation, we evaluated and analyzed several known feature reduction, feature subset selection, and term weighting methods. Experiments show that the proposed method is more accurate than the other methods.
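The four preprocessing steps named in the abstract can be illustrated with a short sketch. The abstract does not describe the authors' Turing Machine construction, so the code below is only an assumption-laden stand-in: it reads the text one symbol at a time, as if from a tape, and uses a two-state scanner for tokenization. The stop-word list, the suffix list, and all function names are hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: NOT the authors' Turing Machine, just a
# tape-style scanner that walks through the four preprocessing steps.

STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}  # hypothetical list
SUFFIXES = ("ing", "ed", "s")  # hypothetical, far cruder than a real stemmer


def segment_sentences(text):
    """Step 1: read the text symbol by symbol and cut after '.', '!' or '?'."""
    sentences, current = [], []
    for ch in text:
        current.append(ch)
        if ch in ".!?":
            sentence = "".join(current).strip()
            if sentence:
                sentences.append(sentence)
            current = []
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return sentences


def tokenize(sentence):
    """Step 2: two-state scanner (BETWEEN / IN_TOKEN) over the symbols."""
    tokens, current = [], []
    state = "BETWEEN"
    for ch in sentence + " ":  # trailing blank acts as an end-of-tape marker
        if ch.isalnum():
            current.append(ch.lower())
            state = "IN_TOKEN"
        elif state == "IN_TOKEN":
            tokens.append("".join(current))
            current = []
            state = "BETWEEN"
    return tokens


def remove_stopwords(tokens):
    """Step 3: drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Step 4: strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run all four steps and return one list of stems per sentence."""
    return [
        [stem(t) for t in remove_stopwords(tokenize(s))]
        for s in segment_sentences(text)
    ]


result = preprocess("The cats are running. Dogs barked!")
```

Here `result` holds one list of stems per sentence. In a full pipeline of the kind the abstract outlines, these stems would then be weighted and passed to a classifier such as a support vector machine.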

Keywords:
Preprocessing; Turing Machine; text categorization; Support Vector Machines

Paper page and full-text download: https://civilica.com/doc/276282/